    Louis Jenkins
    @LouisJenkinsCS
    Hello Arkouda developers, I was wondering: for the LANL Netflow dataset, how long did it take to handle the conversion of CSV to HDF5 via hdflow (https://github.com/reuster986/hdflow)? I.e., how many days did it take?
    Michael Merrill
    @mhmerrill
    @LouisJenkinsCS it can take a while depending on the platform used to do the conversion. That data has ~100 million rows per file, so a laptop with 16GB might be really slow because of memory limitations. I usually use split to break the file down to 10 million rows per file on my laptop to get the conversion to go faster.
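    (For reference, the splitting step Michael describes might look something like the following; the input file name and output prefix are just placeholders.)

    split -l 10000000 netflow.csv netflow_part_
    # then run the hdflow CSV-to-HDF5 conversion on each netflow_part_* file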
    Louis Jenkins
    @LouisJenkinsCS
                                init :   3.085 seconds
                               parse :   5.161 seconds
                         checkParsed :   0.426 seconds
                                docs :   0.007 seconds
                         readExternC :   0.007 seconds
              expandExternArrayCalls :   0.010 seconds
                             cleanup :   0.191 seconds
                        scopeResolve :   1.908 seconds
                      flattenClasses :   0.005 seconds
                           normalize :   2.880 seconds
                     checkNormalized :   0.019 seconds
               buildDefaultFunctions :   1.361 seconds
                 createTaskFunctions :   0.042 seconds
                             resolve :7335.471 seconds
                      resolveIntents : 162.908 seconds
                       checkResolved :   2.580 seconds
    replaceArrayAccessesWithRefTemps :   0.283 seconds
                    flattenFunctions : 180.052 seconds
                  cullOverReferences :   2.034 seconds
                  lowerErrorHandling :   2.783 seconds
                     callDestructors :  45.403 seconds
                      lowerIterators :7063.568 seconds
                            parallel :31500.440 seconds
                               prune :11941.509 seconds
                     bulkCopyRecords :1459.484 seconds
      removeUnnecessaryAutoCopyCalls : 947.959 seconds
                     inlineFunctions :16108.893 seconds
    make: *** [arkouda_server] Bus error
    :(
    It took so long waiting for it to compile
    21 hours :(
    glitch
    @glitch
    Whoa. How much RAM is on the machine you're trying to build on? Also, what are the env variables and Chapel version you're building with? i.e. are you building with gasnet turned on, Chapel 1.24.1 vs. 1.25.0, etc.
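    (The configuration listing pasted below looks like output from Chapel's printchplenv utility; one way to generate a similar listing is shown here.)

    $CHPL_HOME/util/printchplenv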
    Louis Jenkins
    @LouisJenkinsCS
    Chapel 1.25; I'm running on the headnode, but it has 64GB of RAM, so I didn't think I needed to reserve a compute node for compilation :(
    machine info: Linux bluehive 3.10.0-1160.31.1.el7.x86_64 #1 SMP Wed May 26 20:18:08 UTC 2021 x86_64
    CHPL_HOME: /scratch/ljenkin4/chapel-1.25.0 *
    script location: /gpfs/fs2/scratch/ljenkin4/chapel-1.25.0/util/chplenv
    
    CHPL_TARGET_PLATFORM: linux64
    CHPL_TARGET_COMPILER: llvm
    CHPL_TARGET_ARCH: x86_64
    CHPL_TARGET_CPU: ivybridge *
    CHPL_LOCALE_MODEL: flat
    CHPL_COMM: gasnet *
      CHPL_COMM_SUBSTRATE: ibv *
      CHPL_GASNET_SEGMENT: large *
    CHPL_TASKS: qthreads
    CHPL_LAUNCHER: slurm-gasnetrun_ibv *
    CHPL_TIMERS: generic
    CHPL_UNWIND: none *
    CHPL_MEM: jemalloc
    CHPL_ATOMICS: cstdlib
      CHPL_NETWORK_ATOMICS: none
    CHPL_GMP: bundled
    CHPL_HWLOC: bundled
    CHPL_RE2: bundled
    CHPL_LLVM: bundled *
    CHPL_AUX_FILESYS: none
    glitch
    @glitch
    I'm running on a ThinkPad with 16GB RAM, Chapel 1.25.0, CHPL_COMM=none, and typical build times for me are 5-6 minutes
    Louis Jenkins
    @LouisJenkinsCS
    I take it back, 256GB of RAM
    [ljenkin4@bluehive arkouda]$ free -h
                  total        used        free      shared  buff/cache   available
    Mem:           251G         38G         26G        4.6G        186G        199G
    Swap:           31G        3.3G         28G
    I think the machine is just really old, because my laptop builds Chapel and Arkouda relatively quickly as well
    [ljenkin4@bluehive arkouda]$ grep MemTotal /proc/meminfo
    
    MemTotal:       263519200 kB
    [ljenkin4@bluehive arkouda]$ 
    [ljenkin4@bluehive arkouda]$ lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                72
    On-line CPU(s) list:   0-71
    Thread(s) per core:    2
    Core(s) per socket:    18
    Socket(s):             2
    NUMA node(s):          2
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 79
    Model name:            Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
    Stepping:              1
    CPU MHz:               2896.984
    CPU max MHz:           3300.0000
    CPU min MHz:           1200.0000
    BogoMIPS:              4199.99
    Virtualization:        VT-x
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              256K
    L3 cache:              46080K
    NUMA node0 CPU(s):     0-17,36-53
    NUMA node1 CPU(s):     18-35,54-71
    Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
    I'm wondering, does codegen follow inlineFunctions? I.e., did it fail when trying to compile the generated C code, or during an earlier stage of compilation?
    Louis Jenkins
    @LouisJenkinsCS
    Also, I can confirm that the Chapel compiler works for a simple hello.chpl example, and the binary runs, so I don't think it has to do with CHPL_TARGET_CPU or some kind of illegal opcode.
    glitch
    @glitch
    Hmm.. this is a bit beyond my skills. I'd start with a really stripped-down build, remove the gasnet / comm settings, and see if you can get that built (a sketch is below). Run top or something to watch system resources during the build and see if you can spot something off.
    For comparison, my resolve time is around 125s and lowerIterators is 3s (and pretty much everything after that is in the low seconds until makeBinary).
    On the surface it almost looks like you ran into swap issues towards the end, and the bus error leads me to believe something is wrong with your system's memory.
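    (A minimal sketch of that stripped-down build, assuming the Chapel runtime is rebuilt after changing the comm setting; the arkouda path is a placeholder.)

    export CHPL_COMM=none
    cd $CHPL_HOME && make            # rebuild the Chapel runtime for the new settings
    cd /path/to/arkouda && make      # rebuild arkouda_server against it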
    Louis Jenkins
    @LouisJenkinsCS
    I guess other people could be using the head node. I suppose I should reserve a compute node this time and have it handle compilation.
    glitch
    @glitch
    If there is light usage now, I'd try kicking off another build and keeping an eye on the system resource usage (see the sketch below). Honestly, I'd expect it to take 10-15 minutes.
    Looking at my output stats, resolve and makeBinary are really the spots that take the majority of the time.
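    (One simple way to watch memory pressure while the build runs, in a second terminal; the 5-second refresh interval is arbitrary.)

    watch -n 5 free -h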
    Louis Jenkins
    @LouisJenkinsCS
    I'm running it again on a compute node this time.
    glitch
    @glitch
    So it's been over 45 minutes, did it complete yet? If not what does the memory usage look like?
    Louis Jenkins
    @LouisJenkinsCS
    Took only 10 min
    I used a compute node this time
    Turns out the problem was the headnode
    glitch
    @glitch
    :thumbsup:
    Louis Jenkins
    @LouisJenkinsCS
    Thanks for the help!
    Louis Jenkins
    @LouisJenkinsCS
    tests/categorical_test.py .F......FFF... [ 5%]
    tests/check.py ................... [ 12%]
    tests/client_test.py ......... [ 15%]
    tests/coargsort_test.py ...... [ 18%]
    tests/compare_test.py ............... [ 23%]
    tests/datetime_test.py ............. [ 28%]
    tests/dtypes_tests.py ........... [ 32%]
    tests/groupby_test.py .............. [ 37%]
    tests/nan_test.py . [ 38%]
    tests/io_test.py F.F.FF.FFF..FF.FF.sF.. [ 46%]
    tests/io_util_test.py ... [ 47%]
    tests/join_test.py ....... [ 50%]
    tests/logger_test.py ....... [ 53%]
    tests/message_test.py .... [ 54%]
    tests/numeric_test.py .......F..... [ 59%]
    tests/operator_tests.py ............. [ 64%]
    tests/pdarray_creation_test.py .......F...F.F.F.F.. [ 71%]
    tests/regex_test.py ........ [ 74%]
    tests/registration_test.py ................ [ 80%]
    tests/security_test.py .... [ 82%]
    tests/setops_test.py ...... [ 84%]
    tests/sort_test.py ....
    So the "F" means failure? "s" means?
    glitch
    @glitch
    correct, s means skipped, currently there should only be 1 skipped test
    Louis Jenkins
    @LouisJenkinsCS
    Is each test different or are these trials?
    glitch
    @glitch
    they are all individual, you should have gotten a listing of the specific tests that failed
    Louis Jenkins
    @LouisJenkinsCS
    Gotcha, I'll wait until they finish then.
    glitch
    @glitch

    The make test target really just runs pytest under the hood, which you can invoke manually, i.e.

    python3 -m pytest tests/numeric_test.py

    You can run a specific one via

    python3 -m pytest tests/numeric_test.py::NumericTest::testHistogram

    Passing the -s option will also write all of the output to stdout

    You can see other options etc. in the pytest.ini configuration file
    Louis Jenkins
    @LouisJenkinsCS
    Gotcha. I ran make test-all, but right now I'm waiting on the test suite to complete so I can see precisely which tests failed; it's stalling right now
    I'm wondering, will it fail if it times out? Also, does it create/shut down the server in each test? I.e., is it possible that it fails because it takes too long to start the server in each test?
    glitch
    @glitch
    From past experience, if you're running in multi-locale mode and you think it stalled, do a ps -ef|grep arkouda_server to see if anything is running. Chances are the server process died and the client in the test is waiting to reconnect ... which is never going to happen.
    Everything should pass, and it looks like you have enough failures that I'd kill the make test-all process and then re-run just one of the test files manually with the -s option to see what's going on.
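    (For example, adapting the pytest invocation shown earlier to one of the failing files; the choice of io_test.py is just illustrative.)

    python3 -m pytest -s tests/io_test.py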
    Louis Jenkins
    @LouisJenkinsCS
    Weird, it's hanging but... ps -ef|grep arkouda_server yields a running server
    The Categorical test, which ran perfectly the first time around, hangs on the 4th test
    Oh nvm, it's still running even after the test ended
    OH NVM
    I was looking at the grep command itself; the server did indeed die
    How do I handle this issue? Is it killed due to a default timeout?
    glitch
    @glitch
    No, the server likely seg-faulted during the test. I'd start with a single test class and a single test, and run it with the -s option so you can see the output
    Also you should probably re-compile Arkouda with the ARKOUDA_DEVELOPER flag turned on (i.e. export ARKOUDA_DEVELOPER=1)
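    (A minimal sketch of that rebuild, assuming it is run from the top of the arkouda checkout.)

    export ARKOUDA_DEVELOPER=1
    make          # rebuild arkouda_server with the developer flag set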
    glitch
    @glitch
    This should display the name of the test running, which can help narrow down to a specific test that is hanging and then we can debug from there:
    python3 -m pytest tests/categorical_test.py::CategoricalTest -v
    Louis Jenkins
    @LouisJenkinsCS
    E       AssertionError: 'type[32 chars] of (Categorical, str, str_); got int instead' != 'type[32 chars] of (arkouda.categorical.Categorical, str, num[21 chars]tead'
    E       - type of argument "other" must be one of (Categorical, str, str_); got int instead
    E       + type of argument "other" must be one of (arkouda.categorical.Categorical, str, numpy.str_); got int instead
    Hmm