    Louis Jenkins
    Chapel 1.25; I'm running on the headnode but 64GBs of RAM, didn't think I needed to reserve a compute node for compilation :(
    machine info: Linux bluehive 3.10.0-1160.31.1.el7.x86_64 #1 SMP Wed May 26 20:18:08 UTC 2021 x86_64
    CHPL_HOME: /scratch/ljenkin4/chapel-1.25.0 *
    script location: /gpfs/fs2/scratch/ljenkin4/chapel-1.25.0/util/chplenv
    CHPL_TARGET_ARCH: x86_64
    CHPL_TARGET_CPU: ivybridge *
    CHPL_COMM: gasnet *
      CHPL_GASNET_SEGMENT: large *
    CHPL_TASKS: qthreads
    CHPL_LAUNCHER: slurm-gasnetrun_ibv *
    CHPL_TIMERS: generic
    CHPL_UNWIND: none *
    CHPL_MEM: jemalloc
    CHPL_ATOMICS: cstdlib
    CHPL_GMP: bundled
    CHPL_HWLOC: bundled
    CHPL_RE2: bundled
    CHPL_LLVM: bundled *
    I'm running on a thinkpad with 16GB ram, Chapel 1.25.0, CHPL_COMM=none and typical times for me are 5-6 minutes
    Louis Jenkins
    I take it back, 256GBs of RAM
    [ljenkin4@bluehive arkouda]$ free -h
                  total        used        free      shared  buff/cache   available
    Mem:           251G         38G         26G        4.6G        186G        199G
    Swap:           31G        3.3G         28G
    I think the machine is just really old, because my laptop builds Chapel and Arkouda relatively quickly as well
    [ljenkin4@bluehive arkouda]$ grep MemTotal /proc/meminfo
    MemTotal:       263519200 kB
    [ljenkin4@bluehive arkouda]$ 
    [ljenkin4@bluehive arkouda]$ lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                72
    On-line CPU(s) list:   0-71
    Thread(s) per core:    2
    Core(s) per socket:    18
    Socket(s):             2
    NUMA node(s):          2
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 79
    Model name:            Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
    Stepping:              1
    CPU MHz:               2896.984
    CPU max MHz:           3300.0000
    CPU min MHz:           1200.0000
    BogoMIPS:              4199.99
    Virtualization:        VT-x
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              256K
    L3 cache:              46080K
    NUMA node0 CPU(s):     0-17,36-53
    NUMA node1 CPU(s):     18-35,54-71
    Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
    I'm wondering, is codegen following inlineFunctions? I.E. did it fail when trying to compile the output C code, or during earlier stage of compilation
    Louis Jenkins
    Also can confirm that Chapel compiler works for simple hello.chpl example, and runs, so I don't think it has to do with CHPL_TARGET_CPU or some kind of illegal opcode.
    Hmm.. this is a bit beyond my skills. I'd start with a really stripped down build and remove the gasnet / comm settings and see if you can get that built. Run top or something to watch system resources during the build and see if you can spot something off.
    For comparison, my resolve time is around 125s and lowerIterators is 3s (pretty much everything after that takes low seconds until makeBinary)
    On the surface it almost looks like you ran into swap issues towards the end and the bus error leads me to believe something is wrong with your system's memory.
    Louis Jenkins
    I guess other people could be using the head node. I suppose I should reserve a compute node this time and have it handle compilation.
    If there is light usage now, I'd try kicking off another build and keeping an eye on the system resource usage. Honestly I'd expect it to take 10-15 minutes.
    Looking at my output stats, resolve and makeBinary are really the spots that take the majority of the time.
    Louis Jenkins
    I'm running it again on a compute node this time.
    So it's been over 45 minutes, did it complete yet? If not what does the memory usage look like?
    Louis Jenkins
    Took only 10 min
    I used a compute node this time
    Turns out the problem was the headnode
    Louis Jenkins
    Thanks for the help!
    tests/categorical_test.py .F......FFF... [ 5%]
    tests/check.py ................... [ 12%]
    tests/client_test.py ......... [ 15%]
    tests/coargsort_test.py ...... [ 18%]
    tests/compare_test.py ............... [ 23%]
    tests/datetime_test.py ............. [ 28%]
    tests/dtypes_tests.py ........... [ 32%]
    tests/groupby_test.py .............. [ 37%]
    tests/nan_test.py . [ 38%]
    tests/io_test.py F.F.FF.FFF..FF.FF.sF.. [ 46%]
    tests/io_util_test.py ... [ 47%]
    tests/join_test.py ....... [ 50%]
    tests/logger_test.py ....... [ 53%]
    tests/message_test.py .... [ 54%]
    tests/numeric_test.py .......F..... [ 59%]
    tests/operator_tests.py ............. [ 64%]
    tests/pdarray_creation_test.py .......F...F.F.F.F.. [ 71%]
    tests/regex_test.py ........ [ 74%]
    tests/registration_test.py ................ [ 80%]
    tests/security_test.py .... [ 82%]
    tests/setops_test.py ...... [ 84%]
    tests/sort_test.py ....
    So the "F" means failure? "s" means?
    correct, s means skipped, currently there should only be 1 skipped test
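    As a rough illustration of reading those progress strings: pytest prints "." for a pass, "F" for a failure and "s" for a skip. The helper below is a hypothetical sketch (the name summarize_progress is not part of pytest or Arkouda) that tallies only those three markers for one line of output.

```python
from collections import Counter


def summarize_progress(progress: str) -> dict:
    """Tally pass/fail/skip markers from one pytest progress string.

    Hypothetical helper for illustration; it only counts the three
    markers discussed above and ignores any other characters.
    """
    counts = Counter(progress)
    return {
        "passed": counts["."],
        "failed": counts["F"],
        "skipped": counts["s"],
    }


# Progress string copied from the tests/io_test.py line above.
print(summarize_progress("F.F.FF.FFF..FF.FF.sF.."))
# prints {'passed': 9, 'failed': 12, 'skipped': 1}
```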
    Louis Jenkins
    Is each test different or are these trials?
    they are all individual, you should have gotten a listing of the specific tests that failed
    Louis Jenkins
    Gotcha, I'll wait until they finish then.

    The make test target really just runs pytest under the hood, which you can also invoke manually, e.g.

    python3 -m pytest tests/numeric_test.py

    You can run a specific one via

    python3 -m pytest tests/numeric_test.py::NumericTest::testHistogram

    Passing the -s option will also write all of the output to stdout

    You can see other options etc. in the pytest.ini configuration file
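    If it helps to script those invocations, here is a minimal sketch that assembles the same command lines. The function name pytest_command and its parameters are made up for illustration; only the python3 -m pytest form, the file::Class::test node ID and the -s option come from the messages above.

```python
def pytest_command(test_file, test_class=None, test_name=None, show_output=False):
    """Build an argument list for running one test file, class, or test.

    Hypothetical helper: it only joins the pieces into a pytest node ID
    of the form file::Class::test and optionally appends -s.
    """
    if test_name and not test_class:
        raise ValueError("test_name requires test_class for class-based tests")
    node_id = test_file
    if test_class:
        node_id += "::" + test_class
    if test_name:
        node_id += "::" + test_name
    command = ["python3", "-m", "pytest", node_id]
    if show_output:
        command.append("-s")  # do not capture output, so prints reach stdout
    return command


print(" ".join(pytest_command("tests/numeric_test.py", "NumericTest", "testHistogram", True)))
# prints python3 -m pytest tests/numeric_test.py::NumericTest::testHistogram -s
```

    The returned list can be passed to subprocess.run as-is if you want to launch the test from a script.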
    Louis Jenkins
    Gotcha. I ran make test-all but right now I'm waiting on the test suite to complete so I can see precisely which tests failed; it's stalling right now
    I'm wondering, will it fail if it times out? Also, does it create/shutdown the server in each test? I.E. is it possible that it fails because it takes too long to start the server in each test?
    From past experience if you're running in multi-locale mode and you think it stalled, do a ps -ef|grep arkouda_server to see if anything is running. Chances are the server process died out and the client in the test is waiting to reconnect ... which is never going to happen.
    Everything should pass, and it looks like you have enough failures that I'd kill the make test-all process and then re-run just one of the test files manually with the -s option to see what's going on.
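    One thing to watch for with ps -ef|grep arkouda_server is that the grep command itself appears in the listing, so a match does not always mean the server is alive. Here is a minimal sketch of the same check that filters out grep. The function names are hypothetical, and it assumes the usual ps -ef layout where the command is the eighth column.

```python
import os
import subprocess


def find_server_lines(ps_output, name="arkouda_server"):
    """Return ps -ef lines whose command mentions `name`, skipping grep itself.

    Assumes the usual ps -ef columns:
    UID PID PPID C STIME TTY TIME CMD
    """
    matches = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # the 8th field is the full command
        if len(fields) < 8 or name not in fields[7]:
            continue
        executable = os.path.basename(fields[7].split()[0])
        if executable == "grep":
            continue  # this is the search command, not the server
        matches.append(line)
    return matches


def server_is_running(name="arkouda_server"):
    """Run ps -ef and report whether a matching non-grep process exists."""
    result = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True)
    return bool(find_server_lines(result.stdout, name))
```

    If server_is_running() returns False while a test appears to hang, the client is most likely waiting on a server that has already exited.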
    Louis Jenkins
    Weird, it's hanging but... ps -ef|grep arkouda_server yields a running server
    Categorical Test, which ran perfectly the first time around, hangs on the 4th test
    Oh nvm, it's still running even after the test ended
    OH NVM
    I was looking at the command itself; the server did indeed die
    How do I handle this issue? Is it killed due to default timeout?
    No, the server likely seg-faulted during the test. I'd start with a single test class and single test and run it with the -s option so you can see the output
    Also you should probably re-compile Arkouda with the ARKOUDA_DEVELOPER flag turned on (i.e. export ARKOUDA_DEVELOPER=1)
    This should display the name of the test running, which can help narrow down to a specific test that is hanging and then we can debug from there:
    python3 -m pytest tests/categorical_test.py::CategoricalTest -v
    Louis Jenkins
    E       AssertionError: 'type[32 chars] of (Categorical, str, str_); got int instead' != 'type[32 chars] of (arkouda.categorical.Categorical, str, num[21 chars]tead'
    E       - type of argument "other" must be one of (Categorical, str, str_); got int instead
    E       + type of argument "other" must be one of (arkouda.categorical.Categorical, str, numpy.str_); got int instead
    ========================================================================================== FAILURES ==========================================================================================
    _________________________________________________________________________________ CategoricalTest.testBinop __________________________________________________________________________________
    self = <categorical_test.CategoricalTest testMethod=testBinop>
        def testBinop(self):
            cat = self._getCategorical()
            catDupe = self._getCategorical()
            catNonDupe = self._getRandomizedCategorical()
                                       True,True,True]) == cat._binop(catDupe,'==')).all())
                                       False,False,False,False]) == cat._binop(catDupe,'!=')).all())
                                       False,False,False,False]) ==
                                       cat._binop('string 1', '==')).all())
                                       False,False,False,False]) ==
                                       cat._binop(np.str_('string 1'), '==')).all())
            self.assertTrue((ak.array([False,True,True,True,True,True,True,True,True,True]) ==
                       cat._binop('string 1', '!=')).all())
            self.assertTrue((ak.array([False,True,True,True,True,True,True,True,True,True]) ==
                       cat._binop(np.str_('string 1'), '!=')).all())
            with self.assertRaises(NotImplementedError):
                cat._binop('string 1', '===')
            with self.assertRaises(TypeError) as cm:
                cat._binop(1, '==')
    >       self.assertEqual(('type of argument "other" must be one of (Categorical, str, str_);' +
                              ' got int instead'),
    E       AssertionError: 'type[32 chars] of (Categorical, str, str_); got int instead' != 'type[32 chars] of (arkouda.categorical.Categorical, str, num[21 chars]tead'
    E       - type of argument "other" must be one of (Categorical, str, str_); got int instead
    E       + type of argument "other" must be one of (arkouda.categorical.Categorical, str, numpy.str_); got int instead
    E       ?                                          ++++++++++++++++++++                  ++++++
    tests/categorical_test.py:129: AssertionError
    ================================================================================== short test summary info ===================================================================================
    Hash is above
    What's your version of numpy?
    Louis Jenkins
    Name: numpy
    Version: 1.20.1
    Summary: NumPy is the fundamental package for array computing with Python.
    Home-page: https://www.numpy.org
    Author: Travis E. Oliphant et al.
    Author-email: None
    License: BSD
    Location: /home/users/p02405/anaconda3/lib/python3.8/site-packages
    Required-by: tifffile, tables, statsmodels, seaborn, scipy, scikit-learn, scikit-image, PyWavelets, pyerfa, patsy, pandas, numexpr, numba, mkl-random, mkl-fft, matplotlib, imageio, hdflow, h5py, Bottleneck, bokeh, bkcharts, astropy
    Ah, ok, we're not compatible with 1.20.x yet (Bears-R-Us/arkouda#670), you'll need to downgrade to the 1.19.x line
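    A quick way to catch this before running the tests is to check the installed version up front. The sketch below is only an illustration: the helper names are hypothetical, and it assumes that 1.19.x is the only supported line, since that is all the message above states.

```python
def parse_major_minor(version: str) -> tuple:
    """Return the leading (major, minor) integers of a version string."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported_numpy(version: str, supported=(1, 19)) -> bool:
    """Check whether `version` is on the supported major.minor line.

    Assumption: only the 1.19.x line counts as supported here.
    """
    return parse_major_minor(version) == supported


print(is_supported_numpy("1.20.1"))  # the version reported above
# prints False
print(is_supported_numpy("1.19.5"))
# prints True
```

    In a real environment you would pass numpy.__version__ to is_supported_numpy, and a version pin such as numpy>=1.19,<1.20 keeps the install on that line.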
    Louis Jenkins