I'm fairly certain the reason for hanging/failure was the server process dying and the client ZMQ socket waiting for reconnection. If that's the case the test will never actually complete and should be considered failed as soon as the server process dies.
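As a rough illustration of "fail as soon as the server process dies", here is a minimal sketch of a wait loop that checks the server's process handle while waiting for a reply. The names wait_for_reply, ServerDied, and reply_ready are hypothetical and not part of Arkouda or ZMQ; the "server" is a stand-in child process that exits right away. In a real client, reply_ready might be something like a poll on the client socket with a short timeout.

```python
import subprocess
import sys
import time


class ServerDied(RuntimeError):
    """Raised when the server process exits while a request is outstanding."""


def wait_for_reply(server, reply_ready, poll_interval=0.05, timeout=30.0):
    """Wait until reply_ready() is true, failing fast if the server has exited.

    server:      a subprocess.Popen handle for the server process
    reply_ready: hypothetical callable returning True once a reply has arrived
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if reply_ready():
            return True
        code = server.poll()  # None while the process is still running
        if code is not None:
            raise ServerDied(f"server exited with code {code}")
        time.sleep(poll_interval)
    raise TimeoutError(f"no reply within {timeout} seconds")


def demo():
    # Stand-in for the server: a child process that exits immediately with code 3.
    server = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
    try:
        # The reply never arrives, so only the liveness check can end the wait.
        wait_for_reply(server, reply_ready=lambda: False)
    except ServerDied as exc:
        return str(exc)
    finally:
        server.wait()
    return "no failure detected"


if __name__ == "__main__":
    print(demo())
```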
However, if you want to try increasing the timeout, you can install the pytest-timeout plugin, which enables the command-line argument --timeout=300, where the value is in seconds. An alternative is to use pytest marks to set a timeout on a specific test: @pytest.mark.timeout(10, "slow", method="thread") (from the pytest docs).
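For the per-test route, a minimal sketch of what a marked test could look like, assuming pytest and the pytest-timeout plugin are installed. The test name and body are placeholders, not taken from the Arkouda test suite.

```python
import time

import pytest


# Hypothetical test: with pytest-timeout installed, this test is failed
# if it runs for longer than 300 seconds.
@pytest.mark.timeout(300)
def test_slow_roundtrip():
    time.sleep(0.01)  # placeholder for the real client/server work
    assert True
```

Running the whole suite with --timeout=300 applies the same limit to every test that does not carry its own mark.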
Zoom Invite
Michael Merrill is inviting you to a scheduled Zoom meeting.
Topic: Arkouda Weekly Zoom Meeting Time: recurring meeting Tuesdays @ 1pm ET
Join Zoom Meeting https://us04web.zoom.us/j/77717000423?pwd=TGlmaUN3L2hScFovTy9NRXNnUTE5dz09
Meeting ID: 777 1700 0423 Passcode: kjM3WS
During your first run you should have gotten a message from GASNet recommending a value for GASNET_PHYSMEM_MAX. This is usually ~2/3 of physical memory or a limit set by the HCA. This limits how much memory can be pinned, but not how much physical memory you can allocate (just how much can be pinned / communicated at any given time).
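To make the ~2/3 rule of thumb concrete, here is a small arithmetic sketch. The function name is made up for illustration, and the value GASNet actually recommends (or a lower limit imposed by the HCA) should take precedence over this estimate.

```python
GIB = 1024 ** 3


def suggested_physmem_max_gib(total_gib, numerator=2, denominator=3):
    """Return roughly numerator/denominator of physical memory, in whole GiB.

    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    total_bytes = total_gib * GIB
    return (total_bytes * numerator // denominator) // GIB


if __name__ == "__main__":
    for total in (750, 512):
        estimate = suggested_physmem_max_gib(total)
        print(f"{total} GiB physical -> about {estimate} GiB pinnable")
```

For a 750 GiB node this gives about 500 GiB, and for a 512 GiB node about 341 GiB.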
The amount the server reports comes from https://chapel-lang.org/docs/modules/standard/Memory/Diagnostics.html#Diagnostics.locale.physicalMemory, which should just be the physical memory of the system. Can you run free -g on the nodes to verify the OS reports ~750G and not ~512G?
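If it is easier to check from a script, the same number can be read from /proc/meminfo on Linux, which is where free gets its total. This is a sketch with made-up function names and a made-up sample value; the result should be close to what free -g shows.

```python
# Made-up sample in the layout of /proc/meminfo, used only for illustration.
SAMPLE_MEMINFO = """\
MemTotal:       790000000 kB
MemFree:        700000000 kB
MemAvailable:   750000000 kB
"""


def mem_total_gib(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, in whole GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the kernel reports this value in KiB
            return kib // (1024 * 1024)
    raise ValueError("MemTotal not found")


def read_node_mem_total_gib(path="/proc/meminfo"):
    """Read the total physical memory of the current (Linux) node in GiB."""
    with open(path) as f:
        return mem_total_gib(f.read())


if __name__ == "__main__":
    print(mem_total_gib(SAMPLE_MEMINFO))
```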
CHPL_TARGET_CPU (https://chapel-lang.org/docs/usingchapel/chplenv.html#chpl-target-cpu) basically controls CPU specialization (-march in gcc). It defaults to unknown for multi-locale since we don't know if Chapel is cross-compiling. If your login node is the same ISA as the compute nodes, you could set it to native. Otherwise, setting it to none will quiet the warning (though that will probably trigger a rebuild). In all the tests we've done, Arkouda performance does not benefit from target architecture specialization at the moment, so leaving it unknown or setting it to none should not impact performance.
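If the build is driven from a script, the choice above could be captured roughly like this. The helper name chapel_build_env is hypothetical, and whether the login node really matches the compute nodes is something you have to decide yourself; the sketch does not try to detect it.

```python
import os


def chapel_build_env(login_matches_compute, base_env=None):
    """Return a copy of the environment with CHPL_TARGET_CPU set.

    login_matches_compute: True if the login node has the same CPU type as
    the compute nodes (a judgment the caller supplies, not detected here).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CHPL_TARGET_CPU"] = "native" if login_matches_compute else "none"
    return env


if __name__ == "__main__":
    env = chapel_build_env(login_matches_compute=False)
    print(env["CHPL_TARGET_CPU"])
```

The returned dictionary can be passed as the env argument of subprocess.run when invoking the build.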