    dgrichardson
    @dgrichardson
    I got an arkouda server running (4 nodes with gasnet and an ibv conduit), and connected a notebook. The online instructions were reasonably straightforward to follow.
    Brad Chamberlain
    @bradcray
    Great, congrats! (and thanks for the update)
    dgrichardson
    @dgrichardson
    :)
    Michael Merrill
    @mhmerrill
    awesome!
    we are about to get on our weekly arkouda call if you want to join for a min
    dgrichardson
    @dgrichardson
    Sure. How do I join?
    Michael Merrill
    @mhmerrill

    Zoom Invite

    Michael Merrill is inviting you to a scheduled Zoom meeting.

    Topic: Arkouda Weekly Zoom Meeting
    Time: recurring meeting Tuesdays @ 1pm ET

    Join Zoom Meeting https://us04web.zoom.us/j/77717000423?pwd=TGlmaUN3L2hScFovTy9NRXNnUTE5dz09

    Meeting ID: 777 1700 0423
    Passcode: kjM3WS

    it’s just a half hour for people to touch base
    dgrichardson
    @dgrichardson
    I didn't see how to get the server to use all (or most of) the available memory. Each node has 750GB. gasnet is saying it will only pin 335 GB per node (it looks like my ulimit is unlimited), and the server is saying it will only use 486 GB of RAM.
    Probably I missed some config options?
    Michael Merrill
    @mhmerrill
    there is a command line flag but it needs to be able to see the RAM from an OS call
    OS is called from a Chapel module to get physical memory limit
    black magic incantation ;-)
    Elliot Ronaghan
    @ronawho

    During your first run you should have gotten a message from gasnet recommending a value for GASNET_PHYSMEM_MAX. This is usually ~2/3 of physical memory or a limit set by the HCA. This limits how much memory can be pinned, but not how much physical memory you can allocate (just how much can be pinned / communicated at any given time).

    The amount the server reports comes from https://chapel-lang.org/docs/modules/standard/Memory/Diagnostics.html#Diagnostics.locale.physicalMemory, which should just be the physical memory of a system. Can you run free -g on the nodes to verify the OS reports ~750G and not ~512G?
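As a rough cross-check of what `free -g` reports, the kernel's own figure can be read from `/proc/meminfo` on each node. The sketch below assumes Linux and the usual `MemTotal: <n> kB` line; the helper name `mem_total_gib` and the sample value are hypothetical, chosen only for illustration.

```python
import re


def mem_total_gib(meminfo_text: str) -> float:
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    kib = int(match.group(1))  # the kernel reports this value in KiB
    return kib / (1024 ** 2)


# Hypothetical sample text; on a real node, read the file instead:
#     with open("/proc/meminfo") as f:
#         text = f.read()
sample = "MemTotal:       527433728 kB\nMemFree:        1000000 kB\n"
print(f"{mem_total_gib(sample):.1f} GiB")
```

Running the same check on every node makes it easy to spot a node that reports less memory than expected.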

    dgrichardson
    @dgrichardson
    You are right about 512GB. I should have run free myself instead of just reporting what the sysadmin said. :) But it sounds like that is fine, because it does not affect max useable memory.
    Michael Merrill
    @mhmerrill
    there is a command line flag to set max percent of physical memory useable for large trackable objects
    the default is set to 90%
    i think
    dgrichardson
    @dgrichardson
    How about how much memory the server is willing to use? When the server starts it prints out memory limit = 486....
    If it is in bytes it is around 486GB
    Michael Merrill
    @mhmerrill
    yes
    there are various combinations you can use but it gets complicated if you want to, say, run 2 locales per physical node
    dgrichardson
    @dgrichardson
    Is that limit per node?
    Because then it would be using most of the RAM and I just read the output incorrectly.
    Michael Merrill
    @mhmerrill
    i think it is, i’ll look in the code
    you need to leave a bit of memory for things like stack vars and small stuff the memory tracker is not tracking
    it tracks everything based on locale-0 allocations
    and the amount of physical memory we can discover from the OS
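The numbers in this exchange can be sanity-checked with a little arithmetic. This is a sketch under two assumptions taken from the conversation, neither confirmed against the Arkouda source: the limit is a percentage of the physical memory the OS reports, and the default percentage is 90. The function names are hypothetical.

```python
GIB = 1024 ** 3


def tracked_limit_bytes(physical_bytes: int, percent: int = 90) -> int:
    """Memory limit as a percentage of physical memory (assumed model)."""
    return physical_bytes * percent // 100


def implied_physical_bytes(limit_bytes: int, percent: int = 90) -> float:
    """Invert the model: physical memory implied by a reported limit."""
    return limit_bytes * 100 / percent


# The server printed a limit of roughly 486 GB (treated here as decimal GB).
implied = implied_physical_bytes(486 * 10**9)
print(f"{implied / 10**9:.0f} GB, or about {implied / GIB:.0f} GiB")
```

Under those assumptions the reported limit corresponds to roughly 540 GB of physical memory, which is in line with a node that the OS sees as having about 512 GB rather than 750 GB, and with the limit being per node.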
    dgrichardson
    @dgrichardson
    So sounds like everything is working and it is basically using all the RAM. :)
    One thing I had a bit of trouble with was verifying IB was being used. I ended up doing a capture on the HCA and making sure there was RDMA traffic where I thought chapel was running. In the end it looked good.
    I also got a warning that CHPL_TARGET_CPU was set to unknown. Is this a big deal?
    Michael Merrill
    @mhmerrill
    @ronawho ^^^
    Elliot Ronaghan
    @ronawho
    CHPL_TARGET_CPU (https://chapel-lang.org/docs/usingchapel/chplenv.html#chpl-target-cpu) basically controls CPU specialization (-march in gcc). It defaults to unknown for multi-locale since we don't know if Chapel is cross-compiling. If your login node is the same ISA as compute nodes you could set it to native. Otherwise, setting to none will quiet the warning (though that will probably trigger a rebuild). In all the tests we've done Arkouda performance does not benefit from target architecture specialization at the moment, so leaving it unknown or setting it to none should not impact performance.
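The choice described above can be written down as a small decision rule. This is only an illustrative sketch: the helper names are hypothetical, and the ISA labels are whatever the caller supplies for the login and compute nodes (for example the output of `uname -m` on each).

```python
import os


def pick_target_cpu(login_isa: str, compute_isa: str) -> str:
    """Hypothetical helper: 'native' if both node types match, else 'none'."""
    return "native" if login_isa == compute_isa else "none"


def build_env(login_isa: str, compute_isa: str, base_env=None) -> dict:
    """Return a copy of the environment with CHPL_TARGET_CPU set for a build."""
    env = dict(os.environ if base_env is None else base_env)
    env["CHPL_TARGET_CPU"] = pick_target_cpu(login_isa, compute_isa)
    return env


print(build_env("x86_64", "x86_64", base_env={})["CHPL_TARGET_CPU"])
```

Since changing the value will probably trigger a rebuild, it is worth deciding once before building Chapel and Arkouda.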
    Michael Merrill
    @mhmerrill
    i was wrong on the call about my build times: old was ~190sec, new is 275sec
    so that is 85sec difference
    dgrichardson
    @dgrichardson
    We got ~1.2 TB of data into the server with pretty close to 100% CPU utilization on all 4 nodes when doing operations. Sometimes trying to use too much memory gives an error on the client. Other times it looks like the server crashes. When the crash happened the server printed this:

    /home/richard/chapel/arkouda//src/RadixSortLSD.chpl:159: error: Out of memory allocating "array elements"

    /home/richard/chapel/arkouda//src/RadixSortLSD.chpl:159: error: Out of memory allocating "array elements"

    /home/richard/chapel/arkouda//src/RadixSortLSD.chpl:159: error: Out of memory allocating "array elements"

    /home/richard/chapel/arkouda//src/RadixSortLSD.chpl:159: error: Out of memory allocating "array elements"

    Michael Merrill
    @mhmerrill
    we made some changes to the memory tracking recently and these are probably escapes from the tracking
    you could tune down the percentage of memory useable by the server
    using the command line parameter
    sorry this is a finicky thing, this tracking is in place to try to keep the server from crashing but sometimes after changes/PRs things get a little brittle
    Elliot Ronaghan
    @ronawho
    The other aspect is that Arkouda memory tracking is based on available physical memory, and not all memory is available to Chapel in all configs. Could you try setting export GASNET_MAX_SEGSIZE="0.95/H" to see if that helps (should prevent you from getting Chapel OOMs, but you will likely still get the Arkouda client ones)
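One way to apply the suggested setting without changing the login shell is to build the environment in a small launcher script. The sketch below only sets `GASNET_MAX_SEGSIZE` to the value quoted above; the helper name is hypothetical, and the launch command in the comment is an assumption about how the server is started on this cluster.

```python
import os


def arkouda_server_env(base_env=None, max_segsize: str = "0.95/H") -> dict:
    """Return a copy of the environment with GASNET_MAX_SEGSIZE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["GASNET_MAX_SEGSIZE"] = max_segsize
    return env


env = arkouda_server_env({"PATH": "/usr/bin"})
print(env["GASNET_MAX_SEGSIZE"])

# Assumed launch command, adjust to the local setup:
#     import subprocess
#     subprocess.run(["./arkouda_server", "-nl", "4"], env=arkouda_server_env(), check=True)
```

Copying the environment rather than mutating `os.environ` keeps the setting scoped to the server process.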
    Michael Merrill
    @mhmerrill
    @ronawho (I think) made a memory tracking change recently to make the server go faster but I think we now have an issue
    maybe
    Michael Merrill
    @mhmerrill
    @dgrichardson you could open an issue if you want in the arkouda repo detailing your crash
    please
    dgrichardson
    @dgrichardson
    There seem to be two sets of formatting in the stdout. Most have the timestamp and look like they are from some kind of logging module that formats things. The others have no timestamp and a line number from .chpl and look more like a raw printf. Maybe those are the escapees? I'll make a report if I can get it to happen again. I didn't realize how much stdout there would be, and it has scrolled off my terminal.
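If the server output is captured to a file, the raw lines can be pulled out afterwards for an issue report. This is a minimal sketch that assumes the raw messages keep the `file.chpl:line: error: message` shape shown in the crash output above; the timestamped line in the sample is a made-up placeholder, since the logger's actual format is not shown in this conversation.

```python
import re
from collections import Counter

# Matches untimestamped lines such as: /path/File.chpl:159: error: message
CHPL_ERROR = re.compile(r"^(?P<file>\S+\.chpl):(?P<line>\d+): error: (?P<msg>.*)$")


def raw_chapel_errors(log_lines) -> Counter:
    """Count raw Chapel error lines, keyed by (file, line number, message)."""
    counts = Counter()
    for line in log_lines:
        m = CHPL_ERROR.match(line.strip())
        if m:
            counts[(m["file"], int(m["line"]), m["msg"])] += 1
    return counts


oom = ('/home/richard/chapel/arkouda//src/RadixSortLSD.chpl:159: '
       'error: Out of memory allocating "array elements"')
sample_log = [
    "2021-01-01 00:00:00 [hypothetical logger line] some formatted message",
    oom, oom, oom, oom,
]
for (path, lineno, msg), n in raw_chapel_errors(sample_log).items():
    print(f"{n}x {path}:{lineno}: {msg}")
```

Grouping identical messages also shows how many locales reported the same failure, which is useful context when the crash is hard to reproduce.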