    Ahsan Barkati
    @ahsanbarkati

    Hey devs,

    I have been trying to get the stats of jemalloc. I wrote this sample program:

    #include <stdlib.h>
    #include <jemalloc/jemalloc.h>
    #include <stdio.h>
    #include <unistd.h>

    void get(char *s) {
        long out = 0;
        size_t sz = sizeof(out);
        je_mallctl(s, &out, &sz, NULL, 0);
        printf("%s: %ld\n", s, out);
    }

    int main() {
        int *a[10];
        for (int i = 0; i < 10; i++) {
            int n = 500 << 20;
            a[i] = (int *) je_calloc(n, sizeof(int));

            for (int j = 0; j < n; j++)
                a[i][j] = j;

            get("stats.allocated");
            get("stats.active");
            get("stats.resident");
            get("stats.mapped");
            printf("\n");
            sleep(2);
        }
        return 0;
    }

    I expect the active/resident memory to increase in each iteration, but they stay constant. Can you please let me know if I am doing anything wrong?

    I am running it using this command:
    cc je.c -o je -I`jemalloc-config --includedir` \
    -L`jemalloc-config --libdir` -Wl,-rpath,`jemalloc-config --libdir` \
    -ljemalloc `jemalloc-config --libs` && ./je
    Ahsan Barkati
    @ahsanbarkati
    The output is:
    stats.allocated: 2147512640
    stats.active: 2147540992
    stats.resident: 2153078784
    stats.mapped: 2162225152
    
    stats.allocated: 2147512640
    stats.active: 2147540992
    stats.resident: 2153078784
    stats.mapped: 2162225152
    
    stats.allocated: 2147512640
    stats.active: 2147540992
    stats.resident: 2153078784
    stats.mapped: 2162225152
    ...
    David Goldblatt
    @davidtgoldblatt
    The big thing is that you need an epoch mallctl call before being able to get up-to-date stats
    Two smaller things are that those stats calls take size_t rather than long (which are usually the same size, so it’s probably not the cause of the issue), and it would be wise to check the return value of mallctl
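    To make those points concrete, here is a minimal sketch of corrected helpers, assuming the same je_ prefix as the program above (the refresh_stats name is purely illustrative); the idea is to call refresh_stats() once per loop iteration, before the four get() calls:

    #include <stdio.h>
    #include <stdint.h>
    #include <jemalloc/jemalloc.h>

    /* Refresh jemalloc's cached statistics; call once per loop iteration,
       before reading the batch of stats (see the cost discussion below). */
    void refresh_stats(void) {
        uint64_t epoch = 1;
        size_t sz = sizeof(epoch);
        if (je_mallctl("epoch", &epoch, &sz, &epoch, sizeof(epoch)) != 0)
            fprintf(stderr, "mallctl(\"epoch\") failed\n");
    }

    /* Read one size_t-valued stat, checking the return value. */
    void get(const char *s) {
        size_t out = 0;
        size_t sz = sizeof(out);
        if (je_mallctl(s, &out, &sz, NULL, 0) != 0) {
            fprintf(stderr, "mallctl(\"%s\") failed\n", s);
            return;
        }
        printf("%s: %zu\n", s, out);
    }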
    Ahsan Barkati
    @ahsanbarkati
    @davidtgoldblatt Thanks, that solves the issue. One follow-up question: is there an overhead/performance downside to calling the epoch mallctl?
    David Goldblatt
    @davidtgoldblatt
    It’s reasonably expensive
    It takes most of the malloc locks in the process
    Ahsan Barkati
    @ahsanbarkati
    Thanks a lot @davidtgoldblatt. Can you please point me to the code where we handle the epoch mallctl?
    David Goldblatt
    @davidtgoldblatt
    Sure, it’s in src/ctlc.
    src/ctl.c rather
    the function epoch_ctl
    Ahsan Barkati
    @ahsanbarkati
    Thanks
    David Goldblatt
    @davidtgoldblatt
    No problem
    Ahsan Barkati
    @ahsanbarkati
    uint64_t epoch;
    size_t u64sz;
    int err;

    epoch = 1;
    u64sz = sizeof(uint64_t);
    err = je_mallctl("epoch", (void *)&epoch, &u64sz, (void *)&epoch,
        sizeof(uint64_t));
    Just wanted to confirm if this is the right way to call the epoch
    David Goldblatt
    @davidtgoldblatt
    Yep, that looks right to me
    Dave Rigby
    @daverigby
    Hey. (Hopefully) quick question - how much of an “optimization” is sdallocx over regular dallocx in typical usage? Are we talking 10% faster, or 10x faster?
    David Goldblatt
    @davidtgoldblatt
    Somewhere in between them
    running malloc/dallocx in a loop takes around 12.8 ns/iteration on my machine
    and malloc/sdallocx takes around 7.2 ns/iteration
    David Goldblatt
    @davidtgoldblatt
    (this is all test/stress/microbench)
    But also note that that includes some dynamic linking overhead, which should subtract some small fixed amount from both numbers
    Also, in real workloads, the metadata lookups for dallocx will sometimes be cache or TLB misses
    It was a lot of work for us to turn on sized deallocation (via -fsized-deallocation), but it was definitely worth it
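    For anyone unfamiliar with the two calls being compared, a minimal sketch using jemalloc's non-standard API (assuming an unprefixed build; with the je_ prefix used earlier in this room these would be je_mallocx, je_dallocx, and je_sdallocx):

    #include <jemalloc/jemalloc.h>

    void demo(void) {
        size_t sz = 64;

        void *p = mallocx(sz, 0);
        dallocx(p, 0);        /* unsized free: jemalloc looks the size up in metadata */

        void *q = mallocx(sz, 0);
        sdallocx(q, sz, 0);   /* sized free: the caller supplies the size, skipping the lookup */
    }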
    Dave Rigby
    @daverigby
    @davidtgoldblatt Thanks, those are some useful numbers. Was trying to get a sense of whether it’s worth us making use of sdallocx for C++ sized delete.
    David Goldblatt
    @davidtgoldblatt
    It’s definitely worth it; to some extent, we don’t really consider free to be on the fast path any more (“If the caller really cared about performance, they’d be using sized deletion”). That isn’t literally true, in the sense that we still wouldn’t add extra debug checks and whatnot to free the way we do on truly slow paths (like thread cache flushing), but it does give a sense of how drastically hotter the sdallocx pathways are
    Dave Rigby
    @daverigby
    Super, thanks
    Arighna Chakrabarty
    @ArighnaIITG

    Hi @davidtgoldblatt ,
    I am trying to debug a memory leak issue in my nginx code
    I have configured jemalloc using the flag --enable-prof, and then built nginx with jemalloc.

    I export both the MALLOC_CONF and the LD_PRELOAD env variables.

    MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true
    LD_PRELOAD=/usr/local/lib/libjemalloc.so.2

    But then I get a segmentation fault whenever I try to issue any nginx-related command.

    [screenshot of the segmentation fault]
    What should I do here? My goal is to check functional leaks in the nginx code.
    Dave Rigby
    @daverigby
    @ArighnaIITG hard to say with that amount of info. I’d suggest running it under gdb or a similar debugger and seeing exactly where the segfault is coming from.
    CodeATA
    @CodeATA
    Hi all. Is there a way to cross compile a RISC-V jemalloc on an x86 machine?
    David Goldblatt
    @davidtgoldblatt
    I think it should be possible, with appropriate config settings (see INSTALL.md), and possibly setting some cache variables (although I think this should be unnecessary these days)
    Victor Oliveira
    @victormatheus
    Hi folks, any help would be much appreciated. I'm trying to tune jemalloc to reduce the amount of minor page faults in my application. What I've got so far is looping over all arenas and setting arena.<i>.dirty_decay_ms and arena.<i>.muzzy_decay_ms to -1; I have confirmed they take effect by querying with mallctl. According to the docs, it seems this would cause jemalloc to never return memory to the system (so no madvise syscalls, I believe). Maybe I'm just misunderstanding how this is supposed to work... In any case, after doing that I'm still seeing that malloc and calloc are calling into pages_purge_forced in pages.c, which in turn calls madvise.
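    A minimal sketch of that per-arena decay disabling, assuming an unprefixed jemalloc build (mallctl rather than je_mallctl); the disable_decay name is just illustrative:

    #include <stdio.h>
    #include <sys/types.h>
    #include <jemalloc/jemalloc.h>

    /* Write -1 ("never decay") to every arena's dirty/muzzy decay controls,
       mirroring the approach described above. */
    void disable_decay(void) {
        unsigned narenas;
        size_t sz = sizeof(narenas);
        if (mallctl("arenas.narenas", &narenas, &sz, NULL, 0) != 0)
            return;

        for (unsigned i = 0; i < narenas; i++) {
            char name[64];
            ssize_t decay_ms = -1;

            snprintf(name, sizeof(name), "arena.%u.dirty_decay_ms", i);
            mallctl(name, NULL, NULL, &decay_ms, sizeof(decay_ms));

            snprintf(name, sizeof(name), "arena.%u.muzzy_decay_ms", i);
            mallctl(name, NULL, NULL, &decay_ms, sizeof(decay_ms));
        }
    }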
    David Goldblatt
    @davidtgoldblatt
    @victormatheus, we'll still call madvise in a couple of places, even with decay turned off
    calloc is one (where we use it to zero pages)
    Actually, I think that might be it
    I was going to say that we'll also call it if a deallocation exceeds the oversize threshold, but thinking more I don't think that's correct
    Victor Oliveira
    @victormatheus
    @davidtgoldblatt thanks a bunch for the quick reply! Do you think the general approach makes sense for reducing page faults? I think I'm still seeing page purges on normal malloc; I wonder if it could be because I'm using quite an old version of jemalloc (5.0.1).
    Victor Oliveira
    @victormatheus
    As another topic, we've tried to upgrade jemalloc but that caused a lot of perf regressions. I'm wondering if folks have hit that in the past, and what ways I could use to approach the problem and track it down.
    David Goldblatt
    @davidtgoldblatt
    Huh, it could be
    In general, on the workloads we test against, increasing versions of jemalloc strictly improve on memory and CPU
    Do you have a sense of where the regressions are coming from?
    (E.g. differences in perf profiles)?
    Jack·Boos·Yu
    @JackBoosY
    Hi guys, today I received an issue about jemalloc-cmake: it cannot be configured on arm64 Windows. Any suggestions?

    The output is:

    Change Dir: F:/vcpkg/buildtrees/jemalloc/arm64-windows-dbg/GetPageSize/CMakeFiles/CMakeTmp

    Run Build Command(s): F:/vcpkg/downloads/tools/ninja/1.10.1-windows/ninja.exe cmTC_f4bb5 &&
    [1/2] Building C object CMakeFiles\cmTC_f4bb5.dir\getpagesize.c.obj
    [2/2] Linking C executable cmTC_f4bb5.exe

    Jack·Boos·Yu
    @JackBoosY
    RUN_OUTPUT: This version of %1 is not compatible with the version of Windows you're running. Check your computer's system information and then contact the software publisher
    ARM programs cannot be run on non-ARM machines. I think we should only build it instead of running it.
    Jack·Boos·Yu
    @JackBoosY
    Feel free to discuss this issue in microsoft/vcpkg#15660.