    David Goldblatt
    @davidtgoldblatt
    I know @santagada has been looking at this a little bit
    We're in this weird situation where none of the core team know much about the windows perf/dev environment
    But support for it as a platform was hard-won, and not something we want to give up
    So I'm sure there's lots of low-hanging fruit
    (And indeed, @santagada has found some)
    Jason Gibson
    @jasongibson
    I'd be happy to share some profiling data for a sample workload. Perhaps that'd be useful as a point of comparison to the more familiar unix behavior.
    David Goldblatt
    @davidtgoldblatt
    Yes I think so if you don't mind
    Even if we don't work on it right away, knowing where the bodies are buried still might be helpful
    Jason Gibson
    @jasongibson
    Sure, I'll do that. Would that discussion be best done here, email, or somewhere else?
    David Goldblatt
    @davidtgoldblatt
    Github issue maybe?
    Jason Gibson
    @jasongibson
    It will probably be next week before I'll be able to do it, though.
    Ok
    David Goldblatt
    @davidtgoldblatt
    Oh no problem at all
    gnzlbg
    @gnzlbg
    Are there any constraints on the alignment passed to sdallocx?
    If I make an allocation with an alignment requirement of 128, can I call sdallocx with an alignment requirement of 16 or 256?
    Or do I have to use 128 when deallocating?
    Qi Wang
    @interwq
    @gnzlbg: you’ll have to use the same alignment with sdallocx,
    as it’s used to determine the actual size class
    In general, the *allocx APIs all require the same flags to be passed in when deallocating. Sometimes you can get away with, say arena_index or tcache_enabled, but there might be side effects.
    David Goldblatt
    @davidtgoldblatt
    (this all is not terribly well documented)
    gnzlbg
    @gnzlbg
    thanks
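To make the answer above concrete: with the real API, an allocation made with `mallocx(size, MALLOCX_ALIGN(128))` would be released with `sdallocx(ptr, size, MALLOCX_ALIGN(128))`, repeating the same flags. The sketch below does not call jemalloc at all. It is a toy model, with hypothetical function names, in which the "size class" is simply the size rounded up to the alignment. That is an assumption made for illustration and is not jemalloc's real size-class computation; it only shows why a different alignment at deallocation time can select a different class.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an allocator's usable-size computation: round the
 * requested size up to a multiple of the alignment. This is NOT jemalloc's
 * real logic; it only shows that the result depends on both inputs. */
size_t toy_usable_size(size_t size, size_t alignment) {
    /* The alignment must be a nonzero power of two. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Returns 1 if a sized deallocation described by (size, free_align) maps to
 * the same toy size class as the allocation made with (size, alloc_align). */
int toy_flags_match(size_t size, size_t alloc_align, size_t free_align) {
    return toy_usable_size(size, alloc_align) ==
           toy_usable_size(size, free_align);
}
```

In this model a 100-byte request with alignment 128 and the same request with alignment 16 or 256 land in different classes, which is the kind of mismatch a sized deallocation cannot tolerate.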
    Kamil Gorlo
    @kgs

    Hi all! I am trying to debug a potential memory leak in my application (nginx + luajit), but when jemalloc prof sampling is turned on for one of the nginx workers, it crashes after some time :/ I have no idea why this is happening. Any pointers on whether this is a jemalloc problem or something else?

    Here is backtrace from core file.

    Core was generated by `nginx: worker process                                                     '.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x00007f97ad28e4e0 in _ULx86_64_tdep_trace (cursor=cursor@entry=0x7ffcc5521510, buffer=buffer@entry=0x7f97a950c2a0, size=size@entry=0x7ffcc5521154) at x86_64/Gtrace.c:481
    481     x86_64/Gtrace.c: No such file or directory.
    Missing separate debuginfos, use: debuginfo-install bidder-http-0.4.7135-1.generic.x86_64
    (gdb) bt
    #0  0x00007f97ad28e4e0 in _ULx86_64_tdep_trace (cursor=cursor@entry=0x7ffcc5521510, buffer=buffer@entry=0x7f97a950c2a0, size=size@entry=0x7ffcc5521154) at x86_64/Gtrace.c:481
    #1  0x00007f97ad28d4ff in unw_backtrace (buffer=0x7f97a950c2a0, size=size@entry=128) at mi/backtrace.c:69
    #2  0x00007f97ad268e91 in je_prof_backtrace (bt=bt@entry=0x7ffcc5521d90) at src/prof.c:595
    #3  0x00007f97ad207158 in prof_alloc_prep (update=true, je_prof_active=<optimized out>, usize=3072, tsd=<optimized out>) at include/jemalloc/internal/prof_inlines_b.h:158
    #4  imalloc_body (tsd=<optimized out>, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2116
    #5  imalloc (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2258
    #6  realloc (ptr=<optimized out>, arg_size=<optimized out>) at src/jemalloc.c:2736
    #7  0x00007f97ac865fce in lj_mem_realloc (L=L@entry=0x7f959e256310, p=p@entry=0x0, osz=osz@entry=0, nsz=nsz@entry=3072) at lj_gc.c:818
    #8  0x00007f97ac868a5b in newhpart (hbits=<optimized out>, t=0x7f97a93fcb80, L=0x7f959e256310) at lj_tab.c:66
    #9  lj_tab_resize (L=0x7f959e256310, t=0x7f97a93fcb80, asize=0, hbits=<optimized out>) at lj_tab.c:278
    #10 0x00007f97ac868c8f in rehashtab (L=<optimized out>, t=<optimized out>, ek=<optimized out>) at lj_tab.c:387
    #11 0x00007f97ac868cc0 in lj_tab_rehash (L=<optimized out>, t=<optimized out>) at lj_tab.c:393
    #12 0x00007f97ac865a18 in gc_onestep (L=L@entry=0x7f959e256310) at lj_gc.c:665
    #13 0x00007f97ac865c55 in lj_gc_step (L=L@entry=0x7f959e256310) at lj_gc.c:689
    #14 0x00007f97ac8a376d in lj_trace_exit (J=0x7f97a9e9cf10, exptr=<optimized out>) at lj_trace.c:882
    #15 0x00007f97ac864373 in lj_vm_exit_handler () from /usr/local/bidder-http/luajit/lib/libluajit-5.1.so.2
    #16 0x000000000043e2b9 in ngx_worker_process_cycle (cycle=0x41487f6c00000000, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:750
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)

    I am using trunk jemalloc + trunk libunwind compiled statically.

    Kamil Gorlo
    @kgs
    LuaJIT is compiled with GC64 support and SYSMALLOC (which effectively replaces the internal allocator with the system one, in this case jemalloc)
    David Goldblatt
    @davidtgoldblatt
    My guess is that the jitted stack frames are confusing libunwind somehow
    Maybe try with the gcc unwinder?
    Some quick googling indicates that luajit might be able to be configured to emit DWARF info for jitted frames; perhaps one of the unwinders can be made to understand it
    Kamil Gorlo
    @kgs
    Hi, thanks for the answer! I tried the gcc unwinder and it was even worse: a core dump almost immediately. With libunwind I can survive almost 30 minutes under high load (~100K req/sec).
    Dave Rigby
    @daverigby
    Hi all. I’m having some “fun” with jemalloc + boost TSD + MSVC - looks like boost’s tls_callback on thread destruction is getting called after jemalloc’s. This is a problem because Boost’s tls_callback deallocates memory, which calls into jemalloc (after its TSD has been deallocated) and Bad Things happen (crash in Release, assert in Debug).
    I guess my question is: is there any particular reason for the order in which _tls_callback is registered? Currently it is “.CRT$XLU”, i.e. between A and Z it’s called 21st, whereas Boost uses position “.CRT$XLC” (3rd).
    David Goldblatt
    @davidtgoldblatt
    I don't think so
    I think in general the windows TLS stuff is some of the least-understood-by-anyone code in the codebase
    So screwiness there is probably more suspect than screwiness elsewhere
    Qi Wang
    @interwq
    IIRC, we have some workaround for deallocating after TSD cleanup (we should then initialize TSD to a minimal / no-cleanup state to fulfill the requests), down this path https://github.com/jemalloc/jemalloc/blob/da50d8ce87cb21963596825ebc5faf6d8abd4d2c/src/tsd.c#L295
    David Goldblatt
    @davidtgoldblatt
    I think the issue here is that the tsd_t itself is deallocated
    Qi Wang
    @interwq
    Like David mentioned, no idea if it’s broken on Windows though.
    Ah because Windows TSD is allocated?
    David Goldblatt
    @davidtgoldblatt
    yeah
    Qi Wang
    @interwq
    I see...
    David Goldblatt
    @davidtgoldblatt
    Is there a fundamental reason why a base_t can't live inside its corresponding arena?
    There's always a 1:1 mapping, right?
    Qi Wang
    @interwq
    Right now the 1:1 mapping is true. I was thinking about changing that though.
    I wanted to utilize the a0’s base allocator more often, for the auto / default arenas.
    For manual arenas, I think a 1:1 is needed to support reset.
    Dave Rigby
    @daverigby
    @davidtgoldblatt / @interwq - thanks for the responses, sorry I missed them (I’m on GMT here). I’ve got a local change which shifts jemalloc’s position to ‘A’ (i.e. before anything else); which fixes the issue I’m seeing. I’ll continue testing and see if it looks solid.
    Dave Rigby
    @daverigby
    Maybe one other option would be to reset the tsd_t ptr to, say, null when it’s deallocated, so if another deallocation came in it would be possible to know that the TLS doesn’t exist anymore (not sure if jemalloc can handle that?)
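A minimal sketch of the idea discussed above: clear the pointer when the per-thread state is freed, and hand late callers a minimal, no-cleanup state instead of a dangling pointer. Everything here is hypothetical (the `toy_tsd_t` type, the function names, the counter); it is not jemalloc's tsd code and says nothing about how the Windows TLS callbacks actually behave.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread state; not jemalloc's real tsd_t. */
typedef struct {
    int cleanup_enabled;   /* 1 for the normal heap-allocated state */
    unsigned long nfrees;  /* bookkeeping touched on every deallocation */
} toy_tsd_t;

/* Per-thread fallback that is never heap-allocated, so it cannot dangle. */
static _Thread_local toy_tsd_t toy_tsd_minimal;
/* Heap-allocated state, as in the "Windows TSD is allocated" case. */
static _Thread_local toy_tsd_t *toy_tsd_ptr = NULL;
/* Set once teardown has run for this thread. */
static _Thread_local int toy_tsd_torn_down = 0;

toy_tsd_t *toy_tsd_fetch(void) {
    if (toy_tsd_ptr != NULL) {
        return toy_tsd_ptr;
    }
    if (toy_tsd_torn_down) {
        /* Late caller (e.g. another library's thread-exit callback):
         * serve it from the minimal state rather than freed memory. */
        return &toy_tsd_minimal;
    }
    toy_tsd_ptr = calloc(1, sizeof(*toy_tsd_ptr));
    if (toy_tsd_ptr == NULL) {
        return &toy_tsd_minimal;
    }
    toy_tsd_ptr->cleanup_enabled = 1;
    return toy_tsd_ptr;
}

/* Stand-in for the allocator's own thread-exit callback. */
void toy_tsd_teardown(void) {
    free(toy_tsd_ptr);
    toy_tsd_ptr = NULL;     /* the reset suggested in the message above */
    toy_tsd_torn_down = 1;
}

/* Stand-in for a deallocation path that needs per-thread state. */
void toy_dalloc_hook(void) {
    toy_tsd_t *tsd = toy_tsd_fetch();
    assert(tsd != NULL);
    tsd->nfrees++;
}
```

The point of the sketch is only the ordering property: a deallocation that arrives after `toy_tsd_teardown()` still finds valid state, regardless of which callback ran first.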
    Kamil Gorlo
    @kgs
    Hi again, silly question: when I have linked libjemalloc.so.2 to my binary (Nginx in this case) is it guaranteed that all malloc calls will go through jemalloc?
    Here is ldd output:
    ldd /usr/local/bidder-http/bin/nginx 
            linux-vdso.so.1 =>  (0x00007ffc64ff3000)
            libjemalloc.so.2 => /usr/local/bidder-http/bin/libjemalloc.so.2 (0x00007fae0f48f000)
            libm.so.6 => /lib64/libm.so.6 (0x00007fae0f184000)
            libdl.so.2 => /lib64/libdl.so.2 (0x00007fae0ef80000)
            libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fae0ed64000)
            libluajit-5.1.so.2 => /usr/local/bidder-http/luajit/lib/libluajit-5.1.so.2 (0x00007fae0eae6000)
            libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fae0e885000)
            libssl.so.10 => /lib64/libssl.so.10 (0x00007fae0e613000)
            libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007fae0e1b1000)
            libz.so.1 => /lib64/libz.so.1 (0x00007fae0df9b000)
            libc.so.6 => /lib64/libc.so.6 (0x00007fae0dbce000)
            libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fae0d8c4000)
            libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fae0d6ae000)
            /lib64/ld-linux-x86-64.so.2 (0x000055c243b67000)
            libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007fae0d45f000)
            libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007fae0d178000)
            libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007fae0cf74000)
            libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007fae0cd41000)
            libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007fae0cb32000)
            libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007fae0c92d000)
            libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fae0c714000)
            libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fae0c4ed000)
    I am asking because when I run ltrace on the running process I see something like:
    exe->malloc(32768)                                                                                                                = 0x7f3353b79580
    exe->malloc(1024)                                                                                                                 = 0x7f3353b3e800
    libjemalloc.so.2->malloc(96)                                                                                                      = 0x7f335a369000
    libjemalloc.so.2->malloc(4388)                                                                                                    = 0x7f3353c28800
    libjemalloc.so.2->malloc(1792)                                                                                                    = 0x7f3354da3e00
    libjemalloc.so.2->malloc(8704)                                                                                                    = 0x7f3353c73000
    libjemalloc.so.2->malloc(4096)                                                                                                    = 0x7f335b8b2000
    libjemalloc.so.2->malloc(8192)                                                                                                    = 0x7f33539a7000
    libc.so.6->malloc(91)                                                                                                             = 0x7f335a369180
    libc.so.6->malloc(16)                                                                                                             = 0x7f335a46c7f0
    libc.so.6->malloc(16)                                                                                                             = 0x7f335a46c800
    exe->malloc(1024)                                                                                                                 = 0x7f3353b3e800
    libjemalloc.so.2->malloc(96)                                                                                                      = 0x7f335a3692a0
    libjemalloc.so.2->malloc(4388)                                                                                                    = 0x7f3353c29c00
    libjemalloc.so.2->malloc(1856)                                                                                                    = 0x7f335a1e0000
    libjemalloc.so.2->malloc(9216)                                                                                                    = 0x7f3353c75800
    libjemalloc.so.2->malloc(4096)                                                                                                    = 0x7f335414d000
    libjemalloc.so.2->malloc(8192)                                                                                                    = 0x7f33539aa000
    How to interpret that?
    Dave Rigby
    @daverigby
    It ultimately depends on how everything is linked. You might also want to look at the LD_DEBUG env var and turn on debugging for symbol resolution, to check that malloc is always being linked in from libjemalloc.so
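As a complement to LD_DEBUG, a process can ask the dynamic linker which shared object supplies `malloc` in the global lookup scope. The sketch below assumes glibc on Linux and a dynamically linked binary; `symbol_provider` is a hypothetical helper name, while `dlsym`, `dladdr`, `Dl_info` and `RTLD_DEFAULT` are the standard `<dlfcn.h>` interfaces. On older glibc versions it may need `-ldl` at link time. Separately, if I read ltrace's output format correctly, the `exe->` / `libc.so.6->` prefix names the module making the call rather than the one that services it, so those prefixes alone may not show which allocator handled the request.

```c
#define _GNU_SOURCE /* exposes dladdr, Dl_info and RTLD_DEFAULT on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Returns the path of the shared object (or executable) that defines
 * `symbol` in the default lookup order, or NULL if it cannot be found. */
const char *symbol_provider(const char *symbol) {
    Dl_info info;
    void *addr;

    assert(symbol != NULL);
    /* Resolve the symbol the way an ordinary global lookup would. */
    addr = dlsym(RTLD_DEFAULT, symbol);
    if (addr == NULL) {
        return NULL;
    }
    /* Map the resolved address back to the object that contains it. */
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL) {
        return NULL;
    }
    return info.dli_fname;
}
```

Printing `symbol_provider("malloc")` early in the process would be expected to show a path containing `libjemalloc` when jemalloc wins symbol resolution, and the libc path otherwise. Libraries loaded with their own lookup scope could still resolve differently, so this is a hint rather than a guarantee.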
    Kamil Gorlo
    @kgs
    OK, will look into this LD_DEBUG, didn't know about that, thanks! Regarding how it is linked: my nginx binary uses some other *.so libraries. Should they be told somehow to use malloc from libjemalloc, or is the linking order important, or something else? Some .so files are loaded by LuaJIT at runtime (but I think they use jemalloc, as far as I can tell).