    Mario Mintel
    @mintelm
    hi, is there a configuration option for jemalloc to limit the allocation to a single VMA?
    i'm trying to implement a different CoW approach in redis as research, and redis uses jemalloc and it would be really nice if redis would only use a single VMA for its data
    Dave Rigby
    @daverigby
    by “VMA” do you mean a single mmap() from the OS? I think you can do something like that via the extent_hooks API which basically lets you have full control of how jemalloc manages extent lifetime.
    Mario Mintel
    @mintelm
    i think a single mmap() would be sufficient yes, i just want the data to be contained in a single area (e.g., data should be in 0x000000 - 0x400000 instead of 0x000000 - 0x200000 and 0x400000 - 0x600000)
    Dave Rigby
    @daverigby
    So yes, the extent hook stuff should sort you out. It might be more straightforward to still create (mmap) memory from the OS on-demand when jemalloc wants it, but you could do that from a pre-reserved region you have requested from the OS.
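    A minimal sketch of the pre-reservation idea Dave describes, OS side only (the extent_hooks wiring into jemalloc is omitted, and all names here — pool_init, pool_alloc, POOL_SIZE — are made up for illustration): reserve one large PROT_NONE mapping up front, then hand out committed slices from it on demand, so everything lands inside a single VMA.

    ```c
    /* Sketch: serve chunks from one pre-reserved VMA (Linux/POSIX). */
    #include <stdio.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define POOL_SIZE (64UL << 20) /* 64 MiB reservation; size is arbitrary */

    static char  *pool_base;
    static size_t pool_used;

    /* Reserve one contiguous VMA up front; pages inaccessible until used. */
    static int pool_init(void) {
        pool_base = mmap(NULL, POOL_SIZE, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return pool_base == MAP_FAILED ? -1 : 0;
    }

    /* Bump-allocate a committed, aligned chunk out of the reservation. */
    static void *pool_alloc(size_t size, size_t align) {
        size_t off = (pool_used + align - 1) & ~(align - 1);
        if (off + size > POOL_SIZE) return NULL;          /* pool exhausted */
        void *p = pool_base + off;
        if (mprotect(p, size, PROT_READ | PROT_WRITE) != 0) return NULL;
        pool_used = off + size;
        return p;
    }

    int main(void) {
        size_t pg = (size_t)sysconf(_SC_PAGESIZE);
        if (pool_init() != 0) return 1;
        char *a = pool_alloc(pg, pg);
        char *b = pool_alloc(pg, pg);
        a[0] = 1; b[0] = 2;          /* both chunks live inside the same VMA */
        printf("%d %d %d\n", a[0], b[0], (int)(b - a == (ptrdiff_t)pg));
        return 0;
    }
    ```

    A real integration would plug a function like pool_alloc into the extent_alloc slot of an extent_hooks_t passed to mallctl("arenas.create", ...), so jemalloc itself draws from the reservation.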
    Mario Mintel
    @mintelm
    ok, i will look into that, thank you
    Paul Smith
    @madscientist
    Hi all. Our software has been using jemalloc 4.x for 10 years or so and we've been sitting with the very latest commit on the stable-4 branch for the last 4 years. I've been idly toying with the idea of updating to the latest 5.x version. We made some small tweaks to 4.x but the one major feature we added was the ability to dump profiling to a memory buffer as an alternative to writing to a file: our software is a distributed system and we need to be able to collect profiling over the network in response to requests for metrics on different nodes in the system: writing to a file is not helpful in our environment. Of course, I can forward-port the changes but I was wondering if anything like this already exists in jemalloc 5.x, or if there are any thoughts on what would be a good way forward?
    Qi Wang
    @interwq

    @madscientist You can use mallctl(“prof.dump”…) with a filename pointing to shared memory (e.g. /dev/shm).

    Another direction is that we plan to add a prof dump hook, i.e. a user-defined callback invoked upon prof dump, which can be used to redirect the output. However it's still an experimental feature (and will be unofficial / undocumented in the upcoming 5.3 release).
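    A hedged sketch of the /dev/shm route Qi mentions: resolve mallctl at runtime so the snippet also builds and runs without jemalloc, and point prof.dump at a tmpfs path so the "file" never touches disk. Assumes Linux, glibc, and a jemalloc built with profiling enabled; the path is made up.

    ```c
    #define _GNU_SOURCE         /* for RTLD_DEFAULT on glibc */
    #include <dlfcn.h>
    #include <stdio.h>
    #include <stddef.h>

    /* jemalloc's mallctl signature (stable public API). */
    typedef int (*mallctl_fn)(const char *, void *, size_t *, void *, size_t);

    int main(void) {
        mallctl_fn jemallctl = (mallctl_fn)dlsym(RTLD_DEFAULT, "mallctl");
        if (jemallctl == NULL) {
            puts("jemalloc not loaded; nothing to dump");
            return 0;
        }
        /* Dump the heap profile to tmpfs: a "file" that lives in memory. */
        const char *path = "/dev/shm/myapp.heap";   /* hypothetical path */
        int err = jemallctl("prof.dump", NULL, NULL, &path, sizeof(path));
        printf("prof.dump -> %d\n", err);
        return 0;
    }
    ```

    The call fails (nonzero err) unless the process was started with profiling active, e.g. MALLOC_CONF=prof:true.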

    Paul Smith
    @madscientist
    Thanks. However we use jemalloc on Linux, macOS, and Windows, and would prefer not to create OS-specific solutions for this.
    skygragon
    @skygragon
    Hi there, is it possible that chunk->node->en_arena is null? I hit a core dump during realloc due to a null en_arena
    (two screenshots attached)
    David Goldblatt
    @davidtgoldblatt
    I don't think so, but that code path also indicates a fairly old version of jemalloc, so it's somewhat before my time
    Qi Wang
    @interwq

    @madscientist In that case, you can take a look at jemalloc/jemalloc#2119 which added the experimental prof hooks. Note that there are 2 hooks: one for backtrace (when sampling an allocation), and the other is the dump_hook, which is triggered when outputting profiles.

    Feel free to let us know if it fits your use case.

    Paul Smith
    @madscientist
    Thanks for the pointer. It doesn't quite work for me; it appears that this will still create a file and dump all the profiling to it, then it will invoke the hook you register with the filename. To use this, I would need to implement a hook which would open and read that file into a buffer then send the content over the network, then delete the file. Of course, this would work in general. However what I really want is a way to replace the entire write of the data so that instead of writing to a file it will write directly to a memory buffer, then when the dump is complete I will send that memory buffer over the network. I don't want the data to be written to a file then read back in.
    Peter Fraenkel
    @pnf
    @interwq I am intrigued by that PR for a different reason. The ability to insert a stack frame (as illustrated in test_prof_backtrace_hook_augment) would be incredibly useful (e.g. to set some kind of context thread-locally from java), but I can't see how to make what you add show up meaningfully in the jeprof shell or visualizations.
    Qi Wang
    @interwq
    @pnf the use case we have right now is integrating python frames with native frames. It does require quite a bit of work on the consumer side, e.g. building a python frame lookup database on the fly, then combining the results when post-processing. But like you said it's not part of jeprof. The end results are quite nice though: it gives us a unified view with both python and native stacks in the same dataset.
    Alexander Lapenkov
    @Lapenkov
    @madscientist alas, this is the approach we decided to take (callback after writing to a file). I don't think we'll change the interface. You may try forward-porting the change; it seems the changes in prof since 4.x are not that dramatic.
    mandagod
    @mandagod
    Any good Android jemalloc optimization suggestions?
    satibabu
    @satibabu
    hi, I am new to jemalloc and trying to configure the build for my use case. When I set tcache_max in the --with-malloc-conf, it doesn't seem to take effect. When I read tcache_max via mallctl it returns 0 for opt.tcache_max and 32K for arenas.tcache_max. What is the right way to set this? Also, what is the difference between opt.tcache_max vs arenas.tcache_max?
    Qi Wang
    @interwq
    What’s the full configure --with-malloc-conf you are using?
    satibabu
    @satibabu
    --with-malloc-conf=narenas:48,tcache:true,background_thread:true,tcache_max:4096,dirty_decay_ms:3000,muzzy_decay_ms:3000 --disable-fill
    Qi Wang
    @interwq
    Looks fine to me. Just to confirm, you are using jemalloc 5.3, is that right?
    tcache_max was added in 5.3; previously it was lg_tcache_max, and the limit could not be smaller than 16K (assuming 4k page size)
    satibabu
    @satibabu
    That explains it, I am using 5.2.1. Thanks for the pointer on lg_tcache_max, I will try that
    BTW I do have a higher-level question. I am trying to compare tcmalloc with jemalloc. The memory footprint when using jemalloc more than doubles compared to tcmalloc. What attributes control this?
    Qi Wang
    @interwq
    No problem. For lg_tcache_max, the default is 15 (2^15==32K), and lowest you can go is 14.
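    For 5.2.1 specifically, the tcache_max:4096 setting has no equivalent; the closest is lg_tcache_max:14 (16 KiB). A sketch of the adjusted configure flags, based on satibabu's original line above:

    ```shell
    ./configure --disable-fill \
      --with-malloc-conf=narenas:48,tcache:true,background_thread:true,lg_tcache_max:14,dirty_decay_ms:3000,muzzy_decay_ms:3000
    ```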
    If you can obtain malloc_stats, I’ll be happy to take a look (e.g. from malloc_stats_print(), or use the interval based stats output in 5.3)
    satibabu
    @satibabu
    All my online reading seems to indicate jemalloc's footprint is smaller, with better performance, compared to tcmalloc. I am trying to reproduce those results in my application
    Qi Wang
    @interwq
    But generally I’d say try jemalloc 5.3 first
    satibabu
    @satibabu
    Thanks again. Let me try jemalloc 5.3 and I will report back with my results
    satibabu
    @satibabu
    @interwq Do you have recommendations on configuration for performance?
    Qi Wang
    @interwq
    The options in https://github.com/jemalloc/jemalloc/blob/dev/TUNING.md should be a good starting point. Depending on which metric (memory vs CPU) you value more, there are several options. Usually decay and narenas are most effective. Also, enabling background threads will solve some edge cases.
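    As one concrete starting point in the spirit of TUNING.md's throughput-leaning suggestions (the decay values are illustrative; tune them to your workload):

    ```shell
    # Longer decay keeps dirty pages cached for reuse (more memory, less CPU);
    # background_thread purges asynchronously instead of on the hot path.
    export MALLOC_CONF="background_thread:true,metadata_thp:auto,dirty_decay_ms:30000,muzzy_decay_ms:30000"
    ```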
    satibabu
    @satibabu
    Sure thanks
    Qi Wang
    @interwq
    NP. The stats_interval option can output stats easily. If anything looks unexpected, feel free to share the stats output
    satibabu
    @satibabu
    Do you think fill option affects performance dramatically?
    Qi Wang
    @interwq
    Certainly. However, it’s off by default in an opt build, since it’s mainly useful for debugging. Are you enabling it?
    satibabu
    @satibabu
    default was on as I see it from configure output
    is there a way to compile in opt mode?
    Qi Wang
    @interwq

    config_fill is on, which means the binary is fill-capable; there’s also a runtime switch, opt.junk, which is off by default in opt mode.

    As long as you don’t specify --enable-debug, it should be an opt build

    satibabu
    @satibabu
    Ran a quick test with 5.3 on a 48-core system. With narenas=192 (4x the cores), my app allocated 8.0G of memory with 3.1G resident
    With narenas=48, the allocated dropped to 7.3G with 3.0G resident
    Qi Wang
    @interwq
    What do you mean by allocated? If it’s the jemalloc counter, that shouldn’t change with different narenas.
    satibabu
    @satibabu
    This is from top ... by 'allocated' I meant VIRTUAL memory allocated, and RESIDENT meaning committed physical pages
    Qi Wang
    @interwq
    Oh please don’t pay attention to the VM size. Since jemalloc 5.0, we started caching VM and also being generous with VM reservation. See https://jemalloc.net/jemalloc.3.html#opt.retain
    satibabu
    @satibabu
    OK, how can I get the resident memory footprint down? Right now it's double that of tcmalloc at a similar performance level
    Qi Wang
    @interwq
    Can you try running with MALLOC_CONF=stats_print:true? It will print stats at the end of the program.
    satibabu
    @satibabu
    well my application cannot terminate. The way I am doing it is to start my application workload and sample these stats. Is there a way to dump jemalloc stats somewhere without terminating the application?
    Qi Wang
    @interwq
    In that case, try MALLOC_CONF=stats_interval:1000000000, in which case it prints stats ~every 1G of allocations
    satibabu
    @satibabu
    is there a way to write the stats to a file?
    Qi Wang
    @interwq
    No such option built in. Maybe try redirecting stdout to a file.
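    Beyond redirecting stdout, malloc_stats_print() also takes a write callback, which can stream the stats text straight into a FILE (or a buffer) without touching stdout. A sketch, resolving the symbol at runtime so it degrades gracefully when jemalloc isn't loaded; the output path is made up:

    ```c
    #define _GNU_SOURCE        /* for RTLD_DEFAULT on glibc */
    #include <dlfcn.h>
    #include <stdio.h>

    /* jemalloc's malloc_stats_print signature (stable public API). */
    typedef void (*stats_print_fn)(void (*write_cb)(void *, const char *),
                                   void *cbopaque, const char *opts);

    /* Callback: append each chunk of stats text to the FILE passed in. */
    static void write_to_file(void *opaque, const char *s) {
        fputs(s, (FILE *)opaque);
    }

    int main(void) {
        stats_print_fn stats_print =
            (stats_print_fn)dlsym(RTLD_DEFAULT, "malloc_stats_print");
        if (stats_print == NULL) {
            puts("jemalloc not loaded");
            return 0;
        }
        FILE *out = fopen("/tmp/jemalloc-stats.txt", "w"); /* hypothetical path */
        if (out == NULL) return 1;
        stats_print(write_to_file, out, NULL);  /* full stats into the file */
        fclose(out);
        puts("stats written");
        return 0;
    }
    ```

    Because this runs in-process, it can be wired to a metrics endpoint and called on demand without terminating the application.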