    Yao Yue
    @thinkingfish
    I will reintroduce it in the ccommon repo and pull that into pelikan again, bypassing the review process so I don’t have to wait to make the rest of the changes.
    since the code is already vetted
    The changes are contained in this PR: twitter/pelikan#8
    Sebastian Waisbrot
    @seppo0010
    Sorry for that!
    Yao Yue
    @thinkingfish
    No worries. The dependency model/process is confusing, to say the least.
    I will however have the changes for review when I send a PR in pelikan
    So if there are problems we can fix it then
    (since pelikan PR will contain any diff in deps/ccommon too)
    Sagar Vemuri
    @sagar0
    hey, didn't know that you are already communicating here!
    :thumbsup:
    Yao Yue
    @thinkingfish
    @seppo0010 merged your two ccommon PRs after testing on Linux :smile:
    Sebastian Waisbrot
    @seppo0010
    :grinning:
    Yao Yue
    @thinkingfish
    OK catching up on my github backlog, after clearing out my email backlog
    Sebastian Waisbrot
    @seppo0010
    no meeting today, I assume?
    Yao Yue
    @thinkingfish
    Folks I made good on my promise on tests, 96.6% line coverage for time module :)
    Yao Yue
    @thinkingfish
    @seppo0010 it's now public :)
    Sagar Vemuri
    @sagar0
    Congratulations all!
    :smile:
    Manju Rajashekhar
    @manjuraj
    Congrats Folks :)
    Brings back memories of cache and coding in C!
    Yao Yue
    @thinkingfish
    @manjuraj How’s scala treating you? ;)
    btw, I think we need to steal more code from fatcache
    Manju Rajashekhar
    @manjuraj
    @thinkingfish it more like scala and python and GPUs :)
    Is there a SSD backend to pelican? that would be amazing!
    Really love how the pelican is architected - looks like lot of thought and attention has gone into building this
    Yao Yue
    @thinkingfish
    @manjuraj Sure! It was a shame that fatcache was never used in Twitter’s production, partly because we didn’t want to maintain too many codebases. But now that hurdle can be removed if we merge the relevant components into pelikan.
    We can basically start with just moving slab.[ch], item.[ch], itemx.[ch] and sha1.[ch] and forming a new storage module called slab_ssd
    There are $$$ to be saved with some of our biggest memory-dominant use cases
    I guess the only uncertainty here is hardware provisioning - the FE platforms are not equipped with SSDs, and adding SSDs would mean platform diversity in cache’s HW fleet
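    (For context, a minimal sketch of the design those fatcache files implement, going by its public layout; the field names here are illustrative, not a commitment to what a slab_ssd module would actually use: items live in slabs on SSD, and only a compact per-item index stays in DRAM, keyed by the SHA-1 digest of the key.)
```c
/* Illustrative only: an in-memory index entry in the spirit of fatcache's
 * itemx, mapping a key digest to an item's location on SSD. */
#include <stdint.h>

struct itemx {
    uint8_t  md[20];    /* SHA-1 digest of the item key */
    uint32_t sid;       /* id of the slab on SSD that owns the item */
    uint32_t offset;    /* item offset within that slab */
    uint64_t cas;       /* compare-and-swap version */
};

/* A read then becomes: hash the key, look up its itemx in the DRAM hash
 * table, and issue a single SSD read at (sid, offset) for the payload. */
```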
    Manju Rajashekhar
    @manjuraj
    Nice! Agree, the $$$ saved would be huge -- which is why it was developed in the first place!
    Let me know if I can be of any help around that
    Yao Yue
    @thinkingfish
    I will certainly bug you when I have questions :)
    Manju Rajashekhar
    @manjuraj
    :) i'll be here lurking around
    Ben Manes
    @ben-manes
    I had lunch with Dormando (memcached) recently about TinyLFU to improve the hit rate. That might be a good fit for pelikan too. (Overview: http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html)
    Yao Yue
    @thinkingfish
    interesting. The reason I looked at, and decided to shelve, something like ARC was the memory overhead
    but 8 bytes is affordable
    @ben-manes I haven’t finished the paper yet, but do you have insight into how well it works with spiky access patterns?
    E.g. an item got very hot in a small time window, accumulating a large counter value, but fades quickly afterwards.
    Would the algorithm allow such an item to “overstay” its usefulness because of the count?
    Yao Yue
    @thinkingfish
    Another thing I need to think about is how this complicates slab-based eviction. The tradeoffs among fragmentation, size-distribution lock-in, and cache-replacement granularity are often hard to reason about clearly.
    Ben Manes
    @ben-manes
    Caffeine only used a CountMinSketch (8b/entry). Early experiments with the doorkeeper show that could be reduced by 4x. There is also the possibility of using more compact sketches, like a Cuckoo filter (instead of a Bloom filter), and of growing sub-linearly, since a Pareto distribution means most entries will have a low count
    The use of SLRU allowed the victim to be chosen reasonably well and to fade out in a reasonable time frame if mispredicted. Since most accesses are to the same hot entries, the reset function (sampling at 10x the size) ages entries at a good rate too. All of the traces I have so far are very favorable. Yahoo! is running an integration into HBase to see how it affects their workloads. If you have traces I can run them in easily
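    (To make the shape of that concrete, below is a minimal C sketch of the frequency side of the design. This is not Caffeine's code; the names and the hash mixing are assumptions. It shows a count-min sketch whose counters saturate at 15, stored one per byte here for clarity rather than packed, plus the periodic reset that halves every counter once the recorded additions reach roughly 10x the cache size, so stale popularity decays.)
```c
/* Not Caffeine's implementation: an illustrative TinyLFU-style frequency
 * sketch with 4-bit saturating counters and periodic halving ("reset"). */
#include <stdint.h>
#include <stdlib.h>

#define ROWS 4    /* number of hash functions / counter rows */

struct freq_sketch {
    uint8_t  *count[ROWS];  /* one row of counters per hash function */
    uint32_t  width;        /* counters per row, a power of two */
    uint64_t  additions;    /* increments since the last reset */
    uint64_t  sample;       /* reset threshold, ~10x cache capacity */
};

struct freq_sketch *
sketch_create(uint32_t width, uint64_t cache_size)
{
    struct freq_sketch *s = calloc(1, sizeof(*s));

    s->width = width;               /* caller passes a power of two */
    s->sample = 10 * cache_size;    /* "sample at 10x the size" */
    for (int r = 0; r < ROWS; r++) {
        s->count[r] = calloc(width, sizeof(uint8_t));
    }

    return s;
}

static uint32_t
row_index(uint64_t h, int row, uint32_t width)
{
    /* remix the key hash differently per row; the constant is arbitrary */
    h *= 0x9E3779B97F4A7C15ULL + (uint64_t)(2 * row + 1);
    return (uint32_t)(h >> 32) & (width - 1);
}

void
sketch_record(struct freq_sketch *s, uint64_t key_hash)
{
    int bumped = 0;

    for (int r = 0; r < ROWS; r++) {
        uint8_t *c = &s->count[r][row_index(key_hash, r, s->width)];
        if (*c < 15) {              /* counters saturate at 15 (4 bits) */
            (*c)++;
            bumped = 1;
        }
    }

    /* aging: halve every counter once enough additions accumulate, so a
     * key that was hot only briefly cannot "overstay" its usefulness */
    if (bumped && ++s->additions >= s->sample) {
        for (int r = 0; r < ROWS; r++) {
            for (uint32_t i = 0; i < s->width; i++) {
                s->count[r][i] >>= 1;
            }
        }
        s->additions /= 2;
    }
}

uint8_t
sketch_estimate(const struct freq_sketch *s, uint64_t key_hash)
{
    uint8_t min = 15;

    for (int r = 0; r < ROWS; r++) {
        uint8_t c = s->count[r][row_index(key_hash, r, s->width)];
        if (c < min) {
            min = c;
        }
    }

    return min;
}
```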
    Ben Manes
    @ben-manes
    @thinkingfish Also the sketch uses saturating counters, so within a sample period the maximum is 15 (4-bit). This means that with the window, reset, and probation space a spiky access pattern should fade out quickly. I'm in SF four days a week so if it helps, I can meet up for lunch to discuss details.
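    (Reusing the freq_sketch sketched above, a hypothetical admission check shows how the window and probation spaces would use those estimates: when an entry ages out of the window, it enters the main SLRU space only if the sketch rates it hotter than the probation victim it would displace, so a briefly spiky key whose counters have since been halved loses that comparison quickly.)
```c
/* Hypothetical admission filter (names are assumptions, not Caffeine's
 * API): compare the estimate for the window's candidate against the
 * SLRU probation victim; admit only if the candidate looks hotter. */
int
admit(const struct freq_sketch *s, uint64_t candidate_hash, uint64_t victim_hash)
{
    return sketch_estimate(s, candidate_hash) > sketch_estimate(s, victim_hash);
}
```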
    Yao Yue
    @thinkingfish
    @ben-manes let me read the literature/implementation more carefully and I will ping you with any questions/follow-up. Thanks for suggesting this – it is very interesting!
    Ben Manes
    @ben-manes
    @thinkingfish fyi, the Cassandra folks noticed a HashDoS exploit. Java only has 32-bit hash codes and the sketch used a random seed, but TinyLFU is a deterministic filter. We added a small amount of randomness without degrading the hit rate or performance.
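    (One shape such a mitigation can take, stated as an assumption rather than a claim about Caffeine's exact code, building on the admit() idea above: keep admission deterministic in the common case, but when a moderately warm candidate loses to the victim, occasionally admit it anyway, so crafted sketch collisions cannot starve legitimate keys indefinitely.)
```c
/* Assumed mitigation sketch, not Caffeine's code: the rare random admit
 * breaks determinism just enough to blunt crafted sketch collisions.
 * Reuses freq_sketch and sketch_estimate() from the sketch above. */
#include <stdint.h>
#include <stdlib.h>

int
admit_with_jitter(const struct freq_sketch *s,
                  uint64_t candidate_hash, uint64_t victim_hash)
{
    uint8_t candidate = sketch_estimate(s, candidate_hash);
    uint8_t victim = sketch_estimate(s, victim_hash);

    if (candidate > victim) {
        return 1;
    }
    if (candidate <= 5) {           /* too cold to bother contesting */
        return 0;
    }
    return (rand() & 127) == 0;     /* roughly 1-in-128 random admit */
}
```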
    Yao Yue
    @thinkingfish
    :thumbsup:
    Yao Yue
    @thinkingfish
    @kevyang twitter/ccommon#124
    Yao Yue
    @thinkingfish
    Homebrew/homebrew-core#769
    Yao Yue
    @thinkingfish
    twitter/pelikan#114
    Not a huge fan of Dockerfile syntax