I am new to these algorithms and curious about the differences between Sliding HyperLogLog (https://hal.archives-ouvertes.fr/hal-00465313/file/sliding_HyperLogLog.pdf) and HyperLogLog Series. I want to create thousands of counters that provide sliding-window cardinality estimates, e.g. for the last 30 days, the last 7 days and the last 24 hours.
I am trying to find the answers to these questions and it would be really helpful if I get a quick response.
Hey, I have a case class Thing(name: String). I need to "reduce" a Set[Thing] into a Set[Thing] where resultant set is the one with a max count of identical names. That is, Set(Thing("Cory"), Thing("Cory"), Thing("Ahmad"), Thing("Kevin"), Thing("Kevin")) "reduces" to Set(Thing("Cory"), Thing("Cory"), Thing("Kevin"), Thing("Kevin")).
How do I neatly put this into one of the structures defined in algebird?
Max looked promising at first but I still don't see how to leverage it.
Map[K, Long] where you keep track of the counts, might make it clearer
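A minimal sketch of that counting idea, in case it helps. It uses only the Scala standard library: the helper names (mergeCounts, keepMostFrequent, MaxCountSketch) are made up for illustration, and mergeCounts is a hand-written stand-in for the per-key summing that a Map[K, Long] monoid gives you. It also assumes the input is a Seq rather than a Set, since a Set would collapse the duplicate Things before they could be counted.

```scala
final case class Thing(name: String)

object MaxCountSketch {
  // Merge two count maps by adding the values of matching keys
  // (hand-rolled here; a Map[K, Long] monoid does the same kind of merge).
  def mergeCounts(a: Map[String, Long], b: Map[String, Long]): Map[String, Long] =
    b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0L) + v) }

  // Keep only the things whose name occurs the maximum number of times.
  def keepMostFrequent(things: Seq[Thing]): Seq[Thing] = {
    // Each Thing contributes a one-entry map; merging them yields name -> count.
    val counts = things
      .map(t => Map(t.name -> 1L))
      .foldLeft(Map.empty[String, Long])(mergeCounts)

    if (counts.isEmpty) Seq.empty
    else {
      val maxCount = counts.values.max
      things.filter(t => counts(t.name) == maxCount)
    }
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(Thing("Cory"), Thing("Cory"), Thing("Ahmad"), Thing("Kevin"), Thing("Kevin"))
    // Cory and Kevin tie at 2, so both pairs are kept and Ahmad is dropped.
    println(keepMostFrequent(input))
  }
}
```

The counting step is the part that combines associatively, so it is the piece that could be handed to a monoid; picking the max and filtering happens once at the end, after all the counts are merged.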
maxRhow represents here: https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/HyperLogLog.scala#L393
Monoid, and also serialize it in and out. I've essentially solved the serialization issue, but there doesn't seem to be a ExpHist, even though it has add. Is there some issue around the laws I should be aware of? I see that I can take approximateSum, which gives me an Approximate, so that might work, if there were an easy way to go back from ExpHist for the sake of the serialization. Any and all thoughts are appreciated. :-)