Hi,
I am new to these algorithms and curious about the differences between Sliding HyperLogLog (https://hal.archives-ouvertes.fr/hal-00465313/file/sliding_HyperLogLog.pdf) and HyperLogLog Series. I want to create thousands of counters that provide sliding-window cardinality estimates, e.g. the cardinality over the last 30 days, the last 7 days and the last 24 hours.
I am trying to understand how the two compare for this use case, and a quick response would be really helpful.
Thanks.
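To make the comparison concrete, here is a minimal sketch of the Sliding HyperLogLog idea from the linked paper, in Scala since that is what algebird uses. Each register keeps a "list of future possible maxima" of `(timestamp, rho)` pairs instead of a single maximum, so any window up to `maxWindow` can be queried later. It assumes non-decreasing timestamps, a 32-bit `MurmurHash3` hash from the Scala standard library, and the standard HLL estimator with linear-counting correction. The class `SlidingHLL` and its parameters are hypothetical; this is not algebird's implementation.

```scala
import scala.util.hashing.MurmurHash3

// Sketch of Sliding HyperLogLog: each register keeps a list of future possible
// maxima (LFPM) of (timestamp, rho) pairs. Assumes non-decreasing timestamps.
final class SlidingHLL(bits: Int, maxWindow: Long) {
  require(bits >= 4 && bits <= 16, "bits must be in [4, 16]")
  private val m = 1 << bits
  // Per register: timestamps ascending, rho strictly descending.
  private val registers = Array.fill(m)(Vector.empty[(Long, Int)])

  def add(item: String, t: Long): Unit = {
    val h = MurmurHash3.stringHash(item)
    val j = h >>> (32 - bits) // top `bits` bits pick the register
    val w = h << bits         // remaining bits, left-aligned
    val rho = math.min(Integer.numberOfLeadingZeros(w), 32 - bits) + 1
    // Drop entries outside the largest window, and entries that can never be a
    // window maximum again because this newer entry is at least as large.
    val kept = registers(j).filter { case (tk, rk) => tk > t - maxWindow && rk > rho }
    registers(j) = kept :+ ((t, rho))
  }

  // Estimated distinct items with timestamp in (now - window, now],
  // assuming `now` is at or after the last added timestamp.
  def estimate(now: Long, window: Long): Double = {
    require(window <= maxWindow, "window exceeds maxWindow")
    // In-window entries form a suffix of each LFPM; its first element is the max.
    val maxima = registers.map { lfpm =>
      lfpm.collectFirst { case (tk, rk) if tk > now - window => rk }.getOrElse(0)
    }
    val alpha = 0.7213 / (1 + 1.079 / m) // large-m approximation of alpha_m
    val raw = alpha * m * m / maxima.map(r => math.pow(2.0, -r)).sum
    val zeros = maxima.count(_ == 0)
    // Small-range correction (linear counting), as in standard HLL.
    if (raw <= 2.5 * m && zeros > 0) m * math.log(m.toDouble / zeros) else raw
  }
}
```

With one such sketch per counter and `maxWindow` set to 30 days, the 7-day and 24-hour estimates would come from the same structure by passing a smaller `window` at query time. The cost is memory: each register holds a short list instead of a single byte.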
Hey, I have a case class `Thing(name: String)`. I need to "reduce" a `Set[Thing]` into a `Set[Thing]` where the result keeps only the things whose name has the maximum count. That is, `Set(Thing("Cory"), Thing("Cory"), Thing("Ahmad"), Thing("Kevin"), Thing("Kevin"))` "reduces" to `Set(Thing("Cory"), Thing("Cory"), Thing("Kevin"), Thing("Kevin"))`.
How do I neatly express this with one of the structures defined in algebird? `Max` looked promising at first, but I still don't see how to leverage it.
A `Map[K, Long]` where you keep track of the counts might make it clearer.
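A minimal sketch of the `Map[K, Long]` suggestion, using only the Scala standard library. One caveat: a Scala `Set` keeps only one of the two equal `Thing("Cory")` values, so the sketch takes a `List[Thing]` instead. The names `plusCounts` and `mostFrequent` are hypothetical; `plusCounts` mirrors the summing `Map` monoid that algebird provides, and the final `max` over the counts is the step where algebird's `Max` could come in.

```scala
final case class Thing(name: String)

// Associative merge of count maps (values summed per key); a stand-in for a
// Map monoid with summed values.
def plusCounts(a: Map[String, Long], b: Map[String, Long]): Map[String, Long] =
  b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0L) + v) }

// Keep every Thing whose name has the maximum count; ties are all kept.
def mostFrequent(things: List[Thing]): List[Thing] = {
  val counts = things
    .map(t => Map(t.name -> 1L))
    .foldLeft(Map.empty[String, Long])(plusCounts)
  if (counts.isEmpty) Nil
  else {
    val max = counts.values.max // the "Max" step over the counts
    things.filter(t => counts(t.name) == max)
  }
}
```

Because `plusCounts` is associative, partial count maps can be built in parallel and merged before the final max is taken.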
Compare with what `maxRhow` represents here: https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/HyperLogLog.scala#L393 (`Max[Byte]` there).
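For comparison with the counting map: HyperLogLog registers have the same per-key shape, but each register stores the maximum ρ(w) seen (the 1-based position of the first 1-bit in the hashed value) rather than a count, so merging takes a per-key max instead of a sum. The sketch below is a hypothetical stand-in using a 32-bit `MurmurHash3` from the standard library; it is not algebird's `HyperLogLog` code, and `Registers`, `indexAndRho`, `plusRegisters` and `sketch` are made-up names.

```scala
import scala.util.hashing.MurmurHash3

// Sparse HLL-style registers: register index -> max rho(w) seen.
type Registers = Map[Int, Byte]

// Split a 32-bit hash into a register index (top `bits` bits) and rho(w),
// the 1-based position of the first 1-bit in the remaining bits.
def indexAndRho(item: String, bits: Int): (Int, Byte) = {
  val h = MurmurHash3.stringHash(item)
  val j = h >>> (32 - bits)
  val rho = math.min(Integer.numberOfLeadingZeros(h << bits), 32 - bits) + 1
  (j, rho.toByte)
}

// Merge register maps with a per-key max rather than a sum.
def plusRegisters(a: Registers, b: Registers): Registers =
  b.foldLeft(a) { case (acc, (j, r)) =>
    acc.updated(j, math.max(acc.getOrElse(j, 0: Byte), r).toByte)
  }

def sketch(items: Seq[String], bits: Int): Registers =
  items
    .map { s => val (j, r) = indexAndRho(s, bits); Map(j -> r) }
    .foldLeft(Map.empty[Int, Byte])(plusRegisters)
```

Since max is idempotent as well as associative and commutative, adding the same item twice leaves the registers unchanged, which is what makes this a distinct-count structure rather than a counter.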
I'm trying to use `ExpHist` as a `Monoid`, and also serialize it in and out. I've essentially solved the serialization issue, but there doesn't seem to be a `Monoid` instance for `ExpHist`, even though it has `empty` and `add`. Is there some issue around the laws I should be aware of? I see that I can take `approximateSum`, which gives me an `Approximate`, so that might work if there were an easy way to go back from `Approximate` to `ExpHist` for the sake of serialization. Any and all thoughts are appreciated. :-)