Alex Levenson
@isnotinvain
a setting in codecov or in github?
Sam Ritchie
@sritchie
@isnotinvain hey, sorry!
it’s a setting in codecov
@isnotinvain that’s the thing I linked - where we change the bot name
Alex Levenson
@isnotinvain
ok thanks
Neville Li
@nevillelyh
hi just checking if there's a plan to publish 2.12 artifacts any time soon?
P. Oscar Boykin
@johnynek
0.12.4 is published for 2.12. Unfortunately, it is binary incompatible in some areas, so if you have dependencies they may need to be updated to that version.
P. Oscar Boykin
@johnynek
0.13.0 is published, by the way, with 2.12 support.
and typelevel/algebra and cats compatibility (Algebird.Monoid extends cats.Monoid)
Denis Rosset
@denisrosset
@johnynek Congrats!
Michael G. Noll
@miguno
Question: Is there any existing API to serialize a CMS instance (e.g. TopPctCMS)? Or would I need to do my own thing with Kryo/Chill/whatever?
P. Oscar Boykin
@johnynek
@miguno we have not implemented that, although, the default chill stuff should "just work" pretty well.
Michael G. Noll
@miguno
thanks @johnynek
Michael G. Noll
@miguno
FYI: The CMS serde I came up with is captured in https://github.com/confluentinc/examples/blob/3.2.x-algebird/kafka-streams/src/main/scala/io/confluent/examples/streams/algebird/TopCMSSerde.scala. It's just a few lines of code, essentially it's two calls to Chill's ScalaKryoInstantiator. If you have a minute to spare, feel free to let me know if that is how Chill should be used here.
Gary Struthers
@garyaiki
Updating from 12.2 to 13, there's a compile error: "value inverse is not a member of com.twitter.algebird.Field[A]", and another: "not found: value assertNotZero". In build.sbt I changed algebird, algebird-core to algebird-core, algebird-util. What else do I need to do?
Gary Struthers
@garyaiki
OK, now I see Algebird's Field is replaced by Typelevel algebra Field.
Rémy-Christophe Schermesser
@ElPicador
I'm trying to use algebird from Java. I would like to use a numericAggregator on a standard Java type, Long, but I couldn't manage to get an instance of Numeric[java.lang.Long]. Any ideas?
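One possible way to obtain such an instance is to define it on the Scala side and hand it to the Java code explicitly. The following is a minimal sketch using only the Scala standard library; the object name JavaLongNumericSketch and the value name javaLongNumeric are hypothetical, and nothing here is part of algebird.

```scala
object JavaLongNumericSketch {
  import java.lang.{Long => JLong}

  // Hypothetical instance: every operation delegates to the primitive long value.
  implicit val javaLongNumeric: Numeric[JLong] = new Numeric[JLong] {
    def plus(x: JLong, y: JLong): JLong = JLong.valueOf(x.longValue + y.longValue)
    def minus(x: JLong, y: JLong): JLong = JLong.valueOf(x.longValue - y.longValue)
    def times(x: JLong, y: JLong): JLong = JLong.valueOf(x.longValue * y.longValue)
    def negate(x: JLong): JLong = JLong.valueOf(-x.longValue)
    def fromInt(x: Int): JLong = JLong.valueOf(x.toLong)
    def toInt(x: JLong): Int = x.intValue
    def toLong(x: JLong): Long = x.longValue
    def toFloat(x: JLong): Float = x.floatValue
    def toDouble(x: JLong): Double = x.doubleValue
    def compare(x: JLong, y: JLong): Int = JLong.compare(x.longValue, y.longValue)
    // Abstract in Numeric from Scala 2.13 on; on 2.12 it is just an extra method.
    def parseString(str: String): Option[JLong] =
      scala.util.Try(JLong.valueOf(str)).toOption
  }

  // The instance is picked up implicitly wherever a Numeric[java.lang.Long] is required.
  val total: JLong = List[JLong](JLong.valueOf(1L), JLong.valueOf(2L), JLong.valueOf(3L)).sum
}
```

From Java, the instance would be passed wherever the API expects the Numeric argument; how that argument appears in the signature you are calling is left as an assumption here.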
Kai(luo) Wang
@kailuowang
@/all any concerns if we break cats.kernel bin compat in the upcoming 1.0.0 release?
for context, we have several open PRs that break it: typelevel/cats#1712 typelevel/cats#1527 typelevel/cats#1878
P. Oscar Boykin
@johnynek
generally this is bad news and very long compatibility was a goal with kernel.
@kailuowang
I'll look at them.
Kai(luo) Wang
@kailuowang
I created a ticket to track / facilitate discussion on this issue
typelevel/cats#1879
Usman Ijaz
@uijaz59_twitter

Hi,

I am new to these algorithms and curious about the differences between Sliding HyperLogLog (https://hal.archives-ouvertes.fr/hal-00465313/file/sliding_HyperLogLog.pdf) and HyperLogLog Series. I want to create thousands of counters to provide sliding-window-based cardinality estimation, e.g. cardinality estimates for the last 30 days, last 7 days and last 24 hours.

  • Will the hyperloglog series evict/forget the older data, for example, in my case, data older than 30 days?
  • Does the size of hyperloglog series increase with time? For a 12 bit counter, what would be the minimum and maximum size?

I am trying to find the answers to these questions and it would be really helpful if I get a quick response.

Thanks.

Shumon Madzhumder
@shumn

Hey, I have a case class Thing(name: String). I need to "reduce" a Set[Thing] into a Set[Thing] where resultant set is the one with a max count of identical names. That is,

Set(Thing("Cory"), Thing("Cory"), Thing("Ahmad"), Thing("Kevin"), Thing("Kevin")) "reduces" to
Set(Thing("Cory"), Thing("Cory"), Thing("Kevin"), Thing("Kevin")).

How do I neatly put this into one of the structures defined in algebird?

An empty set won't reduce to anything in my case, so there is no identity element here. But, it is also not evident to me how it falls into a semigroup since the result of the reduction is "many" not "one". Max looked promising at first but I still don't see how to leverage it.
Shumon Madzhumder
@shumn
So it should reduce to one thing; if counts are the same, it will reduce to many. There are more fields in Thing, not only name. I just made it simple within the context of this example.
Shumon Madzhumder
@shumn
change Set to List
P. Oscar Boykin
@johnynek
this seems related to a topk type problem
I'm not 100% sure what you want is actually associative... which means it may not be a semigroup/monoid
you can write a Fold, which is more general, but sorry I don't immediately see an answer to your problem
i think representing as Map[K, Long] where you keep track of the counts, might make it clearer
but pruning that is not associative.
we have cms with topk, but it is only approximately associative
Shumon Madzhumder
@shumn
Ok. I realize what I am asking may not even be correct. Just needed some validation of this fact. I can always do a groupBy, just wondering if there is an abstract algebraic construct for this.
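For reference, the groupBy approach, combined with the Map of counts suggested above, can be sketched as follows. This is a plain standard-library sketch, not an algebird construct; the names MostFrequentSketch and mostFrequent are hypothetical, and Thing is reduced to the single name field from the example. Counting is the associative part (maps of counts merge by addition), while the pruning to the maximum count is applied once at the end.

```scala
object MostFrequentSketch {
  // Mirrors the case class from the question; only `name` is modeled.
  final case class Thing(name: String)

  // Keeps every element whose name occurs the maximum number of times.
  // An empty input yields an empty result.
  def mostFrequent(things: List[Thing]): List[Thing] =
    if (things.isEmpty) Nil
    else {
      // Step 1: count occurrences per name.
      val counts: Map[String, Long] =
        things.groupBy(_.name).map { case (name, group) => name -> group.size.toLong }
      // Step 2: prune to the names that reach the maximum count.
      val maxCount = counts.values.max
      things.filter(t => counts(t.name) == maxCount)
    }

  val input: List[Thing] =
    List(Thing("Cory"), Thing("Cory"), Thing("Ahmad"), Thing("Kevin"), Thing("Kevin"))

  val result: List[Thing] = mostFrequent(input)
}
```

With the input from the question, both "Cory" and "Kevin" occur twice, so all four of those elements are kept and "Ahmad" is dropped.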
P. Oscar Boykin
@johnynek
not that immediately comes to mind
it is related to a count min sketch
Shumon Madzhumder
@shumn
OK, I will look these up, i.e. count-min sketch and TopK.
P. Oscar Boykin
@johnynek
:thumbsup:
Vaibhav Tulsyan
@xennygrimmato
Hello, I wish to contribute to algebird. Is there some beginner issue that needs to be fixed? I can start off with something small. I would appreciate some guidance from the maintainers of the project. Thanks! :)
P. Oscar Boykin
@johnynek
Vaibhav Tulsyan
@xennygrimmato
Thanks @johnynek. I'm assuming all these issues still require work.
Can you please give me some background on this issue - twitter/algebird#326
I'll try to understand what the issues with the test are.
P. Oscar Boykin
@johnynek
@xennygrimmato I think the link was to a different line when I made it, and now commits have changed what is on that line. I think I was pointing here: https://github.com/twitter/algebird/blob/develop/algebird-test/src/test/scala/com/twitter/algebird/HyperLogLogTest.scala#L57 Basically, all the random HLLs we create have a single element in them. That is not great. Really, we should generate a list of longs, for instance, and add all of them to the HLL. That would be a better test. I expect the tests to still pass, but it is something we should think about: high-quality random generation with good coverage.
Vaibhav Tulsyan
@xennygrimmato
@johnynek Ok, that makes sense. Let me think of a good test case for this then. I'll discuss my approach on the issue itself, would you prefer that? I can create a PR after that
Mateusz Fedoryszak
@matfed
Hello all! I'm looking for an implicit class that adds Scalding-style sumByKey/aggregateByKey to ordinary Scala collections. Is there such a thing in Algebird? If not, I'd be happy to create a pull request.
P. Oscar Boykin
@johnynek
@matfed there is this: https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/MapAlgebra.scala#L196 we could do with a syntax enrichment package probably. That would be nice to add.
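A syntax enrichment of the kind discussed here might look roughly like the sketch below. It is dependency-free on purpose: Combine is a hypothetical local stand-in for an associative-combine type class such as Algebird's Semigroup, and PairOps and SumByKeySketch are illustrative names, not anything that exists in algebird.

```scala
object SumByKeySketch {
  // Hypothetical stand-in for a Semigroup-like type class (associative plus).
  trait Combine[V] { def plus(a: V, b: V): V }

  object Combine {
    implicit val longCombine: Combine[Long] = new Combine[Long] {
      def plus(a: Long, b: Long): Long = a + b
    }
  }

  // Enrichment: adds sumByKey to any standard collection of key-value pairs.
  implicit class PairOps[K, V](pairs: Iterable[(K, V)]) {
    def sumByKey(implicit c: Combine[V]): Map[K, V] =
      pairs.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
        // Combine with the existing value for this key, or start with v.
        acc.updated(k, acc.get(k).fold(v)(c.plus(_, v)))
      }
  }

  val totals: Map[String, Long] = List("a" -> 1L, "b" -> 2L, "a" -> 3L).sumByKey
}
```

In a real enrichment package, the implicit parameter would presumably be Algebird's own Semigroup, and the method could delegate to the MapAlgebra function linked above instead of folding by hand.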
Vaibhav Tulsyan
@xennygrimmato
@johnynek Is there some documentation for the SparseHLL case class? I want to understand what maxRhow represents here: https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/HyperLogLog.scala#L393
Specifically, I want to know the use of Max[Byte] there.