Pankaj Gupta
@pankajroark
our goal here is really simple, just consume via source to help make internal testing easier.
thanks @johnynek we’ll share the doc
P. Oscar Boykin
@johnynek
that is an explanation of how we could set up travis to publish every green develop build.
sounds good.
(also, we are discussing internally a new backend away from storm)
Piyush Narang
@piyushnarang
ah nice are you building a new one from scratch or something like flink?
P. Oscar Boykin
@johnynek
basically, you deploy N nodes pulling from a kafka queue and using compare-and-swap on the stores. Something very simple. No shuffling, etc...
our data rate is small enough...
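(Editor's sketch, not from the chat.) The design described above, N workers reading from a queue and merging into a store with compare-and-swap, can be illustrated roughly as follows. Everything here is an assumption made for illustration: `CasStore` is a hypothetical in-memory stand-in for a remote store, threads stand in for worker nodes, and a pre-split vector stands in for Kafka partitions. It is not the backend being discussed.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical in-memory stand-in for a remote key-value store with compare-and-swap.
final class CasStore[K, V] {
  private val data = new ConcurrentHashMap[K, V]()

  def get(k: K): Option[V] = Option(data.get(k))

  // Atomically write `next` only if the stored value still equals `expected`.
  def compareAndSwap(k: K, expected: Option[V], next: V): Boolean =
    expected match {
      case None      => data.putIfAbsent(k, next) == null
      case Some(old) => data.replace(k, old, next)
    }
}

object CasSketch {
  // Read, merge, attempt the swap, and retry if another worker won the race.
  @annotation.tailrec
  def mergeInto[K, V](store: CasStore[K, V], k: K, delta: V)(plus: (V, V) => V): V = {
    val current = store.get(k)
    val next = current.fold(delta)(plus(_, delta))
    if (store.compareAndSwap(k, current, next)) next
    else mergeInto(store, k, delta)(plus)
  }

  // Four threads stand in for N nodes; each chunk stands in for one queue partition.
  def run(): Long = {
    val store = new CasStore[String, Long]
    val events = Vector.fill(1000)("clicks" -> 1L)
    val workers = events.grouped(250).toVector.map { part =>
      new Thread(() => part.foreach { case (k, v) => mergeInto(store, k, v)(_ + _) })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    store.get("clicks").getOrElse(0L)
  }
}
```

Because every update goes through the retry loop, no worker needs to own a key, which is why no shuffle step appears in the sketch. Contention on hot keys costs retries, so this only seems reasonable when the data rate is modest.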
Piyush Narang
@piyushnarang
yeah that makes sense
we had something similar in one of my teams at spotify
P. Oscar Boykin
@johnynek
we may go another route (this is not my team building it), such as flink or kafka streams, but there is a desire, it seems, from the team to do something simple from scratch.
Piyush Narang
@piyushnarang
ok makes sense
P. Oscar Boykin
@johnynek
@piyushnarang @ttim twitter/summingbird#745
can I get a green light there?
Piyush Narang
@piyushnarang
@johnynek Timur is out on vacation this week so I think he might not take a look soon. I can try and take a look. I’m not super familiar with SB code so it would be nice to loop in Pankaj too. He’d probably have more detailed feedback
P. Oscar Boykin
@johnynek
okay. This is a bugfix, with more test coverage of code that is not actually exercised by storm or scalding summingbird, so the risk is very minor.
(but no one knows this code but me.... so....)
Piyush Narang
@piyushnarang
ok cool, I’ll take a look today. Hadn’t checked out the review so wasn’t sure what it touched
haha self +1 ;-)
P. Oscar Boykin
@johnynek
ok. Thanks.
:)
Pankaj Gupta
@pankajroark
Does anyone know how to use OrderedSerialization with Summingbird? I believe adding RequiredBinaryComparator trait to BatchedStore should do it.
P. Oscar Boykin
@johnynek
I don't think that will. That trait is for a Job.
you can set the configuration flag.
Pankaj Gupta
@pankajroark
I see, is adding that flag all that’s needed?
P. Oscar Boykin
@johnynek
that makes the scalding job fail if any Ordering is not an OrderedSerialization.
the user still has to import the scroogeOrdSer method:
Pankaj Gupta
@pankajroark
I see, so the user needs to import the scroogeOrdSer methods directly in the BatchedStore code.
That’s where Ordering seems to be used in Summingbird scalding platform
and in service
Pankaj Gupta
@pankajroark
gotcha thanks
P. Oscar Boykin
@johnynek
I think the BatchedStore is constructed with VersionedStore usually, that gets the Ordering[K] from where it is instantiated
Pankaj Gupta
@pankajroark
I see
P. Oscar Boykin
@johnynek
so, where they create the stores, that's where they need to use the macro
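(Editor's sketch, not from the chat.) The point that the store picks up its `Ordering[K]` at the place where it is instantiated comes down to Scala implicit resolution. The example below is a minimal illustration under stated assumptions: `SketchOrdSer`, `SketchStore`, `UserId` and `StoreSite` are all hypothetical names, and none of them are the real Scalding or Summingbird classes or macros.

```scala
// Hypothetical marker standing in for an Ordering that can also compare serialized bytes.
trait SketchOrdSer[K] extends Ordering[K]

// Hypothetical store: the implicit Ordering[K] is resolved where `new SketchStore` is written.
final class SketchStore[K](val name: String)(implicit val ordering: Ordering[K]) {
  // Mirrors the idea of a strict mode: reject any plain Ordering.
  def requireOrdSer(): Unit =
    require(ordering.isInstanceOf[SketchOrdSer[_]], s"store $name has a plain Ordering")
}

final case class UserId(id: Long)

object StoreSite {
  // In real code this instance would come from the imported macro; here it is hand-written.
  implicit val userIdOrdSer: SketchOrdSer[UserId] = new SketchOrdSer[UserId] {
    def compare(a: UserId, b: UserId): Int = java.lang.Long.compare(a.id, b.id)
  }

  // The implicit in scope at this line is the one the store captures.
  val store = new SketchStore[UserId]("users")
}
```

Since the instance is captured at the construction site, importing the better ordering anywhere else has no effect on a store that has already been built.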
Pankaj Gupta
@pankajroark
makes sense, thanks a lot
P. Oscar Boykin
@johnynek
good luck
Pankaj Gupta
@pankajroark
thanks, OrderedSerialization is really cool, trying to sell it to summingbird users :)
P. Oscar Boykin
@johnynek
yeah, it makes a pretty huge difference
we built a library for feature engineering on top of summingbird that requires it
Pankaj Gupta
@pankajroark
I see
P. Oscar Boykin
@johnynek
that way by the time we pass to summingbird we only have OrderedSerialization
Pankaj Gupta
@pankajroark
I see, makes sense
P. Oscar Boykin
@johnynek
heads up on the dag optimization stuff: https://github.com/stripe/dagon
since we have a lot of need to iterate on this code internally, we have forked that and polished it quite a bit. I'll send PRs to scalding and summingbird to use this instead. We need this to optimize some giant graphs that we are generating before sending to summingbird (and also scalding).
Piyush Narang
@piyushnarang
@johnynek that’s pretty cool. Curious to hear more about why / how your graphs are so big
Pankaj Gupta
@pankajroark
FYI: The data loss issue we talked about in the last meeting is not in Summingbird, it’s at a higher layer (Tsar). So no issues with Summingbird there.
P. Oscar Boykin
@johnynek
Thanks for following up.