our goal here is really simple, just consume via source to help make internal testing easier.
thanks @johnynek we’ll share the doc
P. Oscar Boykin
@johnynek
that is an explanation of how we could set up Travis to publish every green develop build.
sounds good.
(also, we are discussing internally a new backend away from storm)
Piyush Narang
@piyushnarang
ah nice are you building a new one from scratch or something like flink?
P. Oscar Boykin
@johnynek
basically, you deploy N nodes pulling from a kafka queue and using compare-and-swap on the stores. Something very simple. No shuffling, etc...
our data rate is small enough...
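A minimal sketch of the design described above, under stated assumptions: the Kafka queue is replaced by plain in-memory lists (one per node), the store is an in-memory stand-in, and the merge operation is integer addition. The names `CasStore`, `merge_into_store`, `run_worker`, and `run_cluster` are hypothetical and are not part of Summingbird or any Kafka client.

```python
import threading


class CasStore:
    """Hypothetical in-memory stand-in for a key-value store with compare-and-swap."""

    def __init__(self):
        self._data = {}
        # The lock only emulates the atomicity a real store would provide.
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)  # None means the key is absent

    def compare_and_swap(self, key, expected, new):
        """Write `new` only if the stored value still equals `expected`."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


def merge_into_store(store, key, delta):
    """Read, merge, and retry until the compare-and-swap succeeds."""
    while True:
        current = store.get(key)
        updated = delta if current is None else current + delta
        if store.compare_and_swap(key, current, updated):
            return


def run_worker(store, partition):
    """One node: consume its share of the queue and merge each event."""
    for key, delta in partition:
        merge_into_store(store, key, delta)


def run_cluster(store, partitions):
    """Run N nodes concurrently; there is no shuffle step between them."""
    threads = [
        threading.Thread(target=run_worker, args=(store, partition))
        for partition in partitions
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Because every node retries its own write on conflict, no node needs to own a key, which is what lets the sketch skip shuffling. The cost is wasted retries under contention, so this shape only suits a modest data rate.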
Piyush Narang
@piyushnarang
yeah that makes sense
we had something similar in one of my teams at spotify
P. Oscar Boykin
@johnynek
we may go another route (this is not my team building it), such as flink or kafka streams, but there is a desire, it seems, from the team to do something simple from scratch.
Piyush Narang
@piyushnarang
ok makes sense
P. Oscar Boykin
@johnynek
@piyushnarang @ttim twitter/summingbird#745
can I get a green light there?
Piyush Narang
@piyushnarang
@johnynek Timur is out on vacation this week so I think he might not take a look soon. I can try and take a look. I’m not super familiar with SB code so it would be nice to loop in Pankaj too. He’d probably have more detailed feedback
P. Oscar Boykin
@johnynek
okay. This is a bugfix, with more test coverage of code that is not actually exercised by storm or scalding summingbird, so the risk is very minor.
(but no one knows this code but me.... so....)
Piyush Narang
@piyushnarang
ok cool, I’ll take a look today. Hadn’t checked out the review so wasn’t sure what it touched
haha self +1 ;-)
P. Oscar Boykin
@johnynek
ok. Thanks.
:)
Pankaj Gupta
@pankajroark
Does anyone know how to use OrderedSerialization with Summingbird? I believe adding the RequiredBinaryComparator trait to BatchedStore should do it.
P. Oscar Boykin
@johnynek
since we need to iterate on this code a lot internally, we have forked it and polished it quite a bit. I'll send PRs to scalding and summingbird to use this instead. We need this to optimize some giant graphs that we are generating before sending to summingbird (and also scalding).
Piyush Narang
@piyushnarang
@johnynek that’s pretty cool. Curious to hear more about why / how your graphs are so big
Pankaj Gupta
@pankajroark
FYI: The data loss issue we talked about in the last meeting is not in Summingbird, it’s at a higher layer (Tsar). So no issues with Summingbird there.