Adam Chlupacek
@AdamChlupacek
In that case you can totally do this. Btw, as far as I understand Spark, it's just some kind of concurrent data processing?
sken
@sken77
yep, that's basically what it is
Adam Chlupacek
@AdamChlupacek
If that's so, then fs2 can easily replace that as well (no library required, as we have Stream.join), so I'd say you won't have SMACK anymore but more like MFCK :D
sken
@sken77
hahah we need to work on a catchy name
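For context, a minimal sketch of the kind of "Spark-like" concurrent processing being alluded to, using plain fs2. This assumes fs2 3.x with cats-effect 3; the Stream.join mentioned in the chat is called parJoin in current fs2, and the sources here are hypothetical stand-ins for e.g. Kafka partition subscriptions:

```scala
import cats.effect.{IO, IOApp}
import fs2.Stream

object ConcurrentProcessing extends IOApp.Simple {

  // hypothetical per-source streams; in a real job these would be
  // e.g. subscriptions to individual Kafka partitions
  def source(id: Int): Stream[IO, Int] =
    Stream.range(0, 5).map(_ + id * 100).covary[IO]

  val run: IO[Unit] =
    Stream
      .emits(List(1, 2, 3))
      .covary[IO]
      .map(source)                           // Stream[IO, Stream[IO, Int]]
      .parJoin(3)                            // run up to 3 inner streams concurrently
      .evalMap(n => IO.println(s"processed: $n"))
      .compile
      .drain
}
```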
Kåre Blakstad
@kareblak
I've been thinking about implementing some sort of mapAccumulate over Kafka log-compacted topics, to share the accumulated state among distributed nodes. That's Spark for ye, without a lot of the fanciness that is
Adam Chlupacek
@AdamChlupacek
That is just basically doing mapAccumulate over the stream that you get from subscribing to a topic, right? Or do you have something more complex in mind?
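A minimal sketch of that mapAccumulate-over-a-subscription idea; `Record` and `subscribe` here are hypothetical stand-ins, not Spinoco's actual API:

```scala
import cats.effect.IO
import fs2.Stream

// Hypothetical record type and topic subscription, just enough shape for the sketch.
final case class Record(key: String, value: Long)
def subscribe(topic: String): Stream[IO, Record] = ???

// mapAccumulate threads a running state through the stream: here, a per-key
// running sum. Because the step function is pure, re-running it over the same
// records always rebuilds the same state.
def accumulated(topic: String): Stream[IO, (Map[String, Long], Record)] =
  subscribe(topic).mapAccumulate(Map.empty[String, Long]) { (state, rec) =>
    (state.updated(rec.key, state.getOrElse(rec.key, 0L) + rec.value), rec)
  }
```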
Kåre Blakstad
@kareblak
Writing the accumulator back to the log-compacted topic and making sure it's being somewhat synced on the other nodes
changes to the accumulator being mirrored to Kafka
I'm doing a lot of distributed fs2 stuff, and recursive state sure is a bitch
For now I'm pipelining changes to Redis and reading them out on topic rebalance
Adam Chlupacek
@AdamChlupacek
Well, if your function in mapAccumulate is pure, then you do not need to mirror the changes anywhere, since the ops should always result in the same state in the end.
you will only need to think about recovery, so that you have the same starting point.
Kåre Blakstad
@kareblak
Yes, that's right, given that the same nodes continue consuming from the same topics
Which is not the case with autoscaling, broken-down nodes, etc.
If you have some sort of state in a mapAccumulate on node A, and A breaks down, you might want to offload the work of node A to node B, which will not have the same notion of state as node A
You could solve it with offset and topic assignment, but that would take some of the flex away
Adam Chlupacek
@AdamChlupacek
hmm, yeah, so how about backing up the state with some kind of persistent store that gets written to every N messages from Kafka, so you can recover to the state at any point. At most you will have to read the N messages again to get to the state where A broke down. But since the function in mapAccumulate is pure, it does not matter that you will perform it multiple times?
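A sketch of this checkpoint-every-N scheme, under assumed APIs: `loadCheckpoint`, `saveCheckpoint`, and `subscribeFrom` are hypothetical hooks into whichever client and persistent store are used, and `Record` is the same stand-in as in the earlier sketch:

```scala
import cats.effect.IO
import fs2.Stream

final case class Record(key: String, value: Long)

// Hypothetical hooks: the persistent store could be a compacted topic, Cassandra, ...
def loadCheckpoint: IO[(Map[String, Long], Long)] = ???                  // (state, offset)
def saveCheckpoint(state: Map[String, Long], offset: Long): IO[Unit] = ???
def subscribeFrom(offset: Long): Stream[IO, (Long, Record)] = ???        // (offset, record)

// Resume from the last checkpoint; a restarted node replays at most n messages.
def resumed(n: Int): Stream[IO, Map[String, Long]] =
  Stream.eval(loadCheckpoint).flatMap { case (state0, offset0) =>
    subscribeFrom(offset0)
      .mapAccumulate(state0) { case (state, (off, rec)) =>
        // the pure step: replaying it is harmless, so crashing between
        // checkpoints only costs re-reading at most n messages
        val next = state.updated(rec.key, state.getOrElse(rec.key, 0L) + rec.value)
        (next, off)
      }
      .zipWithIndex
      .evalTap { case ((state, off), idx) =>
        // persist the accumulator once every n records
        if ((idx + 1) % n == 0) saveCheckpoint(state, off) else IO.unit
      }
      .map { case ((state, _), _) => state }
  }
```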
Kåre Blakstad
@kareblak
f in mapAccum is pure, so no problem there
downstream, there's some impure stuff, but that shouldn't be any problem
the persistent store you're talking about could be compacted topics
which would make the whole model pretty nice
And also, N is never enough
because N is dependent on N-1 etc., so a snapshot of the actual accumulator, or at least the result from mapAccumulate, is needed
Adam Chlupacek
@AdamChlupacek
yeah, I guess. Anyhow, you're right that the compacted topics could be used for this.
Kåre Blakstad
@kareblak
I'm not using fs2-kafka atm, btw, so this would be a dirty impl on top of kafka-streams...
How do you guys usually store offsets with fs2-kafka? File system? DB?
Adam Chlupacek
@AdamChlupacek
We do not. You have to manage them yourself, i.e. you could use the compacted topics for the offsets; that is actually how the native Kafka client does it
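A sketch of that offsets-in-a-compacted-topic approach; `publish` and `readCompacted` are hypothetical stand-ins for the client API, and the topic name is an assumption:

```scala
import cats.effect.IO
import fs2.Stream

// Hypothetical: a stable key per consumer/partition, and stand-in client calls.
final case class OffsetKey(group: String, topic: String, partition: Int)
def publish(topic: String, key: OffsetKey, offset: Long): IO[Unit] = ???
def readCompacted(topic: String): Stream[IO, (OffsetKey, Long)] = ???

val offsetsTopic = "example-offsets" // assumed log-compacted topic

// "Commit" by publishing the last processed offset under a stable key;
// log compaction keeps only the newest value per key.
def commit(key: OffsetKey, offset: Long): IO[Unit] =
  publish(offsetsTopic, key, offset)

// On startup, fold the compacted topic into the last committed offset per key.
def recoverOffsets: IO[Map[OffsetKey, Long]] =
  readCompacted(offsetsTopic)
    .fold(Map.empty[OffsetKey, Long]) { case (acc, (k, off)) => acc + (k -> off) }
    .compile
    .lastOrError
```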
Kåre Blakstad
@kareblak
Yeah, I know, but how do you usually do it?
I like the idea of handing this responsibility over to the user...
Adam Chlupacek
@AdamChlupacek
For some we use Cassandra, for some we use Kafka, depending on the needs
We should probably give an example of how to use the persistent topics for it though, so that people have an easier time migrating.
Kåre Blakstad
@kareblak
Yeah, that would be nice, I guess
I love your Head/Tail offset approach. It's just completely lacking in the kafka-clients
Adam Chlupacek
@AdamChlupacek
It all comes from allowing users to manage their own offsets. Plus a few years of experience with using Kafka in a large system :D
Kåre Blakstad
@kareblak
:thumbsup:
Andi Miller
@andimiller
have you considered adding the timestamp field in the producer?
KSQL gets a bit upset if I read data written by fs2-kafka
Adam Chlupacek
@AdamChlupacek
We are adhering to the Kafka protocol; if you need timestamps in the messages written to Kafka, you can just add them there yourself. Or am I missing something? I don't want to add additional data to everything just because of KSQL.
Andi Miller
@andimiller
there is a part of the Kafka protocol which has timestamps, I believe
but it's optional
Adam Chlupacek
@AdamChlupacek
Oooh, I see what you mean: we are just not populating the value, but it is in protocols. All that is required is to expose it via the API; as of now we just set it to None all the time. You can PR it if you want.
Andi Miller
@andimiller
I may do if I can make it work :)
Adam Chlupacek
@AdamChlupacek
Hmm, actually this may require more thought, now that I am reading the link :/
Andi Miller
@andimiller
I think it's a case of adding a new codec in protocols
because it bumps the magic byte
Pavel Chlupacek
@pchlupacek
@andimiller any idea what version of Kafka introduced this?
Adam Chlupacek
@AdamChlupacek
In that case yeah, a whole new codec is required
Pavel Chlupacek
@pchlupacek
if you can, please could you open an issue? We'll look into that
Andi Miller
@andimiller
in theory 0.10.2.0
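To illustrate why the magic-byte bump forces a new codec: Kafka's v1 message format (magic byte 1) inserts an 8-byte timestamp between the attributes and the key, so v0 and v1 records cannot share one codec. Below is a rough scodec sketch of the v1 layout, not the actual protocol-kafka code; null keys/values are really encoded with length -1, which is simplified away here:

```scala
import scodec.Codec
import scodec.bits.ByteVector
import scodec.codecs._

// Illustrative shape of a v1 message body (crc and magic byte omitted).
final case class MessageV1(
  attributes: Byte,
  timestamp: Long, // milliseconds since epoch; new in magic byte 1
  key: ByteVector,
  value: ByteVector
)

// Each bytes field is length-prefixed with a 32-bit size.
val messageV1: Codec[MessageV1] =
  (byte :: int64 :: variableSizeBytes(int32, bytes) :: variableSizeBytes(int32, bytes))
    .as[MessageV1]
```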