    Maciej Ciołek

    Hello all - it's great to start doing it :smile: The final idea was summed up in an issue comment, but I am posting it here as well:

    Batch Producer
    We need to implement a producer that supports batch writes of events. It can be done by reusing parts of the existing producer, or with an Akka Actor + TCP.

    Topic per journal
    The write journal will write all events to a partitioned topic, where partitioning is done by hashing persistenceIds. This ensures that events are kept in write order per partition.
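The hashing scheme above can be sketched in a few lines. This is an illustrative stand-in, not the plugin's actual code; the object and method names are assumptions:

```scala
object PartitionByPersistenceId {
  // Events for the same persistenceId always hash to the same partition,
  // so write order is preserved per partition.
  def partitionFor(persistenceId: String, numPartitions: Int): Int =
    math.abs(persistenceId.hashCode % numPartitions)

  def main(args: Array[String]): Unit = {
    val n = 8
    // Same persistenceId -> same partition on every write.
    assert(partitionFor("user-42", n) == partitionFor("user-42", n))
    println(s"user-42 -> partition ${partitionFor("user-42", n)}")
  }
}
```

Note that this only gives per-partition ordering; two different persistenceIds may share a partition, and there is no ordering guarantee across partitions.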

    Recovery per persistenceId
    This can be done in two ways:

    • reading events from the main topic partition and skipping events that don't match - recovery will take longer than with, e.g., a view per persistenceId
    • creating a view per persistenceId - with this approach we will run into Kafka's limit on the maximum number of topics, and it requires implementing switch-over logic from reading the view to reading the main topic
      Hmm, maybe it can be configurable - the user chooses whichever strategy suits them given Kafka's limitations. Recovery from the main topic could also be combined with storing snapshots, which would shorten recovery time.
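The first strategy (read the partition, skip foreign events) can be sketched as follows. The `Event` type and `replay` function are illustrative assumptions, not the journal's real API:

```scala
final case class Event(persistenceId: String, sequenceNr: Long, payload: String)

object RecoverFromMainTopic {
  // Replay only one persistenceId's events, in partition (write) order,
  // optionally starting after a snapshot to shorten recovery.
  def replay(partition: Seq[Event], persistenceId: String, fromSequenceNr: Long = 0L): Seq[Event] =
    partition.filter(e => e.persistenceId == persistenceId && e.sequenceNr >= fromSequenceNr)

  def main(args: Array[String]): Unit = {
    // One partition interleaves events of all persistenceIds hashed to it.
    val partition = Seq(
      Event("a", 0L, "a0"), Event("b", 0L, "b0"),
      Event("a", 1L, "a1"), Event("b", 1L, "b1"))
    assert(replay(partition, "a").map(_.payload) == Seq("a0", "a1"))
    // With a snapshot at sequenceNr 1, only later events need replaying.
    assert(replay(partition, "a", fromSequenceNr = 1L).map(_.payload) == Seq("a1"))
  }
}
```

The snapshot parameter shows why combining this strategy with snapshotting helps: the filter still scans the whole partition, but the actor only has to apply the tail of its event stream.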

    Read per persistenceId
    The same solutions as described above.

    Read allPersistenceIds
    A topic (view) should be created and fed by a background job. The job reads all events from the main topics and pushes new persistenceIds whenever an event's sequenceNr equals 0.
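The core of that background job is a one-line filter: a persistenceId is "new" exactly when its first event (sequenceNr 0) shows up. A minimal sketch, with made-up type and object names:

```scala
final case class JournalEvent(persistenceId: String, sequenceNr: Long)

object AllPersistenceIdsJob {
  // Emit a persistenceId the moment its first event (sequenceNr == 0) appears.
  def newPersistenceIds(events: Seq[JournalEvent]): Seq[String] =
    events.collect { case JournalEvent(pid, 0L) => pid }

  def main(args: Array[String]): Unit = {
    val events = Seq(JournalEvent("a", 0L), JournalEvent("a", 1L), JournalEvent("b", 0L))
    assert(newPersistenceIds(events) == Seq("a", "b"))
  }
}
```

In the real job the input would be a Kafka consumer stream over the main topic partitions and the output a produce call to the view topic, but the selection logic is this simple.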

    Read byEventType
    A topic (view) should be created per eventType and fed by a background job. The job reads all events from the main topic partitions and copies the tagged events. With this approach we can ensure that the eventsByType view contains events totally ordered per partition (and thus per persistenceId), but it does not guarantee a total order of events across partitions.
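The per-tag copy step can be sketched the same way; again the types are illustrative assumptions, not the plugin's API:

```scala
final case class TaggedEvent(persistenceId: String, sequenceNr: Long, tags: Set[String])

object EventsByTagJob {
  // Copy only events carrying the tag. Order within a source partition is
  // preserved; order across partitions is not.
  def viewFor(tag: String, partition: Seq[TaggedEvent]): Seq[TaggedEvent] =
    partition.filter(_.tags.contains(tag))

  def main(args: Array[String]): Unit = {
    val partition = Seq(
      TaggedEvent("a", 0L, Set("order-created")),
      TaggedEvent("b", 0L, Set.empty),
      TaggedEvent("a", 1L, Set("order-created")))
    assert(viewFor("order-created", partition).map(_.sequenceNr) == Seq(0L, 1L))
  }
}
```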

    The background job mentioned above should start on each node that requests any of the query streams, e.g. allPersistenceIdQuery or allEventsByTagQuery. It can also be made configurable so the job is not started if the user does not want it. These jobs will use the default Java consumer and will belong to the same groupId, which ensures that we will not read the same event more than once.
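The "same groupId, no duplicate reads" property rests on Kafka assigning each partition to exactly one consumer in a group. A simplified round-robin stand-in for Kafka's rebalancing (the real assignment strategy differs, this only illustrates the exactly-one-owner invariant):

```scala
object GroupAssignment {
  // Each partition is owned by exactly one consumer in the group,
  // so no event is consumed more than once across the group.
  def assign(partitions: Seq[Int], consumers: Seq[String]): Map[String, Seq[Int]] =
    partitions.zipWithIndex
      .groupBy { case (_, i) => consumers(i % consumers.size) }
      .map { case (c, ps) => c -> ps.map(_._1) }

  def main(args: Array[String]): Unit = {
    val a = assign(0 to 5, Seq("node1", "node2"))
    // Every partition has exactly one owner, and owners don't overlap.
    assert(a.values.flatten.toSeq.sorted == (0 to 5).toSeq)
    assert(a("node1").intersect(a("node2")).isEmpty)
  }
}
```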

    Maciej Ciołek

    Here https://gist.github.com/maciekciolek/c645e28cc708b26177eb1444d077d09a is a snippet of code that allows producing a batch of events per single partition.

    From my experience the current Java client is not an extensible piece of code :worried: I am not certain we should base our producer on their code; maybe we can use only the message formats and implement the producer with Akka TCP (http://doc.akka.io/docs/akka/2.4.7/scala/io-tcp.html).

    @Fristi as I remember, you proposed to share Kafka's protocol defined in scodec?
    @all what do you think about the idea of implementing a batch producer for Kafka with Akka TCP?

    I took your code and played around with it, @maciekciolek, and I agree it is not really idiomatic Scala code. It would be easier to use scodec for this instead: you would be able to translate case classes into bytes and put them onto the wire. Working with lots of side-effecting functions, threads, polling, etc. seems a bit brittle to me.
    I have some code which serializes a Produce request to Kafka, but it doesn't take node selection etc. into account. It's based on scodec.
    Someone else also did this a few months back, see: https://github.com/ataraxer/vitava
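To make the "case classes onto the wire" idea concrete without pulling in scodec, here is a hand-rolled encoder for a simplified, made-up request header in the style of Kafka's wire protocol (int16 apiKey, int32 correlationId, int16-length-prefixed clientId). scodec would derive this kind of encoder declaratively from the case class instead of this ByteBuffer boilerplate; the field layout here is illustrative, not the exact Kafka header:

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

final case class RequestHeader(apiKey: Short, correlationId: Int, clientId: String)

object WireEncoding {
  def encode(h: RequestHeader): Array[Byte] = {
    val client = h.clientId.getBytes(UTF_8)
    val buf = ByteBuffer.allocate(2 + 4 + 2 + client.length)
    buf.putShort(h.apiKey)                 // int16 api key
    buf.putInt(h.correlationId)            // int32 correlation id
    buf.putShort(client.length.toShort)    // Kafka-style int16 length prefix
    buf.put(client)                        // client id bytes
    buf.array()
  }

  def main(args: Array[String]): Unit = {
    val bytes = encode(RequestHeader(0, 1, "eventuate"))
    assert(bytes.length == 2 + 4 + 2 + "eventuate".length)
  }
}
```

With scodec the same layout would be a composed codec, and decoding comes for free; that is the main argument for it over imperative buffer code like the above.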
    Martin Zapletal
    A short design document for the background task implementation. Any feedback, please let me know: https://docs.google.com/document/d/1omM9dgB6XBNggTLaV-xtUH8msNYBAevq5tyM1kIVsq0/edit?usp=sharing
    Gabriel Giussi

    Hello everyone, I want to ask what would be a good use of Kafka as a storage backend for Eventuate? Do you have a use case already?
    I was interested in Kafka because it can work as an event bus, allowing me to persist an event and publish it to the event bus in one atomic operation. So Kafka would act as the storage backend for my Eventuate-based microservices and as the mechanism for the different microservices to communicate.
    However, I kept reading the Eventuate documentation until I finally realized that Eventuate is Akka on steroids: it not only gives you AP, it also gives you an event bus with a lot of features.

    These actors can be considered as event-driven and event-sourced microservices, collaborating on a causally ordered event stream in a reliable and partition-tolerant way. Furthermore, when partitioned, they remain available for local writes and automatically catch up with their collaborators when the partition heals.

    The only constraint is that all our microservices must be Eventuate-based (this doesn't mean I must take the DDD + Actor approach), so I think that if we want to connect a bunch of polyglot microservices, that might be a good use case for Kafka as a storage backend.

    In my case I think I will try building all my microservices with Eventuate, and Cassandra is good enough because I don't have any requirements for the storage backend beyond fault tolerance and high availability.

    Martin Krasser
    @zapletal-martin Thanks for writing the document, you can find my comments in the document's chat
    @gabrielgiussi maybe https://gitter.im/RBMHTechnology/eventuate is a better place to discuss Kafka use cases for Eventuate.
    FYI, I am currently writing a pure Scala Kafka driver. Codecs for the protocol are basically done. Currently working on the producer (which fetches broker metadata and routes produce requests to the right broker). See the PR for more info. Still lacking a few things, like error handling, bootstrap servers, and a good test setup (testing against a cluster, multiple Kafka versions, etc.). Want to help? There's plenty of stuff to do; the repo can be found at: https://github.com/vectos/scala-kafka
    Martin Zapletal
    I now have a working PoC of the background task. I was able to verify the assumptions I posted in the design document earlier and use Kafka's group rebalancing to avoid duplicates in the resulting views after a failure that occurs after publishing but before committing. It is a very general solution and will require lots of refactoring and tests, but it proves the concept: https://github.com/zapletal-martin/distributed-causal-stream-processing
    Alexander Lomov
    Hello all! Is it a valid assumption that work on akka-persistence-kafka has been discontinued?
    Maciej Ciołek
    Hello, it seems that right now there is no ongoing work on akka-persistence-kafka, but I have some plans and ideas to start on it in the near future. Are you interested in using this journal?
    Alexander Lomov
    Hi Maciej, I'm currently evaluating whether the journal is usable for a project I'm involved in; it's a proof of concept. So I'm trying to make it work with the most recent Scala 2.11, Akka 2.4, and Kafka 0.10. I'll gladly contribute to the plugin project if it works for me.
    There is another implementation now also: https://github.com/evolution-gaming/kafka-journal