    Raymond Roestenburg
    @RayRoestenburg
    Oh that sounds great!
    Yes, it’s basically our goal to make distributed applications easy on Kubernetes. It would be great to see where we need to improve to make it easier for more cases.
    kerr
    @hepin1989
    Yes, we processed 1 billion messages in ~10 minutes:)
    and our telemetry part is built with Akka too
    Another question: does Cloudflow support deploying a group of pipelines within a single runtime?
    By the way, your book is great.
    Raymond Roestenburg
    @RayRoestenburg
    @hepin1989 Thanks, great that you like the book!
    within a single runtime, you mean one JVM?
    kerr
    @hepin1989
    [image attached: image.png]
    Yes, some complex business logic may be implemented with multiple pipeline stages/streamlets here. Or should one decide how big their streamlet is? Currently, we have ~10-20 stages in one of our applications.
    Raymond Roestenburg
    @RayRoestenburg
    We’ve decided that it is better to run streamlets in pods (1 streamlet, 1 pod). It’s a great simplification, especially for scaling. The overhead of the JVM is something that can be tuned over time, and will improve over time (newer JDKs, GraalVM, etc.). You can very easily 'do more work' in one streamlet, if that is required.
    You could have a streamlet with a large Akka Streams graph for instance, or lots of Spark or Flink code.
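Raymond's point about doing more work in one streamlet can be sketched without any Akka dependency: several small stages fused into a single processing unit. All names below (SensorReading, validate, toMetric) are made up for illustration; in a real AkkaStreamlet these would be Akka Streams stages composed into one graph.

```scala
// Dependency-free sketch: three processing "stages" fused into one unit,
// the way multiple Akka Streams stages can live inside a single streamlet.
object SingleStreamletSketch {
  final case class SensorReading(deviceId: String, rotorSpeed: Double)
  final case class Metric(deviceId: String, value: Double)

  // Stage 1: drop malformed readings
  val validate: SensorReading => Option[SensorReading] =
    r => if (r.rotorSpeed >= 0) Some(r) else None

  // Stage 2: convert a reading to a metric
  val toMetric: SensorReading => Metric =
    r => Metric(r.deviceId, r.rotorSpeed)

  // Stage 3: keep only high-speed metrics
  val highSpeedOnly: Metric => Boolean = _.value > 100.0

  // All three stages composed into one processing unit
  def process(in: List[SensorReading]): List[Metric] =
    in.flatMap(validate).map(toMetric).filter(highSpeedOnly)
}
```

Splitting these stages across separate streamlets instead would buy independent scaling per stage, at the cost of extra pods, which is exactly the trade-off discussed below.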
    kerr
    @hepin1989
    Thanks, I see.
    Raymond Roestenburg
    @RayRoestenburg
    If you decide to separate the logic into many streamlets, you have more options for independent scaling.
    It does mean of course that you will use more resources, so it is a design trade-off that you can make.
    With 1 billion in 10 minutes, you probably don’t mind that it uses a bit more resources ;-)
    kerr
    @hepin1989
    @RayRoestenburg But resources are part of the KPI now:(
    Raymond Roestenburg
    @RayRoestenburg
    I can imagine! Yeah, it is a design choice that you can iterate on.
    kerr
    @hepin1989
    Another question: does Cloudflow have streamlet-to-streamlet backpressure support?
    Seems like I need to go with an AkkaStreamlet to have it.
    Raymond Roestenburg
    @RayRoestenburg
    The (Kafka) producers and consumers internally do backpressure (when using an Akka streamlet), but it is not the same kind of end-to-end backpressure as in a directly connected Akka Streams graph. (The producers and consumers can both work at independent speeds.) We’re thinking of other connections between streamlets, which might provide more direct end-to-end backpressure, if that is needed.
    The Akka streamlet will not overload the Kafka producer, and it will not consume faster than it can do its work.
    So from that perspective, the goal of backpressure is achieved.
    It is of course possible to load more data into Kafka than the retention size supports, so you do need to monitor consumer lag.
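The "independent speeds with a bound between them" idea can be illustrated with a plain bounded queue (no Cloudflow or Kafka involved; just a sketch of the backpressure concept): the producer blocks when the buffer is full, so it can never outrun the consumer by more than the buffer size.

```scala
import java.util.concurrent.ArrayBlockingQueue

// Sketch of bounded backpressure: put() blocks when the buffer is full,
// so a fast producer is paced down to the consumer's speed -- analogous
// to how each side of a streamlet connection works at its own speed
// without overrunning the other. Illustration only, not Cloudflow code.
object BackpressureSketch {
  def run(items: Int, bufferSize: Int): Long = {
    val queue = new ArrayBlockingQueue[Int](bufferSize)
    var consumed = 0L
    val consumer = new Thread(() => {
      var i = 0
      while (i < items) {
        queue.take()      // slow consumer paces the producer
        Thread.sleep(1)
        consumed += 1
        i += 1
      }
    })
    consumer.start()
    (1 to items).foreach(i => queue.put(i)) // blocks when buffer is full
    consumer.join()
    consumed
  }
}
```

The Kafka topic between streamlets plays the role of this buffer, except its bound is the retention size rather than a blocking put, which is why monitoring consumer lag matters.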
    kerr
    @hepin1989
    We are currently using https://github.com/apache/rocketmq (internally it's named metaq), but it currently does not support backpressure.
    and yes, latency
    Thanks for the detailed explanation.
    Raymond Roestenburg
    @RayRoestenburg
    no problem.
    kerr
    @hepin1989
    @RayRoestenburg I think I can submit a new PR later on; I have updated the PR.
    Type members seem not that good either.
    kali786516
    @kali786516
    Just double checking: Cloudflow is not a drag-and-drop tool for building pipelines like SnapLogic, am I right?
    Craig Blitz
    @cmblitz
    @kali786516 No, Cloudflow is oriented to the Java/Scala developer who wants to focus on the business logic of end-to-end streaming data pipelines and have the underlying platform handle the deployment and operational burden. We enable the developer to access the full power of the underlying processing platforms (Spark, Flink, Akka Streams, ...). In the future, I can imagine us offering a drag-and-drop tool to wire components together, but we don't offer that today. Instead, we offer wiring together reusable components via a simple flat file.
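The "simple flat file" Craig mentions is the blueprint. A rough sketch of what one looks like, using streamlet class names from the sensor-data example that appears later in this log (the exact schema may differ between Cloudflow versions, so check the docs for yours):

```
blueprint {
  streamlets {
    // logical name = fully qualified streamlet class
    http-ingress = sensordata.SensorDataHttpIngress
    metrics      = sensordata.SensorDataToMetrics
    validation   = sensordata.MetricsValidation
  }
  connections {
    // wire outlets to inlets
    http-ingress.out = [metrics.in]
    metrics.out      = [validation.in]
  }
}
```

Each connection becomes a Kafka-backed link between streamlets at deployment time, which is the wiring Craig is describing.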
    kali786516
    @kali786516
    @cmblitz thanks Craig, that explains it.
    kali786516
    @kali786516
    Can I execute Cloudflow pipelines on a Cloudera stack? Let's say I have a huge on-prem Cloudera cluster; can I use the existing cluster, or do I have to install or build a Kubernetes cluster on the Cloudera stack?
    Also, does Cloudflow have custom sources, let's say Solace as a source?
    Craig Blitz
    @cmblitz
    @kali786516 Cloudflow is native to Kubernetes. We don't work against an existing YARN cluster or Spark cluster, but rather spin up our own jobs for Spark or Flink. Anything that is external to Cloudflow can be treated as a source. IIRC, Solace has a Kafka-compatible API, which would be easy to integrate with Cloudflow. Internally, Cloudflow uses its own Kafka cluster to manage communications between processing stages, though we can also leverage external Kafka clusters for this purpose.
    Craig Blitz
    @cmblitz
    @kali786516 our integration technology is called Alpakka, built on Akka Streams, which provides fully reactive, streams-oriented connections to external sources and sinks. We have a Kafka connector, among many others, and a vibrant open source community creating new connectors.
    Anil Kumar
    @akandach
    I am new to cloudflow. Started with samples. I am getting the below error while running sensor-data-java
    [info] Loading settings for project sensor-data-scala-build from cloudflow-plugins.sbt,plugins.sbt ...
    [info] Loading project definition from C:\Softwares\java akka\cloudflow-master\examples\sensor-data-scala\project
    [info] Loading settings for project sensorData from build.sbt ...
    [info] Set current project to sensor-data-scala (in build file:/C:/Softwares/java%20akka/cloudflow-master/examples/sensor-data-scala/)
    fatal: not a git repository (or any of the parent directories): .git
    [info] Streamlet 'sensordata.SensorDataFileIngress' found
    [info] Streamlet 'sensordata.SensorDataStreamingIngress' found
    [info] Streamlet 'sensordata.SensorDataToMetrics' found
    [info] Streamlet 'sensordata.MetricsValidation' found
    [info] Streamlet 'sensordata.RotorSpeedFilter' found
    [info] Streamlet 'sensordata.RotorspeedWindowLogger' found
    [info] Streamlet 'sensordata.SensorDataHttpIngress' found
    [info] Streamlet 'sensordata.SensorDataMerge' found
    [info] Streamlet 'sensordata.ValidMetricLogger' found
    [info] Streamlet 'sensordata.InvalidMetricLogger' found
    [error] java.lang.RuntimeException: The current project is not a valid Git project.
    [error] at scala.sys.package$.error(package.scala:30)
    [error] at cloudflow.sbt.BuildNumberPlugin$.generateBuildNumber(BuildNumberPlugin.scala:45)
    [error] at cloudflow.sbt.BuildNumberPlugin$.$anonfun$projectSettings$1(BuildNumberPlugin.scala:30)
    [error] at sbt.std.Transform$$anon$3.$anonfun$apply$2(Transform.scala:46)
    Gerard Maas
    @maasg
    Hi @akandach, welcome to our channel and thanks for using Cloudflow!
    Looking at the stack trace, the key is this line:
    fatal: not a git repository (or any of the parent directories): .git

    Looks like your example is running in a non-git folder. The quickest solution would be to do a

    git init

    and commit a file to have a HEAD on the master branch

    Our sbt plugins rely on git to create reproducible hashes of the artifacts being built.
    Anil Kumar
    @akandach
    Thanks @maasg, will try with git init.
    @maasg do you have samples for Java with Maven?
    Gerard Maas
    @maasg
    @akandach we offer Java APIs but don't support Maven at the moment. The sbt plugin system does a lot of the heavy lifting.
    Craig Blitz
    @cmblitz
    Hi all, we just released Cloudflow 1.3 as part of Lightbend Platform. With a subscription to Lightbend Platform, you get access to our operator-based installer, the UI, and other operational goodies. Documentation is available at https://developer.lightbend.com/docs/cloudflow/current/.
    Anil Kumar
    @akandach
    Can someone help me with how to make a REST call from an AkkaStreamlet? Any sample?
    Age Mooij
    @agemooij

    Hi @akandach. An Akka Streams-based streamlet is basically just an Akka Streams graph, so the right docs to read would be the Akka Streams docs.

    For example, you could use `.mapAsync(...)` and Akka HTTP (client) to perform a REST call and then flatten the resulting async response back into the stream. There's a simplified example here: https://akka.io/alpakka-samples/http-csv-to-kafka/step1.html
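The pattern described here, one async call per element with the responses flattened back into the stream in order, is what `mapAsync` gives you. A dependency-free sketch of that ordering behaviour using plain Scala Futures (`fetch` is a hypothetical stand-in for an Akka HTTP client request):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Sketch of what mapAsync does: each element triggers an async call,
// and the results are flattened back into the stream in input order.
object MapAsyncSketch {
  // Hypothetical stand-in for an Akka HTTP call like Http().singleRequest(...)
  def fetch(id: Int): Future[String] = Future(s"response-$id")

  def mapAsyncInOrder[A, B](in: List[A])(f: A => Future[B]): List[B] = {
    val futures = in.map(f)                  // start the calls
    futures.map(Await.result(_, 5.seconds))  // collect results, preserving order
  }
}
```

In a real AkkaStreamlet you would write something like `source.mapAsync(4)(request => Http().singleRequest(request))`, where the parallelism argument bounds the number of in-flight requests, and no blocking `Await` is needed.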

    Anil Kumar
    @akandach
    thanks @agemooij