These are chat archives for petabridge/akka-bootcamp

5th
Jun 2018
Keith Emanuel
@KeithEmanuel
Jun 05 2018 03:47
Hi all! I'm wondering if Akka is a good fit for my use case and about best practices. I'm going through the tutorial right now. I'm working on a distributed ETL and scheduling system as a side project. Akka seems like a great fit.
Aaron Stannard
@Aaronontheweb
Jun 05 2018 03:47
ETL is definitely something that Akka.NET can do well
lets you model it as a streaming process rather than a batch one
which has a lot of benefits
Keith Emanuel
@KeithEmanuel
Jun 05 2018 03:48
My questions are: how should files be passed between actors? Via messaging, or by storing them in a database or file system?
Aaron Stannard
@Aaronontheweb
Jun 05 2018 03:48
depends on how big they are
Akka.Remote, the network infrastructure that powers remote communication between actors in Akka.NET
Keith Emanuel
@KeithEmanuel
Jun 05 2018 03:48
and what about live replication of data between databases? Is that use case easy to implement?
Aaron Stannard
@Aaronontheweb
Jun 05 2018 03:49
typically uses a single TCP connection between any two nodes
so you don't want to ram a multi-GB file down that pipe
since it kind of acts like a pig-in-a-python
so what folks tend to do instead, if they're using Akka.NET to steer a large ETL system
Keith Emanuel
@KeithEmanuel
Jun 05 2018 03:49
haha, multi-GB is larger than I was thinking
Aaron Stannard
@Aaronontheweb
Jun 05 2018 03:50
e.g. we've had folks design stuff that runs a rendering pipeline for uncompressed, high-res Hollywood movies before
Keith Emanuel
@KeithEmanuel
Jun 05 2018 03:50
so, for a 50MB file, would you recommend putting the file in the message?
Aaron Stannard
@Aaronontheweb
Jun 05 2018 03:50
where the files you're working with start out in the hundreds of GB to TB range depending on stuff
nah, that's still a bit too big for the pipe
you'd either need to break up that file into a sequence of smaller messages
so that way Akka.NET can interleave messages for multiple ETL processes / actors running simultaneously
or do what I was hinting at, which is store the file in something like Azure Blob Storage
and just pass around a URI to those files inside your Akka.NET actor messages
if you're working with some type of file that is easily divisible
Keith Emanuel
@KeithEmanuel
Jun 05 2018 03:52
that's what I was thinking
Aaron Stannard
@Aaronontheweb
Jun 05 2018 03:52
like a CSV or whatever
the first approach might work
but if you're dealing with big binary files
you'd probably be better off going with the other route
IMHO
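
A minimal sketch of the first approach (breaking a file into a sequence of smaller messages), in plain C# with no Akka.NET dependency. The `FileChunk` and `FileChunker` names and the sequence-number scheme are assumptions made for illustration, not part of Akka.NET; each `FileChunk` would be sent to the receiving actor as an ordinary message.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical message type: one slice of a larger file.
public sealed class FileChunk
{
    public FileChunk(string fileId, int sequenceNr, bool isLast, byte[] payload)
    {
        FileId = fileId;
        SequenceNr = sequenceNr;
        IsLast = isLast;
        Payload = payload;
    }

    public string FileId { get; }
    public int SequenceNr { get; }
    public bool IsLast { get; }
    public byte[] Payload { get; }
}

public static class FileChunker
{
    // Lazily reads the stream and yields chunks of at most chunkSize bytes.
    // An empty stream yields a single empty chunk marked as last.
    public static IEnumerable<FileChunk> Split(string fileId, Stream input, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var current = ReadBlock(input, chunkSize);
        var sequenceNr = 0;
        while (true)
        {
            // Read one block ahead so the final chunk can be flagged.
            var next = ReadBlock(input, chunkSize);
            var isLast = next.Length == 0;
            yield return new FileChunk(fileId, sequenceNr++, isLast, current);
            if (isLast) yield break;
            current = next;
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until
    // the block is full or the stream ends.
    private static byte[] ReadBlock(Stream input, int size)
    {
        var buffer = new byte[size];
        var filled = 0;
        while (filled < size)
        {
            var read = input.Read(buffer, filled, size - filled);
            if (read == 0) break; // end of stream
            filled += read;
        }
        if (filled < size) Array.Resize(ref buffer, filled);
        return buffer;
    }

    // Reassembles the original bytes, tolerating out-of-order chunks.
    public static byte[] Join(IEnumerable<FileChunk> chunks) =>
        chunks.OrderBy(c => c.SequenceNr).SelectMany(c => c.Payload).ToArray();
}
```

For the second approach, the message would carry only a URI for the stored file, so its size stays small no matter how large the file is.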
Keith Emanuel
@KeithEmanuel
Jun 05 2018 03:52
Cool, thanks
Aaron Stannard
@Aaronontheweb
Jun 05 2018 03:52
as for replication and such
we actually have a module as part of Akka.Cluster that can help with that
Akka.Cluster.DistributedData
still in beta at the moment, but it uses CRDTs (special data structures that can be merged consistently) and a replicator
it's eventually consistent, but it's also really resilient against things like network partitions
in case you're interested in learning more about it
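
To illustrate what "can be merged consistently" means, here is a hand-rolled grow-only counter, one of the simplest CRDTs. This is a sketch in plain C# and is not the Akka.Cluster.DistributedData API; the `GCounter` class and the node names are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical grow-only counter: each node increments only its own slot.
public sealed class GCounter
{
    private readonly Dictionary<string, long> _counts;

    public GCounter() : this(new Dictionary<string, long>()) { }
    private GCounter(Dictionary<string, long> counts) { _counts = counts; }

    // The counter's value is the sum of every node's slot.
    public long Value => _counts.Values.Sum();

    // Returns a new counter with nodeId's slot increased by delta.
    public GCounter Increment(string nodeId, long delta = 1)
    {
        if (delta < 0) throw new ArgumentOutOfRangeException(nameof(delta));
        var copy = new Dictionary<string, long>(_counts);
        copy.TryGetValue(nodeId, out var current);
        copy[nodeId] = current + delta;
        return new GCounter(copy);
    }

    // Merge takes the per-node maximum, which makes it commutative,
    // associative, and idempotent: replicas converge in any merge order.
    public GCounter Merge(GCounter other)
    {
        var merged = new Dictionary<string, long>(_counts);
        foreach (var pair in other._counts)
        {
            merged.TryGetValue(pair.Key, out var mine);
            merged[pair.Key] = Math.Max(mine, pair.Value);
        }
        return new GCounter(merged);
    }
}

public static class GCounterDemo
{
    public static void Main()
    {
        var a = new GCounter().Increment("node-a").Increment("node-a");
        var b = new GCounter().Increment("node-b");

        Console.WriteLine(a.Merge(b).Value);          // 3
        Console.WriteLine(b.Merge(a).Value);          // 3, order does not matter
        Console.WriteLine(a.Merge(b).Merge(b).Value); // 3, repeated merges are harmless
    }
}
```

Because merging never loses an increment, two replicas that diverge during a network partition can be reconciled once they can talk to each other again.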
Keith Emanuel
@KeithEmanuel
Jun 05 2018 03:55
Awesome, I'll look into that.
Last question, and I might just need to finish the tutorial to find this one out, but what is the best way to store workflows (e.g. for ETL) in a central repo and have the worker nodes load them up and start execution according to schedules?
Is that implemented in Akka, or is that something I would need to write?
Aaron Stannard
@Aaronontheweb
Jun 05 2018 03:57
storing the workflow as a DSL or something you can load from a database
is something you'd need to write
but, we do have a handy bit of tech you can use to execute it
which is Akka.Streams
higher-level abstraction on top of Akka.NET actors
comes with a lot of built-in stages for doing stuff like aggregating, transforming, splitting outputs, etc
the WebCrawler example I built, that we use for teaching Akka.Cluster, actually uses it to run the performance-intensive parts: https://github.com/petabridge/akkadotnet-code-samples/blob/master/Cluster.WebCrawler/src/WebCrawler.Shared.IO/DownloadCoordinator.cs
it has a bit of a learning curve, but it's quite powerful and concise
without knowing more about your specific case I can't give you a 100% "you should totally use this" recommendation
but I'd definitely spend some time looking at it and decide for yourself
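
A small sketch of what a transform-and-aggregate pipeline can look like in Akka.Streams, assuming the Akka and Akka.Streams NuGet packages are referenced. The numbers stand in for ETL records, and the `StreamDemo` class and the "etl-demo" system name are made up for illustration.

```csharp
using System;
using System.Linq;
using Akka.Actor;
using Akka.Streams;
using Akka.Streams.Dsl;

public static class StreamDemo
{
    public static void Main()
    {
        var system = ActorSystem.Create("etl-demo");
        var materializer = system.Materializer();

        // Source -> transform -> filter -> batch -> sink.
        var done = Source.From(Enumerable.Range(1, 10))
            .Select(n => n * n)              // transform each element
            .Where(square => square % 2 == 0) // keep only even squares
            .Grouped(2)                       // aggregate into batches of up to 2
            .RunForeach(
                batch => Console.WriteLine(string.Join(", ", batch)),
                materializer);

        done.Wait();               // block until the stream completes
        system.Terminate().Wait(); // shut the actor system down
    }
}
```

A workflow loaded from a database would need to be translated into stages like these by code you write yourself.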
Keith Emanuel
@KeithEmanuel
Jun 05 2018 04:00
Thanks for the quick and excellent responses.
Bob
@crowcoder
Jun 05 2018 18:29
I'm trying to understand what happens to child actors. For instance, in lesson 1.4 "Add TailActor as a child of TailCoordinatorActor", inside OnReceive() there is: Context.ActorOf(Props.Create(() => new TailActor(msg.ReporterActor, msg.FilePath))); Does the actor created here need to be stopped or disposed somehow?
Aaron Stannard
@Aaronontheweb
Jun 05 2018 22:39
@crowcoder you can manually terminate it if you want
but when you kill the parent actor or shut down the actor system
that will automatically clean up all children
Akka.NET guarantees that all child actors are terminated before the parent completes shutdown
does that answer your question?
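
A minimal sketch of that lifecycle, assuming the Akka NuGet package is referenced. `ParentActor`, `ChildActor`, and the "stop-child" message are hypothetical names for illustration and are not the bootcamp's TailCoordinatorActor or TailActor.

```csharp
using System;
using Akka.Actor;

public class ChildActor : UntypedActor
{
    protected override void OnReceive(object message)
    {
        Console.WriteLine($"child received: {message}");
    }

    // Called when the actor stops, whether it was stopped manually
    // or as part of its parent's shutdown.
    protected override void PostStop()
    {
        Console.WriteLine("child stopped");
    }
}

public class ParentActor : UntypedActor
{
    private IActorRef _child;

    protected override void PreStart()
    {
        // The child is created under this actor's context, so it is supervised by it.
        _child = Context.ActorOf(Props.Create(() => new ChildActor()), "child");
    }

    protected override void OnReceive(object message)
    {
        // Optional: terminate the child manually on request.
        if (message is string text && text == "stop-child")
        {
            Context.Stop(_child);
        }
    }

    protected override void PostStop()
    {
        Console.WriteLine("parent stopped");
    }
}

public static class LifecycleDemo
{
    public static void Main()
    {
        var system = ActorSystem.Create("lifecycle-demo");
        system.ActorOf(Props.Create(() => new ParentActor()), "parent");

        // No explicit cleanup of the child: terminating the system stops
        // the parent, and the child is stopped before the parent's
        // shutdown completes.
        system.Terminate().Wait();
    }
}
```

Nothing here is disposed by hand; the only explicit option shown is `Context.Stop` for cases where the child should go away earlier than its parent.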