    Torsten Scholak
    @tscholak
    :+1:
    cool, thanks for the chat!
    Yves Parès
    @YPares
    you're welcome :)
    Yves Parès
    @YPares
    @/all My discussion with @tscholak had me thinking. I was wondering if you guys could be interested in a channel for pro/semi-pro Haskell devs (or serious enthusiasts) to specifically talk about architecting Haskell programs, and more specifically composing effects. So a scope broader than this gitter or dataHaskell, but narrower than r/haskell or #haskell on IRC. Also I think small scale would be better, so it should be invitation-only at first, to see how it goes.
    Could be on Gitter, Discord or Slack. I envision the products of these discussions could be a wiki detailing some use cases, the patterns that were followed, some guidelines about which effect lib is best for architecting your program in each specific case, etc.
    This would facilitate the discussion around conventions and interoperability between libs and toolkits.
    Stefan Dresselhaus
    @Drezil
    @YPares I like the idea. I was already suggesting developer blogs etc. at our company to present/document what and how we (or rather I -.-) do things.
    Torsten Scholak
    @tscholak
    @YPares sounds great!
    @gelisam gave a talk recently on different effect “patterns” in haskell
    Yves Parès
    @YPares
    Okay :) I opened a lobby at https://gitter.im/pure-fp-architects/community . I invited you two to the main room
    But the lobby is public; it's for people who want to signal that they're interested in the discussion. I shared the link on https://gitter.im/dataHaskell/Lobby
    Torsten Scholak
    @tscholak
    @YPares how difficult would it be to give porcupine distributed computing capabilities like those provided by Dask, https://distributed.dask.org/en/latest/ ?
    Yves Parès
    @YPares
    @tscholak There's already https://github.com/tweag/funflow/tree/master/funflow-jobs, but I never used it with porcupine. A few days ago I added an entry about distributed computing to the FAQ in porcupine's README
    funflow-jobs is still pretty rough for the moment, though, I think
    Torsten Scholak
    @tscholak
    I've been fighting with Dask at work since yesterday. there's a lot that just doesn't work, and you can't reason about its behaviour well enough to debug strategically
    Yves Parès
    @YPares
    what's the use case? (if disclosable)
    Torsten Scholak
    @tscholak
    a trivial map job: open a bunch of files, do a transformation, save to files. embarrassingly parallel, currently also embarrassingly broken
    I want to be able to scale seamlessly between prototyping on a laptop and running it on our cluster
    everything written in python...
    configuration in yaml
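    for the record, the whole job is really just this shape (a hypothetical Haskell sketch; transform and the file names are made up):
    ```haskell
    import Control.Concurrent.Async (mapConcurrently_)
    import qualified Data.ByteString as BS

    -- stand-in for the real per-file transformation
    transform :: BS.ByteString -> BS.ByteString
    transform = id

    -- read one input file, transform it, write one output file
    processFile :: (FilePath, FilePath) -> IO ()
    processFile (inPath, outPath) =
      BS.readFile inPath >>= BS.writeFile outPath . transform

    -- the embarrassingly parallel part: one worker per file pair
    main :: IO ()
    main = mapConcurrently_ processFile
      [ ("in/" <> name, "out/" <> name) | name <- ["a.dat", "b.dat"] ]
    ```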
    Yves Parès
    @YPares
    Problems arise when you try to distribute it? Or even in the one-node case?
    Torsten Scholak
    @tscholak
    even in the one-thread case...
    Yves Parès
    @YPares
    Oh damn...
    Torsten Scholak
    @tscholak
    code behaves differently when wrapped into dask delayed computations
    and it’s hard to figure out why
    Yves Parès
    @YPares
    Why was Dask chosen over, say, Spark?
    Torsten Scholak
    @tscholak
    dask is perceived to be more lightweight and more “pythonic”
    also, it was my idea to give it a try...
    the current alternative is a bash script ;)
    a bash script that splits up the work, dispatches and launches jobs on the cluster
    Yves Parès
    @YPares
    @tscholak Yes, that's exactly what we had at my client too: some custom solution to distribute bash/docker commands. We switched to Celery last year, but now we think we would have fared better with a simple job queue (like RQ) on top of RabbitMQ or Redis
    Tim Pierson
    @o1lo01ol1o
    It's worth mentioning transient again here. However, it's currently rough in terms of developer UX, and the materials are not super accessible. (It's basically ContT over IO with semantics similar to ListT, though composing all the streaming in a distributed setting is never trivial.) The best source of current information is probably via the author in the gitter: https://gitter.im/Transient-Transient-Universe-HPlay/Lobby. I know he's working on a new release.
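    A minimal toy of that "ContT over IO with ListT-like semantics" idea (my own sketch, not transient's actual types or API):
    ```haskell
    {-# LANGUAGE DerivingVia #-}
    import Control.Applicative (Alternative (..))
    import Control.Monad.Trans.Cont (ContT (..))

    -- Alternative runs the rest of the computation once per branch,
    -- which is exactly ListT-style nondeterminism.
    newtype Branching a = Branching { runBranching :: ContT () IO a }
      deriving (Functor, Applicative, Monad) via ContT () IO

    instance Alternative Branching where
      empty = Branching $ ContT $ \_ -> pure ()  -- drop the continuation
      Branching a <|> Branching b =
        Branching $ ContT $ \k -> runContT a k *> runContT b k

    liftIO' :: IO a -> Branching a
    liftIO' io = Branching $ ContT (io >>=)

    -- prints 1-a, 1-b, 2-a, 2-b: every choice flows through the
    -- entire rest of the computation
    main :: IO ()
    main = runContT (runBranching demo) pure
      where
        demo = do
          n <- pure "1" <|> pure "2"
          c <- pure "a" <|> pure "b"
          liftIO' $ putStrLn (n <> "-" <> c)
    ```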
    Yves Parès
    @YPares
    @o1lo01ol1o Will have a look :)
    Hi guys, I stumbled upon a GHC bug which might bite you too. Happily, the workaround (once you've found it ^^) is easy:
    tweag/porcupine#75
    Torsten Scholak
    @tscholak
    @YPares is there an Ormolu room somewhere?
    Mark Karpov
    @mrkkrp
    @tscholak Hi, Ormolu main developer here. We don't have a dedicated room right now. Maybe we should create one!
    Torsten Scholak
    @tscholak
    :+1:
    Yves Parès
    @YPares
    @tscholak Sorry, I forgot to reply ^^ Thanks @mrkkrp
    Tim Pierson
    @o1lo01ol1o
    @YPares I'm looking at porcupine for a couple of use cases. Is it possible to control how source data files are provided to the PTask? Say my PTask is a foldM and I need to stream all the source files. Similarly, if the output files from the PTask are (effectfully) written at intervals from the fold, is there a reasonable way to interface with a datasink?
    In one particular case, I've been using streamly to do the concurrent transformations; is there a recommended way to purely pass those to a datasink, or do I need to set up a fifo queue in STM?
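    (the STM fifo fallback I have in mind looks roughly like this; a minimal sketch, all names made up, with a bounded TBQueue and Nothing as the end-of-stream marker:)
    ```haskell
    {-# LANGUAGE LambdaCase #-}
    import Control.Concurrent.Async (concurrently_)
    import Control.Concurrent.STM

    -- Run a producer and a consumer around a bounded queue. The
    -- producer gets a push action; Nothing marks end of stream.
    withQueue
      :: ((a -> IO ()) -> IO ())  -- producer (e.g. mapM_ push over a stream)
      -> (a -> IO ())             -- consumer, called once per item
      -> IO ()
    withQueue produce consume = do
      q <- newTBQueueIO 64
      let push x = atomically (writeTBQueue q (Just x))
          done   = atomically (writeTBQueue q Nothing)
          drain  = atomically (readTBQueue q) >>= \case
            Nothing -> pure ()
            Just x  -> consume x >> drain
      concurrently_ (produce push >> done) drain
    ```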
    Torsten Scholak
    @tscholak
    bump, but with pipes instead of streamly
    Yves Parès
    @YPares
    Hi @o1lo01ol1o , by "stream all the source files" do you mean having a Stream (Of FileContent) m ?
    Tim Pierson
    @o1lo01ol1o
    @YPares in one case I have an IsStream s => s m ByteString; in another, I know that I have "files" that will need to be "streamed" :)
    Yves Parès
    @YPares
    If so, the simplest is to use loadDataStream with one VirtualFile, which will be considered repeated. If you use write-config-template, you'll see that the path to your files includes by default a variable part ({index} for instance), where index is the LocVariable (just a String wrapper) you gave to loadDataStream
    writeDataStream does the same. It expects as input a Stream (Of (index, FileContent))
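    To make that concrete, here's a toy stand-in for the idea (my illustration, not porcupine's actual API; fillTemplate, loadIndexed and writeIndexed are made up):
    ```haskell
    import qualified Data.ByteString as BS
    import Data.List (intercalate)
    import Data.List.Split (splitOn)   -- from the 'split' package
    import Streaming.Prelude (Of, Stream)
    import qualified Streaming.Prelude as S

    type LocVariable = String          -- porcupine wraps this in a newtype

    -- substitute a concrete value for the {index}-style variable
    fillTemplate :: LocVariable -> String -> FilePath -> FilePath
    fillTemplate var val template =
      intercalate val (splitOn ("{" <> var <> "}") template)

    -- loadDataStream-style: one (index, contents) pair per index
    loadIndexed
      :: LocVariable -> FilePath
      -> Stream (Of String) IO r
      -> Stream (Of (String, BS.ByteString)) IO r
    loadIndexed var template =
      S.mapM (\idx -> (,) idx <$> BS.readFile (fillTemplate var idx template))

    -- writeDataStream-style: consume (index, contents) pairs
    writeIndexed
      :: LocVariable -> FilePath
      -> Stream (Of (String, BS.ByteString)) IO r
      -> IO r
    writeIndexed var template =
      S.mapM_ (\(idx, bytes) -> BS.writeFile (fillTemplate var idx template) bytes)
    ```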
    Yves Parès
    @YPares
    @o1lo01ol1o Tell me if that fits your needs :) If not, we'll see what we can do
    The stream of indices can come from wherever you want (hardcoded in the source, read from the content of another file, a getOption call, etc.)
    But if you don't have a notion of index, these high-level stream loaders/writers can't help you for now. All the use cases we've had so far dealt with indexed sources/sinks. It'd be doable, but you'd need to write a custom SerialsFor NoWrite (Stream (Of Stuff) m) to have a DataSource/Sink that directly outputs a stream
    Tim Pierson
    @o1lo01ol1o
    Ok. I'm still thinking about this and will have to play around more to see what's possible. Thanks @YPares
    Michel Kuhlmann
    @michelk
    I want to write a small application which periodically downloads river gauging station data from different URLs and stores it in a TimescaleDB. Would you recommend using porcupine for that? Thanks for a short suggestion.
    Michał J. Gajda
    @mgajda
    @YPares Hi Yves, how are you doing?