chessai
@chessai

Suppose a 2D array initialisation whereby each entry depends on the entries immediately above it, to its left, and diagonally above-left. That can naturally be computed in parallel by traversing along the anti-diagonals. Example:

 1 2 3
 4 5 6
 7 8 9

You start with 1, then compute 4 and 2 in parallel, then 7 5 3 in parallel, then 8 6 in parallel, then 9 in parallel

I'm having trouble coming up with a way to do this with arrayfire/repa/massiv
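
(As a minimal sketch of that grouping, in plain Haskell with a made-up helper name: the sets of entries that can be computed together are exactly the anti-diagonals of the array.)

diagonals :: Int -> Int -> [[(Int, Int)]]
diagonals m n =
  [ [ (i, d - i) | i <- [max 0 (d - n + 1) .. min (m - 1) d] ]
  | d <- [0 .. m + n - 2] ]

-- λ> diagonals 3 3
-- [[(0,0)],[(0,1),(1,0)],[(0,2),(1,1),(2,0)],[(1,2),(2,1)],[(2,2)]]
-- i.e. {1}, {2,4}, {3,5,7}, {6,8}, {9} in the example above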
Alexey Kuleshevich
@lehins
I had plans (still do) to create a stencil that allows doing the waterfall parallelization. I read about it in a paper about PASTHA, I believe. I didn't think people would need such functionality, so I didn't rush to add it. I guess I was wrong ;)
@chessai Doing it in a reusable (general multidimensional) fashion is certainly non-trivial, but for a specific case like a 2D array it is not too hard to do with massiv, though not straightforward either. Let me think about it for a moment and I'll try to come up with an example for you.
Alexey Kuleshevich
@lehins
By the way, here is a paper that describes the method I want to implement (at least a variation of it): https://dl.acm.org/citation.cfm?id=1708052&dl=ACM&coll=DL
Alexey Kuleshevich
@lehins
@chessai Here is something close to what you were looking for; in fact it is a bit better, because you don't want to schedule individual elements to be computed in parallel, but rather chunks of the diagonals. This won't matter for small arrays, but for bigger ones I suspect there will be a difference. Anyways, here it is:
waterfallCreate ::
     (Mutable r Ix2 a, PrimMonad m, MonadUnliftIO m, MonadThrow m)
  => Sz2
  -> (Maybe a -> Maybe a -> a)
  -> (a -> a -> a)
  -> m (Array r Ix2 a)
waterfallCreate sz@(Sz2 m n) g f =
  createArray_ Par sz $ \scheduler marr -> do
    forM_ (0 ..: m) $ \i -> writeM marr (i :. 0) . g Nothing =<< A.read marr (i - 1 :. 0)
    --  ^ fill left edge
    forM_ (1 ..: n) $ \j -> do
      writeM marr (0 :. j) . (`g` Nothing) =<< A.read marr (0 :. j - 1)
      --  ^ fill the top edge
      let jIter = rangeStepSize Seq j (-1) (Sz (min (m - 1) j))
      --  ^ the rest of the diagonal hanging off the top-edge element (0 :. j)
      iforSchedulerM_ scheduler jIter $ \i' -> writeF marr (i' + 1)
    forM_ (2 ..: m) $ \i -> do
      let jIter = rangeStepSize Seq (n - 1) (-1) (Sz (min (n - 1) (m - i)))
      --  ^ the diagonal hanging off the right-edge element (i :. n - 1)
      iforSchedulerM_ scheduler jIter $ \i' -> writeF marr (i' + i)
  where
    writeF marr i j = do
      left <- readM marr (i :. j - 1)
      top <- readM marr (i - 1 :. j)
      writeM marr (i :. j) $ f left top
    {-# INLINE writeF #-}
{-# INLINE waterfallCreate #-}
Alexey Kuleshevich
@lehins
The reason for two functions g and f instead of one is that it is more performant if you supply separate functions: one that produces elements for the borders and another one for the inside. This way the function f always knows that it will get the neighbors it needs and doesn't have to check for that edge case. Here is the cool part: because you know that the whole array will be filled up, and since f and g are pure, we are guaranteed that the whole waterfallCreate is pure, so we can safely wrap it in unsafePerformIO. Moreover, the interior read and write functions are guaranteed not to go out of bounds, so once you are done with all the testing you can improve the performance even further by switching to the unsafe* functions:
waterfallCreate ::
     Mutable r Ix2 a => Sz2 -> (Maybe a -> Maybe a -> a) -> (a -> a -> a) -> Array r Ix2 a
waterfallCreate sz@(Sz2 m n) g f =
  unsafePerformIO $
  createArray_ Par sz $ \scheduler marr -> do
    void $ write marr (0 :. 0) $ g Nothing Nothing
    forM_ (1 ..: m) $ \i ->
      unsafeWrite marr (i :. 0) . g Nothing . Just =<< unsafeRead marr (i - 1 :. 0)
    forM_ (1 ..: n) $ \j -> do
      unsafeWrite marr (0 :. j) . (`g` Nothing) . Just  =<< unsafeRead marr (0 :. j - 1)
      let jIter = rangeStepSize Seq j (-1) (Sz (min (m - 1) j))
      iforSchedulerM_ scheduler jIter $ \i' -> writeF marr (i' + 1)
    forM_ (2 ..: m) $ \i -> do
      let jIter = rangeStepSize Seq (n - 1) (-1) (Sz (min (n - 1) (m - i)))
      iforSchedulerM_ scheduler jIter $ \i' -> writeF marr (i' + i)
  where
    writeF marr i j = do
      left <- unsafeRead marr (i :. j - 1)
      top <- unsafeRead marr (i - 1 :. j)
      unsafeWrite marr (i :. j) $ f left top
    {-# INLINE writeF #-}
{-# INLINE waterfallCreate #-}
A couple of examples won't hurt:
λ> g mx my = maybe (fromMaybe 0 my) (+1) mx
λ> waterfallCreate (Sz2 4 6) g (+) :: Array P Ix2 Int
Array P Par (Sz (4 :. 6))
  [ [ 0, 1, 2, 3, 4, 5 ]
  , [ 0, 1, 3, 6, 10, 15 ]
  , [ 0, 1, 4, 10, 20, 35 ]
  , [ 0, 1, 5, 15, 35, 70 ]
  ]
λ> waterfallCreate (Sz2 6 4) g (+) :: Array P Ix2 Int
Array P Par (Sz (6 :. 4))
  [ [ 0, 1, 2, 3 ]
  , [ 0, 1, 3, 6 ]
  , [ 0, 1, 4, 10 ]
  , [ 0, 1, 5, 15 ]
  , [ 0, 1, 6, 21 ]
  , [ 0, 1, 7, 28 ]
  ]
Alexey Kuleshevich
@lehins

To sum it up:

  • Not sure about arrayfire, but I know for certain that you can't do that in repa
  • There is a small mistake in what you said: you can't compute 9 in parallel, since it depends on 8 and 6

    8 6 in parallel then 9 in parallel

  • For parallelization it is always better to schedule bigger tasks rather than a bunch of small ones, which is why splitting the loading of diagonals into chunks with iforSchedulerM_ is better. That being said, if you still feel like you want to load each element in parallel, you can easily do that by manually calling scheduleWork_ for each element:
    iforM_ jIter $ \i' -> scheduleWork_ scheduler $ writeF marr (i' + 1)
  • Another optimization I want to do, once I actually get to adding this to massiv, is to perform the loading in blocks rather than in strips of diagonals, simply because it is better for cache locality and will be faster, but that is a totally separate story ;)
chessai
@chessai
@lehins thanks! This is extremely helpful!
chessai
@chessai
Makes me excited about massiv
Yves Parès
@YPares
@chessai Given that you know the size of your array in advance, it seems to me that it'd be easier to do using classic lists and evaluation strategies, and then write the data to the 2D array
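
(A minimal sketch of that idea, using the parallel package's evaluation strategies and, purely for brevity, a Map accumulator instead of a mutable array; wavefrontList is a hypothetical name, and g and f are as in the examples above.)

import Control.Parallel.Strategies (parList, rseq, using)
import qualified Data.Map.Strict as M

-- Build the table one anti-diagonal at a time; each diagonal depends only
-- on the previous one, so its entries are evaluated in parallel by parList.
wavefrontList ::
     Int -> Int -> (Maybe a -> Maybe a -> a) -> (a -> a -> a) -> M.Map (Int, Int) a
wavefrontList m n g f = foldl step M.empty [0 .. m + n - 2]
  where
    step acc d =
      let ixs = [(i, d - i) | i <- [max 0 (d - n + 1) .. min (m - 1) d]]
          val (i, j)
            | i == 0 || j == 0 = g (M.lookup (i, j - 1) acc) (M.lookup (i - 1, j) acc)
            | otherwise = f (acc M.! (i, j - 1)) (acc M.! (i - 1, j))
          vals = map val ixs `using` parList rseq
       in foldr (uncurry M.insert) acc (zip ixs vals)

(As the reply below points out, the boxed Map entries make this far slower than the massiv version.)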
Alexey Kuleshevich
@lehins
@YPares If you are talking about things like pseq or parMap, then I've never seen them being faster than massiv arrays. But most importantly, with lists you will likely end up dealing with boxed values, so you'll end up with terrible performance. I personally would advise against the approach you suggest, but I'd love to be proven wrong ;)
Yves Parès
@YPares
@lehins Oh no, I'm not saying that'd be faster :) Yours probably will be, but if it's a matter of just initializing the array (and not processing an existing one), maybe that wouldn't have been much of a factor. Thinking about it, I wouldn't have used pseq anyway; more likely brutal IO loops with async to initialize the array in place, which is close to what you proposed.
Alexey Kuleshevich
@lehins
Using async would be slower too; it would introduce much more overhead than the solution above. By using a single Scheduler for loading the full array we ensure that there are only as many threads as there are cores at any time, and only two MVars are created for the whole operation, while with async you spawn 3 threads and 1 MVar each time you call concurrently!
So if you were to load a 100x100 array you'd end up creating 3 threads per element, which would result in a staggering 30000 threads.
Green threads are cheap, but not that cheap ;)
Marco Z
@ocramz
@/all Is anyone interested in taking over xeno (https://github.com/ocramz/xeno/)? I don't have time to maintain it anymore and it needs some maintenance love.
@unhammer ?
r_mohan
@r_mohan_twitter
I've used some basic Haskell. Generally I use R for Bayesian regression. What Haskell libraries should I use if I were to code it in Haskell?
W Robert Long
@WRobertLong
@r_mohan_twitter take a look at the monad-bayes library
Yves Parès
@YPares
@r_mohan_twitter Yep that's the one
MMesch
@MMesch

@r_mohan_twitter, we wrote a little intro about it here in case you are interested:

https://www.tweag.io/posts/2019-09-20-monad-bayes-1.html
https://www.tweag.io/posts/2019-11-08-monad-bayes-2.html

otherwise these papers give a good overview of what it's doing:

https://dl.acm.org/citation.cfm?id=3236778
http://mlg.eng.cam.ac.uk/pub/pdf/SciGhaGor15.pdf
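
(To give a flavour, a hedged sketch against the monad-bayes 0.x API, with a made-up model and data: a Bayesian linear regression like the one asked about looks roughly like this.)

import Control.Monad.Bayes.Class (MonadInfer, normal, normalPdf, score)

-- priors on slope and intercept, scored against the observed points
linreg :: MonadInfer m => [(Double, Double)] -> m (Double, Double)
linreg points = do
  slope     <- normal 0 2
  intercept <- normal 0 2
  mapM_ (\(x, y) -> score (normalPdf (slope * x + intercept) 1 y)) points
  return (slope, intercept)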

r_mohan
@r_mohan_twitter
Thanks. Specialized Gitter channels are better than a general Google search :-)
MMesch
@MMesch
:+1: don't hesitate to ask questions about it
Austin Huang
@austinvhuang

On probabilistic programming: I'm curious what people with more of a PL / FP background think of "Functional Tensors for Probabilistic Programming" https://arxiv.org/pdf/1910.10775.pdf

(wouldn't recommend this for day-to-day practical modeling yet, this is still pretty research-y for now)

Marco Z
@ocramz
@austinvhuang I actually read it some days ago. I very much agree with treating quantities as terms of some abstract syntax rather than constants. This is similar to how "push arrays" are implemented, and in fact deferring computation as long as possible in order to perform symbolic simplifications is a very good idea.
btw are you going to PROBPROG in April?
Austin Huang
@austinvhuang
@ocramz I'm not sure yet, wasn't planning to, though. We should definitely meet up if you attend!
Doug Burke
@DougBurke
I'm not yet sure if I'll go to PROBPROG, but as I live just down the road I may be interested in a meetup outside the conference.
Samuel Schlesinger
@SamuelSchlesinger
Hey all, it's been years since I've been in here. Anyone want a hand with anything?
Tony Day
@tonyday567
Hey Sam, check out numhask-array for where I got to with higher-kinded numbers.
feedback would be nice
Guillaume Desforges
@GuillaumeDesforges

Hi, it's been a while (studies and stuff). I'm still motivated though! I am currently looking at http://www.datahaskell.org/, so I have a few questions:

  • who maintains it?
  • is it still updated?

And a more general question: is there a general roadmap, a "place" where the people here would like to take Data Haskell?

Sorry if these topics have been discussed many times already, but I believe that as time goes by things can become clearer and change

Marco Z
@ocramz
@GuillaumeDesforges I maintained the homepage and the docs for almost three years
As for the roadmap... everyone has different ideas of what it should look like.
Samuel Schlesinger
@SamuelSchlesinger
looking now @tonyday567
Gregory Nwosu
@gregnwosu
I think I've just found my spiritual home
Marco Z
@ocramz
hi @gregnwosu !
Gregory Nwosu
@gregnwosu
:wave:
What advantages does monad-bayes have over more traditional probabilistic programming libraries such as, say, pymc3?
MMesch
@MMesch
I am just an occasional user, but I find Monad Bayes quite comfortable to use. Here are three things that come to mind at the moment, compared to pymc3:
  • it integrates with standard Haskell syntax: you can sample from standard datatypes and functions, and use do notation to combine those operations. With pymc3 you have to deal with theano tensors etc.
  • Haskell syntax makes the code really concise. It really looks almost like what you would write with standard math notation in an article.
  • Monad Bayes provides an abstraction on top of different inference representations, and you can build new ones out of these basic building blocks. For example, MCMC requires a different representation of a probability distribution (in terms of the accumulated log likelihood of the samples) than sequential Monte Carlo, inverse sampling (cumulative distribution function), or a particle filter. Check out table 1 in https://pdfs.semanticscholar.org/76ad/0090bf4a076391fe2cc6d6029f79ebc66308.pdf . AFAIK in pymc3 you basically choose a configurable out-of-the-box sampler and then run it. (A sketch of this point follows below.)
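
(A hedged sketch of that last point, assuming the monad-bayes 0.x module layout and a made-up toy model: the model is written once, and the inference method is chosen by the transformer stack it runs in.)

import Control.Monad.Bayes.Class (MonadInfer, bernoulli, condition, uniform)
import Control.Monad.Bayes.Sampler (sampleIO)
import Control.Monad.Bayes.Traced (mh)
import Control.Monad.Bayes.Weighted (prior)

-- toy model: the bias of a coin after observing a single head
coin :: MonadInfer m => m Double
coin = do
  p <- uniform 0 1
  heads <- bernoulli p
  condition heads
  return p

-- 1000 Metropolis-Hastings steps; a different method (e.g. SMC) would
-- change only this stack of transformers, not the model itself
mhSamples :: IO [Double]
mhSamples = sampleIO (prior (mh 1000 coin))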
MMesch
@MMesch
    you can sample from standard datatypes, functions
Gregory Nwosu
@gregnwosu
Oh cool, so in theory you can make any program probabilistic?
MMesch
@MMesch
That might read as misleading; it is actually quite easy to sample from whatever custom datatype with it as well. It thus integrates quite naturally with other libraries.
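
(For instance, a hedged sketch with the same API assumptions as above and a made-up Weather type: sampling a custom datatype is just ordinary do notation.)

import Control.Monad.Bayes.Class (MonadSample, bernoulli, uniform)
import Control.Monad.Bayes.Sampler (sampleIO)

data Weather = Sunny | Rainy Double deriving Show

-- a distribution over a plain Haskell datatype
weather :: MonadSample m => m Weather
weather = do
  rainy <- bernoulli 0.3
  if rainy
    then Rainy <$> uniform 0 10  -- millimetres of rain
    else return Sunny

-- λ> sampleIO weather
-- Sunny   (or, e.g., Rainy 4.2; it varies per run)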