ocramz on gh-pages
Add `sampling` (compare)
ocramz on gh-pages
Add kdt, Supervised Learning se… (compare)
ocramz on gh-pages
Add arrayfire (compare)
ocramz on gh-pages
add inliterate (compare)
ocramz on gh-pages
update hvega entry (compare)
ocramz on gh-pages
Add pcg-random (compare)
ocramz on gh-pages
Fix graphite link Merge pull request #41 from alx… (compare)
Suppose a 2d array initialisation whereby each entry depends on the entries immediately above the current entry and to the left and above and to the left. That can naturally be computed in parallel by traversing along the diagonals. Example:
1 2 3
4 5 6
7 8 9
You start with 1, then compute 4 and 2 in parallel then 7 5 3 in parallel then 8 6 in parallel then 9 in parallel
waterfallCreate ::
(Mutable r Ix2 a, PrimMonad m, MonadUnliftIO m, MonadThrow m)
=> Sz2
-> (Maybe a -> Maybe a -> a)
-> (a -> a -> a)
-> m (Array r Ix2 a)
waterfallCreate sz@(Sz2 m n) g f =
createArray_ Par sz $ \scheduler marr -> do
forM_ (0 ..: m) $ \i -> writeM marr (i :. 0) . g Nothing =<< A.read marr (i - 1 :. 0)
-- ^ fill left edge
forM_ (1 ..: n) $ \j -> do
writeM marr (0 :. j) . (`g` Nothing) =<< A.read marr (0 :. j - 1)
-- ^ fill the top edge
let jIter = rangeStepSize Seq j (-1) (Sz (min (m - 1) j))
iforSchedulerM_ scheduler jIter $ \i' -> writeF marr (i' + 1)
forM_ (2 ..: m) $ \i -> do
let jIter = rangeStepSize Seq (n - 1) (-1) (Sz (min (n - 1) (m - i)))
iforSchedulerM_ scheduler jIter $ \i' -> writeF marr (i' + i)
where
writeF marr i j = do
left <- readM marr (i :. j - 1)
top <- readM marr (i - 1 :. j)
writeM marr (i :. j) $ f left top
{-# INLINE writeF #-}
{-# INLINE waterfallCreate #-}
g
and f
instead of one, because it will be more performant if you supply separate functions: one that produces elements for the borders and another one for the inside, this way function f
always knows that it will get the neighbors it needs and doesn't have to check for that edge case. Here is a cool part, because you know that the whole array will be filled up, and since f
and g
are pure, we are guaranteed that the whole waterfallCreate
is pure, so we can safely wrap it in unsafePerformIO
. More over the interior read and write functions, are guaranteed not to be out of bounds, so once you are done with all the testing you can improve the performance even further by switching to unsafe*
functions:
waterfallCreate ::
Mutable r Ix2 a => Sz2 -> (Maybe a -> Maybe a -> a) -> (a -> a -> a) -> Array r Ix2 a
waterfallCreate sz@(Sz2 m n) g f =
unsafePerformIO $
createArray_ Par sz $ \scheduler marr -> do
void $ write marr (0 :. 0) $ g Nothing Nothing
forM_ (1 ..: m) $ \i ->
unsafeWrite marr (i :. 0) . g Nothing . Just =<< unsafeRead marr (i - 1 :. 0)
forM_ (1 ..: n) $ \j -> do
unsafeWrite marr (0 :. j) . (`g` Nothing) . Just =<< unsafeRead marr (0 :. j - 1)
let jIter = rangeStepSize Seq j (-1) (Sz (min (m - 1) j))
iforSchedulerM_ scheduler jIter $ \i' -> writeF marr (i' + 1)
forM_ (2 ..: m) $ \i -> do
let jIter = rangeStepSize Seq (n - 1) (-1) (Sz (min (n - 1) (m - i)))
iforSchedulerM_ scheduler jIter $ \i' -> writeF marr (i' + i)
where
writeF marr i j = do
left <- unsafeRead marr (i :. j - 1)
top <- unsafeRead marr (i - 1 :. j)
unsafeWrite marr (i :. j) $ f left top
{-# INLINE writeF #-}
{-# INLINE waterfallCreate #-}
λ> g mx my = maybe (fromMaybe 0 my) (+1) mx
λ> waterfallCreate (Sz2 4 6) g (+) :: Array P Ix2 Int
Array P Par (Sz (4 :. 6))
[ [ 0, 1, 2, 3, 4, 5 ]
, [ 0, 1, 3, 6, 10, 15 ]
, [ 0, 1, 4, 10, 20, 35 ]
, [ 0, 1, 5, 15, 35, 70 ]
]
λ> waterfallCreate (Sz2 6 4) g (+) :: Array P Ix2 Int
Array P Par (Sz (6 :. 4))
[ [ 0, 1, 2, 3 ]
, [ 0, 1, 3, 6 ]
, [ 0, 1, 4, 10 ]
, [ 0, 1, 5, 15 ]
, [ 0, 1, 6, 21 ]
, [ 0, 1, 7, 28 ]
]
To sum it up:
arrayfire
, but I kow for a certain you can't do that in repa
8 6 in parallel then 9 in parallel
iforSchedulerM_
is better. That being said, if you still feel like you want to load each element in parallel, you can easily do that by switching to manually doing scheduleWork_
per each element:iforM_ jIter $ \i' -> scheduleWork_ scheduler $ writeF marr (i' + 1)
massiv
is try to perform loading in blocks, rather than strips of diagonals, just cause it is better for cache locality and will be faster, but that is a totally separate story ;)pseq
or parMap
then I've never seen them being faster than massiv arrays. But most importantly, with lists you will likely end up dealing with boxed values, so you'll end up with terrible performance. I personally would advise against your suggested approach, but I'd love to see being proven wrong ;)
pseq
anyway, probably. More likely brutal IO loops with async to init in place the array, and that's close to what you proposed.
async
would be slower too, it would introduce much more overhead than the solution above. Using a single Scheduler
for loading the full array we ensure that there are only as many threads as there are cores at any time and only two MVars
will be created for the whole operation. While with async
you spawn 3 threads and 1 MVar each time you call concurrently
!
xeno
( https://github.com/ocramz/xeno/ )? I don't have time to maintain it anymore and it needs some maintenance love.
@r_mohan_twitter , we wrote a little intro about it here in case that you are interested:
https://www.tweag.io/posts/2019-09-20-monad-bayes-1.html
https://www.tweag.io/posts/2019-11-08-monad-bayes-2.html
otherwise these papers give a good overview of what it's doing:
https://dl.acm.org/citation.cfm?id=3236778
http://mlg.eng.cam.ac.uk/pub/pdf/SciGhaGor15.pdf
on probabilistic programming - i'm curious what people w/ more of a PL / FP background think of "Functional Tensors for Probabilistic Programming" https://arxiv.org/pdf/1910.10775.pdf
(wouldn't recommend this for day-to-day practical modeling yet, this is still pretty research-y for now)