Man of Letters
@man_of_letters:mozilla.org
[m]
well, that's strange then, because I swear my hmatrix doesn't use many cores
(not openblas flag set)
though it probably uses SIMD, etc.
Alexey Kuleshevich
@lehins:matrix.org
[m]
From what I've seen online, it is controlled either by an env variable or by some runtime setting
Man of Letters
@man_of_letters:mozilla.org
[m]
oh, ok, again good to know
Alexey Kuleshevich
@lehins:matrix.org
[m]
Man of Letters
@man_of_letters:mozilla.org
[m]
I have an ancient Ubuntu, so probably the default flags are different
^^^ that link is openblas, though
we talking results without openblas now, right?
Alexey Kuleshevich
@lehins:matrix.org
[m]
Yeah, I don't really use any of that stuff aside from benchmarks for massiv. So I am no expert on openblas
Oh sorry, you are right

we are talking results without openblas now, right?

openblas or blas

Cause it seems hmatrix uses either or:

        if flag(openblas)
            if !flag(disable-default-paths)
                extra-lib-dirs:     /usr/lib/openblas/lib
            extra-libraries:    openblas
        else
            extra-libraries:    blas lapack
Man of Letters
@man_of_letters:mozilla.org
[m]
yes, either-or
I read in blas/lapack docs that they are strictly single core and my experience confirms it, but there must be other versions in new Ubuntus for which it doesn't apply any more or perhaps it stopped applying long ago, but now they have different default setting and run multicore
Alexey Kuleshevich
@lehins:matrix.org
[m]
Yeah, I can definitely confirm that without openblas flag in hmatrix it still uses all cores
Man of Letters
@man_of_letters:mozilla.org
[m]
ta
man_of_letters:mozilla.org @man_of_letters:mozilla.org fixes his docs
Alexey Kuleshevich
@lehins:matrix.org
[m]

I read in blas/lapack docs that they are strictly single core and my experience confirms it, but there must be other versions in new Ubuntus for which it doesn't apply any more or perhaps it stopped applying long ago, but now they have different default setting and run multicore

Ok, so this is exactly where I got my impression that massiv was much faster than hmatrix without an openblas flag on my older computer, where I used Ubuntu

I think I was comparing single core hmatrix to multicore massiv
back then, I mean
Here is something I just learnt, adding more cores to Haskell RTS actually slows down blas:
$ stack bench :mult --ba '--match pattern Par +RTS -N1'
hmatrix-bench> benchmarks
Running 1 benchmarks...
Benchmark mult: RUNNING...
benchmarking HMatrix/MxM Double - (500x800 X 800x500)/Par
time                 1.254 ms   (1.124 ms .. 1.413 ms)
                     0.940 R²   (0.914 R² .. 0.992 R²)
mean                 1.238 ms   (1.178 ms .. 1.358 ms)
std dev              267.3 μs   (172.2 μs .. 436.6 μs)
variance introduced by outliers: 94% (severely inflated)

benchmarking HMatrix/MxM Float - (500x800 X 800x500)/Par
time                 718.8 μs   (673.9 μs .. 774.4 μs)
                     0.942 R²   (0.903 R² .. 0.978 R²)
mean                 772.8 μs   (710.1 μs .. 991.5 μs)
std dev              339.4 μs   (120.9 μs .. 675.6 μs)
variance introduced by outliers: 99% (severely inflated)

benchmarking Massiv/MxM P Double - (500x800 X 800x500)/Par
time                 62.91 ms   (62.26 ms .. 63.55 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 63.97 ms   (63.25 ms .. 66.65 ms)
std dev              2.304 ms   (455.6 μs .. 4.013 ms)

benchmarking Massiv/MxM P Float - (500x800 X 800x500)/Par
time                 79.19 ms   (78.63 ms .. 79.77 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 80.06 ms   (79.75 ms .. 80.38 ms)
std dev              557.9 μs   (372.2 μs .. 849.1 μs)
Man of Letters
@man_of_letters:mozilla.org
[m]
that's because the cores are busy and can't be used by blas?
Alexey Kuleshevich
@lehins:matrix.org
[m]
It definitely makes sense. I just didn't think about it
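A quick way to check how many capabilities the Haskell RTS was actually started with (the value that `+RTS -N1` or `-N` changes) is to ask the runtime itself. This is a minimal sketch using only `base`; it says nothing about the BLAS side, whose thread count is configured separately from the Haskell RTS.

```haskell
import Control.Concurrent (getNumCapabilities)
import GHC.Conc (getNumProcessors)

-- Report how many capabilities the Haskell RTS is running with
-- (set by +RTS -N<x>) next to the number of processors the machine has.
main :: IO ()
main = do
  caps <- getNumCapabilities
  procs <- getNumProcessors
  putStrLn ("RTS capabilities: " ++ show caps)
  putStrLn ("Processors:       " ++ show procs)
```

Running it with `+RTS -N1` and again with `+RTS -N` (the program has to be built with `-threaded` and `-rtsopts`) shows which of the two setups a benchmark is really measuring.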
Man of Letters
@man_of_letters:mozilla.org
[m]
benchmarks lie so hard ;D
Alexey Kuleshevich
@lehins:matrix.org
[m]

What it means is that I should be comparing this massiv's performance:

benchmarking Massiv/MxM P Double - (500x800 X 800x500)/Par
time                 6.887 ms   (6.749 ms .. 7.048 ms)
                     0.993 R²   (0.978 R² .. 0.999 R²)
mean                 7.009 ms   (6.888 ms .. 7.235 ms)
std dev              425.6 μs   (220.2 μs .. 638.2 μs)
variance introduced by outliers: 33% (moderately inflated)

To hmatrix as such:

benchmarking HMatrix/MxM Double - (500x800 X 800x500)/Par
time                 1.254 ms   (1.124 ms .. 1.413 ms)
                     0.940 R²   (0.914 R² .. 0.992 R²)
mean                 1.238 ms   (1.178 ms .. 1.358 ms)
std dev              267.3 μs   (172.2 μs .. 436.6 μs)
variance introduced by outliers: 94% (severely inflated)
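To put the pasted numbers side by side, the ratios can be worked out from the `time` estimates above. This is only arithmetic on the figures already quoted in this conversation (all in milliseconds); `speedup` is a helper name made up for the illustration.

```haskell
import Text.Printf (printf)

-- Ratio between a slower and a faster timing, both in the same unit.
speedup :: Double -> Double -> Double
speedup slow fast = slow / fast

main :: IO ()
main = do
  -- massiv Double with +RTS -N1 (62.91 ms) vs massiv Double on multicore RTS (6.887 ms)
  printf "massiv -N1 vs massiv multicore: %.2fx\n" (speedup 62.91 6.887)
  -- massiv Double on multicore RTS (6.887 ms) vs hmatrix Double with +RTS -N1 (1.254 ms)
  printf "massiv multicore vs hmatrix:    %.2fx\n" (speedup 6.887 1.254)
```

So massiv gains roughly 9x from the extra cores, and hmatrix is still roughly 5.5x faster on this multiplication. The hmatrix runs report severely inflated variance, so the second ratio is only approximate.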
Man of Letters
@man_of_letters:mozilla.org
[m]
you are only making things worse!
Alexey Kuleshevich
@lehins:matrix.org
[m]
which is very much in alignment with simd
Man of Letters
@man_of_letters:mozilla.org
[m]
so that's single-core RTS plus multicore blas/lapack vs multicore RTS?
Alexey Kuleshevich
@lehins:matrix.org
[m]
No it is compared with multicore RTS massiv:
benchmarking Massiv/MxM P Double - (500x800 X 800x500)/Par
time                 6.887 ms   (6.749 ms .. 7.048 ms)
                     0.993 R²   (0.978 R² .. 0.999 R²)
mean                 7.009 ms   (6.888 ms .. 7.235 ms)
std dev              425.6 μs   (220.2 μs .. 638.2 μs)
variance introduced by outliers: 33% (moderately inflated)
Man of Letters
@man_of_letters:mozilla.org
[m]
yes, that's what I meant
Alexey Kuleshevich
@lehins:matrix.org
[m]
multicore blas/lapack with multicore RTS
is this
benchmarking HMatrix/MxM Double - (500x800 X 800x500)/Par
time                 2.457 ms   (1.275 ms .. 4.701 ms)
                     0.276 R²   (0.256 R² .. 0.971 R²)
mean                 1.716 ms   (1.392 ms .. 3.427 ms)
std dev              1.666 ms   (311.3 μs .. 3.915 ms)
variance introduced by outliers: 98% (severely inflated)
Man of Letters
@man_of_letters:mozilla.org
[m]
got it
Kevin C
@dataopt
I ran into the following compilation error (macOS 12.5, ghc 8.10.7). Any help to resolve this greatly appreciated.
src/Data/Massiv/Array/Manifest/Unboxed.hs:147:33: error:
    • Couldn't match type ‘m’ with ‘ST (PrimState m)’
      ‘m’ is a rigid type variable bound by
        the type signature for:
          initialize :: forall ix (m :: * -> *).
                        (Index ix, PrimMonad m) =>
                        MArray (PrimState m) U ix e -> m ()
        at src/Data/Massiv/Array/Manifest/Unboxed.hs:147:3-12
      Expected type: m ()
        Actual type: ST (PrimState m) ()
    • In the expression: VGM.basicInitialize marr
      In an equation for ‘initialize’:
          initialize (MUArray _ marr) = VGM.basicInitialize marr
      In the instance declaration for ‘Manifest U e’
    • Relevant bindings include
        marr :: MVU.MVector (PrimState m) e
          (bound at src/Data/Massiv/Array/Manifest/Unboxed.hs:147:25)
        initialize :: MArray (PrimState m) U ix e -> m ()
          (bound at src/Data/Massiv/Array/Manifest/Unboxed.hs:147:3)
    |
147 |   initialize (MUArray _ marr) = VGM.basicInitialize marr
    |                                 ^^^^^^^^^^^^^^^^^^^^^^^^
cabal: Failed to build massiv-1.0.1.1 (which is required by exe:massiv-stuff
Alexey Kuleshevich
@lehins
@dataopt do cabal update and it should work now. I'll explain a bit later why you've encountered this problem
James Sully
@sullyj3
How do I get a Source from a DL? take B as an example. I'm struggling to find the correct conversion functions
James Sully
@sullyj3
Nevermind, found it, it was compute
James Sully
@sullyj3

Would I be able to get a code review on this solution to advent of code day 8 part 1?
https://github.com/sullyj3/adventofcode2022/blob/day8massiv/src/Day08.hs#L33

I feel like there are a bunch of things that could hopefully be improved, arising from unfamiliarity with the library

  • Is it possible to implement mapOuterSlices without converting to DL?
  • Is it possible to implement arrScan without imposing Manifest on the output?
  • More generally, is it possible to get better fusion?
  • Is there a better way to count trues than mapping to 1 or 0 and summing?
Alexey Kuleshevich
@lehins
I can give you a few pointers that should give significant speedups:
  • Switch representation B everywhere to U, since both Int and Bool have Unbox instance.
  • Add inline pragmas on all functions.
  • Switch flip evalState z $ traverseA ... to runST $ flip evalState z $ traversePrim ...

Is it possible to implement arrScan without imposing Manifest on the output?

No, and you don't really want that either. Monadic computation cannot be fused. Same problem appears in vector package.
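A minimal sketch of what the three pointers could look like together, assuming massiv 1.0 and mtl. It computes a running maximum over a flat unboxed array, with the accumulator kept in `StateT` over `ST`. `runningMax` and `step` are names invented for this illustration, not code from the linked solution, and the sketch assumes `traversePrim` visits elements in order.

```haskell
import Control.Monad.ST (runST)
import Control.Monad.State.Strict (evalStateT, get, put)
import Data.Massiv.Array (Array, Ix1, U)
import qualified Data.Massiv.Array as A

-- Running maximum, left to right. The result is a manifest unboxed (U)
-- array, because a stateful traversal has to write every element out.
runningMax :: Array U Ix1 Int -> Array U Ix1 Int
runningMax arr = runST $ evalStateT (A.traversePrim step arr) minBound
  where
    -- Update the accumulator and emit its new value.
    step x = do
      m <- get
      let m' = max m x
      put m'
      pure m'
{-# INLINE runningMax #-}

main :: IO ()
main = print (A.toList (runningMax (A.fromList A.Seq [3, 0, 3, 7, 3])))
```

The strict `StateT` is used here because it has a `PrimMonad` instance when stacked on `ST`, which is what `traversePrim` asks for.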

Alexey Kuleshevich
@lehins

More generally, is it possible to get better fusion?

From my experience it is not always beneficial to fuse all computation. Sometimes creating intermediate arrays actually speeds up computation. So, it is good to fuse whenever possible, but if array representations in massiv do not allow you to do something, that very likely means it is not possible. For example, it is not possible to make mapOuterSlices return D representation without computing the DL into a manifest array. I have a whole talk on this topic: https://skillsmatter.com/skillscasts/17365-multi-dimensional-arrays-that-do-not-exist

James Sully
@sullyj3

Thanks very much for your help!

Monadic computation cannot be fused

I'm a little confused by what you mean by monadic here? It so happens that I implemented arrScan with a traverse using State (since I wasn't sure how else to do it) but surely it is morally pure? Isn't it analogous to eg scanl in Prelude? Shouldn't there be a non monadic way of writing it?

I have a whole talk on this topic

Sounds interesting! I will have a watch!

Alexey Kuleshevich
@lehins

Is there a better way to count trues than mapping to 1 or 0 and summing?

Depending on your workload it might be beneficial to do it sequentially with slength . sfilter id, but you'd need to benchmark that. I think what you are doing there is totally fine, since it will fuse all the computation by reading from the 4 visibles arrays
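For reference, the "map to 1 or 0 and sum" approach can be sketched as below, assuming massiv 1.0. `countTrue` and the sample array are made up for the illustration. `A.map` produces a delayed array, so nothing intermediate is allocated before `A.sum` folds over it.

```haskell
import Data.Massiv.Array (Array, Ix2 (..), U)
import qualified Data.Massiv.Array as A

-- Count the True elements by mapping each Bool to 0 or 1 and summing.
countTrue :: Array U Ix2 Bool -> Int
countTrue = A.sum . A.map fromEnum
{-# INLINE countTrue #-}

main :: IO ()
main = do
  -- A 3x4 checkerboard of Bools, stored unboxed.
  let arr = A.makeArrayR A.U A.Seq (A.Sz2 3 4) (\(i :. j) -> even (i + j))
  print (countTrue arr)
```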

James Sully
@sullyj3
ok cool.
Alexey Kuleshevich
@lehins

implemented arrScan with a traverse using State

Yep, this is monadic and that's exactly what I meant.

Isn't it analogous to eg scanl in Prelude?

In Prelude it is possible because we are dealing with lists, so it is not really analogous.
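On plain lists the two formulations do coincide, which is probably where the intuition comes from. The sketch below uses only `base`: `mapAccumL` plays the role of the State-based traversal, and it matches `scanl` with the seed dropped. `scanViaAccum` is a name invented here. Laziness lets this work element by element on lists; a flat array has to be written out in full.

```haskell
import Data.List (mapAccumL)

-- A left scan written as an accumulating traversal: thread the accumulator
-- through the list and emit its updated value at every position.
scanViaAccum :: (b -> a -> b) -> b -> [a] -> [b]
scanViaAccum f z = snd . mapAccumL (\acc x -> let acc' = f acc x in (acc', acc')) z

main :: IO ()
main = do
  print (scanViaAccum (+) 0 [1, 2, 3 :: Int])
  -- Same result as Prelude's scanl once the initial seed is dropped.
  print (tail (scanl (+) 0 [1, 2, 3 :: Int]))
```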

James Sully
@sullyj3
I see. Ok, thank you!
Alexey Kuleshevich
@lehins

I wasn't sure how else to do it

It is possible to implement streaming scanl with flat vector, in fact the vector package has those implemented. I just never got around to adding it to massiv.

So, yeah it would be possible to do this. I'll send you later on today an implementation of scanl that would let you fuse this part a bit better using vector streams.
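Until such a function exists in massiv, the scans in the vector package show roughly what the streaming version looks like on a flat vector. The sketch below is not the promised massiv implementation. It assumes one row of tree heights, where a tree counts as visible from the left when it is taller than every tree before it. `visibleFromLeft` is a made-up name.

```haskell
import qualified Data.Vector.Unboxed as VU

-- prescanl' yields, for each position, the maximum of everything strictly
-- to its left (seeded with -1, i.e. lower than any height).
visibleFromLeft :: VU.Vector Int -> VU.Vector Bool
visibleFromLeft hs = VU.zipWith (>) hs (VU.prescanl' max (-1) hs)

main :: IO ()
main = print (VU.toList (visibleFromLeft (VU.fromList [3, 0, 3, 7, 3])))
```

Both `prescanl'` and `zipWith` are built on vector's stream fusion framework, so with optimizations on the intermediate running-maximum vector can usually be avoided.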

James Sully
@sullyj3
Oh, nice, cheers!