Man of Letters
@man_of_letters:mozilla.org
[m]
haha
Alexey Kuleshevich
@lehins:matrix.org
[m]
Orders of magnitude
Man of Letters
@man_of_letters:mozilla.org
[m]
TIL
Alexey Kuleshevich
@lehins:matrix.org
[m]
I take it back. Compiling without the openblas flag makes hmatrix just a little bit slower. I don't remember how I got an "orders of magnitude" difference, but in my defence it has been a while since I looked at it:
benchmarking HMatrix/MxM Double - (500x800 X 800x500)/Par
time                 9.140 ms   (4.980 ms .. 12.18 ms)
                     0.527 R²   (0.349 R² .. 0.651 R²)
mean                 4.794 ms   (2.775 ms .. 6.977 ms)
std dev              5.082 ms   (3.646 ms .. 5.969 ms)
variance introduced by outliers: 98% (severely inflated)

benchmarking HMatrix/MxM Float - (500x800 X 800x500)/Par
time                 6.678 ms   (4.953 ms .. 8.027 ms)
                     0.743 R²   (0.519 R² .. 0.907 R²)
mean                 6.208 ms   (5.417 ms .. 6.890 ms)
std dev              1.838 ms   (1.391 ms .. 2.718 ms)
variance introduced by outliers: 94% (severely inflated)

benchmarking Massiv/MxM P Double - (500x800 X 800x500)/Par
time                 6.874 ms   (6.559 ms .. 7.147 ms)
                     0.991 R²   (0.987 R² .. 0.995 R²)
mean                 7.485 ms   (7.157 ms .. 8.063 ms)
std dev              1.246 ms   (789.7 μs .. 1.867 ms)
variance introduced by outliers: 79% (severely inflated)

benchmarking Massiv/MxM P Float - (500x800 X 800x500)/Par
time                 6.783 ms   (6.671 ms .. 6.896 ms)
                     0.997 R²   (0.993 R² .. 0.999 R²)
mean                 6.832 ms   (6.766 ms .. 6.933 ms)
std dev              238.9 μs   (168.5 μs .. 365.5 μs)
variance introduced by outliers: 16% (moderately inflated)
Man of Letters
@man_of_letters:mozilla.org
[m]
almost 3 times slower, but you said previously it was on how many cores? 16? and this one is on a single core, I presume? a bit strange...
Alexey Kuleshevich
@lehins:matrix.org
[m]
No, during benchmark it used all cores too.
Man of Letters
@man_of_letters:mozilla.org
[m]
OTOH, "variance introduced by outliers: 98% (severely inflated)" makes it very suspect
Alexey Kuleshevich
@lehins:matrix.org
[m]

98% (severely inflated)

that's normal for multicore benchmarks

Man of Letters
@man_of_letters:mozilla.org
[m]
oh, ok
Alexey Kuleshevich
@lehins:matrix.org
[m]
There is always a lot of noise
Man of Letters
@man_of_letters:mozilla.org
[m]
thanks again for the measurements
I think I was confused about multicore when not running with openblas, probably because I'm running with -N1 (actually, even without -threaded) --- that may be why it's single core for me
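A quick way to check which RTS configuration a binary is actually running with is to ask the runtime itself. The following is only a sketch using functions from base; the `rtsSummary` name is made up for illustration, and it reports on the Haskell RTS only, not on how many threads a C-side BLAS library uses.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)
import GHC.Conc (getNumProcessors)

-- Hypothetical helper: collect what the Haskell RTS reports about itself.
-- This says nothing about threads spawned inside a C library such as BLAS.
rtsSummary :: IO [String]
rtsSummary = do
  caps <- getNumCapabilities   -- value set by +RTS -N (1 by default)
  procs <- getNumProcessors    -- processors visible to the process
  pure
    [ "linked with -threaded: " ++ show rtsSupportsBoundThreads
    , "RTS capabilities (-N): " ++ show caps
    , "processors visible:    " ++ show procs
    ]

main :: IO ()
main = rtsSummary >>= mapM_ putStrLn
```

Running the same binary with and without `+RTS -N1` shows whether the flag was picked up at all.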
Alexey Kuleshevich
@lehins:matrix.org
[m]
My pleasure. I just got this computer so it is really fun for me to see how much faster all of the benchmarks have gotten 😀
Man of Letters
@man_of_letters:mozilla.org
[m]
while, presumably, openblas can parallelize even with -N1 (not tested)
Alexey Kuleshevich
@lehins:matrix.org
[m]
As a matter of fact, hmatrix doesn't care about RTS flags like -N, since it parallelizes on the C side
Man of Letters
@man_of_letters:mozilla.org
[m]
well, that's strange then, because I swear my hmatrix doesn't use many cores
(no openblas flag set)
though it probably uses SIMD, etc.
Alexey Kuleshevich
@lehins:matrix.org
[m]
From what I've seen online, it is controlled either by an env variable or by some runtime setting
Man of Letters
@man_of_letters:mozilla.org
[m]
oh, ok, again good to know
Alexey Kuleshevich
@lehins:matrix.org
[m]
Man of Letters
@man_of_letters:mozilla.org
[m]
I have an ancient Ubuntu, so probably the default flags are different
^^^ that link is openblas, though
we are talking results without openblas now, right?
Alexey Kuleshevich
@lehins:matrix.org
[m]
Yeah, I don't really use any of that stuff aside from benchmarks for massiv. So I am no expert on openblas
Oh sorry, you are right

we are talking results without openblas now, right?

openblas or blas

Cause it seems hmatrix uses either or:

        if flag(openblas)
            if !flag(disable-default-paths)
                extra-lib-dirs:     /usr/lib/openblas/lib
            extra-libraries:    openblas
        else
            extra-libraries:    blas lapack
Man of Letters
@man_of_letters:mozilla.org
[m]
yes, either-or
I read in blas/lapack docs that they are strictly single core and my experience confirms it, but there must be other versions in new Ubuntus for which it doesn't apply any more or perhaps it stopped applying long ago, but now they have different default setting and run multicore
Alexey Kuleshevich
@lehins:matrix.org
[m]
Yeah, I can definitely confirm that without openblas flag in hmatrix it still uses all cores
Man of Letters
@man_of_letters:mozilla.org
[m]
ta
* Man of Letters fixes his docs
Alexey Kuleshevich
@lehins:matrix.org
[m]

I read in blas/lapack docs that they are strictly single core and my experience confirms it, but there must be other versions in new Ubuntus for which it doesn't apply any more or perhaps it stopped applying long ago, but now they have different default setting and run multicore

Ok, so this is exactly where I got my impression that massiv was much faster than hmatrix without the openblas flag on my older computer where I used Ubuntu

I think I was comparing single core hmatrix to multicore massiv
back then, I mean
Here is something I just learnt, adding more cores to Haskell RTS actually slows down blas:
$ stack bench :mult --ba '--match pattern Par +RTS -N1'
hmatrix-bench> benchmarks
Running 1 benchmarks...
Benchmark mult: RUNNING...
benchmarking HMatrix/MxM Double - (500x800 X 800x500)/Par
time                 1.254 ms   (1.124 ms .. 1.413 ms)
                     0.940 R²   (0.914 R² .. 0.992 R²)
mean                 1.238 ms   (1.178 ms .. 1.358 ms)
std dev              267.3 μs   (172.2 μs .. 436.6 μs)
variance introduced by outliers: 94% (severely inflated)

benchmarking HMatrix/MxM Float - (500x800 X 800x500)/Par
time                 718.8 μs   (673.9 μs .. 774.4 μs)
                     0.942 R²   (0.903 R² .. 0.978 R²)
mean                 772.8 μs   (710.1 μs .. 991.5 μs)
std dev              339.4 μs   (120.9 μs .. 675.6 μs)
variance introduced by outliers: 99% (severely inflated)

benchmarking Massiv/MxM P Double - (500x800 X 800x500)/Par
time                 62.91 ms   (62.26 ms .. 63.55 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 63.97 ms   (63.25 ms .. 66.65 ms)
std dev              2.304 ms   (455.6 μs .. 4.013 ms)

benchmarking Massiv/MxM P Float - (500x800 X 800x500)/Par
time                 79.19 ms   (78.63 ms .. 79.77 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 80.06 ms   (79.75 ms .. 80.38 ms)
std dev              557.9 μs   (372.2 μs .. 849.1 μs)
Man of Letters
@man_of_letters:mozilla.org
[m]
that's because the cores are busy and can't be used by blas?
Alexey Kuleshevich
@lehins:matrix.org
[m]
It definitely makes sense. It's just that I didn't think about it
Man of Letters
@man_of_letters:mozilla.org
[m]
benchmarks lie so hard ;D
Alexey Kuleshevich
@lehins:matrix.org
[m]

What it means is that I should be comparing this massiv's performance:

benchmarking Massiv/MxM P Double - (500x800 X 800x500)/Par
time                 6.887 ms   (6.749 ms .. 7.048 ms)
                     0.993 R²   (0.978 R² .. 0.999 R²)
mean                 7.009 ms   (6.888 ms .. 7.235 ms)
std dev              425.6 μs   (220.2 μs .. 638.2 μs)
variance introduced by outliers: 33% (moderately inflated)

To hmatrix as such:

benchmarking HMatrix/MxM Double - (500x800 X 800x500)/Par
time                 1.254 ms   (1.124 ms .. 1.413 ms)
                     0.940 R²   (0.914 R² .. 0.992 R²)
mean                 1.238 ms   (1.178 ms .. 1.358 ms)
std dev              267.3 μs   (172.2 μs .. 436.6 μs)
variance introduced by outliers: 94% (severely inflated)
Man of Letters
@man_of_letters:mozilla.org
[m]
you are only making things worse!
Alexey Kuleshevich
@lehins:matrix.org
[m]
which is very much in alignment with simd
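To put the two "time" figures quoted above side by side, here is a small back-of-the-envelope sketch. It assumes the conventional 2*m*k*n estimate for the floating point operations in an (m x k) by (k x n) product; the names `flops`, `gflops` and `ratio` are made up for this illustration, and the inputs are just the 1.254 ms and 6.887 ms estimates from the benchmarks. The ratio works out to roughly 5.5x.

```haskell
import Text.Printf (printf)

-- Conventional operation-count estimate for (500x800) X (800x500):
-- one multiply and one add per inner-product term.
flops :: Double
flops = 2 * 500 * 800 * 500

-- Throughput in GFLOP/s for a run time given in milliseconds.
gflops :: Double -> Double
gflops ms = flops / (ms * 1e-3) / 1e9

-- How many times longer the slower run takes.
ratio :: Double -> Double -> Double
ratio slowMs fastMs = slowMs / fastMs

main :: IO ()
main = do
  let hmatrixMs = 1.254  -- HMatrix Double, RTS -N1 (from the log above)
      massivMs  = 6.887  -- Massiv P Double, multicore RTS (from the log above)
  printf "hmatrix: %.1f GFLOP/s\n" (gflops hmatrixMs)
  printf "massiv:  %.1f GFLOP/s\n" (gflops massivMs)
  printf "ratio:   %.2fx\n" (ratio massivMs hmatrixMs)
```

Given how noisy these runs are (R² of 0.940 on the hmatrix side), the numbers are indicative at best.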
Man of Letters
@man_of_letters:mozilla.org
[m]
so that's single-core RTS plus multicore blas/lapack vs multicore RTS?
Alexey Kuleshevich
@lehins:matrix.org
[m]
No, it is compared with multicore RTS massiv:
benchmarking Massiv/MxM P Double - (500x800 X 800x500)/Par
time                 6.887 ms   (6.749 ms .. 7.048 ms)
                     0.993 R²   (0.978 R² .. 0.999 R²)
mean                 7.009 ms   (6.888 ms .. 7.235 ms)
std dev              425.6 μs   (220.2 μs .. 638.2 μs)
variance introduced by outliers: 33% (moderately inflated)
Man of Letters
@man_of_letters:mozilla.org
[m]
yes, that's what I meant
Alexey Kuleshevich
@lehins:matrix.org
[m]
multicore blas/lapack with multicore RTS
is this
benchmarking HMatrix/MxM Double - (500x800 X 800x500)/Par
time                 2.457 ms   (1.275 ms .. 4.701 ms)
                     0.276 R²   (0.256 R² .. 0.971 R²)
mean                 1.716 ms   (1.392 ms .. 3.427 ms)
std dev              1.666 ms   (311.3 μs .. 3.915 ms)
variance introduced by outliers: 98% (severely inflated)
Man of Letters
@man_of_letters:mozilla.org
[m]
got it
Kevin C
@dataopt
I ran into the following compilation error (macOS 12.5, ghc 8.10.7). Any help resolving this would be greatly appreciated.
src/Data/Massiv/Array/Manifest/Unboxed.hs:147:33: error:
    • Couldn't match type ‘m’ with ‘ST (PrimState m)’
      ‘m’ is a rigid type variable bound by
        the type signature for:
          initialize :: forall ix (m :: * -> *).
                        (Index ix, PrimMonad m) =>
                        MArray (PrimState m) U ix e -> m ()
        at src/Data/Massiv/Array/Manifest/Unboxed.hs:147:3-12
      Expected type: m ()
        Actual type: ST (PrimState m) ()
    • In the expression: VGM.basicInitialize marr
      In an equation for ‘initialize’:
          initialize (MUArray _ marr) = VGM.basicInitialize marr
      In the instance declaration for ‘Manifest U e’
    • Relevant bindings include
        marr :: MVU.MVector (PrimState m) e
          (bound at src/Data/Massiv/Array/Manifest/Unboxed.hs:147:25)
        initialize :: MArray (PrimState m) U ix e -> m ()
          (bound at src/Data/Massiv/Array/Manifest/Unboxed.hs:147:3)
    |
147 |   initialize (MUArray _ marr) = VGM.basicInitialize marr
    |                                 ^^^^^^^^^^^^^^^^^^^^^^^^
cabal: Failed to build massiv-1.0.1.1 (which is required by exe:massiv-stuff
Alexey Kuleshevich
@lehins
@dataopt do cabal update and it should work now. I'll explain a bit later why you've encountered this problem
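For anyone reading the error itself: GHC is saying that `VGM.basicInitialize marr` produces an `ST (PrimState m) ()` action where an `m ()` is expected. As a general illustration of bridging that kind of gap, the sketch below uses `stToPrim` from the `primitive` package. It is not a claim about how massiv was actually fixed, and `bumpST`, `bump` and `runTwice` are hypothetical names standing in for the real code.

```haskell
import Control.Monad.Primitive (PrimMonad, PrimState, stToPrim)
import Control.Monad.ST (ST)
import Data.STRef (STRef, modifySTRef', newSTRef, readSTRef)

-- Hypothetical action pinned to the ST monad, playing the role of
-- `VGM.basicInitialize marr` in the error message.
bumpST :: STRef s Int -> ST s ()
bumpST ref = modifySTRef' ref (+ 1)

-- stToPrim :: PrimMonad m => ST (PrimState m) a -> m a
-- lifts the ST action into any PrimMonad, giving the expected `m ()`.
bump :: PrimMonad m => STRef (PrimState m) Int -> m ()
bump ref = stToPrim (bumpST ref)

-- Use the polymorphic version from IO, where PrimState IO ~ RealWorld.
runTwice :: IO Int
runTwice = do
  ref <- stToPrim (newSTRef 0)
  bump ref
  bump ref
  stToPrim (readSTRef ref)

main :: IO ()
main = runTwice >>= print
```

The same `bump` also works inside `runST`, since `ST s` is itself a `PrimMonad`.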