These are chat archives for bluescarni/piranha

27th Jan 2016
Francesco Biscani
@bluescarni
Jan 27 2016 09:35
@isuruf are you there?
Isuru Fernando
@isuruf
Jan 27 2016 09:35
Yes
Francesco Biscani
@bluescarni
Jan 27 2016 09:35
would you have time to walk me through that slow benchmark mentioned in the pull request?
Isuru Fernando
@isuruf
Jan 27 2016 09:37
Sure.
Francesco Biscani
@bluescarni
Jan 27 2016 09:37
thanks... is there a test that can be run immediately exhibiting the slowness?
I am trying with test_series_expansion_URatP and series_expansion_sincos_piranha at the moment
Isuru Fernando
@isuruf
Jan 27 2016 09:38
yes.
benchmarks/series_expansion_sincos_piranha.cpp
vs
benchmarks/series_expansion_sincos_flint.cpp
from the PR
Francesco Biscani
@bluescarni
Jan 27 2016 09:39
ok... but for those there's roughly a factor of 2 difference here; I was wondering if there was a more extreme example available
Ralf Stephan
@rwst
Jan 27 2016 09:40
@bluescarni note the different values for N
Francesco Biscani
@bluescarni
Jan 27 2016 09:40
ok, then I can manually change the N value in that benchmark to reproduce the behaviour?
Ralf Stephan
@rwst
Jan 27 2016 09:40
yes
Francesco Biscani
@bluescarni
Jan 27 2016 09:40
or I can just start studying this
ok thanks
gonna try a few things
Francesco Biscani
@bluescarni
Jan 27 2016 09:59
so there's a few interesting things coming out of the profiler
for instance, there are about 18M calls to the C++ random number generator
this is part of the routine that estimates the final series size
this does not even account for much, about 13% of the total time
what are the typical sizes of the series involved in this computation?
Isuru Fernando
@isuruf
Jan 27 2016 10:03
N-1 is the degree, so N terms
Francesco Biscani
@bluescarni
Jan 27 2016 10:04
ok so about 200
Ralf Stephan
@rwst
Jan 27 2016 10:04
but millions means there is something cubic, not quadratic
Francesco Biscani
@bluescarni
Jan 27 2016 10:05
yes it looks like the whole series size estimation is a huge overhead for small/univariate series
Isuru Fernando
@isuruf
Jan 27 2016 10:05
sin(x+x^2) is calculated as sum((-1)^((i-1)/2)*(x+x^2)^i/i! for odd i < N)
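For reference, the truncated expansion being described is the standard Taylor series of sin, with s = x + x^2 and odd powers kept up to degree N:

\[
\sin(s) \;\approx\; \sum_{k=0}^{\lfloor (N-1)/2 \rfloor} \frac{(-1)^k}{(2k+1)!}\, s^{2k+1}, \qquad s = x + x^2.
\]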
Francesco Biscani
@bluescarni
Jan 27 2016 10:06
I can try to disable the machinery and see what happens
Ralf Stephan
@rwst
Jan 27 2016 10:06
@isuruf sure? not via tan()?
Isuru Fernando
@isuruf
Jan 27 2016 10:07
Nope. The tan code is commented out.
Ralf Stephan
@rwst
Jan 27 2016 10:08
do we calculate the (poly)^i bit successively, or anew for each i?
Isuru Fernando
@isuruf
Jan 27 2016 10:08
New
Ralf Stephan
@rwst
Jan 27 2016 10:08
ah
!
Francesco Biscani
@bluescarni
Jan 27 2016 10:09
there's 200 calls to pow()
and 20000 series multiplications
Ralf Stephan
@rwst
Jan 27 2016 10:10
off to lunch...
Francesco Biscani
@bluescarni
Jan 27 2016 10:10
are there a lot of scalar * series multiplications in this example?
@rwst later!
something like (x+1)*3 I mean
Isuru Fernando
@isuruf
Jan 27 2016 10:16
Yes. See the formula I gave above
Francesco Biscani
@bluescarni
Jan 27 2016 10:17
and the formula is implemented internally using piranha's overloaded operators?
Isuru Fernando
@isuruf
Jan 27 2016 10:17
(-1)^((i-1)/2)*(x+x^2)^i/i!
yes
Francesco Biscani
@bluescarni
Jan 27 2016 10:18
ok then at least some of the slowness is explained
piranha is horribly un-optimised for these things: if you do a scalar X series multiplication, it will always be transformed into a series X series multiplication
I meant to do something about it, it's not difficult to improve
but always lacked a clear motivation to do so
I will have a go at it
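A minimal sketch of the distinction discussed here, on a toy hash-map series rather than piranha's actual types: a dedicated scalar path just scales every coefficient in place, whereas promoting the scalar to a one-term series pays the full general multiplication machinery.

#include <cstdint>
#include <unordered_map>

// Toy sparse series: exponent -> coefficient (not piranha's representation).
using toy_series = std::unordered_map<std::int64_t, double>;

// Dedicated scalar * series path: scale every coefficient in place.
toy_series scale(toy_series s, double c)
{
    for (auto &t : s) {
        t.second *= c;
    }
    return s;
}

// General series * series path: what the scalar gets promoted into
// when it is first wrapped as the one-term series {0: c}.
toy_series series_mult(const toy_series &a, const toy_series &b)
{
    toy_series r;
    for (const auto &ta : a) {
        for (const auto &tb : b) {
            // one hash lookup/insert per pair of terms
            r[ta.first + tb.first] += ta.second * tb.second;
        }
    }
    return r;
}

int main()
{
    const toy_series s{{0, 1.0}, {1, 1.0}};                 // 1 + x
    const toy_series direct = scale(s, 3.0);                // (x + 1) * 3, scalar path
    const toy_series promoted = series_mult(s, {{0, 3.0}}); // same result via series * series
    return direct == promoted ? 0 : 1;
}

The first path touches each term exactly once and never inserts new keys; the second runs the same term-by-term loop (plus, per the discussion above, the size-estimation step) even though one operand has a single term.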
Isuru Fernando
@isuruf
Jan 27 2016 10:23
Oh. There are some things I thought of as not being compute heavy. E.g. sin(x+x^2)*cos(x+x^2) is represented as 1*sin(x+x^2)*cos(x+x^2); I don't check the numeric value at the beginning and just multiply the whole result
Poly is a piranha rational polynomial, Coeff is piranha::rational, and Series is the CRTP type
Francesco Biscani
@bluescarni
Jan 27 2016 10:30
series multiplication in general is rather costly
it's heavily optimised for very large series
Francesco Biscani
@bluescarni
Jan 27 2016 10:56
does the sincos test also verify the output?
Ralf Stephan
@rwst
Jan 27 2016 11:26
No.
Francesco Biscani
@bluescarni
Jan 27 2016 11:40
as a workaround I removed series size estimation in single-thread mode; now the benchmark runs in 900 ms. But I am not sure it is correct
ah ok but wait... the two tests for flint and piranha run different Ns
Francesco Biscani
@bluescarni
Jan 27 2016 11:46
runtime is now about 19 seconds with N=500
Francesco Biscani
@bluescarni
Jan 27 2016 11:52
any idea on how dense these polynomials are?
Francesco Biscani
@bluescarni
Jan 27 2016 12:04
so according to this: http://www.wolframalpha.com/input/?i=Taylor+Series+of+Sin%5B2*x*%28x%2B1%29%5D it's basically a dense univariate polynomial, right?
Francesco Biscani
@bluescarni
Jan 27 2016 12:19
there's also about 32M calls to integer GCD, not sure how flint handles this
anyway I think that the bottom line is that I can squeeze out some extra performance still (maybe another factor of 2?)
but ultimately it all boils down to the dense univariate character of the computation
accessing a coefficient in a vector (flint) is always going to be at the very least an order of magnitude cheaper than looking it up in a hash table (piranha)
Francesco Biscani
@bluescarni
Jan 27 2016 12:26
according to some measurements I did in the past, the hash table cost vs a vector during multiplication is actually around a factor of 20
Francesco Biscani
@bluescarni
Jan 27 2016 12:32
I suppose it would make some sense to switch to a vector dense representation internally for univariate cases
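A sketch of what such a dense univariate representation looks like (toy code, not piranha's internals): the exponent is the vector index, so every coefficient touched during the multiplication is a direct array access instead of a hash lookup.

#include <cstddef>
#include <vector>

// Dense truncated univariate polynomial: coeff[i] is the coefficient of x^i.
using dense_poly = std::vector<double>;

// Truncated product, keeping only exponents < prec; every coefficient
// access is a plain indexed read/write, no hashing involved.
dense_poly mul_trunc(const dense_poly &a, const dense_poly &b, std::size_t prec)
{
    dense_poly r(prec, 0.0);
    for (std::size_t i = 0; i < a.size() && i < prec; ++i) {
        for (std::size_t j = 0; j < b.size() && i + j < prec; ++j) {
            r[i + j] += a[i] * b[j];
        }
    }
    return r;
}

int main()
{
    const dense_poly s{0.0, 1.0, 1.0};        // x + x^2
    const dense_poly s2 = mul_trunc(s, s, 8); // (x + x^2)^2, truncated at degree < 8
    return (s2[2] == 1.0 && s2[3] == 2.0 && s2[4] == 1.0) ? 0 : 1;
}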
Francesco Biscani
@bluescarni
Jan 27 2016 13:44
but that has never been the point of Piranha really
Ralf Stephan
@rwst
Jan 27 2016 13:44
@bluescarni it was an algorithmic problem in series.h. I'm now down from 3500ms to 450ms with N=200, and N=1000 now takes 133 seconds with this patch:
--- a/symengine/series.h
+++ b/symengine/series.h
@@ -284,14 +284,15 @@ public:
             return Series::sin(c)*Series::series_cos(t, var, prec) + Series::cos(c)*Series::series_sin(t, var, prec);
         }
         //! fast sin(x)
-        Poly res_p(0);
+        Poly res_p(0), monom(s), ssquare(s*s);
         Coeff prod(1);
         for (unsigned int i = 0; i < prec / 2; i++) {
             const short j = 2 * i + 1;
             if (i != 0)
                 prod /= 1 - j;
             prod /= j;
-            res_p += Series::pow(s, j, prec) * prod;
+            res_p += monom * prod;
+            monom *= ssquare;
         }
         return res_p;

@@ -338,14 +339,15 @@ public:
             const Poly t = s - c;
             return Series::cos(c)*Series::series_cos(t, var, prec) - Series::sin(c)*Series::series_sin(t, var, prec);
         }
-        Poly res_p(1);
+        Poly res_p(1), ssquare(s*s), monom(s*s);
         Coeff prod(1);
         for (unsigned int i = 1; i <= prec / 2; i++) {
             const short j = 2 * i;
             if (i != 0)
                 prod /= 1 - j;
             prod /= j;
-            res_p += Series::pow(s, j, prec) * prod;
+            res_p += monom * prod;
+            monom *= ssquare;
         }
         return res_p;
 //
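The algorithmic change in the patch, spelled out: instead of calling Series::pow(s, j, prec) afresh for every term, keep a running odd power of s and a running coefficient,

\[
m_0 = s,\quad m_k = m_{k-1}\,s^2 = s^{2k+1}, \qquad
c_0 = 1,\quad c_k = \frac{-\,c_{k-1}}{(2k)(2k+1)}, \qquad
\sin(s) \;\approx\; \sum_{k \ge 0} c_k\, m_k,
\]

so each loop iteration costs a single multiplication by s^2 rather than a full power computation (and analogously for cos with even powers).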
Francesco Biscani
@bluescarni
Jan 27 2016 13:51
with piranha you mean?
Ralf Stephan
@rwst
Jan 27 2016 13:51
yes
Francesco Biscani
@bluescarni
Jan 27 2016 13:52
ok... so the modification I did with series estimation should cut the runtime by another factor of 3 or so
still pretty far from flint
but in the factor of 20 ballpark
Francesco Biscani
@bluescarni
Jan 27 2016 14:47
can't really understand how N = 500 results in 32M GCD calls though
Isuru Fernando
@isuruf
Jan 27 2016 14:50
Is it possible that a line like rational == 1 invokes GCD?
Francesco Biscani
@bluescarni
Jan 27 2016 14:51
cannot really remember if that was optimised in some way
according to the timings here it's spending almost half of the time normalising the divisors of two polynomials before multiplying them
this was intended as an optimisation: you normalise all coefficients to the same divisor, and then do the multiplication on integers instead of rationals
but maybe this is a pessimization in this case, who knows
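The idea behind that normalisation, as described above (a sketch of the principle, not piranha's exact code path): bring each polynomial's coefficients over a single common denominator, multiply the integer parts, and normalise once at the end,

\[
p = \frac{1}{d_p}\,P, \quad q = \frac{1}{d_q}\,Q
\;\Longrightarrow\;
p\,q = \frac{1}{d_p\,d_q}\,P\,Q,
\]

with P and Q integer-coefficient polynomials. Every term-by-term product is then an integer multiplication, at the price of computing d_p, d_q and scaling all the coefficients up front, which is the step showing up in the profile here.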
Isuru Fernando
@isuruf
Jan 27 2016 14:56
For me 77.90% of the time is spent on canonicalise
Francesco Biscani
@bluescarni
Jan 27 2016 14:56
yeah
Ralf Stephan
@rwst
Jan 27 2016 14:57
That is normal if you add lots of rationals
Francesco Biscani
@bluescarni
Jan 27 2016 14:59
how does flint handle polys over the rationals?
ok this is really weird... after I removed that normalisation step the test with N=200 seems to run in 40ms
maybe I botched something up
Francesco Biscani
@bluescarni
Jan 27 2016 15:08
this does not seem very right... now N=1000 takes 543ms with Piranha
this is with your patch above @rwst, plus my local changes to Piranha
is there any way in which I can check the correctness?
Isuru Fernando
@isuruf
Jan 27 2016 15:10
Check res.as_basic() == flint_res.as_basic()
With a patch to piranha I was able to get
234ms for piranha vs 1596ms for flint
Francesco Biscani
@bluescarni
Jan 27 2016 15:11
which patch?
Isuru Fernando
@isuruf
Jan 27 2016 15:12
diff --git a/src/mp_rational.hpp b/src/mp_rational.hpp
index 25555eb..470eb72 100644
--- a/src/mp_rational.hpp
+++ b/src/mp_rational.hpp
@@ -238,10 +238,17 @@ class mp_rational
                {
                        // NOTE: all this should never throw because we only operate on mp_integer objects,
                        // no conversions involved, etc.
-                       if (m_den.is_unitary() && other.m_den.is_unitary()) {
+                       if (m_den.is_unitary() || other.m_den.is_unitary()) {
                                // Both are integers, we can just add without canonicalising.
                                // NOTE: this is safe is this and other are the same thing.
-                               m_num += other.m_num;
+                if (m_den.is_unitary() && other.m_den.is_unitary()) {
+                                   m_num += other.m_num;
+                } else if (m_den.is_unitary()) {
+                    m_den = other.m_den;
+                    m_num = m_num*m_den+other.m_num;
+                } else {
+                    m_num = m_num+other.m_num*m_den;
+                }
                        } else if (m_den == other.m_den) {
                                // Denominators are the same, add numerators and canonicalise.
                                // NOTE: safe if this and other coincide.
Basically integer+rational does not need to be canonicalized
This is what flint does
Francesco Biscani
@bluescarni
Jan 27 2016 15:14
I feel embarrassed to ask, but is this always the case?
I mean that you don't need to canonicalise when adding int to rat?
Isuru Fernando
@isuruf
Jan 27 2016 15:17
You don't need to canonicalise
Consider a/b + s where a,b and s are integers. gcd(a, b) = 1
then a/b+s = (a+b*s)/b
But gcd(a+b*s, b) = gcd(a, b) = 1
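A quick brute-force spot-check of that claim, in plain C++17 (nothing piranha-specific, just std::gcd):

#include <cassert>
#include <numeric>

int main()
{
    // Spot-check: for a canonical a/b (gcd(a, b) == 1) and an integer s,
    // the sum (a + b*s)/b is already canonical, i.e. gcd(a + b*s, b) == 1.
    for (long a = -50; a <= 50; ++a) {
        for (long b = 1; b <= 50; ++b) {
            if (std::gcd(a, b) != 1) {
                continue;
            }
            for (long s = -10; s <= 10; ++s) {
                assert(std::gcd(a + b * s, b) == 1);
            }
        }
    }
    return 0;
}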
Ralf Stephan
@rwst
Jan 27 2016 15:18
using our generic sin, modified to compile cleanly (Flint is fickle), N=1000 now needs 720ms
Isuru Fernando
@isuruf
Jan 27 2016 15:19
gcd(a+b*s, b) = gcd(a, b) comes from the fact that gcd(a, b) = gcd(a-b, b)
Sorry, edited the formula
Ralf Stephan
@rwst
Jan 27 2016 15:22
Isn't canonicalization only needed before output to humans, and to keep the numbers from growing too big? So it's a matter of when to canonicalize, not if at all.
Isuru Fernando
@isuruf
Jan 27 2016 15:23
Also for comparisons
Ralf Stephan
@rwst
Jan 27 2016 15:23
ah ok
Francesco Biscani
@bluescarni
Jan 27 2016 15:23
god dammit good catch @isuruf
I feel so dumb :D
ok I guess we need to see what happens once this patch is applied... could you send a PR please?
Isuru Fernando
@isuruf
Jan 27 2016 15:25
Sure
Francesco Biscani
@bluescarni
Jan 27 2016 15:25
thanks!
once it is ready I can re-run the tests with my local modifications on top of that and see where we get to
@rwst my understanding is that heuristically speaking it's better to canonicalise as early as possible, as the complexity grows with the size of the operands
but I just read it once somewhere so not sure in practice
Isuru Fernando
@isuruf
Jan 27 2016 15:38
Sent a PR
Isuru Fernando
@isuruf
Jan 27 2016 15:47
Sorry about the noise. I had reset symengine and the benchmark had reverted to N=200
Francesco Biscani
@bluescarni
Jan 27 2016 15:48
thanks for the PR!
Ralf Stephan
@rwst
Jan 27 2016 15:50
Strange things happen. ATM I'm at a loss why, with my Flint series version, series.as_basic() shows 1+... but series.get_coeff(0) = 0...
Francesco Biscani
@bluescarni
Jan 27 2016 15:51
I think here we need to set the denominator as well or not? https://github.com/isuruf/piranha/blob/canonicalise/src/mp_rational.hpp#L246
Isuru Fernando
@isuruf
Jan 27 2016 15:51
@bluescarni, yes
Francesco Biscani
@bluescarni
Jan 27 2016 15:54
I was also wondering if m_num = m_num*other.m_den + other.m_num; could somehow use multiply_accumulate() for performance reasons, but maybe it's premature optimisation
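For what it's worth, at the GMP level the operation can be arranged in either of two ways; a sketch with the raw mpz_* API (an illustration only, not piranha's multiply_accumulate(), whose exact signature is not asserted here):

#include <gmp.h>

// Goal: num = num*other_den + other_num, in place.

// Variant A: two passes over num, no temporary.
void add_int_to_rat_num_a(mpz_t num, const mpz_t other_den, const mpz_t other_num)
{
    mpz_mul(num, num, other_den); // num *= other_den
    mpz_add(num, num, other_num); // num += other_num
}

// Variant B: fused multiply-add via mpz_addmul, at the cost of one temporary.
void add_int_to_rat_num_b(mpz_t num, const mpz_t other_den, const mpz_t other_num)
{
    mpz_t tmp;
    mpz_init_set(tmp, other_num);    // tmp  = other_num
    mpz_addmul(tmp, num, other_den); // tmp += num * other_den
    mpz_swap(num, tmp);              // num  = tmp
    mpz_clear(tmp);
}

int main()
{
    mpz_t num, oden, onum;
    mpz_init_set_si(num, 5);  // this numerator (denominator 1)
    mpz_init_set_si(oden, 7); // other denominator
    mpz_init_set_si(onum, 3); // other numerator
    add_int_to_rat_num_b(num, oden, onum); // 5*7 + 3 = 38
    const int ok = (mpz_cmp_si(num, 38) == 0);
    mpz_clear(num);
    mpz_clear(oden);
    mpz_clear(onum);
    return ok ? 0 : 1;
}

Variant A avoids the temporary but makes two passes over num; variant B fuses the multiply and the add via mpz_addmul at the cost of one temporary, so whether it wins is exactly the kind of thing that would need measuring.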