These are chat archives for symengine/symengine

3rd
Jul 2015
Francesco Biscani
@bluescarni
Jul 03 2015 12:07
I am surprised that TCMALLOC slows things down... is this in expand2b only or in all the tests?
here it gives a moderate speedup (erasing the performance penalty you measured when including thread_pool)
In any case, if my interpretation is correct, you will run into thread-related slowdowns once you start using multithreading
Ondřej Čertík
@certik
Jul 03 2015 12:56
@bluescarni thanks for figuring out the slowdown!
TCMALLOC is faster on my computer. But you don't need to use it if it slows things down.
Francesco Biscani
@bluescarni
Jul 03 2015 13:19
@certik np! not really advocating TCMALLOC by default or anything of the sort
just trying to figure out where the slowdown comes from
my current theory is that when threading-related stuff is compiled in (such as when including thread_pool) the compiler does something differently which causes the slowdown
I think this might be related to the behaviour of the memory allocator in single-thread vs multithread
as switching to TCMALLOC bypasses the default malloc() implementation and recovers the lost performance
but it's a theory
Sumith Kulal
@Sumith1896
Jul 03 2015 17:15
Are you guys getting a speedup with tcmalloc?
I'm not, let me try again though
If I am getting you right @bluescarni , when thread_pool is called, some compiler optimizations go off and we get a slowdown.
One possibility is that it is due to memory allocator only
Francesco Biscani
@bluescarni
Jul 03 2015 18:23
I think it has to do with memory allocation, more than compiler optimisation
I am seeing a moderate speedup with tcmalloc in the expand2b benchmark
Sumith Kulal
@Sumith1896
Jul 03 2015 18:24
Oh, are you working on the first bad commit or head of packint branch?
Francesco Biscani
@bluescarni
Jul 03 2015 18:24
I am switching between bad and good commit
not tested head so far
Sumith Kulal
@Sumith1896
Jul 03 2015 18:25
Okay, I am working on the head
and I have to include hash_set too
Francesco Biscani
@bluescarni
Jul 03 2015 18:26
you need to test one thing at a time, otherwise it will be difficult to understand what influences what
Sumith Kulal
@Sumith1896
Jul 03 2015 18:26
Yes, I should be trying out that commit only
Maybe that's why the tcmalloc etc were not consistent, ignore all my previous comments
Francesco Biscani
@bluescarni
Jul 03 2015 18:27
sure np.. it's that there's a few things happening at the same time, need to proceed step by step in order to untangle what is going on
Sumith Kulal
@Sumith1896
Jul 03 2015 18:27
Agreed
Francesco Biscani
@bluescarni
Jul 03 2015 18:28
btw my internet is complete crap today, if I disappear suddenly it's probably due to that :/
Sumith Kulal
@Sumith1896
Jul 03 2015 18:28
np, happens to me all the time :smile:
very unreliable here, but back in university it is very good.
So did you find any workaround, thread_pool is the problem right?
Francesco Biscani
@bluescarni
Jul 03 2015 18:29
yeah well I am in Germany (not German myself though) so I would expect the infrastructure would be good, but the connection has been crap for 3 months now
yes
that's the problem
Sumith Kulal
@Sumith1896
Jul 03 2015 18:29
We use only single thread right?
Francesco Biscani
@bluescarni
Jul 03 2015 18:30
that is correct
I think that this is what is going on:
thread_pool defines a global variable which is a pool of threads
Sumith Kulal
@Sumith1896
Jul 03 2015 18:31
So can we have a compiler flag for Piranha where we call thread_pool only if needed?
random guess, I have no clue of feasibility
Francesco Biscani
@bluescarni
Jul 03 2015 18:31
this variable gets created and linked into the executable only when thread_pool is included
when the compiler/linker sees some thread usage of any kind in the executable, it will switch the memory allocator to be thread safe or something along these lines
it's gonna be hard to avoid thread_pool
it's included from many places in piranha
besides, if the theory above is correct, the moment you start using multithreading in symengine
you will run into the same issue
independently of thread_pool
to me, the first order of business would be a thorough investigation of what we are seeing
specifically:
  • normal alloc vs tcmalloc
  • mpz_addmul vs the mpz_class operators
  • test if the slowdown happens anyway if thread_pool is not included but some other piece of threading functionality is included in symengine
that is, try to create a std::thread in poly_mul and see if the slowdown is back even without thread_pool
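(Editor's sketch) The last test above could be approximated outside SymEngine with a small allocation micro-benchmark: time a loop of malloc/free pairs, start and join a trivial std::thread, then time the same loop again. The names `time_allocations`, `alloc_timings` and `run_thread_experiment` are hypothetical and do not come from SymEngine or Piranha; this is a minimal sketch of the experiment, not a substitute for running expand2b on the first bad commit.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>

// Time `n` malloc/free pairs of small blocks and return the elapsed seconds.
double time_allocations(std::size_t n)
{
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        // The volatile pointer keeps the compiler from eliding the pair.
        void *volatile p = std::malloc(16 + (i % 64));
        std::free(p);
    }
    const std::chrono::duration<double> elapsed
        = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Hypothetical result type: timings taken before and after a thread existed.
struct alloc_timings {
    double before_thread;
    double after_thread;
};

// Measure once while the process has never started a thread, then start and
// join a do-nothing thread and measure again.
alloc_timings run_thread_experiment(std::size_t n)
{
    alloc_timings t{};
    t.before_thread = time_allocations(n);
    std::thread worker([] {});
    worker.join();
    t.after_thread = time_allocations(n);
    return t;
}
```

Calling `run_thread_experiment` with a few million iterations, once with the default allocator and once with tcmalloc linked in, would cover two of the three items in the list. Whether any difference shows up depends on the allocator and platform, so no expected numbers are given here.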
Sumith Kulal
@Sumith1896
Jul 03 2015 18:36
Cool, these cases need study
Francesco Biscani
@bluescarni
Jul 03 2015 18:36
all the combinations of the above should be tested in the first bad commit
Sumith Kulal
@Sumith1896
Jul 03 2015 18:36
Yes
Francesco Biscani
@bluescarni
Jul 03 2015 18:37
that is what I would do anyway, but you should probably talk to @certik to see if he thinks this is worth pursuing
maybe at the moment you don't care too much, and it is more important for your gsoc project to move onto something else
just saying :)
Sumith Kulal
@Sumith1896
Jul 03 2015 18:38
If we are hard depending on Piranha, I think we have to figure this out
Francesco Biscani
@bluescarni
Jul 03 2015 18:39
it is an interesting thing I would like to know more about anyway, and it's pretty cool that you identified this
I am used to seeing some slowdown related to multithreading but I was not expecting that just including a header would trigger this behaviour
the more you know
Sumith Kulal
@Sumith1896
Jul 03 2015 18:40
@shivamvats will be working on SymEngine later, the process will speed up later.
I would like to try my best to figure this out
Isuru Fernando
@isuruf
Jul 03 2015 18:42
With the inclusion of piranha.hpp I see around 5% more cost in malloc
Used callgrind to see the functions being called in both cases
Francesco Biscani
@bluescarni
Jul 03 2015 18:44
@isuruf that is pretty interesting! could you check by only including thread_pool.hpp?
Isuru Fernando
@isuruf
Jul 03 2015 18:45
By replacing #include <piranha/piranha.hpp> ?
Sumith Kulal
@Sumith1896
Jul 03 2015 18:46
Yup
Francesco Biscani
@bluescarni
Jul 03 2015 18:46
yes
Sumith Kulal
@Sumith1896
Jul 03 2015 18:46
and including others necessary
mp_integer maybe
Francesco Biscani
@bluescarni
Jul 03 2015 18:46
just #include <piranha/thread_pool.hpp>
yes then you can try with only <piranha/mp_integer.hpp>
Isuru Fernando
@isuruf
Jul 03 2015 18:51
Same result for thread_pool.hpp
and also for mp_integer.hpp
Sumith Kulal
@Sumith1896
Jul 03 2015 18:52
@isuruf Are you trying the first bad commit?
Isuru Fernando
@isuruf
Jul 03 2015 18:52
yes
Sumith Kulal
@Sumith1896
Jul 03 2015 18:52
weird :/
mp_integer has no thread_pool
afaik
Thanks @isuruf
I'll try this too.
Isuru Fernando
@isuruf
Jul 03 2015 18:55
With both of them, the exact same functions from piranha get called.
https://drive.google.com/file/d/0B3Xhz9AGyxDObkZndkpNSVF6UW8/view?usp=sharing
Francesco Biscani
@bluescarni
Jul 03 2015 18:56
did you replace all occurrences of piranha.hpp?
Isuru Fernando
@isuruf
Jul 03 2015 18:57
Oops. There are two occurrences. I only replaced the one in rings.h
With only mp_integer.hpp it's back to the good commit
No piranha functions get called
Francesco Biscani
@bluescarni
Jul 03 2015 19:09
ok this is pretty helpful actually, I would've expected the compiler/linker to remove completely any code relating to the thread pool as it is never being used
but evidently that is not the case
mhmh
I have a couple of ideas on how to inhibit the instantiation of the thread pool
Francesco Biscani
@bluescarni
Jul 03 2015 19:15
all piranha functions in that screenshot you posted are related to the thread pool
Sumith Kulal
@Sumith1896
Jul 03 2015 19:18
So do we need to amend Piranha for this?
What do you have in mind?
Francesco Biscani
@bluescarni
Jul 03 2015 19:18
I was thinking of turning the thread pool into a template class
well yes, this would be something that goes into Piranha
so that it does not get instantiated by a simple inclusion of the header file
you actually have to use it in order to have it compiled in
the order of my messages is screwed up, sorry about that
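(Editor's sketch) The template idea relies on a C++ rule: a static data member of an ordinary class is constructed at start-up as soon as its definition is part of the program, while a static data member of a class template is only instantiated if some code actually uses it. The types below (`pool_storage`, `eager_pool`, `lazy_pool_impl`) are hypothetical stand-ins and are not Piranha's actual thread_pool; the sketch only illustrates the mechanism under that assumption.

```cpp
#include <cstddef>

// Hypothetical stand-in for the pool's state; counts how often it is built.
struct pool_storage {
    pool_storage() : n_threads(1u) { ++construction_count(); }
    // Function-local static, so the counter is valid during start-up.
    static int &construction_count()
    {
        static int n = 0;
        return n;
    }
    unsigned n_threads;
};

// Eager variant: the static member is constructed at program start-up
// whether or not anything ever touches the pool.
struct eager_pool {
    static pool_storage s_storage;
};
pool_storage eager_pool::s_storage;

// Lazy variant: as a member of a class template, s_storage is instantiated
// (and hence constructed) only in programs that actually use it.
template <typename = void>
struct lazy_pool_impl {
    static pool_storage s_storage;
    static std::size_t size() { return s_storage.n_threads; }
};
template <typename T>
pool_storage lazy_pool_impl<T>::s_storage;

// Naming the specialisation does not by itself instantiate s_storage.
using lazy_pool = lazy_pool_impl<>;
```

In a program that includes this code and never calls `lazy_pool::size()`, only the eager member is constructed. Any call to `lazy_pool::size()` anywhere in the program causes the lazy member to be instantiated as well, which is the "you actually have to use it" behaviour described above.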
Isuru Fernando
@isuruf
Jul 03 2015 19:20
Is it the default that 4 threads get created?
Francesco Biscani
@bluescarni
Jul 03 2015 19:20
@isuruf it creates a number of threads equal to the number of logical cores detected on the system
or 1 if it cannot detect that number
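(Editor's sketch) The default described here can be written in a few lines of standard C++. This assumes the core count is detected with `std::thread::hardware_concurrency()`, which returns 0 when the value is not computable; whether Piranha detects it this way is an assumption, and `default_thread_count` is a hypothetical name.

```cpp
#include <thread>

// Number of worker threads to start by default: the detected number of
// logical cores, or 1 when the runtime cannot determine it.
// Assumption: detection via std::thread::hardware_concurrency().
unsigned default_thread_count()
{
    const unsigned detected = std::thread::hardware_concurrency();
    return detected != 0u ? detected : 1u;
}
```

On a machine with 4 logical cores this would return 4, which would be consistent with the 4 threads seen in the callgrind output.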
Isuru Fernando
@isuruf
Jul 03 2015 19:21
Okay
I posted a question here
off to eat something, I'll try the template idea later