These are chat archives for symengine/symengine

16th
Feb 2015
Ondřej Čertík
@certik
Feb 16 2015 04:28

@shivamvats yes, you can definitely do a cross sympy-csympy proposal. csympy is a complement to sympy, and we want to have optimal algorithms in sympy, and especially in a case like this, when we are not 100% sure what the best data structures are, we should start in Python and SymPy. Regarding limits, I don't think you have to modify those at all; they simply use the series module from sympy, so if we change its internals, limits will automatically pick it up.

@Sumith1896 as to what should be implemented, first in sympy: the polynomials must support the necessary routines for series manipulation. A start is already there in ring_series.py, but it needs to be finished. Otherwise I think the polynomials in sympy are pretty good. We should talk to Mateusz (the author of the polys module) about this as well.
For csympy --- I think we need at least enough to support series expansion, and then enough to implement things like cancel().
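The SymPy side of this can be sketched with the public polys API. A minimal, hedged example (the exact set of rs_* helpers in ring_series.py has grown since this exchange, but truncated multiplication was among the first):

```python
# Sketch of the sparse polynomial machinery discussed above, using
# SymPy's public API; not CSymPy code.
from sympy import QQ
from sympy.polys.rings import ring
from sympy.polys.ring_series import rs_mul

# A sparse ring over the rationals; elements are stored as dicts of
# {exponent-tuple: coefficient}.
R, x, y = ring("x, y", QQ)
p = 3*x**5*y**2 + x
assert dict(p) == {(5, 2): 3, (1, 0): 1}

# Truncated series multiplication from ring_series.py:
Rs, t = ring("t", QQ)
a = 1 + t + t**2 + t**3
b = 1 - t
# (1 + t + t**2 + t**3)*(1 - t) == 1 - t**4, which truncates to 1
# at precision t**4:
assert rs_mul(a, b, t, 4) == 1
```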

Overall, I would stick to the sparse representation. It seems it will deliver very good performance in all cases, and only in a few cases might a dense or other representation be slightly more efficient.

Draft the proposals and ping me, we can iterate on them. If some of it gets done before GSoC starts, then great. There will always be things to work on during the summer.
Ondřej Čertík
@certik
Feb 16 2015 04:42
Here are things that I am currently playing with and that you should also understand about the polys:
  • the sparse polys in SymPy use a tuple of ints (5, 2, 3) to represent the monomial x^5 y^2 z^3. Then you have a dictionary (hashtable) where you simply store the coefficients, so 3x^5y^2z^3 + x becomes {(5, 2, 3): 3, (1, 0, 0): 1}, and that's your sparse representation.
    The exponents, however, can be packed into one integer: e.g. if each requires 8 bits, then you need a total of 24 bits, and you can use up to 64-bit integers (or perhaps 128-bit using the __int128 type in C++). So you pack them into one integer (this is called Kronecker substitution), and then you use just that integer as the key in the hashtable, so everything becomes much faster. Also, multiplying two monomials then just becomes adding the integer keys, etc. Of course, if the exponents are too large to be packed, the code needs to notice this and raise an exception; the user then needs to use the tuple representation (std::vector<int> in C++, though std::array is faster, as std::vector allocates on the heap, so Piranha implements its own fast small vector that uses static allocation).
  • the integer arithmetic for the coefficients of the monomials must be very fast. SymPy uses either Python ints (slow) or gmpy (fast). In C++, we also use GMP, but it is even faster to use machine integers for small integers and switch to GMP only for large integers. This trick can and should be used for all CSymPy integers; it should speed things up considerably.
  • the hashtable should be tailored for this purpose: Piranha stores the values directly in the array if there is only one value in a bucket; only if there are multiple values in a bucket does it use a linked list, I think. Also, there is a trick of using a 1:1 hash function, which Piranha uses.
  • finally, we also need polynomials with rational coefficients and probably rational exponents as well, etc.
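The sparse representation and multiplication from the first bullet can be sketched in plain Python (illustrative names, not SymPy's internals):

```python
def mul_sparse(p, q):
    """Multiply two sparse polynomials stored as {exponent-tuple: coeff} dicts.

    Multiplying two monomials adds their exponent vectors componentwise;
    coefficients of colliding monomials accumulate in the hashtable.
    """
    result = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = tuple(e1 + e2 for e1, e2 in zip(m1, m2))
            result[m] = result.get(m, 0) + c1 * c2
    # drop any terms that cancelled to zero
    return {m: c for m, c in result.items() if c != 0}

# 3*x**5*y**2*z**3 + x, as in the message above
p = {(5, 2, 3): 3, (1, 0, 0): 1}
# x - 1
q = {(1, 0, 0): 1, (0, 0, 0): -1}
# (3*x**5*y**2*z**3 + x)*(x - 1)
# == 3*x**6*y**2*z**3 - 3*x**5*y**2*z**3 + x**2 - x
assert mul_sparse(p, q) == {(6, 2, 3): 3, (5, 2, 3): -3,
                            (2, 0, 0): 1, (1, 0, 0): -1}
```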
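The Kronecker-substitution packing from the second bullet, as a hedged sketch (8 bits per exponent is an assumption; a real implementation must also leave headroom so that adding two packed keys cannot carry between fields):

```python
BITS = 8                  # assumed bits per exponent: each exponent < 256
MASK = (1 << BITS) - 1

def pack(mono):
    """Pack an exponent tuple into one integer (Kronecker substitution)."""
    key = 0
    for e in mono:
        if e > MASK:
            # too large to pack: caller falls back to the tuple representation
            raise OverflowError("exponent too large to pack")
        key = (key << BITS) | e
    return key

def unpack(key, nvars):
    """Recover the exponent tuple from a packed key."""
    mono = []
    for _ in range(nvars):
        mono.append(key & MASK)
        key >>= BITS
    return tuple(reversed(mono))

# x**5 * y**2 * z**3 becomes a single integer key
k1 = pack((5, 2, 3))
k2 = pack((1, 0, 0))
# multiplying the monomials is now just integer addition of the keys
assert unpack(k1 + k2, 3) == (6, 2, 3)
```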
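The small-integer fast path from the third bullet is really a C/C++ trick (Python ints are already arbitrary precision), but the idea can be illustrated: do machine-word arithmetic and promote to GMP only on overflow. Everything below is an illustrative sketch, not CSymPy code:

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def add_coeffs(a, b):
    """Illustrative sketch of the machine-int fast path.

    In C++ you would add two 64-bit words and detect overflow
    (e.g. with __builtin_add_overflow); only on overflow would you
    promote both operands to GMP mpz and redo the addition there.
    """
    r = a + b                      # Python computes this exactly, of course
    if INT64_MIN <= r <= INT64_MAX:
        return ("machine", r)      # fits in one machine word: fast path
    return ("gmp", r)              # C++ would hand this to GMP

assert add_coeffs(2, 3) == ("machine", 5)
assert add_coeffs(2**62, 2**62)[0] == "gmp"
```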
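The 1:1 hash from the fourth bullet can be seen in CPython itself: small non-negative ints hash to themselves, so a dict keyed by packed monomials already behaves like an identity-hash table (a quick demonstration, not Piranha's actual layout):

```python
# Packed key for x**5*y**2*z**3, assuming 8 bits per exponent:
key = (5 << 16) | (2 << 8) | 3          # == 328195
# CPython's int hash is the identity for small non-negative ints,
# i.e. the 1:1 hash trick: no mixing step at all, so a lookup is
# essentially one array index plus a comparison.
assert hash(key) == key
coeffs = {key: 3}                        # hashtable keyed by packed monomial
assert coeffs[key] == 3
```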