These are chat archives for symengine/symengine

9th
Nov 2015
harakas
@harakas
Nov 09 2015 10:41
I'm a real noob with cython. Could you help me out with writing a simple "replacer" class. I made a start at http://pastebin.com/G56H2ip4
Isuru Fernando
@isuruf
Nov 09 2015 11:58
Sure. Can you clarify what you are trying to do here? This is basically subs right?
harakas
@harakas
Nov 09 2015 12:34
Yeah it's a simple dictionary substituter, that does not sympify and build the dictionary every time.
Isuru Fernando
@isuruf
Nov 09 2015 12:59
I'm getting a compile error as well. That's strange
Isuru Fernando
@isuruf
Nov 09 2015 13:04
Compile error is due to c2py not being declared in symengine_wrapper.pxd
harakas
@harakas
Nov 09 2015 13:18
So I need to either declare it there or implement my own? Also I don't understand what the difference between symengine.pxd and symengine_wrapper.pxd is.
Isuru Fernando
@isuruf
Nov 09 2015 13:19

Yes.
symengine.pxd declares the functions from symengine C++ library.
symengine_wrapper.pxd declares the functions for symengine python library.

symengine_wrapper.pxd can be used to cimport symengine python wrapper classess.

Isuru Fernando
@isuruf
Nov 09 2015 14:30

@certik, declaring c2py in symengine_wrapper.pxd gives an error.

Error compiling Cython file:
------------------------------------------------------------
...
include "config.pxi"

class SympifyError(Exception):
    pass

cdef c2py(RCP[const symengine.Basic] o):
    ^
------------------------------------------------------------

symengine_wrapper.pyx:20:5: Function signature does not match previous declaration

Do you know how to overcome this?

Ondřej Čertík
@certik
Nov 09 2015 14:56
@isuruf looks like it is defined already somewhere.
Isuru Fernando
@isuruf
Nov 09 2015 14:57

I declared in symengine_wrapper.pxd like
cdef c2py(RCP[const symengine.Basic] o)

And the implementation is in symengine_wrapper.pyx, but the error is there

Here's the diff of what I did
diff --git a/symengine/lib/symengine_wrapper.pxd b/symengine/lib/symengine_wrapper.pxd
index edd342f..e413dc0 100644
--- a/symengine/lib/symengine_wrapper.pxd
+++ b/symengine/lib/symengine_wrapper.pxd
@@ -128,4 +128,6 @@ cdef class DenseMatrix(MatrixBase):
     pass

 cdef class Log(Basic):
-    pass
\ No newline at end of file
+    pass
+
+cdef c2py(RCP[const symengine.Basic] o)
Ondřej Čertík
@certik
Nov 09 2015 15:00
@bluescarni thanks for the answers. @rwst you are right, that we should use Flint. I only benchmarked Piranha for the multivariate case, and as you noticed, it is so far ahead of anything else (e.g. faster than any of the benchmarks here: http://nemocas.org/benchmarks.html), that I assumed it will be faster for the univariate case as well. @bluescarni what is the reason that Flint is faster? The speedup is big, and for series expansion, it might make sense to either use Flint or use our own implementation of the same algorithm --- the reason for reimplementation might be to be able to switch the integer implementation easily, as the Flint integer might be slower than piranha or our own future implementation.
Isuru Fernando
@isuruf
Nov 09 2015 15:06
@certik, #651 has some code for flint integer as well. Only thing missing is getting mpz_srcptr from a const flint::fmpzxx. I'l use piranha's approach and let you know the timings among the 3 implementations
harakas
@harakas
Nov 09 2015 15:10
@isuruf I'm getting the same error.
Ondřej Čertík
@certik
Nov 09 2015 15:10
@harakas thanks for the feedback. For the core of SymEngine, overloading +, -, /, ... will only complicate things, as that will force us to reimplement RCP --- which in Release mode we already do anyway, but in Debug mode that's actually quite non-trivial, in fact there is no other implementation that I am aware of that will catch dangling pointers, except the Teuchos::RCP that we use. And safety in Debug mode is essential. When you use symengine as a user, then of course you want to write a+b, instead of add(a, b). If you use it from Python, Ruby, Julia, then we of course overload these operators there, so there is no issue. So the only remaining reason to overload them is if you want to use symengine from C++ itself, as a user, or perhaps as part of some other library, that expects these operators to be overloaded. For that, we provide the Expression class, that overloads these operators in a clean way. It should be possible to use the Expression class in Piranha, for example. However, it's another layer to worry about, so we don't use it in the core, since we care about speed, so we try to keep things we don't need out of the core.
Ralf Stephan
@rwst
Nov 09 2015 15:11
@bluescarni @certik , would it be difficult to add the univariate case as special case to piranha?
Ondřej Čertík
@certik
Nov 09 2015 15:11
@rwst I think we definitely have to have it as a special case in SymEngine.
I don't mind whether from Piranha, or Flint, or our own implementation.
I thought the sparse representation in Piranha is superior in all cases, and it indeed does seem to be superior if you have multivariate polynomials. You can construct special cases where it is not superior, but it will not lose by much. However, the univariate case that you showed @rwst, the sparse representation in Piranha is a lot worse. So I think the benefit of having univariate polynomials as a special case is too big to ignore.
harakas
@harakas
Nov 09 2015 15:14
@certik thanks for the explanation.
Ondřej Čertík
@certik
Nov 09 2015 15:15
@harakas no problem. Let us know if you have some ideas how things can be perhaps implemented in a better way. The above are the reasons why we designed it the way we did, but perhaps there is a better way.
@harakas ah, as to the checks for matrices ---- so for speed reasons we just use Debug time asserts in C++, that are disabled in Release mode. The philosophy of it is as follows ---- you should never ever trigger the assert in Debug mode, if you do, it needs to be fixed somehow. So as a special case, you compile the Python wrappers with symengine in Debug mode, and if you trigger the Debug time assert, it means that the Python wrappers are missing a check for the dimensions. So once you can never trigger the Debug time assert, then in Release mode it will also always work.
In other words, things that you can change at runtime need to be checked at runtime, as you said.
Ralf Stephan
@rwst
Nov 09 2015 15:20
@certik , @bluescarni pointed me to https://github.com/fredrik-johansson/bland which implements recursive rings on top of Flint. However, their documentation is not accessible and I'm still fighting with understanding the code, which promises multivar polynomials with Flint.
Ondřej Čertík
@certik
Nov 09 2015 15:30
@rwst I've heard about bland, though I think it's a bit older code. I thought they have some kind of multivariate polynomials in Nemo.
@rwst do you know what data structures Flint uses for the univariate case?
Ralf Stephan
@rwst
Nov 09 2015 15:37
@certik No if you don't mean the coefficients which can be fmpz_t, fmpq_t and a lot of other ones.
Exponents are always integer.
harakas
@harakas
Nov 09 2015 16:02
@isuruf I got my replacer working. well. not the c2py function, but I can run it through a terrible hack. it could be a cython bug. then again by my experience so far today, cython has terrible error messages.
Isuru Fernando
@isuruf
Nov 09 2015 16:08
That's great. If you find any performance issues, send us some benchmarks. We have few synthetic benchmarks but we are always interested in benchmarks in real applications.
Ralf Stephan
@rwst
Nov 09 2015 16:52
@certik Nemo is impressive speedwise. They will be yet another number theory package however. I'm more interested in expressions.
Ondřej Čertík
@certik
Nov 09 2015 16:58
@rwst yes, the goal of symengine is expressions, that's our main emphasis. But these polynomials are also very useful, so I want to have something that's competitive in SymEngine as well, preferably just calling other libraries like Flint or Piranha.
@isuruf with regards to #651, why doesn't piranha::integer allow conversion from an integer?
I know the Teuchos::Ptr also doesn't allow implicit conversions from a pointer, you have to be explicit.
Isuru Fernando
@isuruf
Nov 09 2015 17:00
It does allow conversion, but it is explicit. I asked @bluescarni about it in piranha chatroom.
Ondřej Čertík
@certik
Nov 09 2015 17:00
The motivation there is I think to avoid all kinds of subtle bugs, I think to accidentally converting a pointer that you don't want to convert.
Piranha's motivations are the same
Ondřej Čertík
@certik
Nov 09 2015 17:02
I think that's a good design.
We should probably write a simple integer class ourselves, that would show the API that symengine expects from an integer class.
@isuruf another thing that occurred to me is that maybe we can use Expression in the tests, since it's much more readable to use +, *, /, .. than add(mul(div.... As long as Expression is never used in the core itself, then it shouldn't matter.
Isuru Fernando
@isuruf
Nov 09 2015 17:10
@certik, yes. I worked on implicit conversion from Expression to RCP<const Basic>. see #657
One more thing we could do is add << for RCP<const Basic>, so that std::cout << sin(Expression(1)); works
Ondřej Čertík
@certik
Nov 09 2015 17:12
I used to overload <<, but it is already overloaded in Teuchos.
And I didn't figure out a way without changing Teuchos source code.
So you still have to print using std::cout << *sin(Expression(1))
Or assign it to an Expression variable first.
Isuru Fernando
@isuruf
Nov 09 2015 17:14
Yes
Francesco Biscani
@bluescarni
Nov 09 2015 19:08
@certik @rwst I could definitely look into adding a univariate specialisation in Piranha for that. I am wondering though what is the use case scenario for these expansions? What I mean is that if the goal is to do frequently univariate series expansions and not use them further, then the optimisation would make sense. On the other hand, if they are the starting blocks of longer computations that turn out to be multivariate, then we would be optimising an increasingly small part of the workload . I would like to get a sense of how series expansions fit in the larger scheme
Francesco Biscani
@bluescarni
Nov 09 2015 19:14
@certik @rwst Regarding Piranha's overhead for small operands, it arises from different things (but I have no idea quantitatively how much each of these contributes):
  • hash set vs plain vector for the representation of polynomials - off the bat this will introduce a constant time multiplier for each and every operation
Ondřej Čertík
@certik
Nov 09 2015 19:16
@bluescarni that is indeed a very good question. I don't have that many applications in mind at the moment for univariate cases. But it is that you just have a general univariate expression like sin(x)*cos(x) / tan(x) and you just want to know the series. You are right, that once you start doing things like sin(k*x), then you are in the multivariate case.
Francesco Biscani
@bluescarni
Nov 09 2015 19:16
  • multithreading overhead: each time a poly multiplication is performed, there some locking/unlocking going to decide how many threads to use, this might become noticeable in small operations
  • estimation of the result size: Piranha operates on the assumption that the polynomials are sparse, and it cannot know in advance how many terms will be in the result of a poly multiplication. It uses a statistical approach for that which has an overhead that increases with the smallness of the operands
It might as well be that for very small operands this estimation overhead dominates the actual multiplication
Ondřej Čertík
@certik
Nov 09 2015 19:19
@isuruf, as we discussed in #657, you can get the right << behavior by overloading it for Expression and then simply overload sin for Expression as well. And disable all implicit conversions. I really like that approach, it's very clean.
@bluescarni how do you feel about implicit conversion of things like ints (Expression e = 1)?
Francesco Biscani
@bluescarni
Nov 09 2015 19:19
These are at least the things that come to mind immediately
it's unfortunate that implicit conversion/construction is needed for that to work, but the price to pay for that syntactic niceness is too high IMO
I'd rather write Expression e{1}
Ondřej Čertík
@certik
Nov 09 2015 19:20
What is the price to pay for implicit conversion from ints?
mpz_class does it.
Francesco Biscani
@bluescarni
Nov 09 2015 19:21
it's the general extra reasoning and complexity that thinking about implicit conversions introduces
Ondřej Čertík
@certik
Nov 09 2015 19:21
I really don't like implicit conversions between Expression and RCP<const Basic>, it's opening a can of worms that is just too difficult to reason about.
Francesco Biscani
@bluescarni
Nov 09 2015 19:22
well if you restrict the the implicit conversion from int it might not be too bad
Ondřej Čertík
@certik
Nov 09 2015 19:22
If you just define a constructor from int, then you can use Expression e{1}, but that's not implicit conversion.
how exactly do you define implicit conversion?
It seems that implicit conversion from Expression is done using operator const RCP<const Basic> &()
Francesco Biscani
@bluescarni
Nov 09 2015 19:23
well to be clear, you have implicit conversion and implicit construction which interact together... in order to write Expression e = 1; you need to have a non-explicit constructor from int
if you mark your constructor explicit, that syntax will not work
Ondřej Čertík
@certik
Nov 09 2015 19:24
Ah that's right.
Francesco Biscani
@bluescarni
Nov 09 2015 19:24
Expression e{1}; will still work though
Ondřej Čertík
@certik
Nov 09 2015 19:24
So is the rule that you follow, that if you have a constructor from another type, you always mark it using the explicitkeyword?
Francesco Biscani
@bluescarni
Nov 09 2015 19:25
yes... all constructors apart from the move/default/copy ones I always mark explicit
Ondřej Čertík
@certik
Nov 09 2015 19:25
ok.
So I think we should document it.
The painful thing is that things like Expression e{1} work, but std::vector<Expression> v = {1, 2, 3} might not.
Francesco Biscani
@bluescarni
Nov 09 2015 19:26
I like to be warned by the compiler when I am not using a function with the correct parameter :) I just like it in general, but it makes syntax a bit less nice
yep
Ondřej Čertík
@certik
Nov 09 2015 19:27
That sucks.
Francesco Biscani
@bluescarni
Nov 09 2015 19:27
it does... so if you really care about that you might want to provide the implicit constructor
the whole initializer_list has been a bit of a fiasco and a missed opportunity IMO
Ondřej Čertík
@certik
Nov 09 2015 19:28
Fortran is the same way, if I have a real array, I cannot just use real(dp), parameter :: a(3) = [1.5_dp, 2, 3], I have to use real(dp), parameter :: a(3) = [1.5_dp, 2._dp, 3._dp], for some reason it won't convert the integer to real. But I can do real(dp), parameter :: b = 1.
@bluescarni but in the current code in symengine, isn't this implicit: https://github.com/symengine/symengine/blob/e5660d01b62e9f68e2e865dae83d8a453a5f2bdb/symengine/expression.h#L39 ?
Ralf Stephan
@rwst
Nov 09 2015 19:30
@bluescarni I'm personally committed now to using piranha for the first implementation of expansions. Whether a specialisation is added in piranha or in symengine later is of lesser priority to me, also because with piranha alone this would resolve the most pressing problems in SymPy.
Francesco Biscani
@bluescarni
Nov 09 2015 19:32
yes they are implicit
mandating explicit-ness can be seen as controversial, didn't want to force it into another project... I just tried to adopt the SymEngine style when I wrote that initially
brb dinner time :)
Ondřej Čertík
@certik
Nov 09 2015 19:39
@bluescarni I see. I think we use implicit constructors sometimes, but for the Expression I think we should definitely disable all implicit conversions, at least initially.
@rwst, indeed, that was my conclusion as well, since with the multivariate benchmarks Piranha is either by far the fastest, or very close to be the fastest.
I didn't realize that there is a gap for univariate polynomials.
Indeed, it is a specialization that can be done later.
Ondřej Čertík
@certik
Nov 09 2015 21:05
@bluescarni one more question --- do you always use nothrow move constructor and assignment?
I.e. if you can you always define them as nothrow?
Francesco Biscani
@bluescarni
Nov 09 2015 21:47
yes normally that's the case
usually it's either a pointer swap or an operation equivalent to memcpy
so nothing that can throw
and defining them noexcept enables certain optimisations, e.g. in std::vector if I recall correctly
I need to run for a while, I will be back later to read the chat
harakas
@harakas
Nov 09 2015 22:02
@isuruf I decided to fork my own branch as extending existing sutff with Cython is not feasible due to Cython bugs and inflexibility. The replacer, or as I call it the Dict now, sped up my code by 6 times. Converting from python dictionary for each subs() was apparently very inefficient. Here's what I did: symengine/symengine.py@45f2782
Ondřej Čertík
@certik
Nov 09 2015 22:44
@harakas yes, I've hit similar issues when dealing with const and templates in Cython: http://grokbase.com/t/gg/cython-users/144f36bmre/vector-rcp-const-basic-doesnt-compile
@harakas what exactly is the bottleneck in the current code that your patch improves? I see you introduced your own Dict, and then use it in the subs_dict method.
Is the idea that the current wrapper takes Python dict, and then repeatedly calls subs on each element in the dict, as opposed to first converting the dict to a C++ dictionary (map_basic_basic) and just calling subs_dict on this dictionary?
If so, then the current wrapper is broken, and we need to fix it.
Ondřej Čertík
@certik
Nov 09 2015 22:50
I like the way you did it --- the way I see the fix is that we should allow the user to iteratively construct the C++ dictionary, and then pass it to subs_dict, exactly as you did it. This has the advantage that you already have the C++ dictionary constructed, so you can now call subs_dict several times without an overhead. We should also allow the subs_dict to be called with a Python dictionary, and convert it to a C++ dictionary (i.e. your Dict) first, before calling subs_dict.
Either way, the philosophy is to construct all these data structures in C++, and then just call a C++ method or a function. The Python wrappers need to allow that.
harakas
@harakas
Nov 09 2015 22:55
@certik in my application I'm calling subs() repeatedly with a very large key-value set. as is, the subs() constructs the c++ dictionary every call from scratch. which is obviously very inefficient.
Ondřej Čertík
@certik
Nov 09 2015 22:56
Indeed, I think the solution is to expose a C++ dictionary as a type Dict and allow the subs to take either Python dict or our own Dict, exactly as you did.
Do you want to submit it as a PR? So that we can polish it and get it in.
harakas
@harakas
Nov 09 2015 22:59
I don't know how that works. I've made other changes as well, some of which you may not want, and some of which are very hackish.
Ondřej Čertík
@certik
Nov 09 2015 23:00
In your application, do you construct the Dict once and then not touch it, or do you need to frequently update it?
harakas
@harakas
Nov 09 2015 23:01
I'm updating it frequently. basically adding stuff as I process the equations. and I query it myself as well.
Ondřej Čertík
@certik
Nov 09 2015 23:01
Yes, then your solution looks optimal to me.
The way I see it is that we need to expose the C++ type map_basic_basic into Python.
@harakas just submit code that you think would benefit others as a pull request. And either we can merge it if it looks good, or we can help polish it.
I would probably follow the Python method names for the Dict. Perhaps we should call it DictBasic since both keys and values must be of type Basic.
Also you should provide a method that takes Python dictionary and converts it to DictBasic.
then in subs_dict, you check if the argument is of type DictBasic, if so, use it directly, otherwise convert from a Python dictionary first to DictBasic, then use it.
Let me know what you think of this design.