These are chat archives for elemental/chat

29th
Nov 2016
Ryan H. Lewis
@rhl-
Nov 29 2016 03:09
i've reverted the changes to fix the copr builds. I'm trying to reconcile having one spec file for builds off master and builds off release
github doesn't have consistent archive names
there are different ways to skin the cat, not sure yet of any that are reasonable
i dont want to maintain two spec files..
Ryan H. Lewis
@rhl-
Nov 29 2016 03:27
looking into Elemental on PPC now.
Jack Poulson
@poulson
Nov 29 2016 03:45
I honestly haven't spent the time to look into the COPR builds yet
trying to focus on getting http://libelemental.org/documentation/dev/tour.html and the underlying code in order
Ryan H. Lewis
@rhl-
Nov 29 2016 04:02
@poulson no i just fixed them :)
I am now looking into fixing up the package some more so that I can build the releases/master using the same file
Jack Poulson
@poulson
Nov 29 2016 04:09
ah, nice
Ryan H. Lewis
@rhl-
Nov 29 2016 04:11
i do need to look into why the PPC builds fail
Ryan H. Lewis
@rhl-
Nov 29 2016 05:20
@poulson is it possible that the PPC failures are due to a missing operator<< for MPC/MPFR types ? https://copr-be.cloud.fedoraproject.org/results/rhl/elemental/fedora-rawhide-ppc64le/00482253-elemental/build.log.gz
Ryan H. Lewis
@rhl-
Nov 29 2016 05:25
where does EL_HAVE_MPC get defined
yeah, something is screwy here
-- GMP version .. found in /usr/include, but at least version 6.0.0 is required
-- Could NOT find GMP (missing: GMP_VERSION_OK) (Required is at least version "6.0.0")
DEBUG util.py:421: gmp-c++ ppc64le 1:6.1.1-1.fc25 fedora 30 k
DEBUG util.py:421: gmp-devel ppc64le 1:6.1.1-1.fc25 fedora 185 k
that was what was installed
version 6.1.1
Ryan H. Lewis
@rhl-
Nov 29 2016 05:36
it looks like the headers no longer contain the version strings like you think they do
the version numbers are now defined in #included files which are architecture specific
Ryan H. Lewis
@rhl-
Nov 29 2016 05:53
i think you can use: gcc -dM -E /usr/include/gmp.h | grep __GNU_MP_VERSION to get the versions properly
but, im not sure how to do this in cmake atm
[rhl@a1467cb589ed build]$ gcc -dM -E /usr/include/gmp.h | grep __GNU_MP_VERSION | grep -v RELEASE
#define __GNU_MP_VERSION_PATCHLEVEL 1
#define __GNU_MP_VERSION_MINOR 1
#define __GNU_MP_VERSION 6
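Note: the three `#define` lines above are enough to recover the full version. Below is a minimal sketch, in C++ rather than CMake, of turning that kind of preprocessor dump into numbers. The names `GmpVersion`, `ParseGmpVersionMacros`, and `PackedRelease` are hypothetical and are not part of Elemental or GMP; the packing formula is just one convenient choice for comparisons.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical holder for the three components; -1 means "not found".
struct GmpVersion {
    int major_version = -1;
    int minor_version = -1;
    int patch_level = -1;
};

// Scan text such as the output of `gcc -dM -E gmp.h` for lines of the form
// "#define __GNU_MP_VERSION_MINOR 1". Lines whose value is not a plain
// integer (for example the __GNU_MP_RELEASE expression) are skipped.
GmpVersion ParseGmpVersionMacros(const std::string& dump) {
    GmpVersion v;
    std::istringstream lines(dump);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string directive, name;
        int value = 0;
        if (!(fields >> directive >> name >> value) || directive != "#define") {
            continue;
        }
        if (name == "__GNU_MP_VERSION") {
            v.major_version = value;
        } else if (name == "__GNU_MP_VERSION_MINOR") {
            v.minor_version = value;
        } else if (name == "__GNU_MP_VERSION_PATCHLEVEL") {
            v.patch_level = value;
        }
    }
    return v;
}

// Pack the components into one integer so that versions compare with <, >=.
int PackedRelease(const GmpVersion& v) {
    assert(v.major_version >= 0 && v.minor_version >= 0 && v.patch_level >= 0);
    return v.major_version * 10000 + v.minor_version * 100 + v.patch_level;
}
```

With the dump shown above, a check against the required 6.0.0 would then be `PackedRelease(v) >= 60000`.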
Ryan H. Lewis
@rhl-
Nov 29 2016 06:11
-  file(READ "${GMP_INCLUDES}/gmp.h" _gmp_version_header)
+  execute_process( COMMAND ${CMAKE_CXX_COMPILER} -dM -E "${GMP_INCLUDES}/gmp.h"
+                  COMMAND grep __GNU_MP_VERSION 
+                  COMMAND grep -v RELEASE
+                  OUTPUT_VARIABLE _gmp_version_header)
seems to fix it
Jack Poulson
@poulson
Nov 29 2016 06:40
The more portable approach seems to be a call to mpfr_version: http://stackoverflow.com/questions/7469182/how-to-check-the-version-of-gmp-mpfr-and-camlidl
I mean, mpfr_get_version
Ryan H. Lewis
@rhl-
Nov 29 2016 07:11
i dont see that binary
Ryan H. Lewis
@rhl-
Nov 29 2016 07:25
it looks like my solution captures gcc, clang, and icc
at least
Jack Poulson
@poulson
Nov 29 2016 15:30
It is a function
There is also a __gmp_version character string
It is important that we grab more than just the major version number
it seems Visual Studio does not provide a way to query the preprocessor directives
or, at least, didn't circa 2011
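Note: to illustrate why the minor and patch numbers matter, here is a minimal sketch of comparing a dotted version string (such as the one a version-string query would return) against a required minimum. The helpers `ParseVersion` and `AtLeast` are hypothetical names, not part of GMP, MPFR, or Elemental, and the sketch assumes the string starts with at least `major.minor`.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <tuple>

using Version = std::tuple<int, int, int>;

// Parse "6.1.1" or "6.1" into (major, minor, patch); a missing patch is 0.
// Returns false if fewer than two numeric components are present.
bool ParseVersion(const std::string& text, Version& out) {
    int a = 0, b = 0, c = 0;
    const int matched = std::sscanf(text.c_str(), "%d.%d.%d", &a, &b, &c);
    if (matched < 2) {
        return false;
    }
    out = std::make_tuple(a, b, c);
    return true;
}

// True when `found` is at least `required`, comparing all three components
// lexicographically rather than the major number alone.
bool AtLeast(const std::string& found, const std::string& required) {
    Version f, r;
    if (!ParseVersion(found, f) || !ParseVersion(required, r)) {
        return false;
    }
    return f >= r;
}

// Small usage example.
void Demo() {
    assert(AtLeast("6.1.1", "6.0.0"));
    // Same major version, but the minor version is too old.
    assert(!AtLeast("6.0.0", "6.1.0"));
}
```

A check that looked only at the major number would accept the second case in `Demo`, which is the failure mode being warned about here.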
Jack Poulson
@poulson
Nov 29 2016 16:22
@rhl- What makes you think the PPC failure is an operator << issue?
also, why did you merge?
Ryan H. Lewis
@rhl-
Nov 29 2016 16:23
Sorry! I'm trying to see if this fixes the issues with PPC
It seems related to ostream with MPC objects
Jack Poulson
@poulson
Nov 29 2016 16:23
I looked through the logs but didn't notice that
is there a particular piece of evidence for it?
Ryan H. Lewis
@rhl-
Nov 29 2016 16:24
Well just that all the logs print that MPC is misconfigured
But that doesn't seem to print on other architectures
Jack Poulson
@poulson
Nov 29 2016 16:24
ah
Ryan H. Lewis
@rhl-
Nov 29 2016 16:24
I noticed the version is no longer properly read
So I thought that could fix it
Jack Poulson
@poulson
Nov 29 2016 16:25
can you not build from a fork?
Ryan H. Lewis
@rhl-
Nov 29 2016 16:25
Nope
Jack Poulson
@poulson
Nov 29 2016 16:25
your PR broke the minor version detection
Ryan H. Lewis
@rhl-
Nov 29 2016 16:25
Ugh. Really?
Ok I'll revert
Jack Poulson
@poulson
Nov 29 2016 16:26
I'm not positive
but I thought it was only checking the major version
maybe it is fine
haven't had time to check yet...
Ryan H. Lewis
@rhl-
Nov 29 2016 16:26
I reverted it
No it checks all the versions
Jack Poulson
@poulson
Nov 29 2016 16:27
it definitely breaks xlc, pgcc, and MSVC
it would be nice to not break MSVC
Ryan H. Lewis
@rhl-
Nov 29 2016 16:27
Yeah we need CI for all of that
Jack Poulson
@poulson
Nov 29 2016 16:27
MSVC isn't completely functional ATM but it would be nice to not throw up more roadblocks
Ryan H. Lewis
@rhl-
Nov 29 2016 16:30
maybe we just shouldn’t enforce the GMP version
Jack Poulson
@poulson
Nov 29 2016 16:30
so GMP/MPFR/MPC should not at all be affecting the PPC failures, as the fact that GMP wasn't properly detected means that no GMP/MPFR/MPC is used in the tests
Ryan H. Lewis
@rhl-
Nov 29 2016 16:30
it looks like it may be getting used
Jack Poulson
@poulson
Nov 29 2016 16:30
the EL_HAVE_MPC directive shouldn't end up defined and the El::BigFloat tests shouldn't run
Ryan H. Lewis
@rhl-
Nov 29 2016 16:31
let me find the bit of the log
unrelated: 72: ERROR (qd_real::log): Non-positive argument <— shows up all over the place
Jack Poulson
@poulson
Nov 29 2016 16:31
QD is not related to GMP
Ryan H. Lewis
@rhl-
Nov 29 2016 16:32
oh, hm. thats what I thought it was
ok, bummer
well, GMP is still not being properly detected
Jack Poulson
@poulson
Nov 29 2016 16:32
QD is David Bailey et al.'s double-double and quad-double package
GMP is arbitrary-precision
Ryan H. Lewis
@rhl-
Nov 29 2016 16:32
right
Jack Poulson
@poulson
Nov 29 2016 16:33
QD is kind of fiddly
it's likely a QD issue on PPC
Ryan H. Lewis
@rhl-
Nov 29 2016 16:33
is there an upstream github or something
bug page
Jack Poulson
@poulson
Nov 29 2016 16:34
LOL
it is currently just hung off of Bailey's page
Ryan H. Lewis
@rhl-
Nov 29 2016 16:34
take that as no
Jack Poulson
@poulson
Nov 29 2016 16:35
but there should be one
I might have forked it on GitHub
I'm pretty sure no one is actively maintaining it, but perhaps I'm not giving Bailey enough credit
the QD readme seems to explicitly discuss PPC
so it must have worked at some point
Ryan H. Lewis
@rhl-
Nov 29 2016 16:37
I've asked someone in the Fedora community for access to a PPC machine
Ryan H. Lewis
@rhl-
Nov 29 2016 16:38
ostream &operator<<(ostream &os, const qd_real &qd) {
  bool showpos = (os.flags() & ios_base::showpos) != 0;
  bool uppercase = (os.flags() & ios_base::uppercase) != 0;
  return os << qd.to_string(os.precision(), os.width(), os.flags(), 
      showpos, uppercase, os.fill());
}
looks questionable
it should be os << __ ; return os;
also the os << ; needs to be separated out
like os << f(x) is probably ambiguous.
the latter is almost certainly the problem
Jack Poulson
@poulson
Nov 29 2016 16:41
huh? ostream::operator<< returns a reference to the ostream: http://en.cppreference.com/w/cpp/io/basic_ostream/operator_ltlt
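Note: on the question of whether `return os << f(x);` is a valid way to write an inserter, here is a minimal self-contained sketch. The type `Tagged` and the helper `Render` are hypothetical stand-ins and have nothing to do with QD's actual code; the sketch only shows that the pattern compiles, chains, and produces the expected text.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a numeric type that formats itself to a string.
struct Tagged {
    int value;
    std::string ToString() const { return "<" + std::to_string(value) + ">"; }
};

// The string is fully built before it is inserted, and inserting a
// std::string yields a reference to the same stream, so returning the
// expression behaves like `os << t.ToString(); return os;`.
std::ostream& operator<<(std::ostream& os, const Tagged& t) {
    return os << t.ToString();
}

// Chain two insertions to show the returned stream is usable.
std::string Render(const Tagged& a, const Tagged& b) {
    std::ostringstream os;
    os << a << " " << b;
    return os.str();
}

// Small usage example.
void Demo() {
    assert(Render(Tagged{1}, Tagged{2}) == "<1> <2>");
}
```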
Ryan H. Lewis
@rhl-
Nov 29 2016 16:41
there are issues with order of operations and the operator << when calling functions
maybe I didn’t provide the right link
Jack Poulson
@poulson
Nov 29 2016 16:42
the return definitely happens after everything else
Ryan H. Lewis
@rhl-
Nov 29 2016 16:42
yeah the return is probably ok
but
the input to os << may not be what is expected
Jack Poulson
@poulson
Nov 29 2016 16:43
what I meant by "QD is fiddly" is that it relies on precise floating-point properties of double
that only has to do with printing
Ryan H. Lewis
@rhl-
Nov 29 2016 16:44
yeah, well, the issue we are having is with the ostream operators
says the stacktrace
Jack Poulson
@poulson
Nov 29 2016 16:44
how do you know that?
ah
isn't the qd_real::log issue unrelated?
Ryan H. Lewis
@rhl-
Nov 29 2016 16:44
yes, i said it was unrelated
i mean, i dont know, maybe qd is totally f’d on PPC
Jack Poulson
@poulson
Nov 29 2016 16:53
is there a way to just disable the PPC build in the meantime?
or try to disable QD in the PPC build and see if everything goes through?
Ryan H. Lewis
@rhl-
Nov 29 2016 17:02
uh, I can try to disable QD for PPC
i asked about PPC; essentially, it's encouraged to fix it
i have to file bugs and stuff against Elemental if PPC fails
and maybe jump through more hoops
I think it would be better to fix it now
we could just disable QD everywhere
and see if it fixes it
Ryan H. Lewis
@rhl-
Nov 29 2016 17:09
@poulson separately, im an advocate of not checking the GMP version
and just taking whats there
since its too complicated to properly check the version string.
Jack Poulson
@poulson
Nov 29 2016 17:10
this is honestly pretty standard fare for build systems
Ryan H. Lewis
@rhl-
Nov 29 2016 17:11
checking the version?
Jack Poulson
@poulson
Nov 29 2016 17:11
the autoconf approach is to check for a particular routine that only exists in the sufficiently new version
which is pretty easy
Ryan H. Lewis
@rhl-
Nov 29 2016 17:11
sure, thats fine
but, you are grepping a header file
Jack Poulson
@poulson
Nov 29 2016 17:11
yes, assuming that works then it is fine
I was not aware of them changing their approach
have to run now, but the change should have been documented in a release
any idea in which release it changed?
Ryan H. Lewis
@rhl-
Nov 29 2016 17:13
looks like between 6.0.0 (required) and 6.1.1
Ryan H. Lewis
@rhl-
Nov 29 2016 19:37
the folks at fedora-ppc hooked me up with a fedora ppc little endian VM
they also offered a big endian VM
Ryan H. Lewis
@rhl-
Nov 29 2016 20:58
ok and finally a stack trace
(gdb) bt
#0  0x00003fffb553d554 in __memcpy_power7 () from /lib64/libc.so.6
#1  0x00003fffb583a5f4 in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) () from /lib64/libstdc++.so.6
#2  0x00003fffb5825cd4 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) ()
   from /lib64/libstdc++.so.6
#3  0x000000002000ea4c in std::operator<< <std::char_traits<char> > (__s=0x2001eea8 "  runtime: ", __out=...) at /usr/include/c++/6.2.1/ostream:561
#4  El::BuildStream<char [12], double, char [9]> (item=..., os=...) at /home/fedora/rpmbuild/BUILD/Elemental-master/include/El/core/environment/impl.hpp:167
#5  El::Output<char [12], double, char [9]> () at /home/fedora/rpmbuild/BUILD/Elemental-master/include/El/core/environment/impl.hpp:227
#6  main (argc=<optimized out>, argv=<optimized out>) at /home/fedora/rpmbuild/BUILD/Elemental-master/examples/number_theory/ZDependenceSearch.cpp:86
weird
Output("  runtime: ",runtime," seconds");
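Note: frames #4 and #5 of the trace are variadic templates instantiated for `char [12], double, char [9]`. For anyone trying to build a standalone test case, here is a minimal sketch of a helper with that general shape. It assumes nothing about Elemental's actual implementation; `BuildStreamSketch` and `FormatSketch` are hypothetical names, and the sketch returns a string instead of printing so the result can be checked.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Base case: nothing left to insert.
inline void BuildStreamSketch(std::ostringstream&) {}

// Insert the first item, then recurse on the remaining ones.
template <typename T, typename... Rest>
void BuildStreamSketch(std::ostringstream& os, const T& item, const Rest&... rest) {
    os << item;
    BuildStreamSketch(os, rest...);
}

// Collect all arguments into one string.
template <typename... Args>
std::string FormatSketch(const Args&... args) {
    std::ostringstream os;
    BuildStreamSketch(os, args...);
    return os.str();
}

// Small usage example with the same argument types as in the trace:
// an 11-character literal (char [12]), a double, and an 8-character
// literal (char [9]).
void Demo() {
    assert(FormatSketch("  runtime: ", 1.5, " seconds") == "  runtime: 1.5 seconds");
}
```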
Ryan H. Lewis
@rhl-
Nov 29 2016 21:27
oh I see
wow what a bug
Ryan H. Lewis
@rhl-
Nov 29 2016 21:38
hm. cant reproduce this with a small test case. looks like it may be some kind of subtle bug in ZDependenceSearch causing some memory corruption
Ryan H. Lewis
@rhl-
Nov 29 2016 22:01
here is the output of valgrind: https://paste.fedoraproject.org/493465/14804569/
Ryan H. Lewis
@rhl-
Nov 29 2016 22:38
I replaced all but one of the calls to Output(_) with std::cout in ZDependenceSearch, and there is no longer a segmentation fault
rerunning valgrind now.
the output of valgrind is now: https://paste.fedoraproject.org/493505/59216148/
Ryan H. Lewis
@rhl-
Nov 29 2016 22:49
i cant seem to produce a minimal test case. They all work.
ugh.