These are chat archives for dereneaton/ipyrad

13th Nov 2017
tommydevitt
@tommydevitt
Nov 13 2017 03:16 UTC
@isaacovercast Yeah, tried it a few times with the same result
Deren Eaton
@dereneaton
Nov 13 2017 14:53 UTC
I'm also guessing github service was down temporarily
tommydevitt
@tommydevitt
Nov 13 2017 15:35 UTC
just tried again, same result.
tommydevitt
@tommydevitt
Nov 13 2017 15:46 UTC
some kind of memory issue?
Deren Eaton
@dereneaton
Nov 13 2017 19:03 UTC
hey @tommydevitt , I see, yeah it seems to be a git issue (https://stackoverflow.com/questions/9905257/git-push-fatal-unable-to-create-thread-resource-temporarily-unavailable/9905822). Is this happening on a normal laptop/workstation or on a HPC cluster? Maybe you can try setting the number of threads used by git to be 1 and try again. Are you able to clone other repositories?
The ipyrad repository is kind of big because we keep a lot of test notebooks saved in it, but it's not abnormally large by any means, so I wouldn't think this is failing due to something particular about ipyrad.
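For reference, the settings from that stackoverflow answer look like this (the memory values are just the examples given there, not tuned):

git config --global pack.threads "1"
git config --global pack.windowMemory "100m"
git config --global pack.packSizeLimit "100m"

Then try the clone again.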
Ninh Vu
@NinhVu
Nov 13 2017 19:26 UTC
Installed ipyrad via conda and attempted to generate a parameter text file, and got the error below. I updated ipyrad to the latest version, 0.6.27, but no luck. I don't have a good guess at the problem; I suspect it is something about my ubuntu setup.
efglserv@eglseqserv:/media/efglserv/radspace/lakeTrout/ipyRADtest$ ipyrad -p testin123_params.txt
Traceback (most recent call last):
File "/home/efglserv/anaconda3/bin/ipyrad", line 6, in <module>
from pkg_resources import load_entry_point
File "/home/efglserv/anaconda3/lib/python3.6/site-packages/pkg_resources/init.py", line 3138, in <module>
@_call_aside
File "/home/efglserv/anaconda3/lib/python3.6/site-packages/pkg_resources/init.py", line 3122, in _call_aside
f(args, *kwargs)
File "/home/efglserv/anaconda3/lib/python3.6/site-packages/pkg_resources/init.py", line 3151, in _initialize_master_working_set
working_set = WorkingSet._build_master()
File "/home/efglserv/anaconda3/lib/python3.6/site-packages/pkg_resources/init.py", line 664, in _build_master
ws.require(requires)
File "/home/efglserv/anaconda3/lib/python3.6/site-packages/pkg_resources/init.py", line 981, in require
needed = self.resolve(parse_requirements(requirements))
File "/home/efglserv/anaconda3/lib/python3.6/site-packages/pkg_resources/init.py", line 867, in resolve
raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'ipyrad==0.6.27' distribution was not found and is required by the application
Deren Eaton
@dereneaton
Nov 13 2017 19:27 UTC
Hi @NinhVu, you are trying to install ipyrad into a Python 3 environment, but ipyrad currently only supports Python 2. You can fix this by creating a new Python 2 conda environment and then installing ipyrad into that.
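A minimal sketch of what I mean (py27 is an arbitrary environment name; this assumes the usual conda install route for ipyrad):

conda create -n py27 python=2.7
source activate py27
conda install -c ipyrad ipyrad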
tommydevitt
@tommydevitt
Nov 13 2017 19:33 UTC

Thanks @dereneaton. One of our HPC system admins was able to clone it and put it in my directory; not sure why I was unable to. But when I tried

conda install --use-local bpp

I got

Fetching package metadata ....
WARNING: The remote server could not find the noarch directory for the
requested channel with url: file:///home1/02745/tdevitt/miniconda2/conda-bld

It is possible you have given conda an invalid channel. Please double-check
your conda configuration using `conda config --show`.

If the requested url is in fact a valid conda channel, please request that the
channel administrator create `noarch/repodata.json` and associated
`noarch/repodata.json.bz2` files, even if `noarch/repodata.json` is empty.
$ mkdir noarch
$ echo '{}' > noarch/repodata.json
$ bzip2 -k noarch/repodata.json
...............

PackageNotFoundError: Packages missing in current channels:

  - bpp

We have searched for the packages in the following channels:

  - file:///home1/02745/tdevitt/miniconda2/conda-bld/linux-64
  - file:///home1/02745/tdevitt/miniconda2/conda-bld/noarch
  - https://conda.anaconda.org/bioconda/linux-64
  - https://conda.anaconda.org/bioconda/noarch
  - https://conda.anaconda.org/conda-forge/linux-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.continuum.io/pkgs/main/linux-64
  - https://repo.continuum.io/pkgs/main/noarch
  - https://repo.continuum.io/pkgs/free/linux-64
  - https://repo.continuum.io/pkgs/free/noarch
  - https://repo.continuum.io/pkgs/r/linux-64
  - https://repo.continuum.io/pkgs/r/noarch
  - https://repo.continuum.io/pkgs/pro/linux-64
  - https://repo.continuum.io/pkgs/pro/noarch
  - https://conda.anaconda.org/r/linux-64
  - https://conda.anaconda.org/r/noarch
Ninh Vu
@NinhVu
Nov 13 2017 19:35 UTC
@dereneaton Thank you Deren!
Deren Eaton
@dereneaton
Nov 13 2017 19:40 UTC
nvm
hmm
The bpp recipe in the ipyrad channel does not ship a compiled binary; instead it compiles bpp on the system with the following commands:
gcc -o bpp -O3 bpp.c tools.c -lm
gcc -o bpp_avx -O3 -DUSE_AVX -mavx bpp.c tools.c -lm
gcc -o MCcoal -DSIMULATION bpp.c tools.c -lm
In the README file of the v4.0 release there is some more information about compiling, which is the following:

(1) To compile, try one of the following

   Linux/UNIX gcc compiler:
      gcc -o bpp -O3 bpp.c tools.c -lm
      gcc -o MCcoal -DSIMULATION bpp.c tools.c -lm
      gcc -o bpp_sse -O3 -DUSE_SSE -msse3 bpp.c tools.c -lm
      gcc -o bpp_avx -O3 -DUSE_AVX -mavx bpp.c tools.c -lm

   INTEL icc compiler:
      icc -o bpp -fast bpp.c tools.c -lm
      icc -o MCcoal -DSIMULATION -fast bpp.c tools.c -lm

   MAC OSX intel:
      cc -o bpp -O3 bpp.c tools.c -lm
      cc -o MCcoal -DSIMULATION -O3 bpp.c tools.c -lm
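Whichever set of commands works, the compiled binary then just needs to end up somewhere on your PATH, e.g. something like this (the directory choice is arbitrary):

gcc -o bpp -O3 bpp.c tools.c -lm
mkdir -p ~/bin && cp bpp ~/bin/
export PATH="$HOME/bin:$PATH"    # or put this line in your ~/.bashrc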
Deren Eaton
@dereneaton
Nov 13 2017 19:45 UTC
I'm not sure if the different compilation methods would matter, though. If it isn't compatible with an old version of GLIBC and won't compile, then that's a bpp problem, I suppose.
It might be easier to download the source code and install it locally than to install it with the existing conda recipe.
Or, if there is a system-compiled version on your cluster, then you could just load that module and still use the ipyrad.analysis tools. It just needs to be able to find a functioning bpp binary.
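Something like this, though the module name is hypothetical and will depend on your cluster:

module avail 2>&1 | grep -i bpp    # check whether a bpp module exists
module load bpp                    # hypothetical module name
which bpp                          # the analysis tools just need to find this binary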
tommydevitt
@tommydevitt
Nov 13 2017 19:52 UTC
@dereneaton ok, I've installed bpp locally instead of with the existing conda recipe. I could compile bpp3.3 but not bpp4.0.
Deren Eaton
@dereneaton
Nov 13 2017 19:53 UTC
That's good, because only 3.3 is compatible with the ipyrad tools, now that I look at it. They changed the params file format in 4.0, which will require some tweaking.
tommydevitt
@tommydevitt
Nov 13 2017 19:53 UTC
@dereneaton OK, great.
tommydevitt
@tommydevitt
Nov 13 2017 20:23 UTC

@dereneaton And now, as I try to execute the jupyter notebook I created previously, I'm getting new errors where I didn't before.

conda install -c eaton-lab toytree

File "<ipython-input-7-6d8e63ed4afb>", line 3
    conda install eaton-lab toytree
                ^
SyntaxError: invalid syntax

This is after reinstalling jupyter via 'pip install jupyter'

Deren Eaton
@dereneaton
Nov 13 2017 21:44 UTC
Looks like you are typing a bash command into an ipython terminal.
Type exit to leave ipython, and then enter the command.
Or you can enter ! conda install toytree -c eaton-lab in ipython. The ! symbol runs the rest of the line as a bash command.
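So either of these should work:

exit                                  # leave ipython, then in bash:
conda install toytree -c eaton-lab

or, staying inside ipython:

! conda install toytree -c eaton-lab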
Ninh Vu
@NinhVu
Nov 13 2017 23:25 UTC
Step 1 is slow. I have 20 sorted fastq files totaling ~335 GB. The illumina title line is preserved for each read in each fastq file. I have 56 cores on a 'host compute node' and am running ipyrad in a virtual python environment (Python 2.7). Half an hour has passed and I'm only seeing 5% on loading reads. At this rate, it will be an overnight procedure. Am I being impatient? I'm not seeing where the data is being loaded. It shows that a lot of cores/threads are being used, but at less than 1% CPU.
Deren Eaton
@dereneaton
Nov 13 2017 23:36 UTC
Hi @NinhVu, if you are using the sorted_fastq entry then it is assumed that your data are already demultiplexed. Is that the case for your data? If so, all that step 1 is doing is reading the number of reads in each file. It is typically quite fast. For example, I just ran a data set with 1.3 billion reads from 102 sorted data files in ~13 minutes on a 40 core machine. It is typically limited only by the speed at which data can be read from the disk. If it is running much slower for you, then it may be that the program is not properly connecting to all the available cores. Are you running ipyrad across multiple nodes with the --MPI flag, or just on a single node/machine?
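For reference, the relevant flags look like this (params-test.txt is just a placeholder name for your params file):

ipyrad -p params-test.txt -s 1 -c 56          # one node, 56 cores
ipyrad -p params-test.txt -s 1 -c 56 --MPI    # distribute across multiple nodes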