These are chat archives for dereneaton/ipyrad

18th Sep 2017
Wind-ant
@Wind-ant
Sep 18 2017 02:22
Hi, my qsub script is like the following:
/GS01/software/bin/mpirun --mca btl openib,self \
ipcluster start --n=8 --engines=MPI --ip=* --daemonize
ipyrad -p params-c70m80.txt -s 67 -c 8 -d -f
export TMPDIR=/GS01/home/chej2td/Dataset/
ipcluster stop
However, the process is very slow and I don't know why. Do you know how to speed it up, other than increasing the number of nodes or CPUs?
bbarker505
@bbarker505
Sep 18 2017 02:55
@dereneaton Thanks Deren. Apparently I was not setting the cleaning parameters appropriately before; the program has moved on to clustering, so I think all is good!
tommydevitt
@tommydevitt
Sep 18 2017 03:48

@dereneaton @isaacovercast Hey guys, I'm still trying to use ipcluster on a multi-node MPI setup. When I try to connect to an ipyparallel cluster, I get

TimeoutError                              Traceback (most recent call last)
<ipython-input-10-2db68591f40a> in <module>()
      4 
      5 ## connect to the client
----> 6 ipyclient = ipp.Client(profile="MPI96")
      7 
      8 ## print how many engines are connected

/home1/02745/tdevitt/miniconda2/lib/python2.7/site-packages/ipyparallel/client/client.pyc in __init__(self, url_file, profile, profile_dir, ipython_dir, context, debug, sshserver, sshkey, password, paramiko, timeout, cluster_id, **extra_args)
    493 
    494         try:
--> 495             self._connect(sshserver, ssh_kwargs, timeout)
    496         except:
    497             self.close(linger=0)

/home1/02745/tdevitt/miniconda2/lib/python2.7/site-packages/ipyparallel/client/client.pyc in _connect(self, sshserver, ssh_kwargs, timeout)
    613         evts = poller.poll(timeout*1000)
    614         if not evts:
--> 615             raise error.TimeoutError("Hub connection request timed out")
    616         idents, msg = self.session.recv(self._query_socket, mode=0)
    617         if self.debug:

TimeoutError: Hub connection request timed out

Any ideas?

arminf
@arminf82
Sep 18 2017 08:40
Hello Pyrads ;-). Regarding a closed issue on GitHub (https://github.com/dereneaton/ipyrad/issues/266): there was a mistake in my question. I was actually talking about "unlinked SNPs", not unlinked loci. In RAD data we of course assume that the loci are unlinked, but within some loci I get several SNPs, and those are naturally linked to each other. Would it be possible to implement a function where the user can choose how to filter the final dataset individually? E.g., for SplitsTree I would like to compare the "normal" dataset with one containing only unlinked SNPs. That's why I'm trying to transform the unlinked-SNPs structure file into nexus format.
Isaac Overcast
@isaacovercast
Sep 18 2017 17:22
@Wind-ant What exactly do you mean by "low speed"? The amount of time each step takes is directly related to the number of cores and the amount of RAM you allocate, as well as the number of samples and the number and length of loci. There is really nothing you can do to speed up an assembly besides allocating more resources. Step 6 in particular will run slowly if you don't have enough RAM allocated; we normally recommend 4GB per core.
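For example, on a PBS/Torque system a job at that ratio for 8 cores might request something like the following (the directive names and values are illustrative and vary by scheduler):

## 8 cores x 4GB per core = 32GB total
#PBS -l nodes=1:ppn=8
#PBS -l mem=32gb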
@tommydevitt Sometimes you have to wait a while (1-2 minutes) before the cluster spins up fully. How are you launching ipcluster? Can you get it to work locally if you run it in non-MPI mode?
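If it is just slow to spin up, one thing to try is a longer client timeout plus a short wait before counting engines. A minimal sketch (the profile name and engine count are placeholders for your setup):

import time
import ipyparallel as ipp

## allow the hub up to 2 minutes to answer instead of the default 10 seconds
ipyclient = ipp.Client(profile="MPI96", timeout=120)

## wait until the engines have registered before doing any work
while len(ipyclient.ids) < 96:
    time.sleep(2)
print("{} engines connected".format(len(ipyclient.ids)))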
Isaac Overcast
@isaacovercast
Sep 18 2017 17:28
@bioballs We already write out 'unlinked' SNPs files for phylip, geno, and structure formats. All of the .u. output files contain one random SNP per locus. http://ipyrad.readthedocs.io/output_formats.html#full-output-formats
I see that SplitsTree is capable of reading phylip files, so you can just use the *.u.phy file that is written.
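If you do still need nexus, a minimal conversion sketch for the unlinked-SNPs phylip file (the file paths and the missing/gap symbols here are assumptions; adjust them to your data):

## read the phylip file: the first line is "ntax nchar", then one taxon per line
infile = "c70m80_outfiles/c70m80.u.phy"
outfile = "c70m80.u.nex"

with open(infile) as phy:
    ntax, nchar = phy.readline().split()[:2]
    taxa = [line.split(None, 1) for line in phy if line.strip()]

## write a simple nexus data block that SplitsTree can read
with open(outfile, "w") as nex:
    nex.write("#NEXUS\nbegin data;\n")
    nex.write("  dimensions ntax={} nchar={};\n".format(ntax, nchar))
    nex.write("  format datatype=dna missing=N gap=-;\n")
    nex.write("  matrix\n")
    for name, seq in taxa:
        nex.write("  {}  {}\n".format(name, seq.strip()))
    nex.write("  ;\nend;\n")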
Deren Eaton
@dereneaton
Sep 18 2017 21:20
Hey @Wind-ant, I think it is probably running abnormally because of the command that you used. Neither ipyrad nor ipcluster should be called with an mpirun command. Instead, ipcluster will itself make the appropriate MPI call when you pass it --engines=MPI, starting engines across the available cores using MPI.
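So for the script above, the launch line by itself would look something like this (no mpirun wrapper; the core count matches the original script):

ipcluster start --n=8 --engines=MPI --daemonize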
Deren Eaton
@dereneaton
Sep 18 2017 21:27

@Wind-ant Also, unless you tell ipyrad to use the launched ipcluster instance with --ipcluster, it will not look for it and will instead attempt to automatically launch its own ipcluster instance.
A simpler way to run ipyrad across 8 cores is with the following command:

ipyrad -p params-c70m80.txt -s 67 -f

Or, if the ipcluster autolaunch is not working, then launch it yourself like this:

ipcluster start --n=8 --daemonize
sleep 10
ipyrad -p params-c70m80.txt -s 67 -f --ipcluster

The way you were running it, I'm guessing that mpirun may have started many ipcluster instances all at once with the same name/profile, which might have confused things and caused everything to bog down.
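A quick way to sanity-check a launched cluster before starting an assembly is to connect from Python and count the registered engines (a minimal sketch using the default profile):

import ipyparallel as ipp

## connect to the running ipcluster and report how many engines registered
ipyclient = ipp.Client()
print("{} engines connected".format(len(ipyclient.ids)))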