These are chat archives for dereneaton/ipyrad

16th Feb 2018
Jean-Rémi Trotta
@jrtrottablanc
Feb 16 2018 13:50

Hi! I'm currently running a GBS analysis split into two independent assemblies: 1) denovo-reference, 2) reference. The reference consists of a chloroplast sequence. I have been able to successfully run the denovo-reference assembly, but the reference-only method gets stuck at step 6:

  Step 3: Clustering/Mapping reads
  [####################] 100%  indexing reference    | 0:00:01  
  [####################] 100%  dereplicating         | 0:00:37  
  [####################] 100%  mapping               | 0:01:01  
  [####################] 100%  fetch mapped reads    | 0:00:01  
  [####################] 100%  chunking              | 0:00:00  
  [####################] 100%  aligning              | 0:00:34  
  [####################] 100%  concatenating         | 0:00:00  

  Step 4: Joint estimation of error rate and heterozygosity
  [####################] 100%  inferring [H, E]      | 0:00:01  

  Step 5: Consensus base calling 
  Mean error  [0.00195 sd=0.00075]
  Mean hetero [0.00928 sd=0.00740]
  [####################] 100%  calculating depths    | 0:00:00  
  [####################] 100%  chunking clusters     | 0:00:00  
  [####################] 100%  consens calling       | 0:00:00  

  Step 6: Clustering at 0.8 similarity across 4 samples
  [####################] 100%  concat/shuffle input  | 0:00:00  
  [                    ]   0%  clustering across     | 1 day, 0:00:17
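For reference, a minimal sketch of how the two assemblies are typically set up from the ipyrad command line; the assembly name, reference path, and step string below are placeholders, not taken from this run:

  ipyrad -n chloroplast                        # writes params-chloroplast.txt
  # in params-chloroplast.txt set, e.g.:
  #   reference              ## [5] [assembly_method]: denovo, reference, denovo+reference, or denovo-reference
  #   ./chloroplast.fasta    ## [6] [reference_sequence]: path to the chloroplast reference
  ipyrad -p params-chloroplast.txt -s 123456   # run steps 1 through 6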

Below are the metrics I have so far:

Summary stats of Assembly chloroplast
------------------------------------------------
        state  reads_raw  reads_passed_filter  refseq_mapped_reads  \
AD1697      5     805763               801554                 4841   
AD1698      5     877755               873136                 8356   
AD1699      5     883672               879561                 5762   
AD1700      5     515499               513218                 5946   

        refseq_unmapped_reads  clusters_total  clusters_hidepth  hetero_est  \
AD1697                 553120              18                16    0.000023   
AD1698                 639097              20                10    0.007354   
AD1699                 663072              16                12    0.017384   
AD1700                 393378              16                12    0.012343   

        error_est  reads_consens  
AD1697   0.001107             16  
AD1698   0.002816              9  
AD1699   0.002267             11  
AD1700   0.001608             10

Is it because of the really small number of clusters retrieved? Is this expected behavior with such a low amount of data?
Thanks for your help!

dionyes
@dionyes
Feb 16 2018 15:40
@dereneaton *also wanted to add that I'm using Ubuntu 16.04 (Xenial)
Eaton Lab
@eaton-lab
Feb 16 2018 16:19
@dionyes Is this your Linux box, or is it an HPC system? If the latter, then perhaps they have blocked anaconda for some reason (which would be crazy, but it's one of the only ideas I can think of so far). You are able to google, but you cannot follow links to the ipyrad docs? Are you visiting here: http://ipyrad.readthedocs.io/ ? And it sounds like you got conda installed, but when you search for packages it says that it is not connected to the internet. Seems like a network issue to me, but I have no idea why it would affect some links and not others.
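A quick way to check whether the shell itself can reach these hosts is with curl (the URLs are just the ones that come up in this thread; no proxy values are assumed):

  curl -I http://ipyrad.readthedocs.io/     # the docs site over plain http
  curl -I https://ipyrad.readthedocs.io/    # the same site over https
  curl -I https://repo.continuum.io/pkgs/main/linux-64/repodata.json.bz2   # the conda package repo
  env | grep -i proxy                       # is a proxy configured for the shell?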
Hi @jrtrottablanc, no, this is not expected behavior; it should go super fast for your data set. I think it may be hitting a bug when the number of reads is so low. We'll look into it.
Glib Mazepa
@mazepago_twitter
Feb 16 2018 17:07
The data are paired-end and already demultiplexed; the naming is like the following: P_susanus_Iran_ERP_5156_unassembledR2.fastq, and the barcodes file looks like this (it contains only the first 3 individuals; I was using earlier versions of ipyrad in a similar way):

  IR_SL_174_unassembled TAGACCG
  IR_SL_175_unassembled TTGACA
  IR_SL_176_unassembled TTGACG
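If the intent is to have ipyrad split these library files by the barcodes above, the relevant params lines would look roughly like this (the glob, barcodes filename, and datatype are placeholders based on the standard params file layout, not values from this message):

  ./raw/P_susanus_Iran_ERP_5156_unassembled*.fastq   ## [2] [raw_fastq_path]: glob matching the library fastq files
  ./barcodes_iran.txt                                ## [3] [barcodes_path]: the barcodes file shown above (hypothetical filename)
  pairgbs                                            ## [7] [datatype]: an assumption; could be pairddrad, etc.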
dionyes
@dionyes
Feb 16 2018 18:19
@eaton-lab Hiya, so I am using a Linux box. If I type http://ipyrad.readthedocs.io/ it just spins and then says the connection timed out, and on Google the other sections of the website are listed, but whichever one I click also doesn't go through. If I try to download Miniconda2 from the terminal it has a connection failure. I did download Anaconda from the internet, but I am unable to run the conda update conda step or the conda install ipyrad command.
dionyes
@dionyes
Feb 16 2018 19:03
@eaton-lab Hiya! So I just added an s to the http URL in the browser and it opened the website, but in the terminal, with or without the "s", it still says connection failure.
dionyes
@dionyes
Feb 16 2018 20:49

@eaton-lab Hey, so this is the error for conda:

  CondaHTTPError: HTTP 000 CONNECTION FAILED for url https://repo.continuum.io/pkgs/main/linux-64/repodata.json.bz2
  Elapsed: -

  An HTTP error occurred when trying to retrieve this URL.
  HTTP errors are often intermittent, and a simple retry will get you on your way.
  ConnectTimeout(MaxRetryError("HTTPSConnectionPool(host='repo.continuum.io', port=443): Max retries exceeded with url: /pkgs/main/linux-64/repodata.json.bz2 (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f53ed636550>, 'Connection to 192.168.1.2 timed out. (connect timeout=9.15)'))",),)