    dmyang93
    @dmyang93
    cut_up_fasta.py GIST_INU_1_megahit_out/final.contigs.fa -c 10000 -o 0 --merge_last -b GIST_INU_1_10k.bed > GIST_INU_1_contigs_10k.fa
    concoct_coverage_table.py GIST_INU_1_10k.bed GIST_INU_1.bam > coverage.tsv
    Those are my commands. Is there anything wrong with them?
    Johannes Alneberg
    @alneberg
    No, the commands look good
    What happens if you run: samtools bedcov GIST_INU_1_10k.bed GIST_INU_1.bam | head ?
    dmyang93
    @dmyang93
    It shows the same stdout as in my message above, apart from the error message part.
    Johannes Alneberg
    @alneberg
    Ok, so you still get the Errors in BED line 'k141_107977 0 275 k141_107977' lines?
    dmyang93
    @dmyang93
    yes,
    ...
    Errors in BED line 'k141_107981    0    283    k141_107981'
    Errors in BED line 'k141_107982    0    283    k141_107982'
    Errors in BED line 'k141_107983    0    339    k141_107983'
    Errors in BED line 'k141_107985    0    308    k141_107985'
    Errors in BED line 'k141_107986    0    363    k141_107986'
    Errors in BED line 'k141_107987    0    301    k141_107987'
    Errors in BED line 'k141_107988    0    301    k141_107988'
    Errors in BED line 'k141_107989    0    1477    k141_107989'
    Errors in BED line 'k141_107990    0    288    k141_107990'
    Errors in BED line 'k141_107991    0    370    k141_107991'
    Errors in BED line 'k141_107992    0    265    k141_107992'
    Errors in BED line 'k141_107993    0    618    k141_107993'
    Errors in BED line 'k141_107995    0    487    k141_107995'
    Errors in BED line 'k141_107996    0    289    k141_107996'
    Johannes Alneberg
    @alneberg
    What version of samtools do you have? samtools --version
    dmyang93
    @dmyang93
    samtools 1.3 Using htslib 1.3 Copyright (C) 2015 Genome Research Ltd.
    Johannes Alneberg
    @alneberg
    Can you post the first ten lines of the .bed file here? head GIST_INU_1_10k.bed
    dmyang93
    @dmyang93
    k141_3 0 340 k141_3
    k141_4 0 283 k141_4
    k141_5 0 254 k141_5
    k141_6 0 283 k141_6
    k141_7 0 349 k141_7
    k141_8 0 298 k141_8
    k141_9 0 489 k141_9
    k141_10 0 337 k141_10
    k141_11 0 283 k141_11
    k141_12 0 654 k141_12
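
One thing that could be ruled out at this point is the BED layout itself. The chat never establishes the cause of the error, and the pasted lines may simply have had their tabs collapsed by the chat client. BED fields are normally tab-separated, with integer start and end columns. A minimal sketch of such a check, where `check_bed_line` is a hypothetical helper and not part of CONCOCT or samtools:

```python
def check_bed_line(line):
    """Return None if the line looks like a valid BED record, else a reason string."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return "expected at least 3 tab-separated fields, got %d" % len(fields)
    try:
        start, end = int(fields[1]), int(fields[2])
    except ValueError:
        return "start/end are not integers"
    if start < 0 or end < start:
        return "invalid interval %d-%d" % (start, end)
    return None


# A tab-separated record passes; the same record with spaces does not.
print(check_bed_line("k141_3\t0\t340\tk141_3"))
print(check_bed_line("k141_3 0 340 k141_3"))
```

Running this over the whole .bed file and printing only the lines with a non-None result would show whether the file is malformed or whether the problem lies elsewhere.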
    Johannes Alneberg
    @alneberg
    Trying to reproduce this error. Please bear with me.
    I'm afraid I'm not able to reproduce this...
    Johannes Alneberg
    @alneberg
    Do you have access to a newer version of samtools? I would even recommend installing concoct in a conda environment, so that you get the correct dependencies right away (hopefully).
    dmyang93
    @dmyang93
    OK, I'm not in the office, so I'll do that later and report the results to you.
    Johannes Alneberg
    @alneberg
    Ok, yes please do so. This error does seem weird.
    dmyang93
    @dmyang93
    I reinstalled concoct in a conda environment, but the same error shows up again.
    When running concoct, should users use a sorted BAM file? I used the BAM file as is, without sorting it: I made it by converting a sorted SAM file and did not sort again after the conversion.
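
On the sorting question: the sort order a BAM file declares can be read from the SO: tag on the @HD line of its header, as defined in the SAM specification. A minimal sketch, assuming the header text has already been obtained (for example from samtools view -H); `sam_sort_order` is a hypothetical helper name:

```python
def sam_sort_order(header_text):
    """Return the SO value from the @HD line of a SAM header, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


# Illustrative header, not taken from the files discussed in the chat.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:k141_3\tLN:340\n"
print(sam_sort_order(header))
```

The tag only reports what the file claims. A missing @HD line or SO: tag gives None, which does not by itself prove the reads are unsorted.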
    dmyang93
    @dmyang93
    I solved it!! I looked up the contigs from the error messages in the coverage file (k141_107981, k141_107982, k141_107983, etc., as in my earlier message) and found that all of them have zero coverage. So I removed the zero-coverage lines from the coverage file and removed those contigs from the assembly file too (they are almost all short contigs). After that, CONCOCT ran fine and produced results!! However, I still have a question. The synthetic data, which gives proper results with CONCOCT, also has zero-coverage entries in its coverage file, yet CONCOCT ran on it without trouble. So I have worked around the problem, but I don't know its fundamental cause. If you want, I can help you fix CONCOCT so that it works on any data. I am as curious about the cause as you are, and I would like to solve the underlying problem too.
    dmyang93
    @dmyang93
    We should think about this problem some more, because binning succeeded for one experimental dataset but failed for another, even though I removed the zero-coverage entries with my Python script. I'll check whether my script produces a faulty coverage_nonzero file and then report the result.
    Johannes Alneberg
    @alneberg
    Hmm, this is still a mystery to me. Did you get the samtools bedcov GIST_INU_1_10k.bed GIST_INU_1.bam | head command to work?
    dmyang93
    @dmyang93
    The result of that command has not changed, because I edited the coverage file, not the BAM or BED file, so the same 'Errors in BED line 'k141~' ~' output is shown. Also, I found a condition under which concoct runs properly: contig names in the assembly file and the coverage file must not contain a space (' '). My experimental data's contig names have spaces (e.g. k141_3 flag=1 multi=3.0000 len=340), but the synthetic data's do not (e.g. NODE_1_length_220433_cov_306.047028). I changed my experimental data's contig names to remove the spaces (e.g. k141_3_flag=1_multi=3.0000_len=340), and then concoct ran fine! I have seen this work on other data as well. So I think concoct doesn't handle spaces in contig names. What do you think?? Additionally, my earlier zero-coverage test was not a valid test, because when I ran it I also removed the spaces from the contig names without realizing that they mattered.
    Johannes Alneberg
    @alneberg
    Yes that's correct. The coverage file should not have spaces in the contig ids. Those arose since you used a different script to generate the coverage file than the one supplied with concoct. Therefore it would be best to get the Bed file error fixed... What system are you running this on? Linux, mac?
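
The renaming described above (k141_3 flag=1 multi=3.0000 len=340 becoming k141_3_flag=1_multi=3.0000_len=340) can be sketched as follows. This is only an illustration of that workaround, not a script shipped with CONCOCT, and the function names are hypothetical:

```python
import re


def sanitize_header(line):
    """Replace runs of whitespace in a FASTA header line with underscores."""
    if not line.startswith(">"):
        return line  # sequence lines are left untouched
    return ">" + re.sub(r"\s+", "_", line[1:].strip())


def sanitize_fasta(lines):
    """Yield FASTA lines with whitespace-free headers."""
    for line in lines:
        yield sanitize_header(line.rstrip("\n"))


example = [">k141_3 flag=1 multi=3.0000 len=340\n", "ACGT\n"]
print(list(sanitize_fasta(example)))
```

Whichever renaming is used, the ids in the assembly file and in the coverage file have to end up identical, so the same transformation must be applied to both.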
    dmyang93
    @dmyang93
    I use Linux (Ubuntu).
    Johannes Alneberg
    @alneberg
    Ok, good. Let's take this from the beginning then. Can you run which python and conda list for me?
    dmyang93
    @dmyang93
    /usr/bin/python
    conda list shows so many packages. Is there any way to upload file?
    Johannes Alneberg
    @alneberg
    So if which python gives /usr/bin/python, it seems that you're not using the conda environment.
    Did you activate it?
    conda info -e should show you your environments
    dmyang93
    @dmyang93
    ddocent_env       /home/dongmin/.conda/envs/ddocent_env
    base           *  /home/dongmin/miniconda3
    cocacola_env      /home/dongmin/miniconda3/envs/cocacola_env
    concoct_env       /home/dongmin/miniconda3/envs/concoct_env
    ddocent_env       /home/dongmin/miniconda3/envs/ddocent_env
    gimme             /home/dongmin/miniconda3/envs/gimme
    qiime2-2018.6     /home/dongmin/miniconda3/envs/qiime2-2018.6
    Johannes Alneberg
    @alneberg
    Right, so if you do source activate concoct_env you should get a different value for which python?
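
The same check can be made from inside Python, which also shows which interpreter a script is really running under. A minimal sketch, assuming the environment was activated through conda (which sets the CONDA_DEFAULT_ENV variable); the helper names are hypothetical:

```python
import os
import sys


def describe_interpreter(environ=None):
    """Report the running interpreter and the active conda environment, if any."""
    environ = os.environ if environ is None else environ
    return {
        "executable": sys.executable,                 # path of the python binary in use
        "prefix": sys.prefix,                         # installation/environment root
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),  # None if no env is active
    }


def is_in_env(env_name, environ=None):
    """True if the named conda environment is the active one."""
    return describe_interpreter(environ)["conda_env"] == env_name


print(describe_interpreter())
print(is_in_env("concoct_env"))
```

If the executable path still points at /usr/bin/python after activation, the environment's interpreter is not the one being picked up.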
    Danny Ionescu
    @ionescu_danny_twitter
    Hi. I am trying to use concoct, but I get errors when running the script concoct_coverage_table.py. If I run it in a Python 2.7 environment (concoct), I get the following error:
    dionescu@allegro:/data/scratch/dionescu/IK/data/scratch/dionescu/IK/Concoct_binning$ concoct_coverage_table.py contigs_10k.bed mapping/issyk_kul_megahit.contigs.fa.IS.bam.bam > coverage_table.tsv
    Traceback (most recent call last):
    File "/home/dionescu/miniconda3/envs/concoct/bin/concoct_coverage_table.py", line 18, in <module>
    import pandas as pd
    File "/home/dionescu/.local/lib/python2.7/site-packages/pandas/__init__.py", line 35, in <module>
    "the C extensions first.".format(module))
    ImportError: C extension: /home/dionescu/.local/lib/python2.7/site-packages/pandas/_libs/tslibs/conversion.so: undefined symbol: PyFPE_jbuf not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.
    (concoct) dionescu@allegro:/data/scratch/dionescu/IK/data/scratch/dionescu/IK/Concoct_binning$ pip install --upgrade pandas
    If I run it in a Python 3.6.7 environment, the error is different:
    (Concoct) dionescu@allegro:/data/scratch/dionescu/IK/data/scratch/dionescu/IK/Concoct_binning$ concoct_coverage_table.py contigs_10k.bed mapping/issyk_kul_megahit.contigs.fa.IS.bam.bam > coverage_table.csv
    ERROR: fail to open index BAM file 'mapping/issyk_kul_megahit.contigs.fa.IS10_S3.bam.bam'
    Traceback (most recent call last):
    File "/home/dionescu/CONCOCT/scripts/concoct_coverage_table.py", line 77, in <module>
    generate_input_table(args.bedfile, args.bamfiles, samplenames=samplenames)
    File "/home/dionescu/CONCOCT/scripts/concoct_coverage_table.py", line 28, in generate_input_table
    sys.stderr.write(out)
    TypeError: write() argument must be str, not bytes
    Please advise on how to solve this.
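
A note on the Python 3 traceback above: the TypeError arises because sys.stderr.write() is given bytes. In Python 3, output captured from a subprocess is bytes unless it is decoded first. The TypeError also looks like a secondary failure, raised while reporting the samtools message about the BAM index. A minimal sketch of the general decoding pattern, which is not the actual code of concoct_coverage_table.py; `run_and_capture` is a hypothetical helper:

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run a command and return (stdout, stderr) decoded to text."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()  # both are bytes in Python 3
    return (out.decode("utf-8", errors="replace"),
            err.decode("utf-8", errors="replace"))


# Demonstration with a trivial child process instead of samtools.
out, err = run_and_capture([sys.executable, "-c", "print('hello')"])
sys.stderr.write(err)  # safe: err is str, not bytes
print(out.strip())
```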
    klotzor
    @klotzor

    Hello there,
    I ran a de novo assembly of metagenomic NGS data with MEGAHIT and afterwards followed the CONCOCT tutorial. When running

    concoct_coverage_table.py final.contigs_10K.bed ~/*sorted.bam > coverage_table.tsv

    I get an error message for each line in the bed file:
    ...
    Errors in BED line 'k141_25554 0 324 k141_25554.0'
    Errors in BED line 'k141_25555 0 377 k141_25555.0'
    Errors in BED line 'k141_25556 0 380 k141_25556.0'
    Errors in BED line 'k141_25557 0 623 k141_25557.0'
    Errors in BED line 'k141_25558 0 350 k141_25558.0'
    Errors in BED line 'k141_25559 0 301 k141_25559.0'
    ...

    I only have one sample. Maybe that is causing an exception?

    Any help would be appreciated. Thank you.

    klotzor
    @klotzor

    I just realized that this error came up before. My 'which python' output is

    /home/falker/miniconda2/bin/python

    So I assume I am using the conda environment. Any other suggestions, @alneberg?

    klotzor
    @klotzor

    I got the conda environment right (I guess), but now I get the error

    pkg_resources.DistributionNotFound: The 'concoct==1.0.0' distribution was not found and is required by the application

    That makes sense because my concoct version is 0.4.1

    How do I install version 1.0.0 with conda?

    Johannes Alneberg
    @alneberg
    @klotzor, Concoct version 1.0 should be installable by conda right away. What does conda install concoct give you?
    On the other hand, if you only have one sample, I'm not sure if it's worth the effort to run CONCOCT, it really isn't that efficient for just one sample
    klotzor
    @klotzor

    @alneberg I got it running in my normal environment. I had to switch to Python 3 and install scikit-learn with pip, and now it is running (I can see the samtools process).

    It is one sample containing an unknown number of bacterial species (most likely also undescribed ones). So I can skip concoct for my analysis?

    My goal is to sort the different species into bins using 5 different binning tools and finally merge them all with DAS Tool into high-confidence bins.
    Johannes Alneberg
    @alneberg
    How complex is the sample, in terms of expected number of species?
    klotzor
    @klotzor
    That's hard to tell. BLAST gave us about 30 known species, and MaxBin2 produced 17 bins.
    klotzor
    @klotzor
    @alneberg abawaca just finished and returned 10 clusters
    Johannes Alneberg
    @alneberg
    Are you also verifying your clusters with something like CheckM? I'm just worried that the input information you're using is not enough to disentangle the genomes
    klotzor
    @klotzor
    Good point. I will do that. Do you think DAS Tool is the way to go for such a limited dataset? I can't get any more samples because it is a rare disease in an almost extinct kangaroo species ;)
    So anything I can find is of interest
    Johannes Alneberg
    @alneberg
    Maybe a lot of long reads?
    But yeah, you should be able to get a few genomes out of the sample, but it shouldn't be possible to resolve all the genomes
    klotzor
    @klotzor
    Yeah we don't expect that. Anyhow, thanks for your valuable input!
    leojequier
    @leojequier
    @klotzor Hi, I had the same issue. There is a problem with the cut_up_fasta.py script.