    Johannes Alneberg
    @alneberg
    Ok, yes please do so. This error does seem weird.
    dmyang93
    @dmyang93
    I reinstalled concoct in a conda environment, but the same error shows up again.
    Should CONCOCT be run on a sorted bam file? I used an unsorted bam file: I created it by converting a sorted sam file, and did not sort it again after the conversion.
    dmyang93
    @dmyang93
    I solved it!! I checked the problematic contigs in the coverage file (k141_107981, k141_107982, k141_107983, etc., as I sent in the chat above) and found that all of them have zero coverage. So I removed those lines from the coverage file and removed the corresponding contigs from the assembly file as well (they are mostly short contigs). After that, CONCOCT ran well and produced results!! However, I still have a question. The synthetic data that CONCOCT handles properly also has zero-coverage entries in its coverage file, yet CONCOCT worked on it. So I have solved my problem, but I don't know its fundamental cause. If you want, I'll help you fix CONCOCT so it can be applied to any data. I'm as curious about the reason as you are, and I'd like to solve the underlying problem too.
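The manual clean-up described above can be sketched in Python. This is an illustrative helper, not part of CONCOCT: the function name, file paths, and the assumption that the coverage table is tab-separated with the contig id in the first column are mine.

```python
import csv

def drop_zero_coverage(in_path, out_path):
    """Copy a tab-separated coverage table, skipping rows whose
    coverage is zero in every sample column. Returns the ids of
    the dropped contigs so they can also be removed from the assembly."""
    dropped = []
    with open(in_path) as fin, open(out_path, "w", newline="") as fout:
        reader = csv.reader(fin, delimiter="\t")
        writer = csv.writer(fout, delimiter="\t")
        writer.writerow(next(reader))          # copy the header row
        for row in reader:
            if any(float(v) > 0 for v in row[1:]):
                writer.writerow(row)           # keep: some sample has coverage
            else:
                dropped.append(row[0])         # e.g. "k141_107981"
    return dropped
```

As the later messages in this thread show, dropping zero-coverage rows turned out not to be the real fix, so treat this only as a reproduction of the workaround described here.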
    dmyang93
    @dmyang93
    We should think about this problem more... because I succeeded in binning one experimental dataset, but failed on another, even though I removed the zero-coverage part using my python script. I'll check whether my python script produced a wrong coverage_nonzero file. Then I'll report the result.
    Johannes Alneberg
    @alneberg
    Hmm, this is still a mystery to me. Did you get the samtools bedcov GIST_INU_1_10k.bed GIST_INU_1.bam | head command to work?
    dmyang93
    @dmyang93
    There is no change in the result from that command, because I edited the coverage file, not the bam file or the bed file. So the same stdout, 'Errors in BED line 'k141~' ~', is shown. Also, I found a condition under which concoct runs properly: contig names in the assembly file and the coverage file must not contain a space (' '). My experimental data's contig names have spaces (e.g. k141_3 flag=1 multi=3.0000 len=340), but the synthetic data's do not (e.g. NODE_1_length_220433_cov_306.047028). I changed my experimental data's contig names to versions without spaces (e.g. k141_3_flag=1_multi=3.0000_len=340), and then concoct ran well! I have seen on other data that this works too. So I think your concoct algorithm doesn't handle spaces in contig names. What do you think? Additionally, my earlier test without zero coverage was not a valid test, because when I ran it I had also removed the spaces from the contig names, without realizing the spaces were what mattered.
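The rename described above can be sketched as follows. This is an illustrative snippet, not a CONCOCT utility; it simply replaces spaces with underscores in FASTA header lines, matching the fix in this message.

```python
def strip_header_spaces(fasta_text):
    """Replace spaces with underscores in FASTA header lines only,
    so ">k141_3 flag=1 multi=3.0000 len=340" becomes
    ">k141_3_flag=1_multi=3.0000_len=340". Sequence lines are untouched."""
    out_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            line = line.replace(" ", "_")
        out_lines.append(line)
    return "\n".join(out_lines)
```

Note that the same renaming has to be applied consistently to the assembly, the coverage table, and any bed/bam files, or the contig ids will no longer match across the inputs.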
    Johannes Alneberg
    @alneberg
    Yes that's correct. The coverage file should not have spaces in the contig ids. Those arose since you used a different script to generate the coverage file than the one supplied with concoct. Therefore it would be best to get the Bed file error fixed... What system are you running this on? Linux, mac?
    dmyang93
    @dmyang93
    I used Linux(Ubuntu).
    Johannes Alneberg
    @alneberg
    Ok, good. Let's take this from the beginning then. Can you run which python and conda list for me?
    dmyang93
    @dmyang93
    /usr/bin/python
    conda list shows so many packages. Is there any way to upload file?
    Johannes Alneberg
    @alneberg
    So if which python gives /usr/bin/python, it seems that you're not using the conda environment.
    Did you activate it?
    conda info -e should show you your environments
    dmyang93
    @dmyang93
    ddocent_env       /home/dongmin/.conda/envs/ddocent_env
    base           *  /home/dongmin/miniconda3
    cocacola_env      /home/dongmin/miniconda3/envs/cocacola_env
    concoct_env       /home/dongmin/miniconda3/envs/concoct_env
    ddocent_env       /home/dongmin/miniconda3/envs/ddocent_env
    gimme             /home/dongmin/miniconda3/envs/gimme
    qiime2-2018.6     /home/dongmin/miniconda3/envs/qiime2-2018.6
    Johannes Alneberg
    @alneberg
    Right, so if you do source activate concoct_env you should get a different value for which python?
    Danny Ionescu
    @ionescu_danny_twitter
    Hi. I am trying to use concoct. I am getting errors running the script concoct_coverage_table.py. If I run it in a python 2.7 environment (concoct) I get the following error:
    dionescu@allegro:/data/scratch/dionescu/IK/data/scratch/dionescu/IK/Concoct_binning$ concoct_coverage_table.py contigs_10k.bed mapping/issyk_kul_megahit.contigs.fa.IS.bam.bam > coverage_table.tsv
    Traceback (most recent call last):
    File "/home/dionescu/miniconda3/envs/concoct/bin/concoct_coverage_table.py", line 18, in <module>
    import pandas as pd
    File "/home/dionescu/.local/lib/python2.7/site-packages/pandas/__init__.py", line 35, in <module>
    "the C extensions first.".format(module))
    ImportError: C extension: /home/dionescu/.local/lib/python2.7/site-packages/pandas/_libs/tslibs/conversion.so: undefined symbol: PyFPE_jbuf not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.
    (concoct) dionescu@allegro:/data/scratch/dionescu/IK/data/scratch/dionescu/IK/Concoct_binning$ pip install --upgrade pandas
    If I run it in a python 3.6.7 environment then the error is different: (Concoct) dionescu@allegro:/data/scratch/dionescu/IK/data/scratch/dionescu/IK/Concoct_binning$ concoct_coverage_table.py contigs_10k.bed mapping/issyk_kul_megahit.contigs.fa.IS.bam.bam > coverage_table.csv
    ERROR: fail to open index BAM file 'mapping/issyk_kul_megahit.contigs.fa.IS10_S3.bam.bam'
    Traceback (most recent call last):
    File "/home/dionescu/CONCOCT/scripts/concoct_coverage_table.py", line 77, in <module>
    generate_input_table(args.bedfile, args.bamfiles, samplenames=samplenames)
    File "/home/dionescu/CONCOCT/scripts/concoct_coverage_table.py", line 28, in generate_input_table
    sys.stderr.write(out)
    TypeError: write() argument must be str, not bytes
    Please advise on how to solve this.
    klotzor
    @klotzor

    Hello there,
    I ran a de novo assembly of metagenomic NGS data with MEGAHIT and afterwards I followed the CONCOCT tutorial. When running

    concoct_coverage_table.py final.contigs_10K.bed ~/*sorted.bam > coverage_table.tsv

    I get an error message for each line in the bed file:
    ...
    Errors in BED line 'k141_25554 0 324 k141_25554.0'
    Errors in BED line 'k141_25555 0 377 k141_25555.0'
    Errors in BED line 'k141_25556 0 380 k141_25556.0'
    Errors in BED line 'k141_25557 0 623 k141_25557.0'
    Errors in BED line 'k141_25558 0 350 k141_25558.0'
    Errors in BED line 'k141_25559 0 301 k141_25559.0'
    ...

    I only have one sample. Maybe that is causing an exception?

    Any help would be appreciated. Thank you.

    klotzor
    @klotzor

    I just realized that this error came up before. My 'which python' output is

    /home/falker/miniconda2/bin/python

    So I assume I am using the conda environment. Any other suggestions @alneberg ?

    klotzor
    @klotzor

    I got the conda environment right (I guess) but now I get the error

    pkg_resources.DistributionNotFound: The 'concoct==1.0.0' distribution was not found and is required by the application

    That makes sense because my concoct version is 0.4.1

    How do I install version 1.0.0 with conda?

    Johannes Alneberg
    @alneberg
    @klotzor, Concoct version 1.0 should be installable by conda right away. What does conda install concoct give you?
    On the other hand, if you only have one sample, I'm not sure if it's worth the effort to run CONCOCT, it really isn't that efficient for just one sample
    klotzor
    @klotzor

    @alneberg I got it running in my normal environment. I had to switch to python 3 and install scikit-learn with pip, and now it is running (I can see the samtools process).

    It is one sample containing an unknown number of bacterial species (most likely also undescribed ones). So can I skip concoct for my analysis?

    My goal is to get the different species into bins using 5 different binning tools and finally merge them all with DAS Tool into high-confidence bins.
    Johannes Alneberg
    @alneberg
    How complex is the sample, in terms of expected number of species?
    klotzor
    @klotzor
    That's hard to tell. Blast gave us about 30 known species, and maxbin2 produced 17 bins
    klotzor
    @klotzor
    @alneberg abawaca just finished and returned 10 clusters
    Johannes Alneberg
    @alneberg
    Are you also verifying your clusters with something like CheckM? I'm just worried that the input information you're using is not enough to disentangle the genomes
    klotzor
    @klotzor
    Good point. I will do that. Do you think DAS Tool is the way to go for such a limited dataset? I can't get any more samples because it is a rare disease in an almost extinct kangaroo species ;)
    So anything I can find is of interest
    Johannes Alneberg
    @alneberg
    Maybe a lot of long reads?
    But yeah, you should be able to get a few genomes out of the sample, but it shouldn't be possible to resolve all the genomes
    klotzor
    @klotzor
    Yeah we don't expect that. Anyhow, thanks for your valuable input!
    leojequier
    @leojequier
    @klotzor Hi, I had the same issue. There is a problem with the cut_up_fasta.py script.
    leojequier
    @leojequier
    @klotzor I think coverage_table.py returns an error because the subcontig id in the first column doesn't match the id in the fourth column. However, if you change line 21 in cut_up_fasta.py from: print("{0}\t{2}\t{3}\t{0}.{1}" ....
    to: print("{0}.{1}\t{2}\t{3}\t{0}.{1}" ......
    it also adds the subcontig number to the first column, and coverage_table.py no longer returns an error.
    Johannes Alneberg
    @alneberg
    @leojequier, sorry for the slow reply, I have been on vacation. The line you refer to is intentional: the first column should refer back to the original contig name and not the subcontig. This is because the bam files are created against the original contigs, not the subcontigs. Is it possible that you mapped your reads against the subcontigs instead? Does this make sense? I'm reviewing these scripts at the moment and hope to be able to track down these kinds of bugs.
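For reference, the intended BED layout can be illustrated with a small sketch (the contig name and coordinates below are made up): the first column carries the original contig name, which must match the bam header, while the fourth column carries the subcontig id that CONCOCT will cluster.

```python
def bed_line(contig, start, end, part):
    """Mimics the format produced by line 21 of cut_up_fasta.py:
    original contig id in column 1, subcontig id in column 4."""
    return "{0}\t{1}\t{2}\t{0}.{3}".format(contig, start, end, part)

# A 25 kb contig cut into 10 kb chunks would yield lines like:
# k141_3    0       10000   k141_3.0
# k141_3    10000   20000   k141_3.1
```

This is why mapping against the cut-up subcontigs breaks concoct_coverage_table.py: samtools then looks up column 1 names that do not exist in the bam header.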
    leojequier
    @leojequier
    @alneberg Yes, you're right I mapped the reads to the subcontigs. I corrected that and it works fine now! Thanks
    Johannes Alneberg
    @alneberg
    @leojequier, great news! Thank you for reporting back!
    md shaminur
    @Shaminur_gitlab
    I am trying to run CONCOCT but I cannot understand how to prepare the bam files described as "indexed bam files where each sample has been mapped against the original contigs"?
    Johannes Alneberg
    @alneberg
    Hi @Shaminur_gitlab, maybe it's not optimally worded in the documentation. Each bam file in the directory should be mapped against the same contigs, but contain the reads from just one sample; taken together, the bam files cover all the samples. Does that make sense?
    So you map each sample's reads with a regular mapper (bowtie2, bwa, ...) and each mapping produces one bam file
    Josie Paris
    @josieparis
    Hi @alneberg ! I'm really sorry to double post (I also opened a github issue), but I'm having problems with running concoct --composition_file contigs_10K.fa --coverage_file coverage_table.tsv -b concoct_output with the following errors: Traceback (most recent call last):
    File "/gpfs/ts0/home/jrp228/.local/bin/concoct", line 4, in <module>
    __import__('pkg_resources').run_script('concoct==1.1.0', 'concoct')
    File "/gpfs/ts0/shared/software/Python/3.6.4-foss-2018a/lib/python3.6/site-packages/pkg_resources/__init__.py", line 743, in run_script
    self.require(requires)[0].run_script(script_name, ns)
    File "/gpfs/ts0/shared/software/Python/3.6.4-foss-2018a/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1498, in run_script
    exec(code, namespace, namespace)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/concoct-1.1.0-py3.6-linux-x86_64.egg/EGG-INFO/scripts/concoct", line 90, in <module>
    results = main(args)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/concoct-1.1.0-py3.6-linux-x86_64.egg/EGG-INFO/scripts/concoct", line 20, in main
    composition, cov, cov_range = load_data(args)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/concoct-1.1.0-py3.6-linux-x86_64.egg/concoct/input.py", line 25, in load_data
    read_length = args.read_length
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/concoct-1.1.0-py3.6-linux-x86_64.egg/concoct/input.py", line 92, in load_coverage
    axis='index')
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/ops.py", line 2030, in f
    level=level)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/ops.py", line 1917, in _combine_series_frame
    return self._combine_match_index(other, func, level=level)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 5097, in _combine_match_index
    copy=False)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 3792, in align
    broadcast_axis=broadcast_axis)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 8428, in align
    fill_axis=fill_axis)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 8514, in _align_series
    fdata = fdata.reindex_indexer(join_index, lidx, axis=1)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1224, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3087, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
    ValueError: cannot reindex from a duplicate axis
    any help greatly appreciated!! :)
    Johannes Alneberg
    @alneberg
    Hello! Might be good to double post to get my attention. Sorry about that. I just wrote a reply on the github issue. Please let me know if it is of any help
    md shaminur
    @Shaminur_gitlab
    @alneberg, thanks, that makes sense to me now.
    Josie Paris
    @josieparis
    Hey @alneberg Thanks!!! will chat to you on the issue :)
    helenistheking
    @helenistheking
    Hi, I am not having an issue, but I do not understand why the reference contigs are cut into 10K lengths. What is the significance? Thank you
    Johannes Alneberg
    @alneberg
    Hello @helenistheking! The concoct binning algorithm does not take contig length into account, but it is usually important to put more emphasis on longer contigs. By cutting contigs into smaller parts, the longer contigs get a higher influence on the clusters formed. The 10K choice is fairly arbitrary, but there is a tradeoff in speed if the number of chunks becomes too large (from cutting into too-short parts).
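The cutting idea can be illustrated with a simplified sketch. This mirrors the spirit of cut_up_fasta.py, not its exact behaviour (the real script also supports overlapping chunks): a contig shorter than two chunk lengths is kept whole, and a short final remainder is merged into the previous chunk, so every emitted piece is at least one chunk long.

```python
def cut_up(seq, chunk_size=10000):
    """Split a contig sequence into chunks of chunk_size bases.
    Contigs shorter than two chunks are returned whole, and a short
    final remainder is merged into the last full chunk."""
    if len(seq) < 2 * chunk_size:
        return [seq]
    chunks = [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]
    if len(chunks[-1]) < chunk_size:
        last = chunks.pop()       # merge the short tail into the
        chunks[-1] += last        # previous chunk
    return chunks
```

For example, a 25 kb contig becomes two pieces (10 kb and 15 kb), so it contributes two data points to the clustering, while a 15 kb contig stays as a single piece.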