    Good point. I will do that. Do you think DAS tool is the way to go for such a limited dataset? I can't get any more samples because it is a rare disease in an almost extinct kangaroo species ;)
    So anything I can find is of interest
    Johannes Alneberg
    Maybe a lot of long reads?
    But yeah, you should be able to get a few genomes out of the sample, but it shouldn't be possible to resolve all the genomes
    Yeah we don't expect that. Anyhow, thanks for your valuable input!
    @klotzor Hi, I had the same issue. There is a problem with the script.
    @klotzor I think it returns an error because the subcontig id in the first column doesn't match the id in the 4th column. However, if you change line 21 in the script from: print("{0}\t{2}\t{3}\t{0}.{1}" ....
    to: print("{0}.{1}\t{2}\t{3}\t{0}.{1}" ......
    it also adds the subcontig number to the first column and doesn't return an error anymore.
    Johannes Alneberg
    @leojequier, sorry for slow reply, I have been on vacation. The line you refer to is intentional since the first column should refer back to the original contig name and not the subcontig. This is since the bam files are created against the original contigs and not the subcontigs. Is it possible that you have mapped your reads against the subcontigs instead? Does this make sense? I'm reviewing these scripts at the moment and hope to be able to track down these kind of bugs.
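To illustrate the point above, here is a minimal sketch (illustrative names, not CONCOCT's actual cut_up_fasta.py code) of the intended BED layout: column 1 keeps the original contig name so that samtools bedcov can look the region up in bam files mapped against the original contigs, while column 4 carries the subcontig id:

```python
def bed_lines(contig_name, contig_length, chunk_size=10000):
    """Yield BED lines for one contig cut into fixed-size chunks.

    Column 1 keeps the ORIGINAL contig name (matching the bam headers);
    column 4 carries the subcontig id used internally by the binner.
    """
    for i, start in enumerate(range(0, contig_length, chunk_size)):
        end = min(start + chunk_size, contig_length)
        yield f"{contig_name}\t{start}\t{end}\t{contig_name}.{i}"

# Example: a 25 kb contig yields three BED lines
for line in bed_lines("contig_1", 25000):
    print(line)
```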
    @alneberg Yes, you're right I mapped the reads to the subcontigs. I corrected that and it works fine now! Thanks
    Johannes Alneberg
    @leojequier, great news! Thank you for reporting back!
    md shaminur
    I am trying to run CONCOCT but I don't understand how to prepare the bam files described as "indexed bam files where each sample has been mapped against the original contigs"?
    Johannes Alneberg
    Hi @Shaminur_gitlab, maybe it's not optimally worded in the documentation. Each bam file within the directory should contain the same contigs, but the reads of only a single sample, so that all samples taken together are covered by the set of bam files. Does that make sense?
    So you map each sample's reads with a regular mapper (bowtie2, bwa, ...), and each mapping produces one bam file
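The end result, sketched here with made-up names (this is not concoct_coverage_table.py itself, and the column labels are illustrative), is a table with one row per subcontig and one coverage column per sample's bam file:

```python
def coverage_table(subcontigs, sample_covs):
    """Build a CONCOCT-style coverage table as TSV text.

    subcontigs  : list of subcontig ids (one row each)
    sample_covs : dict mapping sample name -> list of mean coverages,
                  one value per subcontig, i.e. one bam file per sample,
                  all mapped against the same contigs.
    """
    samples = sorted(sample_covs)
    header = "contig\t" + "\t".join(f"cov_mean_sample_{s}" for s in samples)
    rows = [header]
    for i, sub in enumerate(subcontigs):
        vals = "\t".join(str(sample_covs[s][i]) for s in samples)
        rows.append(f"{sub}\t{vals}")
    return "\n".join(rows)

print(coverage_table(["c1.0", "c1.1"], {"A": [1.0, 2.0], "B": [3.0, 4.0]}))
```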
    Josie Paris
    Hi @alneberg ! I'm really sorry to double post (I also opened a github issue), but I'm having problems with running concoct --composition_file contigs_10K.fa --coverage_file coverage_table.tsv -b concoct_output with the following errors: Traceback (most recent call last):
    File "/gpfs/ts0/home/jrp228/.local/bin/concoct", line 4, in <module>
    __import__('pkg_resources').run_script('concoct==1.1.0', 'concoct')
    File "/gpfs/ts0/shared/software/Python/3.6.4-foss-2018a/lib/python3.6/site-packages/pkg_resources/", line 743, in run_script
    self.require(requires)[0].run_script(script_name, ns)
    File "/gpfs/ts0/shared/software/Python/3.6.4-foss-2018a/lib/python3.6/site-packages/pkg_resources/", line 1498, in run_script
    exec(code, namespace, namespace)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/concoct-1.1.0-py3.6-linux-x86_64.egg/EGG-INFO/scripts/concoct", line 90, in <module>
    results = main(args)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/concoct-1.1.0-py3.6-linux-x86_64.egg/EGG-INFO/scripts/concoct", line 20, in main
    composition, cov, cov_range = load_data(args)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/concoct-1.1.0-py3.6-linux-x86_64.egg/concoct/", line 25, in load_data
    read_length = args.read_length
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/concoct-1.1.0-py3.6-linux-x86_64.egg/concoct/", line 92, in load_coverage
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/", line 2030, in f
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/", line 1917, in _combine_series_frame
    return self._combine_match_index(other, func, level=level)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/", line 5097, in _combine_match_index
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/", line 3792, in align
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/", line 8428, in align
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/", line 8514, in _align_series
    fdata = fdata.reindex_indexer(join_index, lidx, axis=1)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/internals/", line 1224, in reindex_indexer
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/indexes/", line 3087, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
    ValueError: cannot reindex from a duplicate axis
    any help greatly appreciated!! :)
    Johannes Alneberg
    Hello! Double posting to get my attention might actually have been a good idea, sorry about that. I just wrote a reply on the github issue. Please let me know if it is of any help
    md shaminur
    @Alneberg, thanks, that explanation makes sense to me.
    Josie Paris
    Hey @alneberg Thanks!!! will chat to you on the issue :)
    Hi, I am not having an issue, but I do not understand why the reference contigs are cut into 10K lengths? Thank you
    Johannes Alneberg
    Hello @helenistheking! The concoct binning algorithm does not take contig length into account, but it is usually important to put more emphasis on longer contigs. By cutting contigs into smaller parts, the longer contigs will have a higher influence on the clusters formed. The 10K choice is fairly arbitrary, but there is a tradeoff in speed if the number of chunks becomes too large (from cutting contigs into too-short parts)
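The weighting effect can be sketched in a couple of lines (simplified: CONCOCT's real cut_up_fasta.py can also merge a short final piece into the previous chunk, which this ignores):

```python
def n_chunks(contig_length, chunk_size=10000):
    """Number of data points a contig contributes after cutting.

    Longer contigs yield more chunks and therefore carry proportionally
    more weight in the clustering -- the point of the 10K cut-up.
    """
    return max(1, -(-contig_length // chunk_size))  # ceiling division

print(n_chunks(9000))    # short contig: 1 data point
print(n_chunks(100000))  # long contig: 10 data points
```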
    Jeroen Frank
    @dmyang93 I ran into this exact problem when I followed the instructions listed here:
    but accidentally passed the ORIGINAL assembly to 'concoct --composition_file' ('original_contigs.fa') INSTEAD of the CHUNKED assembly ('contigs_10K.fa', obtained by running the '' script). Hope this helps!
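Based on that diagnosis, a small stdlib-only pre-flight check (illustrative, not part of CONCOCT) can catch a composition/coverage id mismatch of this kind before concoct crashes with a pandas error:

```python
def check_inputs(fasta_ids, coverage_ids):
    """Sanity-check CONCOCT inputs; returns a list of problems (empty = OK).

    A mismatch here is typical of passing the original assembly to
    --composition_file instead of the chunked contigs_10K.fa, since the
    coverage table refers to subcontig ids like 'c1.concoct_part_0'.
    """
    problems = []
    if len(set(fasta_ids)) != len(list(fasta_ids)):
        problems.append("duplicate ids in composition file")
    missing = sorted(set(coverage_ids) - set(fasta_ids))
    if missing:
        problems.append("coverage ids missing from composition file: "
                        + ", ".join(missing[:3]))
    return problems
```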
    Hello, I am experiencing the same "Errors in BED line.." issue. I have tried reinstalling concoct through conda and created a fresh environment. The cut_up_contigs script seems to work perfectly well, but the concoct_coverage_table script always throws this error. The bed file looks okay to me, but the "samtools bedcov" command throws the same errors. Any ideas on what to try next? Help is greatly appreciated!
    Just kidding! After about 2 weeks of struggling with this issue, of course I find the source of the problem just minutes after asking for help. It was an error on my part. I accidentally had a bam file in the directory with the others that had not been included in the assembly. Thanks anyway!
    Daniel Fischer
    Hi, I have an issue running concoct in parallel in a server environment (managed by slurm). I started concoct with -t 40 and configured the sbatch call with cpus-per-task=40; the logfile then also reports that 40 OMP threads were initiated, but when I check via 'top' on the compute node, I see that only one cpu is used. After 14 days the job timed out and reported that only 2.5% of the requested cpu walltime had been used. I installed concoct=1.1.0 via bioconda. Have you heard of a similar issue before?
    Johannes Alneberg
    @Fischuu_twitter That's worrying indeed! A similar issue gives rise to an endless openblas warning (BinPro/CONCOCT#232) but I assume you don't get that warning? What more information can you supply? Python 3 or 2? Did concoct not even get to the INFO:root:Will call vbgmm with parameters:... step (see log example in this issue: BinPro/CONCOCT#238 ) ?
    Daniel Fischer
    @alneberg Hi, thanks for your answer! I found the issue you mentioned, but that does not seem to be my problem; at least I do not get that warning. After a few test runs I started to think that the issue is actually a 'local' problem. I tried to run concoct with a small testing dataset and there it ran smoothly, also using 40 OMP threads, so everything was fine. But when I feed the large dataset into it, it freezes (the last output in the slurm error log is 'Generate input data' and in concoct's own logs: INFO:root:Will call vbgmm with parameters: /scratch/(...)/concoct_out/, 4000, 1000, 40, 500). Which made me wonder: does concoct use any system folders for temporary files or such? On our cluster the /tmp folder is very small and we need to set the $TMP_DIR variable in the hope that tools respect it. The coverage_table file has about 20 million lines; the python version is 3.6.7
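I can't confirm what temporary files CONCOCT itself writes, but Python code normally obtains its temp directory through the tempfile module, which honors the $TMPDIR environment variable (note: TMPDIR, not TMP_DIR). A quick sketch of how a Python process picks that up; the scratch path here is just a stand-in created for the demonstration:

```python
import os
import tempfile

# Stand-in for a large scratch area; a real cluster job would point
# TMPDIR at e.g. a per-job scratch directory before launching Python.
scratch = tempfile.mkdtemp()
os.environ["TMPDIR"] = scratch
tempfile.tempdir = None  # clear tempfile's cached choice so $TMPDIR is re-read
print(tempfile.gettempdir())  # now inside the scratch area
```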
    Lewis M Ward

    Hi, I'm running into the "Errors in BED line..." problem when running (first few lines and the final error message pasted below). Same result when I try running samtools bedcov. I'm working with a new conda installation of concoct and have tried your suggestions to other posters above, but with no change in the output so far. Any idea what else could be causing this or what else I could try? Any help would be much appreciated!

    Errors in BED line 'k127_592283 0 315 k127_592283.concoct_part_0'
    Errors in BED line 'k127_643785 0 329 k127_643785.concoct_part_0'
    Errors in BED line 'k127_283271 0 304 k127_283271.concoct_part_0'
    Errors in BED line 'k127_669536 0 363 k127_669536.concoct_part_0'
    Errors in BED line 'k127_746789 0 329 k127_746789.concoct_part_0'
    Errors in BED line 'k127_206016 0 1047 k127_206016.concoct_part_0'
    Traceback (most recent call last):
    File "/n/home00/lmward/.conda/envs/concoct_env2/bin/", line 91, in <module>
    generate_input_table(args.bedfile, args.bamfiles, samplenames=samplenames)
    File "/n/home00/lmward/.conda/envs/concoct_env2/bin/", line 61, in generate_input_table
    df = pd.read_table(fh, header=None)
    File "/n/home00/lmward/.conda/envs/concoct_env2/lib/python3.7/site-packages/pandas/io/", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
    File "/n/home00/lmward/.conda/envs/concoct_env2/lib/python3.7/site-packages/pandas/io/", line 448, in _read
    parser = TextFileReader(fp_or_buf, kwds)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.7/site-packages/pandas/io/", line 880, in __init__
    File "/n/home00/lmward/.conda/envs/concoct_env2/lib/python3.7/site-packages/pandas/io/", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f,
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.7/site-packages/pandas/io/", line 1891, in __init__
    self._reader = parsers.TextReader(src, **kwds)
    File "pandas/_libs/parsers.pyx", line 532, in pandas._libs.parsers.TextReader.__cinit__
    pandas.errors.EmptyDataError: No columns to parse from file

    @lmward Not an expert, but you may want to check your bam files. If any single file is empty or if a file is not your original reads mapped to the assembled contigs, it will throw this error. Just a suggestion!
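Along the same lines as the suggestion above (a sketch, not CONCOCT code): the contig names in a bam's header, as printed by samtools view -H, should cover the names in column 1 of the BED file; a stray bam mapped against a different assembly fails this check:

```python
def bam_header_contigs(header_text):
    """Extract contig names from `samtools view -H` output (@SQ SN: tags)."""
    contigs = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t"):
                if field.startswith("SN:"):
                    contigs.add(field[3:])
    return contigs

def bed_contigs(bed_text):
    """Contig names from column 1 of a BED file."""
    return {line.split("\t")[0] for line in bed_text.splitlines() if line.strip()}

# A bam whose header lacks the BED's contigs was likely mapped against
# a different assembly -- a common cause of "Errors in BED line ...".
header = "@HD\tVN:1.6\n@SQ\tSN:k127_592283\tLN:315\n"
bed = "k127_592283\t0\t315\tk127_592283.concoct_part_0\n"
print(bed_contigs(bed) <= bam_header_contigs(header))  # True here
```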
    Francisco Zorrilla
    Has anyone successfully converted jgi_summarize_bam_contig_depths files into a concoct coverage table? I opened issue #286, please comment there if you have any insights