    Johannes Alneberg
    @alneberg
    Did you activate it?
    conda info -e should show you your environments
    dmyang93
    @dmyang93
    ddocent_env              /home/dongmin/.conda/envs/ddocent_env
    base                  *  /home/dongmin/miniconda3
    cocacola_env             /home/dongmin/miniconda3/envs/cocacola_env
    concoct_env              /home/dongmin/miniconda3/envs/concoct_env
    ddocent_env              /home/dongmin/miniconda3/envs/ddocent_env
    gimme                    /home/dongmin/miniconda3/envs/gimme
    qiime2-2018.6            /home/dongmin/miniconda3/envs/qiime2-2018.6
    Johannes Alneberg
    @alneberg
    Right, so if you do source activate concoct_env you should get a different value for which python?
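The activation check being discussed can be sketched as follows (a hedged sketch; "concoct_env" is the environment name from the conversation, and the exact activation command depends on the conda version):

```shell
# Activate the environment, then confirm the interpreter actually comes from it.
if command -v conda >/dev/null 2>&1; then
    # Older conda uses "source activate", newer uses "conda activate".
    (conda activate concoct_env || source activate concoct_env) 2>/dev/null || true
    which python    # expect .../envs/concoct_env/bin/python when activation worked
fi
# Independent of conda, this prints the prefix of whichever interpreter is first on PATH:
python3 -c 'import sys; print(sys.prefix)'
```

If `which python` still points at the base install (e.g. `~/miniconda3/bin/python`), the environment is not actually active in that shell.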
    Danny Ionescu
    @ionescu_danny_twitter
    Hi. I am trying to use concoct. I am getting errors running the script concoct_coverage_table.py. If I run this in a python 2.7 environment (concoct) I am getting the following error:
    dionescu@allegro:/data/scratch/dionescu/IK/data/scratch/dionescu/IK/Concoct_binning$ concoct_coverage_table.py contigs_10k.bed mapping/issyk_kul_megahit.contigs.fa.IS.bam.bam > coverage_table.tsv
    Traceback (most recent call last):
    File "/home/dionescu/miniconda3/envs/concoct/bin/concoct_coverage_table.py", line 18, in <module>
    import pandas as pd
    File "/home/dionescu/.local/lib/python2.7/site-packages/pandas/__init__.py", line 35, in <module>
    "the C extensions first.".format(module))
    ImportError: C extension: /home/dionescu/.local/lib/python2.7/site-packages/pandas/_libs/tslibs/conversion.so: undefined symbol: PyFPE_jbuf not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.
    (concoct) dionescu@allegro:/data/scratch/dionescu/IK/data/scratch/dionescu/IK/Concoct_binning$ pip install --upgrade pandas
    If I am running it in a python 3.6.7 environment then the error is different:
    (Concoct) dionescu@allegro:/data/scratch/dionescu/IK/data/scratch/dionescu/IK/Concoct_binning$ concoct_coverage_table.py contigs_10k.bed mapping/issyk_kul_megahit.contigs.fa.IS.bam.bam > coverage_table.csv
    ERROR: fail to open index BAM file 'mapping/issyk_kul_megahit.contigs.fa.IS10_S3.bam.bam'
    Traceback (most recent call last):
    File "/home/dionescu/CONCOCT/scripts/concoct_coverage_table.py", line 77, in <module>
    generate_input_table(args.bedfile, args.bamfiles, samplenames=samplenames)
    File "/home/dionescu/CONCOCT/scripts/concoct_coverage_table.py", line 28, in generate_input_table
    sys.stderr.write(out)
    TypeError: write() argument must be str, not bytes
    Please advise on how to solve this.
    klotzor
    @klotzor

    Hello there,
    I ran a de novo assembly of metagenomic NGS data with MEGAHIT and afterwards followed the CONCOCT tutorial. When running

    concoct_coverage_table.py final.contigs_10K.bed ~/*sorted.bam > coverage_table.tsv

    I get an error message for each line in the bed file:
    ...
    Errors in BED line 'k141_25554 0 324 k141_25554.0'
    Errors in BED line 'k141_25555 0 377 k141_25555.0'
    Errors in BED line 'k141_25556 0 380 k141_25556.0'
    Errors in BED line 'k141_25557 0 623 k141_25557.0'
    Errors in BED line 'k141_25558 0 350 k141_25558.0'
    Errors in BED line 'k141_25559 0 301 k141_25559.0'
    ...

    I only have one sample. Maybe that is causing an exception?

    Any help would be appreciated. Thank you.

    klotzor
    @klotzor

    I just realized that this error came up before. My 'which python' output is

    /home/falker/miniconda2/bin/python

    So I assume I am using the conda environment. Any other suggestions @alneberg ?

    klotzor
    @klotzor

    I got the conda environment right (I guess) but now I get the error

    pkg_resources.DistributionNotFound: The 'concoct==1.0.0' distribution was not found and is required by the application

    That makes sense because my concoct version is 0.4.1

    How do I install version 1.0.0 with conda?

    Johannes Alneberg
    @alneberg
    @klotzor, Concoct version 1.0 should be installable by conda right away. What does conda install concoct give you?
    On the other hand, if you only have one sample, I'm not sure it's worth the effort to run CONCOCT; it really isn't that effective with just one sample
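A hedged sketch of the install being suggested, using a fresh environment so the old 0.4.1 install doesn't interfere (the environment name is arbitrary; the block is guarded so it is a no-op where conda is not on PATH):

```shell
# Install CONCOCT 1.0 from bioconda into its own environment.
if command -v conda >/dev/null 2>&1; then
    conda create -y -n concoct_env -c bioconda -c conda-forge concoct=1.0.0
    conda run -n concoct_env concoct --version
    status=installed
else
    status=skipped   # conda not available here; commands shown for reference
fi
echo "$status"
```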
    klotzor
    @klotzor

    @alneberg I got it running in my normal environment. I had to switch to python 3 and install scikit-learn with pip, and it is now running (I can see the samtools process).

    It is one sample containing an unknown number of bacterial species (most likely also undescribed ones). So I can skip concoct for my analysis?

    My goal is to get the different species into bins using 5 different binning tools and finally merge them all with DAS Tool into high-confidence bins.
    Johannes Alneberg
    @alneberg
    How complex is the sample, in terms of expected number of species?
    klotzor
    @klotzor
    That's hard to tell. Blast gave us about 30 known species, maxbin2 produced 17 bins
    klotzor
    @klotzor
    @alneberg abawaca just finished and returned 10 clusters
    Johannes Alneberg
    @alneberg
    Are you also verifying your clusters with something like CheckM? I'm just worried that the input information you're using is not enough to disentangle the genomes
    klotzor
    @klotzor
    Good point. I will do that. Do you think DAS tool is the way to go for such a limited dataset? I can't get any more samples because it is a rare disease in an almost extinct kangaroo species ;)
    So anything I can find is of interest
    Johannes Alneberg
    @alneberg
    Maybe a lot of long reads?
    But yeah, you should be able to get a few genomes out of the sample; it just shouldn't be possible to resolve all of them
    klotzor
    @klotzor
    Yeah we don't expect that. Anyhow, thanks for your valuable input!
    leojequier
    @leojequier
    @klotzor Hi, I had the same issue. There is a problem with the cut_up_fasta.py script.
    leojequier
    @leojequier
    @klotzor I think concoct_coverage_table.py returns an error because the subcontig id in the first column doesn't match the id in the 4th column. However, if you change line 21 in cut_up_fasta.py from: print("{0}\t{2}\t{3}\t{0}.{1}" ....
    to: print("{0}.{1}\t{2}\t{3}\t{0}.{1}" ......
    it also adds the subcontig number to the first column, and concoct_coverage_table.py no longer returns an error.
    Johannes Alneberg
    @alneberg
    @leojequier, sorry for the slow reply, I have been on vacation. The line you refer to is intentional, since the first column should refer back to the original contig name and not the subcontig. This is because the bam files are created against the original contigs, not the subcontigs. Is it possible that you have mapped your reads against the subcontigs instead? Does this make sense? I'm reviewing these scripts at the moment and hope to be able to track down these kinds of bugs.
    leojequier
    @leojequier
    @alneberg Yes, you're right I mapped the reads to the subcontigs. I corrected that and it works fine now! Thanks
    Johannes Alneberg
    @alneberg
    @leojequier, great news! Thank you for reporting back!
    md shaminur
    @Shaminur_gitlab
    I am trying to run CONCOCT but I cannot understand how to prepare the bam files described as "indexed bam files where each sample has been mapped against the original contigs"?
    Johannes Alneberg
    @alneberg
    Hi @Shaminur_gitlab, maybe it's not optimally worded in the documentation. Each bam file within the directory should contain the same contigs, but the reads from only a single sample; together the bam files cover all samples. Does that make sense?
    So you map each sample's reads with a regular mapper (bowtie2, bwa, ...) and each of those mappings will create one bam file
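The mapping step described above can be sketched like this (file and index names are placeholders; any regular mapper works — bowtie2 is just one option, and the block is guarded so it is a no-op where the tools are absent):

```shell
# One sorted, indexed bam per sample, all mapped against the ORIGINAL contigs.
if command -v bowtie2 >/dev/null 2>&1 && command -v samtools >/dev/null 2>&1; then
    bowtie2-build original_contigs.fa contigs_index
    for sample in sample1 sample2; do
        bowtie2 -x contigs_index \
                -1 "${sample}_R1.fastq.gz" -2 "${sample}_R2.fastq.gz" \
            | samtools sort -o "mapping/${sample}.sorted.bam" -
        samtools index "mapping/${sample}.sorted.bam"
    done
    status=mapped
else
    status=skipped   # mapper/samtools not installed; shown for reference
fi
```

The key point from the discussion above is that the reference here is the original assembly, not the cut-up subcontigs.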
    Josie Paris
    @josieparis
    Hi @alneberg ! I'm really sorry to double post (I also opened a github issue), but I'm having problems with running concoct --composition_file contigs_10K.fa --coverage_file coverage_table.tsv -b concoct_output with the following errors: Traceback (most recent call last):
    File "/gpfs/ts0/home/jrp228/.local/bin/concoct", line 4, in <module>
    __import__('pkg_resources').run_script('concoct==1.1.0', 'concoct')
    File "/gpfs/ts0/shared/software/Python/3.6.4-foss-2018a/lib/python3.6/site-packages/pkg_resources/__init__.py", line 743, in run_script
    self.require(requires)[0].run_script(script_name, ns)
    File "/gpfs/ts0/shared/software/Python/3.6.4-foss-2018a/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1498, in run_script
    exec(code, namespace, namespace)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/concoct-1.1.0-py3.6-linux-x86_64.egg/EGG-INFO/scripts/concoct", line 90, in <module>
    results = main(args)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/concoct-1.1.0-py3.6-linux-x86_64.egg/EGG-INFO/scripts/concoct", line 20, in main
    composition, cov, cov_range = load_data(args)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/concoct-1.1.0-py3.6-linux-x86_64.egg/concoct/input.py", line 25, in load_data
    read_length = args.read_length
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/concoct-1.1.0-py3.6-linux-x86_64.egg/concoct/input.py", line 92, in load_coverage
    axis='index')
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/ops.py", line 2030, in f
    level=level)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/ops.py", line 1917, in _combine_series_frame
    return self._combine_match_index(other, func, level=level)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 5097, in _combine_match_index
    copy=False)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 3792, in align
    broadcast_axis=broadcast_axis)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 8428, in align
    fill_axis=fill_axis)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 8514, in _align_series
    fdata = fdata.reindex_indexer(join_index, lidx, axis=1)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1224, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
    File "/gpfs/ts0/home/jrp228/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3087, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
    ValueError: cannot reindex from a duplicate axis
    any help greatly appreciated!! :)
    Johannes Alneberg
    @alneberg
    Hello! Might be good to double post to get my attention. Sorry about that. I just wrote a reply on the github issue. Please let me know if it is of any help
    md shaminur
    @Shaminur_gitlab
    @alneberg, thanks, I get the meaning now.
    Josie Paris
    @josieparis
    Hey @alneberg Thanks!!! will chat to you on the issue :)
    helenistheking
    @helenistheking
    Hi, I'm not having an issue, but I don't understand the significance of why the reference contigs are cut into 10K lengths. Thank you
    Johannes Alneberg
    @alneberg
    Hello @helenistheking! The concoct binning algorithm does not take contig length into account, but it is usually important to put more emphasis on longer contigs. By cutting contigs into smaller parts, the longer contigs will have a higher influence on the clusters formed. The 10K choice is fairly arbitrary, but there is a tradeoff in speed if the number of chunks becomes too large (from cutting into too-short parts)
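For reference, the 10K cut-up step as given in the CONCOCT usage docs, with the chunk size made explicit (`-c 10000`); `--merge_last` folds the final short fragment into the preceding chunk so pieces stay close to 10 kb. The block is guarded so it is a no-op where the CONCOCT scripts are absent:

```shell
# Cut contigs into ~10 kb chunks; writes the chunked fasta and a matching BED file.
if command -v cut_up_fasta.py >/dev/null 2>&1; then
    cut_up_fasta.py original_contigs.fa -c 10000 -o 0 --merge_last \
        -b contigs_10K.bed > contigs_10K.fa
    status=cut
else
    status=skipped   # CONCOCT scripts not on PATH; shown for reference
fi
```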
    Jeroen Frank
    @jfrank87_gitlab
    @dmyang93 I run into this exact problem when I follow the instructions listed here: https://concoct.readthedocs.io/en/latest/usage.html
    but accidentally pass the ORIGINAL assembly to 'concoct --composition_file' ('original_contigs.fa') INSTEAD of the CHUNKED assembly ('contigs_10K.fa', obtained by running the 'cut_up_fasta.py' script). Hope this helps!
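Jeroen's point, spelled out as the two invocations side by side (file names follow the CONCOCT usage docs; the call is guarded so it is a no-op where concoct is absent):

```shell
# CORRECT: the CHUNKED assembly, whose ids match the rows of coverage_table.tsv.
if command -v concoct >/dev/null 2>&1; then
    concoct --composition_file contigs_10K.fa \
            --coverage_file coverage_table.tsv -b concoct_output/
    status=ran
else
    status=skipped   # concoct not on PATH; shown for reference
fi
# WRONG: passing original_contigs.fa as --composition_file gives contig ids
# that do not match the coverage table, producing errors like the ones above.
```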
    gilusoo
    @gilusoo
    Hello, I am experiencing the same "Errors in BED line.." issue. I have tried reinstalling concoct through conda and created an environment. The cut_up_fasta.py script seems to work perfectly well, but the concoct_coverage_table script always throws this error. The bed file looks okay to me, but the "samtools bedcov" command throws the same errors. Any ideas on what to try next? Help is greatly appreciated!
    gilusoo
    @gilusoo
    Just kidding! After about 2 weeks of struggling with this issue, of course I find the source of the problem just minutes after asking for help. It was an error on my part. I accidentally had a bam file in the directory with the others that had not been included in the assembly. Thanks anyway!
    Daniel Fischer
    @Fischuu_twitter
    Hi, I have an issue running concoct in parallel in a server environment (managed by slurm). I started concoct with -t 40 and configured the sbatch call with cpus-per-task=40; the logfile also says that 40 OMP threads were initiated, but when I check via 'top' on the compute node, I see that only one cpu is used. After 14 days the job timed out and also reported that only 2.5% of the requested cpu walltime had been used. I installed concoct=1.1.0 via bioconda. Have you heard of a similar issue before?
    Johannes Alneberg
    @alneberg
    @Fischuu_twitter That's worrying indeed! A similar issue gives rise to an endless openblas warning (BinPro/CONCOCT#232) but I assume you don't get that warning? What more information can you supply? Python 3 or 2? Did concoct not even get to the INFO:root:Will call vbgmm with parameters:... step (see log example in this issue: BinPro/CONCOCT#238 ) ?
    Daniel Fischer
    @Fischuu_twitter
    @alneberg Hi, thanks for your answer! I found the issue you mentioned, but that seems not to be my problem; at least I do not get that warning. After a few test runs, I started to think that the issue is actually a 'local' problem. I tried to run concoct with a small testing dataset and there it ran smoothly, also using 40 OMP threads, so everything is fine. But when I feed the large dataset into it, it freezes (last output in the slurm error log is 'Generate input data' and in concoct's own logs: INFO:root:Will call vbgmm with parameters: /scratch/(...)/concoct_out/, 4000, 1000, 40, 500). What I started to wonder: does concoct use any system folders for temporary files or such? On our cluster the /tmp folder is very small and we need to set the $TMP_DIR variable in the hope that tools respect it. The coverage_table file has about 20 million lines; python version is 3.6.7
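For reference, a minimal sbatch sketch that redirects temporary files to a larger scratch area, as discussed above. The scratch path, resource numbers, and input file names are placeholders for this cluster's setup; whether concoct respects TMPDIR for its temporary files is the open question here, so this is a workaround attempt, not a confirmed fix:

```shell
#!/bin/bash
#SBATCH --cpus-per-task=40
#SBATCH --time=14-00:00:00
# Point temp files away from the small /tmp (placeholder scratch path).
export TMPDIR="/scratch/$USER/tmp"
mkdir -p "$TMPDIR"
# Match the OMP thread count to the allocated cpus.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-40}"
concoct --composition_file contigs_10K.fa --coverage_file coverage_table.tsv \
        -t "${SLURM_CPUS_PER_TASK:-40}" -b concoct_out/
```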
    Lewis M Ward
    @lmward

    Hi, I'm running into the "Errors in BED line..." problem when running concoct_coverage_table.py (first few lines and the final error message pasted below). Same result when I try running samtools bedcov. I'm working with a new conda installation of concoct and have tried your suggestions to other posters above, but with no change in the output so far. Any idea what else could be causing this or what else I could try? Any help would be much appreciated!

    Errors in BED line 'k127_592283 0 315 k127_592283.concoct_part_0'
    Errors in BED line 'k127_643785 0 329 k127_643785.concoct_part_0'
    Errors in BED line 'k127_283271 0 304 k127_283271.concoct_part_0'
    Errors in BED line 'k127_669536 0 363 k127_669536.concoct_part_0'
    Errors in BED line 'k127_746789 0 329 k127_746789.concoct_part_0'
    Errors in BED line 'k127_206016 0 1047 k127_206016.concoct_part_0'
    ...
    Traceback (most recent call last):
    File "/n/home00/lmward/.conda/envs/concoct_env2/bin/concoct_coverage_table.py", line 91, in <module>
    generate_input_table(args.bedfile, args.bamfiles, samplenames=samplenames)
    File "/n/home00/lmward/.conda/envs/concoct_env2/bin/concoct_coverage_table.py", line 61, in generate_input_table
    df = pd.read_table(fh, header=None)
    File "/n/home00/lmward/.conda/envs/concoct_env2/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
    File "/n/home00/lmward/.conda/envs/concoct_env2/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, kwds)
    File "/n/home00/lmward/.conda/envs/concoct_env2/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
    File "/n/home00/lmward/.conda/envs/concoct_env2/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f,
    self.options)
    File "/n/home00/lmward/.conda/envs/concoct_env2/lib/python3.7/site-packages/pandas/io/parsers.py", line 1891, in __init__
    self._reader = parsers.TextReader(src, **kwds)
    File "pandas/_libs/parsers.pyx", line 532, in pandas._libs.parsers.TextReader.__cinit__
    pandas.errors.EmptyDataError: No columns to parse from file

    gilusoo
    @gilusoo
    @lmward Not an expert, but you may want to check your bam files. If any single file is empty or if a file is not your original reads mapped to the assembled contigs, it will throw this error. Just a suggestion!
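The bam sanity check gilusoo suggests can be sketched like this (the `mapping/` directory and file pattern are placeholders; guarded so it is a no-op where samtools is absent). `samtools quickcheck` catches truncated or corrupt files, and `samtools idxstats` reveals files with zero mapped reads:

```shell
# Flag broken bams and report mapped-read counts per file.
if command -v samtools >/dev/null 2>&1; then
    for f in mapping/*.sorted.bam; do
        samtools quickcheck "$f" || echo "corrupt or truncated: $f"
        mapped=$(samtools idxstats "$f" | awk '{m += $3} END {print m+0}')
        echo "$f: $mapped mapped reads"
    done
    status=checked
else
    status=skipped   # samtools not on PATH; shown for reference
fi
```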
    Francisco Zorrilla
    @franciscozorrilla
    Has anyone successfully converted jgi_summarize_bam_contig_depths files into a concoct coverage table? I opened issue #286, please comment there if you have any insights