    luciauy
    @luciauy
    Hi again! Sadly, it didn't work. I also looked in the tutorial and it says the names of the files should be <anything>1.fastq.gz and <anything>2.fastq.gz, which is what I had earlier. Maybe my error lies elsewhere!
    hariszaf
    @hariszaf
    Hi again!
    here is the tutorial
    I will remove the .pdf as it is not synced...
    When you tried this way, did you have exactly the same error?
    luciauy
    @luciauy
    I got another error, not the same. However, it broke sooner I think?
    Error correction using BayesHammer is completed!
    Merging step by SPAdes is completed
    all the first steps are done! clustering is about to start!
    rm: cannot remove 'se.': No such file or directory
    rm: cannot remove 'nospace.
    ': No such file or directory
    Fatal error: /home/pema_latest.bds, line 530, pos 1. Exec failed.
    Exit value : 1
    Command : rm se. nospace.
    hariszaf
    @hariszaf
    yeap. that is because it does not actually run over the samples.
    ok, I would suggest removing everything from your mydata directory and then adding only the _R1_001.fastq.gz and _R2_001.fastq.gz files.
    check that your mydata folder contains only these files.
    you could also completely remove the output_directory so pema builds a new one
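    A quick way to sanity-check the input directory along the lines suggested above (the helper name and the `mydata` path are assumptions from this thread, not part of PEMA):

    ```shell
    # Hypothetical helper, not part of PEMA: list files in the input directory
    # that do NOT match the _R1_001/_R2_001 naming pema expects, so they can
    # be removed before a fresh run.
    list_stray_inputs() {
        find "$1" -maxdepth 1 -type f \
            ! -name '*_R1_001.fastq.gz' ! -name '*_R2_001.fastq.gz'
    }

    # e.g.: list_stray_inputs mydata
    # and remove the old output directory so pema builds a new one:
    # rm -rf output_directory
    ```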
    luciauy
    @luciauy
    I just did that: removed everything and ran it again (with just one sample, just to check). But I got the same error.
    file names: We1_R1_001.fastq.gz We1_R2_001.fastq.gz
    hariszaf
    @hariszaf
    ok. give me a few minutes to check
    luciauy
    @luciauy
    Thanks a lot!!!
    hariszaf
    @hariszaf
    and if you don't mind could you also send me the "converted" files ?
    luciauy
    @luciauy
    "converted" files? the re-named fastq.gzs?
    hariszaf
    @hariszaf
    yes yes exactly
    luciauy
    @luciauy
    cool yes
    can I upload them somewhere?
    hariszaf
    @hariszaf
    you can upload them here or send them to me via an email ( haris-zaf@hcmr.gr )
    naurasd
    @naurasd

    Hi Haris, I have an issue with ITS analysis in the pandaseq step.

    Task failed:
            Program & line     : '/home/pema_latest.bds', line 444
            Task Name          : ''
            Task ID            : 'pema_latest.bds.20210107_223115_440/task.pema_latest.line_444.id_924'
            Task PID           : '15358'
            Task hint          : '/home/tools/PANDAseq/bin/pandaseq -f filtered_max_ERR0000033_1.fastq.gz.1P.fastq.00.0_0.cor.fastq.gz -r filtered_max_ERR0000033_2.fastq.gz.2P.fastq.00'
            Task resources     : 'cpus: 1   mem: -1.0 B     timeout: 86400  wall-timeout: 86400'
            State              : 'ERROR'
            Dependency state   : 'ERROR'
            Retries available  : '1'
            Input files        : '[]'
            Output files       : '[]'
            Script file        : '/home1/naurasd/pema_latest.bds.20210107_223115_440/task.pema_latest.line_444.id_924.sh'
            Exit status        : '1'
            StdErr (10 lines)  :
                    0xa9ce40:2      STAT    OVERLAPS        0
                    0xa9ce40:1      STAT    TIME    Thu Jan  7 23:46:16 2021
                    0xa9ce40:1      STAT    ELAPSED 0
                    0xa9ce40:1      STAT    READS   0
                    0xa9ce40:1      STAT    NOALGN  0
                    0xa9ce40:1      STAT    LOWQ    0
                    0xa9ce40:1      STAT    BADR    0
                    0xa9ce40:1      STAT    SLOW    0
                    0xa9ce40:1      STAT    OK      0
                    0xa9ce40:1      STAT    OVERLAPS        0
    
    Fatal error: /home/pema_latest.bds, line 446, pos 4. Task/s failed.
    pema_latest.bds, line 418 :     for ( string correctFile : correct )  {
    pema_latest.bds, line 446 :        wait

    The reads in this sample contain a lot of N bases; I don't know if this causes the problem. Thanks for your help!
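    As a side note (not from the thread): a quick, hypothetical way to gauge how N-heavy a sample is, for an uncompressed FASTQ (pipe through zcat first for .gz files):

    ```shell
    # Hypothetical check: print the fraction of N bases across all reads in an
    # uncompressed FASTQ file (sequence lines are every 4th line, starting at 2).
    n_fraction() {
        awk 'NR % 4 == 2 { total += length($0); n += gsub(/[Nn]/, "") }
             END { printf "%.4f\n", (total ? n / total : 0) }' "$1"
    }
    ```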

    hariszaf
    @hariszaf

    Hi Nauras!
    Wishes for a happy new year!

    Could you tell me what pema version you're using?
    However, it's probably a data-related issue...
    An option would be to tune the merging length you ask for; making it smaller might work.

    hariszaf
    @hariszaf

    Hi @/all !
    Here is a short note for the pema v.2.0.2 release!
    Midori 2 is now available, including not just Metazoa sequences but all eukaryotic taxa.
    In addition, you can now use your custom, in-house database for the taxonomy assignment.
    You can find the new version through Docker and SingularityHub.

    A PEMA documentation site is also now available.
    Some hints and tips are also available there.

    naurasd
    @naurasd

    Hi @hariszaf . I was running PEMA on my COI dataset with the new version. I used the new parameters.tsv file required for the newest version but got the following error during the taxonomy assignment step:

    Fatal error: /home/pema_latest.bds, line 1291, pos 1. Map 'paramsOfTaxonomy' does not have key 'midori_version'.
    pema_latest.bds, line 1254 :    if ( ( paramsOfTaxonomy{'gene'} == 'gene_COI' || paramsOfTaxonomy{'gene'} == "gene_ITS") &&  paramsOfTaxonomy{'clusteringAlgoForCOI_ITS'} == 'algo_SWARM' ) {
    pema_latest.bds, line 1266 :       if ( paramsOfTaxonomy{'gene'} == "gene_COI" ) {
    pema_latest.bds, line 1290 :          if ( paramsOfTaxonomy{'custom_ref_db'} != 'Yes' ) {
    pema_latest.bds, line 1291 :             task java -Xmx64g -jar $path/tools/RDPTools/classifier.jar classify -t $path/tools/RDPTools/TRAIN/$paramsOfTaxonomy{'midori_version'}/rRNAClassifier.properties -o tax_assign_swarm_COI_temp.txt rdpClassifierInput.fasta
    
    ProgramCounter.pop(100):        Node ID does not match!
            PC         : PC: size 10 / 0, nodes: 1422 -> 6727 -> 6746 -> 6785 -> 6792 -> 6827 -> 6833 -> 6834 -> 6835 -> 6837
            Node Id    : 6840
            bdsNode Id : 6837
    ProgramCounter.pop(100):        Node ID does not match!
            PC         : PC: size 9 / 0, nodes: 1422 -> 6727 -> 6746 -> 6785 -> 6792 -> 6827 -> 6833 -> 6834 -> 6835
            Node Id    : 6837
            bdsNode Id : 6835
    ProgramCounter.pop(100):        Node ID does not match!
            PC         : PC: size 8 / 0, nodes: 1422 -> 6727 -> 6746 -> 6785 -> 6792 -> 6827 -> 6833 -> 6834
            Node Id    : 6835
            bdsNode Id : 6834
    ProgramCounter.pop(100):        Node ID does not match!
            PC         : PC: size 7 / 0, nodes: 1422 -> 6727 -> 6746 -> 6785 -> 6792 -> 6827 -> 6833
            Node Id    : 6834
            bdsNode Id : 6833
    ProgramCounter.pop(100):        Node ID does not match!
            PC         : PC: size 6 / 0, nodes: 1422 -> 6727 -> 6746 -> 6785 -> 6792 -> 6827
            Node Id    : 6833
            bdsNode Id : 6827
    ProgramCounter.pop(100):        Node ID does not match!
            PC         : PC: size 5 / 0, nodes: 1422 -> 6727 -> 6746 -> 6785 -> 6792
            Node Id    : 6827
            bdsNode Id : 6792
    ProgramCounter.pop(100):        Node ID does not match!
            PC         : PC: size 4 / 0, nodes: 1422 -> 6727 -> 6746 -> 6785
            Node Id    : 6792
            bdsNode Id : 6785
    ProgramCounter.pop(100):        Node ID does not match!
            PC         : PC: size 3 / 0, nodes: 1422 -> 6727 -> 6746
            Node Id    : 6785
            bdsNode Id : 6746
    ProgramCounter.pop(100):        Node ID does not match!
            PC         : PC: size 2 / 0, nodes: 1422 -> 6727
            Node Id    : 6746
            bdsNode Id : 6727
    ProgramCounter.pop(100):        Node ID does not match!
            PC         : PC: size 1 / 0, nodes: 1422
            Node Id    : 6727
            bdsNode Id : 1422

    I don't know what the problem is, and I don't know why the error shows

    if ( paramsOfTaxonomy{'custom_ref_db'} != 'Yes' )

    I did specify No for custom_ref_db in the parameters file, I checked again. Your help is much appreciated.

    hariszaf
    @hariszaf
    Hey Nauras. First time I see this. I'll work on that and I'll text you once I have any news
    Trevor Clack
    @Tclack88

    @hariszaf : General question about the taxonomy format. I've been working the past few weeks on a series of bash scripts to get everything in the right format for the .fasta file and the taxonomy.tsv. I noticed in your examples here

    the first 3 columns are Seq_ID, Domain, Kingdom; however, under Domain you have "Root" and under Kingdom you have "Eukaryota" (I've never seen "Root" as a domain under any system). For the first example line, "Eunice_vittata", I found online (GenBank) that the Domain is Eukaryota and the Kingdom is Metazoa. Everything else is the same.

    My question is: Does the software assume a certain format (2 empire, 3 empire, etc.) or is it just important that the system be consistent between the taxonomy and .fasta files?

    eg. For the Eunice_vittata, I'd like to have the columns be:

    Seq_ID, Domain, Kingdom, Phylum, Class, ...
    >SEQXX,Eukaryota, Metazoa, Annelida, Polychaeta, ...

    instead of

    Seq_ID, Domain, Kingdom, Phylum, Class, ...
    >SEQXX, Root, Eukaryota, Annelida, Polychaeta, ...
    hariszaf
    @hariszaf
    Hi @Tclack88
    I am quite sure it is the latter. You do not have to follow a certain taxonomy or anything; I am confident that this is up to you.
    However, you do need to keep it consistent and take care not to fall into possible mistakes such as “convergent evolution”.
    You can also check this great tutorial here if you are about to train the RDPClassifier.
    FYI, the Root part you saw in my example was, I think, due to the initial Midori taxonomies
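    One concrete way to act on the "keep it consistent" advice above — a hypothetical sketch that verifies every row of a custom taxonomy .tsv has the same number of rank columns (the tab-separated layout is assumed from the examples in this thread):

    ```shell
    # Hypothetical consistency check: report whether all rows of a
    # tab-separated taxonomy file have the same column count as the first row.
    check_taxonomy_depth() {
        awk -F'\t' 'NR == 1 { n = NF } NF != n { bad++ }
                    END { print (bad ? "inconsistent" : "consistent") }' "$1"
    }
    ```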
    Trevor Clack
    @Tclack88

    In the java part of step 2 here, is the RDPClassifier coming from https://github.com/rdpstaff/RDPTools? I couldn't find it anywhere on the website or PEMA's GitHub. If it is indeed that, or from elsewhere, it would be great to include a link; I came across that from a Google search

    -- but on a related note, the build for their RDPClassifier seems to be failing as outlined many times by their issues. If you have an environment (conda or else) that works, providing an environment.yml or otherwise a list of supporting packages and versions would be very helpful

    hariszaf
    @hariszaf
    Hi @Tclack88
    I am not sure I got your point right.
    Hoping that I am addressing your question, Step 2 is implemented by PEMA automatically.
    So if you have built the input files as described you just need to provide them to PEMA and you are ready to go.
    I have added it there just to give an overview - maybe it is more confusing than helpful...
    Trevor Clack
    @Tclack88

    Here to add more to the mayhem. Running with the RDPClassifier.

    I'm getting this error after running singularity run -B analysis_directory/:/mnt/analysis pema_v.2.0.2.sif:

    mkdir: cannot create directory 'TRAIN/CLEAN_COI_custom_db': Read-only file system
    /home/scripts/trainDbForRDPClassifier.sh: line 15: ready4train_taxonomy.txt: Read-only file system
    /home/scripts/trainDbForRDPClassifier.sh: line 17: ready4train_seqs.fasta: Read-only file system
    Picked up JAVA_TOOL_OPTIONS: -XX:+UseContainerSupport
    Exception in thread "main" java.io.FileNotFoundException: ready4train_taxonomy.txt (No such file or directory)
            at java.io.FileInputStream.open0(Native Method)
            at java.io.FileInputStream.open(FileInputStream.java:195)
            at java.io.FileInputStream.<init>(FileInputStream.java:138)
            at java.io.FileInputStream.<init>(FileInputStream.java:93)
            at java.io.FileReader.<init>(FileReader.java:58)
            at edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker.<init>(ClassifierTraineeMaker.java:60)
            at edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker.main(ClassifierTraineeMaker.java:170)
            at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:77)
    cp: cannot create regular file 'TRAIN/CLEAN_COI_custom_db/': Not a directory
    Fatal error: /home/pema_latest.bds, line 177, pos 7. Exec failed.
            Exit value : 1
            Command : bash /home/scripts/trainDbForRDPClassifier.sh /mnt/analysis/custom_ref_db/*.tsv /mnt/analysis/custom_ref_db/*.fasta CLEAN_COI_custom_db
    pema_latest.bds, line 163 : if ( paramsFirstAssignment{'custom_ref_db'} == 'Yes' ){
    pema_latest.bds, line 169 :    if ( paramsFirstAssignment{'gene'} == 'gene_COI' ){
    pema_latest.bds, line 177 :       sys bash $path/scripts/trainDbForRDPClassifier.sh $taxonomyFile $sequenceFile $paramsFirstAssignment{'name_of_custom_db'}

    Sounds like it's saying the environment is "read-only" and so it can't create. Is this a singularity issue or a PEMA issue? I don't have sudo permission on the system

    hariszaf
    @hariszaf
    @Tclack88 give me some time to check on it. You should not need to have sudo rights
    hariszaf
    @hariszaf

    @Tclack88 Hi Trevor!
    Could you please try to run the following command?

    singularity run --writable-tmpfs -B /path_to_your/analysis_directory/:/mnt/analysis pema_v.2.0.3.sif

    I think that you will be ok! :slight_smile:
    Though, we need to check before we smile! haha

    Trevor Clack
    @Tclack88

    Do you have a guide as to what the output from PEMA means (specifically from the RDPClassifier)? I couldn't find anything on GitHub. Here's a list of the output files that gets created. What's important to focus on?

    1.quality_control/       7.gene_dependent/
    2.trimmomatic_output/       8.output_per_sample/
    3.correct_by_BayesHammer/  all_samples.fasta*
    4.merged_by_PANDAseq/       checkpoints_for_COI_RDPClassifier_test_10feb/
    5.dereplicate_by_obiuniq/  final_all_samples.fasta*
    6.linearized_files/       parameters0f.COI_RDPClassifier_test_10feb.tsv*

    One critical file I'm expecting to find is a "percent match", i.e. return all the sequences that have some match to entries in my personal database (custom_ref_db) and tell me there's, for example, a 98% match with SEQ7. I found a file that has a number associated with each taxonomy.

    The most promising seems to be the 7.gene_dependent/gene_COI/SWARM subdirectory. Here there's a "tax_assign_SWARM_COI.txt", but these numbers seem to deal with something else

    hariszaf
    @hariszaf
    Hi Trevor! I am working on a tab on the pema site for the output, but it will take some time
    The most important output is the final_table.tsv file that you will find in the path you mentioned
    Trevor Clack
    @Tclack88
    [screenshot: image.png]

    @Hariszaf
    hmm, then is it possible there's an error with PEMA or my data format? (Normally I would keep this in a thread, but threads on gitter don't seem to allow adding photos.) It's still hard to see that the first entry is the same as the 2nd; it is classified under OSU and seems to have arbitrarily been cut off. The remaining are under the 2nd column (ERR1) as per the single tab.

    No numbers that I can see either

    hariszaf
    @hariszaf
    Trevor could you send me this file on pema@hcmr.gr?
    hariszaf
    @hariszaf

    @Tclack88

    Another thing I'd love to poke holes in is the ITL capability of PEMA, but there doesn't seem to be instruction on that.

    I do not think I'm following...

    Trevor Clack
    @Tclack88

    @hariszaf , do you still have the output folders from running my data? I'm asking because I'm wondering if there's a gap in my understanding of the parameters.tsv or if there's a bug in PEMA. For example, there's a threshold of .6, which makes me think that matches below .6 will be discarded. But when I look under directory 7.gene_dependent/gene_COI/SWARM, there's a file called tax_assign_swarm_COI.txt which has numbers associated with each phylum, class, etc. And some of those (down at the species level) are pretty low, like .37 and so on.

    Am I misunderstanding this cutoff number?

    Trevor Clack
    @Tclack88

    I know you have a lot to deal with from me. Hopefully this will just be a quick one. I just ran something overnight and it timed out (since, as discussed above, I'm doing this on a virtual machine, I don't have the advantage of a speedy and powerful HPC), but I see from the printout that a checkpoint file has been created. After a timed-out error, this gets created:
    Creating checkpoint file '/home/pema_latest.bds.line_889.chp' INFO: Cleaning up image...

    I don't want to lose the 20+ hours of work so far. Can I simply continue the run command
    (singularity run --fakeroot --writable -B ./analysis_directory:/mnt/analysis pema_fixperms)

    and it will continue where it left off? I'm uncertain if this is something you built into pema or if it's part of one of the tools you used. It's of course in the clustering section.

    hariszaf
    @hariszaf

    Hi Trevor. Sorry for the late answer.

    singularity exec -B /<path_of_your_analysis_directory>:/mnt/analysis pema_v.1.3.1.sif 
    /home/tools/BDS/.bds/bds -r /mnt/analysis/trimming.chp

    here is a common case for running a checkpoint

    just fix your flags accordingly
    i think it's going to be ok!
    Leonardo Moreno
    @j2moreno
    Does Pema support ITS analysis? Or is it still pending?
    AntonBilsen
    @AntonBilsen
    Hello all, I am currently having the following problem when using PEMA to analyze an invertebrate community using the COI gene. The program runs fine up to and including the creation of the MOTU table (the names of the different species are visible in the finalTriplesPerSampleFromSwarmClustering.txt files in the 8.output_per_sample folder). However, after this, I get the following message: "mv: cannot stat '*.chp': No such file or directory." Does anyone know what I may be doing wrong here?
    hariszaf
    @hariszaf
    Hi Anton! Is there any chance you ran pema multiple times on the same output directory..?
    The error you get means that pema looks for the checkpoints it builds, to move them all into a certain directory, but it doesn't find any.
    It's probably nothing to worry about, as everything should already be done!
    AntonBilsen
    @AntonBilsen
    @hariszaf Before each new run of PEMA, I created a new output directory (via the parameters.tsv file) for the data I was going to analyze. Indeed, most of PEMA's operations were successfully completed, but is there a chance that I am missing important information due to the error I mentioned above?
    hariszaf
    @hariszaf
    not really. checkpoints can be considered as snapshots of your analysis at a specific point that allow you to partially re-run your analysis downstream from that point.
    for example, let's assume you want to leave your preprocessing parameters as they are, but you want to change the d parameter of swarm
    then you may change d in the parameters.tsv file and use the corresponding .chp to run pema from the ASV inference step and downstream
    AntonBilsen
    @AntonBilsen
    @hariszaf Okay, thank you!
    hariszaf
    @hariszaf
    @AntonBilsen hope it helps! :slight_smile:
    Leonardo Moreno
    @j2moreno
    Trying to use the UNITE database for ITS analysis. Is there any way to feed the unite CREST release into PEMA? https://unite.ut.ee/repository.php
    hariszaf
    @hariszaf
    Hi @j2moreno , Leonardo! pema already uses the UNITE database for the case of ITS. I would suggest giving it a first try with that.
    If you need to run with a later version of UNITE, you cannot do that for the time being in a straightforward way.
    However, I think there is an indirect way to do so:
    You can start a container adding the UNITE files in your my_analysis directory and mount those along with everything else. Once you do that, move them into the /home/tools/CREST/LCAClassifier/parts/flatdb/unite directory; then go back to the /home directory and run ./pema_latest.bds
    However, if you're facing difficulties with the command line, I could add this version in a few weeks from now.
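    The workaround above could look roughly like this — a sketch only, where the container command, the helper function, and the source path are assumptions based on the description; the move happens inside the PEMA container:

    ```shell
    # Sketch of the described workaround; paths are assumptions.
    # First, start a shell inside the container with something like:
    #   singularity shell -B my_analysis/:/mnt/analysis pema_v.2.0.2.sif
    install_unite_db() {
        src="$1"   # e.g. /mnt/analysis/unite_files (your mounted UNITE release)
        dst="$2"   # e.g. /home/tools/CREST/LCAClassifier/parts/flatdb/unite
        mkdir -p "$dst"
        mv "$src"/* "$dst"/
    }
    # then: cd /home && ./pema_latest.bds
    ```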
    Leonardo Moreno
    @j2moreno
    @hariszaf perfect thanks for the suggestion! Easy fix