Martin Cech
@martenson
Marius van den Beek
@mvdbeek
Is this really Heng Li’s bwa ?
looks weird
Martin Cech
@martenson
he is an author on the paper
Marius van den Beek
@mvdbeek
I am aware, and he made an announcement, but all his other projects are in the lh3 namespace
paper is a good indicator though
ok, there’s also a link from https://github.com/lh3/bwa
Marius van den Beek
@mvdbeek
not recommended for production uses at the moment
idk, let’s wait
Brad Langhorst
@bwlang
i’ve tested this… it’s fast.
produced exactly the same alignments in my small test
Marius van den Beek
@mvdbeek
does it have the same options ?
Brad Langhorst
@bwlang
I didn’t compare carefully - it did not strike me as different though
Usage: bwa2 mem [options] <idxbase> <in1.fq> [in2.fq]
Options:
  Algorithm options:
    -o STR        Output SAM file name
    -t INT        number of threads [1]
    -k INT        minimum seed length [19]
    -w INT        band width for banded alignment [100]
    -d INT        off-diagonal X-dropoff [100]
    -r FLOAT      look for internal seeds inside a seed longer than {-k} * FLOAT [1.5]
    -y INT        seed occurrence for the 3rd round seeding [20]
    -c INT        skip seeds with more than INT occurrences [500]
    -D FLOAT      drop chains shorter than FLOAT fraction of the longest overlapping chain [0.50]
    -W INT        discard a chain if seeded bases shorter than INT [0]
    -m INT        perform at most INT rounds of mate rescues for each read [50]
    -S            skip mate rescue
    -o            output file name missing
    -P            skip pairing; mate rescue performed unless -S also in use
Scoring options:
   -A INT        score for a sequence match, which scales options -TdBOELU unless overridden [1]
   -B INT        penalty for a mismatch [4]
   -O INT[,INT]  gap open penalties for deletions and insertions [6,6]
   -E INT[,INT]  gap extension penalty; a gap of size k cost '{-O} + {-E}*k' [1,1]
   -L INT[,INT]  penalty for 5'- and 3'-end clipping [5,5]
   -U INT        penalty for an unpaired read pair [17]
Input/output options:
   -p            smart pairing (ignoring in2.fq)
   -R STR        read group header line such as '@RG\tID:foo\tSM:bar' [null]
   -H STR/FILE   insert STR to header if it starts with @; or insert lines in FILE [null]
   -j            treat ALT contigs as part of the primary assembly (i.e. ignore <idxbase>.alt file)
   -v INT        verbose level: 1=error, 2=warning, 3=message, 4+=debugging [3]
   -T INT        minimum score to output [30]
   -h INT[,INT]  if there are <INT hits with score >80% of the max score, output all in XA [5,200]
   -a            output all alignments for SE or unpaired PE
   -C            append FASTA/FASTQ comment to SAM output
   -V            output the reference FASTA header in the XR tag
   -Y            use soft clipping for supplementary alignments
   -M            mark shorter split hits as secondary
   -I FLOAT[,FLOAT[,INT[,INT]]]
                 specify the mean, standard deviation (10% of the mean if absent), max
                 (4 sigma from the mean if absent) and min of the insert size distribution.
                 FR orientation only. [inferred]
Note: Please read the man page for detailed description of the command line and options.
looks like a superset of bwa
Usage: bwa mem [options] <idxbase> <in1.fq> [in2.fq]

Algorithm options:

       -t INT        number of threads [1]
       -k INT        minimum seed length [19]
       -w INT        band width for banded alignment [100]
       -d INT        off-diagonal X-dropoff [100]
       -r FLOAT      look for internal seeds inside a seed longer than {-k} * FLOAT [1.5]
       -y INT        seed occurrence for the 3rd round seeding [20]
       -c INT        skip seeds with more than INT occurrences [500]
       -D FLOAT      drop chains shorter than FLOAT fraction of the longest overlapping chain [0.50]
       -W INT        discard a chain if seeded bases shorter than INT [0]
       -m INT        perform at most INT rounds of mate rescues for each read [50]
       -S            skip mate rescue
       -P            skip pairing; mate rescue performed unless -S also in use
       -e            discard full-length exact matches

Scoring options:

       -A INT        score for a sequence match, which scales options -TdBOELU unless overridden [1]
       -B INT        penalty for a mismatch [4]
       -O INT[,INT]  gap open penalties for deletions and insertions [6,6]
       -E INT[,INT]  gap extension penalty; a gap of size k cost '{-O} + {-E}*k' [1,1]
       -L INT[,INT]  penalty for 5'- and 3'-end clipping [5,5]
       -U INT        penalty for an unpaired read pair [17]

       -x STR        read type. Setting -x changes multiple parameters unless overriden [null]
                     pacbio: -k17 -W40 -r10 -A1 -B1 -O1 -E1 -L0  (PacBio reads to ref)
                     ont2d: -k14 -W20 -r10 -A1 -B1 -O1 -E1 -L0  (Oxford Nanopore 2D-reads to ref)
                     intractg: -B9 -O16 -L5  (intra-species contigs to ref)

Input/output options:

       -p            smart pairing (ignoring in2.fq)
       -R STR        read group header line such as '@RG\tID:foo\tSM:bar' [null]
       -H STR/FILE   insert STR to header if it starts with @; or insert lines in FILE [null]
       -j            treat ALT contigs as part of the primary assembly (i.e. ignore <idxbase>.alt file)

       -v INT        verbose level: 1=error, 2=warning, 3=message, 4+=debugging [3]
       -T INT        minimum score to output [30]
       -h INT[,INT]  if there are <INT hits with score >80% of the max score, output all in XA [5,200]
       -a            output all alignments for SE or unpaired PE
       -C            append FASTA/FASTQ comment to SAM output
       -V            output the reference FASTA header in the XR tag
       -Y            use soft clipping for supplementary alignments
       -M            mark shorter split hits as secondary

       -I FLOAT[,FLOAT[,INT[,INT]]]
                     specify the mean, standard deviation (10% of the mean if absent), max
                     (4 sigma from the mean if absent) and min of the insert size distribution.
                     FR orientation only. [inferred]

Note: Please read the man page for detailed description of the command line and options.
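Going by the two help texts pasted above, the option sets are close but not identical. A minimal Python sketch for diffing them follows; the excerpts are trimmed from the pasted output, and `extract_flags` is a hypothetical helper, not part of either tool.

```python
import re


def extract_flags(help_text):
    """Return the set of single-letter flags that start an option line."""
    # Only match lines that begin (after indentation) with "-X" followed by
    # whitespace, so preset lines such as "pacbio: -k17 ..." are not counted.
    return set(re.findall(r"^[ \t]*-([A-Za-z0-9])(?=[ \t])", help_text, re.MULTILINE))


# Trimmed excerpts of the two help texts quoted above.
BWA2_HELP = """
    -o STR        Output SAM file name
    -t INT        number of threads [1]
    -S            skip mate rescue
    -P            skip pairing; mate rescue performed unless -S also in use
"""

BWA_HELP = """
       -t INT        number of threads [1]
       -S            skip mate rescue
       -P            skip pairing; mate rescue performed unless -S also in use
       -e            discard full-length exact matches
       -x STR        read type. Setting -x changes multiple parameters unless overriden [null]
                     pacbio: -k17 -W40 -r10 -A1 -B1 -O1 -E1 -L0  (PacBio reads to ref)
"""

bwa2_flags = extract_flags(BWA2_HELP)
bwa_flags = extract_flags(BWA_HELP)

only_in_bwa2 = sorted(bwa2_flags - bwa_flags)
only_in_bwa = sorted(bwa_flags - bwa2_flags)

print("only in bwa2 mem:", only_in_bwa2)
print("only in bwa mem:", only_in_bwa)
```

Run against the full pasted texts rather than these excerpts, the same comparison should report `-o` only for `bwa2 mem` and `-e` and `-x` only for `bwa mem`, so "superset" holds only loosely.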
Marius van den Beek
@mvdbeek
in principle our caching approach is much more fine-grained in that we can decide which parameters need to match
but we don’t have the UI and we need dataset hashes to make this really efficient and useful
Martin Cech
@martenson
wrong channel :)
Marius van den Beek
@mvdbeek
oops
:D
M Bernt
@bernt-matthias

@jmchilton here are a few examples:

ctd/CVInspector.ctd:      <ITEMLIST name="ignore_cv" type="string" description="A list of CV identifiers which should be ignored." required="false" advanced="false">
ctd/CVInspector.ctd:        <LISTITEM value="UO"/>
ctd/CVInspector.ctd:        <LISTITEM value="PATO"/>
ctd/CVInspector.ctd:        <LISTITEM value="BTO"/>
ctd/CVInspector.ctd:      </ITEMLIST>
<ITEMLIST name="target_modifications" type="string" description="List the amino acids to be searched for and their mass modifications, specified using UniMod (www.unimod.org) terms, e.g. &apos;Carbamidomethyl (C)&apos;" required="false" advanced="false" restrictions="15N-oxobutanoic (N-term C),2-dimethylsuccinyl (C),2-monomethylsuccinyl (C),2-nitrobenzyl (Y),2-succinyl (C),2HPG (R),3-deoxyglucosone (R),3-phosphoglyceryl (K),3sulfo (N-term),4-ONE (C),4-ONE (H),4-ONE (K),4-ONE+Delta:H(-2)O(-1) (C),4-ONE+Delta:H(-2)O(-1) (H),4-ONE+Delta:H(-2)O(-1) (K),4AcAllylGal (C),a-type-ion (C-term),AccQTag (K),AccQTag (N-term),Acetyl (C),Acetyl (H),Acetyl (K),Acetyl (N-term),Acetyl (S),Acetyl (T),Acetyl (Y),Acetyl:13C(2) (K),Acetyl:2H(3) (H),Acetyl:2H(3) (K),Acetyl:2H(3) (N-term),Acetyl:2H(3)...">
        <LISTITEM value="Phospho (S)"/>
        <LISTITEM value="Phospho (T)"/>
        <LISTITEM value="Phospho (Y)"/>
</ITEMLIST>

Since both cases don't have restrictions we can't render them as <select> and repeats don't work, because I can't set the defaults.

My favorite solution would be a select where the user can add values.

The repeat units are very simple here, i.e. a single string (but they can be int or float as well).
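To make the problem concrete, here is a minimal Python sketch that pulls the default values out of an ITEMLIST like the first example above. `describe_itemlist` is a hypothetical helper, not CTDConverter or Galaxy code, and splitting `restrictions` on commas is an assumption that would break for entries that themselves contain commas.

```python
import xml.etree.ElementTree as ET

# Copy of the CVInspector example above (description attribute omitted).
CTD_SNIPPET = """
<ITEMLIST name="ignore_cv" type="string" required="false" advanced="false">
  <LISTITEM value="UO"/>
  <LISTITEM value="PATO"/>
  <LISTITEM value="BTO"/>
</ITEMLIST>
"""


def describe_itemlist(xml_text):
    """Collect what a wrapper generator would need to know about an ITEMLIST."""
    node = ET.fromstring(xml_text)
    restrictions = node.get("restrictions")
    return {
        "name": node.get("name"),
        "type": node.get("type"),
        # LISTITEM children carry the default values.
        "defaults": [item.get("value") for item in node.findall("LISTITEM")],
        # Without a restrictions attribute the input is free-form (None here).
        "choices": restrictions.split(",") if restrictions else None,
    }


info = describe_itemlist(CTD_SNIPPET)
print(info)
```

For this snippet `choices` is `None` while `defaults` has three entries, which is exactly the combination that neither a plain select nor a repeat covers: a free-form list that still needs preset values.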
@nsoranzo : btw: <param><value></value></param> does not work in a test :(
Nicola Soranzo
@nsoranzo
I only knew it would work for the help attribute.
John Chilton
@jmchilton
Thanks for the examples - yeah, that doesn't really fall nicely in the Galaxy tool framework at all. We probably want something like collection_type in the workflow editor - a free-form text box... but with multiple and suggestions.
Marius van den Beek
@mvdbeek

My favorite solution would be a select where the user can add values.

:+1: that would be nice

:D
Unexpected HTTP status code: 500: {"err_msg": "Error attempting to parse file tool_data_table_conf.xml.sample: Merging tabular data tables with non matching columns is not allowed: twobit:{'path': 1, 'name': 0, 'value': 0} != twobit:{'value': 1, 'dbkey': 0, 'name': 1}"}
Nicola Soranzo
@nsoranzo
Trying to merge tools/extract_genomic_dna/tool_data_table_conf.xml.sample with Galaxy's config/tool_data_table_conf.xml.sample ?
Marius van den Beek
@mvdbeek
yes
probably a good check, but kind of surprising
there is a mismatch in the column definition in extract_genomic_dna and the data manager
I suppose the data manager one should win
Nicola Soranzo
@nsoranzo
There's also a definition in Galaxy's config/tool_data_table_conf.xml.sample, from which the data manager correctly derives.
Since 2015, commit galaxyproject/galaxy@30f1815
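A mismatch like the one in the error above can be spotted before upload by comparing the declared columns of two `tool_data_table_conf.xml.sample` files. The following Python sketch is one way to do that; the XML contents are illustrative assumptions, not copies of the real files, and the helper names are hypothetical. It compares only the declared column names and does not reproduce Galaxy's implicit handling of `name` and `value`.

```python
import xml.etree.ElementTree as ET

# Illustrative file contents only; the column lists are assumptions chosen to
# resemble the mismatch in the error above, not copies of the real files.
TOOL_SAMPLE = """
<tables>
    <table name="twobit" comment_char="#">
        <columns>value, path</columns>
        <file path="tool-data/twobit.loc" />
    </table>
</tables>
"""

GALAXY_SAMPLE = """
<tables>
    <table name="twobit" comment_char="#">
        <columns>dbkey, value</columns>
        <file path="tool-data/twobit.loc" />
    </table>
</tables>
"""


def declared_columns(xml_text):
    """Map each table name to its declared column names, in order."""
    root = ET.fromstring(xml_text)
    tables = {}
    for table in root.findall("table"):
        columns = table.findtext("columns", default="")
        tables[table.get("name")] = [c.strip() for c in columns.split(",") if c.strip()]
    return tables


def column_mismatches(a, b):
    """Return the tables present in both inputs whose column lists differ."""
    return {
        name: (a[name], b[name])
        for name in a.keys() & b.keys()
        if a[name] != b[name]
    }


mismatches = column_mismatches(declared_columns(TOOL_SAMPLE), declared_columns(GALAXY_SAMPLE))
for name, (tool_cols, galaxy_cols) in mismatches.items():
    print(f"{name}: {tool_cols} != {galaxy_cols}")
```

Running something like this over a tool's sample file and Galaxy's own sample file would flag the `twobit` table before the upload fails with an HTTP 500.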
Marius van den Beek
@mvdbeek
does anyone know why we put emboss5 in the IUC conda channel ?
it’s pretty much the only package there that we still need in non-deprecated iuc tools
M Bernt
@bernt-matthias
Marius van den Beek
@mvdbeek
could be
I’ll open a PR, let’s see if it gets in
M Bernt
@bernt-matthias
Is this for the container tests? I remember that in the first iteration there was only one failing emboss tool which I tried to fix here galaxyproject/tools-iuc#2725.
Marius van den Beek
@mvdbeek
yeah, and now we need to build a container that includes the IUC channel, while we already had a container
that was emboss 5 only
Marius van den Beek
@mvdbeek
do all the emboss 5 tools need perl to run / should I add perl to the run requirements ?
M Bernt
@bernt-matthias
I think it's only for the 4 Perl scripts that are included in the Galaxy wrappers. I guess the requirements are fine.
Marius van den Beek
@mvdbeek
Marius van den Beek
@mvdbeek
@nsoranzo the old source tarball is at ftp://emboss.open-bio.org/pub/EMBOSS/old/5.0.0/EMBOSS-5.0.0.tar.gz, should we use that instead of cheating our way around the linter ?