Julianus Pfeuffer
@jpfeuffer
Are you using a workflow manager or command line?
Yap Jun Hong (Nemo)
@nemoyjh
workflow manager

Huh... so I set all the output options in FileInfo to True (m, p, s, d, c, v, i), and all I got was:

File type: mzML

Validating mzML file against XML schema version 1.1.0
Success - the file is valid!

Semantically validating mzML file:
Success - the file is semantically valid!

Julianus Pfeuffer
@jpfeuffer
Hmm that is indeed weird. Do you have the log of the conversion step?
Yap Jun Hong (Nemo)
@nemoyjh
Oh, I should've left them as false.
Is there a way to PM you so I don't flood this channel?
done
Timo Sachsenberg
@timosachsenberg

Hi all, I was trying to analyze the mass spectrum file obtained from a Q Exactive™ mass spectrometer (Thermo Scientific). It has the .RAW file extension, which is not supported by OpenMS. Any idea how I can analyze my data?

You can check out http://proteowizard.sourceforge.net/download.html
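For reference, a minimal Python sketch of scripting that conversion. It assumes ProteoWizard's `msconvert` is installed and on `PATH` (reading vendor .RAW files may require the Windows build). `--mzML` and `-o` are standard msconvert options, but check `msconvert --help` for your version. The helper names are hypothetical, and the output file name (input stem + `.mzML`) is an assumption about msconvert's default naming.

```python
import shutil
import subprocess
from pathlib import Path


def build_msconvert_cmd(raw_file: str, out_dir: str) -> list[str]:
    """Build an msconvert call that converts raw_file to mzML inside out_dir."""
    return ["msconvert", raw_file, "--mzML", "-o", out_dir]


def convert_raw_to_mzml(raw_file: str, out_dir: str = "mzml_out") -> Path:
    """Run msconvert if it is available and return the expected output path."""
    if shutil.which("msconvert") is None:
        raise FileNotFoundError("msconvert not found on PATH; install ProteoWizard first")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_msconvert_cmd(raw_file, out_dir), check=True)
    # Assumption: msconvert names the output after the input file's stem by default
    return Path(out_dir) / (Path(raw_file).stem + ".mzML")
```

The resulting mzML file can then be used as input to the OpenMS tools.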

Yap Jun Hong (Nemo)
@nemoyjh

Hello, I've run into a different error. I'm trying to debug to the best of my abilities.

What I'm doing is using SpecLibSearcher with a .msp library and an input .mzML file to identify peptides from MS/MS spectra (note: the same spectra I mentioned above). This is the pipeline:

input file -> FileConverter -> PeakPickerWavelet -> SpecLibSearcher -> Output file

I run into this error in SpecLibSearcher:

Unexpected internal error (Prefix of string '2_0' successfully converted to an integer value. Additional characters found at position 2)

I've found the source of this error at line 810 of this file:

https://github.com/OpenMS/OpenMS/blob/develop/src/openms/include/OpenMS/DATASTRUCTURES/StringUtils.h

The specific comment is:

"was the string parsed (white spaces are skipped automatically!) completely? If not, we have a problem because a previous split might have used the wrong split char"

Which string could this be referring to: the one in the .msp library or the parsed spectra? Is there a setting I can change to fix the parsing?

Thank you
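The error text suggests that the conversion parses a numeric prefix and then rejects the string if anything other than whitespace is left over. The following is a rough Python analogue of that check, for illustration only. It is not OpenMS's C++ code, and `strict_to_int` is a hypothetical name.

```python
import re


def strict_to_int(text: str) -> int:
    """Parse an integer, but reject strings with leftover characters after the number.

    Surrounding whitespace is skipped (as the StringUtils comment says); any other
    trailing character raises an error, much like the SpecLibSearcher message.
    """
    stripped = text.strip()
    match = re.match(r"[+-]?\d+", stripped)
    if match is None:
        raise ValueError(f"could not convert '{text}' to an integer")
    if match.end() != len(stripped):
        # Position reported 1-based within the stripped string, so '2_0' -> position 2
        raise ValueError(
            f"Prefix of string '{text}' successfully converted to an integer value. "
            f"Additional characters found at position {match.end() + 1}"
        )
    return int(match.group())


print(strict_to_int(" 2 "))  # → 2
```

Under this reading, a field such as `2_0` fails wherever the parser expects a bare integer.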

Julianus Pfeuffer
@jpfeuffer
Do you have some more output of SpecLibSearcher before it fails, so we can narrow it down to the string in question?
Yap Jun Hong (Nemo)
@nemoyjh
Hmm, I've set the debug value to 1, but there doesn't seem to be extra information. Am I missing something?
Julianus Pfeuffer
@jpfeuffer
and the standard output?
only this one line?
Yap Jun Hong (Nemo)
@nemoyjh
yea, aside from all the 'value of int option' stuff, this is it
(in function: int __cdecl OpenMS::StringUtils::toInt(const class OpenMS::String &)) !
There's also "value of string option", which lists the file paths I use. Could it be referring to that?
Julianus Pfeuffer
@jpfeuffer
can you search your input file or your library for that string "2_0"?
it is probably at a place where a pure integer is expected
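Following that suggestion, a small Python scan can list the `Name:` lines in an .msp file whose charge field is not a plain integer. It assumes the NIST-style layout `Name: <sequence>/<charge>`, with the charge following the last `/`. That convention and the function name are assumptions and are not taken from OpenMS.

```python
from typing import Iterable, Iterator


def find_non_integer_charges(lines: Iterable[str]) -> Iterator[tuple[int, str]]:
    """Yield (line_number, line) for 'Name:' lines whose charge part is not a pure integer."""
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith("Name:"):
            continue
        name = line[len("Name:"):].strip()
        # Assumed layout <sequence>/<charge>: take the text after the last '/'
        _, sep, charge = name.rpartition("/")
        if sep and not charge.isdigit():
            yield lineno, line.rstrip("\n")
```

Usage: open the library with `open("library.msp", encoding="utf-8", errors="replace")` and pass the file object to `find_non_integer_charges`. Each reported line is a candidate for the string the parser choked on.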
Yap Jun Hong (Nemo)
@nemoyjh
Hold on, let me find a way to open .msp files. My input is a spectrum file, and its metadata doesn't seem to contain a 2_0.
Julianus Pfeuffer
@jpfeuffer
If you can upload the files I can also look at it
Yap Jun Hong (Nemo)
@nemoyjh

the .msp file is from here:

https://chemdata.nist.gov/dokuwiki/doku.php?id=peptidew:lib:human20140529

first file option, 2014_05_29_human_consensus_final_true_lib.tar.gz

Julianus Pfeuffer
@jpfeuffer
ah interesting resource
Yap Jun Hong (Nemo)
@nemoyjh
I'm absolutely new to this field, and what I'm doing is an interview test, so I'd rather not send the main file over. I'm trying to use OpenMS to figure this out. Is this chemdata NIST website commonly used?
Julianus Pfeuffer
@jpfeuffer
hm really depends on what you are expecting to be in your spectra ;)
Yap Jun Hong (Nemo)
@nemoyjh
ah well, I'll figure that out. Is the problem with the library? What tool are you using to check it so I can check it myself next time?
Julianus Pfeuffer
@jpfeuffer
yes it is the library.
Yap Jun Hong (Nemo)
@nemoyjh
oh
Julianus Pfeuffer
@jpfeuffer
the first line is already: Name: AAAAAAAAAAAAAAAGAGAGAK/2_0
Yap Jun Hong (Nemo)
@nemoyjh
i see
Yap Jun Hong (Nemo)
@nemoyjh
thanks
LA307
@LA307
@timosachsenberg and @jpfeuffer, thanks a lot. Your hint about FeatureFinderMultiplex basically worked out; only the parameter fine-tuning is difficult. Is this function only available in FeatureFinderMultiplex, or also in some of the other FeatureFinders?
Timo Sachsenberg
@timosachsenberg
I think this is the only one. May I ask what you want to achieve on a higher level? E.g., preparing data for, let's say, a benchmarking dataset or for machine learning? You can contact me privately if you don't want to disclose that in public. Maybe there are other ways to achieve the same goal.
Roger Olivella
@rolivella
Hi all! I'm using PeptideIndexer with the "Rattus norvegicus" (10116, https://www.uniprot.org/uniprot/?query=organism%3A%22Rattus+norvegicus+%28Rat%29+%5B10116%5D%22&sort=score) FASTA file (SP+TrEMBL), and there are proteins like this one: https://www.uniprot.org/uniprot/A0A0G2K866, with stretches like "ILRMPHXXXXXXXXXXTSFP". So I set -aaa_max 10, but I get an error. In the log, I first see this message: "Searching with up to 10 ambiguous amino acid(s) and 0 mismatch(es)!". But later I get:
Error occurred in line 345 of file /OpenMS/src/openms/include/OpenMS/ANALYSIS/ID/AhoCorasickAmbiguous.h (in function: Pattern<FuzzyAC>(PeptideSet)) !
Could you please help? Thanks!
Chris Bielow
@cbielow
Off the top of my head, I'd say that searching with -aaa_max 10 is not a good idea, since ANY peptide with length <= 9 (+1 for the tryptic cleavage site) will get matched to this protein. Surely not what you want? Also, the runtime is exponential in the number of AAAs, so the search is going to take a very long time to complete, and memory usage may explode (virtually). The error itself (or the lack of more text) is a bit weird... I'd need to look into that.
Roger Olivella
@rolivella
I'm just trying to use PeptideIndexer with this rat FASTA, but I can't, because it complains about the X amino acids present in almost 50 peptides (the one I gave you is just an example; some peptides have just one X and others have 10). I'm trying to work around it using -aaa_max, but it doesn't work...
Timo Sachsenberg
@timosachsenberg
@cbielow would it make sense to add an option to just ignore those peptides during mapping? or what would you suggest
Julianus Pfeuffer
@jpfeuffer
But what happens if you use a lower value?
Then you just do not map to these parts. That is what I would assume.
If you use aaa_max=3, you can reach 3 AAs into this stretch of ten X's, but the prefix/suffix needs to match.
Chris Bielow
@cbielow
@jpfeuffer is correct. Any match which is reported can only contain at most -aaa_max number of X's.
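The following brute-force Python sketch illustrates the `-aaa_max` counting rule: a reported match may use at most that many `X` positions in the protein. It shows only that rule. It is not PeptideIndexer's algorithm (judging by `AhoCorasickAmbiguous.h` in the error, that uses an Aho-Corasick trie), and it ignores enzyme specificity, B/J/Z, and mismatches. The function name is hypothetical.

```python
def matches_with_ambiguity(peptide: str, protein: str, aaa_max: int) -> list[int]:
    """Return start positions where peptide matches protein, treating protein 'X'
    as a wildcard and allowing at most aaa_max wildcard positions per match."""
    hits = []
    for start in range(len(protein) - len(peptide) + 1):
        window = protein[start:start + len(peptide)]
        wildcards = 0
        for pep_aa, prot_aa in zip(peptide, window):
            if prot_aa == "X":
                wildcards += 1
                if wildcards > aaa_max:
                    break  # too many ambiguous positions used by this match
            elif prot_aa != pep_aa:
                break  # ordinary residue mismatch
        else:
            hits.append(start)
    return hits


protein = "ILRMPHXXXXXXXXXXTSFP"  # stretch quoted in the question above
# An unrelated 8-mer fits entirely inside the ten X's when aaa_max is large:
print(matches_with_ambiguity("PEPTIDEK", protein, aaa_max=10))  # → [6, 7, 8]
print(matches_with_ambiguity("PEPTIDEK", protein, aaa_max=3))   # → []
# With aaa_max=3, a match can reach 3 residues into the X run if the prefix matches:
print(matches_with_ambiguity("MPHACD", protein, aaa_max=3))     # → [3]
```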
Chris Bielow
@cbielow
@rolivella Did PeptideIndexer fail because it could not match all peptides, and is that why you resorted to increasing aaa_max? Usually there is another reason why this happens. Can you make the FASTA + idXML available somewhere, or just tell us where the problem was?
Roger Olivella
@rolivella
This is the complete error message:
Debug level: 1
 >> PeptideIndexer -aaa_max 3 -debug 1 -threads 4 -enzyme:specificity full -IL_equivalent -in 2021MQ006_FEAG_036_01_2ug.raw_mascot.idXML -out 2021MQ006_FEAG_036_01_2ug.raw_mascot_peptideindexer.idXML -write_protein_sequence -fasta UP_Rat.fasta_decoy.fasta -decoy_string ###REV### -decoy_string_position prefix -unmatched_action warn -missing_decoy_action warn -write_protein_sequence
File '/users/pr/qsample/.OpenMS/OpenMS.ini' is deprecated.
Updating missing/wrong entries in '/users/pr/qsample/.OpenMS/OpenMS.ini' with defaults!
Value of string option 'test': 0
The OpenMS team is collecting usage statistics for quality control and funding purposes.
We will never give out your personal data, but you may disable this functionality by
setting the environmental variable OPENMS_DISABLE_UPDATE_CHECK to ON.
Connecting to REST server successful.
Debug level (after ini file): 1
Value of string option 'no_progress': 0
Value of string option 'in': 2021MQ006_FEAG_036_01_2ug.raw_mascot.idXML
Value of string option 'out': 2021MQ006_FEAG_036_01_2ug.raw_mascot_peptideindexer.idXML
Value of string option 'fasta': UP_Rat.fasta_decoy.fasta
Progress of 'Loading idXML':
-- done [took 5.00 s (CPU), 5.06 s (Wall)] --
Info: using 'Trypsin' as enzyme (obtained from idXML) for digestion.
Peptide identification engine: MASCOT
Enzyme: Trypsin
Progress of 'Load first DB chunk':
-- done [took 0.66 s (CPU), 0.81 s (Wall)] --
Peptide sequence '.(Acetyl)RXXXTLPM(Oxidation)AYALR' contains one or more ambiguous amino acids (B|J|Z|X).
Peptide sequence 'XXEALEEFSHLEGNPSIK' contains one or more ambiguous amino acids (B|J|Z|X).
Peptide sequence '.(Acetyl)QLSLQFXSDDEXXELLXMFTXXSASESRGK' contains one or more ambiguous amino acids (B|J|Z|X).
Peptide sequence 'SRPKKREGVXXXXXXELAK' contains one or more ambiguous amino acids (B|J|Z|X).
Peptide sequence '.(Acetyl)SRLAYAMPLTXXXRK' contains one or more ambiguous amino acids (B|J|Z|X).
Peptide sequence 'RLLVELDKVHM(Oxidation)XQNDVAXXXXXXELM(Oxidation)ELDK' contains one or more ambiguous amino acids (B|J|Z|X).
Peptide sequence '.(Acetyl)LSRPPPGXXVKK' contains one or more ambiguous amino acids (B|J|Z|X).
Peptide sequence '.(Acetyl)LDSQM(Oxidation)NELLKKVXXGPPPRSL' contains one or more ambiguous amino acids (B|J|Z|X).
Peptide sequence '.(Acetyl)YYRAPEVILGM(Oxidation)GYKENGQXVXHVQRGLIC(Carbamidomethyl)C(Carbamidomethyl)' contains one or more ambiguous amino acids (B|J|Z|X).
Peptide sequence 'DVLIM(Oxidation)DLC(Carbamidomethyl)KDTVXXXXENQEFVLQEDGTLVHKQSGK' contains one or more ambiguous amino acids (B|J|Z|X).
Peptide sequence '.(Acetyl)TYTC(Carbamidomethyl)RYTGGMLYEIVLXXXXXXXXXGPPLNGQK' contains one or more ambiguous amino acids (B|J|Z|X).
Peptide sequence 'LAFIEMRSDATIKRNXXWXXXR' contains one or more ambiguous amino acids (B|J|Z|X).
Peptide sequence 'EGGPFSTXXXXXXXXXXHPM(Oxidation)R' contains one or more ambiguous amino acids (B|J|Z|X).
One or more peptides contained illegal amino acids. This is not allowed!
Please either remove the peptide or replace it with one of the unambiguous ones (while allowing for ambiguous AA's to match the protein).
Mapping 103874 peptides to 59880 proteins.
Searching with up to 3 ambiguous amino acid(s) and 0 mismatch(es)!
Building trie ...Error: Unexpected internal error (the value 'RXXXTIPMAYAIR' was used but is not valid; Input peptide to FuzzyAC must NOT contain ambiguous amino acids (B/J/Z/X)!)
Error occurred in line 345 of file /OpenMS/src/openms/include/OpenMS/ANALYSIS/ID/AhoCorasickAmbiguous.h (in function: Pattern<FuzzyAC>(PeptideSet)) !
Thanks!
Chris Bielow
@cbielow
Thanks for the data. This is indeed something novel. Mascot returned PSMs which are not resolved, i.e. they still contain the ambiguous amino acids in the reported peptide sequence, e.g. "LAFIEMRSDATIKRNXXWXXXR". Currently, we do not support mapping such sequences to the database (since this has never occurred before). It could be done by exact matching, but there are checks in place which prevent this.
Is there maybe a switch in Mascot's config, which allows you to change how AAA's are reported?
If not, the only options I see are to either remove these hits from the idXML before running PeptideIndexer, or remove the X-containing entries from the database before searching. It's only a handful of hits, so the global result is not affected much in this case.
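Below is a minimal Python sketch of the first option: dropping hits whose reported sequence still contains B/J/Z/X before running PeptideIndexer. It works on plain sequence strings in the notation shown in the log. Reading and writing the idXML itself (for example with pyOpenMS) is not shown, and the helper names are hypothetical. Parenthesized modifications and the `.` terminus markers are stripped first, so that letters inside modification names are not mistaken for residues.

```python
AMBIGUOUS = set("BJZX")


def strip_modifications(seq: str) -> str:
    """Remove '(...)' modification annotations (nesting allowed) and '.' terminus markers."""
    residues = []
    depth = 0
    for ch in seq:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        elif depth == 0 and ch != ".":
            residues.append(ch)
    return "".join(residues)


def has_ambiguous_residue(seq: str) -> bool:
    """True if the unmodified sequence contains B, J, Z or X."""
    return any(aa in AMBIGUOUS for aa in strip_modifications(seq))


def split_hits(sequences: list[str]) -> tuple[list[str], list[str]]:
    """Partition sequences into (kept, removed) by presence of ambiguous residues."""
    kept, removed = [], []
    for seq in sequences:
        (removed if has_ambiguous_residue(seq) else kept).append(seq)
    return kept, removed


kept, removed = split_hits([
    ".(Acetyl)RXXXTLPM(Oxidation)AYALR",  # from the log above
    "LAFIEMRSDATIKRNXXWXXXR",            # from the log above
    "DVLIMDLCK",                          # hypothetical unambiguous sequence
])
print(kept)     # → ['DVLIMDLCK']
print(removed)  # → the two X-containing sequences
```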