Alexander Neumann
Hey, thanks a lot for clearing that up, now that you say it this makes total sense!
Hi @timbitz, we ran Whippet using the pre-built hg19 index, and the results looked great. But when we used our own BAMs to build the supplemented index and visualised the output in IGV, the results didn’t look real. For example, it was calling alternate 3’ splice sites separated by 1 base (with a probability of 0.917 and DeltaPsi of 0.23), for which we couldn’t find any evidence on direct visualisation (see picture below). We were wondering if you had any pointers on where we might have gone wrong? For reference, we were using Julia 0.6.4 and Whippet 0.11, and we built the ab-initio index using whippet-index, supplying the Ensembl human GRCh38 GTF plus the merged BAMs from 6 samples (3x treatment and 3x control). The differential quantification was then run with whippet-quant and whippet-delta for the 3x treatment vs 3x control.
Tim Sterne-Weiler
Hey @nathompson4 -- Hmm, are you sure there are no reads in any of the bam files with that 1-off 3' splice site, and no transcripts in the hg38 set with that splice site? Just trying to figure out how that node was added to the index in the first place.
Hi, no, we can't see any BAM files containing that splice site, and there's no established transcript with that splice site either. Any suggestions on what may have gone wrong would be gratefully received.
Tim Sterne-Weiler
Okay, I don't actually have any idea how that can happen to be honest. You are using STAR or HISAT to align the reads I assume?-- During indexing there needs to be evidence for each splice site from BAM or GTF to create a node. If you post or e-mail me (tim.sterne.weiler@utoronto.ca) a download link to the bam files, and gtf file, I can try to dig in a little deeper.
JP Villemin
Hi Tim, do you have any idea how Whippet behaves when there is a difference in read depth between samples? We recently observed a batch effect (probably due to a different number of mapped reads) between two pools of RNA-seq samples sequenced separately. Is there a way to take batch effects into account when doing an analysis with Whippet? Thanks ++
JP Villemin
Hi Tim, sorry to bother you again. Circular RNAs can be detected when quantifying but are not taken into account when running delta, right? Why? Thanks ++
Tim Sterne-Weiler
Hey @ZheFrench_twitter, Yah I mean differential read-depth is likely to produce artifacts in any analysis-- especially if you are trying to correlate two experiments across all genes or AS events where one set has better coverage over lowly expressed genes than another. Only thing I can think of is to filter for genes/AS events with a minimal coverage level, which might help. You can also try the --biascorrect flag which could help if there are other common RNA-seq biases at play.
Tim Sterne-Weiler
In terms of circular RNAs... yes you are correct-- Whippet quantifies these if you use the --circ flag, but delta doesn't handle them currently. This is for two reasons (a) we didn't comprehensively benchmark circular splicing prediction and accuracy in the paper (b) it was simply more convenient to ignore them in delta... without the circular output each .psi.gz file from the same index has the same number of lines (so delta doesn't have to load the full files as it goes through multiple conditions, it just iterates line by line and ignores circular entries). I'd be willing to add this feature, but I'd want to dive a bit deeper in a systematic way first.
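The line-by-line design described in this message can be illustrated with a minimal Python sketch. It is not Whippet's Julia code: the function name `iter_lockstep` and the tab-separated layout are assumptions made for illustration. It only shows why equal line counts let several .psi.gz files be read in parallel without loading any of them fully.

```python
import gzip
from contextlib import ExitStack


def iter_lockstep(paths):
    """Yield one tuple of parsed rows per line position across all files.

    Assumes every file has the same number of lines in the same order,
    which is the property the message above relies on. zip() stops at the
    shortest file, so a file with extra rows is silently truncated.
    """
    with ExitStack() as stack:
        # Open every gzip file in text mode; ExitStack closes them all.
        handles = [stack.enter_context(gzip.open(p, "rt")) for p in paths]
        for lines in zip(*handles):
            yield tuple(line.rstrip("\n").split("\t") for line in lines)
```

Only one line per file is held in memory at a time, which is cheap, but it also means any extra entries in one file (such as circular nodes) would shift every later row out of alignment.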
Hi Tim, I'm a new user of Whippet. I tried it and found that it runs fast (thank you for developing this useful tool!), and I got the expected psi.gz results. However, when I used whippet-delta, I could only get chromosome 1 results in output.diff.gz. I tried decompressing the psi.gz files and running again, and this time I got data for chromosomes 1 to 3. I wonder why I could not get full results? I typed "julia bin/whippet-delta.jl -a WT.psi.gz, -b KO.psi.gz," just like you suggested. Can you help me?
Hi Tim, I am starting the analysis of the results and I had a question. I am working on cancer data, so I was looking at the section of the Whippet paper on the increase in high-entropy AS in cancer. In figure 7C, there is a heatmap of the splicing events with significant changes calculated by the U test. So you didn't use the probability from the diff file? I was wondering if I could do something similar with the probability from the diff file, or if I would have to do the U test between my tumour and control samples.
Alex Nesta
Hello, I'm a new Whippet user coming from rMATS. In rMATS, you can filter preferential splicing based on the number of reads that align to the splice junction. This is an important filtering step because if an alternatively spliced isoform has only two reads while the canonical has one, you will see a PSI of about 67%, when in reality this is not likely to be biologically relevant. How can I do the same or something similar with Whippet?
Alex Nesta
@Alex-Nesta "It also outputs the mean deltaPsi (Psi_A - Psi_B) and the probability that there is some change in deltaPsi given the read depth of the AS event, such that P(|deltaPsi| > 0.0)" I found that in the readme. works for me. Thanks.
Alex Nesta
Hi, I'm looking for a way to identify which specific isoform contains the alternative splicing event called by Whippet. This is easily done for intron retentions and core exons using bedtools. However, for tandem starts and ends it is proving much more difficult to identify which isoform from the annotation corresponds to the Whippet event. I believe the only way to do this accurately is to modify the Whippet code to display the isoform in the whippet-delta results. Any thoughts?
Tim Sterne-Weiler
Hi @froggy5207 were there any errors thrown by whippet-delta.jl?
Hi @paulinefx the probability from whippet-delta.jl is only for comparing PSI values, not entropy. So in the paper we use that non-parametric test to compare entropy values between tumor and normal datasets.
Hi @Alex-Nesta to filter events by read depth, you can use the -r flag to whippet-delta.jl. There is a complete list of command-line options for all the programs by using the -h flag.
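To make the read-depth concern concrete, here is a small Python sketch of a naive PSI with a minimum-read filter. It is an illustration only: Whippet's own quantification is more involved, and `naive_psi` and `min_reads` are hypothetical names, not Whippet options.

```python
def naive_psi(inclusion_reads, exclusion_reads, min_reads=10):
    """Return inclusion / (inclusion + exclusion), or None below min_reads.

    Returning None for low-coverage events keeps unreliable ratios
    (for example 1 read vs 1 read giving 0.5) out of downstream analysis.
    """
    total = inclusion_reads + exclusion_reads
    if total < min_reads:
        return None
    return inclusion_reads / total
```

With the default threshold, an event supported by two reads in total is dropped, while one supported by 40 reads yields a ratio.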
Tim Sterne-Weiler
In terms of which isoforms contain the alternative node, I haven't added this feature yet, but would consider it. You're right that without writing code and only using off-the-shelf tools this should be do-able for CE nodes, but perhaps not for TS/TE nodes, for which you'd need to directly compare the txStart or txEnd coordinates to the annotated isoforms. Shouldn't be too hard if you want to write a little script to do that though. In terms of adding code to Whippet, I'm not sure how long it'll take me to get to adding this feature; I have a number of more pressing issues/features in line that I need to implement first.
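A minimal Python sketch of the kind of "little script" mentioned here. Everything about the data layout is an assumption for illustration: nodes and transcripts are plain dicts with genomic `start`/`end` coordinates (start <= end), and a TS node is assumed to share its 5' boundary, and a TE node its 3' boundary, with the transcripts that use it. Check this against your own output before relying on it.

```python
def transcripts_matching_node(node, transcripts):
    """Return IDs of transcripts whose start (TS) or end (TE) matches the node.

    On the + strand the transcript start is the left-most genomic coordinate;
    on the - strand it is the right-most one, so the compared key flips.
    """
    if node["type"] not in ("TS", "TE"):
        raise ValueError("only TS and TE nodes are handled in this sketch")
    # TS on '+' and TE on '-' are anchored at the left genomic boundary.
    use_left = (node["type"] == "TS") == (node["strand"] == "+")
    key = "start" if use_left else "end"
    return [
        t["id"]
        for t in transcripts
        if t["strand"] == node["strand"] and t[key] == node[key]
    ]
```

The transcript boundaries would come from the same GTF used to build the index, for example by taking the minimum and maximum exon coordinate per transcript.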
Alex Nesta

Running whippet-delta with single samples is not working properly for me. I am only getting output.diff results for chromosome 1 and no other chromosomes in the output. The same procedure works fine when comparing multiple samples. I even tried single samples that worked fine when tested as part of multiple-sample comparisons.

This is what my command looks like when I am only comparing single samples:

julia ~/.julia/v0.6/Whippet/bin/whippet-delta.jl -a $jdir/HS578T.psi.gz, -b $jdir/HS578BST.psi.gz,

If I run the following I get a complete output with all chromosomes:

julia ~/.julia/v0.6/Whippet/bin/whippet-delta.jl -a $jdir/HS578T.psi.gz,$jdir/MCF7.psi.gz -b $jdir/HS578BST.psi.gz,$jdir/MCF10A.psi.gz

Has anyone gotten an output from whippet delta using single samples with all chromosomes? I checked the .psi files and those contain info for all chromosomes...

Tim Sterne-Weiler
@Alex-Nesta @froggy5207 -- Hey, so I looked into this and you're right-- I think there is a bug in the BufferedStreams.jl package that is returning a blank readline in the middle of a file (probably related to some buffer refilling that is not being handled correctly)... this is obviously not what anyone would expect and so my code interprets this as an EOF. I'm not sure when this bug was introduced or specifically where the bug is in that package (assuming it is in that package)... But in the meantime I can fix how whippet-delta.jl handles this-- This is actually a major issue for whippet-delta.jl in particular, so I'll push a bugfix immediately. Stay tuned.
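The failure mode described here, a blank line being mistaken for end-of-file, can be reproduced in a few lines of Python. This is a sketch of the general pattern only, not the code in whippet-delta.jl or BufferedStreams.jl, and the function names are made up for illustration.

```python
import io


def read_until_blank(stream):
    """Fragile pattern: a blank line in mid-file is treated as EOF."""
    rows = []
    while True:
        line = stream.readline()
        if line.strip() == "":
            break  # stops early on a spurious blank line
        rows.append(line.rstrip("\n"))
    return rows


def read_skipping_blanks(stream):
    """More robust pattern: only a truly empty read ends the loop."""
    rows = []
    while True:
        line = stream.readline()
        if line == "":  # Python's readline() returns "" only at real EOF
            break
        if line.strip() == "":
            continue  # skip blank lines instead of stopping
        rows.append(line.rstrip("\n"))
    return rows
```

With input such as `io.StringIO("chr1\n\nchr2\n")`, the first function returns only the chr1 row, which matches the symptom of output stopping after the first chromosome.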
Tim Sterne-Weiler
@Alex-Nesta @froggy5207 Whippet v0.11.1 was tagged/merged so the problem should be solved now. You can update with Pkg.update().
Hi Tim, is there a way to output the read coverage for each alternative event identified by Whippet? Thanks for a great piece of software btw.

First - thanks for making this tool! I like the way it works a lot.

I'm having a slight issue with the output in psi.gz - there are quite a few nodes that give NA for all fields (excluding ID, node, location, strand) that appear to correspond to constitutively included exons with hundreds+ junction reads when I view in IGV. From what I can tell, this is especially true for exons that have alternative 5/3ss - all potential nodes corresponding to this exon are given NA.

I built my annotation using a gencode 29 gff, and without input from a bam file (my data is unstranded). Perhaps this is an issue with 'low quality' transcripts in the annotation? I did not filter for TSL prior to building the graph. Here are a few example nodes that I get NAs for:

Three consecutive nodes that cover one constitutive exon:


My issue is that I want to look at cryptic, low frequency events in introns. If I filter my annotation, I'll lose a lot of annotated cryptic exons that are of interest to me. I could attempt to clean up the annotation, then supplement it with a bam, but as I say the data is unstranded. I'm not sure how much of an issue that is. What do you think is going on here?

JP Villemin
Hey Tim, can Whippet work with single-cell RNA-seq (I'm thinking of Smart-seq2 libraries, where full-length transcripts should be sequenced in theory)?
There is no particular reason for it not to work, right? Until now I have only tried it with pseudo-bulk RNA-seq, pooling several scRNA-seq samples, but coverage is still low. Depth is still around 5e6 reads for a pseudo-bulk (and you only get about 1e6 reads for one scRNA-seq sample).
Hi ! I was wondering if it was possible to have the psi and sam outputs at the same time or do I have to run whippet twice ?
Hi Tim, I have a question about how to interpret the results I obtained from Whippet. First of all, thanks for creating the tool, it runs amazingly fast! Here is what I did: as quality control, I ran Whippet on a set of samples with or without mutations in an important splicing factor. This factor is already known to cause differential splicing of many genes. My sample sizes are 5 for the samples with splicing factor mutations and 53 without (stranded single-end sequencing). I used GRCh37_Ensembl75 for building the index and -r 20 and -s 3 as options for whippet-delta. As a control, I randomly drew 5 samples from my pool of samples and compared them against the rest to see how large the probabilities would be just by chance. Unfortunately, I am having some trouble interpreting my results. Attached you will find a histogram in which I plotted the probability distributions for the splicing factor comparison (red) and the random comparison (blue). Something seems to be going on; however, the number of nodes with very high probabilities does not really seem to differ. Would you recommend a different way of looking at the data, or what would be your interpretation? Thank you very much for your help!
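The random control described in this message can be repeated over many draws to get a rough null distribution rather than a single comparison. Below is a minimal Python sketch under stated assumptions: `random_split` and `delta_args` are hypothetical helpers, the file names are placeholders, and only the comma-separated `-a`/`-b` form already shown in this chat is used.

```python
import random


def random_split(samples, k, seed=None):
    """Randomly pick k samples as group A; the rest form group B."""
    rng = random.Random(seed)  # seeded so each draw is reproducible
    chosen = set(rng.sample(range(len(samples)), k))
    group_a = [s for i, s in enumerate(samples) if i in chosen]
    group_b = [s for i, s in enumerate(samples) if i not in chosen]
    return group_a, group_b


def delta_args(group_a, group_b):
    """Build the -a/-b argument list for a multi-sample comparison."""
    return ["-a", ",".join(group_a), "-b", ",".join(group_b)]
```

Calling `random_split` with a different seed for each repetition and comparing the resulting probability histograms to the real comparison gives a sense of how much of the signal is expected by chance with a 5 vs 53 design.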

Dear Team,

I ran Whippet on my paired-end, unstranded, polyA+ RNA-seq dataset. All reads in all samples are 75 bp long, and each condition was sequenced in triplicate (2 conditions in total: one is a negative control and the other is a silencing of a lncRNA with LNA).

Whippet run options: defaults, except -r 10 -s 2, which tells Whippet to consider events whose total number of supporting reads is at least 10 in at least 2 samples of each condition.

Using the filtering criteria Probability > 0.95 and |dPSI| > 0.1 (10%), I observed that the most abundant differential splicing events in my dataset are TE events, with 654 of 785 events in total.

Is this normal, or is it the nature of the RNA-seq dataset (polyA+) that causes this higher number of TE events? Since it is a human transcriptome, one would expect a higher number of CE events and only a few tandem polyadenylation site events.

How is it possible to check whether the reported TE events are true positives? As a starting point I inspected the events using IGV, but since they do not involve a junction it was difficult to see differences between the two conditions.

I would be very thankful if you can comment on this!

Thank you so much in advance!
Kind regards,
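The filter described in this message (Probability > 0.95 and |dPSI| > 0.1, tallied by event type) can be sketched in Python as follows. The column names `Type`, `DeltaPsi` and `Probability` are assumptions about the .diff.gz header and should be checked against the actual file; `count_significant_by_type` is a hypothetical helper.

```python
import csv
import gzip
from collections import Counter


def count_significant_by_type(diff_path, min_prob=0.95, min_abs_dpsi=0.1):
    """Count events per Type that pass both thresholds.

    Rows whose numeric fields cannot be parsed (for example NA) are skipped.
    """
    counts = Counter()
    with gzip.open(diff_path, "rt", newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            try:
                prob = float(row["Probability"])
                dpsi = float(row["DeltaPsi"])
            except (TypeError, ValueError):
                continue  # NA or missing value
            if prob > min_prob and abs(dpsi) > min_abs_dpsi:
                counts[row["Type"]] += 1
    return counts
```

Comparing these counts to the number of tested events of each type (not just the significant ones) helps show whether TE events are enriched or simply more numerous among the events that passed the read filters.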

Hello, thanks for this tool. I am new to Julia. I failed to install Whippet with either Julia 0.6.4 or 1.1.1. Could you kindly provide updated installation information so we can fix this?
Sofya Laskina
Hello, and thanks for the nice package; it works nicely with our data. One question though: I am using the --bam option while building an index, and in the preparation step samtools refused to build a .bai index for some files and proposed .csi instead. So my data is indexed in .csi format, which Whippet doesn't accept when building an index. Is there a workaround for this issue? Many thanks in advance!
Dear Team,
I'm wondering whether there is a simple way to skip detection of unannotated junctions in whippet-quant (those marked as ALT_UNIQ). Do you think it might improve the performance of the quantification or maybe it doesn't have significant influence on that? Thanks in advance for any comments!
Dear all, I want to use Whippet to identify novel splicing events. I read the GitHub page and the paper, but it is still not clear to me how I should proceed to build an index by supplying the GTF and BAM files. In particular, I am aware that the default --bam-min-reads parameter is 1 and that I should increase it when dealing with large BAM files, but I couldn't find an indication of what would be a reasonable value for this parameter. I built a custom GTF file integrating the Ensembl GTF annotation (TSL1/2) with a GTF obtained by genome-guided transcriptome assembly (HISAT + StringTie pipeline with short-read sequencing available in our lab) and with a GTF obtained by long-read sequencing we performed in house. I used STAR to align our short-read sequencing data to the
Sorry, my message was cut. In short, what --bam-min-reads value would you suggest for building an index with a ~100M reads bam file? Thanks!
Dear team, thanks for the great Whippet package. I've recently encountered a new problem where every time I try to activate Whippet, I get an "ERROR: SystemError: close: No space left on device" message. I suspect it might be because Whippet is trying to access /tmp, which is probably full. Is there a way to specify a different temp folder for Whippet? Thanks :)

Dear all, I am new to both Whippet and Julia. I am running the tests and need to install many packages in the Julia environment. However, when I try import Pkg; Pkg.add("FMIndexes"), I get this error: ERROR: The following package names could not be resolved:

  • FMIndexes (not found in project, manifest or registry).

I already tried the recommendation from this link (JuliaLang/julia#40531), but it still does not work.

Could anybody help me, please?!


Julia Version 1.6.5 (2021-12-19), official https://julialang.org/ release. Documentation: https://docs.julialang.org

julia> using Pkg

julia> Pkg.add("Whippet")
Installing known registries into ~/.julia
Added registry General to ~/.julia/registries/General
ERROR: The following package names could not be resolved:

  • Whippet (not found in project, manifest or registry)

[1] pkgerror(msg::String)
@ Pkg.Types /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Types.jl:55
[2] ensure_resolved(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; registry::Bool)
@ Pkg.Types /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Types.jl:883
[3] add(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; preserve::Pkg.Types.PreserveLevel, platform::Base.BinaryPlatforms.Platform, kwargs::Base.Iterators.Pairs{Symbol, Base.TTY, Tuple{Symbol}, NamedTuple{(:io,), Tuple{Base.TTY}}})
@ Pkg.API /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:193
[4] add(pkgs::Vector{Pkg.Types.PackageSpec}; io::Base.TTY, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Pkg.API /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:80
[5] add(pkgs::Vector{Pkg.Types.PackageSpec})
@ Pkg.API /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:78
[6] #add#23
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:76 [inlined]
[7] add
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:76 [inlined]
[8] #add#22
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:75 [inlined]
[9] add(pkg::String)
@ Pkg.API /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:75
[10] top-level scope
@ REPL[2]:1


Dear @Genentech
could you please help me?! I really would like to use whippet!


Good afternoon, I am using Whippet and I want to understand the read_count column in the .jnc file. I am comparing the values I obtained with STAR's SJ.out.tab. Are these Whippet's raw read counts, or are they normalized? I don't understand how a splice junction can have 0.1604707 read counts if it is not normalized.
Also, the values in the STAR and Whippet files are quite different.
STAR (column V7 is the number of uniquely mapping reads crossing the junction):
V1 V2 V3 V4 V5 V6 V7 V8 V9
9 78480421 78480510 2 4 1 26795 54 50

Whippet:
9 78480420 78480511 PB.12851:7-9:ALT_ANNO 4603.964 -

In Whippet, for the index I used option b) Annotation (GTF) + Alignment (BAM) supplemented index with --bam-min-reads 10; for quantification I used --biascorrect.
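When matching junctions between the two files, note that the coordinates in the quoted lines differ by one base on each side. STAR's SJ.out.tab reports the first and last base of the intron (1-based); the quoted .jnc line appears to report the flanking exon bases instead, which is an inference from this single example rather than a documented rule. A minimal Python sketch of the conversion, with hypothetical function names:

```python
STAR_STRAND = {"0": ".", "1": "+", "2": "-"}  # SJ.out.tab column 4 codes


def parse_star_sj_line(line):
    """Parse the columns of one SJ.out.tab line that are needed here."""
    f = line.split()
    return {
        "chrom": f[0],
        "intron_start": int(f[1]),  # first intron base, 1-based
        "intron_end": int(f[2]),    # last intron base, 1-based
        "strand": STAR_STRAND[f[3]],
        "unique_reads": int(f[6]),
        "multi_reads": int(f[7]),
    }


def star_intron_to_flanking_exons(intron_start, intron_end):
    """Shift intron boundaries outward by one base to the flanking exon bases."""
    return intron_start - 1, intron_end + 1
```

Applied to the STAR line above, the converted pair matches the coordinates in the quoted .jnc line, so the two records describe the same junction.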

Rnasepoly Merase
Hello @timbitz , I have two questions about the complexity column in the .psi.gz output files I hope to seek help from you. 1) Are the values in this column a prediction of the complexity of the relevant node or a report of actually observed complexity of this node in the RNA-seq sample I'm analyzing? 2) Does a K0 mean this node is constitutive in the particular RNA-seq sample I'm analyzing or based on initial files used for building the index? My apologies in advance if any questions are unclear, which I can further clarify. Thank you for your time!
Rnasepoly Merase
Another question is regarding the type column in .psi.gz files. I notice there are many genes, the 1st and last node of which have NA in the Type column despite that the junctions spanning these nodes have good readcount coverage judging from the jnc.gz files (please see the image below as an example). I wonder why this is the case?
Hi Tim, I have been using the older version of Whippet for a while and it works great. I just upgraded to the new version on Julia 1.6.7. Creating a new index file went OK, but I ran into an error when using whippet-quant:
[genechip@datarig Whippet.jl]$ julia bin/whippet-quant.jl SRR9882956_1.fastq SRR9882956_2.fastq -x hg38.jls
Whippet v1.6.1 loading...
Activating environment at ~/Whippet.jl/Project.toml
20.57599 seconds.
Loading splice graph index... /home/genechip/Whippet.jl/hg38.jls
11.080704 seconds (6.03 M allocations: 1.035 GiB, 24.05% gc time)
Processing reads from file...
FASTQ_1: /home/genechip/Whippet.jl/SRR9882956_1.fastq
FASTQ_2: /home/genechip/Whippet.jl/SRR9882956_2.fastq
ERROR: LoadError: Cannot encode 78 to BioSequences.DNAAlphabet{2}()
[1] error(s::String)
@ Base ./error.jl:33
[2] throw_encode_error(A::BioSequences.DNAAlphabet{2}, src::Vector{UInt8}, soff::Int64)
@ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:216
[3] encode_chunk
@ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:228 [inlined]
[4] encode_chunks!(dst::BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}, startindex::Int64, src::Vector{UInt8}, soff::Int64, N::Int64)
@ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:239
[5] copyto!(dst::BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}, doff::Int64, src::Vector{UInt8}, soff::Int64, N::Int64, #unused#::BioSequences.AsciiAlphabet)
@ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:361
[6] copyto!
@ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:292 [inlined]
[7] BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}(src::Vector{UInt8}, startpos::Int64, stoppos::Int64)
@ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/constructors.jl:49
[8] BioSequence
@ ~/Whippet.jl/src/types.jl:74 [inlined]
[9] fill!(rec::Whippet.FASTQRecord, offset::Int64)
@ Whippet ~/Whippet.jl/src/record.jl:14
[10] process_paired_reads!(fwd_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{IOStream}}}, rev_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{IOStream}}}, param::AlignParam, lib::GraphLib, quant::GraphLibQuant{SGAlignPaired, DefaultCounter}, multi::MultiMapping{SGAlignPaired, DefaultCounter}, mod::DefaultBiasMod; bufsize::Int64, sam::Bool, qualoffset::Int64)
@ Whippet ~/Whippet.jl/src/reads.jl:103
[11] macro expansion
@ ~/Whippet.jl/src/timer.jl:5 [inlined]
[12] main()
@ Main ~/Whippet.jl/bin/whippet-quant.jl:143
[13] top-level scope
@ ~/Whippet.jl/src/timer.jl:5
in expression starting at /home/genechip/Whippet.jl/bin/whippet-quant.jl:185
Any idea what the LoadError: Cannot encode 78 to BioSequences.DNAAlphabet{2}() is? I reinstalled the BioSequences package in Julia. Thanks, Nick
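One observation that may help with this error: 78 is the ASCII code for the letter N, and a 2-bit DNA alphabet only has room for A, C, G and T. The message is therefore consistent with a read containing an ambiguous base, although whether that is the actual cause here is an assumption. A minimal Python sketch to check a FASTQ file for such reads (the helper name is hypothetical, and simple 4-line FASTQ records are assumed):

```python
import gzip

VALID_BASES = set("ACGT")

# chr(78) is "N", the usual symbol for an ambiguous base call.


def reads_with_ambiguous_bases(fastq_path, limit=5):
    """Return up to `limit` (header, sequence) pairs containing non-ACGT bases."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    hits = []
    with opener(fastq_path, "rt") as fh:
        while len(hits) < limit:
            header = fh.readline()
            if not header:
                break  # end of file
            sequence = fh.readline().strip()
            fh.readline()  # '+' separator line
            fh.readline()  # quality line
            if set(sequence.upper()) - VALID_BASES:
                hits.append((header.strip(), sequence))
    return hits
```

If this returns any reads for the files in the command above, the N-containing reads are a plausible trigger for the error.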
Hello, I am following the procedure to install Whippet.jl from the official documentation (https://github.com/timbitz/Whippet.jl/tree/master). When I go ahead and compile the project, it stops and throws the following error: ERROR: LoadError: SystemError: opening file "/home/emanuele/Whippet.jl/bin/../index/graph.jls": No such file or directory
Now, if you look at the files available in the repository (https://github.com/timbitz/Whippet.jl/tree/master), there is no such directory. How do I solve that? Thanks for any advice, everyone.
It looks like /index/graph.jls is not on GitHub. Or does it have to be created somewhere else?

Hello, we are having trouble cloning a repository on Windows. It gives this error:
(@v1.6) pkg> activate
Activating environment at C:\Users\Mino\.julia\environments\v1.6\Project.toml

(@v1.6) pkg> test
ERROR: trying to test unnamed project

(@v1.6) pkg>
What do we do?

I ran Whippet on two different datasets with the same annotation index, but I don't get the same number of events or exons (CE) in the psi files. I'm wondering where these differences come from? Doesn't Whippet list all potential events and assign a score to each one?
Hi, I have the same question: when I run the command to collect .psi records from multiple files into a matrix using "readWhippetDataSet", it throws the following error: "Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 564590, 564591". After further digging, I found that the error comes from the "readWhippetPSIfiles" function. I appreciate your help.
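To find which rows account for a mismatch like 564590 vs 564591, the two .psi.gz files can be compared by event key. The Python sketch below assumes the first two tab-separated columns (gene and node) identify an event and that the first line is a header; both are assumptions to verify against the actual files, and the function names are hypothetical.

```python
import gzip


def event_keys(psi_path, key_columns=2):
    """Collect the leading columns of every data row as a set of tuples."""
    keys = set()
    with gzip.open(psi_path, "rt") as fh:
        next(fh, None)  # skip the header line
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= key_columns:  # ignores blank lines
                keys.add(tuple(fields[:key_columns]))
    return keys


def rows_not_shared(path_a, path_b):
    """Return (only in A, only in B) as sorted lists of event keys."""
    a, b = event_keys(path_a), event_keys(path_b)
    return sorted(a - b), sorted(b - a)
```

Because sets are used, a duplicated row would not show up here; if both lists come back empty, counting lines per file is the next thing to check.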