    Alex Vasilev
    @ir0nfelix
    what is the barcode file? is it specific for my data? how should I find it??
    Mehmet Tekman
    @mtekman

    Yeah the tutorial could be a bit clearer, it's a complex topic

    what is the barcode file? is it specific for my data? how should I find it??

    So two things you need to know:

    1. What your barcode format is, so you can extract the barcodes out of the sequences
    2. Whether your protocol has a list of expected barcodes (a barcode file), to check your extracted barcodes are correct.

    When you extract the barcodes out of your reads, you may get some false positives that will inflate the number of cells in your sample, so the barcode file handles that by filtering out unwanted barcodes or clustering them to the expected barcodes that should exist in your sample.

    • The barcode format is usually specific to the protocol, which for Strt-seq is given on page 15 in their paper (https://doi.org/10.1007/978-1-4939-9240-9_9). From what I can understand of Step 7, the format is:

      • AATGATACGGCGACCACCGATNNNNNNGGGXX..XXCTGTCTCTTATACACATCTGACGCXXXXXXXXTCGTATGCCGTCTTCTGCTTG
        • i.e. (21 bp of sequence) + (6 bp UMI) + (variable-length sequence) + "GACGC" + (8 bp cell barcode) + (variable-length sequence)
      • To extract this, you would need to provide UMI-tools extract with a regular expression pattern like:
        • (.{21})(?P<umi_1>.{6})(.*)(?P<discard_1>GACGC)(?P<cell_1>.{8})(.*) (notice that each parenthesis group here mirrors the above parentheses groups)
    • The barcode file, which contains the list of true barcodes, should be something you get from the specific facility that runs the Strt-seq protocol. I am not too familiar with this protocol, so it is possible that there isn't a barcode file, in which case you would need to filter out false positives using a filtering tool like DropletUtils (also in Galaxy).
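
As a rough illustration of the two points above, here is a minimal Python sketch that applies the same regular expression to one synthetic read and then checks the extracted cell barcode against a list of expected barcodes. It uses Python's built-in `re` module as a stand-in for checking the pattern and is not UMI-tools itself. The UMI, the variable insert, the cell barcode, and the `EXPECTED_BARCODES` set are all invented for the example; a real barcode list would come from the sequencing facility.

```python
import re

# Pattern from the message above: one group per segment of the read layout.
PATTERN = re.compile(
    r"(.{21})(?P<umi_1>.{6})(.*)(?P<discard_1>GACGC)(?P<cell_1>.{8})(.*)"
)

# Synthetic read built from the layout above. The UMI, insert, and cell
# barcode values are made up for illustration only.
UMI = "ACGTAC"
CELL = "AACCGGTT"
READ = (
    "AATGATACGGCGACCACCGAT"         # 21 bp fixed sequence
    + UMI                           # 6 bp UMI
    + "GGG" + "TTTTCCCCAAAA"        # GGG plus a variable-length stretch
    + "CTGTCTCTTATACACATCTGACGC"    # fixed sequence ending in the GACGC anchor
    + CELL                          # 8 bp cell barcode
    + "TCGTATGCCGTCTTCTGCTTG"       # trailing sequence
)

# Hypothetical list of expected barcodes (the "barcode file").
EXPECTED_BARCODES = {"AACCGGTT", "GGTTAACC"}


def extract(read):
    """Return (umi, cell_barcode) for a read, or None if the pattern fails."""
    match = PATTERN.match(read)
    if match is None:
        return None
    return match.group("umi_1"), match.group("cell_1")


umi, cell = extract(READ)
# A barcode that is not in the expected list would be treated as a
# likely false positive and filtered out.
print(umi, cell, cell in EXPECTED_BARCODES)
```

Because `(.*)` is greedy, the pattern anchors on the last `GACGC` in the read that is still followed by at least 8 bases, so it is worth testing it on a few real reads before running a full extraction.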

    Björn Grüning
    @bgruening
    @ir0nfelix can you please ask this question again on help.galaxyproject.org
    this way @mtekman can answer there and the answer does not get lost
    Mehmet Tekman
    @mtekman
    yeah good point
    Mehmet Tekman
    @mtekman
    I've copied my response to a related question here (https://help.galaxyproject.org/t/single-cell-barcode-extraction/4889/5)
    tshum-create
    @tshum-create
    Hi Mehmet and Alex - I'm new and just went through your "Downstream Single-cell RNA analysis with RaceID" tutorial. It's awesome, thanks for making this. I was able to follow it until the lineage analysis section. I'm a bit confused about the StemID link score calculations. There is a discrepancy in the number of Cluster 2 links between Figure 19 (2 links) versus the spanning tree and cluster tree diagrams (4 links to C3, C4, C8, and C9), and in the solution that describes C2 as having the same links as C3 (which is not evident from Figure 19's top subfigure and the cluster diagram). Could you please explain why the StemID function doesn't show 4 links for C2, and instead only shows 2 links? Thanks!
    Mehmet Tekman
    @mtekman
    @tshum-create thanks for going through our materials! You're correct, that looks like it should be 4 links shown in the Fig19 plot, and not 2 - it is likely I took either the tree diagram from a different analysis, or the link scores. I will fix this in a later tutorial update, thanks!
    tshum-create
    @tshum-create
    Thanks for getting back Mehmet. I think it's an issue with Stem-ID, and not a copy/paste error on your part. I had run the practice materials on Galaxy, and my output from the "Lineage Computation using StemID" PDF Report gave the same outputs you showed in Figure 19 and the spanning tree diagrams. I'm just not sure how the StemID function decided to arbitrarily trim off the 2 links from C2.
    Mehmet Tekman
    @mtekman
    @tshum-create it's likely due to the link strength being low, and only showing you the more viable paths. I think if you play with the score threshold parameter, it might show you more
    lsryap
    @lsryap

    Hi – I am trying to complete the pre-processing of 10X Single-Cell RNA Datasets tutorial to produce a count matrix from fastq files, using the PBMC subset data. I’m using the data links (fastq, barcodes, gene annotations) provided and following the steps exactly, but the pipeline fails and I receive the error warning ‘EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version
    SOLUTION: re-generate genome index’

    Could you please advise? I’m completely new to this type of analysis and was hoping to run STARsolo with the PBMC data before moving on to my own (10x, v3). Thank you.

    lsryap
    @lsryap
    I should add that I tried downloading a different .gtf file, but this failed too.
    Mehmet Tekman
    @mtekman
    @lsryap it looks like an issue with STARsolo, a fix is in the works as we speak
    Can you try rerunning with an earlier version of the tool?
    lsryap
    @lsryap
    Hi - thanks for the fast reply :) Is there a simple way to run with an earlier version of the tool on the Galaxy site? Apologies - completely new at this and have a lot to learn!
    Mehmet Tekman
    @mtekman
    Yep! At the top right of each tool is a "versions" button
    click on it to see previous galaxy wrappers of the same tool
    lsryap
    @lsryap
    Thanks for your help Mehmet. It's taken a bit of tweaking, but by using an earlier version of the tool, along with importing the fastq pairs separately instead of as a list and using my own reference genome (as the dropdown menu was empty), I seem to have things working and was able to use the outputs in R to perform Seurat analysis. Now to try my own data....!!
    Mehmet Tekman
    @mtekman
    :-)
    Pablo Moreno
    @pcm32
    Anyone making an https://github.com/satijalab/azimuth Interactive Tool?
    Björn Grüning
    @bgruening
    Not that I know of.
    tshum-create
    @tshum-create
    Hi, I'm having a problem at the beginning of the scanpy tutorial, at the step of Import Anndata and loom. I've filled in everything correctly with the test data as the tutorial says, but the program ends up with an error when I try to run this. Is this something new anybody else is seeing recently?
    tshum-create
    @tshum-create
    Just wanted to update: things worked when I switched to an older version of Import Anndata and loom. It worked with version 0.6.22.post1+galaxy4, but not with version 0.7.4+galaxy1. This was on Galaxy version 20.09.
    Mehmet Tekman
    @mtekman
    This was patched recently, and I tested it with datasets from that tutorial -- but apparently it's still not quite robust yet -- please do use the previous version for now ;-)
    tshum-create
    @tshum-create
    Thanks Mehmet!
    Björn Grüning
    @bgruening
    Pavankumar Videm
    @pavanvidem
    Perfect timing :smiley: I will include the new options into the Galaxy tool. Filtering without remapping is a nice feature.
    Björn Grüning
    @bgruening
    And the conda package is available in 20min.
    Mehmet Tekman
    @mtekman
    :tada:
    tshum-create
    @tshum-create
    Hey, quick question about the "Plot with scanpy" Visualize QC metrics step in the tutorial- how do I correct the orientation of the violin plot generated? My plots are coming out in the horizontal orientation, and I get an error whenever I select the vertical option in the "Parameters for seaborn.violinplot" Orientation of the Plot. Thanks for your help as always!
    Mehmet Tekman
    @mtekman
    This is a known bug. You can either use an earlier version of the tool, or wait for an update in a few weeks
    tshum-create
    @tshum-create
    Hi Mehmet, thanks for letting me know. I was also unable to get the "Inspect and Manipulate with scanpy" function to work to compute Neighborhood graphs in the tutorial, so perhaps there are also some bugs in this.
    Mehmet Tekman
    @mtekman
    @tshum-create does a previous version of the tool not work at all?
    tshum-create
    @tshum-create
    Hey Mehmet, I tried it with 1.4.4post1 (galaxy 0 -3) and 1.4.4 galaxy0. I couldn't do it with 1.4 because not all the required fields from the tutorial were there.
    None of those worked
    Mehmet Tekman
    @mtekman
    Okay, thanks for reporting this -- we're working on an upgrade
    Björn Grüning
    @bgruening
    @mtekman @pavanvidem do you know if scanpy takes all CPUs by default if we do not restrict it?
    Mehmet Tekman
    @mtekman
    I remember @bebatut and @gmauro discussing this once in a team meeting. I believe there was an issue with numba that was making scanpy take a long time, either because it was using too many resources or not enough
    Björn Grüning
    @bgruening
    what do you see locally?
    is it spawning 8 threads when you have 8 CPUs?
    Mehmet Tekman
    @mtekman
    I believe so yes
    Carolyn Nielsen
    @carolynnielsen
    Hi all - I'm looking to extract and analyse BCR sequences from my single cell SMART-Seq data. Are there tools for this in Galaxy? I've come across papers on Antigen Receptor Galaxy but this doesn't seem to be available on the instances I have checked. Are there alternatives? Or is there a way for me to import this (like a workflow) without having admin privileges? https://toolshed.g2.bx.psu.edu/repository?repository_id=2e457d63170a4b1c&changeset_revision=28fbbdfd7a87
    Duaa Mohammad Alawad
    @DuaaAlawad
    how can I choose the root cell type from this dataset?
    Mehmet Tekman
    @mtekman
    @DuaaAlawad which dataset are you referring to?
    Duaa Mohammad Alawad
    @DuaaAlawad
    pbmc8k
    Mehmet Tekman
    @mtekman
    so you don't typically choose the root/stem cell type, you infer it from the clustering. But you can rename the clusters to whatever you want by using the Rename Categories of Annotation option in the Manipulate Anndata tool
    Carolyn Nielsen
    @carolynnielsen
    Any suggestions for workflows or tools in Galaxy for BCR sequence analysis from either SMART-Seq or 10X V(D)J data sets?
    Mehmet Tekman
    @mtekman
    We're currently working on a SMART-Seq training -- but I think the following workflow should work: https://usegalaxy.eu/u/mehmet-tekman/w/smart-seq2
    iMammal
    @iMammal

    When walking through the Alevin tutorial (https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/droplet-quantification-preprocessing/tutorial.html) with the Human transcriptome "Comprehensive gene annotation" GTF file from Gencode (https://www.gencodegenes.org/human/) on the usegalaxy.eu server, the GTF2GeneList tool fails when the "Flag Mitochondrial Features" option is enabled (default parameters), with this error:

    Error in [[<-(tmp, name, value = character(0)) :
    0 elements in value to replace 258145 elements
    Calls: $<- -> $<- -> [[<- -> [[<-
    Execution halted

    I tried the GTF both with and without the "scaffolds, assembly patches and alternate loci" and got the same error. Am I missing something or is this an actual bug? It seems to work fine on genes and transcripts, but adding the mito features always makes it crash.