These are chat archives for biom262/biom262-2016

14th
Feb 2016
bmlewis-UCSD
@bmlewis-UCSD
Feb 14 2016 00:42
assert s10_featurecounts.shape == (46983, 6)
assert (s10_featurecounts.columns[:-1] == pd.Index(['Chr', 'Start', 'End', 'Strand', 'Length'],
      dtype='object')).all()

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-16-d862fc3ff3b3> in <module>()
----> 1 assert s10_featurecounts.shape == (46983, 6)
      2 assert (s10_featurecounts.columns[:-1] == pd.Index(['Chr', 'Start', 'End', 'Strand', 'Length'],
      3       dtype='object')).all()

AssertionError:
# my output (2993, 6)
s10_featurecounts = pd.read_table('S10_featureCounts.txt', index_col=0, header=1)
print(s10_featurecounts.shape)
s10_featurecounts.head()
Is my S10_featureCounts.txt file messed up (too short), or am I messing up exercise 6?
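For reference, a minimal sketch of the read step above. It assumes the usual featureCounts layout, where the first line of the file is a `#` comment recording the command and the second line is the column header. The tiny in-memory table is made up for illustration and is not course data.

```python
import io
import pandas as pd

# Hypothetical two-gene stand-in for a featureCounts output file.
example = (
    "# Program:featureCounts; Command: (omitted)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "geneA\tchr1\t100\t200\t+\t101\t5\n"
    "geneB\tchr11\t300\t450\t-\t151\t0\n"
)

# comment='#' skips the leading comment line, so the header row is found
# without counting lines (equivalent here to header=1).
counts = pd.read_csv(io.StringIO(example), sep="\t", comment="#", index_col=0)
print(counts.shape)
```

With a real file, the `io.StringIO(example)` argument would be replaced by the path to the featureCounts output.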
Olga Botvinnik
@olgabot
Feb 14 2016 00:46
Can you show a screenshot of the first 5 lines of the data frame, i.e. the output of the head command?
The assertion error means your table doesn't match the expected shape or column names
Did you use chr11 or the full dataset?
For featurecounts?
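A bare assert fails without saying what was actually found. As a rough sketch (the miniature table and the expected shape below are placeholders, not the exercise's real values), a message can be attached to each check:

```python
import pandas as pd

# Hypothetical one-gene table in the same column layout as the exercise.
df = pd.DataFrame(
    {"Chr": ["chr1"], "Start": [100], "End": [200], "Strand": ["+"],
     "Length": [101], "sample.bam": [5]},
    index=pd.Index(["geneA"], name="Geneid"),
)

expected_shape = (1, 6)  # placeholder; the exercise expects (46983, 6)
expected_columns = ["Chr", "Start", "End", "Strand", "Length"]

# The message after the comma is shown only when the check fails.
assert df.shape == expected_shape, "shape is {}, expected {}".format(
    df.shape, expected_shape)
assert list(df.columns[:-1]) == expected_columns, "columns are {}".format(
    list(df.columns))
```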
bmlewis-UCSD
@bmlewis-UCSD
Feb 14 2016 00:48
    Chr    Start    End    Strand    Length    /home/ucsd-train21/projects/shalek2013/processed_data/S10.Aligned.out.sorted.bam
Geneid                        
ENSMUSG00000082286.10    chr11;chr11;chr11;chr11;chr11;chr11    3125904;3127479;3128929;3129765;3130153;3130793    3126058;3127644;3129520;3129911;3130313;3131004    +;+;+;+;+;+    1433    204
ENSMUSG00000023764.18    chr11;chr11;chr11;chr11;chr11;chr11;chr11;chr1...    3131850;3132216;3132887;3133072;3134318;313462...    3132131;3132327;3132965;3133207;3134501;313473...    -;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;...    4432    0
ENSMUSG00000087111.1    chr11;chr11;chr11    3142048;3142594;3143180    3142111;3142669;3143363    +;+;+    324    0
ENSMUSG00000081208.1    chr11    3149742    3150955    -    1214    0
ENSMUSG00000020457.13    chr11;chr11;chr11;chr11;chr11;chr11;chr11;chr1...    3249907;3252160;3252644;3254512;3256543;325930...    3250365;3252282;3252811;3254642;3256712;325937...    -;-;-;-;-;-;-;-;-    1546    10030
Want a screenshot instead?
Olga Botvinnik
@olgabot
Feb 14 2016 00:48
Yeah it looks like you used chr11
It's my mistake: I thought it would take too long, but featureCounts is really fast, so use the full GTF file
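One way to confirm from the notebook that only chr11 was counted is to collect the distinct values in the Chr column. This is a minimal sketch; the three rows are invented to mimic the semicolon-separated layout in the head() output above.

```python
import pandas as pd

# Hypothetical rows: one semicolon-separated chromosome entry per exon.
df = pd.DataFrame(
    {"Chr": ["chr11;chr11;chr11", "chr11", "chr11;chr11"]},
    index=["geneA", "geneB", "geneC"],
)

# Split each entry on ';' and collect the distinct chromosome names.
chromosomes = set()
for entry in df["Chr"]:
    chromosomes.update(entry.split(";"))

print(sorted(chromosomes))
```

If the set holds a single chromosome, the counts were made against a single-chromosome annotation.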
bmlewis-UCSD
@bmlewis-UCSD
Feb 14 2016 00:50
in the feature counts.sh file
or the align.sh
Olga Botvinnik
@olgabot
Feb 14 2016 00:50
The one that produces this file
bmlewis-UCSD
@bmlewis-UCSD
Feb 14 2016 00:50
ok so I did align and sort the whole thing
just only counted 11
Olga Botvinnik
@olgabot
Feb 14 2016 00:51
You want to count everything
Run ls in the directory where the GTF file is to find the other annotation
bmlewis-UCSD
@bmlewis-UCSD
Feb 14 2016 00:53
#!/bin/bash
#PBS -q hotel
#PBS -N s10_featurecounts
#PBS -V
#PBS -e s10_featurecounts.sh.err
#PBS -o s10_featurecounts.sh.out
#PBS -l nodes=1:ppn=8
#PBS -l walltime=0:10:00

featureCounts -T 8 \
-s -B --primary \
-a /projects/ps-yeolab/biom262-2016/genomes/mm10/gencode/m8/gencode.vM8.basic.annotation.chr11.gtf -o$
$HOME/projects/shalek2013/processed_data/S10.Aligned.out.sorted.bam
#!/bin/bash
#PBS -q hotel
#PBS -N s10_featurecounts
#PBS -V
#PBS -e s10_featurecounts.sh.err
#PBS -o s10_featurecounts.sh.out
#PBS -l nodes=1:ppn=8
#PBS -l walltime=0:10:00

featureCounts -T 8 \
-s -B --primary \
-a /projects/ps-yeolab/biom262-2016/genomes/mm10/gencode/m8/gencode.vM8.basic.annotation.gtf -o ~/projects/shalek2013/processed_data/S10_featureCounts.txt \
$HOME/projects/shalek2013/processed_data/S10.Aligned.out.sorted.bam
Olga Botvinnik
@olgabot
Feb 14 2016 00:55
Yep that's correct
bmlewis-UCSD
@bmlewis-UCSD
Feb 14 2016 00:55
Should I delete the old ones, or will it overwrite them?
Olga Botvinnik
@olgabot
Feb 14 2016 00:58
The old featurecounts output?
It will overwrite it
Actually I don't remember
bmlewis-UCSD
@bmlewis-UCSD
Feb 14 2016 00:59
OK, they're submitted... I noticed earlier that I could only see stuff on chr11 in the IGV viewer but was trying to ignore that
well i'll let you know
Olga Botvinnik
@olgabot
Feb 14 2016 00:59
Can you check the documentation? It might have a force command
bmlewis-UCSD
@bmlewis-UCSD
Feb 14 2016 00:59
if it does
Olga Botvinnik
@olgabot
Feb 14 2016 01:00
Hmm you may need to remap then
bmlewis-UCSD
@bmlewis-UCSD
Feb 14 2016 01:00
you mean realign
Olga Botvinnik
@olgabot
Feb 14 2016 01:00
Because then you'll only have reads mapped to chr11
Yeah map/align are interchangeable
bmlewis-UCSD
@bmlewis-UCSD
Feb 14 2016 01:01
Umm, well, it finished; let me check out the outputs
Looks good in the terminal, and the notebook assertion error went away!
ya
yay*
bmlewis-UCSD
@bmlewis-UCSD
Feb 14 2016 01:23
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-11-5705422b0ff2> in <module>()
----> 1 sns.distplot(s10_featurecounts['/home/ucsd-train01/projects/shalek2013/processed_data/S10.Aligned.out.sorted.bam'])

/home/ucsd-train21/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
   1967             return self._getitem_multilevel(key)
   1968         else:
-> 1969             return self._getitem_column(key)
   1970 
   1971     def _getitem_column(self, key):

/home/ucsd-train21/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   1974         # get column
   1975         if self.columns.is_unique:
-> 1976             return self._get_item_cache(key)
   1977 
   1978         # duplicate columns & possible reduce dimensionality

/home/ucsd-train21/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1089         res = cache.get(item)
   1090         if res is None:
-> 1091             values = self._data.get(item)
   1092             res = self._box_item_values(item, values)
   1093             cache[item] = res

/home/ucsd-train21/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3209 
   3210             if not isnull(item):
-> 3211                 loc = self.items.get_loc(item)
   3212             else:
   3213                 indexer = np.arange(len(self.items))[isnull(self.items)]

/home/ucsd-train21/anaconda3/lib/python3.5/site-packages/pandas/core/index.py in get_loc(self, key, method, tolerance)
   1757                                  'backfill or nearest lookups')
   1758             key = _values_from_object(key)
-> 1759             return self._engine.get_loc(key)
   1760 
   1761         indexer = self.get_indexer([key], method=method,

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)()

KeyError: '/home/ucsd-train01/projects/shalek2013/processed_data/S10.Aligned.out.sorted.bam'
that was from this input
sns.distplot(s10_featurecounts['/home/ucsd-train01/projects/shalek2013/processed_data/S10.Aligned.out.sorted.bam'])
bmlewis-UCSD
@bmlewis-UCSD
Feb 14 2016 01:30
Never mind, I changed it to my own ucsd-train folder and the error went away
sns.distplot(s10_featurecounts['/home/ucsd-train21/projects/shalek2013/processed_data/S10.Aligned.out.sorted.bam'])
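The KeyError above comes from the counts column being named after the BAM path, which includes each user's home directory. A possible way to avoid hardcoding the path, sketched here on a made-up table, is to pick the counts column by position. This assumes a single BAM file was counted, so that the counts are the last column.

```python
import pandas as pd

# Hypothetical table; the last column name mimics a per-user BAM path.
df = pd.DataFrame(
    {"Length": [1433, 4432, 324],
     "/home/some-user/processed_data/S10.Aligned.out.sorted.bam": [204, 0, 0]},
    index=["geneA", "geneB", "geneC"],
)

# Look up the column by position instead of by its user-specific name.
counts_column = df.columns[-1]
counts = df[counts_column]
print(counts.tolist())
```

The resulting `counts` Series could then be passed to sns.distplot in place of the hardcoded column lookup.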
thewilmergency
@thewilmergency
Feb 14 2016 21:55
Hello, I'm on exercise 6 and I'm getting a "KeyError"
I'm trying to get a distribution of the number of reads per feature using sns.distplot.
Screenshot 2016-02-14 13.56.13.png
I think I have all the relevant files. So I'm not sure what I'm missing to do this.
Screenshot 2016-02-14 13.57.57.png
thewilmergency
@thewilmergency
Feb 14 2016 22:01
Oh I think I figured it out
Olga Botvinnik
@olgabot
Feb 14 2016 23:46
What was the fix?