These are chat archives for biom262/biom262-2016

17th
Feb 2016
LElmen
@LElmen
Feb 17 2016 05:07
Can anyone give me a hint on how to slice off row 1 and get the index column set? I've tried various things: set_index, index_col='0', print [1:46984]
this is not supposed to be hard :-(
mbaughn
@mbaughn
Feb 17 2016 05:08
skiprows=1
index_col is the other one
you can just specify the name of the column in single quotes
or put the column number
both seem to work
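Roughly, the two options together look like the sketch below. The table text is a made-up stand-in for a featureCounts output file (the 'S10.bam' column name and the gene rows are invented for the example), so treat it as an illustration, not the actual homework data.

```python
import io
import pandas as pd

# Made-up stand-in for a featureCounts output file (tab-separated).
# Line 1 is the program/command comment; line 2 is the real header.
text = (
    "# Program:featureCounts; Command:...\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tS10.bam\n"
    "GENE_A\tchr1\t100\t1099\t+\t1000\t10\n"
    "GENE_B\tchr2\t500\t2499\t-\t2000\t30\n"
)

# skiprows=1 drops the comment line, so the second line becomes the header.
# index_col="Geneid" (or index_col=0, an integer, not the string '0')
# then turns that column into the row index.
table = pd.read_csv(io.StringIO(text), sep="\t", skiprows=1, index_col="Geneid")

print(table.index.tolist())
print(table.columns.tolist())
```

Without skiprows=1 the comment line is taken as the header, so there is no 'Geneid' column for index_col to find.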
LElmen
@LElmen
Feb 17 2016 05:11
I've tried both index_col and 'Geneid', ... it is maybe another problem altogether.
Conall Sauvey
@csauvey
Feb 17 2016 05:11
index_col didn't work for me until i included the skiprows first
mbaughn
@mbaughn
Feb 17 2016 05:11
The assertion statement will also totally fail if your dataset is missing the last column like mine was… lol
LElmen
@LElmen
Feb 17 2016 05:11
Because it doesn't understand that the second row can be a header I guess
I think most of my homework is a total fail, but I'll fight through what I can.
Conall Sauvey
@csauvey
Feb 17 2016 05:12
^
LElmen
@LElmen
Feb 17 2016 05:12
Thanx for the tips!
vwfu
@vwfu
Feb 17 2016 05:16
@olgabot I completed exercise 6 last week, and just now noticed it's been modified, and my command has been moved up one cell I think... Do I need to redo the exercise? There's a new cell under Exercise 6 that just says: #MODIFIED ASSIGNMENT STATEMENT ABOVE
ecwheele
@ecwheele
Feb 17 2016 05:48
@LElmen did you try including: comment='#' inside of pd.read_table()
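Something like this sketch, assuming the first line of the file starts with '#'. The table text here is invented for the example and only mimics the layout of a featureCounts file.

```python
import io
import pandas as pd

# Invented example data; only the layout mimics a featureCounts file.
text = (
    "# Program:featureCounts; Command:...\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tS10.bam\n"
    "GENE_A\tchr1\t100\t1099\t+\t1000\t10\n"
    "GENE_B\tchr2\t500\t2499\t-\t2000\t30\n"
)

# comment="#" makes the parser ignore the line that starts with '#',
# so the 'Geneid' line is used as the header without needing skiprows.
table = pd.read_table(io.StringIO(text), comment="#", index_col="Geneid")

print(table.shape)
```

One caveat: comment='#' also cuts off the rest of any data line that contains a '#', so it is only safe when that character never appears in the table itself.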
LElmen
@LElmen
Feb 17 2016 05:50
I think it was something else: I had to rerun the pandas import and the other imports, and restart my login session. This worked (see screenshot)
Screen Shot 2016-02-16 at 9.50.33 PM.png
ecwheele
@ecwheele
Feb 17 2016 05:51
Woo!!
LElmen
@LElmen
Feb 17 2016 05:51
I learned some things about slicing and indexing in python trying to googleshoot the problem. :smile:
This forum has been great help!
ecwheele
@ecwheele
Feb 17 2016 05:52
While sometimes incredibly frustrating... combing through the google responses is the best way to learn!
LElmen
@LElmen
Feb 17 2016 05:54
Usually I end up learning something other than what I was looking for, but that sometimes comes in handy later on (oh, didn't I read that somewhere...)
Olga Botvinnik
@olgabot
Feb 17 2016 05:56
@vwfu what changed?
vwfu
@vwfu
Feb 17 2016 05:59
Screen Shot 2016-02-16 at 9.58.34 PM.png
Olga Botvinnik
@olgabot
Feb 17 2016 05:59
Hmm was that after you merged changes?
vwfu
@vwfu
Feb 17 2016 05:59
i think it moved my answer to the cell above exercise 6? not quite sure...
Olga Botvinnik
@olgabot
Feb 17 2016 06:00
Put your answer directly below - that’s where the grading code is
vwfu
@vwfu
Feb 17 2016 06:01
done! i'll resubmit the hw in that case
Olga Botvinnik
@olgabot
Feb 17 2016 06:02
you don’t have to “resubmit”
if you git add+commit+push your changes, it’ll update the pull request
vwfu
@vwfu
Feb 17 2016 06:03
oh yeah that's what i meant by 'resubmit' sorry terminology
Olga Botvinnik
@olgabot
Feb 17 2016 06:03
ah got it
just making sure!
I didn’t want you making a new branch or something
vwfu
@vwfu
Feb 17 2016 06:03
nope haha
thanks !
LElmen
@LElmen
Feb 17 2016 06:50
For the FPKM calculation, where do we get the total number of reads sequenced? The "reads = ..." we defined from the bam column is the read counts for the feature of interest, isn't it?
ecwheele
@ecwheele
Feb 17 2016 06:52
Sum the values in the counts column of the featureCounts output
That sum is the total number of reads that mapped to genes. Each fragment mapped to a gene comes from 1 read
There are also sequencing reads that don't map to genes, but in this case we are not interested in them, so we take the total sum of fragments (reads) that mapped to genes by summing that column in the featureCounts output
LElmen
@LElmen
Feb 17 2016 06:55
what is the header of the feature counts output, I might be missing columns
mbaughn
@mbaughn
Feb 17 2016 06:55
I was too
ecwheele
@ecwheele
Feb 17 2016 06:55
it should be the last column that is labelled /home/ucsd-trainxx/projects/shalek2013/processed_data/S10.Aligned.out.sorted.bam
LElmen
@LElmen
Feb 17 2016 06:59
No, I have it! Thank you both!
LElmen
@LElmen
Feb 17 2016 07:16
I don't get it... If FPKM = "counts observed for feature x" (which is the sum of the /...//sorted.bam column) / length (the summed-up Length column) * total reads sequenced,
where do the total reads sequenced come from?
ecwheele
@ecwheele
Feb 17 2016 07:17
FPKM = ((counts to a gene / length of the gene) / total reads sequenced)* 1e9
total reads sequenced is the sum of the counts column
which you can get with:
reads = s10_featurecounts['/home/ucsd-trainxx/projects/shalek2013/processed_data/S10.Aligned.out.sorted.bam']
total_counts = reads.sum()
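Putting the formula and the sum together, a minimal sketch on a toy table (the gene names, lengths, counts and the short 'S10.bam' column name are all made up; in the homework the column is the long .../S10.Aligned.out.sorted.bam path):

```python
import pandas as pd

# Toy stand-in for the featureCounts table; every value here is invented.
s10_featurecounts = pd.DataFrame(
    {"Length": [1000, 2000, 500], "S10.bam": [10, 30, 60]},
    index=pd.Index(["GENE_A", "GENE_B", "GENE_C"], name="Geneid"),
)

# Counts per gene, and the total over all genes (the library-size term).
reads = s10_featurecounts["S10.bam"]
total_counts = reads.sum()

# FPKM = ((counts to a gene / length of the gene) / total reads) * 1e9,
# computed for every gene at once. Only the total is a sum; the counts
# and lengths stay per gene.
fpkm = (reads / s10_featurecounts["Length"]) / total_counts * 1e9

print(fpkm)
```

The part that is easy to get wrong is summing the counts and lengths over all genes in the numerator: only total_counts is a sum, and the result is one FPKM value per gene.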
LElmen
@LElmen
Feb 17 2016 07:31
Finally got it... I've been trying to sum up ALL genes and counts here... (For once syntax wasn't the problem). I should probably not try to do any calculations ever after 10pm. If I take the example gene given in the assert it matches now. Thank you!!
mbaughn
@mbaughn
Feb 17 2016 18:09
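If anyone is suspicious of their symbolic links, you can print where they point with a long listing from ls (it is the -l option that shows link -> target; -H only affects links named on the command line):
e.g.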
[ucsd-train03@tscc-login1 scratch]$ ls -lhaH

lrwxrwxrwx   1 ucsd-train03 biom262-group   27 Feb  8 09:17 shalek2013 -> /home/ucsd-train03/projects
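
The same check can be done from Python or a notebook. This is only a sketch: describe_link is a hypothetical helper name, and the demo builds its own throwaway directory and link instead of touching the real course paths.

```python
import os
import tempfile


def describe_link(path):
    """Return (is_symlink, stored target, fully resolved path) for path."""
    is_link = os.path.islink(path)
    target = os.readlink(path) if is_link else None
    return is_link, target, os.path.realpath(path)


# Demo in a throwaway directory; these names are made up for the example.
workdir = tempfile.mkdtemp()
real_dir = os.path.join(workdir, "projects")
os.mkdir(real_dir)
link = os.path.join(workdir, "shalek2013")
os.symlink(real_dir, link)

is_link, target, resolved = describe_link(link)
print(is_link, target, resolved)
```

os.readlink returns the target exactly as stored in the link, while os.path.realpath follows every link along the way, so comparing the two shows whether a link is relative or chained. os.path.exists(link) returning False for a link is a quick sign that it is broken.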