These are chat archives for biom262/biom262-2016

11th
Feb 2016
aparisian
@aparisian
Feb 11 2016 02:00
Hi Olga,
I'm wondering if the assert statement in exercise 6 might have an error in it. The statement is comparing the column names of all but the last column (s10_featurecounts.columns[:-1]) and in my file Length is the last column. Maybe I made a mistake and my file is wrong though.
aparisian
@aparisian
Feb 11 2016 02:16
I also can't get sns.distplot to work. Are we still using the featurecounts table we've generated, or opening another file?
LElmen
@LElmen
Feb 11 2016 02:40
@olgabot Hi Olga, on the page with the homework instructions it says "Turn in this note book: compare the kallisto and featurecounts algorithms" and it's marked as a link. This gives "404: not found". Is this not supposed to be a link? Do we make a new notebook folder and give it that name?
ecwheele
@ecwheele
Feb 11 2016 02:40
The link doesn't work but if you go to the website the homework is there.
Also, if you load up your python notebook, you will find the homework notebook in the week04 folder
LElmen
@LElmen
Feb 11 2016 02:44
It's named "homework" and when you open it the headline says "Gene expression analysis"?
ecwheele
@ecwheele
Feb 11 2016 02:45
it is called 2_compare_alignment_vs_quasialignment.ipynb
if it is not there you need to do a git pull upstream master
LElmen
@LElmen
Feb 11 2016 02:46
Do I have to switch to branch week04 first?
ecwheele
@ecwheele
Feb 11 2016 02:47
no, but you will want to before you make any changes to the file, otherwise you won't be able to git add them to the week04 branch
LElmen
@LElmen
Feb 11 2016 02:49
I see, now I also get the merge conflicts talked about earlier here. I think I can figure that out. Thank you @ecwheele !
ecwheele
@ecwheele
Feb 11 2016 02:50
No problem!
ecwheele
@ecwheele
Feb 11 2016 03:03
@aparisian What is your error with sns.distplot?
the string in quotes is the name of the column in that dataframe that you will be plotting. So make sure that '/home/ucsd-train01/projects/shalek2013/processed_data/S10.Aligned.out.sorted.bam' exists as a column name in your dataframe
I believe you had to define it somewhere above
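One way to confirm which column name is actually there before plotting is sketched below. It is only a sketch: the toy dataframe, its counts, and the `bam_columns` helper are made up for illustration and are not part of the homework notebook.

```python
import pandas as pd

# Toy stand-in for the featureCounts table; the counts are made up.
bam = "/home/ucsd-train01/projects/shalek2013/processed_data/S13.Aligned.out.sorted.bam"
df = pd.DataFrame({"Geneid": ["geneA", "geneB"], "Length": [1200, 800], bam: [5, 0]})


def bam_columns(table):
    """Return the column names that look like BAM paths (hypothetical helper)."""
    return [col for col in table.columns if str(col).endswith(".bam")]


# The name we intend to plot; here it deliberately does not match the table.
wanted = bam.replace("S13", "S10")
if wanted not in df.columns:
    print("No column named", wanted)
    print("BAM columns present:", bam_columns(df))
```

If the name you pass to the plotting call is not in `df.columns`, printing the BAM-like columns usually shows the mismatch right away.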
vwfu
@vwfu
Feb 11 2016 03:10
so the column name is : /home/ucsd-train01/projects/shalek2013/processed_data/S13.Aligned.out.sorted.bam
that may explain it
but why is it s13 not s10...
ecwheele
@ecwheele
Feb 11 2016 03:11
that I can't answer. There seems to have been a lot of confusion with s10 and s13. maybe some typos?
vwfu
@vwfu
Feb 11 2016 03:12
hm, so when i change it to S13 it works and plots it for me!
makes sense since that's the column name when i look at it using head()
but weird bc i thought we were working with s10 data for that particular exercise
ecwheele
@ecwheele
Feb 11 2016 03:13
did you load it properly with pd.read_table('s10_featureCounts.txt')? That is where your dataframe is coming from
you can check it on your terminal with head s10_featureCounts.txt and see if the column name is the same there
vwfu
@vwfu
Feb 11 2016 03:13
here's my commands for loading: s10_featurecounts = pd.read_table('s10_featureCounts.txt')
print(s10_featurecounts.shape)
s10_featurecounts.head()
then to get rid of the first row: s10_featurecounts = pd.read_table('s10_featureCounts.txt')
print(s10_featurecounts.shape)
s10_featurecounts.head()
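For reference, a featureCounts output table starts with a `#` comment line before the real header, so one common way to skip it is `skiprows=1` (or `comment='#'`). The sketch below uses a made-up in-memory miniature of such a file instead of the real `s10_featureCounts.txt`; the gene rows and counts are invented.

```python
import io

import pandas as pd

# Made-up miniature of a featureCounts output file (tab-separated).
raw = (
    "# comment line written by featureCounts\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\t"
    "/home/ucsd-train01/projects/shalek2013/processed_data/S13.Aligned.out.sorted.bam\n"
    "geneA\tchr22\t100\t500\t+\t401\t7\n"
    "geneB\tchr22\t900\t1500\t-\t601\t0\n"
)

# skiprows=1 drops the leading comment line so the second line becomes the header.
table = pd.read_table(io.StringIO(raw), skiprows=1)

annotation_columns = list(table.columns[:-1])  # everything except the last column
count_column = table.columns[-1]               # the BAM path holding the counts

print(table.shape)
print(annotation_columns)
print(count_column)
```

With a complete table, `columns[:-1]` ends in `Length` and the last column is the BAM path. If `Length` is itself the last column, the count column is missing from the file.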
aparisian
@aparisian
Feb 11 2016 03:23
Where does it tell you to add that column? And what are we putting in it?
I don't see it mentioned anywhere before exercise 6
ecwheele
@ecwheele
Feb 11 2016 03:24
I think the column is already there. Can you see it with s10_featurecounts.head()
aparisian
@aparisian
Feb 11 2016 03:24
That's probably my problem though
No
ecwheele
@ecwheele
Feb 11 2016 03:25
Yeah... that would explain why the [:-1] wasn't working for you either...
I think it should come as an output in feature counts
are you sure your featurecounts ran properly?
aparisian
@aparisian
Feb 11 2016 03:28
Yeah, I think so. It's got the right number of rows and everything
ecwheele
@ecwheele
Feb 11 2016 03:28
do you see that column when you look at the file with less in the terminal?
aparisian
@aparisian
Feb 11 2016 03:29
I still don't see where we put that column, even in the featurecounts script from the in-class exercise
ecwheele
@ecwheele
Feb 11 2016 03:29
featureCounts automatically puts it there as part of the output from the program
You can read more about that here on page 30 of this PDF http://bioinf.wehi.edu.au/subread-package/SubreadUsersGuide.pdf
@aparisian check your featurecounts summary file in processed_data
compare that to your summary file for s13
There should be counts of the number of reads in each assignment category. If there are no counts there, that means there was an error in running your script.
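A rough sketch of that check in Python, assuming the summary file is tab-separated with a `Status` header line and a single BAM column. The sample text, category names, and zero counts below are made up, and `read_summary` is a hypothetical helper.

```python
def read_summary(text):
    """Parse featureCounts summary text into {category: count} (single BAM column assumed)."""
    lines = text.strip().splitlines()
    counts = {}
    for line in lines[1:]:  # the first line is the "Status <bam>" header
        status, value = line.split("\t")[:2]
        counts[status] = int(value)
    return counts


# Made-up example of a summary where nothing was counted.
sample = "Status\tS10.bam\nAssigned\t0\nUnassigned_NoFeatures\t0\n"

summary = read_summary(sample)
if sum(summary.values()) == 0:
    print("No reads counted in any category; the featureCounts run probably failed.")
```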
ecwheele
@ecwheele
Feb 11 2016 03:34
ls -lh in processed_data.... if your s10_featureCounts.txt is much smaller than s13_featureCounts.txt, there is a problem with the script
aparisian
@aparisian
Feb 11 2016 03:37
Yeah it looks like the extra column is there in 13 but not 10, but I'm pretty sure I just copied the script over for 13
ecwheele
@ecwheele
Feb 11 2016 03:38
double check your input file for the s10 script
does it exist? is it empty?
ls -lh
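The same "does it exist? is it empty?" check can be done from a notebook. This is a minimal sketch: `file_status` is a hypothetical helper, and the file names are placeholders created in a temporary directory, not the real course files.

```python
import tempfile
from pathlib import Path


def file_status(path):
    """Classify a path as 'missing', 'empty', or 'ok' (hypothetical helper)."""
    p = Path(path)
    if not p.exists():
        return "missing"
    return "empty" if p.stat().st_size == 0 else "ok"


with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / "S10.sorted.bam"   # placeholder name, zero bytes
    empty.touch()
    filled = Path(tmp) / "S13.sorted.bam"  # placeholder name, some bytes
    filled.write_bytes(b"not really a BAM")

    statuses = {p.name: file_status(p) for p in (empty, filled)}
    statuses["never_created.bam"] = file_status(Path(tmp) / "never_created.bam")

print(statuses)
```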
aparisian
@aparisian
Feb 11 2016 03:46
Yeah I think it was an issue with the sort script. I'll try it again and see if it works now. Thanks Emily!
ecwheele
@ecwheele
Feb 11 2016 03:46
No problem, good luck!
Olga Botvinnik
@olgabot
Feb 11 2016 07:11
There may be typos with S10 and S13
since we did S13 in class, and not S10 like I expected, that’s the data you have
@aparisian what’s your distplot error?
ecwheele
@ecwheele
Feb 11 2016 07:14
we got the distplot error figured out
ericsan119
@ericsan119
Feb 11 2016 07:29
Hi Olga
Screen Shot 2016-02-10 at 11.30.25 PM.png
Do you know what that problem is?
ecwheele
@ecwheele
Feb 11 2016 07:31
can you scroll down to the bottom of that error message
ericsan119
@ericsan119
Feb 11 2016 07:32
Screen Shot 2016-02-10 at 11.32.27 PM.png
ecwheele
@ecwheele
Feb 11 2016 07:33
hmmm.... not 100% sure but I would try...
on your tscc login terminal
conda install pandas
ericsan119
@ericsan119
Feb 11 2016 07:34
Yes I did it already
ecwheele
@ecwheele
Feb 11 2016 07:34
any error?
what were the last few lines
ericsan119
@ericsan119
Feb 11 2016 07:34
but it did not work out
let me try again
I got this error msg
ValueError: unknown locale: UTF-8
Screen Shot 2016-02-10 at 11.37.41 PM.png
ecwheele
@ecwheele
Feb 11 2016 07:38
are you running conda install pandas on your command line outside of the notebook?
ericsan119
@ericsan119
Feb 11 2016 07:40
I run condo install pandas on TSCC terminal
ecwheele
@ecwheele
Feb 11 2016 07:40
conda
not condo
and what are the last few output lines of that command?
ericsan119
@ericsan119
Feb 11 2016 07:41
Screen Shot 2016-02-10 at 11.40.45 PM.png
Sorry that was my typo
ecwheele
@ecwheele
Feb 11 2016 07:44
can you show me the output of ls in the directory where you are running the conda install
ericsan119
@ericsan119
Feb 11 2016 07:45
[ucsd-train08@tscc-login2 ~]$ ls
Anaconda3-2.4.1-Linux-x86_64.sh
Log.out
P21.bam
TSG_len.bed
anaconda3
code
data
gencode.v19.annotation.chr22.transcript.gtf
gencode.v19.annotation.chr22.transcript.promoter.gtf
gencode.v19.annotation.chr22.transcript.promoter.nfkb.fasta
gencode.v19.annotation.chr22.transcript.promoter.nfkb.gtf
myENVs.txt
notebooks
projects
scratch
test
tf.nfkb.bed
???p
[ucsd-train08@tscc-login2 ~]$
ecwheele
@ecwheele
Feb 11 2016 07:47
sorry... I'm out of ideas :(
I could suggest google?
ericsan119
@ericsan119
Feb 11 2016 07:48
Thanks for your help @ecwheele
Olga Botvinnik
@olgabot
Feb 11 2016 16:54
hey @ericsan119
can you show the shell environment variable $PYTHONPATH ?
I suspect there’s different pythons in conflict
it may be that you need to empty the PYTHONPATH variable before running jupyter notebook
you’d do that with: export PYTHONPATH=
(i.e. assign pythonpath to nothing)
to show the output, do echo $PYTHONPATH (with the dollar sign)
note that the export aka “assign” command didn’t have the dollar sign
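The same thing can be inspected from inside Python, which also shows which interpreter the notebook is really using. A small sketch, where `pythonpath_entries` is a hypothetical helper:

```python
import os
import sys


def pythonpath_entries(environ=os.environ):
    """Return the non-empty entries of PYTHONPATH from the given environment mapping."""
    value = environ.get("PYTHONPATH", "")
    return [entry for entry in value.split(os.pathsep) if entry]


print("interpreter:", sys.executable)
print("PYTHONPATH entries:", pythonpath_entries())
```

An empty list means PYTHONPATH is unset or empty. Any entries pointing at a different Python installation than `sys.executable` are worth a second look.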
Alannah Miranda
@ahmirand
Feb 11 2016 17:19
scp ucsd-train17@tscc-login1.sdsc.edu:/projects/ps-yeolab/biom262_2016/data/class/P21.TSG.bam Desktop
Olga Botvinnik
@olgabot
Feb 11 2016 17:21
You may need a tilde slash before Desktop
~/Desktop
ericsan119
@ericsan119
Feb 11 2016 17:39
Hi @olgabot I did what you suggested just now... I still get the error msg.
Screen Shot 2016-02-11 at 9.37.33 AM.png
Screen Shot 2016-02-11 at 9.40.12 AM.png
Conall Sauvey
@csauvey
Feb 11 2016 18:09
@olgabot when you had us make a "tscc" alias for sshing to the cluster, where was that? It's not in the .bashrc and I want to change it to always log in to login1
Alannah Miranda
@ahmirand
Feb 11 2016 18:18
nano ~/.bash_profile
but do it before you log in to tscc, so you can just press Command+T to open a new tab and do it there
Olga Botvinnik
@olgabot
Feb 11 2016 18:26
@ericsan119 can you uninstall and reinstall pandas?
I really don't know what's happening
ericsan119
@ericsan119
Feb 11 2016 19:16
@olgabot What is the command for uninstalling and installing pandas on terminal ?
is it "pip uninstall pandas"?
ericsan119
@ericsan119
Feb 11 2016 19:23
I did uninstall and install pandas. It still isn't fixed...
ericsan119
@ericsan119
Feb 11 2016 19:52
@olgabot I have fixed my problem. I just unchecked "Set locale environment variables on startup"
Screen Shot 2016-02-11 at 11.52.04 AM.png
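That fix fits a common cause of `ValueError: unknown locale: UTF-8`: the local terminal exports a bare `LC_CTYPE=UTF-8` (an encoding with no language part), and Python cannot parse it. Whether that was exactly what happened here is an assumption. The sketch below is a rough heuristic for spotting such values; `suspicious_locale_vars` is a hypothetical helper, not a standard function.

```python
import os

LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG")


def suspicious_locale_vars(environ=os.environ):
    """Return locale variables whose value lacks a language_TERRITORY part, e.g. 'UTF-8'."""
    bad = {}
    for name in LOCALE_VARS:
        value = environ.get(name)
        if not value:
            continue
        language = value.split(".")[0]  # "en_US.UTF-8" -> "en_US"
        if language in ("C", "POSIX") or "_" in language:
            continue
        bad[name] = value
    return bad


print(suspicious_locale_vars())
```

Values such as `en_US.UTF-8` pass the check, while a bare `UTF-8` is reported.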
Olga Botvinnik
@olgabot
Feb 11 2016 20:03
@ericsan119 do conda uninstall pandas and conda install pandas instead of pip
because conda comes with all the C dependencies linked and pip doesn’t do that - pip is pure python
Olga Botvinnik
@olgabot
Feb 11 2016 20:16
(video chat)
jk this one