These are chat archives for biom262/biom262-2016

12th
Feb 2016
bmlewis-UCSD
@bmlewis-UCSD
Feb 12 2016 02:58
@olgabot where is S13_kallisto/abundance.tsv
when did 13 show up ? all I have is s10
Olga Botvinnik
@olgabot
Feb 12 2016 03:07
You have to run Kallisto and featurecounts on both samples
That means you still need to run Kallisto on s13
bmlewis-UCSD
@bmlewis-UCSD
Feb 12 2016 03:13
where is s13?
I have no idea
Olga Botvinnik
@olgabot
Feb 12 2016 03:26
In the same place as s10
The same folder
bmlewis-UCSD
@bmlewis-UCSD
Feb 12 2016 03:27
crying face
i am looking for that now
location should be in first s10.align.sh file?
Olga Botvinnik
@olgabot
Feb 12 2016 03:32
Yeah
Or the Kallisto sh file
I showed in class that the folder contains a bunch of files
bmlewis-UCSD
@bmlewis-UCSD
Feb 12 2016 03:33
k I just changed all the s10 to s13 s in the s10.align.sh and called it s13.align.sh and ran it
i just cd there
Olga Botvinnik
@olgabot
Feb 12 2016 03:35
That works! That's what I did
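A minimal Python sketch of the rename step above, assuming the script refers to the sample both as `s10` (job and log names) and `S10` (FASTQ file names). The excerpt below is a made-up stand-in, not the actual course script, and `retarget_sample` is a hypothetical helper.

```python
def retarget_sample(script_text, old="s10", new="s13"):
    """Swap one sample name for another in both lower and upper case."""
    swapped = script_text.replace(old.lower(), new.lower())
    return swapped.replace(old.upper(), new.upper())


# Hypothetical excerpt standing in for the real script.
template = (
    "#PBS -N s10_align\n"
    "#PBS -o s10_align.sh.out\n"
    "# reads: S10_R1.fastq.gz S10_R2.fastq.gz\n"
)

s13_script = retarget_sample(template)
print(s13_script)
```

Replacing both cases matters because a leftover `S10` in an output path would silently overwrite the first sample's results.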
bmlewis-UCSD
@bmlewis-UCSD
Feb 12 2016 03:35
sigh
Olga Botvinnik
@olgabot
Feb 12 2016 03:35
You can do it!
bmlewis-UCSD
@bmlewis-UCSD
Feb 12 2016 03:39
thanks for the help, I am gonna take a mental health break , and try some more later
LElmen
@LElmen
Feb 12 2016 05:29
@olgabot Hi Olga, for importing numpy & co, I opened python (otherwise it complains about syntax), it seems it imported correctly, but then for the next step "s10_kallisto = pd.read_table('S10_kallisto/abundance.tsv', index_col='target_id')" it keeps complaining about syntax, whether I'm in bash or python. I've tried changing the ' for " etc. I also tried "load pd". Any suggestions?
[ucsd-train05@tscc-login2 processed_data]$ s10_kallisto = pd.read_table('S10_kallisto/abundance.tsv', index_col="target_id"')
-bash: syntax error near unexpected token `('
Olga Botvinnik
@olgabot
Feb 12 2016 05:45
Oh that should be run IN the jupyter notebook
It's python code so bash won't like it
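A small sketch of what that line does once it runs in a Python session (a notebook cell, for instance) rather than at the bash prompt. The table here is an in-memory stand-in with made-up values, laid out with the column names kallisto writes to abundance.tsv, so the example runs without the real file.

```python
import io

import pandas as pd

# Made-up rows in the layout of a kallisto abundance.tsv (values are illustrative only).
fake_abundance = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t1000\t850.0\t12.0\t3.5\n"
    "tx2\t2000\t1850.0\t0.0\t0.0\n"
)

# Same call as in the homework, pointed at the stand-in instead of the real path.
s10_kallisto = pd.read_table(fake_abundance, index_col="target_id")
print(s10_kallisto.shape)
```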
LElmen
@LElmen
Feb 12 2016 05:50
Aha... thank you
LElmen
@LElmen
Feb 12 2016 06:05

I don't understand the # "_" is the previous output

assert _.index.shape == (17,)

What am i supposed to do here?
Olga Botvinnik
@olgabot
Feb 12 2016 06:40
That's the solution checking code
The _ is the previous output
Olga Botvinnik
@olgabot
Feb 12 2016 15:56
You don't have to do anything with it
Youtong Huang
@smugunicorn
Feb 12 2016 17:17
I accidentally deleted everything in my processed_data folder... but when I tried to resubmit things in my processing_scripts (s10_kallisto.sh, s10_sort.sh, etc), my err files keep saying "Could not open a connection to your authentication agent." I suppose this may have something to do with ssh-add we have done in the beginning but I haven't figured out what to do with this error... Would you give me a clue on what to try?
Olga Botvinnik
@olgabot
Feb 12 2016 17:20
I've seen that error on tscc sometimes, I think it's their fault not yours
Which queue are you submitting to?
Youtong Huang
@smugunicorn
Feb 12 2016 17:24
hotel
Olga Botvinnik
@olgabot
Feb 12 2016 17:25
do you get the output files you expect?
like after s10_align.sh finishes, you get an S10.Aligned.out.sam file ?
because if you’re getting the output you expect even with this stuff in your .err files then it’s fine
Youtong Huang
@smugunicorn
Feb 12 2016 17:27
I got a good Align.out file I think.
Olga Botvinnik
@olgabot
Feb 12 2016 17:27
sometimes TSCC gives you a weird error in the file but your output is fine
okay is it ~30 GB ?
ls -lh to see the file size
Youtong Huang
@smugunicorn
Feb 12 2016 17:27
it's exactly 30GB.
Olga Botvinnik
@olgabot
Feb 12 2016 17:27
okay then you’re good!
Youtong Huang
@smugunicorn
Feb 12 2016 17:27
yeah but my kallisto...
Olga Botvinnik
@olgabot
Feb 12 2016 17:28
do you get the folder?
can you show the output of ls -lh ~/projects/shalek2013/processed_data ?
Youtong Huang
@smugunicorn
Feb 12 2016 17:28
i get the folder, but there's nothing in it
Olga Botvinnik
@olgabot
Feb 12 2016 17:28
ok what’s in the s10_kallisto.sh.err file?
Youtong Huang
@smugunicorn
Feb 12 2016 17:28
hold on... I actually just deleted the folder haha.
Olga Botvinnik
@olgabot
Feb 12 2016 17:29
:fire:
Youtong Huang
@smugunicorn
Feb 12 2016 17:30
Screen Shot 2016-02-12 at 9.34.44 AM.png
Olga Botvinnik
@olgabot
Feb 12 2016 17:30
okay see where it says “file not found"
Youtong Huang
@smugunicorn
Feb 12 2016 17:30
yeah
Olga Botvinnik
@olgabot
Feb 12 2016 17:31
that’s because in the script it says [S10 R1 fastq file] and [S10 R2 fastq file] which indicates that you need to replace that placeholder with the real file
the file names are in the s10_align.sh file
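One way to catch leftover placeholders before submitting, as a hedged Python sketch. It assumes placeholders are written in square brackets and end with the word "file", as in the message above; `find_placeholders` and the template line are hypothetical.

```python
import re

# Matches bracketed instructions such as "[S10 R1 fastq file]".
PLACEHOLDER = re.compile(r"\[[^\]\n]*\bfile\]")


def find_placeholders(script_text):
    """Return every bracketed placeholder still left in a script."""
    return PLACEHOLDER.findall(script_text)


# Hypothetical template line with both placeholders still in place.
template = "kallisto quant --output-dir out [S10 R1 fastq file] [S10 R2 fastq file]\n"
print(find_placeholders(template))
```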
Youtong Huang
@smugunicorn
Feb 12 2016 17:31
um... oh... wait LOL i got confused and just copy pasted your codes
Olga Botvinnik
@olgabot
Feb 12 2016 17:31
side note - I’m surprised the username @smugunicorn has not been taken until now!
yeah you gotta read them!
Youtong Huang
@smugunicorn
Feb 12 2016 17:32
hahah xD
Olga Botvinnik
@olgabot
Feb 12 2016 17:34
remember: “copy-paste-think"
Youtong Huang
@smugunicorn
Feb 12 2016 17:43
T^T yessss it worked... thanks so much @olgabot
Olga Botvinnik
@olgabot
Feb 12 2016 17:45
yayy!!!!!!
“copy-paste-think” is a lot of programming/bioinformatics .. someone out there has a solution to something that’s slightly different so you have to think about what you need to change for yours
is the video call for today
so you can type git co[TAB] and it’ll autocomplete git commit
mbaughn
@mbaughn
Feb 12 2016 19:14
Thank you!
LElmen
@LElmen
Feb 12 2016 20:24
I can't find the instructions for how to kill a specific job on tscc. I forgot to rename the jobs and some other details so I had to resubmit, but now I have several jobs running. I thought Ctrl-C killed all jobs, but they are still running?
ecwheele
@ecwheele
Feb 12 2016 20:25
qdel jonid#
jobid#
LElmen
@LElmen
Feb 12 2016 20:25
Thank you!
LElmen
@LElmen
Feb 12 2016 20:37
ls
Olga Botvinnik
@olgabot
Feb 12 2016 20:50
?
I can tell you what files I currently have
Alannah Miranda
@ahmirand
Feb 12 2016 20:58
my s10_featurecounts.txt file has an extra column that is headed by the code i used to run the featurecounts job... and every time i get rid of said column the next few cells don't work/have an assertion error and when i leave the column in, i think the cells are still returning wrong answers. I've tried rerunning the job and compared it with the code that is in the notebook we worked on in class, but i can't figure out whats wrong. halp pls :disappointed_relieved:
Olga Botvinnik
@olgabot
Feb 12 2016 20:59
can you show the code?
we can fix this
Alannah Miranda
@ahmirand
Feb 12 2016 20:59
Screen Shot 2016-02-12 at 12.55.45 PM.png
here's the code from the notebook, but did you want to see what I used to run the job too?
Olga Botvinnik
@olgabot
Feb 12 2016 21:02
yeah that looks correct
where is the error coming from?
can you show the code and the error message?
Alannah Miranda
@ahmirand
Feb 12 2016 21:09

well if I leave the column there, then the following cells seem to be working but the

sns.distplot(s10_featurecounts['/home/ucsd-train17/projects/shalek2013/processed_data/S10.Aligned.out.sorted.bam'])

results doesn't really look right to me, but then again I'm not sure what it should look like, but it looks like this:

Screen Shot 2016-02-12 at 12.56.02 PM.png

and when I use this to remove the last column:

s10_featurecounts = pd.read_table('s10_featureCounts.txt', skiprows=1, index_col=0, usecols= ['Chr', 'Start', 'End', 'Strand', 'Length'])

I get this error:

Olga Botvinnik
@olgabot
Feb 12 2016 21:10
why doesn’t it look right to you?
and which column are you removing?
remember that the example data is using ucsd-train01
so every time you see ucsd-train01 you need to replace that with ucsd-train17
You NEED that big column because that’s the read counts!!
Alannah Miranda
@ahmirand
Feb 12 2016 21:11
Screen Shot 2016-02-12 at 1.08.51 PM.png
Olga Botvinnik
@olgabot
Feb 12 2016 21:11
that distribution is exactly what it should look like
So you need that column - don’t ignore it
it’s the read counts per gene
the distribution is exactly what it should be
it looks big and crazy because there are some genes that had 10000s of reads mapping to them
the KeyError is like a NameError except in the dataframe
it can’t find a column named that
so when you ignored that column, you no longer have it and you can’t plot it
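A minimal pandas sketch of that KeyError, using a tiny made-up table. The column name is a hypothetical path standing in for the real BAM location, and the counts are invented.

```python
import pandas as pd

counts_column = "/path/to/S10.Aligned.out.sorted.bam"  # hypothetical column name
table = pd.DataFrame({"Length": [1200, 800], counts_column: [15, 0]})

# Keep only the annotation column, which drops the read counts.
without_counts = table[["Length"]]

try:
    without_counts[counts_column]
    lookup_failed = False
except KeyError as error:
    # pandas cannot find a column with that name any more.
    lookup_failed = True
    print("KeyError:", error)
```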
Alannah Miranda
@ahmirand
Feb 12 2016 21:14
Ohhh haha i thought it was an error since the header for that column is the bam file location
Olga Botvinnik
@olgabot
Feb 12 2016 21:14
nope - that’s exactly what the column should be called
it’s called that so you know which bam file you used to count features on
Alannah Miranda
@ahmirand
Feb 12 2016 21:16
cool, well then in that case in the first notebook I just noticed we were supposed to use the Chr11 subset file for featurecounts. should i go back and change that for this hmwk or is this one okay to use?
Olga Botvinnik
@olgabot
Feb 12 2016 21:16
you can use the full dataset
featurecounts is faster than I thought
Alannah Miranda
@ahmirand
Feb 12 2016 21:16
okay, great. thank you!!
Olga Botvinnik
@olgabot
Feb 12 2016 21:17
you are welcome!
LElmen
@LElmen
Feb 12 2016 21:50
@olgabot haha wrong window... please ignore
LElmen
@LElmen
Feb 12 2016 22:26
@olgabot Hello again, something is not right with my kallisto script. Is the #PBS -V flag supposed to be empty?
Olga Botvinnik
@olgabot
Feb 12 2016 22:28
yes it puts everything in your environment variables
can you show the output of the .err file ?
LElmen
@LElmen
Feb 12 2016 22:28

std.err says Error: Missing read files
Error: need to specify output directory

/var/spool/torque/mom_priv/jobs/4357181.tscc-mgr.local.SC: line 12: --threads: command not found

So it looks like I would have missed --output-dir, but that's in there
Olga Botvinnik
@olgabot
Feb 12 2016 22:29
can you show the script?
LElmen
@LElmen
Feb 12 2016 22:30
Screen Shot 2016-02-12 at 2.30.07 PM.png
Olga Botvinnik
@olgabot
Feb 12 2016 22:30
You need another “\” after the first line
It thinks the first line is one command and it ends
and it starts reading the next line thinking it’s a new command called --threads and there’s nothing called that so it gets confused
computers get confused too :(
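The effect of a missing backslash can be sketched in Python. This is a simplified model of how a shell joins continued lines (it ignores quoting and comments), and `idx`, `out`, and the read file names are placeholders.

```python
def logical_commands(script_text):
    """Join backslash-continued lines the way a shell does before running them."""
    commands, current = [], ""
    for line in script_text.splitlines():
        if line.endswith("\\"):
            current += line[:-1]  # drop the backslash and keep reading
        else:
            current += line
            if current.strip():
                commands.append(current.strip())
            current = ""
    if current.strip():
        commands.append(current.strip())
    return commands


# First line has no trailing backslash, so the command ends there.
broken = "kallisto quant --index idx\n  --threads 8 --output-dir out \\\n  r1.fastq.gz r2.fastq.gz\n"
fixed = "kallisto quant --index idx \\\n  --threads 8 --output-dir out \\\n  r1.fastq.gz r2.fastq.gz\n"

for name, script in (("broken", broken), ("fixed", fixed)):
    print(name, logical_commands(script))
```

In the broken version the second "command" starts with `--threads`, which matches the "command not found" message in the .err file.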
LElmen
@LElmen
Feb 12 2016 22:31
:smile:
I'll try that.
Thnx
On tscc, q=queued, r=running, c=cancelled, or is c for complete?
Olga Botvinnik
@olgabot
Feb 12 2016 22:35
c=complete
which may have been cancelled by you or it finished
LElmen
@LElmen
Feb 12 2016 22:35
ok
LElmen
@LElmen
Feb 12 2016 23:05
It makes sh.err and sh.out, but I don't get an output folder. The error message is quite long.
Screen Shot 2016-02-12 at 3.07.08 PM.png
Screen Shot 2016-02-12 at 3.07.19 PM.png
Olga Botvinnik
@olgabot
Feb 12 2016 23:08
huh and you’re ucsd-train05 ?
LElmen
@LElmen
Feb 12 2016 23:08
Yes
Olga Botvinnik
@olgabot
Feb 12 2016 23:08
that’s a really strange error
I haven’t seen that before
okay you specified 8 threads and asked for 8 processors (threads) from TSCC
according to your old script
so it’s not a mismatch between how many threads the program expects and how many you gave it
for example, it would get mad if you asked for 8 in kallisto but only gave 4 in TSCC - then you’d shortchange the program
can you show your new script?
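A hedged Python sketch of the thread check described above: compare the `ppn` value requested from the scheduler with the `--threads` value passed to the program. `thread_request` is a hypothetical helper and the script text is a shortened stand-in.

```python
import re


def thread_request(script_text):
    """Return (ppn asked of the scheduler, --threads given to the program)."""
    ppn = re.search(r"^#PBS\s+-l\s+nodes=\d+:ppn=(\d+)", script_text, re.MULTILINE)
    threads = re.search(r"--threads\s+(\d+)", script_text)
    return (
        int(ppn.group(1)) if ppn else None,
        int(threads.group(1)) if threads else None,
    )


# Shortened stand-in for a submission script.
script = "#PBS -l nodes=1:ppn=8\nkallisto quant --threads 8 --output-dir out r1.fastq.gz r2.fastq.gz\n"
ppn, threads = thread_request(script)
matched = ppn is not None and threads is not None and threads <= ppn
print("ok" if matched else "mismatch")
```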
LElmen
@LElmen
Feb 12 2016 23:10
I see I have a $ that maybe shouldn't be there
Olga Botvinnik
@olgabot
Feb 12 2016 23:10
can you show it?
LElmen
@LElmen
Feb 12 2016 23:11

#!/bin/bash
#PBS -q hotel
#PBS -V
#PBS -N s13_kallisto
#PBS -e s13_kallisto.sh.err
#PBS -o s13_kallisto.sh.out
#PBS -l nodes=1:ppn=8
#PBS -l walltime=0:30:00

kallisto quant --index /projects/ps-yeolab/biom262-2016/genomes/mm10/gencode/m8/gencode.vM8.pc_transcripts.kallisto$
--threads 8 --output-dir $HOME/projects/shalek2013/processed_data/S10_kallisto \
$HOME/projects/shalek2013/raw_data/S13_R1.fastq.gz \
$HOME/projects/shalek2013/raw_data/S13_R2.fastq.gz
Olga Botvinnik
@olgabot
Feb 12 2016 23:11
yeah that dollar sign shouldn’t be there
that’s where the “\” should go
it’s hard for me to read this
can you format it with three backticks “```” on an otherwise empty line at the beginning and end of the message?
LElmen
@LElmen
Feb 12 2016 23:13
Screen Shot 2016-02-12 at 3.13.06 PM.png
Olga Botvinnik
@olgabot
Feb 12 2016 23:14
oh hmm that may be a formatting thing
Can you add a backslash after --index and press ENTER ?
the index file is very long and it’s hard to tell what’s going on
(make sure there’s no spaces or tabs after --index)
oops I mean after the “\” (backslash)
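Whitespace after a backslash is hard to see in an editor, so here is a small Python sketch that flags it. In bash a backslash followed by a space escapes the space rather than the newline, so the command ends early. `lines_with_trailing_whitespace` and the script text are hypothetical.

```python
import re

# A backslash followed by spaces or tabs before the end of the line.
TRAILING = re.compile(r"\\[ \t]+$")


def lines_with_trailing_whitespace(script_text):
    """Return 1-based numbers of lines where whitespace follows the final backslash."""
    return [
        number
        for number, line in enumerate(script_text.splitlines(), start=1)
        if TRAILING.search(line)
    ]


# Line 1 has a stray space after its backslash; line 2 is fine.
script = "kallisto quant --index idx \\ \n  --threads 8 \\\n  r1.fastq.gz r2.fastq.gz\n"
print(lines_with_trailing_whitespace(script))
```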
LElmen
@LElmen
Feb 12 2016 23:15
Screen Shot 2016-02-12 at 3.15.00 PM.png
The $ disappeared when I did that
Olga Botvinnik
@olgabot
Feb 12 2016 23:15
okay yeah nano was using the $ to show the line still continues
that’s what I suspected
LElmen
@LElmen
Feb 12 2016 23:15
aha
Olga Botvinnik
@olgabot
Feb 12 2016 23:16
hmm I say double-check for spaces after the backslashes and run again
it might have been a heisenbug
LElmen
@LElmen
Feb 12 2016 23:16
heisenbug?
Olga Botvinnik
@olgabot
Feb 12 2016 23:16
(a bug that you see once and never see again)
LElmen
@LElmen
Feb 12 2016 23:16
oh weird
Olga Botvinnik
@olgabot
Feb 12 2016 23:16
or you can observe but can’t pinpoint the reason
LElmen
@LElmen
Feb 12 2016 23:17
I'll try to run it again
Olga Botvinnik
@olgabot
Feb 12 2016 23:17
ok great
LElmen
@LElmen
Feb 12 2016 23:40
This isn't working. For s13 I got the long error message again, then I tried to re-run s10 naming the job s10again and it didn't create any output either, but the error was different.
Screen Shot 2016-02-12 at 3.39.50 PM.png
Olga Botvinnik
@olgabot
Feb 12 2016 23:40
yay different error!
my favorite!
hmm does the folder ~/projects/shalek2013/processed_data exist?
did you make the soft link from scratch?
LElmen
@LElmen
Feb 12 2016 23:42
yes, ls
Olga Botvinnik
@olgabot
Feb 12 2016 23:42
can you show ls -lha ~/projects/shalek2013/
LElmen
@LElmen
Feb 12 2016 23:43
Screen Shot 2016-02-12 at 3.42.52 PM.png
Olga Botvinnik
@olgabot
Feb 12 2016 23:43
okay perfect
now can you show ls -lha ~/projects/shalek2013/processed_data
there’s something weird going on with permissions
LElmen
@LElmen
Feb 12 2016 23:44
Screen Shot 2016-02-12 at 3.43.57 PM.png
Olga Botvinnik
@olgabot
Feb 12 2016 23:45
oh right you have to cd first
cd ~/projects/shalek2013/processed_data
ls -lha
soft links are funny like that
LElmen
@LElmen
Feb 12 2016 23:46
Screen Shot 2016-02-12 at 3.46.21 PM.png
Olga Botvinnik
@olgabot
Feb 12 2016 23:47
wait someone else has been there
bmreilly
LElmen
@LElmen
Feb 12 2016 23:47
I have an S10_kallisto folder from when we did this in class, but I'm not sure it is right. I can't do the S10_kallisto.shape
Olga Botvinnik
@olgabot
Feb 12 2016 23:48
and they’ve made it so you can’t edit anything
LElmen
@LElmen
Feb 12 2016 23:48
?
Olga Botvinnik
@olgabot
Feb 12 2016 23:48
check out the permissions: -rw-r--r--
the first - means not a folder
then the first trio of rw- is for the user, who is bmreilly and can read and write (the third slot would be x for execute, and it's off)
then the next trio is “group” (tscc-group in this case) and the last trio is “other” (i.e. everyone else)
someone has messed with your folder
we need to get them to fix it
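The permission string can be taken apart with a few lines of Python. This sketch uses the standard library's `stat.filemode` to render a mode the way `ls -l` would; `describe` is a hypothetical helper, and mode 644 is chosen to reproduce the string above.

```python
import stat


def describe(permission_string):
    """Split an ls-style string such as '-rw-r--r--' into its four parts."""
    kind = "directory" if permission_string[0] == "d" else "not a directory"
    user, group, other = (permission_string[i:i + 3] for i in (1, 4, 7))
    return {"type": kind, "user": user, "group": group, "other": other}


# 0o100644 is a regular file with mode 644; stat.filemode renders it as ls would.
rendered = stat.filemode(0o100644)
print(rendered, describe(rendered))
```

Only the user trio contains `w`, so only the file's owner can change it. That is why files owned by another account cannot be edited or overwritten by jobs run from your own account.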
LElmen
@LElmen
Feb 12 2016 23:50
Does bmreilly have red hair by any chance? Because the person that helped me remove the softlink I created that looped back to the same place, deleted the whole thing - since it linked to itself
Olga Botvinnik
@olgabot
Feb 12 2016 23:50
I don’t know
it’s brian reilley
I’m emailing them now
ecwheele
@ecwheele
Feb 12 2016 23:50
yes he does
Olga Botvinnik
@olgabot
Feb 12 2016 23:50
if they used their account then it messes up yours
LElmen
@LElmen
Feb 12 2016 23:50
He then copied his stuff so that I could get the harismendy files and follow in class
Olga Botvinnik
@olgabot
Feb 12 2016 23:50
they need to use YOUR account to do things in your home
yeah that’s what happened
LElmen
@LElmen
Feb 12 2016 23:55
This is a total laugh or cry moment (mostly laugh though)
Olga Botvinnik
@olgabot
Feb 12 2016 23:55
yeah
I just emailed
hopefully that’ll get fixed soon :(
sit tight until then
:pray:
LElmen
@LElmen
Feb 12 2016 23:55
It all started with that I made that soft link that linked to itself haha
Olga Botvinnik
@olgabot
Feb 12 2016 23:56
hah :)
it’s all part of the learning process
ecwheele
@ecwheele
Feb 12 2016 23:56
If you can't laugh at all your ridiculous fails in science you will never make it out of grad school! :)
That's why PhD comics exist!
Olga Botvinnik
@olgabot
Feb 12 2016 23:56
It’s true!
LElmen
@LElmen
Feb 12 2016 23:57
As a matter of fact in this specific class, I learn much more by things NOT working, because otherwise I wouldn't think so much about what all flags etc mean.
Olga Botvinnik
@olgabot
Feb 12 2016 23:58
:satisfied:
yep!
that’s the beauty of all this mess