These are chat archives for yandexdataschool/mlhep2016

22nd
Jun 2016
Nikita Kazeev
@kazeevn
Jun 22 2016 05:49
Basic seminars.
  • Practical advice on how to win at Kaggle (starts at 13:30 sharp - if you're late, you miss it)
  • Dimensionality reduction
  • Ensemble algorithms
Kevin Heinicke
@bixel
Jun 22 2016 08:29
where can I get the code for these interactive Tree visualizations? 😍
Mauro Verzetti
@mverzett
Jun 22 2016 08:30
Generally speaking, these interactive data visualizations look very cool!
ecorrigan
@ecorrigan
Jun 22 2016 09:30
agreed, they are great
Nikita Kazeev
@kazeevn
Jun 22 2016 09:38

If you are planning to attend the singing and dancing around the maypole
on 24th, please sign up here:
https://docs.google.com/forms/d/1QkLVm7nDnBHEtiCS9Uz-wn-Jk_LzLTermcLdgqfewLY/viewform

Caterina will meet you in the hostel at 10:00.

The deadline is 18:00 today - if you miss it you can still go, but we
make no promises about ticket availability.

Igor Myagkov
@mjagkow
Jun 22 2016 09:41
Are we obliged to sing and dance if we attend? :)
Nikita Kazeev
@kazeevn
Jun 22 2016 09:41
Of course, how else?
Igor Myagkov
@mjagkow
Jun 22 2016 09:42
Well, I'd rather watch; I'm neither a confident dancer nor a singer :|
Sebastian Liem
@sliem
Jun 22 2016 09:43
I promise you that you will find neither good dancers nor good singers around a maypole.
Sebastian Liem
@sliem
Jun 22 2016 09:54
'help I'm frozen in carbonite'
Igor Myagkov
@mjagkow
Jun 22 2016 10:18
Are the slides for all the lectures (by Yandex) by any chance available in Russian?
Andrey Ustyuzhanin
@anaderi
Jun 22 2016 10:21
@mjagkow , no, why?
Igor Myagkov
@mjagkow
Jun 22 2016 10:26
I hoped that these lectures had also been presented at some Yandex school in Moscow and hence might have a Russian version of the slides.
Never mind, it's just a matter of taste.
Andrey Ustyuzhanin
@anaderi
Jun 22 2016 10:28
@mjagkow , no, it hasn't been presented anywhere else.. yet )
Igor Myagkov
@mjagkow
Jun 22 2016 10:31
@anaderi Ok :)
So, are these ML topics also covered in, say, Yandex Data Analysis School?
Andrey Ustyuzhanin
@anaderi
Jun 22 2016 10:35
@mjagkow , yup, but in a slightly different way, say, with no HEP references )
Igor Myagkov
@mjagkow
Jun 22 2016 10:38
I see.
Igor Myagkov
@mjagkow
Jun 22 2016 11:46
What was the git pull command?
martin-ljunggren
@martin-ljunggren
Jun 22 2016 11:49
I would also need the command again
Dan Marley
@demarley
Jun 22 2016 11:50
I think it was this:
!git pull https://github.com/yandexdataschool/mlhep2016.git
but someone else should confirm
Igor Myagkov
@mjagkow
Jun 22 2016 11:50
Yes, it's correct
Nikita Kazeev
@kazeevn
Jun 22 2016 12:03
!pip install --upgrade scikit-learn
then restart notebook
Mauro Verzetti
@mverzett
Jun 22 2016 12:23
The "!" command on the notebook gives you access to a small unix/linux shell, right?
Alex Rogozhnikov
@arogozhnikov
Jun 22 2016 12:25
@mverzett
Yes. Note that each ! starts a new, separate shell, so state is not preserved between commands.
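To illustrate the point about separate shells, here is a minimal sketch in plain Python. It assumes a POSIX system where `cd` and `pwd` are available, and it uses `subprocess.run` only as an analogy for the notebook's `!`, not as a description of how the notebook implements it. The helper name `run_in_new_shell` is made up for this example.

```python
import os
import subprocess

def run_in_new_shell(command):
    """Run `command` in a fresh shell and return its stripped stdout."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

start_dir = os.getcwd()

# This `cd` only affects the shell that ran it, and that shell exits right away.
run_in_new_shell("cd /")
after_separate_cd = run_in_new_shell("pwd")

# Chaining keeps both commands in one shell, so `cd` is still in effect for `pwd`.
after_chained_cd = run_in_new_shell("cd / && pwd")

print(after_separate_cd)  # still the starting directory
print(after_chained_cd)   # the root directory
```

In a notebook, the same reasoning suggests putting dependent commands on one `!` line, for example `!cd datasets; wget ...`.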
Mauro Verzetti
@mverzett
Jun 22 2016 12:30
OK, got it! Thanks @arogozhnikov
JanKuechler
@JanKuechler
Jun 22 2016 12:31
Is the 'rep' module available outside everware?
Alex Rogozhnikov
@arogozhnikov
Jun 22 2016 12:31
@JanKuechler
!pip install rep --no-dependencies
JanKuechler
@JanKuechler
Jun 22 2016 12:32
great, thank you
Alex Rogozhnikov
@arogozhnikov
Jun 22 2016 12:32
(added no-deps, please note)
Adam Dendek
@adendek
Jun 22 2016 12:32
Is there a way to extract only the Python code from the notebook?
Alex Rogozhnikov
@arogozhnikov
Jun 22 2016 12:32
@adendek
File > Download as > .py
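For doing the same thing without the menu, here is a minimal sketch that pulls the code cells out of a notebook's JSON. It assumes the nbformat 4 layout (a top-level "cells" list whose entries have "cell_type" and "source"); the function name `extract_code` and the inlined notebook are made up for this example.

```python
import json

def extract_code(notebook):
    """Return the source of all code cells in a parsed notebook, joined by blank lines."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # `source` may be a single string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source)
    return "\n\n".join(chunks)

# A tiny hypothetical notebook, inlined so the example is self-contained.
example_notebook = json.loads("""
{
  "nbformat": 4,
  "cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {"cell_type": "code", "source": ["import math\\n", "x = math.sqrt(16)"]},
    {"cell_type": "code", "source": "print(x)"}
  ]
}
""")

code_only = extract_code(example_notebook)
print(code_only)
```

For a real file, load it with `json.load(open(path))` first. The command `jupyter nbconvert --to script notebook.ipynb` should give a similar result from a shell.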
Sebastian Liem
@sliem
Jun 22 2016 12:32
download as .py?
too slow
Adam Dendek
@adendek
Jun 22 2016 12:33
great! thank you!
IrinaGergart
@IrinaGergart
Jun 22 2016 14:47
Manfred Berger, Are Raklev, Siim Tolk, Valentina Mariani, Alessio Piucci, Giovanni Siragusa: please come to me after the lecture, your receipts are ready.
Petr
@PeterZhizhin
Jun 22 2016 14:49
I have a problem with LDA
need_features = data[list(set(data.columns) - {'target', 'event_id'})]
Xlda = lda.fit(data[need_features], data.target)
print data[masses].shape, data.target.shape, Xlda.shape
And here is the error
ValueError: Must pass DataFrame with boolean values only
In the "lda.fit" method
Kecksdose
@Kecksdose
Jun 22 2016 14:51
You are indexing a DataFrame with another DataFrame
Petr
@PeterZhizhin
Jun 22 2016 14:51
Hmmm
Could you explain it?
Kecksdose
@Kecksdose
Jun 22 2016 14:52
Remove "data[" and the closing "]" in the first line
Mauro Verzetti
@mverzett
Jun 22 2016 14:52

I have one as well: it seems LDA returns a one-column array when I asked for two

lda = LinearDiscriminantAnalysis(n_components=2)
lda.fit(X,Y)
Xlda = lda.transform(X)
print Xlda.shape, X.shape
#((4727, 1), (4727, 13))

Any suggestion on why?

Petr
@PeterZhizhin
Jun 22 2016 14:52
Oooops
By the way, I get the same result as in the comment above with the one-column array:
need_features = list(set(data.columns) - {'target', 'event_id'})
Xlda = lda.fit(data[need_features], data.target).transform(data[need_features])
print data[masses].shape, data.target.shape, Xlda.shape
#  (1000, 9) (1000,) (1000, 1)
smoortga
@smoortga
Jun 22 2016 15:25
@mverzett
Try something like
lda = LDA(n_components=5, solver='eigen', shrinkage='auto')
This makes it work for me
Mauro Verzetti
@mverzett
Jun 22 2016 15:27
@smoortga MAGIC!
smoortga
@smoortga
Jun 22 2016 15:27
the solver solves it all I guess, MAGIC indeed ;-)
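A possible explanation for the one-column result, not stated in the chat and assuming the target has only two classes: LDA projects onto directions that separate the class means, and the between-class scatter matrix has rank at most K - 1 for K classes. A sketch of the argument, with N_k samples in class k, class means \mu_k and overall mean \mu:

```latex
S_B = \sum_{k=1}^{K} N_k (\mu_k - \mu)(\mu_k - \mu)^{\top},
\qquad
\sum_{k=1}^{K} N_k (\mu_k - \mu) = 0
\;\Longrightarrow\;
\operatorname{rank}(S_B) \le K - 1 .
```

With K = 2 this leaves a single discriminant direction, which would match the (4727, 1) and (1000, 1) shapes above regardless of n_components. Under this argument, any extra columns a particular solver returns would carry no additional class separation.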
rashchedrin
@rashchedrin
Jun 22 2016 16:48
In the Higgs dataset, what is mem? And why is there mem_phi but no mem_eta?
Lisa Benato
@lbenato
Jun 22 2016 16:49
It's the missing transverse energy, a.k.a. MET. It has no components other than the transverse ones (only phi and pt).
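Since MET is described by only pt and phi, its Cartesian components in the transverse plane follow from basic trigonometry. A minimal sketch, assuming phi is given in radians; the function name `met_components` and its argument names are made up for this example and are not column names from the dataset.

```python
import math

def met_components(met_pt, met_phi):
    """Convert the MET magnitude and azimuthal angle (radians) to (px, py)."""
    px = met_pt * math.cos(met_phi)
    py = met_pt * math.sin(met_phi)
    return px, py

# Hypothetical values: 30 units of MET pointing along the +y axis.
px, py = met_components(30.0, math.pi / 2)
print(px, py)
```

There is no z component to compute, which is consistent with the absence of mem_eta.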
rashchedrin
@rashchedrin
Jun 22 2016 16:50
Thank you
Lisa Benato
@lbenato
Jun 22 2016 16:54
You're welcome :)
GilesStrong
@GilesStrong
Jun 22 2016 16:55
Trying to download the test data using: !cd datasets; wget -O public_test.root -nc --no-check-certificate https://2016.mlhep.yandex.net/data/higgs/public_test.root (from baseline.ipynb), but the network is unreachable and the connection times out. Any ideas?
giosiragusa
@giosiragusa
Jun 22 2016 16:55
Is somebody running too many parallel jobs?
Jupyter is very slow for me. Do others see the same?
giosiragusa
@giosiragusa
Jun 22 2016 17:12
Actually the server does not respond. Commands are started, but they don't get executed. [*] stays there for ages...
giosiragusa
@giosiragusa
Jun 22 2016 17:21
now it works again
Petr
@PeterZhizhin
Jun 22 2016 18:30
Maybe somebody is training their NN :D
It's "Adding to proxy"
Petr
@PeterZhizhin
Jun 22 2016 18:45
Anybody with the same problem?
Petr
@PeterZhizhin
Jun 22 2016 19:05

=_=
Help me
Everware just went crazy.
GilesStrong
@GilesStrong
Jun 22 2016 19:07
I've occasionally had the same problem, and solved it by refreshing.
Petr
@PeterZhizhin
Jun 22 2016 19:09
Aww. My data got destroyed.
I don't have the previous repo anymore.
Andrey Ustyuzhanin
@anaderi
Jun 22 2016 20:09
@PeterZhizhin it seems you stopped your container at 20:28 and then created a new one at 21:08. Sasha told me he had told you to save data to /data or to your own repo. That would make you future-proof )
Petr
@PeterZhizhin
Jun 22 2016 21:12
Why can't I import XGBoostClassifier?
from rep.estimators import XGBoostClassifier
ImportError: cannot import name 'XGBoostClassifier'
Petr
@PeterZhizhin
Jun 22 2016 21:34
I got it. It works only with Python 2.
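One way to keep a notebook running when this import fails is to guard it. A minimal sketch: the import line is the one from the chat, the fallback to None is an assumption made for this example, and the Python 2 restriction is only what was reported here and may not hold for other REP versions.

```python
import sys

# Guarded import: fall back to None when REP or its XGBoost wrapper is unavailable.
try:
    from rep.estimators import XGBoostClassifier
except ImportError:
    XGBoostClassifier = None

if XGBoostClassifier is None:
    print("XGBoostClassifier not importable under Python %d.%d" % sys.version_info[:2])
else:
    print("XGBoostClassifier is available")
```

Later cells can then check `XGBoostClassifier is not None` before using it.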
MicheleFG
@MicheleFG
Jun 22 2016 23:27
Is the server down?