These are chat archives for gindeleo/climate

18th
Mar 2016
Gregor
@echna
Mar 18 2016 01:11
Nice job! I guess we'd need another dimension, like rainfall or humidity, to get more two-dimensional clustering. Also, I wonder if people actually use hierarchical clustering, since it's supposedly O(n^3) :( I had to repickle again, so I put those lines back in for running once at the start of a session.
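Roughly this pattern, i.e. run the slow parse once per session and cache it (file names here are just placeholders):
```python
import os
import pickle

import pandas as pd

CACHE = 'climate_data.pkl'  # placeholder cache file name

if os.path.exists(CACHE):
    # fast path: reuse the DataFrame pickled in an earlier session
    with open(CACHE, 'rb') as f:
        df = pickle.load(f)
else:
    # slow path: parse the raw CSV once, then cache it for next time
    df = pd.read_csv('GlobalLandTemperaturesByCity.csv')  # assumed input file
    with open(CACHE, 'wb') as f:
        pickle.dump(df, f)
```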
Oliver Gindele
@gindeleo
Mar 18 2016 08:57
We can also load in the data for all the cities. I think k-means clustering is more efficient; we could try that one.
Well, the data is already 2-D (the cross-correlation matrix). But of course it would make more sense to correlate it with other effects, rather than just which city behaves similarly to which. Nevertheless, I was actually surprised that the clusters are geographically so distinct.
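Something like this should do it with scikit-learn (random data stands in for our actual matrix here):
```python
import numpy as np
from sklearn.cluster import KMeans

# stand-in for the real cities-by-cities cross-correlation matrix
rng = np.random.RandomState(0)
corr = np.corrcoef(rng.randn(20, 100))  # 20 "cities", 100 time steps each

# each row is one city's correlation profile; k-means groups similar profiles
labels = KMeans(n_clusters=3, random_state=0).fit_predict(corr)
print(labels)
```
k-means is roughly O(n*k) per iteration, so it should scale much better than the O(n^3) hierarchical approach.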
Btw, you've got a letter from UCL and a climbing magazine
Gregor
@echna
Mar 18 2016 09:04
Oh, got to change those addresses.
I don't think there is much point in loading in the rest of the cities. It's only going to take more time to run. But it would be cool to see what happens once.
Oliver Gindele
@gindeleo
Mar 18 2016 09:15
true.
Kaggle has a bunch of world development indicators for countries
Gregor
@echna
Mar 18 2016 09:19
I was also wondering if there are better ways of doing these calculations, using vectorized numpy operations instead of for loops. (That's where the real speed-up with numpy is supposed to happen.) Maybe this can be done by using 'city' as an index?
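For the correlation part, something like this could replace the loops entirely (column names are guesses for our dataset):
```python
import pandas as pd

df = pd.read_csv('GlobalLandTemperaturesByCity.csv')  # assumed input file

# pivot so each column is one city's temperature time series
wide = df.pivot_table(index='dt', columns='City', values='AverageTemperature')

# one vectorized call replaces the nested Python loops over city pairs
corr = wide.corr()
```
corr() also handles missing months pairwise, which the manual loops would otherwise have to deal with.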
Yeah, I think we should link this data with some other data. There is only so much we can do with the temperature data itself.
Oliver Gindele
@gindeleo
Mar 18 2016 09:22
I guess the only thing that can be sped up a lot is the correlation calculation
Gregor
@echna
Mar 18 2016 09:22
Yup.
ok, first some job applications and corrections :(
Maybe we can use HDF5 instead of pickle. It might work across machines, and it's designed to be fast: supposedly 10 times faster than read_csv.
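pandas has this built in (file and key names made up; needs the 'tables' package installed):
```python
import pandas as pd

# one-off: convert the parsed data to HDF5
df = pd.read_csv('GlobalLandTemperaturesByCity.csv')  # assumed input file
df.to_hdf('climate.h5', key='temperatures', mode='w')

# every later session: loading from HDF5 skips the CSV parsing entirely
df = pd.read_hdf('climate.h5', 'temperatures')
```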