Adrian Hecker
@starcalibre
I'm on campus now and I have them as stitched and montaged JPEGs on a USB. They're about 2GB in total.
Otherwise the TIFs are at home
Juan Nunez-Iglesias
@jni
whoo! That's USB stick friendly! =D
and Dropbox. Well, I'm not coming in today, but are you around tomorrow? At any rate I'll send you and @Don86 an email and maybe you can coordinate the handoff amongst yourselves. =)
Adrian Hecker
@starcalibre
Yeah I'll be in tomorrow! Free from about 1
I think Don and I are in a lecture together tomorrow but yeah shoot an email and we'll sort something out :)
Juan Nunez-Iglesias
@jni
@starcalibre all 9 tSNE plots look worse than your original one? Why is that?
As I've mentioned before to @Don86 (just yesterday, actually), we don't generally want to maximise between-label distances, because some labels can have identical or very similar phenotypes. I would say the goal is to minimise within-label distances while making sure the average distance is not 0 (i.e. regularising)
having said that the overall approach is awesome. =D
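A minimal sketch of the within/between distance idea Juan describes, assuming a hypothetical `embedding` array of shape (n_samples, 2) and a matching `labels` vector (both names are placeholders, not from the notebook):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def within_between_ratio(embedding, labels):
    """Mean within-label distance over mean between-label distance.

    Lower is better, as long as the embedding hasn't collapsed
    into a single blob (mean distance near zero).
    """
    dists = squareform(pdist(embedding))          # full pairwise distance matrix
    labels = np.asarray(labels)
    same = labels[:, np.newaxis] == labels[np.newaxis, :]
    off_diag = ~np.eye(len(labels), dtype=bool)   # exclude self-distances
    within = dists[same & off_diag].mean()
    between = dists[~same].mean()
    return within / between
```

Because most image pairs are between-label, the denominator is close to the overall mean distance, which is Juan's later point about within/between approximating within/average.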
Adrian Hecker
@starcalibre
It's either randomness or the threshold adjacency statistics I was using before are helpful after all. I'll see what happens when I include those.
Juan Nunez-Iglesias
@jni
Having said that, given that the vast majority of image pairs are between-sample, the ratio within / between should approximate within / average quite well
isn't initial perplexity another perhaps-important parameter?
Finally, I would also use a log scale to test learning rates. You could be off by a huge factor in terms of optimality and never know it. =)
Should I let you fiddle with the notebook before merging? =)
Adrian Hecker
@starcalibre
Yeah, I was having that problem before -- sometimes TSNE completely fails (usually with small learning rates and small datasets, I noticed) and terminates in the early stages while the data is still one tight blob around the origin. I think minimising the within-distance and rejecting embeddings with a small mean distance would be good.
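A two-line version of the rejection test Adrian proposes; the cutoff value is an arbitrary placeholder, not from the notebook:

```python
from scipy.spatial.distance import pdist

def embedding_collapsed(embedding, min_mean_dist=1e-3):
    # A t-SNE run that stalled leaves all points in one tight blob,
    # so a tiny mean pairwise distance flags a failed embedding.
    return pdist(embedding).mean() < min_mean_dist
```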
Juan Nunez-Iglesias
@jni
yeah but what I'm saying is that "embeddings with small mean distance" is kind of the same thing as "embeddings with large within/between ratio", which is what you're already doing. So it's probably not the problem with your analysis
Adrian Hecker
@starcalibre
The docs suggest TSNE is insensitive to changes in the perplexity parameter, but also suggest choosing a larger value for larger datasets, so... not sure. :P
Juan Nunez-Iglesias
@jni
so I think the most important thing for you to fiddle with is to use a 2D grid (include perplexity in your optimisation search, despite the docs ;)), and use a log scale for your search
you can later use a linear scale around the best log parameters
The brilliant thing about this whole approach is that once you've nailed it down, we can automate it in microscopium and only present users with an optimised tSNE from the beginning -- one we know will maximise visual classification accuracy!
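A sketch of the 2D log-scale grid search being suggested, reusing the `within_between_ratio` helper from the earlier sketch; `features` and `labels` are hypothetical stand-ins for the notebook's data, and the ranges are illustrative only:

```python
import itertools
import numpy as np
from sklearn.manifold import TSNE

perplexities = np.logspace(0, 2, 5)    # 1 .. 100
learning_rates = np.logspace(1, 3, 5)  # 10 .. 1000

best = (np.inf, None, None)
for perp, rate in itertools.product(perplexities, learning_rates):
    emb = TSNE(perplexity=perp, learning_rate=rate,
               random_state=0).fit_transform(features)
    score = within_between_ratio(emb, labels)  # lower is better
    if score < best[0]:
        best = (score, perp, rate)
print("best within/between = %.3f at perplexity = %.1f, lr = %.1f" % best)
```

A second, linearly spaced grid around the winning pair then refines the result, per the next message.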
Adrian Hecker
@starcalibre
So you mean, rather than use linearly spaced values in the grid search, use logarithmically spaced values? i.e. np.logspace(2, 10, 9)?
Juan Nunez-Iglesias
@jni
yes although I would think something more like -5, 5 would be a better range to choose?
and possibly using base=2... Compute time is cheap, right? ;) How long does this all take?
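For concreteness, the spacings under discussion (the values shown are simply what NumPy would generate):

```python
import numpy as np

np.logspace(2, 10, 9)           # Adrian's example: 1e2 .. 1e10
np.logspace(-5, 5, 11)          # Juan's range in base 10: 1e-5 .. 1e5
np.logspace(-5, 5, 11, base=2)  # base 2: 1/32 .. 32, a much finer sweep
```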
Adrian Hecker
@starcalibre
The suggested learning rate is in the range [10, 1000]; I used np.logspace(2, 3, 9) for the learning parameter
The grid search over 9 values? About 3-4 minutes
Juan Nunez-Iglesias
@jni
"suggested learning rate" doesn't fill me with confidence, especially since we know about the deficiencies of the scikit-learn implementation compared to the original implementation.
Back when you compared the two methods, I thought you got a better result by dramatically reducing the learning rate?
I can't remember details, can you?
I found the message using the search bar, but you didn't give details: you got it by fiddling with the learning rate, but I don't know in which direction?
Adrian Hecker
@starcalibre
I don't recall either and the notebook and script I used aren't on my laptop, but I can test those logarithmic values now
logarithmic-plot.png
LOL
Juan Nunez-Iglesias
@jni
hahaha that's awesome. Well, time to expand in the opposite direction, too. =)
two btws: (1) I think it's worth spending time optimising this optimisation approach, e.g. does doing multiple runs and picking the one with the best KL divergence help? and (2) I have my IPython terminal log all sessions to file; it's worth setting that up. =) See github.com/jni/ipython-config for more. =)
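One way to get the session logging Juan mentions is IPython's `%logstart` magic (the filename here is a placeholder; his linked config presumably automates this at startup):

```python
# Inside an IPython session: log all input (and, with -o, output) to a file.
%logstart -o tsne_session.py append
```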
Adrian Hecker
@starcalibre
Annoyingly, TSNE doesn't expose the K-L divergence after it's fit to the data :(
Juan Nunez-Iglesias
@jni
WHAAAAAA
are you positive about this???
amazing
just had to go check
I would suggest that you download the source code and make the required changes and submit them as a PR
because that's just bullshit. LOL
Adrian Hecker
@starcalibre
Trawling through the closed PRs, it turns out somebody beat me to it! It's exposed in the dev version -- http://scikit-learn.org/dev/modules/generated/sklearn.manifold.TSNE.html#sklearn.manifold.TSNE
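Given the dev-version attribute from Adrian's link, a sketch of Juan's earlier multiple-runs idea, keeping the run with the lowest KL divergence (`best_of_n_tsne` is a hypothetical helper name):

```python
import numpy as np
from sklearn.manifold import TSNE

def best_of_n_tsne(features, n_runs=5, **tsne_params):
    # Requires a scikit-learn version that exposes `kl_divergence_`
    # after fitting (the dev version linked above).
    best_kl, best_emb = np.inf, None
    for seed in range(n_runs):
        model = TSNE(random_state=seed, **tsne_params)
        emb = model.fit_transform(features)
        if model.kl_divergence_ < best_kl:
            best_kl, best_emb = model.kl_divergence_, emb
    return best_emb, best_kl
```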
Juan Nunez-Iglesias
@jni
bahahahaah amazing! LOL
well it's probably not too bad to install the dev version! =)
Adrian Hecker
@starcalibre
yup, doing that right now :D
Adrian Hecker
@starcalibre
When you magnify the plots from the second BBBC notebook, the "niceness" of the embedding is much more obvious!
tsne_plot.png
Adrian Hecker
@starcalibre
tsne_plot_all_points.png
Same plot, but with the samples that have no annotated mechanism of action added as transparent points
Juan Nunez-Iglesias
@jni
Looks awesome. Like clarity coming in from the fog... =D