James A. Bednar
@jbednar
Whether it's Dask or Pandas or Numpy doesn't matter; whether it's categorical data does matter. If you can map from categorical data into numeric data, then Datashader can handle it. If not, it has no idea what to do with your categorical x and y axes; it's built for continuous variables on x and y (though it does support categories on z, i.e. stacked up together).
Zooming in won't really help if the data has no structure; if it's truly categorical in x and y you just have to deal with the data in chunks somehow, because categorical data isn't meaningful without a label.
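A minimal sketch of the "map from categorical data into numeric data" step, assuming a long-form pandas table. The column names `gene`, `sample`, and `value` are made up for illustration. The integer codes are what a continuous-axis renderer could consume; the label lists have to be kept separately, because the codes mean nothing without them.

```python
import numpy as np
import pandas as pd

# Hypothetical long-form table: one row per (row label, column label, value).
df = pd.DataFrame({
    "gene":   ["g2", "g1", "g2", "g3"],
    "sample": ["s1", "s1", "s2", "s2"],
    "value":  [0.5, 1.0, 2.0, 3.0],
})

# Convert each label column to a pandas categorical and take its integer codes.
row_cat = df["gene"].astype("category")
col_cat = df["sample"].astype("category")
df["y"] = row_cat.cat.codes
df["x"] = col_cat.cat.codes

# Keep the code -> label lookup so labels can be recovered later (e.g. for hover).
row_labels = list(row_cat.cat.categories)
col_labels = list(col_cat.cat.categories)

# Dense numeric grid indexed by the codes; missing combinations stay NaN.
grid = np.full((len(row_labels), len(col_labels)), np.nan)
grid[df["y"].to_numpy(), df["x"].to_numpy()] = df["value"].to_numpy()
```

Note that the code order here is just the sorted label order, so neighbouring rows are not necessarily related; that is the "no spatial structure" problem discussed in this thread.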
hoangthienan95
@hoangthienan95
But then will I be getting the dynamic recomputation as I zoom in to inspect finer-grained patterns?
James A. Bednar
@jbednar
There aren't any finer grained patterns unless the axes are sorted in some meaningful way.
It would just be meaninglessly grouping unrelated values.
Datashader can handle when the data has spatial structure (in whatever space you are plotting things in). Without spatial structure, zooming is not meaningful.
hoangthienan95
@hoangthienan95
OK, so something like hierarchically clustering the rows and columns and rearranging them would help, because I do expect patterns from that.
James A. Bednar
@jbednar
Sure, at which point you can replace the category axis with some similarity metric value.
But then you'll want to reveal categories when you've zoomed in to individual data points, which I guess you can do with hover.
It would be great to add something to the examples where we show that whole process, i.e. starting with categorical axes, clustering the data to give spatial structure, then displaying the datashaded result with some way of indicating underlying categories (with colors plus hover, etc.) That would make a nice example, but it's a good bit of work, with lots of assumptions along the way. In no way would that get packaged up as .hvplot.heatmap(... , datashade=True). :-)
It's somewhat similar to https://examples.pyviz.org/iex_trading/IEX_stocks.html, where zooming in reveals individual stock trades, and zooming out reveals patterns; you'd need some similar approach, plus the clustering.
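The clustering step mentioned above could be sketched roughly as follows, assuming SciPy is available and the matrix fits in memory. This only covers reordering rows and columns so that similar ones become neighbours; it is not the full example described in the conversation, and the helper name `cluster_order` and the toy data are invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

def cluster_order(matrix, method="average", metric="euclidean"):
    """Return an ordering of the rows that puts similar rows next to each other."""
    return leaves_list(linkage(matrix, method=method, metric=metric))

# Toy matrix: even rows sit near 0, odd rows near 10, interleaved so that
# the two groups are not adjacent in the original ordering.
rng = np.random.default_rng(0)
data = rng.normal(0.0, 0.1, size=(6, 6))
data[1::2, :] += 10.0

row_order = cluster_order(data)      # cluster rows
col_order = cluster_order(data.T)    # cluster columns

# Reorder the matrix and carry the original labels along for later display.
reordered = data[np.ix_(row_order, col_order)]
row_labels = [f"r{i}" for i in row_order]
col_labels = [f"c{j}" for j in col_order]
```

After reordering, the axis position is a proxy for similarity rather than an arbitrary category index, which is what makes zooming and aggregation meaningful.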
hoangthienan95
@hoangthienan95
I'll check that out, thanks. "Visualizing an interactive heatmap with hierarchical clustering with more than 1k cols and rows" seems like a good weed-out interview question. I have not found a way to do that easily so far.
James A. Bednar
@jbednar
:-)
hoangthienan95
@hoangthienan95
Does hvplot have a way for me to insert a dendrogram?
James A. Bednar
@jbednar
I don't think so.
hoangthienan95
@hoangthienan95
because it would be helpful to see which ones are in the same group
James A. Bednar
@jbednar
Yep. That would be cool!
hoangthienan95
@hoangthienan95
I just tried, hvplot.heatmap alone doesn't even display the 3000x3000 heatmap. It works well with around 500x3000 tho!
thank you @jbednar for the rapid help :)
James A. Bednar
@jbednar
1K x 1K isn't so bad, actually, given current monitor resolutions, because at least then all the data will fit on screen. The problem is then just that the data means nothing without the labels. So if you only want something where there's no actual aggregation (Datashader's specialty), i.e. no downsampling required, then I withdraw my objections about it being meaningless without clustering. Clustering would help a lot, but if every category bin at least gets one pixel, it's not ridiculous to simply datashade the array.
In that case it's a simpler problem to solve, but not a solved one, because we don't support datashade=True on heatmap directly, which means that we can't provide the categorical axes along with the datashaded plot. And we'd have to do a good bit of work on Bokeh to subsample the category values so that it doesn't try to draw 1000 labels. But that's doable, if time consuming.
Definitely not something for anyone but a hardcore Bokeh hacker to attempt, though!
(My objections above are about when there are more rows and columns than in your plot, but that doesn't have to be true if you only have 1K x 1K. At 3K x 3K it's already a problem again, of course...)
hoangthienan95
@hoangthienan95

Just to follow up: assuming that it's 1k x 1k, you mean doing what they did here, right? http://holoviews.org/user_guide/Large_Data.html (Hover info section)

Using rasterize without shade

And if we see hover info from a Datashader output, it means that it's not aggregated, right? And if we cannot hover, then the data has been aggregated. Is that correct?

James A. Bednar
@jbednar
"they" is me in this case. :-) When that page was written, hover wasn't available from hv.Image plots, but Jean-Luc Stevens added that to Bokeh at least a year ago, and so now that information is at least partially out of date.
In any case, no, I don't think that support would work here. The hover in this case will either only show the row and column and value of the datashaded plot (which isn't useful, since it won't have the category axis values), or it would require overlaying something as large as the original table (which will then incur the same slowdowns you've been seeing).
What's needed is a hybrid, where Datashader renders the heatmap, Bokeh knows the list of categories in each axis (but not the full crossproduct between those two axes), and Bokeh displays a dynamic subsampling of the category values depending on the zoom level. So it would be a datashaded plot, with magic axes.
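To make the "dynamic subsampling of the category values depending on the zoom level" idea concrete, here is a rough pure-Python sketch of just the label-selection logic. It does not touch Bokeh at all; the function name `visible_category_ticks` and the `max_labels` parameter are hypothetical, and a real implementation would have to live in the plotting library's axis machinery.

```python
import math

def visible_category_ticks(categories, start, end, max_labels=20):
    """Pick at most roughly `max_labels` (index, label) pairs from the
    visible index range [start, end) of a category axis."""
    start = max(0, int(math.floor(start)))
    end = min(len(categories), int(math.ceil(end)))
    n_visible = end - start
    if n_visible <= 0:
        return []
    # Show every label when zoomed in; otherwise keep every `step`-th one.
    step = max(1, math.ceil(n_visible / max_labels))
    return [(i, categories[i]) for i in range(start, end, step)]

categories = [f"c{i}" for i in range(1000)]

# Zoomed out: 1000 categories in view, only every 100th gets a label.
zoomed_out = visible_category_ticks(categories, 0, 1000, max_labels=10)

# Zoomed in: 5 categories in view, all of them get a label.
zoomed_in = visible_category_ticks(categories, 100, 105, max_labels=10)
```

Only the two one-dimensional label lists are needed here, never the full cross product of the axes, which matches the constraint described above.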
hoangthienan95
@hoangthienan95
oh I see, the hacker project :))
James A. Bednar
@jbednar
Definitely some work for Bokeh gurus, which is why:
Conceivable, but a lot of work! No one is funding us to do that, and none of the core developers strictly need it, so it's not happening now.
But if you look at the IEX example above, you can see a way that one can provide custom info on zoom that may be more practical for someone not a deep JS hacker.
(Not totally sure; Jean-Luc wrote that as well, not me! But I think the internals may be able to be adapted, and would be much less work than making a fancy Bokeh zoomable categorical axis.)
Feel free to write this up as a post at discourse.bokeh.org (asking for dynamically subsampled categorical axes, particularly in a way that doesn't require the actual data to be available for the full matrix). A Bokeh developer might get inspired, at which point it would be pretty easy for us to add hvplot Datashader support.
I.e. Bokeh just needs to handle the axes; we can handle the plotting!
hoangthienan95
@hoangthienan95
In the IEX example, I'd have to run a live server with Panel to see the hover, right?
'Cause I don't see hover info on the website now.
James A. Bednar
@jbednar
Yes, click on the links at the top of the page to see the live version. Plus you have to zoom in enough that a sufficiently small amount of data is required.
At that point, that section of the data is no longer being datashaded, it's just a normal Bokeh plot, with hover, selection, etc.
The dynamic switching is what would be of interest here.
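The dynamic switching described here can be illustrated with a small NumPy sketch. This is not how the IEX example is implemented; it only shows the decision rule (aggregate when many points are in view, hand over the raw points when few are). The names `choose_rendering` and `max_raw_points` are assumptions made for this sketch.

```python
import numpy as np

def choose_rendering(x, y, x_range, y_range, max_raw_points=1000):
    """Decide how to draw the current viewport.

    Returns ("raw", mask) when few enough points are visible to draw them
    individually (so hover and selection can work), and ("aggregated", None)
    when the view should be rendered as an aggregated image instead.
    """
    mask = (
        (x >= x_range[0]) & (x <= x_range[1])
        & (y >= y_range[0]) & (y <= y_range[1])
    )
    if int(mask.sum()) <= max_raw_points:
        return "raw", mask
    return "aggregated", None

x = np.arange(10_000, dtype=float)
y = np.zeros_like(x)

# Full extent: too many points, so aggregate.
mode_full, _ = choose_rendering(x, y, (0, 9_999), (-1, 1))

# Zoomed in on 100 points: switch to raw glyphs.
mode_zoom, mask_zoom = choose_rendering(x, y, (0, 99), (-1, 1))
```

In a live plot this check would be re-run whenever the axis ranges change, which is why a running Python server is needed for the switching to happen.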
hoangthienan95
@hoangthienan95

Awesome! Gotcha. Thank you for patiently explaining it to me, I'm new to this stuff.

I'll try that first, then if it doesn't work for some reason I might explore other tools. The only other two I know of for large-data viz in Python are Plotly with WebGL rendering and Vaex with its interactive widget. Have you had experience with either? Do you think any other libraries would be promising?

I'll attempt the Discourse post and will probably bug you again to make sure I'm describing the right thing.

James A. Bednar
@jbednar
Bokeh also has WebGL, but I don't know if it works with heatmaps. https://higlass.io/app may be of interest. HiGlass is a good example of dynamic categorical axes; I just wish they had used Datashader behind the scenes so that it's compatible rather than competitive!
I don't know of any interactive categorical axes with Vaex, but if there are, I'd be interested to see them.
hoangthienan95
@hoangthienan95
The stuff on the HiGlass page is exactly the kind of data I'm working with every day, haha. Classic biologists, they love to reinvent the wheel (sometimes that's good, sometimes bad).
James A. Bednar
@jbednar
If HiGlass works, just enjoy! :-)
But what I want is what's there, but with no tie to Biology or any other specific domain; there was no need for it to be so domain specific. Sigh!