James A. Bednar
@jbednar
In that case it's a simpler problem to solve, but not a solved one, because we don't support datashade=True on heatmap directly, which means that we can't provide the categorical axes along with the datashaded plot. And we'd have to do a good bit of work on Bokeh to subsample the category values so that it doesn't try to draw 1000 labels. But that's doable, if time-consuming.
Definitely not something for anyone but a hardcore Bokeh hacker to attempt, though!
(My objections above are about when there are more rows and columns than in your plot, but that doesn't have to be true if you only have 1K x 1K. At 3K x 3K it's already a problem again, of course...)
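To make the 1K-versus-3K point concrete, here is a minimal NumPy sketch of what happens when a heatmap has more rows and columns than the plot has pixels. It assumes a hypothetical 600 x 600 pixel plot and uses a plain block mean (`block_aggregate` is a hypothetical helper, not Datashader's API) as a stand-in for Datashader's aggregation. Each output pixel then summarizes a 5 x 5 block of cells, so a pixel no longer corresponds to a single (row category, column category) pair that hover or axis labels could report.

```python
import numpy as np


def block_aggregate(matrix, out_shape):
    """Mean-aggregate a 2D array onto a coarser grid (a stand-in for rasterizing).

    Returns the aggregated array and the (rows, cols) block size per output pixel.
    This sketch assumes the output grid divides the input evenly.
    """
    rows, cols = matrix.shape
    out_r, out_c = out_shape
    if rows % out_r or cols % out_c:
        raise ValueError("sketch assumes the grid divides evenly")
    fr, fc = rows // out_r, cols // out_c
    # Split each axis into (pixel index, offset within block) and average the offsets
    agg = matrix.reshape(out_r, fr, out_c, fc).mean(axis=(1, 3))
    return agg, (fr, fc)


heatmap = np.random.default_rng(0).random((3000, 3000))  # 3K x 3K cells
pixels, (fr, fc) = block_aggregate(heatmap, (600, 600))
print(pixels.shape, fr * fc)  # (600, 600) 25
```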
hoangthienan95
@hoangthienan95

Just to follow up: assuming it's 1K x 1K, you mean doing what they did here, right? http://holoviews.org/user_guide/Large_Data.html (Hover info section)

Using rasterize without shade

And if we see hover info on a Datashader output, that means it's not aggregated, right? And if we can't hover over it, the data has been aggregated. Is that correct?

James A. Bednar
@jbednar
"they" is me in this case. :-) When that page was written, hover wasn't available from hv.Image plots, but Jean-Luc Stevens added that to Bokeh at least a year ago, and so now that information is at least partially out of date.
In any case, no, I don't think that support would work here. The hover in this case will either only show the row and column and value of the datashaded plot (which isn't useful, since it won't have the category axis values), or it would require overlaying something as large as the original table (which will then incur the same slowdowns you've been seeing).
What's needed is a hybrid, where Datashader renders the heatmap, Bokeh knows the list of categories on each axis (but not the full cross product between those two axes), and Bokeh displays a dynamic subsampling of the category values depending on the zoom level. So it would be a datashaded plot, with magic axes.
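The tick-selection part of those "magic axes" can be sketched in plain Python, assuming the axis knows only its own ordered list of categories and the visible index range after a zoom. `visible_category_ticks` and its `max_labels=20` default are hypothetical and not part of Bokeh; the point is only that the logic needs each axis's category list, never the full cross product.

```python
import math


def visible_category_ticks(categories, start, end, max_labels=20):
    """Choose evenly spaced (index, label) ticks for the visible range [start, end).

    Hypothetical helper: only the per-axis category list is needed,
    not the full row x column matrix.
    """
    start = max(0, start)
    end = min(len(categories), end)
    n_visible = end - start
    if n_visible <= 0:
        return []  # zoomed completely outside the axis
    # Label every category when few are visible; otherwise skip evenly
    step = max(1, math.ceil(n_visible / max_labels))
    return [(i, categories[i]) for i in range(start, end, step)]


genes = [f"gene{i}" for i in range(3000)]
print(len(visible_category_ticks(genes, 0, 3000)))   # 20
print(len(visible_category_ticks(genes, 100, 110)))  # 10
```

Zoomed all the way out, a 3000-category axis gets a label every 150 categories; zoomed into a 10-category window, every category is labelled.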
hoangthienan95
@hoangthienan95
oh I see, the hacker project :))
James A. Bednar
@jbednar
Definitely some work for Bokeh gurus, which is why:
[image]
Conceivable, but a lot of work! No one is funding us to do that, and none of the core developers strictly need it, so it's not happening now.
But if you look at the IEX example above, you can see a way that one can provide custom info on zoom that may be more practical for someone not a deep JS hacker.
(Not totally sure; Jean-Luc wrote that as well, not me! But I think the internals may be able to be adapted, and would be much less work than making a fancy Bokeh zoomable categorical axis.)
Feel free to write this up as a post at discourse.bokeh.org (asking for dynamically subsampled categorical axes, particularly in a way that doesn't require the actual data to be available for the full matrix). A Bokeh developer might get inspired, at which point it would be pretty easy for us to add hvplot Datashader support.
I.e. Bokeh just needs to handle the axes; we can handle the plotting!
hoangthienan95
@hoangthienan95
In the IEX example, I'd have to run a live server with Panel to see the hover, right?
Because I don't see hover info on the website now.
James A. Bednar
@jbednar
Yes, click on the links at the top of the page to see the live version. Plus, you have to zoom in far enough that only a sufficiently small amount of data is in view.
At that point, that section of the data is no longer being datashaded, it's just a normal Bokeh plot, with hover, selection, etc.
The dynamic switching is what would be of interest here.
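The switching logic itself can be sketched with NumPy alone. This is not the IEX example's code: `view_for_range`, the 10,000-point threshold, and the 400 x 400 bin grid are hypothetical, and `np.histogram2d` stands in for a Datashader aggregation. In a live app the ranges would come from the plot's zoom events (for instance a HoloViews `RangeXY` stream), and the function would be re-run on each zoom.

```python
import numpy as np


def view_for_range(x, y, x_range, y_range, max_raw_points=10_000, bins=(400, 400)):
    """Return raw points when few enough are in view, otherwise an aggregate.

    Hypothetical helper: the threshold and bin counts are arbitrary, and
    np.histogram2d stands in for Datashader's aggregation.
    """
    in_view = ((x >= x_range[0]) & (x <= x_range[1]) &
               (y >= y_range[0]) & (y <= y_range[1]))
    if np.count_nonzero(in_view) <= max_raw_points:
        # Zoomed in far enough: hand over the actual points (hover, selection, ...)
        return "raw", np.column_stack([x[in_view], y[in_view]])
    # Too many points: hand over a fixed-size grid of counts instead
    counts, _, _ = np.histogram2d(x[in_view], y[in_view], bins=bins,
                                  range=[x_range, y_range])
    return "aggregated", counts


rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 200_000))
print(view_for_range(x, y, (-5, 5), (-5, 5))[0])      # aggregated
print(view_for_range(x, y, (0, 0.05), (0, 0.05))[0])  # raw
```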
hoangthienan95
@hoangthienan95

Awesome! Gotcha. Thank you for patiently explaining it to me, I'm new to this stuff.

I'll try that first, then if it doesn't work for some reason I might explore other tools. The only other two I know of for large-data viz in Python are Plotly with WebGL rendering and Vaex with its interactive widgets. Have you had experience with either? Do you think any other libraries would be promising?

I'll attempt the Discourse post and will probably bug you again to make sure I'm asking for the right thing.

James A. Bednar
@jbednar
Bokeh also has WebGL, but I don't know if it works with heatmaps. https://higlass.io/app may be of interest. HiGlass is a good example of dynamic categorical axes; I just wish they had used Datashader behind the scenes so that it's compatible rather than competitive!
I don't know of any interactive categorical axes with Vaex, but if there are, I'd be interested to see them.
hoangthienan95
@hoangthienan95
The stuff on the HiGlass page is exactly the kind of data I work with every day, haha. Classic biologists, always reinventing the wheel (sometimes that's good, sometimes bad).
James A. Bednar
@jbednar
If HiGlass works, just enjoy! :-)
But what I want is what's there, just with no tie to biology or any other specific domain; there was no need for it to be so domain-specific. Sigh!
[image]
Are these numbers being updated? To me they have looked the same for 6 months. Maybe my memory is playing tricks on me.
[image]
hoangthienan95
@hoangthienan95
Thank you @MarcSkovMadsen
epifanio
@epifanio
I was wondering if GPU processing is used automatically by Datashader when a cuDF/CuPy environment is found.
Philipp Rudiger
@philippjfr
Sorry, meant to reply but didn't get around to it.
Quick reply: "sort of", i.e. it's automatic when the input is already on the GPU, e.g. if you're feeding in a cuDF DataFrame.
epifanio
@epifanio

@philippjfr thanks!
I am porting my little app from Bokeh to Panel: https://github.com/SIOS-Svalbard/NC-Plot
(btw, thanks to the awesome presentation at PyData Berlin 2019)

In my app I use xarray to read a NetCDF data source and then transform it into a pandas DataFrame, which is then fed to a Bokeh ColumnDataSource.

I will first finish porting from Bokeh to Panel. Then I will add a check for whether cudf can be imported; if it can, I'll do a further conversion from the pandas DataFrame to cuDF and pass that to Datashader. I guess it is not possible to go directly from xarray to cuDF.
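The import check described above might look like the following minimal sketch. `maybe_to_cudf` is a hypothetical helper; `cudf.from_pandas` is cuDF's pandas-to-GPU conversion function. Per the reply above, Datashader's GPU path kicks in automatically once the input is a cuDF DataFrame, so the conversion is the only switch needed here.

```python
import numpy as np
import pandas as pd


def maybe_to_cudf(df: pd.DataFrame):
    """Return a cuDF copy of df if cudf is importable, else df unchanged.

    Hypothetical helper. Only a missing package is handled here; other
    import-time errors (e.g. no usable GPU driver) still propagate.
    """
    try:
        import cudf
    except ImportError:
        return df  # no RAPIDS stack installed: stay on pandas / CPU
    return cudf.from_pandas(df)  # copy the frame into GPU memory


frame = pd.DataFrame({"time": np.arange(3.0), "temperature": [1.5, 1.7, 1.6]})
data_for_datashader = maybe_to_cudf(frame)
```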

Philipp Rudiger
@philippjfr
What kind of data is it?
epifanio
@epifanio
3 types of data, all stored as NetCDF: Time Series, Profiles, TimeSeries-Profile. The app takes a URL as input and has to guess the data type.
Based on the type, a different widget/app is composed. There are 3 demos; the most interesting (embarrassingly simple) one is the time-series-profile.
epifanio
@epifanio
They are CTD observations and weather-station data, soon permafrost too, but we also have to integrate gridded data (Sentinel satellite).
Philipp Rudiger
@philippjfr
Okay, for gridded data you could also convert to CuPy (I think xarray supports CuPy arrays now) and then use a HoloViews QuadMesh plus datashading to render it.
Would recommend https://hvplot.holoviz.org/ for plotting, at least in the exploratory phase.
I don't know how large your data is, so I'm not sure how much benefit you'll see from datashading, or whether you even need GPU datashading.
But hvPlot/HoloViews will transparently work with either cuDF or regular dataframes and xarrays backed by NumPy, Dask or Cupy.
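A minimal sketch of the CuPy conversion suggested above, assuming (as Philipp believes) that recent xarray accepts a CuPy array as a DataArray's data. `maybe_to_cupy` is a hypothetical helper; passing the result to a HoloViews QuadMesh with datashading is not shown. Note that `.values` first loads the whole array into host memory, which matters for ~2 GB files.

```python
import numpy as np
import xarray as xr


def maybe_to_cupy(da: xr.DataArray) -> xr.DataArray:
    """Return a copy of da backed by a CuPy array if CuPy is importable.

    Hypothetical helper: dims, coordinates, and attributes are kept;
    only the data values move to the GPU.
    """
    try:
        import cupy
    except ImportError:
        return da  # no CuPy: keep the NumPy-backed array
    # .values materializes the data on the host before copying it to the GPU
    return da.copy(data=cupy.asarray(da.values))


grid = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("y", "x"),
                    coords={"y": [0, 1], "x": [10, 20, 30]})
gpu_grid = maybe_to_cupy(grid)
```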
epifanio
@epifanio
For gridded data I started with the Landsat example, which uses the xarray_raster capabilities, so I guess I will first use xarray_raster to read the data in and then convert it to a CuPy array, which the QuadMesh API should understand.
Each NetCDF file (for the gridded data) is ~2 GB.
James A. Bednar
@jbednar
Sounds good.
Thomas Diederen
@Patrickens_gitlab
Is it possible to label data points in hv.Scatter? In the following example I would like a legend in which the data points are labelled 'a' and 'b' (corresponding to their color):
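import holoviews as hv
import pandas as pd
import numpy as np
import random

hv.extension('bokeh')  # load the Bokeh backend so .opts() accepts Bokeh plot options

# Random hex color such as '#3fa2c0'
randhex = lambda: format(random.randint(0, 255), '02x')
randcol = lambda: f'#{randhex()}{randhex()}{randhex()}'

# Ten points: the first five share one random color and label 'a',
# the last five share another color and label 'b'
data = pd.DataFrame({
    'x': np.arange(0, 1, 0.1),
    'y': np.arange(0, 1, 0.1),
    'color': [randcol()]*5 + [randcol()]*5,
    'label': ['a']*5 + ['b']*5,
})

# Note: label='label' sets the element's label to the literal string 'label';
# it does not map the 'label' column to legend entries.
hv.Scatter(data, kdims=['x'], vdims=['y', 'color', 'label'], label='label').opts(color='color', show_legend=True)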