    Oriol Abril-Pla
    @OriolAbril
    Now that GSoC has finished, a new release should come very soon, though I'm not sure exactly when
    Sayam Kumar
    @Sayam753
    That's awesome.
    Madhu Charan
    @madhucharan
    Hey guys! My name is Madhu. I am from India and I would like to get started contributing to ArviZ. Is there a getting-started guide, and are there some beginner issues I could use to get a grasp of the codebase? Kindly point me to some beginner issues as well as any information/resources that would help me understand the codebase better.
    lakshya
    @kenkirito
    I am interested in GSoC. I am done with the basic setup and will be making a PR soon
    Oriol Abril-Pla
    @OriolAbril
    Great, welcome @kenkirito! We also have a specific gsoc channel if you want to discuss projects or have doubts about the application procedure, general ArviZ questions and introductions are fine here :)
    Sarthak Bhardwaj
    @thesarthakbhardwaj:matrix.org
    hey @OriolAbril, can you please send the link to the specific GSoC channel where we can discuss project ideas?
    Oriol Abril-Pla
    @OriolAbril
    Welcome to the community @thesarthakbhardwaj:matrix.org, this https://gitter.im/arviz-devs/GSoC is the channel to discuss gsoc topics
    Xavier Fernández i Marín
    @xfim
    Hello to everyone. After writing ggmcmc for R (https://cran.r-project.org/web/packages/ggmcmc/index.html) and keeping it alive to use it with JAGS and stan, I am trying to also start with pyMC3.
    However, there is something very basic with ArviZ that is keeping me from the joy of seeing my first figure. I am using GNU/Linux and the figures do not show up. See this: https://discourse.pymc.io/t/arviz-figures-not-showing-up-after-sampling-gnu-linux/7978. I am sure it is something very simple, but could anyone suggest some keywords or ideas on how to solve it and move forward? Thank you very much.
    Xavier Fernández i Marín
    @xfim
    Oh, sorry. It seems someone already pointed me in the right direction: https://discourse.pymc.io/t/arviz-figures-not-showing-up-after-sampling-gnu-linux/7978. Anyway, very nice to be able to share ideas with you.
    Ari Hartikainen
    @ahartikainen
    hi, try plt.show()
    Xavier Fernández i Marín
    @xfim
    Yes, @ahartikainen , thank you. It worked. Now I start to understand the link between matplotlib and arviz.
    Tobias Bartsch
    @tobiasbartsch
    Hi, I am running into OOM errors with my pymc3 idata objects (during the computation of the pointwise log likelihood values), and it sounds like the arviz/dask integration may be the solution to this issue (see here https://discourse.pymc.io/t/memory-spike-at-the-end-of-the-mcmc-sampling/5669). I saw that the corresponding PR has been merged (https://github.com/arviz-devs/arviz/pull/1229) but could not yet find any example notebooks or documentation that would explain how to switch Dask integration on. Could anyone point me in the right direction? Thanks a lot!
    Ravin Kumar
    @canyon289
    Hey Tobias, sorry about the trouble
    You're right, it doesn't seem we've made that notebook... https://arviz-devs.github.io/arviz/user_guide/computation.html
    I don't have a good answer off the top of my head, but give me a couple of days and I'll think through this one. My apologies for the subpar experience
    Tobias Bartsch
    @tobiasbartsch
    Hey Ravin, great, thank you! If you don't get to it, I might be able to just figure this out myself by looking through the source code. Could you point me to where I should start looking to understand how this is implemented? Thanks a lot!
    Ravin Kumar
    @canyon289
    Thanks for your willingness to work with us on this one
    so here's what it looks like. There's a Dask class in arviz.utils
    I think the way to enable Dask is to do az.utils.Dask.enable_dask(**kwargs)
    and give it a shot
    if things don't work, it would be great if you could create an issue ticket or post here. We didn't know whether people wanted Dask, but posting an issue shows it being used and what the specific errors are. With both of those we can address problems :)
    Tobias Bartsch
    @tobiasbartsch
    Thanks Ravin! I will give it a try and report back
    Oriol Abril-Pla
    @OriolAbril
    Dask support is still very limited, and we started with the low hanging fruit first. Several stats and diagnostics have dask support because we already used xr.apply_ufunc which makes it very easy to back the same computation with numpy or with dask, however, none of the conversion routines have support for it yet
    You have to somehow get a dask backed inferencedata first, then you can use some functions with that (I don't quite remember which right now though)
    It is not ideal, but what I do with PyMC is sample only the trace (I am more used to models with a moderate number of parameters but many observations, so my main concerns memory-wise are the pointwise log likelihood and the posterior predictive), then chunk the resulting inferencedata, compute the log likelihood/posterior predictive with dask directly, and store it in a dask array
    Oriol Abril-Pla
    @OriolAbril
    https://arviz-devs.github.io/arviz/user_guide/pymc3_refitting_xr_lik.html doesn't use dask, but it does show the "externalization" of log likelihood computation so it should look very similar to the dask backed version.
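A minimal sketch of what "externalizing" the pointwise log likelihood with xr.apply_ufunc can look like. Everything here is an assumption for illustration: the toy posterior, the variable names (slope, sigma, obs_id) and the helper normal_logpdf are made up, and the arrays are NumPy-backed. With a chunked, dask-backed posterior (for instance after calling .chunk() on the dataset), the same call would presumably take dask="parallelized", which is not exercised here.

```python
import numpy as np
import xarray as xr


def normal_logpdf(y, mu, sigma):
    """Elementwise log density of Normal(mu, sigma) evaluated at y."""
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


rng = np.random.default_rng(0)

# Hypothetical posterior: 2 chains, 50 draws, two scalar parameters.
posterior = xr.Dataset(
    {
        "slope": (("chain", "draw"), rng.normal(1.0, 0.1, size=(2, 50))),
        "sigma": (("chain", "draw"), np.abs(rng.normal(1.0, 0.1, size=(2, 50)))),
    }
)

# Hypothetical data: 20 observations along their own named dimension.
x = xr.DataArray(np.linspace(0, 1, 20), dims="obs_id")
y = xr.DataArray(rng.normal(size=20), dims="obs_id")

# Broadcasting happens by dimension name, so mu has dims (chain, draw, obs_id).
mu = posterior["slope"] * x

# One log likelihood value per (chain, draw, observation).
log_lik = xr.apply_ufunc(normal_logpdf, y, mu, posterior["sigma"])
print(log_lik.sizes)
```

Because the function is applied elementwise on named dimensions, nothing in it depends on whether the inputs are held in memory or chunked.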
    Anthony Rodgers
    @adrodgers
    Hi all, I was wondering if it is possible to show the y axis of the subplots in plot_posterior, as it is not visible by default. Any help is appreciated, thank you!
    Zaharid
    @Zaharid
    Hi, I have a simple question I have been scratching my head over: given an InferenceData object from pymc3, how do I get one given sample from the posterior parameters, like the first sample from the first chain?
    Zaharid
    @Zaharid
    ...as in a simpler way to do something like
    stacked = sample.posterior.stack(draws=("chain", "draw"))
    sel = stacked.isel({"draws":2500})
    sample_dict = sel.to_dict()["data_vars"]
    a_sample = {k: v["data"] for k,v in sample_dict.items()}
    Oriol Abril-Pla
    @OriolAbril
    when you say "one given sample" what do you mean? I would say sample.posterior.sel(chain=0, draw=0) but you seem to want a specific format, a dict instead of an xarray dataset?
    what are you planning to do with that? I know the temptation of doing any conversion necessary (whatever the complexity and cost) to get to whatever format you were using before is strong, but I think xarray can really make the code much more clear. It might be longer because using labels and dimension names is by design longer than using index only and relying on their positions but anyone will understand .sel(chain=0, draw=0) whereas I myself have problems understanding my own code months later if what I'm using is [:, 0, 0, :]
    Zaharid
    @Zaharid
    I would like to extrapolate a time series for which I need to put the actual parameters in a function. The format is a dictionary parameter -> values, like e.g. the find_MAP function in PyMC3 returns
    Something like e.g. sample.posterior.sel(chain=0, draw=0).to_dataframe() (or to_array) doesn't do the right thing for me as it put a vector variable in an when I want the value for each entry.
    Zaharid
    @Zaharid
    ...while to_dict needs to be post-processed a bit, as shown above.
    Oriol Abril-Pla
    @OriolAbril
    xarray datasets also have a dictionary interface, they have items(), keys() and values() methods. IIUC, you can't change the function and need a dictionary of {var_name: numpy array} so ...sel(...).items() is already very close to what you want, should be closer than the .to_dict method which includes coords, dims, attributes... in the generated dict
    You probably want {k: da.values for k, da in sample.posterior.sel(chain=0, draw=0).items()}
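A small self-contained sketch of the dictionary-style access suggested above. The toy dataset stands in for sample.posterior; the variable names alpha and beta and the shapes are made up for illustration.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Toy stand-in for `sample.posterior`: 2 chains, 3 draws,
# one scalar variable and one vector variable of length 2.
posterior = xr.Dataset(
    {
        "alpha": (("chain", "draw"), rng.normal(size=(2, 3))),
        "beta": (("chain", "draw", "beta_dim_0"), rng.normal(size=(2, 3, 2))),
    },
    coords={"chain": [0, 1], "draw": [0, 1, 2]},
)

# First draw of the first chain, selected by label.
first = posterior.sel(chain=0, draw=0)

# Iterating a Dataset yields its data variables only (not coords or attrs),
# so this gives {variable name: NumPy array}.
a_sample = {name: da.values for name, da in first.items()}

for name, value in a_sample.items():
    print(name, value.shape)
```

Scalar variables come back as zero-dimensional arrays and vector variables keep their remaining dimensions, which is usually what a function expecting parameter values needs.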
    Zaharid
    @Zaharid
    Right, thank you. Still can't say I find it intuitive but I guess I'll need more practice.
    Oriol Abril-Pla
    @OriolAbril
    it can take a long time and a lot of effort to grok xarray, but IMO it will probably never happen while constantly going back and forth between pandas, dicts and xarray objects. I'd recommend trying to force yourself to use only xarray in a couple of projects and reading examples that use xarray extensively
    https://docs.pymc.io/projects/examples/en/latest/blog.html for example lists the notebooks in pymc-examples by latest update date. Some only sample from a pymc model and generate a couple arviz plots so there is no xarray usage, but others have more extensive use of xarray. https://docs.pymc.io/projects/examples/en/latest/case_studies/multilevel_modeling.html has multiple places where we take advantage of automatic alignment and broadcasting, sorting, even groupbys
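As a rough illustration of the automatic alignment and broadcasting mentioned above (not taken from the linked notebooks; the names intercept, covariate and county and all values are invented), xarray matches arrays by coordinate label rather than by position:

```python
import numpy as np
import xarray as xr

# Hypothetical per-county draws: 4 draws for counties A, B, C.
intercept = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("draw", "county"),
    coords={"county": ["A", "B", "C"]},
)

# A county-level quantity stored in a different order.
covariate = xr.DataArray(
    [0.3, 0.1, 0.2],
    dims="county",
    coords={"county": ["C", "A", "B"]},
)

# Arithmetic aligns on the "county" labels and broadcasts over "draw",
# so the differing storage order does not matter.
shifted = intercept + covariate

# Reductions refer to dimensions by name instead of axis number.
county_means = shifted.mean("draw")
print(county_means.sel(county="A").item())
```

The positional equivalent would need a manual reorder of the covariate before adding, which is the kind of bookkeeping the labels take care of.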
    Grunde Waag
    @grundew

    Hi, I'm running the linear regression example from the pymc3 getting started documentation, but I'm swapping out the Normal distribution with DensityDist and my own implementation of the normal distribution.

        # Expected value of outcome
        mu = alpha + beta[0] * X1 + beta[1] * X2
    
        # Likelihood (sampling distribution) of observations
        Y_obs = pm.DensityDist("Y_obs", logp, observed=dict(value=Y, mu_logp=mu, sigma_logp=sigma))

    But when I do the az.plot_trace(trace) I get the following error:

    raise MissingInputError(error_msg, variable=var)
    theano.graph.fg.MissingInputError: Input 0 of the graph (indices start from 0), used to compute Subtensor{int64}(beta, Constant{1}), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.

    Is this a known problem?

    Marco De Mattia
    @demattia

    Hi all, I am new to ArviZ and I am trying to generate a prior predictive check plot from PyMC3. My goal is to generate the same plot that plot_ppc makes from a posterior, but using the prior, to check whether the prior is reasonable. I am using the plot_ppc() function, passing group="prior". I generate prior predictive samples from a PyMC3 model via

    with model:
        prior_pred = pm.sample_prior_predictive(samples=200)
    az.concat(idata, az.from_pymc3(prior=prior_pred), inplace=True)

    However, when I run

    az.plot_ppc(idata, ax=ax, group="prior")

    I get an error saying "data" argument must have the group "prior_predictive" for ppcplot. The from_pymc3 does not have a prior_predictive kwarg.

    The PyMC3 example (https://docs.pymc.io/en/v3/pymc-examples/examples/diagnostics_and_criticism/posterior_predictive.html) for prior/posterior predictive only uses Arviz for the posterior check.

    Is there a way to generate the prior distribution from plot_ppc?

    Marco De Mattia
    @demattia
    It seems that it runs when I move az.plot_ppc(az.from_pymc3(prior=prior_pred)) under the with model: context. Does that mean that it will evaluate the likelihood using the model on the fly? If that is the case it works differently than the posterior case where samples are used?
    Oriol Abril-Pla
    @OriolAbril
    in pymc3, the pm.sample_prior_predictive function samples both prior and prior predictive and returns a dictionary with all the variables there. Without the model ArviZ can't know which variables correspond to each quantity and we decided to add everything to the prior instead of erroring out. You should get a warning however whenever you call from_pymc3 without the model info
    in your call doing az.from_pymc3(prior=prior_pred) the model is only used to know which variables are prior and which prior predictive and to retrieve the observed_data, nothing else
    Marco De Mattia
    @demattia
    Thank you very much Oriol, that makes sense.
    Thomas Mühlfriedel
    @tomsen-san

    Hello there. I started with PyMC3 + arviz recently and got stuck with dims and coords. I define an additional coord on an unpooled model but it never shows up in the InferenceData. I use arviz 0.12.0 and the current pymc3. I have the following model:

    with Model() as deltas_unpooled_model:
        # how many users do we have?
        mu_c = Normal("mu", mu=10.0, sigma=10, shape=len(selectCd))
        sigma_c = Normal("sigma", mu=10.0, sigma=10, shape=len(selectCd))
        delta_customer = LogNormal("deltas", mu=mu_c[selectCd.customer_idx.values], sigma=sigma_c[selectCd.customer_idx.values], observed=selectCd['delta'])
        trace_unpooled = sample(random_seed=2412, chains=4, draws=2000, return_inferencedata=False)
        prior_unpooled = sample_prior_predictive()
        posterior_predictive_unpooled = sample_posterior_predictive(trace_unpooled)

    and construct the InferenceData like this:

    # with deltas_unpooled_model:
    unpooled_data = az.from_pymc3(
        trace=trace_unpooled,
        model=deltas_unpooled_model,
        prior=prior_unpooled,
        posterior_predictive=posterior_predictive_unpooled,
        coords={"customerId": selectedCustomers},
        dims={"mu_c": ["customerId"], "sigma_c": ["customerId"]},
    )

    The InferenceData posterior contains Coordinates:

    chain
    draw
    mu_dim_0
    sigma_dim_0

    Not what I expected. Any idea where I took a wrong turn?

    Ghost
    @ghost~629f67b56da0373984981e2b
    Hi all, I'm using pymc3. I've noticed that some variables are transformed during the sampling process, and I'm wondering if there's a way to get the effective sample size in the transformed space using az.ess etc. I described the question in detail on the pymc forum. I would highly appreciate any advice. Thank you!