These are chat archives for bayespy/bayespy

29th
Mar 2016
Deebul Nair
@deebuls
Mar 29 2016 13:40
Hi, I need some help with some modelling doubts.
I would like to model the Kemp et al. (2007) hierarchical model for Bag of Marbles:
β=<1,1,1,1,1>
θ|β∼Dirichlet(β)
y|θ∼Multinomial(θ)
I have written this:
from bayespy import nodes
import numpy as np

n_colors = 5  # Number of colors in each bag
n_bags = 3    # Number of bags
p_theta = nodes.Dirichlet(np.ones(n_colors),
                          plates=(n_bags,),
                          name='p_theta')

data = nodes.Multinomial(n_colors,
                         p_theta,
                         plates=(3, 10),
                         name='data')
Deebul Nair
@deebuls
Mar 29 2016 13:47

But it gives an error about the plates:

The plates (3,) of the parents are not broadcastable to the given plates (3, 10).

Jaakko Luttinen
@jluttine
Mar 29 2016 14:02
@deebuls, broadcasting for plates works similarly to numpy broadcasting: comparison is done element-wise, starting with the trailing dimension. so (3,10) broadcasts with (10,) or (3,1) but not with (3,)
so, switch the order of plates in data to (10, 3) (i'd prefer this) or use plates (3, 1) for p_theta
does that work?
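The plate rule above can be checked with plain NumPy, since plate broadcasting follows the same semantics as array broadcasting (a sketch, not bayespy-specific):

```python
import numpy as np

# Shapes are compared element-wise, starting from the trailing axis.
a = np.ones((3, 10))
print(np.broadcast(a, np.ones((10,))).shape)   # (3, 10) vs (10,)  -> compatible
print(np.broadcast(a, np.ones((3, 1))).shape)  # (3, 10) vs (3, 1) -> compatible
try:
    np.broadcast(a, np.ones((3,)))             # (3, 10) vs (3,)   -> mismatch
except ValueError as e:
    print("not broadcastable:", e)
```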
Deebul Nair
@deebuls
Mar 29 2016 14:18
@jluttine yes, it works. Thanks a lot
Jaakko Luttinen
@jluttine
Mar 29 2016 14:19
np :)
Jaakko Luttinen
@jluttine
Mar 29 2016 14:30
@deebuls, by the way, you are using n_colors as the number of trials in the multinomial distribution. that is of course perfectly valid, but i'm just guessing that you might want to use some other value there. n_trials?
or is 10 supposed to be the number of trials? it'd be better as the first argument for Multinomial instead of plates. multinomial data is such that you have a vector where each element tells how many times that color was picked, for instance, [3, 0, 6] if you have 9 trials.
if it is important to have a separate variable for each trial, telling what color was picked in that trial, use Categorical and then plates (10, 3) with that.
these are just suggestions based on my guesses what you are aiming to do
Jaakko Luttinen
@jluttine
Mar 29 2016 14:35
Multinomial and Categorical infer the number of colors from the size of the probability vector (p_theta)
Categorical data is in a form where the value tells the index of the color that was picked in a trial. so if n_colors=5, Categorical data could be [4, 4, 0, 1, 1, 2, 4] if the number of trials was 7. so there is a significant difference between Multinomial and Categorical.
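The correspondence between the two data formats can be sketched with plain NumPy, using the example picks above: np.bincount turns per-trial Categorical picks into per-color Multinomial counts.

```python
import numpy as np

n_colors = 5
# Categorical-style data: one entry per trial, giving the picked color index.
picks = np.array([4, 4, 0, 1, 1, 2, 4])  # 7 trials

# Multinomial-style data: one count per color, summing to the number of trials.
counts = np.bincount(picks, minlength=n_colors)
print(counts)         # [1 2 1 0 3]
print(counts.sum())   # 7 == number of trials
```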
Deebul Nair
@deebuls
Mar 29 2016 15:01

@jluttine yes, 10 is the number of trials. The task is to predict the distribution of the different colors (5 colors) in each bag. On each trial we take 1 ball from each bag, so after taking 10 balls we need to predict the distribution in each bag.

Based on the data set I have, I think Categorical is the distribution to use

# Generate some random color distributions for each bag
p_color = nodes.Dirichlet(1e-1 * np.ones(n_colors),
                          plates=(n_bags,)).random()

# Randomly draw a marble from each bag on each trial
draw_marbles = nodes.Categorical(p_color,
                                 plates=(10, n_bags)).random()
Deebul Nair
@deebuls
Mar 29 2016 15:08
I am trying to implement Hierarchical models from https://probmods.org/hierarchical-models.html using bayespy .
Jaakko Luttinen
@jluttine
Mar 29 2016 15:09
@deebuls, if you don't need the individual trials, you can use the aggregated results and model with Multinomial. you'd just need to sum the counts of each color from each bag in the trials. i don't see any reason why you'd want to use Categorical instead of Multinomial
artificial data can be generated by using Multinomial(...).random()
but if your data is in such a format that you don't want to bother computing the aggregated results, then you might write your code faster with Categorical node. it's like Bernoulli vs Binomial. if you have counts from trials, i'd use Binomial, but if you really need access to individual trials, then you need to use Bernoulli. the same applies to Categorical vs Multinomial
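Aggregating per-trial draws into per-bag counts can be sketched with plain NumPy (hypothetical data; the shapes mirror the plates used in the chat above):

```python
import numpy as np

n_colors, n_bags, n_trials = 5, 3, 10
rng = np.random.default_rng(0)

# Per-trial data in Categorical form: draws[t, b] is the color index
# picked from bag b on trial t, i.e. plates (10, n_bags).
draws = rng.integers(0, n_colors, size=(n_trials, n_bags))

# Aggregate into Multinomial form: one count vector per bag.
counts = np.stack([np.bincount(draws[:, b], minlength=n_colors)
                   for b in range(n_bags)])
print(counts.shape)        # (3, 5)
print(counts.sum(axis=1))  # [10 10 10] -- each bag's counts sum to n_trials
```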
Deebul Nair
@deebuls
Mar 29 2016 15:15
draw_marbles = nodes.Multinomial(n_colors,
                                 p_color,
                                 plates=(10, n_bags)).random()
This gives an error:
random() got an unexpected keyword argument 'plates'
I tried without plates too.
Jaakko Luttinen
@jluttine
Mar 29 2016 15:20
yep, i just started looking at that. that's definitely a bug (or a missing feature) in bayespy at the moment. sorry about that.
well, the categorical method should work at least..
i'll fix that issue asap
Deebul Nair
@deebuls
Mar 29 2016 15:24
@jluttine yes, it works :) . Thanks a lot for your help. I may bug you soon with different models
Jaakko Luttinen
@jluttine
Mar 29 2016 15:35
sure! :)
Deebul Nair
@deebuls
Mar 29 2016 16:15

Same problem

data = nodes.Multinomial(n_colors,
                         p_theta,
                         plates=(10, 3),
                         name='data')
data.observe([[1, 1, 1, 1, 6], [1, 2, 3, 4, 0], [1, 2, 3, 4, 0]])

gives an error

Counts must sum to the number of trials.

10 was put in plates to signify the trials, and each row sums up to 10.

Jaakko Luttinen
@jluttine
Mar 29 2016 16:21
the number of trials is the first argument to Multinomial. in your code that argument is n_colors.
the same comments as above: either use Categorical with the number of trials as a plate axis, or use Multinomial and give the number of trials as the first argument, not as plates
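The Multinomial data layout this implies can be sketched with plain NumPy (hypothetical probabilities; np.random used in place of bayespy's .random()):

```python
import numpy as np

n_colors, n_bags, n_trials = 5, 3, 10
rng = np.random.default_rng(0)

# One probability vector per bag (each row sums to 1).
p = rng.dirichlet(np.ones(n_colors), size=n_bags)

# Multinomial draws with n_trials as the trial count: one count vector
# per bag, each summing to n_trials (not to n_colors).
data = np.stack([rng.multinomial(n_trials, p[b]) for b in range(n_bags)])
print(data.shape)        # (3, 5): (n_bags, n_colors)
print(data.sum(axis=1))  # [10 10 10]
```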
Deebul Nair
@deebuls
Mar 29 2016 16:23
ok got it now
Jaakko Luttinen
@jluttine
Mar 29 2016 16:25
multinomial random sampling should now work in develop branch
if you want to try: pip install git+https://github.com/bayespy/bayespy.git@develop
so the issue "random() got an unexpected keyword argument plates" should be fixed now
Deebul Nair
@deebuls
Mar 29 2016 16:36
checked it. Works perfectly now :clap:
Deebul Nair
@deebuls
Mar 29 2016 16:46
I would like to add this as an example of hierarchical modelling, if you find it ok. But you need to guide me a bit on where to add it.
Jaakko Luttinen
@jluttine
Mar 29 2016 16:48
i'm in the process of converting all examples into jupyter notebooks
doc/source/examples
you can write a notebook there and make a pull request. or RST will be ok too.
or you can write it into a github issue message