@deebuls About the first model: beta is non-conjugate with respect to theta. Also, what does 5xDirichlet mean? Multiplying the probability vector by 5? I don't understand why you wouldn't just learn beta freely; why constrain it to be a probability vector?
@deebuls About the second model: again, beta is non-conjugate with respect to theta, and I don't quite follow the model construction. Would it be OK to learn beta as a free parameter and discard alpha and phi?
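For context on why the conjugacy complaint matters: when the prior on a Categorical's probability vector is a Dirichlet, the posterior is again a Dirichlet obtained by simply adding observed counts to the prior pseudo-counts, which is what makes the VB updates closed-form. A minimal NumPy sketch of that conjugate update (toy data, not the model from this thread):

```python
import numpy as np

# Symmetric Dirichlet prior over 3 categories (pseudo-counts of 1 each).
alpha = np.array([1.0, 1.0, 1.0])

# Toy observed category labels (hypothetical data for illustration).
data = np.array([0, 0, 2, 1, 0])
counts = np.bincount(data, minlength=3)  # -> [3, 1, 1]

# Conjugacy: the posterior is Dirichlet(prior pseudo-counts + counts).
alpha_post = alpha + counts
mean_post = alpha_post / alpha_post.sum()  # posterior mean of theta
```

A non-conjugate construction for beta has no such count-addition update, so the standard VB message passing breaks down there.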
@deebuls About the books: I started collecting pieces from here and there to form a VB book, but it's really just a collection of copy-pasted parts and is missing a lot of material. Just in case you find it useful anyway: http://variational-bayes-book.readthedocs.org/en/latest/
@jluttine Thanks for the book. I will read it to get the details of VB inference, but currently I am more interested in the application of graphical modelling: understanding the different models that are available, when to use them, and which distributions to use where.
The explanation given is basically "sharing of knowledge". Using these models, we can learn about the contents of the other bags just by observing one bag, since all the bags are connected, and the learning is faster. They call the concept "overhypotheses", a part of hierarchical modelling. The base paper is http://web.mit.edu/cocosci/Papers/KempPerforsTenenbaum06.pdf
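The overhypothesis idea from the Kemp, Perfors & Tenenbaum paper can be sketched numerically: if the training bags are each nearly pure in one colour, the learner infers a small shared Dirichlet concentration, and then a single marble from a new bag already predicts that bag's dominant colour. This is a simplified empirical-Bayes sketch with simulated toy data, not the paper's full hierarchical inference:

```python
import numpy as np

rng = np.random.default_rng(0)
n_colors = 5

# Simulate training bags from a shared symmetric Dirichlet with small
# concentration, so each bag is nearly pure in one colour (toy data).
alpha_true = 0.1
thetas = [rng.dirichlet(np.full(n_colors, alpha_true)) for _ in range(20)]
counts = np.array([rng.multinomial(50, theta) for theta in thetas])

# Crude moment-matching estimate of the shared concentration:
# for a symmetric Dirichlet, E[sum_k p_k^2] = (alpha + 1) / (K*alpha + 1).
p_hat = counts / counts.sum(axis=1, keepdims=True)
s = np.mean(np.sum(p_hat**2, axis=1))
alpha_hat = (1 - s) / (n_colors * s - 1)

# Overhypothesis in action: observe ONE marble (colour 0) from a NEW bag.
obs = np.zeros(n_colors)
obs[0] = 1

# Posterior predictive for the next marble from that bag: because the
# learned alpha is small, one observation dominates the prediction.
post = np.full(n_colors, alpha_hat) + obs
pred = post / post.sum()
```

The knowledge shared across bags is the concentration parameter, not the colours themselves: the learner never needs to see the new bag's colour distribution, only that bags in general are homogeneous.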