
A library for probabilistic modeling, inference, and criticism. http://edwardlib.org


Hello guys! Just curious, **why do you prefer ADVI over Laplace approximation on the MAP**? The latter is faster and cheaper. The ADVI paper mentions some differences between ADVI and the Laplace approximation. But from my viewpoint, ADVI is still **mode seeking**, so I don't know what exactly the improvement has been in the decades since the Laplace approximation was introduced.
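For anyone following along, here is a minimal sketch of what the Laplace approximation at the MAP does. It uses a toy coin-bias posterior chosen only for illustration; the Beta prior, the counts, and the function names are assumptions and have nothing to do with Edward's API. The idea is to fit a Gaussian centred at the mode, with variance set by the curvature of the log posterior there.

```python
import math

def log_post(theta, heads, tails, a=2.0, b=2.0):
    """Unnormalized log posterior of a coin bias under a Beta(a, b) prior."""
    return (a + heads - 1) * math.log(theta) + (b + tails - 1) * math.log(1 - theta)

def laplace_approx(heads, tails, a=2.0, b=2.0, eps=1e-4):
    """Return (mean, std) of the Gaussian fitted at the MAP."""
    # Closed-form mode of the Beta(a + heads, b + tails) posterior.
    mode = (a + heads - 1) / (a + b + heads + tails - 2)
    # Central finite difference for the second derivative at the mode.
    f = lambda t: log_post(t, heads, tails, a, b)
    hess = (f(mode + eps) - 2 * f(mode) + f(mode - eps)) / eps ** 2
    # The Gaussian variance is the negative inverse curvature.
    return mode, math.sqrt(-1.0 / hess)

mean, std = laplace_approx(heads=7, tails=3)
```

The fit uses only the mode and the local curvature, which is the sense in which Laplace is a purely local approximation. Whether a mean-field variational fit does better on a given model is a separate empirical question.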

Update on my understanding of the hyperprior setting in Edward: it is assumed that Q(w,alpha) = Q(w) Q(alpha), which is stated in some other papers about VI but not in the Edward/ADVI papers. It is kind of weird that alpha is the prior parameter for w, yet the posterior treats w and alpha as independent.

Results on some simple linear regression problems show that this approximation seems to produce reasonable results.

However, the most orthodox way (not VI minimizing KL) to do this is to compute P(alpha|D) by first computing P(D|alpha) P(alpha), where P(D|alpha) comes from integrating the parameter w out of the likelihood P(D|w,alpha). Then do MAP on P(alpha|D) to get the best alpha. Still, I don't see any exact equivalence between this classical way and ADVI.
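Written out with the same symbols, the classical procedure described above and the factorization assumed by the variational approach look like this (the prior P(w|alpha) is my reading of "alpha is the prior parameter for w"):

$$
P(D \mid \alpha) = \int P(D \mid w, \alpha)\, P(w \mid \alpha)\, dw,
\qquad
P(\alpha \mid D) \propto P(D \mid \alpha)\, P(\alpha),
\qquad
\alpha^{*} = \arg\max_{\alpha} P(\alpha \mid D)
$$

$$
Q(w, \alpha) = Q(w)\, Q(\alpha)
$$

The first line marginalizes w exactly and then picks a point estimate of alpha; the second keeps a distribution over both but drops their dependence, so the two are not expected to coincide in general.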

I ended up writing a probably-terrible thing for this: https://gist.github.com/EvanKrall/daab4d4abced844e6caef951e7fee06e

Hi, I'm looking into issue #271 which is about implementing IS / SMC inference and I was thinking of the two following options:

- Implement an ImportanceSampling class inheriting from MonteCarlo. The build_update method simply computes one sample from the prior and its importance weight (likelihood ratio). At each iteration, I don't update the user's Empirical but store everything internally (let's ignore the how for now) and only populate the user's Empirical in the finalize method by sampling from the auto-normalized weighted approximation.
- Create a WeightedEmpirical distribution, super-classing Empirical (default weights of 1/N) and replace in MonteCarlo the Empirical requirement by WeightedEmpirical. Then at each iteration I can directly populate the WeightedEmpirical and auto-normalize the distribution in the finalize method.

It feels like the first option is more of a hack, but it is easier to implement than the second option, which would require refactoring some existing code. Please let me know which of these two options makes more sense, or if you have any comments about them.
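To make the two options above concrete, here is a rough sketch in plain Python of self-normalized importance sampling with the prior as proposal, followed by the resampling step that would happen in `finalize`. The Gaussian toy model and all names (`log_likelihood`, `importance_sample`) are made up for illustration; this is not Edward code.

```python
import math
import random

def log_likelihood(mu, data, sigma=1.0):
    """Gaussian log likelihood of the data given mean mu (up to a constant)."""
    return sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)

def importance_sample(data, n_particles=5000, n_resample=1000, seed=0):
    rng = random.Random(seed)
    # Proposal = prior N(0, 1), so the importance weight is just the likelihood.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    log_w = [log_likelihood(mu, data) for mu in particles]
    # Self-normalize in a numerically stable way (subtract the max log weight).
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    # "finalize" step: draw an unweighted sample from the weighted approximation.
    resampled = rng.choices(particles, weights=w, k=n_resample)
    return particles, w, resampled

data = [0.9, 1.1, 1.3, 0.7, 1.0]
particles, weights, resampled = importance_sample(data)
posterior_mean = sum(p * w for p, w in zip(particles, weights))
```

In terms of the two options: the first keeps `particles` and `weights` internal and only hands `resampled` to the user's Empirical, while the second would expose `particles` together with `weights` directly.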

Oh, sorry I forgot to mention that I'm just trying to implement the IS part of the ticket. Implementing SMC straight away would be a little too much for my first contribution ;)

Hi. In Edward there is `ed.models.Empirical`. Does anyone know what the corresponding thing in Edward2 or TensorFlow Probability is? Maybe `as_random_variable`?
@cruyffturn I already read that, but it wasn't entirely clear to me why we would want to use `softplus` though. I can reread this though, so thanks for taking time to respond.
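On the `softplus` question, the usual reason as far as I understand it: the scale of a Normal has to be positive, but a free variable can take any real value during optimization. Softplus maps the whole real line smoothly onto positive numbers, so the optimizer can move freely. A small plain-Python sketch (the helper below is written for illustration, it is not the TensorFlow op):

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x)); positive in exact arithmetic."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

raw_values = [-5.0, 0.0, 5.0]   # unconstrained values an optimizer might visit
scales = [softplus(v) for v in raw_values]
```

Negative raw values give small positive scales, and large positive raw values pass through almost unchanged, which is why patterns like `tf.nn.softplus(tf.get_variable(...))` show up in the variational families.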
I have a problem where I have an architecture vaguely similar to an auto-encoder, but I want the encoder to be probabilistic.

I think I need this because the loss function I'm optimizing has a few 'hot spots' (good initial conditions) and many very 'cold spots' (zero gradient).

So, if I treat the output as something deterministic, just by the luck of initialization very few (maybe zero) of the encoder outputs may be hot spots. But if they are treated as a distribution with enough variance to cover the hot spots, I should be able to sample good encodings to find a good trajectory to optimize (as well as push the encoder distribution further towards hot spots during training).

Does this sound suited for probabilistic programming, and does anyone have any advice based on this description?

When I import edward I get: `ImportError: cannot import name 'set_shapes_for_outputs'`

Greetings, I'm trying to use the Multinomial distribution in Edward to predict multiclass labels (3 classes) with a neural network. I'm confused about:

1-) How should I shape the label dataset? I chose to convert labels like [[0],[1],[0],[2]] to [[1,0,0],[0,1,0],[1,0,0],[0,0,1]].

2-) Under what conditions should the `total_count` arg of Multinomial not be equal to 1 when I use probs instead of logits?

I'm not too familiar with Multinomial. I used Bernoulli easily, but I can't handle the multiclass network :/

After training on my data I got predictions for the test data. But when I try to evaluate MSE I get this error:

ValueError: Dimensions must be equal, but are 145 and 3 for 'sub_2' (op: 'Sub') with input shapes: [145], [145,3].

Here is my code. If you see anything I'm missing, I'd be very happy to get advice.

```
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf
import edward as ed
from edward.models import Normal, Multinomial

num_labels = 3
(n_samples, n_iter) = (30, 2500)
symbol = 'A'
dataFrequency = '10'

# X (features) and Y (integer class labels) are loaded elsewhere (not shown).
X, Y = np.array(X), np.array(Y)
# One-hot encode the integer labels.
Y = (np.arange(num_labels) == Y[:, None]).astype(np.float32)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
# X_train.shape (578, 120)
# y_train.shape (578, 3)
# X_test.shape (145, 120)
# y_test.shape (145, 3)


def neural_network(x):
    h = tf.tanh(tf.matmul(x, W_0) + b_0)
    h = tf.tanh(tf.matmul(h, W_1) + b_1)
    h = tf.tanh(tf.matmul(h, W_2) + b_2)
    h = tf.matmul(h, W_3) + b_3
    nn_result = tf.nn.softmax(h)
    return nn_result


D = X_train.shape[1]
N = X_train.shape[0]
N2 = X_test.shape[0]

# Priors over weights and biases.
W_0 = Normal(loc=tf.zeros([D, 10]), scale=tf.ones([D, 10]))
W_1 = Normal(loc=tf.zeros([10, 10]), scale=tf.ones([10, 10]))
W_2 = Normal(loc=tf.zeros([10, 5]), scale=tf.ones([10, 5]))
W_3 = Normal(loc=tf.zeros([5, 3]), scale=tf.ones([5, 3]))
b_0 = Normal(loc=tf.zeros(10), scale=tf.ones(10))
b_1 = Normal(loc=tf.zeros(10), scale=tf.ones(10))
b_2 = Normal(loc=tf.zeros(5), scale=tf.ones(5))
b_3 = Normal(loc=tf.zeros(3), scale=tf.ones(3))

x_ph = tf.placeholder(tf.float32, [None, D])
y = Multinomial(probs=neural_network(x_ph), total_count=1.)

# Variational approximations (softplus keeps each scale positive).
qw_0 = Normal(loc=tf.get_variable("qw_0/loc", [D, 10]),
              scale=tf.nn.softplus(tf.get_variable("qw_0/scale", [D, 10])))
qb_0 = Normal(loc=tf.get_variable("qb_0/loc", [10]),
              scale=tf.nn.softplus(tf.get_variable("qb_0/scale", [10])))
qw_1 = Normal(loc=tf.get_variable("qw_1/loc", [10, 10]),
              scale=tf.nn.softplus(tf.get_variable("qw_1/scale", [10, 10])))
qb_1 = Normal(loc=tf.get_variable("qb_1/loc", [10]),
              scale=tf.nn.softplus(tf.get_variable("qb_1/scale", [10])))
qw_2 = Normal(loc=tf.get_variable("qw_2/loc", [10, 5]),
              scale=tf.nn.softplus(tf.get_variable("qw_2/scale", [10, 5])))
qb_2 = Normal(loc=tf.get_variable("qb_2/loc", [5]),
              scale=tf.nn.softplus(tf.get_variable("qb_2/scale", [5])))
qw_3 = Normal(loc=tf.get_variable("qw_3/loc", [5, 3]),
              scale=tf.nn.softplus(tf.get_variable("qw_3/scale", [5, 3])))
qb_3 = Normal(loc=tf.get_variable("qb_3/loc", [3]),
              scale=tf.nn.softplus(tf.get_variable("qb_3/scale", [3])))

inference = ed.KLqp({
    W_0: qw_0, b_0: qb_0,
    W_1: qw_1, b_1: qb_1,
    W_2: qw_2, b_2: qb_2,
    W_3: qw_3, b_3: qb_3,
}, data={x_ph: X_train, y: y_train})
inference.run(n_samples=n_samples, n_iter=n_iter,
              logdir='log/{}/{}/{}/{}'.format(symbol,
                                              dataFrequency,
                                              n_samples,
                                              n_iter))

# Posterior predictive: swap the priors for the fitted approximations.
y_post = ed.copy(y, {
    W_0: qw_0, b_0: qb_0,
    W_1: qw_1, b_1: qb_1,
    W_2: qw_2, b_2: qb_2,
    W_3: qw_3, b_3: qb_3,
})
sess = ed.get_session()
predictions = sess.run(y_post, feed_dict={x_ph: X_test})
# This is the call that raises the ValueError quoted above.
print('mse: ', ed.evaluate('mse', data={x_ph: X_test, y: y_test}))
```
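The error message compares a length-145 vector against a 145x3 array, which suggests that one side holds class indices while the other holds one-hot rows. I can't say for sure where that happens inside `ed.evaluate`, but the two label layouts and the conversion between them can be sketched in plain Python (helper names are hypothetical, not Edward API):

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1."""
    return [[1.0 if k == c else 0.0 for k in range(num_classes)] for c in labels]

def argmax_rows(rows):
    """Collapse each row (one-hot label or class probabilities) to a class index."""
    return [max(range(len(r)), key=lambda k: r[k]) for r in rows]

labels = [0, 1, 0, 2]
y_one_hot = one_hot(labels, num_classes=3)   # 4 rows of length 3
y_index = argmax_rows(y_one_hot)             # 4 class indices
```

A draw from a Multinomial with `total_count=1` is itself a one-hot row, so whichever metric is used, both the predictions and the targets need to be in the same one of these two layouts before they are compared.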

However, I can't seem to find a way to replicate a Theano switch statement. Goal: use one distribution over another depending on the particular value sampled at run-time.

If the latent Bernoulli variable is true, use the False Positive Rate Beta variable; if the latent variable is false, sample from the respective Sensitivity Beta.

Thank you for your time, Mr. Scholak! Spacibo

like Figure 1?

Follow up: I don't think I'm able to use tf.where since it evaluates immediately and doesn't wait for inference. The next step for me is to try tf.cond and return the respective dependent distributions from functions. I am able to use theano.tensor.switch for the PyMC3 implementation built on top of Theano.

I was able to find this in the Edward source code:

Use TensorFlow ops such as `tf.cond` to execute subgraphs conditioned on a draw from a random variable.
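As a reference point for what the switch is meant to do, here is the generative logic from the description above in eager, plain Python. It is only a stand-in: in a graph framework the `if` would have to become an op such as `tf.cond` or `theano.tensor.switch`. The prevalence and Beta parameters are invented for illustration.

```python
import random

def sample_rate(rng, prevalence=0.3, fpr_ab=(2.0, 18.0), sens_ab=(18.0, 2.0)):
    """Draw the latent Bernoulli, then sample from the branch it selects."""
    latent = rng.random() < prevalence        # latent Bernoulli draw
    if latent:
        rate = rng.betavariate(*fpr_ab)       # False Positive Rate Beta
    else:
        rate = rng.betavariate(*sens_ab)      # Sensitivity Beta
    return latent, rate

rng = random.Random(0)
draws = [sample_rate(rng) for _ in range(200)]
```

The branch assignment follows the wording of the question (true selects the False Positive Rate Beta); only one of the two Betas is sampled per draw, which is the behavior a conditional op is supposed to reproduce inside the graph.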
this might be more of a TFP question, but hopefully someone here can help: why does this crash? what am I misunderstanding? https://gist.github.com/EvanKrall/fdc4e23e3688c809890d908e70737c9c

the issue seems to be that the Affine bijector has a forward_min_event_ndims of 1 even though the distribution it's operating on is a scalar distribution

If so, do you have a reference to specific runtime results comparing Edward to the other PPLs?

Hi, I am facing an issue while running variational inference using KLqp on an RNN model. Here are the details of my code:

```
import tensorflow as tf
import edward as ed
from edward.models import Normal

# n_i, n_h, n_o, N, x_train and y_train are defined elsewhere (not shown).

def rnn_cell(hprev, x):
    return tf.tanh(ed.dot(hprev, Wh) + ed.dot(x, Wx) + bh)

Wx = Normal(loc=tf.zeros([n_i, n_h]), scale=tf.ones([n_i, n_h]))
Wh = Normal(loc=tf.zeros([n_h, n_h]), scale=tf.ones([n_h, n_h]))
Wy = Normal(loc=tf.zeros([n_h, n_o]), scale=tf.ones([n_h, n_o]))
bh = Normal(loc=tf.zeros(n_h), scale=tf.ones(n_h))
by = Normal(loc=tf.zeros(n_o), scale=tf.ones(n_o))

x = tf.placeholder(tf.float32, [None, n_i], name='x')
h = tf.scan(rnn_cell, x, initializer=tf.zeros(n_h))
y = Normal(loc=tf.matmul(h, Wy) + by, scale=1.0 * tf.ones(N))

qWx = Normal(loc=tf.get_variable("qWx/loc", [n_i, n_h]),
             scale=tf.nn.softplus(tf.get_variable("qWx/scale", [n_i, n_h])))
qWh = Normal(loc=tf.get_variable("qWh/loc", [n_h, n_h]),
             scale=tf.nn.softplus(tf.get_variable("qWh/scale", [n_h, n_h])))
qWy = Normal(loc=tf.get_variable("qWy/loc", [n_h, n_o]),
             scale=tf.nn.softplus(tf.get_variable("qWy/scale", [n_h, n_o])))
qbh = Normal(loc=tf.get_variable("qbh/loc", [n_h]),
             scale=tf.nn.softplus(tf.get_variable("qbh/scale", [n_h])))
qby = Normal(loc=tf.get_variable("qby/loc", [n_o]),
             scale=tf.nn.softplus(tf.get_variable("qby/scale", [n_o])))

inference = ed.KLqp({Wx: qWx, Wh: qWh, Wy: qWy, bh: qbh, by: qby},
                    data={x: x_train, y: y_train})
inference.run(n_iter=1000, n_samples=5)
```

Getting the following error:

```
/Applications/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.pyc in AddWhileContext(self, op, between_op_list, between_ops)
   1257     if grad_state is None:
   1258       # This is a new while loop so create a grad state for it.
-> 1259       outer_forward_ctxt = forward_ctxt.outer_context
   1260       if outer_forward_ctxt:
   1261         outer_forward_ctxt = outer_forward_ctxt.GetWhileContext()

AttributeError: 'NoneType' object has no attribute 'outer_context'
```

I am using TensorFlow 1.7.0 since I faced other issues with using Edward on higher versions. I have seen the same AttributeError reported on Stack Overflow for other use cases of TensorFlow, but have not found a practical solution reported anywhere. Thanks in advance for the help.

Hi, does anyone have an idea about https://discourse.edwardlib.org/t/reproducing-cvae-in-edward-2-and-keras/1074 ? :)

@dustinvtran Are there any examples of MDN for classification on MNIST? I've only seen one example, http://edwardlib.org/tutorials/mixture-density-network, which is about regression on a toy problem from the original paper.

I am trying to run a simple example of Edward (https://github.com/blei-lab/edward/blob/master/examples/bayesian_nn.py), but with TensorFlow 2. In TF2, there is no Edward, but only Edward 2 (https://www.tensorflow.org/probability/api_docs/python/tfp/edward2). Apparently, `KLqp` is not defined in TF2, so the call at line 89 of the example (https://github.com/blei-lab/edward/blob/master/examples/bayesian_nn.py#L89) produces the error `AttributeError: module 'tensorflow_probability.python.edward2' has no attribute 'KLqp'`.