Raza Habib
what have you tried to fix it so far?
Have you tried running it with a very low learning rate? Sometimes it's numerical overflow from large gradients.

Hi all, in the implementation of stochastic gradient Hamiltonian Monte Carlo [@chen2014stochastic] in Edward, we found that the gradient update always uses the full dataset instead of a minibatch, as suggested in the paper. Are we missing something, or does the current implementation really use the full dataset?

Code example: https://github.com/blei-lab/edward/blob/152c19f3080be0826b60fdb57c6d60724e044f2e/edward/inferences/sghmc.py#L110

Paper reference: http://proceedings.mlr.press/v32/cheni14.pdf

Any insight would be appreciated, thank you!
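For reference, the paper's update estimates the posterior gradient from a minibatch, rescaling the likelihood term by N / batch_size. A minimal NumPy sketch of one such step (names and signatures are illustrative, not Edward's API):

```python
import numpy as np

def sghmc_step(theta, v, data, grad_log_prior, grad_log_lik,
               batch_size, step_size, friction, rng):
    """One SGHMC-style step using a minibatch gradient estimate: the
    minibatch likelihood gradient is rescaled by N / batch_size so it is
    an unbiased estimate of the full-data gradient."""
    n = len(data)
    batch = data[rng.choice(n, size=batch_size, replace=False)]
    grad = grad_log_prior(theta) + (n / batch_size) * grad_log_lik(theta, batch).sum(axis=0)
    # Friction plus injected noise compensate for the minibatch gradient noise.
    noise = rng.normal(scale=np.sqrt(2.0 * friction * step_size), size=theta.shape)
    theta = theta + v
    v = v + step_size * grad - friction * v + noise
    return theta, v
```

With friction and noise removed this degenerates to naive stochastic-gradient HMC, which the paper shows is biased; the friction term is what corrects for the gradient noise.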
Thanks for your quick response, @Razcle. I did change learning_rate to 1e-10, but I still got NaNs.
Ahmad Salim Al-Sibahi
Hi, what should I do when sample returns nan?
that is on an empirical distribution
Ahmad Salim Al-Sibahi
Oh, I should have read what it said above
Hello, I am a beginner and I have a question. Can I combine tf.contrib.rnn.BasicLSTMCell and ed.models.Normal together? How should I train my model? Is there an example? Thank you.
Josh Drakee
Hiya, does anyone have example code showing how Edward can be used for sentiment analysis?
Josh L. Espinoza
Any tutorials/books on using Edward, ideally alongside Keras?
Hello guys, what does it mean for a probabilistic programming language to be Turing complete? If it means it can express any computable probability distribution, then can we define an undirected graph in Edward? I think not all undirected graphical models can be represented by directed models, as shown in a figure in Bishop's Pattern Recognition and Machine Learning.
Raza Habib
You can always l1
Shaowu Pan
Hi all, a quick question: the KLqp documentation says it uses variational EM; is that just ADVI on the ELBO?
if you look at the paper https://arxiv.org/abs/1610.09787, p. 14, as an example of hybrid algorithms they show the variational EM algorithm; it's a subclass of variational methods that can be implemented with Edward
Shaowu Pan
@cruyffturn Thanks for the information. So I think the default KLqp is not doing the central algorithm described in the ADVI paper? Today I checked the API; it says KLqp "minimizes the objective by automatically selecting from a variety of black box inference techniques." So it is BBVI, not the better version of VI that uses gradients of the model. ref http://edwardlib.org/api/ed/KLqp
Shaowu Pan
But from another API page, I think the reparameterization gradient in KLqp is doing ADVI, which uses the gradient information, and the score-function estimator is BBVI. http://edwardlib.org/tutorials/klqp
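To illustrate the distinction being drawn here (a toy sketch, not Edward code): both estimators target the same gradient of an expectation, but the score-function (BBVI) estimator only needs log q and works for any objective, while the reparameterization (ADVI-style) estimator differentiates through the sample and typically has much lower variance.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, 1.2
n = 200_000

# Toy objective f(z) = z^2, so E_q[f] = mu^2 + sigma^2 under q = N(mu, sigma^2)
# and the true gradient d/dmu E_q[f] = 2 * mu = 1.0 exactly.
f = lambda z: z ** 2

# Score-function (BBVI) estimator: average f(z) * d/dmu log q(z).
z = rng.normal(mu, sigma, size=n)
grad_score = np.mean(f(z) * (z - mu) / sigma ** 2)

# Reparameterization (ADVI-style) estimator: write z = mu + sigma * eps and
# differentiate f through the sample: d/dmu f(mu + sigma*eps) = 2*(mu + sigma*eps).
eps = rng.normal(size=n)
grad_reparam = np.mean(2.0 * (mu + sigma * eps))
```

Both converge to 1.0, but the score-function estimate needs far more samples for the same accuracy, which is why the reparameterization path is preferred when the model is differentiable.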
Shaowu Pan
Hello guys! Just curious: why do you prefer ADVI over a Laplace approximation at the MAP? The latter is faster and cheaper. The ADVI paper mentions some differences between ADVI and the Laplace approximation, but from my viewpoint ADVI is still mode-seeking, so I don't see what exactly has improved in the decades since the Laplace approximation was introduced.
Shaowu Pan
Update on my understanding of hyperprior settings in Edward: it is assumed that Q(w, alpha) = Q(w) Q(alpha), which is stated in some other papers on VI but not in the Edward/ADVI papers. It is a little odd that alpha is the prior parameter for w, yet their posteriors are assumed to be independent.
Results on some simple linear regressions suggest this approximation produces reasonable results.
However, the most orthodox way (not VI minimizing KL) to do this is to compute P(alpha|D) via P(D|alpha) P(alpha), integrating the parameter out of the likelihood P(D|w,alpha) to get P(D|alpha), and then doing MAP on P(alpha|D) to get the best alpha. Still, I don't see an exact equivalence between this classical approach and ADVI.
Shaowu Pan
I will just leave this here, in case someone runs into similar problems or finds it interesting.
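For concreteness, the two routes being contrasted in the message above can be written out (my own summary, not from the Edward docs):

```latex
% Mean-field assumption on the hyperprior posterior:
q(w, \alpha) = q(w)\, q(\alpha)

% Classical (evidence / type-II MAP) route:
p(D \mid \alpha) = \int p(D \mid w, \alpha)\, p(w \mid \alpha)\, dw,
\qquad
\hat{\alpha} = \arg\max_{\alpha}\; p(D \mid \alpha)\, p(\alpha)
```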
Hello all, I have a very general question; I'm not sure this is the right place to ask it.
Evan Krall
When doing inference, is there a way to map variables to distributions of different shapes? e.g. the prior is a mixture model for each of several variables, but the posterior is a multivariate normal distribution
Evan Krall
I ended up writing a probably-terrible thing for this: https://gist.github.com/EvanKrall/daab4d4abced844e6caef951e7fee06e
matthieu bulté

Hi, I'm looking into issue #271, which is about implementing IS / SMC inference, and I was thinking of the following two options:

  1. Implement an ImportanceSampling class inheriting from MonteCarlo. The build_update method simply computes one sample from the prior and its importance weight (likelihood ratio). At each iteration, I don't update the user's Empirical but store everything internally (let's ignore the how for now), and only populate the user's Empirical in the finalize method by sampling from the self-normalized weighted approximation.
  2. Create a WeightedEmpirical distribution subclassing Empirical (default weights of 1/N) and replace the Empirical requirement in MonteCarlo with WeightedEmpirical. Then at each iteration I can directly populate the WeightedEmpirical and normalize the distribution in the finalize method.

It feels like the first option is more of a hack, but it is easier to implement than the second, which would require refactoring some existing code. Please let me know which of these two options makes more sense, or if you have any comments about them.

Oh, sorry, I forgot to mention that I'm only trying to implement the IS part of the ticket. Implementing SMC straight away would be a little too much for my first contribution ;)
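A rough sketch of what the two options would compute, in plain NumPy (names are hypothetical, no Edward internals):

```python
import numpy as np

def importance_sample(sample_prior, log_weight, n_particles, rng):
    """Option-2 sketch: store (particle, weight) pairs WeightedEmpirical-style,
    self-normalizing the weights at finalize time."""
    particles = np.array([sample_prior(rng) for _ in range(n_particles)])
    log_w = np.array([log_weight(p) for p in particles])
    log_w -= log_w.max()              # stabilize before exponentiating
    w = np.exp(log_w)
    return particles, w / w.sum()     # self-normalized weights

def finalize(particles, w, n_out, rng):
    """Option-1 sketch: populate an ordinary (unweighted) Empirical by
    resampling from the weighted approximation."""
    return particles[rng.choice(len(particles), size=n_out, p=w)]
```

The resampling in `finalize` is what makes option 1 compatible with the existing Empirical-based MonteCarlo machinery, at the cost of some extra Monte Carlo error relative to keeping the weights around.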
Evan Krall
looks like the Independent distribution is what I'm looking for, though it seems to require that everything is identically distributed too
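As far as I can tell, Independent only requires the components to share a family, not identical parameters: it reinterprets batch dimensions as event dimensions by summing the per-component log-probs. A NumPy sketch of what it computes for a batch of Normals with different parameters (my illustration, not TFP source):

```python
import numpy as np

def independent_normal_log_prob(x, locs, scales):
    """Joint log-density of independent Normals with per-component
    parameters: the sum of the component log-probs over the last axis,
    mirroring tfd.Independent(tfd.Normal(locs, scales),
    reinterpreted_batch_ndims=1).log_prob(x)."""
    lp = -0.5 * ((x - locs) / scales) ** 2 - np.log(scales * np.sqrt(2.0 * np.pi))
    return lp.sum(axis=-1)
```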
Matthew Feickert
Hi. In Edward there is ed.models.Empirical. Does anyone know the corresponding thing in Edward2 or TensorFlow Probability? Maybe as_random_variable?
Matthew Feickert
@cruyffturn I already read that, but it wasn't entirely clear to me why we would want to use softplus. I can reread it, though; thanks for taking the time to respond.

I have a problem where I have an architecture vaguely similar to an auto-encoder, but I want the encoder to be probabilistic.

I think I need this because the loss function I'm optimizing has a few 'hot spots' (good initial conditions) and many very 'cold spots' (zero gradient).

So, if I treat the output as something deterministic, just by the luck of initialization very few (maybe zero) of the encoder outputs may be hot spots. But if they are treated as a distribution with enough variance to cover the hot spots, I should be able to sample good encodings to find a good trajectory to optimize (as well as push the encoder distribution further towards hot spots during training).

Does this sound suited for probabilistic programming, and does anyone have any advice based on this description?

so in essence I would like to treat the output of my encoder as parameters of a distribution, and calculate my loss based on (potentially many) samples from that distribution
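A minimal sketch of that idea (all names illustrative): the encoder emits (mu, log_sigma), and the loss is a Monte Carlo average over reparameterized samples of the code, which is what would keep it differentiable in an autodiff framework.

```python
import numpy as np

def expected_loss(mu, log_sigma, loss_fn, n_samples, rng):
    """Monte Carlo estimate of E[loss(code)] with code ~ N(mu, sigma^2).
    Reparameterized sampling (mu + sigma * eps) is what would let gradients
    flow back into the encoder parameters under autodiff."""
    sigma = np.exp(log_sigma)
    eps = rng.normal(size=(n_samples,) + np.shape(mu))
    codes = mu + sigma * eps
    return np.mean([loss_fn(c) for c in codes])
```

With enough variance, some sampled codes land in the "hot spots" even when the mean does not, which is exactly the behavior described above.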
When I import Edward: ImportError: cannot import name 'set_shapes_for_outputs'
Ahmet Can Acar

Greetings, I'm trying to use the Multinomial distribution in Edward to predict multiclass labels (3 classes) with a neural network. I'm confused about:

1) How should I shape the label dataset? I chose to convert labels from [[0],[1],[0],[2]] to one-hot vectors [[1,0,0],[0,1,0],[1,0,0],[0,0,1]].
2) What are the conditions for my total_count arg in Multinomial not being equal to 1 when I use probs instead of logits?

I'm not too familiar with Multinomial; I used Bernoulli easily, but I can't handle the multiclass network :/

After training, I got predictions on the test data, but when I try to evaluate MSE I get an error:
ValueError: Dimensions must be equal, but are 145 and 3 for 'sub_2' (op: 'Sub') with input shapes: [145], [145,3].
Here is my code; if you see anything I'm missing, I'd be very happy to get advice.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf
import edward as ed
from edward.models import Normal, Multinomial

num_labels = 3
(n_samples, n_iter) = (30, 2500)
symbol = 'A'
dataFrequency = '10'

# X and Y (features and integer labels) are loaded elsewhere; omitted here.
X, Y = np.array(X), np.array(Y)
Y =(np.arange(num_labels) == Y[:,None]).astype(np.float32)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)

# X_train.shape  (578, 120)
# y_train.shape  (578, 3)
# X_test.shape  (145, 120)
# y_test.shape  (145, 3)

def neural_network(x):
    h = tf.tanh(tf.matmul(x, W_0) + b_0)
    h = tf.tanh(tf.matmul(h, W_1) + b_1)
    h = tf.tanh(tf.matmul(h, W_2) + b_2)
    h = tf.matmul(h, W_3) + b_3
    nn_result = tf.nn.softmax(h)
    return nn_result

D = X_train.shape[1]
N = X_train.shape[0]
N2 = X_test.shape[0]

W_0 = Normal(loc=tf.zeros([D, 10]), scale=tf.ones([D, 10]))
W_1 = Normal(loc=tf.zeros([10, 10]), scale=tf.ones([10, 10]))
W_2 = Normal(loc=tf.zeros([10, 5]), scale=tf.ones([10, 5]))
W_3 = Normal(loc=tf.zeros([5, 3]), scale=tf.ones([5, 3]))
b_0 = Normal(loc=tf.zeros(10), scale=tf.ones(10))
b_1 = Normal(loc=tf.zeros(10), scale=tf.ones(10))
b_2 = Normal(loc=tf.zeros(5), scale=tf.ones(5))
b_3 = Normal(loc=tf.zeros(3), scale=tf.ones(3))

x_ph = tf.placeholder(tf.float32, [None, D])
y = Multinomial(probs=neural_network(x_ph), total_count=1.)

qw_0 = Normal(loc=tf.get_variable("qw_0/loc", [D, 10]),
              scale=tf.nn.softplus(tf.get_variable("qw_0/scale", [D, 10])))
qb_0 = Normal(loc=tf.get_variable("qb_0/loc", [10]),
              scale=tf.nn.softplus(tf.get_variable("qb_0/scale", [10])))
qw_1 = Normal(loc=tf.get_variable("qw_1/loc", [10, 10]),
              scale=tf.nn.softplus(tf.get_variable("qw_1/scale", [10, 10])))
qb_1 = Normal(loc=tf.get_variable("qb_1/loc", [10]),
              scale=tf.nn.softplus(tf.get_variable("qb_1/scale", [10])))
qw_2 = Normal(loc=tf.get_variable("qw_2/loc", [10, 5]),
              scale=tf.nn.softplus(tf.get_variable("qw_2/scale", [10, 5])))
qb_2 = Normal(loc=tf.get_variable("qb_2/loc", [5]),
              scale=tf.nn.softplus(tf.get_variable("qb_2/scale", [5])))
qw_3 = Normal(loc=tf.get_variable("qw_3/loc", [5, 3]),
              scale=tf.nn.softplus(tf.get_variable("qw_3/scale", [5, 3])))
qb_3 = Normal(loc=tf.get_variable("qb_3/loc", [3]),
              scale=tf.nn.softplus(tf.get_variable("qb_3/scale", [3])))

inference = ed.KLqp({
    W_0: qw_0, b_0: qb_0,
    W_1: qw_1, b_1: qb_1,
    W_2: qw_2, b_2: qb_2,
    W_3: qw_3, b_3: qb_3,
}, data={x_ph: X_train, y: y_train})
inference.run(n_samples=n_samples, n_iter=n_iter)

y_post = ed.copy(y, {
    W_0: qw_0, b_0: qb_0,
    W_1: qw_1, b_1: qb_1,
    W_2: qw_2, b_2: qb_2,
    W_3: qw_3, b_3: qb_3,
})

sess = ed.get_session()
predictions = sess.run(y_post, feed_dict={x_ph: X_test})

print('mse: ', ed.evaluate('mse', data={x_ph: X_test, y: y_test}))
Gaurav Shrivastava
Hi there, I'm not sure this is the right place to ask, but can anybody direct me to an example (script) of variational Gaussian processes or hierarchical variational models?
vincenzoserio Yo ;)
Dmitriy Voronin
Hey guys, checking in from Richmond VA! Anybody have a second to lend an ear?
Dmitriy Voronin
@tscholak Hey, thanks for that great talk last year on Edward. Do you have a moment?
Torsten Scholak
sure, what’s up?
Dmitriy Voronin
I am trying to move my Bayesian network from a PyMC3 implementation to Edward, since Theano isn't able to handle the complexity of the network.
However, I can't seem to find a way to replicate a Theano switch statement. The goal: use one distribution over another given the particular value sampled at run time.
Torsten Scholak
a mixture model with two different base distributions?
Dmitriy Voronin
Latent Truth Model
If the latent Bernoulli variable is true, use the false-positive-rate Beta variable; if it is false, sample from the respective sensitivity Beta.
Thank you for your time, Mr. Scholak! Spasibo
Torsten Scholak
sounds like a mixture to me
like Figure 1?
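Generatively, the switch described above is indeed a two-component mixture. A NumPy sketch (parameter names are mine, not from the Latent Truth Model paper):

```python
import numpy as np

def latent_switch_sample(p_true, fpr_ab, sens_ab, size, rng):
    """Sample the 'switch': draw the latent Bernoulli, then take the
    false-positive-rate Beta where it is True and the sensitivity Beta
    where it is False. Marginally this is a two-component mixture."""
    z = rng.random(size) < p_true            # latent Bernoulli indicator
    fpr = rng.beta(*fpr_ab, size)            # Beta(a, b) for the FPR branch
    sens = rng.beta(*sens_ab, size)          # Beta(a, b) for the sensitivity branch
    return np.where(z, fpr, sens)
```

In Edward/TFP terms this would correspond to a Mixture with a Bernoulli (or two-class Categorical) mixing distribution over the two Betas, matching Torsten's reading.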