jonvaljohn
@jonvaljohn
@ankurankan, this is the error I am getting.
Ankur Ankan
@ankurankan
@jonvaljohn Hmm, it's working fine on my machine. Could you tell me your python version and dependency packages' versions?
jonvaljohn
@jonvaljohn
Python 3.7
How do I find the dependency packages' versions?
Thanks for all your help.
Ankur Ankan
@ankurankan
@jonvaljohn You can run: pip freeze | grep -E '(numpy|scipy|pandas|networkx)=='
jonvaljohn
@jonvaljohn
networkx==2.2
numpy==1.16.2
pandas==0.24.2
scipy==1.2.1
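The same version check can also be run from inside Python. This is only a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is part of the standard library, so it would not run on the Python 3.7 mentioned above); `installed_versions` is a hypothetical helper name, not part of pgmpy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Same four packages as in the pip freeze command above.
    for name, ver in installed_versions(["numpy", "scipy", "pandas", "networkx"]).items():
        print(f"{name}=={ver}")
```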
jonvaljohn
@jonvaljohn
Has anyone used Dynamic Bayesian Networks in pgmpy? The provided code implements a 2-TBN, but I don't see a way to iterate over time. What is the best practice for providing new values for the CPDs of the "0" parameters? I assume that the "1" parameters should be updated automatically.
Ankur Ankan
@ankurankan
@jonvaljohn You can't use Variable Elimination for DBN with the current implementation. Try using pgmpy.inference.DBNInference.
jonvaljohn
@jonvaljohn
Thanks so much. Let me try... if I can get all this to work, I could contribute a DBN usage notebook so others can leverage this.
jonvaljohn
@jonvaljohn
from pgmpy.factors.discrete import TabularCPD
from pgmpy.models import DynamicBayesianNetwork as DBN
from pgmpy.inference import DBNInference


dbn = DBN()
# dbn.add_edges_from([(('M', 0), ('S', 0)), (('S', 0), ('S', 1)), (('S', 0), ('J', 0))])
dbn.add_edges_from([(('M', 1), ('S', 1)), (('S', 1), ('J', 1)), (('S', 0), ('S', 1))])
# S_cpd = TabularCPD(('S', 0), 2, [[0.5, 0.5]])  # Prior: the state should go 50/50
M_cpd = TabularCPD(('M', 1), 2, [[0.5, 0.5]])  # Prior: the model should go 50/50

M_S_S_cpd = TabularCPD(variable=('S', 1), variable_card=2,
                       values=[[0.95, 0.3, 0.1, 0.05],
                               [0.05, 0.7, 0.9, 0.95]],
                       evidence=[('M', 1), ('S', 0)],
                       evidence_card=[2, 2])

S_J_cpd = TabularCPD(('J', 1), 2, [[0.9, 0.1],
                                   [0.1, 0.9]],
                     evidence=[('S', 1)],
                     evidence_card=[2])
dbn.add_cpds(M_cpd, M_S_S_cpd, S_J_cpd)
dbn_inf = DBNInference(dbn)
dbn_inf.forward_inference([('J', 2)], {('M', 1): 0, ('M', 2): 0, ('S', 0): 0})

It is still failing. If the variable that carries over from one time slice to the next is a parent within the individual time slice, the code works; otherwise it fails.

Now this gives a KeyError.

Ankur Ankan
@ankurankan
@jonvaljohn Sorry for the late reply. Yes, the current code makes the assumption that the model structure remains the same in each time slice. Are you trying to have a different model structure?
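For reference, the quantity the query above asks for, P(J at slice 2) given S at slice 0 and M at slices 1 and 2, can be worked out by hand with a plain forward pass. The sketch below is pure Python and does not use or reproduce pgmpy's DBNInference. It assumes the CPD columns are ordered with the first evidence variable (M) varying slowest, and `predict_j`, `TRANSITION`, and `EMISSION` are hypothetical names introduced only for illustration.

```python
# P(S_t = s | M_t = m, S_{t-1} = p), indexed as TRANSITION[m][p][s].
# Values copied from M_S_S_cpd above (assumed column order: M first, then S).
TRANSITION = {
    0: {0: [0.95, 0.05], 1: [0.3, 0.7]},
    1: {0: [0.1, 0.9], 1: [0.05, 0.95]},
}
# P(J_t = j | S_t = s), indexed as EMISSION[s][j]; copied from S_J_cpd.
EMISSION = {0: [0.9, 0.1], 1: [0.1, 0.9]}


def predict_j(initial_s, m_sequence):
    """Push the belief over S through each time slice, then return P(J) at the last slice."""
    belief = list(initial_s)
    for m in m_sequence:
        # Sum out the previous state p for each new state s.
        belief = [
            sum(belief[p] * TRANSITION[m][p][s] for p in (0, 1))
            for s in (0, 1)
        ]
    return [sum(belief[s] * EMISSION[s][j] for s in (0, 1)) for j in (0, 1)]


# S at slice 0 is known to be 0; M is observed as 0 at slices 1 and 2.
p_j2 = predict_j([1.0, 0.0], [0, 0])
print(p_j2)
```

Because M is observed in every slice and no J is observed, the earlier J variables marginalise out and only the chain over S matters, which is why a single belief vector is enough here.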
Yujian Liu
@yujianll
Hi, it looks like the reduce() function in TabularCPD and DiscreteFactor takes (var_name, var_state_name) as input. Is there a way to pass (var_name, var_state_no) instead?
I ask because this for loop in pre_compute_reduce() in Sampling.py seems to use (var_name, var_state_no) as input:
for state_combination in itertools.product(
    *[range(self.cardinality[var]) for var in variable_evid]
):
    states = list(zip(variable_evid, state_combination))
    cached_values[state_combination] = variable_cpd.reduce(
        states, inplace=False
    ).values
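One way to bridge the two conventions is to translate state numbers into state names before calling reduce(). The sketch below is pure Python and only illustrates the lookup; it assumes a dict that maps each variable to its ordered list of state names, similar in shape to the `state_names` mapping that pgmpy CPDs carry. `to_state_names` and the example states are hypothetical.

```python
def to_state_names(pairs, state_names):
    """Translate (variable, state_number) pairs into (variable, state_name) pairs."""
    return [(var, state_names[var][state_no]) for var, state_no in pairs]


# Hypothetical state names for two evidence variables.
state_names = {"A": ["low", "high"], "C": ["no", "yes"]}
named = to_state_names([("A", 1), ("C", 0)], state_names)
print(named)
```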
Sandeep Narayanaswami
@sandeep-n
Hey folks!
I'm using pgmpy in a project, and needed to fit a LinearGaussianBayesianNetwork to a dataset. Since the .fit() method isn't yet implemented, I wrote my own using sklearn's LinearRegression.
Would the maintainers be interested in a PR with this implementation? Note that it would be introducing a dependency on sklearn. Or would you prefer an implementation with scipy/statsmodels?
Ankur Ankan
@ankurankan
@sandeep-n Hey, that would be great to have in pgmpy. Ideally, though, I would like the implementation to use statsmodels, mainly because it already implements various fit metrics. With sklearn we would have to write our own methods to compute these. Do you think it would be possible for you to use statsmodels instead of sklearn? Otherwise, if you open a PR with your current implementation, I can work on it to use statsmodels.
Sandeep Narayanaswami
@sandeep-n
@ankurankan Sounds good, I should be able to port it to statsmodels instead.
Ankur Ankan
@ankurankan
@sandeep-n Great. Let me know if I can help in any way :)
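To make the idea concrete: fitting a linear Gaussian CPD amounts to regressing each node on its parents and keeping the coefficients and the residual variance. The sketch below covers only the single-parent case in pure Python. It is not the implementation discussed above, which would use sklearn or statsmodels, and `fit_linear_gaussian` is a hypothetical name.

```python
def fit_linear_gaussian(parent, child):
    """Least-squares fit of child = intercept + slope * parent + Gaussian noise."""
    n = len(parent)
    mean_x = sum(parent) / n
    mean_y = sum(child) / n
    sxx = sum((x - mean_x) ** 2 for x in parent)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(parent, child))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(parent, child)]
    # Maximum-likelihood estimate of the noise variance (divides by n, not n - 2).
    variance = sum(r ** 2 for r in residuals) / n
    return intercept, slope, variance


# Noise-free toy data generated from child = 1 + 2 * parent.
intercept, slope, variance = fit_linear_gaussian([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope, variance)
```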
IshayTelavivi
@IshayTelavivi
Hi! I am new to pgmpy. I have created a model, generated the CPDs, and made a prediction, which is what I needed, and this was cool. However, there are two things I am struggling with: 1. 'predict_probability' doesn't work. It gives me an index error (IndexError: index 11 is out of bounds for axis 0 with size 9), and I can't figure out why. Standard predict works fine. 2. I couldn't find any reference for using a latent variable. How can I include a latent variable? How do I establish its CPD? Suppose my latent variable is "C", which is the outcome of "A" and "B", and the outcome of "C" is "D". How do I combine everything? Thanks
Ankur Ankan
@ankurankan
@IshayTelavivi Could you share your code so that I can reproduce the error? Maybe create an issue for it. Currently pgmpy doesn't support latent variables so you won't be able to do that right now.
felixleopoldo
@felixleopoldo
Hi, I want to import the data from the standard datasets in the R library bnlearn, http://www.bnlearn.com/bnrepository/. Has anyone done this before?
Ankur Ankan
@ankurankan
@felixleopoldo If you mean that you want to import the models, you can use the pgmpy.readwrite.BIFReader for the BIF format files.
Tomislav Kovačević
@tomkovna

Hello,
Is there any way to update the already defined CPDs of a given model with new data points that have missing values? Here's a code example to show more clearly what I'd like to do:

import numpy as np
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator, BayesianEstimator
from pgmpy.factors.discrete import TabularCPD

# define links between nodes
model = BayesianModel([('A', 'B'), ('C', 'B')])

# define some initial CPDs
cpd_a = TabularCPD(variable='A', variable_card=2,
                      values=[[0.9], [0.1]])
cpd_c = TabularCPD(variable='C', variable_card=2,
                       values=[[0.3], [0.7]])
cpd_b = TabularCPD(variable='B', variable_card=3,
                        values=[[0.2, 0.1, 0.25, 0.5],
                                [0.3, 0.5, 0.25, 0.3],
                                [0.5, 0.4, 0.5, 0.2]],
                        evidence=['A', 'C'],
                        evidence_card=[2, 2])

# Associating the parameters with the model structure.
model.add_cpds(cpd_a, cpd_b, cpd_c)

# Checking if the cpds are valid for the model.
model.check_model()

# generate some data, with every value of C missing
raw_a = np.random.randint(low=0, high=2, size=100)
raw_b = np.random.randint(low=0, high=3, size=100)
raw_c = np.empty(100)
raw_c[:] = np.nan
data = pd.DataFrame({"A": raw_a, "B": raw_b, "C": raw_c})

# define pseudo counts according to initial cpds and variable cardinality
pseudo_counts = {'A': [[300], [700]], 'B': [[500,100,100,300], [100,500,300,400], [400,500,100,200]], 'C': [[200], [100]]}

# fit model with new data 
model.fit(data, complete_samples_only=False, estimator=BayesianEstimator, prior_type='dirichlet', pseudo_counts=pseudo_counts)

# print updated cpds
for cpd in model.get_cpds():
    print("CPD of {variable}:".format(variable=cpd.variable))
    print(cpd)

When I try this, I get an error message saying:
"ValueError: The shape of pseudo_counts must be: (3, 0)"

Ankur Ankan
@ankurankan
@tomkovna This should ideally work. Seems like a bug. I will try to fix this and get back to you. Thanks for reporting.
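For context, the Dirichlet update that the snippet above is aiming for adds the pseudo counts to the observed state counts and normalises each column. The sketch below shows that arithmetic in pure Python for complete data only. It does not handle missing values, does not reproduce or explain the reported error, and is not pgmpy's BayesianEstimator; `dirichlet_posterior` and the observed counts are hypothetical.

```python
def dirichlet_posterior(counts, pseudo_counts):
    """Add pseudo counts to observed counts and normalise each column to sum to 1."""
    n_states = len(counts)
    n_cols = len(counts[0])
    if len(pseudo_counts) != n_states or any(len(row) != n_cols for row in pseudo_counts):
        raise ValueError(f"pseudo_counts must have shape ({n_states}, {n_cols})")
    totals = [
        [c + p for c, p in zip(c_row, p_row)]
        for c_row, p_row in zip(counts, pseudo_counts)
    ]
    col_sums = [sum(row[j] for row in totals) for j in range(n_cols)]
    return [[row[j] / col_sums[j] for j in range(n_cols)] for row in totals]


# Hypothetical observed counts for root node A: 60 samples in state 0, 40 in state 1.
observed_a = [[60], [40]]
posterior_a = dirichlet_posterior(observed_a, [[300], [700]])
print(posterior_a)
```

A node with parents has one column per combination of parent states, so for B in the model above both tables would need 3 rows and 4 columns.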