Ankur Ankan
@ankurankan
@Tracywoou Hi, that doesn't sound right. If the data set is the same, the score should also be the same. Could you also share your code, so I can reproduce this?
Tracywoou
@Tracywoou

Hello, here's my source code.
@ankurankan

print("=================== Based on score =================================")
# create random data sample with 3 variables, where Z is dependent on X, Y:
data = bnlearn.import_example()
bic = BicScore(data)

es = ExhaustiveSearch(data, scoring_method=bic)

for score, dag in reversed(es.all_scores()):
    print("\n Get the score and dag:")
    print(bic.score(dag), dag.edges())

print("\n")
BN = [[('Cloudy', 'Wet_Grass')], [('Wet_Grass', 'Cloudy')], [('Rain', 'Sprinkler')], []]
for bn_ in BN:
    print(bn_)
    bn_ = BayesianModel(bn_)
    print("Get the score and dag:")
    print(BicScore(data).score(bn_))

Here is the result of my code:

The scores from ExhaustiveSearch:

[screenshot: Snipaste_2020-11-10_17-36-58]

The scores of the graphs I defined:

[screenshot: Snipaste_2020-11-10_17-43-36]

Ankur Ankan
@ankurankan
@Tracywoou It's happening because the networks are different in the two cases. When using ExhaustiveSearch, the algorithm automatically adds all the variables in the dataset as nodes, even if they don't have any edges, whereas when you create the BayesianModel from the edge list, only those variables are added as nodes to the network. If you want the same results, simply add the line bn_.add_nodes_from(['Cloudy', 'Wet_Grass', 'Rain', 'Sprinkler']) after the bn_ = BayesianModel(bn_) line, and you should see the same scores.
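A minimal sketch of the fixed loop from the code above:

for bn_ in BN:
    print(bn_)
    bn_ = BayesianModel(bn_)
    # add every dataset variable as a node, matching ExhaustiveSearch:
    bn_.add_nodes_from(['Cloudy', 'Wet_Grass', 'Rain', 'Sprinkler'])
    print("Get the score and dag:")
    print(BicScore(data).score(bn_))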
Tracywoou
@Tracywoou

Thx for solving the problem for me :D

LiuCNStephen
@LiuCNStephen

@ankurankan Hello, I'm trying structure learning. I followed the example at http://pgmpy.org/estimators.html#mmhc-estimator and got an error:

model = est.estimate()
  File "D:\python\anaconda\lib\site-packages\pgmpy\estimators\MmhcEstimator.py", line 91, in estimate
    white_list=skel.to_directed().edges(), tabu_length=tabu_length
  File "D:\python\anaconda\lib\site-packages\pgmpy\estimators\HillClimbSearch.py", line 296, in estimate
    key=lambda t: t[1],
ValueError: max() arg is an empty sequence

I couldn't find any similar error reported; could you please take a look?

Ankur Ankan
@ankurankan
@LiuCNStephen Hi, this was a bug in the last release, but it has been fixed in the latest dev branch. So if you reinstall pgmpy from GitHub, this should be fixed.
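A typical way to do that (a sketch, assuming the fix is on the dev branch as mentioned):

pip install git+https://github.com/pgmpy/pgmpy.git@dev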
LiuCNStephen
@LiuCNStephen
@ankurankan Hello, I installed the latest version and there is no error anymore, but I get an empty model as the result. I saw the same issue in #1329. Does this mean that MMHC can't be used right now? Thanks
Ankur Ankan
@ankurankan
@LiuCNStephen Yes, sorry, I haven't been able to figure out that bug yet, so it will have to wait a while.
LiuCNStephen
@LiuCNStephen
@ankurankan OK, thanks, looking forward to it.
Uzumaki Naruto
@nguyenduyhanlam
I'm a newbie to probabilistic graphical models. I have tried searching Google, but I couldn't find any clear tutorial on how to build an HMM to classify sounds based on their labels and MFCC features. Does anyone know how I can do this? Thanks.
Uzumaki Naruto
@nguyenduyhanlam
Hi, how can I display the output of MaximumLikelihoodEstimator.estimate_cpd when it's too large for Google Colab or an IDE to render?
[screenshot attached]
Thanks.
LiuCNStephen
@LiuCNStephen
@ankurankan Hello, I'm learning BN structure from data using the hill climb algorithm, and I see there is a "max_indegree" parameter in it. Here's my problem: there is one particularly important node that I don't want the "max_indegree" limit to apply to, while for all the other nodes I do want to limit the indegree. Is there any way I can do this? Thank you.
Ankur Ankan
@ankurankan
@LiuCNStephen Sorry, but I don't think there is any direct way to do this. A possible workaround could be to first learn the model with the max indegree specified, and then manually add different combinations of edges to the node whose indegree you don't want limited, comparing the structure scores of these combinations; see the sketch below. But this approach might lead to suboptimal solutions. The other way would be to modify the source code, which I think should be quite straightforward.
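A rough sketch of that first suggestion, assuming data is a pandas DataFrame with the training data and special is the name of the unrestricted node (both hypothetical placeholders; argument placement may differ between pgmpy versions):

from itertools import combinations

from pgmpy.estimators import BicScore, HillClimbSearch
from pgmpy.models import BayesianModel

# Step 1: learn a model with the indegree limit applied to every node.
hc = HillClimbSearch(data)
base = hc.estimate(scoring_method=BicScore(data), max_indegree=3)

# Step 2: try different parent sets for the special node, keep the best.
bic = BicScore(data)
candidates = [v for v in data.columns if v != special]
best_score, best_model = None, None
for k in range(len(candidates) + 1):  # exponential; fine for small sets
    for parents in combinations(candidates, k):
        model = BayesianModel(base.edges())
        model.add_nodes_from(data.columns)
        model.remove_edges_from(list(model.in_edges(special)))
        try:
            model.add_edges_from((p, special) for p in parents)
        except ValueError:  # this parent set would create a cycle
            continue
        score = bic.score(model)
        if best_score is None or score > best_score:
            best_score, best_model = score, model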
LiuCNStephen
@LiuCNStephen
@ankurankan OK, thanks for the advice, I'll give them a try.
Kalvik
@kdkalvik
Hi, is there a way to model a variable's density as a Bernoulli distribution in pgmpy?
Ankur Ankan
@ankurankan
@kdkalvik Hi, no it's not possible yet.
Kalvik
@kdkalvik
@ankurankan Got it, thanks.
Pramod P Nair
@medackan_twitter
Please help me rectify this error.

# Initialise Hill Climbing Estimator
hc = HillClimbSearch(train_data, scoring_method=K2Score(train_data));
expert = BayesianModel();
expert.add_nodes_from(datasets);
expert.add_edges_from([('STEO.PAPR_NONOPEC.M', 'WTISPLC'),
                       ('STEO.PAPR_OPEC.M', 'WTISPLC'),
                       ('STEO.PATC_OECD.M', 'WTISPLC'),
                       ('STEO.PATC_NON_OECD.M', 'WTISPLC'),
                       ('STEO.RGDPQ_OECD.M', 'STEO.PATC_OECD.M'),
                       ('STEO.RGDPQ_NONOECD.M', 'STEO.PATC_NON_OECD.M'),]);
model = hc.estimate(expert); # Performs local hill climb search
model.fit(train_data, state_names=dict(map(lambda e: (e, [0, 1, 2]), datasets)), estimator=BayesianEstimator, prior_type="K2");

The error message is:

AttributeError                            Traceback (most recent call last)
<ipython-input-34-feb2dc429ef3> in <module>
      9     ('STEO.RGDPQ_OECD.M', 'STEO.PATC_OECD.M'),
     10     ('STEO.RGDPQ_NONOECD.M', 'STEO.PATC_NON_OECD.M'),]);
---> 11 model = hc.estimate(expert); # Performs local hill climb search
     12 model.fit(train_data, state_names=dict(map(lambda e: (e, [0, 1, 2]), datasets)), estimator=BayesianEstimator, prior_type="K2");

~\Anaconda3\lib\site-packages\pgmpy\estimators\HillClimbSearch.py in estimate(self, scoring_method, start_dag, fixed_edges, tabu_length, max_indegree, black_list, white_list, epsilon, max_iter, show_progress)
    222             "bicscore": BicScore,
    223         }
--> 224         if (scoring_method.lower() not in supported_methods) and (
    225             not isinstance(scoring_method, StructureScore)
    226         ):

AttributeError: 'BayesianModel' object has no attribute 'lower'

Ankur Ankan
@ankurankan
@medackan_twitter You need to use the argument name to specify the start_dag: model = hc.estimate(start_dag=expert). Without the argument name, Python will by default pass expert as the scoring_method argument (the first positional parameter of estimate), which is why it ends up calling .lower() on your BayesianModel.
Sruthi Radhakrishnan
@Sruthi5797
Hi, I'm trying to perform a hill-climb search using the BIC score. Please help me clarify: when you perform a hill climb search, how many variables are actually supported? Is there any possibility of a variable being eliminated in the process? If yes, then why? Thanks
Ankur Ankan
@ankurankan
@Sruthi5797 Hi, please have a look at the code example here for how to use it: http://pgmpy.org/estimators.html#pgmpy.estimators.HillClimbSearch. There is no limit on the number of supported variables, but the computation time will increase exponentially with the number of variables. To reduce the time, you can always put limitations on the model structures by using the various arguments of the estimate method. Variables shouldn't get eliminated during estimation; it is possible that a variable ends up with no edges, but it should still show up if you call the nodes() method on the learned model.
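For example (a sketch, where data is a hypothetical pandas DataFrame), a variable that ends up with no edges is still a node of the learned model:

from pgmpy.estimators import BicScore, HillClimbSearch

hc = HillClimbSearch(data)
model = hc.estimate(scoring_method=BicScore(data), max_indegree=2)
# every column of the data survives as a node, connected or not:
assert set(model.nodes()) == set(data.columns)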
LiuCNStephen
@LiuCNStephen
@ankurankan Hi, I'm using hill-climb search. I added a black_list before and the algorithm worked just fine, but when I added a fixed_edges list, it raised "MemoryError", even though the length of the fixed edges list was below 10. I'm wondering why.
Ankur Ankan
@ankurankan
@LiuCNStephen That's a bit weird, because with fixed edges the algorithm just does an extra check so that it doesn't change any of the fixed edges. Also, hill climb is not a memory-intensive algorithm, so I am not sure why it would run out of memory. How many variables are you using, and what's the size of your dataset?
LiuCNStephen
@LiuCNStephen
@ankurankan I'm using around 50 variables and 30,000+ rows. It is very weird, as the algorithm worked fine when I didn't add any fixed edges.
Ankur Ankan
@ankurankan
@LiuCNStephen So, I tried running Hill Climb with a dataset of a similar size to what you mentioned (with some fixed edges), and it takes around 200 MB of memory. Could you check whether some other process is using up the memory on your machine?
MathijsPost
@MathijsPost
Hi, in HillClimbSearch, what is the difference between starting with a start_dag and specifying fixed_edges?
Ankur Ankan
@ankurankan
@MathijsPost start_dag gives the algorithm the initial model from where it starts the search. Giving fixed_edges ensures that the final output always has those edges.
MathijsPost
@MathijsPost
@ankurankan does that mean that the algorithm could delete edges from the start_dag? I am wondering, if the start_dag and fixed_edges are the same, will the outcome be different if you use one or the other?
Ankur Ankan
@ankurankan
@MathijsPost Yes, that's correct. The edges in the start_dag can be modified by the algorithm. You can think of fixed_edges as the same as start_dag, but with a flag telling the search not to change those edges.
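A minimal sketch contrasting the two arguments, with data and edges as hypothetical placeholders (note that a start_dag has to contain every variable of the data set):

from pgmpy.estimators import HillClimbSearch
from pgmpy.models import BayesianModel

hc = HillClimbSearch(data)

start = BayesianModel(edges)
start.add_nodes_from(data.columns)   # start_dag needs all variables as nodes

m1 = hc.estimate(start_dag=start)    # the search may add/remove these edges
m2 = hc.estimate(fixed_edges=edges)  # these edges survive in the output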
MathijsPost
@MathijsPost
@ankurankan I see! Thank you :)
Martin Manolov
@martin.manolov96_gitlab
@ankurankan Hey, is there a way to print out the phi values of the outcome as an array/list instead of a table, after using VariableElimination and an inference query on my model?
Gaoxiang Zhou
@Gavin_Chou1994_twitter
@ankurankan Hey, I was trying to add inter-slice CPDs just as in the tutorial. With all the dependency packages installed, I was only able to add intra-slice CPDs, as shown below. The get_cpds() function returns no error and just ignores the inter-slice CPDs.
from pgmpy.models import DynamicBayesianNetwork as DBN
from pgmpy.factors.discrete import TabularCPD
dbn = DBN()
dbn.add_edges_from([(('D', 0),('G', 0)),(('I', 0),('G', 0)),(('D', 0),('D', 1)),(('I', 0),('I', 1))])
grade_cpd = TabularCPD(('G', 0), 3, [[0.3, 0.05, 0.9, 0.5],
                                     [0.4, 0.25, 0.8, 0.03],
                                     [0.3, 0.7, 0.02, 0.2]],
                       [('I', 0),('D', 0)], [2, 2])
d_i_cpd = TabularCPD(('D', 1), 2, [[0.6, 0.3],
                                  [0.4, 0.7]],
                     [('D',0)], [2])
diff_cpd = TabularCPD(('D', 0), 2, [[0.6], [0.4]])
intel_cpd = TabularCPD(('I', 0), 2, [[0.7], [0.3]])
i_i_cpd = TabularCPD(('I', 1), 2, [[0.5, 0.4],
                                   [0.5, 0.6]],
                     [('I', 0)], [2])
dbn.add_cpds(grade_cpd, d_i_cpd, diff_cpd, intel_cpd, i_i_cpd)
dbn.get_cpds()
[<TabularCPD representing P(('G', 0):3 | ('I', 0):2, ('D', 0):2) at 0x7f90b6bcdf40>,
 <TabularCPD representing P(('D', 0):2) at 0x7f90b6bcdf70>,
 <TabularCPD representing P(('I', 0):2) at 0x7f90b6bcdfd0>]
Ankur Ankan
@ankurankan
@martin.manolov96_gitlab You can try the .values attribute of the returned CPD. It will give you a numpy array.
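For example (a sketch, assuming model is a fitted BayesianModel; the variable names are hypothetical):

from pgmpy.inference import VariableElimination

infer = VariableElimination(model)
phi = infer.query(variables=['X'], evidence={'Y': 0})
print(phi.values)  # the phi values as a numpy.ndarray instead of a table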
Ankur Ankan
@ankurankan
@Gavin_Chou1994_twitter I think it's behaving as expected. The get_cpds method takes a time_slice argument whose default value is 0; hence you are seeing only the CPDs with nodes in the 0th slice. If you want all the CPDs, you can use the .cpds attribute.
In [9]: dbn.get_cpds(time_slice=0)
Out[9]: 
[<TabularCPD representing P(('G', 0):3 | ('I', 0):2, ('D', 0):2) at 0x7f914d335700>,
 <TabularCPD representing P(('D', 0):2) at 0x7f914d335730>,
 <TabularCPD representing P(('I', 0):2) at 0x7f914d3357c0>]

In [10]: dbn.get_cpds(time_slice=1)
Out[10]: 
[<TabularCPD representing P(('D', 1):2 | ('D', 0):2) at 0x7f914d335670>,
 <TabularCPD representing P(('I', 1):2 | ('I', 0):2) at 0x7f914d335850>]

In [11]: dbn.cpds
Out[11]: 
[<TabularCPD representing P(('G', 0):3 | ('I', 0):2, ('D', 0):2) at 0x7f914d335700>,
 <TabularCPD representing P(('D', 1):2 | ('D', 0):2) at 0x7f914d335670>,
 <TabularCPD representing P(('D', 0):2) at 0x7f914d335730>,
 <TabularCPD representing P(('I', 0):2) at 0x7f914d3357c0>,
 <TabularCPD representing P(('I', 1):2 | ('I', 0):2) at 0x7f914d335850>]
PhilrainV
@PhilrainV
Is there anyone who has an implementation of the BNC-PSO algorithm? Please help me <3
MathijsPost
@MathijsPost
Good morning, I have a quick question. I'm using structure learning to find structures with the K2Score and the BicScore methods. When inspecting the scores afterwards, one model gives a higher K2 score and the other gives a higher BIC score, which makes sense. But how do I know which model fits the dataset better? Is there a common score the two share, to compare the models and find the better one?
Meteore
@iameteore314
@ankurankan Hi, I've been looking for resources (documentation, threads) on how to deploy a pgmpy BN/DBN on Google Cloud Platform in order to run predictions at scale. Any recommendations? Would mean a lot, thanks 😊
*by predictions I mean inferences
Ankur Ankan
@ankurankan
@iameteore314 I don't know of anyone who has deployed pgmpy at scale, so I can't say whether it would bring any challenges. I think it should work like any normal Python package. One point to keep in mind is that predictions will be quite slow compared to standard machine learning algorithms, but it might help to do batch predictions (because of result caching).
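For instance, a batch prediction would look something like this (a sketch, assuming model is a fitted BayesianModel and test_df is a DataFrame in which the column(s) to predict are missing):

predictions = model.predict(test_df)  # one call for the whole batch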
Meteore
@iameteore314
@ankurankan thanks for your quick reply! I’ll try it out, and document this project to let you know what comes up. All the best, Meteore.
MathijsPost
@MathijsPost

@ankurankan do you know how to approach my question above?

Ankur Ankan
@ankurankan
@MathijsPost I would suggest doing something like what we discussed in #1361; I don't know of any better methods for model comparison. I also recently worked on a paper on testing model structures against data, which might be helpful: https://currentprotocols.onlinelibrary.wiley.com/doi/pdfdirect/10.1002/cpz1.45
MathijsPost
@MathijsPost
Thank you! @ankurankan