dirichlet
prior with the pseudo_counts
parameter. The default prior_type is BDeu, which uses the equivalent_sample_size
parameter to add a uniform number of counts to each state. If you just pass an additional argument prior_type='dirichlet'
to the fit method, I think you will get the expected results.
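For intuition, the effect of Dirichlet pseudo-counts on the estimates can be sketched in plain Python. This is only a toy illustration of the posterior-mean arithmetic, not pgmpy's actual implementation (the function name posterior_probs is made up):

```python
def posterior_probs(counts, pseudo_counts):
    """Posterior mean of a Dirichlet-multinomial model:
    p_k = (n_k + a_k) / (N + sum(a)), where n_k are the observed
    counts and a_k the Dirichlet pseudo-counts."""
    total = sum(counts) + sum(pseudo_counts)
    return [(n + a) / total for n, a in zip(counts, pseudo_counts)]

# Observed counts for a 3-state variable, plus a uniform pseudo-count
# of 1 per state; the unseen middle state no longer gets probability 0.
probs = posterior_probs([2, 0, 2], [1, 1, 1])
print(probs)  # [3/7, 1/7, 3/7] ≈ [0.43, 0.14, 0.43]
```

With a uniform pseudo-count of 1 per state (a Laplace-style prior), states with zero observed counts still get non-zero probability, which is usually the reason to prefer this over a plain maximum-likelihood estimate.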
Hello, here's my source code.
@ankurankan
import bnlearn
from pgmpy.estimators import BicScore, ExhaustiveSearch
from pgmpy.models import BayesianModel

print("=================== Based on score =================================")
# create random data sample with 3 variables, where Z is dependent on X, Y:
data = bnlearn.import_example()
bic = BicScore(data)
es = ExhaustiveSearch(data, scoring_method=bic)
for score, dag in reversed(es.all_scores()):
    print("\nGet the score and dag:")
    print(bic.score(dag), dag.edges())
    print("\n")

BN = [[('Cloudy', 'Wet_Grass')], [('Wet_Grass', 'Cloudy')], [('Rain', 'Sprinkler')], []]
for bn_ in BN:
    print(bn_)
    bn_ = BayesianModel(bn_)
    print("Get the score and dag:")
    print(BicScore(data).score(bn_))
Here is the result of my code:
The scores from ExhaustiveSearch:
The score of the graph I defined:
@Tracywoou It's happening because the networks are different in the two cases. When using ExhaustiveSearch, the algorithm automatically adds all the variables in the dataset as nodes, even if they don't have any edges. Whereas when you create the BayesianModel from the edge list, only the variables appearing in edges are added as nodes to the network. If you want to get the same results, you can simply add this line:
bn_.add_nodes_from(['Cloudy', 'Wet_Grass', 'Rain', 'Sprinkler'])
after the bn_ = BayesianModel(bn_)
line and you should see the same results.
Thx for solving the problem for me :D
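As an aside, the reason missing nodes change the score is that BIC is decomposable: the network score is a sum of per-node terms, and even a parentless node contributes one. A toy sketch of that decomposition in plain Python (not pgmpy's code; node_bic is a made-up helper for a single parentless node):

```python
import math

def node_bic(counts, n_samples):
    """BIC term of a single parentless node: maximized log-likelihood
    of its marginal minus a (k - 1)/2 * log(N) complexity penalty."""
    n = sum(counts)
    loglik = sum(c * math.log(c / n) for c in counts if c > 0)
    penalty = (len(counts) - 1) / 2 * math.log(n_samples)
    return loglik - penalty

# The graph score is the sum over its nodes, so a network that is
# missing an isolated node is also missing that node's (negative) term.
score_with = node_bic([60, 40], 100) + node_bic([50, 50], 100)
score_without = node_bic([60, 40], 100)
print(score_with < score_without)  # True
```

That is why adding the isolated nodes with add_nodes_from makes the two totals agree.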
@ankurankan Hello, I'm trying structure learning. I followed the example at http://pgmpy.org/estimators.html#mmhc-estimator and I got an error:
model = est.estimate()
  File "D:\python\anaconda\lib\site-packages\pgmpy\estimators\MmhcEstimator.py", line 91, in estimate
    white_list=skel.to_directed().edges(), tabu_length=tabu_length
  File "D:\python\anaconda\lib\site-packages\pgmpy\estimators\HillClimbSearch.py", line 296, in estimate
    key=lambda t: t[1],
ValueError: max() arg is an empty sequence
I couldn't find any similar error reported; please take a look.
AttributeError Traceback (most recent call last)
<ipython-input-34-feb2dc429ef3> in <module>
9 ('STEO.RGDPQ_OECD.M', 'STEO.PATC_OECD.M'),
10 ('STEO.RGDPQ_NONOECD.M', 'STEO.PATC_NON_OECD.M'),]);
---> 11 model = hc.estimate(expert); # Performs local hill climb search
12 model.fit(train_data,state_names=dict(map(lambda e: (e, [0, 1, 2]), datasets)),estimator=BayesianEstimator, prior_type="K2");
~\Anaconda3\lib\site-packages\pgmpy\estimators\HillClimbSearch.py in estimate(self, scoring_method, start_dag, fixed_edges, tabu_length, max_indegree, black_list, white_list, epsilon, max_iter, show_progress)
222 "bicscore": BicScore,
223 }
--> 224 if (scoring_method.lower() not in supported_methods) and (
225 not isinstance(scoring_method, StructureScore)
226 ):
AttributeError: 'BayesianModel' object has no attribute 'lower'
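A hedged guess based on the signature shown in the traceback (estimate(self, scoring_method, start_dag, ...)): passing the model positionally binds it to scoring_method, which is then treated as a score name and fails on .lower(). A minimal generic illustration of that positional-argument pitfall (not pgmpy code; estimate and ExpertModel here are stand-ins):

```python
def estimate(scoring_method="bicscore", start_dag=None):
    # The first positional parameter expects a score *name*; a model
    # object landing here fails on .lower(), as in the traceback above.
    return scoring_method.lower(), start_dag

class ExpertModel:
    """Stand-in for the expert-knowledge BayesianModel."""

expert = ExpertModel()

try:
    estimate(expert)  # positional: the model binds to scoring_method
except AttributeError as err:
    print(err)  # 'ExpertModel' object has no attribute 'lower'

method, dag = estimate(start_dag=expert)  # keyword: the intended binding
print(method)  # bicscore
```

If the intent was to seed the search with the expert graph, passing it by keyword (e.g. start_dag=...) would avoid the mis-binding, assuming the installed pgmpy version supports that parameter.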
estimate
method. The variables shouldn't get eliminated during estimation; it is possible that a variable has no edges, but it should still show up if you call the nodes()
method on the learned model.
from pgmpy.models import DynamicBayesianNetwork as DBN
from pgmpy.factors.discrete import TabularCPD

dbn = DBN()
dbn.add_edges_from([(('D', 0), ('G', 0)), (('I', 0), ('G', 0)),
                    (('D', 0), ('D', 1)), (('I', 0), ('I', 1))])
grade_cpd = TabularCPD(('G', 0), 3, [[0.3, 0.05, 0.9, 0.5],
                                     [0.4, 0.25, 0.08, 0.3],
                                     [0.3, 0.7, 0.02, 0.2]],
                       evidence=[('I', 0), ('D', 0)], evidence_card=[2, 2])
d_i_cpd = TabularCPD(('D', 1), 2, [[0.6, 0.3],
                                   [0.4, 0.7]],
                     evidence=[('D', 0)], evidence_card=[2])
diff_cpd = TabularCPD(('D', 0), 2, [[0.6], [0.4]])
intel_cpd = TabularCPD(('I', 0), 2, [[0.7], [0.3]])
i_i_cpd = TabularCPD(('I', 1), 2, [[0.5, 0.4],
                                   [0.5, 0.6]],
                     evidence=[('I', 0)], evidence_card=[2])
dbn.add_cpds(grade_cpd, d_i_cpd, diff_cpd, intel_cpd, i_i_cpd)
dbn.get_cpds()
[<TabularCPD representing P(('G', 0):3 | ('I', 0):2, ('D', 0):2) at 0x7f90b6bcdf40>,
<TabularCPD representing P(('D', 0):2) at 0x7f90b6bcdf70>,
<TabularCPD representing P(('I', 0):2) at 0x7f90b6bcdfd0>]
In [9]: dbn.get_cpds(time_slice=0)
Out[9]:
[<TabularCPD representing P(('G', 0):3 | ('I', 0):2, ('D', 0):2) at 0x7f914d335700>,
<TabularCPD representing P(('D', 0):2) at 0x7f914d335730>,
<TabularCPD representing P(('I', 0):2) at 0x7f914d3357c0>]
In [10]: dbn.get_cpds(time_slice=1)
Out[10]:
[<TabularCPD representing P(('D', 1):2 | ('D', 0):2) at 0x7f914d335670>,
<TabularCPD representing P(('I', 1):2 | ('I', 0):2) at 0x7f914d335850>]
In [11]: dbn.cpds
Out[11]:
[<TabularCPD representing P(('G', 0):3 | ('I', 0):2, ('D', 0):2) at 0x7f914d335700>,
<TabularCPD representing P(('D', 1):2 | ('D', 0):2) at 0x7f914d335670>,
<TabularCPD representing P(('D', 0):2) at 0x7f914d335730>,
<TabularCPD representing P(('I', 0):2) at 0x7f914d3357c0>,
<TabularCPD representing P(('I', 1):2 | ('I', 0):2) at 0x7f914d335850>]
Good morning, I have a quick question. I'm using structure learning to find structures with the K2Score and BicScore methods. When inspecting the scores afterwards, one model gets the higher K2 score and the other gets the higher BIC score, which makes sense. But how do I know which model fits the dataset better? Is there a common score both share, so that I can compare the two models and find the better one?
@ankurankan do you know how to approach this?
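Not an authoritative answer, but a common approach: scores with different priors and penalties are not directly comparable, whereas held-out log-likelihood is a quantity both fitted models share. A toy sketch with hand-rolled categorical models (not the pgmpy API):

```python
import math

def avg_loglik(probs, data):
    """Average log-likelihood of i.i.d. samples under a categorical model."""
    return sum(math.log(probs[x]) for x in data) / len(data)

# Two candidate models for the same binary variable
model_a = {0: 0.7, 1: 0.3}
model_b = {0: 0.5, 1: 0.5}

held_out = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # 80% zeros
better = max([model_a, model_b], key=lambda m: avg_loglik(m, held_out))
print(better is model_a)  # True
```

The same idea scales to Bayesian networks: fit each learned structure on training data, then compare the average log-likelihood of held-out rows; cross-validation makes the comparison more robust.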