Meteore
@iameteore314
Hi @ankurankan, I've just loaded a BN in .bif format with the read/write functions, but I can't seem to run inference (VE or BP) directly on the model. I either get this error: "TypeError: 'BIFWriter' object is not callable" or "AttributeError: 'BIFWriter' object has no attribute 'check_model'"
How could I run inference with evidence on a .bif model without having to enter all the CPDs by hand? Thanks a lot for your help
Ankur Ankan
@ankurankan
@EthanEX The forward_inference method returns a DiscreteFactor object. If you print it, it should show you a table indicating which probability value corresponds to which state
@iameteore314 If you are using the BIFReader class to read in the model, you also need to call the get_model method to get the actual BayesianNetwork object. Here's an example: https://pgmpy.org/readwrite/bif.html#pgmpy.readwrite.BIF.BIFReader.get_model . Alternatively, you could just use the load method to get the model: https://pgmpy.org/models/bayesiannetwork.html#pgmpy.models.BayesianNetwork.BayesianNetwork.load
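To make this concrete, a minimal sketch, assuming 'model.bif' is your file and 'SomeVariable'/'OtherVariable' are nodes in it (all hypothetical names):

from pgmpy.readwrite import BIFReader
from pgmpy.inference import VariableElimination

# Parse the file, then build the actual BayesianNetwork object
reader = BIFReader('model.bif')
model = reader.get_model()

# Run exact inference with evidence on the loaded model
infer = VariableElimination(model)
result = infer.query(variables=['SomeVariable'],
                     evidence={'OtherVariable': 'some_state'})
print(result)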
Meteore
@iameteore314
Thanks @ankurankan, the load method works well!
EthanEX
@EthanEX
Hi @ankurankan, can I use pgmpy to predict time series? I've constructed a DBN with two time slices and used forward_inference to try to predict the target variable. I'm a beginner, so I don't know whether this approach is correct. By the way, I'd like to know whether pgmpy can be used to construct a DBN with continuous variables? Thank you very much!
Ankur Ankan
@ankurankan
@EthanEX I think you should use the DBNInference.query method instead of forward_inference if you are looking to get prediction probabilities. The forward_inference method is only the first part of running belief propagation; the query method handles both forward and backward inference automatically. And no, only discrete variables are supported for DBNs right now.
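A minimal sketch of the suggested call, assuming dbn is a fully specified DynamicBayesianNetwork and ('X', t) denotes a node in time slice t (hypothetical names):

from pgmpy.inference import DBNInference

dbn_infer = DBNInference(dbn)
# Query the target in time slice 1 given evidence from time slice 0;
# query returns a dict mapping each queried variable to a DiscreteFactor
result = dbn_infer.query(variables=[('X', 1)], evidence={('X', 0): 0})
print(result[('X', 1)])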
EthanEX
@EthanEX
thanks @ankurankan
Meteore
@iameteore314
Hi @ankurankan,
I was just wondering if there is a way to export the output of an exact inference query in a convenient format (like JSON)? Thanks a lot
Ankur Ankan
@ankurankan
@iameteore314 No, there's no way to export to JSON yet. The best way to view the output is to just print it, which shows it as a table.
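In the meantime, a minimal sketch of a manual workaround, assuming result is the DiscreteFactor returned by a query (hypothetical name):

import json

# A DiscreteFactor exposes its variables, state names, and probability table
payload = {
    'variables': [str(v) for v in result.variables],
    'state_names': {str(v): [str(s) for s in states]
                    for v, states in result.state_names.items()},
    'values': result.values.tolist(),
}
print(json.dumps(payload, indent=2))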
Meteore
@iameteore314
Alright! Thank you very much for the quick replies
EthanEX
@EthanEX
@ankurankan Hi, when using the DBNInference.query method I got '[nan nan nan nan nan]'. I don't know why the probabilities could be nan or what that means? Thanks!
Ankur Ankan
@ankurankan
@EthanEX Ideally that shouldn't happen; something is going wrong. Would it be possible to share your code so that I can reproduce it?
Sim Da Yang
@DYSIM

Hi, is there a way to initialize a Bayesian network with independent nodes and state information? Pseudocode, e.g.:
model.add_node("weather", {"good", "bad"})
model.add_node("wind strength", {"strong", "weak"})

After initializing, the nodes should all be independent and the state probabilities all equally weighted.

How can I do this in pgmpy? Thanks!

Ankur Ankan
@ankurankan
@DYSIM Do you mean something like this:
In [1]: from pgmpy.models import BayesianNetwork
    ...: from pgmpy.factors.discrete import TabularCPD

In [2]: model = BayesianNetwork()

In [3]: model.add_nodes_from(['weather', 'wind'])

In [4]: cpd_weather = TabularCPD('weather', 2, [[0.5], [0.5]])

In [5]: cpd_wind = TabularCPD('wind', 2, [[0.5], [0.5]])

In [6]: model.add_cpds(cpd_weather, cpd_wind)

In [7]: model.nodes()
Out[7]: NodeView(('weather', 'wind'))

In [8]: model.cpds
Out[8]: 
[<TabularCPD representing P(weather:2) at 0x7fa4e22976d0>,
 <TabularCPD representing P(wind:2) at 0x7fa4e2297940>]
Sim Da Yang
@DYSIM
@ankurankan Hi, yes, this works for 2 features. I was wondering whether there's a way to scale it up easily. Say, after adding N variables, set the equal-weight CPDs with something like a model.set_equal_weight() call.
Hopefully that makes sense
Ankur Ankan
@ankurankan
@DYSIM There's no direct functionality to do this, but you can create the CPDs in a loop. Since the shape of the values array for any node is (node's cardinality, product of the parent nodes' cardinalities), you can dynamically create the values array for each node.
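A minimal sketch of that loop, assuming the structure is already in model and cards is a user-supplied dict mapping each node to its cardinality (both hypothetical names):

import numpy as np
from pgmpy.factors.discrete import TabularCPD

def add_uniform_cpds(model, cards):
    for node in model.nodes():
        parents = list(model.get_parents(node))
        n_cols = int(np.prod([cards[p] for p in parents]))  # 1 if no parents
        # Every column is a uniform distribution over the node's states
        values = np.full((cards[node], n_cols), 1.0 / cards[node])
        model.add_cpds(TabularCPD(node, cards[node], values,
                                  evidence=parents or None,
                                  evidence_card=[cards[p] for p in parents] or None))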
Duc Minh La
@bobkatla

Hi, I've recently been learning BNs with pgmpy and bnlearn. I'm just wondering whether there's a way to do these things:

  1. Tabu search: I found that pgmpy only implements hill climbing, but I saw that you actually keep a tabu list in the code. So is your implementation a tabu search, or do I need to do some more work to make it one?
  2. Setting up a root node: with tabu search (or even hill climbing), is there a way to make a variable the root node (so no other variables point to it in the DAG)?

Thanks!

Duc Minh La
@bobkatla
You can ignore the second one; I found a way by setting up the blacklist. I just need a confirmation for the first one! Cheers!
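For reference, a minimal sketch of that blacklist approach, assuming df is the training DataFrame and 'R' is the variable to keep as a root (hypothetical names):

from pgmpy.estimators import HillClimbSearch, BicScore

est = HillClimbSearch(df)
# Forbid every edge pointing into 'R', so it can only ever be a root
black_list = [(v, 'R') for v in df.columns if v != 'R']
dag = est.estimate(scoring_method=BicScore(df), black_list=black_list)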
EthanEX
@EthanEX
@ankurankan Is there a limit to the complexity of dynamic Bayesian network inference using pgmpy?
When I use it, I often get various errors once the number of nodes reaches five. I don't understand what is wrong. Thanks
Uthsav Chitra
@uthsavc

Hi @ankurankan, is there a way to speed up Gibbs sampling with a MarkovNetwork? E.g. I want to sample from an Ising model using the following code, but it hangs at the gibbs_chain = GibbsSampling(G) line

import numpy as np
from pgmpy.models import MarkovNetwork
from pgmpy.factors.discrete import DiscreteFactor
from pgmpy.sampling import GibbsSampling

n = 20
thetas = np.random.rand(n)

G = MarkovNetwork()
G.add_nodes_from([str(i) for i in range(n)])

# add single-node theta factors
theta_factors = [DiscreteFactor([str(i)], [2],
                                [np.exp(-1 * thetas[i]), np.exp(thetas[i])])
                 for i in range(n)]
G.add_factors(*theta_factors)

gibbs_chain = GibbsSampling(G)
samples = gibbs_chain.sample(size=1000).to_numpy()

Thank you so much

Ankur Ankan
@ankurankan
@bobkatla Sorry for the super late reply. Yes, the tabu_length forces the algorithm not to revisit previous states by disallowing the reverse of any of the last tabu_length operations (see the sketch below).
@EthanEX I have not empirically analyzed the limits of the algorithm. Would it be possible to share your code? I can have a look at it then.
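A minimal sketch of how that tabu list is exposed, assuming df is the training data (hypothetical name):

from pgmpy.estimators import HillClimbSearch, BicScore

est = HillClimbSearch(df)
# tabu_length sets how many of the most recent operations cannot be
# reversed, which is the tabu behavior described above
dag = est.estimate(scoring_method=BicScore(df), tabu_length=100)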
Ankur Ankan
@ankurankan
@uthsavc The problem seems to be in creating the kernel for the Markov network, as it iterates over all possible combinations of states of the variables. I will have to look up a better way of doing/speeding that up. Could you maybe create an issue for it at https://github.com/pgmpy/pgmpy/issues?
anna_jingwen
@anna_jingwen:matrix.org
@ankurankan: Hi, I wonder whether it's possible for pgmpy to produce the P(A,B|C) CPD table? For example, I want to get the CPD table of observing P(A=a, B=b) conditioned on all cases of C.
Thank you for your help!
Also, this is for a Bayesian network.
anna_jingwen
@anna_jingwen:matrix.org
Sorry to clarify again: I want to query P(A=a,B=b|C) based on a trained BN model. I know how to query for P(A=a,B=b) and P(A=a|C=c). However, I want to know P(A=a,B=b|C), and I have tried to play around with phi but still have no idea how to do this. Thank you for your help.
Ankur Ankan
@ankurankan

@anna_jingwen:matrix.org There's no direct function to compute that, but you can essentially compute it using P(A, B|C) = P(A, B, C) / P(C). In code it would look something like this:

In [30]: from pgmpy.utils import get_example_model

In [31]: from pgmpy.inference import VariableElimination

In [32]: model = get_example_model('alarm')

In [33]: infer = VariableElimination(model)

In [34]: joint_p = infer.query(variables=['VENTLUNG', 'VENTALV', 'ARTCO2'])
Finding Elimination Order: : 100%|██████████| 6/6 [00:00<00:00, 6636.56it/s]
Eliminating: INTUBATION: 100%|██████████| 6/6 [00:00<00:00, 1751.76it/s]

In [35]: p_artco2 = joint_p.marginalize(['VENTLUNG', 'VENTALV'], inplace=False)

In [36]: conditional_p = joint_p / p_artco2

In [37]: conditional_p.get_value(VENTLUNG='LOW', VENTALV='NORMAL', ARTCO2='LOW')
Out[37]: 0.0004884522839265131

# Verify that the value matches when doing a simple inference with evidence
In [39]: infer.query(variables=['VENTLUNG', 'VENTALV'], evidence={'ARTCO2': 'LOW'}).get_value(VENTLUNG='LOW', VENTALV='NORMAL')
Finding Elimination Order: : 100%|██████████| 6/6 [00:00<00:00, 10819.36it/s]
Eliminating: INTUBATION: 100%|██████████| 6/6 [00:00<00:00, 2728.89it/s]
Out[39]: 0.0004884522839265131

Here conditional_p is a DiscreteFactor object instead of a TabularCPD, as TabularCPD can't represent joint distributions.

824761521
@824761521
@nanlliu Have you solved the slowness problem yet?
824761521
@824761521
@ankurankan How do I use the query() function on a dynamic Bayesian network to predict multiple rows of data?
824761521
@824761521
[screenshot attached]
@ankurankan Similar to this, is there any optimization method? It's too slow now.
Ankur Ankan
@ankurankan
@824761521 Not with the query method. A faster approach could be to simulate data from the network (using the simulate method) and use that to compute probabilities. This can speed things up a lot if you need to do multiple queries, but at the cost of approximate results.
824761521
@824761521
@ankurankan Can you give an example based on my code? Thank you very much
Ankur Ankan
@ankurankan

@824761521 Yes, it should be something like this:

# Define a random network
In [30]: from pgmpy.models import DynamicBayesianNetwork as DBN
    ...: from pgmpy.factors.discrete import TabularCPD
    ...: dbn = DBN([(("D", 0), ("G", 0)), (("I", 0), ("G", 0)),
    ...:            (("D", 0), ("D", 1)), (("I", 0), ("I", 1)),])
    ...: diff_cpd = TabularCPD(("D", 0), 2, [[0.6], [0.4]])
    ...: grade_cpd = TabularCPD(variable=("G", 0), variable_card=3,
    ...:                        values=[[0.3, 0.05, 0.9, 0.5],
    ...:                                [0.4, 0.25, 0.08, 0.3],
    ...:                                [0.3, 0.7, 0.02, 0.2]],
    ...:                        evidence=[("I", 0), ("D", 0)],
    ...:                        evidence_card=[2, 2])
    ...: d_i_cpd = TabularCPD(variable=("D", 1), variable_card=2,
    ...:                      values=[[0.6, 0.3], [0.4, 0.7]],
    ...:                      evidence=[("D", 0)],
    ...:                      evidence_card=[2])
    ...: intel_cpd = TabularCPD(("I", 0), 2, [[0.7], [0.3]])
    ...: i_i_cpd = TabularCPD(variable=("I", 1), variable_card=2,
    ...:                      values=[[0.5, 0.4], [0.5, 0.6]],
    ...:                      evidence=[("I", 0)],
    ...:                      evidence_card=[2])
    ...: g_i_cpd = TabularCPD(variable=("G", 1), variable_card=3,
    ...:                      values=[[0.3, 0.05, 0.9, 0.5],
    ...:                              [0.4, 0.25, 0.08, 0.3],
    ...:                              [0.3, 0.7, 0.02, 0.2]],
    ...:                      evidence=[("I", 1), ("D", 1)],
    ...:                      evidence_card=[2, 2])
    ...: dbn.add_cpds(diff_cpd, grade_cpd, d_i_cpd, intel_cpd, i_i_cpd, g_i_cpd)
# Sample from the network
In [31]: samples = dbn.simulate(n_time_slices=10, n_samples=int(1e4))

# Select samples relevant to our query. Let's say we want to query: P((I, 2) | (I, 0) = 1, (D, 1) = 0)
In [33]: rel_samples = samples.loc[(samples.loc[:, [('I', 0)]] == 1).values & (samples.loc[:, [('D', 1)]] == 0).values, [('I', 2)]]
# Then probability values can be calculated as:
In [40]: rel_samples.value_counts() / rel_samples.shape[0]
Out[40]: 
(I, 2)
1         0.568954
0         0.431046
dtype: float64

Depending on how many queries you want to run, you can also specify the evidence directly in the simulate method; it will then generate rel_samples directly, with the downside that you have to regenerate the samples whenever you want to use different evidence. Also, since this method gives approximate results, you can increase the accuracy by increasing the number of samples generated.
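A minimal sketch of that variant, reusing dbn from the example above (the tuple-keyed evidence format is an assumption here):

# Generate only samples consistent with the evidence, then estimate
# P((I, 2) | (I, 0) = 1, (D, 1) = 0) directly from them
cond_samples = dbn.simulate(n_time_slices=10, n_samples=int(1e4),
                            evidence={('I', 0): 1, ('D', 1): 0})
print(cond_samples[('I', 2)].value_counts(normalize=True))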

NickAzeem
@NickAzeem
@ankurankan Hello, I'm using the hill-climb algorithm in pgmpy to learn causal graphs. However, after adding fixed edges it tells me "Unable to allocate 698. GiB for an array with shape (34542774, 2714) and data type float64". My data consists of 33 variables and around 11,000 data points. Is there any way to solve that?
824761521
@824761521
@NickAzeem I had the same problem: Unable to allocate 11.9 GiB for an array with shape (1595932672,) and data type float64
[screenshot attached]
McGee
@McGee-Wang
Hello, I'm a beginner and I've run into trouble. Please tell me how to deal with this.
Jupyter notebook:

ModuleNotFoundError Traceback (most recent call last)

<ipython-input-1-1b4b423c17da> in <module>
2 from pgmpy.models.BayesianModel import BayesianModel
3 #from pgmpy.inference.causal_inference import CausalInference
----> 4 import CausalInference

ModuleNotFoundError: No module named 'CausalInference'

(pytorch) C:\Users\11836>pip install CausalInference
Requirement already satisfied: CausalInference in d:\anaconda3\envs\pytorch\lib\site-packages (0.1.3)
Ankur Ankan
@ankurankan
@NickAzeem @824761521 Would it be possible to share your datasets? I can try to reproduce and see if there's any way to reduce memory usage.
@McGee-Wang For importing the CausalInference class, you should use: from pgmpy.inference import CausalInference
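A minimal sketch of the corrected import in use, assuming model is an existing BayesianNetwork and 'X'/'Y' are nodes in it (hypothetical names):

from pgmpy.inference import CausalInference

ci = CausalInference(model)
# e.g. list the valid backdoor adjustment sets between two variables
print(ci.get_all_backdoor_adjustment_sets('X', 'Y'))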