axi672
@axi672
@ankurankan Thank you, that works!
axi672
@axi672

> @axi672 Thanks for reporting this. I have updated the notebook now to work with the latest version of the codebase. If you still get any errors, let me know along with your code, so that I can reproduce it.

Hi @ankurankan I just tested reading back the file written with ProbModelXMLWriter and get the following error:

/usr/local/lib/python3.6/dist-packages/pgmpy/readwrite/ProbModelXML.py in add_edge(self, edge)
    981         >>> reader.add_edge(edge)
    982         """
--> 983         var1 = edge.findall("Variable")[0].attrib["name"]
    984         var2 = edge.findall("Variable")[1].attrib["name"]
    985         self.probnet["edges"][(var1, var2)] = {}

IndexError: list index out of range

Code I’m running:

model_data = get_probmodel_data(model)
writer = ProbModelXMLWriter(model_data=model_data)
writer.write_file('test.pgmx')

from pgmpy.readwrite import ProbModelXMLReader
reader_string = ProbModelXMLReader('test.pgmx')

The model is the one from the notebook https://github.com/pgmpy/pgmpy_notebook/blob/master/notebooks/8.%20Reading%20and%20Writing%20from%20pgmpy%20file%20formats.ipynb

XMLBeliefNetwork also gives the following error

/usr/local/lib/python3.6/dist-packages/pgmpy/readwrite/XMLBeliefNetwork.py in get_static_properties(self)
     97         return {
     98             tags.tag: tags.get("VALUE")
---> 99             for tags in self.bnmodel.find("STATICPROPERTIES")
    100         }
    101 

TypeError: 'NoneType' object is not iterable
zyh530
@zyh530
How do I get a value from the result of a query?
Ankur Ankan
@ankurankan
@zyh530 If you are looking to get values from a DiscreteFactor object, you can just print it to see the values, or look at its values attribute to get them as a numpy array.
The numpy array will be n-dimensional (n is the number of variables), and each axis represents the values of the corresponding variable.
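A minimal sketch of what that looks like (the two-node model and variable names below are made up for illustration, and this assumes a pgmpy version where query returns a single DiscreteFactor):

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Hypothetical two-node model, only to show how to read values off a query result
model = BayesianModel([('Rain', 'Traffic')])
model.add_cpds(
    TabularCPD('Rain', 2, [[0.7], [0.3]]),
    TabularCPD('Traffic', 2, [[0.9, 0.4], [0.1, 0.6]],
               evidence=['Rain'], evidence_card=[2]),
)
infer = VariableElimination(model)
result = infer.query(variables=['Traffic'])  # a DiscreteFactor
print(result)            # pretty-printed table
print(result.values)     # numpy array with one axis per variable in the factor
print(result.values[1])  # probability of Traffic being in its second state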
zyh530
@zyh530
Can you look at your mailbox? I sent you an email with screenshots, hope you can give me an example
@ankurankan
Ankur Ankan
@ankurankan
@zyh530 I didn't get your email.
Costa Huang
@vwxyzjn
Hello, I was wondering if anyone would help me with speeding up the BN inference
inference = BayesianModelSampling(model)
inference.likelihood_weighted_sample(evidence=evidence, size=2)
[image: profiler output screenshot]
The profiler suggests that the function pre_compute_reduce is taking a significant amount of time
Is there any way I could do the inference faster, given some evidence?
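For reference, a self-contained sketch of that kind of call, with a made-up model (evidence is passed as a list of State tuples):

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD, State
from pgmpy.sampling import BayesianModelSampling

# Hypothetical model, only to show the shape of the likelihood-weighted sampling call
model = BayesianModel([('A', 'B')])
model.add_cpds(
    TabularCPD('A', 2, [[0.6], [0.4]]),
    TabularCPD('B', 2, [[0.8, 0.3], [0.2, 0.7]],
               evidence=['A'], evidence_card=[2]),
)
inference = BayesianModelSampling(model)
evidence = [State(var='A', state=1)]
samples = inference.likelihood_weighted_sample(evidence=evidence, size=2)
print(samples)  # DataFrame of samples with an extra _weight column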
Costa Huang
@vwxyzjn
[image: profiler output screenshot]
Further analysis shows that it was the copy that was slowing everything down
benthestudent
@benthestudent_twitter
Hi, I'm a student. I've got a problem querying an XMLBIF file.
Here is my code:
q = infer.query(variables=["HAS_DISEASE"], evidence = )
benthestudent
@benthestudent_twitter
My question is: what is evidence, and how do I know what to pass for it?
Ankur Ankan
@ankurankan
@vwxyzjn Interesting. I wouldn't have guessed copy to be that slow. I will check why it's so slow and if there's a way to optimize it. But until then I don't think there's a way to speed it up. Also, could you please create an issue for this problem on github?
Ankur Ankan
@ankurankan
@benthestudent_twitter Evidence basically tells the inference method the values of the variables that are already known. For example, let's say you are trying to compute the probability of a given ball being a basketball or a football based on its weight and diameter. In this case, if you already know the weight and/or diameter of the ball, you can pass them as the evidence, and the inference method will increase or decrease the probability of it being a basketball or a football based on that evidence. In general machine learning terminology, say we are trying to predict y using X as the features. When doing prediction for a new datapoint, we want to know the value of y given X: the values of X become the evidence and y becomes the variable in the query.
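As a rough illustration of how evidence is passed to a query (the model and variable names here are made up, not taken from the XMLBIF file mentioned above):

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Hypothetical ball-classification model: the ball type is queried, while the
# observed weight and diameter are supplied as evidence.
model = BayesianModel([('BALL_TYPE', 'WEIGHT'), ('BALL_TYPE', 'DIAMETER')])
model.add_cpds(
    TabularCPD('BALL_TYPE', 2, [[0.5], [0.5]]),
    TabularCPD('WEIGHT', 2, [[0.8, 0.2], [0.2, 0.8]],
               evidence=['BALL_TYPE'], evidence_card=[2]),
    TabularCPD('DIAMETER', 2, [[0.7, 0.3], [0.3, 0.7]],
               evidence=['BALL_TYPE'], evidence_card=[2]),
)
infer = VariableElimination(model)
# evidence maps each observed variable to its known state
q = infer.query(variables=['BALL_TYPE'], evidence={'WEIGHT': 0, 'DIAMETER': 1})
print(q)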
KittyNatty
@KittyNatty
@ankurankan Sorry, I have an issue with prediction in pgmpy. I create the model through BayesianModel and would like to do prediction. I follow the example source code for the predict method in the library (https://pgmpy.org/models.html#module-pgmpy.models.BayesianModel). But I am curious why they use the 'value' variable, which holds the whole dataset, to fit the model. Why don't they use the train data? When I follow them and fit the model with the whole dataset, the RMSE and R square that I use to check accuracy are 1.71 and -6 respectively, which is very weird. However, I get a KeyError when I change to fitting the model with the train data.
Ankur Ankan
@ankurankan
@KittyNatty Yes, that is a mistake in the documentation. Ideally, we would want to show training on the train dataset and prediction on the test dataset. Because it's a randomly generated dataset and a random model, I wouldn't expect the fit statistics to be good.
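Something along these lines is presumably what the documentation intended (the dataset, column names, and structure below are made up for illustration):

import numpy as np
import pandas as pd
from pgmpy.models import BayesianModel

# Hypothetical random data, split into train and test parts
data = pd.DataFrame(np.random.randint(0, 2, size=(1000, 3)), columns=['A', 'B', 'C'])
train, test = data[:800], data[800:]

model = BayesianModel([('A', 'C'), ('B', 'C')])
model.fit(train)                                       # learn CPDs from the training split
predictions = model.predict(test.drop(columns=['C']))  # predict the held-out column
print(predictions.head())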
KittyNatty
@KittyNatty
@ankurankan I see. Do you have any suggestions for how to evaluate the prediction result then, instead of using RMSE and R square?
Ankur Ankan
@ankurankan
@KittyNatty I didn't mean that RMSE is a bad metric for evaluation. I meant that in this specific example the results will be bad because of the random data and random model.
KittyNatty
@KittyNatty
@ankurankan Ok, Thank you very much :)))
KittyNatty
@KittyNatty
@ankurankan Excuse me once again, I have tried many ways of doing prediction with pgmpy's predict method. The Bayesian model can be fitted from the data, but the prediction step fails, as shown in this picture.
[image: screenshot of the prediction error]
The column to predict needs to be dropped before doing the prediction, so why is there an error?
KittyNatty
@KittyNatty
@ankurankan I found one solution and updated my dataset. I changed it, but then I get a KeyError when calling predict on the trained model.
bushyttail
@bushyttail
[images: screenshots of the KeyError]
@KittyNatty @ankurankan I also encountered this KeyError exception when doing prediction with a Bayesian model. Above are the screenshots:
MatheusCL8
@MatheusCL8
I'm new to the library and have already used the Bayesian model on a dataset, but how can I use the dynamic Bayesian network model on a dataset in the same way I use a normal Bayesian network?
Ankur Ankan
@ankurankan
@KittyNatty @bushyttail I can't think of a reason why this would be throwing an error. Would it be possible to share your code and part of your dataset, so that I can reproduce the error?
@MatheusCL8 Could you please elaborate on what exactly you are trying to do with dynamic BNs? pgmpy has limited functionality for them, so things like learning them from data are not very straightforward.
bushyttail
@bushyttail
@ankurankan Thanks for the reply. I uploaded the code and dataset to https://c-t.work/s/7b9c3f200fc84a (password: 2020325).
axi672
@axi672
Hi @ankurankan, I'm wondering whether pgmpy supports sensitivity analysis of any kind for a Bayesian network? For example, measuring what effects variables have on each other.
Ankur Ankan
@ankurankan
@bushyttail Thanks. I will have a look.
Ankur Ankan
@ankurankan
@axi672 Yes, have a look at https://github.com/pgmpy/pgmpy/blob/dev/pgmpy/inference/CausalInference.py. You can compute adjustment sets, biasing paths, etc., and then, based on that, apply any statistical model to compute the effects. For linear estimation, you can simply use CausalInference.estimate_ate, but for other estimates you will need to use other packages like statsmodels.
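A rough sketch of that workflow (the model structure and data below are made up purely for illustration):

import numpy as np
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.inference import CausalInference

# Hypothetical structure: Z confounds the effect of X on Y
model = BayesianModel([('Z', 'X'), ('Z', 'Y'), ('X', 'Y')])
ci = CausalInference(model)

# Adjustment sets needed to identify the effect of X on Y
print(ci.get_all_backdoor_adjustment_sets('X', 'Y'))

# Linear estimate of the average treatment effect from (simulated) data
data = pd.DataFrame(np.random.randint(0, 2, size=(1000, 3)), columns=['Z', 'X', 'Y'])
print(ci.estimate_ate('X', 'Y', data=data, estimator_type="linear"))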
Rahul Valiya Veettil
@vvrahul11

Hi @ankurankan, I'm new to PGMs and find the package interesting. I'm trying to build a Bayesian network for continuous random variables. I would like to build a network, infer the dependencies between these variables, and estimate the population covariance parameters, mean, and standard deviation. From going through the tutorials, I realized discretizing the variables is the way to go. I have been using pymc3 so far to do inference, but it does not have support for graphical models.

  1. Is using the directed Bayesian network the way to go for this kind of problem? (Looking for a simple fit model, not linear models)
  2. After discretizing the continuous random variable, how can I build CPTs?

It would be great if you could give me any lead. With pymc3 I built a model using LKJ priors: https://docs.pymc.io/notebooks/LKJ.html

Ankur Ankan
@ankurankan
@vvrahul11 I am not sure why you want to work with discretized variables and not linear models, because linear models will have a better interpretation of variable relations, more robust model testing, and extras like confidence intervals. Am I missing something?
If you want to work with discrete variables, you can build the model in pgmpy and then call the fit method on it to learn the CPTs from the data. Have a look at the BayesianModel.fit method.
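A minimal sketch of that, assuming made-up continuous measurements that are binned before fitting:

import numpy as np
import pandas as pd
from pgmpy.models import BayesianModel

# Hypothetical continuous data; column names and binning are for illustration only
rng = np.random.RandomState(0)
raw = pd.DataFrame({'voltage': rng.normal(220, 5, 500)})
raw['fineness'] = 0.1 * raw['voltage'] + rng.normal(0, 1, 500)

# Discretize each column into three bins, then learn the CPTs from the data
discrete = raw.apply(lambda col: pd.cut(col, bins=3, labels=['low', 'mid', 'high']))
model = BayesianModel([('voltage', 'fineness')])
model.fit(discrete)            # defaults to maximum-likelihood estimation of the CPTs
for cpd in model.get_cpds():
    print(cpd)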
Rahul Valiya Veettil
@vvrahul11

@ankurankan Thank you very much for the reply. I will look at BayesianModel.fit method. I'm pretty new to this field so let me explain with an example.
Let's say I am monitoring an industrial process where my target variable of interest is the "fineness of coffee powder". The variables that can affect the fineness of a coffee powder are the vibration of the engine, voltage, amount of coffee beans, room temperature, humidity etc. I also assume that voltage and vibration are covarying. In such a situation, if I want to build a graphical model, what is the right approach?

  1. As you suggested I can build a linear model using this data. But can I still infer the covariance between voltage and vibration or voltage and coffee powder fineness?
  2. Can I ask conditional probability questions such as the P(powder_fineness| voltage, humidity) etc?

I would really appreciate any help. Is there any toy problem similar to this?

bushyttail
@bushyttail
Is there a way to know/calculate, in a Bayesian network, the respective impact of other nodes on one node? I am doing prediction with pgmpy and would like to find out which nodes have something to do with the prediction (the label node) and their importance order/ranking based on their contribution to the prediction.
@ankurankan
Ankur Ankan
@ankurankan
@vvrahul11 You should be able to answer both the questions with both discrete and linear models. For the covariance question, you would want to look into causal inference as you are trying to deduce direct and indirect covariance between two variables from the sample covariance. You would basically need to find the adjustment set for the covariance parameter, such that conditioning on those variables would result in a single path between the variables. For the conditional probabilities, I think you will have to make a distribution assumption say Linear Gaussian and then you can compute a conditional Gaussian distribution.
@bushyttail I think for such questions you can look into causal inference. With that, you will be able to quantify the strength of the direct relationships between variables, which will give you a sense of how much they affect each other.
Rahul Valiya Veettil
@vvrahul11
Thank you @ankurankan :)
MatheusCL8
@MatheusCL8
I am trying to implement a dynamic network on a real dataset with various types of variables over several years, and to make a 10-year forecast over this dataset. I saw that the library is very limited for doing this with a very large group of variables. I'm trying to adapt it to what I want, but I have doubts about time_slice, and also about the application of DBNInference and its output. About time_slice: does it not accept more than one value at once? Do I have to add several variables with different time_slices? And about DBNInference: could you clarify the parameters it asks for? I am new to the library and had difficulty understanding how it asks for the parameters and what the output looks like. Thank you very much. Although the library does not have very good documentation, it is a great library.
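For reference, a hedged sketch of the DBNInference calling convention, closely following the style of the pgmpy docstrings (the network below is made up; variables and evidence are addressed as (node, time_slice) tuples):

from pgmpy.models import DynamicBayesianNetwork as DBN
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import DBNInference

dbn = DBN()
dbn.add_edges_from([(('Z', 0), ('X', 0)),
                    (('X', 0), ('Y', 0)),
                    (('Z', 0), ('Z', 1))])
dbn.add_cpds(
    TabularCPD(('Z', 0), 2, [[0.5], [0.5]]),
    TabularCPD(('X', 0), 2, [[0.6, 0.3], [0.4, 0.7]],
               evidence=[('Z', 0)], evidence_card=[2]),
    TabularCPD(('Y', 0), 2, [[0.2, 0.8], [0.8, 0.2]],
               evidence=[('X', 0)], evidence_card=[2]),
    TabularCPD(('Z', 1), 2, [[0.7, 0.2], [0.3, 0.8]],
               evidence=[('Z', 0)], evidence_card=[2]),
)
dbn.initialize_initial_state()   # copies slice-0 CPDs into slice 1 where missing

inf = DBNInference(dbn)
# query a slice-1 variable given evidence observed in slice 0
result = inf.forward_inference([('X', 1)], {('Z', 0): 1})
print(result[('X', 1)].values)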