Pierre-Henri Wuillemin
@phwuill_gitlab
(the point where the optimal decision changes is quite visible)
Michael Kontoulis
@Mikailo
Hello, first of all, Happy New Year!
Secondly, thank you very much for providing this example; this is a much better and cleaner way of examining this problem and displaying the results.
Michael Kontoulis
@Mikailo
But I still don't understand why this would happen. Could it be because the change caused is too small to be captured?
I used the same piece of code you used in your example to apply soft evidence to the te1 and mat1 nodes.
image.png
Michael Kontoulis
@Mikailo
I also repeated this with a slightly different diagram, and while the imp nodes worked similarly to your example
image.png
As far as I understand, there should be some change, but there isn't one, apart from the optimal decision changing in the first example; I don't understand why it changes if the nodes below the imp level aren't impacted.
Pierre-Henri Wuillemin
@phwuill_gitlab

Hi @Mikailo, all my best wishes!

Can you show me your code so I can use the exact same one?

Michael Kontoulis
@Mikailo
Here you go!
Michael Kontoulis
@Mikailo
Hello again, an update on some testing I did. I used the same graph in a different Python library called pycid (https://github.com/causalincentives/pycid) and, in that instance, when changing the probabilities in the te1 node, the inference in the imp nodes changed as expected. I do not know how or why there is this difference in outcome between the two modules.
Pierre-Henri Wuillemin
@phwuill_gitlab

Hi @Mikailo, sorry but I did not find time to check your last example. You may have found a bug, but I am quite surprised since, once you set the value of the decision node, this ID is essentially a Bayesian network. I will try to check as soon as possible.

(and moreover there are plenty of inferences on IDs that are validated during the tests: maybe your ID will become a new test case :-) )

Pierre-Henri Wuillemin
@phwuill_gitlab

Hi again @Mikailo,
with this code (in a single-cell notebook):

%load_ext autoreload
%autoreload 2

import math

from pylab import *

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb

%matplotlib inline

def variation(model, ev, re=None):
    """Sweep a soft evidence on node `ev` and plot the resulting MEU."""
    nbr = model.variable(ev).domainSize()
    l = []
    for i in range(101):
        ie = gum.ShaferShenoyLIMIDInference(model)
        # soft evidence: weight 1-i/100 on the first state, i/100 on the others
        ie.addEvidence(ev, [1 - i / 100] + [i / 100] * (nbr - 1))
        if re is not None:
            ie.addEvidence("re", re)
        ie.makeInference()
        # MEU() returns a dict with the 'mean' and 'variance' of the maximum expected utility
        l.append(ie.MEU())

    med = [x['mean'] for x in l]
    mi = [x['mean'] - math.sqrt(x['variance']) for x in l]
    ma = [x['mean'] + math.sqrt(x['variance']) for x in l]

    # plot the mean MEU and shade mean +/- one standard deviation
    fig = figure(figsize=(3, 2))
    ax = fig.add_subplot(1, 1, 1)
    ax.fill_between(range(101), mi, ma, alpha=0.2)
    ax.plot(med, 'g')
    if re is None:
        ax.set_title(f"Evidence on {ev}")
    else:
        ax.set_title(f"Evidence on {ev} with {re=}")
    return fig

diag=gum.loadID("res/Normal - 1.bifxml")
vars=['te1','mat1','con4','imp1','obj1']

gnb.flow.clear()
gnb.flow.add_html(gnb.getInfluenceDiagram(diag,size="5!"))
gnb.flow.new_line()
for v in vars:
    gnb.flow.add_plot(variation(diag,v))
gnb.flow.new_line()
for v in vars:
    gnb.flow.add_plot(variation(diag,v,re=0))
gnb.flow.new_line()
for v in vars:
    gnb.flow.add_plot(variation(diag,v,re=1))
gnb.flow.display()

I produce this figure:

image.png
which I find quite convincing (despite the fact that the other tool does not agree): the closer the evidence is to the utility node, the stronger the variation it induces in the MEU.
Now, I need a bit more time to try to understand the behavior w.r.t. the results of the other tool you found.
Benjamin Datko
@bdatko_gitlab
Hi all, two questions.
  1. Is there a way to extract the particles/samples when using sampling inference?
  2. In the Sampling Inference notebook of the pyAgrum tutorials, I am surprised how poorly Gibbs sampling performs against Lazy Propagation. Is this expected due to the large number of variables and dimensions in the Diabetes network?
Pierre-Henri Wuillemin
@phwuill_gitlab

Hi Benjamin,
1) You mean "keep all the particles in a csv-like file"? No, there is no way to do so. You can generate a csv (forward sampling without evidence) from a BN with gum.generateCSV().
It could be a nice feature to add, though (but not an easy one: do we keep only the particles, or also the weights? Gibbs (especially) would create very large files, etc.).
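
For reference, a minimal sketch of that forward-sampling workaround with gum.generateCSV() (the small BN, the file name and the (bn, filename, n) argument order are only illustrative assumptions):

import pyAgrum as gum

# build a small example BN; any BN would do
bn = gum.fastBN("smoking->cancer->cough")

# forward-sample 10000 complete records (no evidence) into a CSV file
gum.generateCSV(bn, "samples.csv", 10000, visible=False)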

2) Gibbs is just slow for large BNs, and in this notebook we do not let it converge (it is always quickly stopped by the timeout). But you're right, this notebook is not very gentle with Gibbs. The only advantage of Gibbs we can see here is that its behavior does not depend on the position of the evidence (unlike the other sampling algorithms).

In the Approximate inference notebook, Gibbs shows to better advantage :-).
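
As a rough sketch of how such a comparison is cut short by a time budget (the network file name, the time limit and the queried variable are only illustrative assumptions):

import pyAgrum as gum

bn = gum.loadBN("Diabetes.bif")

# exact reference posterior
lazy = gum.LazyPropagation(bn)
lazy.makeInference()

# Gibbs sampling, stopped early by a time limit as in the notebook
gibbs = gum.GibbsSampling(bn)
gibbs.setEpsilon(1e-2)
gibbs.setMaxTime(10)   # seconds: far too little for Gibbs to converge on Diabetes
gibbs.makeInference()

v = bn.variable(0).name()
print(lazy.posterior(v))
print(gibbs.posterior(v))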

Benjamin Datko
@bdatko_gitlab
@phwuill_gitlab
  1. Gotcha, understood. I was comparing features with pomegranate, but I was able to reproduce everything in pyAgrum.
  2. Ah, okay, these tutorials are a wealth of information. Thank you for your explanation.
Pierre-Henri Wuillemin
@phwuill_gitlab
I guess that at some point it will be relevant to compare/benchmark the different pgm libraries (pomegranate, pypgm, etc.). This needs a lot of resources, unfortunately.
Pierre-Henri Wuillemin
@phwuill_gitlab
image.png
Hi @Mikailo, I think that I found the problem: in certain cases, the messages after the resolution for the decisions were not distributed (and so, some chance nodes were not refreshed after the decisions).
So the resolution was OK but the graphs were not.
MLasserre
@MLasserre
Hi fellow agrumers ! 🍋
On the 18th of March, APUD'22 (aGrUM/pyAgrum user's day) will be held both physically and remotely at SCAI (Sorbonne Center for Artificial Intelligence).
It will be the occasion for users to gather and share ideas about the library.
You can register and find more information here: https://agrum.gitlab.io/pages/apud22.html
We hope that you will join us ! 🥳
MLasserre
@MLasserre
Also, you can now follow us on LinkedIn (https://linkedin.com/company/pyagrum) if you don't already! 😃
Pierre-Henri Wuillemin
@phwuill_gitlab
@Mikailo, just to let you know that the bug you found should be fixed in the latest tag (0.22.6).
Benjamin Datko
@bdatko_gitlab
Quick question about the user day: in the email the time zone was UTC+1, but on the pyAgrum page (https://agrum.gitlab.io/pages/apud22.html) the time zones are GMT+1. These seem very different; which one is correct?
MLasserre
@MLasserre
Hi Benjamin,
It is actually the same :)
From my understanding we should use UTC though :)
Sorry for the confusion
Benjamin Datko
@bdatko_gitlab
😅 yup I did not know that before haha. Thank you for the help! I was just hoping I could get an extra hour of sleep before the first presentation.
Benjamin Datko
@bdatko_gitlab
One last question, will the zoom conference be recorded?
MLasserre
@MLasserre
We plan to record, but we don't know yet whether we will have the participants' agreement to share it afterwards.
nojhan
@nojhan:matrix.org
Can someone share the zoom link for people having registered (too) late?
MLasserre
@MLasserre
Sorry, I just saw your message.
ahmed_mabrouk
@ahmed_mabrouk:matrix.org
Hello, is it possible to learn the BN structure from data containing some missing values?
When I run learner = gum.BNLearner(data_filename, template, ['?', 'N/A', 'NA', 'NaN'])
I get the following error:
[pyAgrum] The database contains some missing values: For the moment, the BNLearner is unable to cope with missing values in databases
Pierre-Henri Wuillemin
@phwuill_gitlab
Hi @ahmed_mabrouk:matrix.org, yes, the structural EM algorithm is quite tricky and, for the moment, we are stuck between a version of the algorithm that is classically implemented but not mathematically correct and a version that is mathematically sound but so slow that it becomes unusable... This is the next big algorithmic project for aGrUM: to implement a version of structural EM that will satisfy us.
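
In the meantime, one possible workaround (not a pyAgrum feature, just a sketch assuming listwise deletion of the few incomplete rows is acceptable; data_filename and template are the same objects as in your snippet):

import pandas as pd
import pyAgrum as gum

# drop the rows containing missing values before handing the file to BNLearner
df = pd.read_csv(data_filename, na_values=['?', 'N/A', 'NA', 'NaN'])
df.dropna().to_csv("complete_cases.csv", index=False)

learner = gum.BNLearner("complete_cases.csv", template)
bn = learner.learnBN()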
Benjamin Datko
@bdatko_gitlab

What is the upper limit on the number of variables and arcs, assuming binary variables, for pyAgrum? I have seen some talks showing off very complex networks with a large number of variables (2,127 variables):
https://r13-cbts-sgl.engr.tamu.edu/wp-content/uploads/2021/08/CBTS-SGL-Webinar-Cool-Things-That-One-Can-Do-With-Graphical-Probabilistic-Models-Dr.-Marek-Drudzel.pdf
https://youtu.be/9_l9dpvezOc?t=417

I know of gobnilp, which has a paper titled Learning Bayesian Networks with Thousands of Variables (https://people.idsia.ch/~zaffalon/papers/2015nips-jena.pdf).
I think there is also BayeSuite which advertises massive network learning. The paper also has a nice table of alternative Bayesian network software. Maybe their list is a nice place to start to make some comparisons ;) (https://www.sciencedirect.com/science/article/pii/S0925231220318609)

Pierre-Henri Wuillemin
@phwuill_gitlab

Hello @bdatko_gitlab, the answer is not unique; it depends on what you are looking for.
1- The short answer: the limit is your memory.

2- A less short answer, but still short: the numbers of nodes/arcs are not necessarily the most relevant parameters for understanding the size of the BN. The maximum number of parents of a node is certainly more relevant, because the memory complexity of a Bayes net is mainly the size of the CPTs, dominated by the size of the biggest one: 2^(nbrMaxParents+1) (for binary variables)... For instance, a BN shaped as a chain (which actually models a Markov chain) can be very, very long :-) (see the small sketch at the end of this message).

3- The model can fit in memory, but inference may not. The relevant parameter here is the treewidth of the graph (more or less the size of the biggest clique in the junction tree). In that case, you still have access to approximate inference (sampling, loopy belief propagation); the sketch at the end of this message also shows how to look at the junction tree.

If you have enough memory to fit your model and your inference, the issue is now time. As you may know, aGrUM/pyAgrum tries hard to parallelize its algorithms in order to speed up long processes... Still, time can be prohibitive in very large models.

4- This is particularly true for learning: learning a large model has (at least) quadratic complexity w.r.t. the number of nodes. There are specific algorithms dedicated to learning such large models (based on fast approximations of the Markov blanket of each node, for instance), but they are not implemented in aGrUM (though they could be implemented quite easily). The size of the database of course also has an impact on the time (and the memory needed).

Concretely, for pyAgrum, we test our algorithms with graph sizes up to 900 nodes / 1250 arcs (some of them do not allow exact inference). We have seen bioinformatics teams learn BNs of well over 1000 nodes.

Thanks for the links, there is indeed a big job to do in terms of comparisons & benchmarks... No resources for that for now, unfortunately :-(
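
A small sketch illustrating points 2 and 3 (the structures are arbitrary, and I assume gum.fastBN, gum.JunctionTreeGenerator and CliqueGraph.clique() behave as used below):

import math
import pyAgrum as gum

def cpt_size(bn, n):
    # number of entries in the CPT of node n: its domain times the domains of its parents
    return bn.variable(n).domainSize() * math.prod(
        bn.variable(p).domainSize() for p in bn.parents(n))

# a long chain of binary variables: many nodes, but every CPT has at most 2^2 entries
chain = gum.fastBN("->".join(f"x{i}" for i in range(200)))
print(max(cpt_size(chain, n) for n in chain.nodes()))        # 4

# one node with 20 binary parents: a single CPT already holds 2^21 entries
star = gum.fastBN(";".join(f"p{i}->y" for i in range(20)))
print(cpt_size(star, star.idFromName("y")))                  # 2097152

# the treewidth (size of the biggest clique of the junction tree) drives exact inference
jt = gum.JunctionTreeGenerator().junctionTree(chain)
print(max(len(jt.clique(c)) for c in jt.nodes()))            # stays small for a chain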

Benjamin Datko
@bdatko_gitlab
@phwuill_gitlab Thank you for this detailed information! Your answer really helps me figure out whether my problem could be better modeled using pyAgrum.
Benjamin Datko
@bdatko_gitlab

Thanks for the links, there is indeed a big job to do in terms of comparisons & benchmarks... No resources for that for now, unfortunately :-(

I understand. I think pyAgrum deserves a lot more attention compared to other libraries; I am not sure why pyAgrum isn't more popular. =)

Has the team ever considered mirroring the repo to GitHub? I think it would make pyAgrum more discoverable in terms of SEO.

Benjamin Datko
@bdatko_gitlab
Changing topics:
Do you have recommendations for building a complex factor graph model with deterministic nodes? Could you use fillWithFunction (https://pyagrum.readthedocs.io/en/1.0.0/potential.html?highlight=fillWithFunction#pyAgrum.Potential.fillWithFunction) within inference, or is fillWithFunction mainly a convenience function to fill tables within nodes?
Benjamin Datko
@bdatko_gitlab
@phwuill_gitlab in cell 11 of Tutorial2 (https://webia.lip6.fr/~phw//aGrUM/docs/last/notebooks/Tutorial2.ipynb.html) I don't understand the call to gum.getPosterior with the keyword argument evs = {'MINVOLSET':[0,x/100.0,0.5]} because the array for each iteration never sums to one. I would expect this call to raise some kind of exception. My understanding is that each index in the array [0,x/100.0,0.5] is the probability for each state of MINVOLSET. Is this soft evidence without any normalization?