Create a Bayesian Network and Learn Its Parameters with Python 3.x

Building Bayesian Networks with Python

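Bayesian networks are probabilistic graphical models that represent a set of random variables and their conditional dependencies as a directed acyclic graph (DAG). Each node is a variable, each edge encodes a direct dependency of a child on its parent, and each node carries a conditional probability table (CPT) given its parents. They are powerful tools for reasoning under uncertainty and are widely used in domains such as medical diagnosis, fault detection, and decision making.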

1. Installation

We’ll be using the “pgmpy” library for working with Bayesian networks in Python. If you haven’t already, install it using pip:

pip install pgmpy

2. Creating a Bayesian Network

Let’s create a simple Bayesian network for a hypothetical scenario:

We have three variables:

  • Cloudy: Whether it’s cloudy or not (True/False)
  • Sprinkler: Whether the sprinkler is on (True/False)
  • WetGrass: Whether the grass is wet (True/False)

Our intuition suggests the following relationships:

  • Cloudy affects whether the sprinkler is on.
  • Both Cloudy and Sprinkler affect whether the grass is wet.
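These relationships fix how the joint distribution factorizes: one factor per node, conditioned on that node's parents, so P(Cloudy, Sprinkler, WetGrass) = P(Cloudy) · P(Sprinkler | Cloudy) · P(WetGrass | Cloudy, Sprinkler). The stdlib-only sketch below illustrates this with the CPT values used in Section 2.2, reading state 0 as False and state 1 as True. The dictionaries and the `joint` and `bernoulli` helpers are hypothetical names for illustration, not pgmpy APIs.

```python
from itertools import product

# Hypothetical illustration: CPT values from Section 2.2 keyed by boolean states.
P_CLOUDY = {False: 0.5, True: 0.5}
# P(Sprinkler=True | Cloudy)
P_SPRINKLER_TRUE = {False: 0.5, True: 0.1}
# P(WetGrass=True | Cloudy, Sprinkler)
P_WET_TRUE = {
    (False, False): 0.1,
    (False, True): 0.8,
    (True, False): 0.1,
    (True, True): 0.99,
}


def bernoulli(p_true: float, value: bool) -> float:
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(cloudy: bool, sprinkler: bool, wet: bool) -> float:
    """P(Cloudy, Sprinkler, WetGrass) as the product of one factor per node."""
    return (
        P_CLOUDY[cloudy]
        * bernoulli(P_SPRINKLER_TRUE[cloudy], sprinkler)
        * bernoulli(P_WET_TRUE[(cloudy, sprinkler)], wet)
    )


# Every factor is a normalized distribution, so the 8 joint entries sum to 1.
total = sum(joint(c, s, w) for c, s, w in product([False, True], repeat=3))
print(round(total, 10))                   # 1.0
print(round(joint(True, True, True), 4))  # 0.5 * 0.1 * 0.99 = 0.0495
```

The factorization is what makes the network compact: instead of 2^3 − 1 = 7 free numbers for an arbitrary joint table over three binary variables, the CPTs above need 1 + 2 + 4 = 7 here, and the saving grows quickly as nodes are added with few parents each.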

2.1. Defining the Structure

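from pgmpy.models import BayesianNetwork
# Note: depending on your pgmpy version, this class may have a different name.
# Older tutorials use BayesianModel (renamed to BayesianNetwork), and recent
# releases rename it again to DiscreteBayesianNetwork.

# Define the network structure: each tuple is a (parent, child) edge
model = BayesianNetwork([('Cloudy', 'Sprinkler'), ('Cloudy', 'WetGrass'), ('Sprinkler', 'WetGrass')])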
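A Bayesian network's structure must be a directed acyclic graph, and pgmpy rejects an edge that would create a cycle. As an illustration only, the standalone function below (a hypothetical helper, not a pgmpy API) runs Kahn's algorithm on the same edge list: it returns a topological order, in which every parent precedes its children, or raises an error if the edges contain a cycle.

```python
from collections import deque


def topological_order(edges):
    """Kahn's algorithm: return nodes so that every parent precedes its children.

    Hypothetical helper for illustration. Raises ValueError on a cycle.
    """
    nodes = {n for edge in edges for n in edge}
    children = {n: [] for n in nodes}
    in_degree = {n: 0 for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        in_degree[child] += 1

    # Start from root nodes (no parents); sorted() keeps the output deterministic.
    queue = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in sorted(children[node]):
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)

    # Nodes left unvisited all sit on (or behind) a cycle.
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle, so it is not a valid Bayesian network")
    return order


edges = [('Cloudy', 'Sprinkler'), ('Cloudy', 'WetGrass'), ('Sprinkler', 'WetGrass')]
print(topological_order(edges))  # ['Cloudy', 'Sprinkler', 'WetGrass']
```

Adding an edge such as ('WetGrass', 'Cloudy') would leave no root node, so the function raises instead of returning an order.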

2.2. Specifying the Conditional Probability Tables (CPTs)

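from pgmpy.factors.discrete import TabularCPD

# Each variable has two states. state_names maps them to False/True, in that
# order, so that simulated data and queries use the True/False values from the
# description above (without it, pgmpy names the states 0 and 1).
states = [False, True]

# Define CPT for Cloudy: P(Cloudy=False) = P(Cloudy=True) = 0.5
cpd_cloudy = TabularCPD(variable='Cloudy', variable_card=2, values=[[0.5], [0.5]],
                        state_names={'Cloudy': states})

# Define CPT for Sprinkler: one row per Sprinkler state, one column per Cloudy state
# P(Sprinkler=True | Cloudy=False) = 0.5, P(Sprinkler=True | Cloudy=True) = 0.1
cpd_sprinkler = TabularCPD(variable='Sprinkler', variable_card=2,
                           values=[[0.5, 0.9], [0.5, 0.1]],
                           evidence=['Cloudy'], evidence_card=[2],
                           state_names={'Sprinkler': states, 'Cloudy': states})

# Define CPT for WetGrass: columns are (Cloudy, Sprinkler) combinations with the
# last evidence variable changing fastest: (F, F), (F, T), (T, F), (T, T)
cpd_wetgrass = TabularCPD(variable='WetGrass', variable_card=2,
                          values=[[0.9, 0.2, 0.9, 0.01], [0.1, 0.8, 0.1, 0.99]],
                          evidence=['Cloudy', 'Sprinkler'], evidence_card=[2, 2],
                          state_names={'WetGrass': states, 'Cloudy': states,
                                       'Sprinkler': states})

# Add the CPTs to the model
model.add_cpds(cpd_cloudy, cpd_sprinkler, cpd_wetgrass)

# Check that the structure and CPTs are consistent (e.g. each CPT column sums to 1)
model.check_model()
# Output: True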
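The flat `values` lists are the easiest part of `TabularCPD` to get wrong: each row is one state of the variable, each column is one combination of evidence states, and the last evidence variable changes fastest. The standalone helper below (hypothetical, not part of pgmpy) reproduces that column order with `itertools.product`, labels every entry of the WetGrass table, and checks that each column sums to 1, the same normalization `check_model()` requires.

```python
from itertools import product


def label_columns(variable, states, values, evidence, evidence_states):
    """Map each entry of a TabularCPD-style `values` list to (state, evidence combo).

    Hypothetical helper for illustration. Columns enumerate evidence state
    combinations with the LAST evidence variable changing fastest, which is
    the order itertools.product produces.
    """
    combos = list(product(*(evidence_states[e] for e in evidence)))
    if any(len(row) != len(combos) for row in values):
        raise ValueError(f"{variable}: each row needs one entry per evidence combination")

    table = {}
    for col, combo in enumerate(combos):
        # Each column is a distribution over the variable's states.
        column_sum = sum(row[col] for row in values)
        if abs(column_sum - 1.0) > 1e-9:
            raise ValueError(f"{variable}: column {combo} sums to {column_sum}, not 1")
        for row_idx, state in enumerate(states):
            table[(state, combo)] = values[row_idx][col]
    return table


wet = label_columns(
    variable='WetGrass',
    states=[False, True],
    values=[[0.9, 0.2, 0.9, 0.01], [0.1, 0.8, 0.1, 0.99]],
    evidence=['Cloudy', 'Sprinkler'],
    evidence_states={'Cloudy': [False, True], 'Sprinkler': [False, True]},
)
for (state, (cloudy, sprinkler)), p in wet.items():
    print(f"P(WetGrass={state} | Cloudy={cloudy}, Sprinkler={sprinkler}) = {p}")
```

Printing the labelled entries is a quick way to confirm, for example, that 0.99 really is P(WetGrass=True | Cloudy=True, Sprinkler=True) before the CPT goes into the model.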

3. Parameter Learning

In real-world scenarios, we often need to learn the parameters (CPT values) of a Bayesian network from data.

3.1. Generating Sample Data

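# Used for inference in Section 4
from pgmpy.inference import VariableElimination

# Draw 1000 random samples from the joint distribution defined by the model.
# (Older pgmpy releases without simulate() provide
# BayesianModelSampling(model).forward_sample(size=1000) in pgmpy.sampling.)
data = model.simulate(n_samples=1000)
print(data.head())
# Example output (the samples are random, so your rows will differ):
#    Cloudy  Sprinkler  WetGrass
# 0   False       True      True
# 1   False       True      True
# 2    True      False     False
# 3    True       True      True
# 4    True      False     False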
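`simulate()` returns random samples from the joint distribution. With no evidence, a standard way to produce such samples is forward (ancestral) sampling: visit the nodes in topological order and draw each one from its CPT, given the values already drawn for its parents. The stdlib-only sketch below illustrates that technique with the same probabilities; `forward_sample` is a hypothetical helper, not pgmpy's implementation, and the fixed seed only makes runs repeatable.

```python
import random

# Same CPT values as Section 2.2 (hypothetical plain-dict representation).
P_CLOUDY_TRUE = 0.5
P_SPRINKLER_TRUE = {False: 0.5, True: 0.1}             # keyed by Cloudy
P_WET_TRUE = {(False, False): 0.1, (False, True): 0.8,  # keyed by (Cloudy, Sprinkler)
              (True, False): 0.1, (True, True): 0.99}


def forward_sample(n_samples, seed=None):
    """Ancestral sampling: draw Cloudy, then Sprinkler | Cloudy, then WetGrass | both."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        cloudy = rng.random() < P_CLOUDY_TRUE
        sprinkler = rng.random() < P_SPRINKLER_TRUE[cloudy]
        wet = rng.random() < P_WET_TRUE[(cloudy, sprinkler)]
        rows.append({'Cloudy': cloudy, 'Sprinkler': sprinkler, 'WetGrass': wet})
    return rows


samples = forward_sample(10_000, seed=0)
cloudy_rows = [r for r in samples if r['Cloudy']]
share = sum(r['Sprinkler'] for r in cloudy_rows) / len(cloudy_rows)
print(f"Empirical P(Sprinkler=True | Cloudy=True) ~ {share:.3f} (true value 0.1)")
```

Because each node is drawn only after its parents, every row is an exact sample from the joint distribution, and frequencies computed over many rows converge to the CPT values. That is why parameter learning on simulated data should recover numbers close to the ones defined in Section 2.2.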

3.2. Learning Parameters from Data

from pgmpy.estimators import MaximumLikelihoodEstimator

# Create a Maximum Likelihood Estimator
estimator = MaximumLikelihoodEstimator(model, data)

# Learn the parameters from the data
learned_cpds = estimator.get_parameters()

for cpd in learned_cpds:
    print(cpd)

# Output: one table per learned CPD (Cloudy, Sprinkler, and WetGrass).
# The exact numbers depend on the random sample, but with 1000 samples the
# estimates should land close to the values the data were generated from, e.g.
#   P(Cloudy=True)                                  ≈ 0.5
#   P(Sprinkler=True | Cloudy=True)                 ≈ 0.1
#   P(WetGrass=True | Cloudy=True, Sprinkler=True)  ≈ 0.99
# Parent combinations that occur rarely in the data (such as Cloudy=True with
# Sprinkler=True) get noisier estimates.
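For discrete CPTs, maximum likelihood estimation amounts to counting: the estimate of P(child = x | parents = u) is the number of rows with child = x and parents = u, divided by the number of rows with parents = u. The standalone sketch below performs that computation with `collections.Counter`; the `mle_cpt` helper and the eight-row dataset are made up for illustration and are not part of pgmpy.

```python
from collections import Counter


def mle_cpt(rows, child, parents):
    """Relative-frequency (maximum likelihood) estimate of P(child | parents).

    Hypothetical helper. Returns {(parent_values, child_value): probability}
    for every observed pair; unobserved pairs implicitly get probability 0.
    """
    joint_counts = Counter()
    parent_counts = Counter()
    for row in rows:
        u = tuple(row[p] for p in parents)
        joint_counts[(u, row[child])] += 1
        parent_counts[u] += 1
    return {
        (u, x): count / parent_counts[u]
        for (u, x), count in joint_counts.items()
    }


# Tiny made-up dataset, only to show the arithmetic.
data_rows = [
    {'Cloudy': True, 'Sprinkler': False},
    {'Cloudy': True, 'Sprinkler': False},
    {'Cloudy': True, 'Sprinkler': False},
    {'Cloudy': True, 'Sprinkler': True},
    {'Cloudy': False, 'Sprinkler': True},
    {'Cloudy': False, 'Sprinkler': False},
    {'Cloudy': False, 'Sprinkler': True},
    {'Cloudy': False, 'Sprinkler': False},
]
cpt = mle_cpt(data_rows, child='Sprinkler', parents=['Cloudy'])
print(cpt[((True,), True)])   # 0.25 (1 of the 4 cloudy rows)
print(cpt[((False,), True)])  # 0.5  (2 of the 4 non-cloudy rows)
```

A parent configuration that never appears in the data has no estimate at all (the ratio would be 0/0), which is one reason plain MLE can be fragile on small samples.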

4. Inference

Once our Bayesian network is defined and parameterized, we can use it for inference, answering queries like “what is the probability of wet grass given it is cloudy?”

4.1. Variable Elimination

# Build an inference engine on the model with the CPTs defined in Section 2.2
inference = VariableElimination(model)

# Query the full distribution P(WetGrass | Cloudy=True)
probability = inference.query(variables=['WetGrass'], evidence={'Cloudy': True})
print(probability)
# Output (table formatting varies across pgmpy versions):
#   P(WetGrass=False | Cloudy=True) = 0.811
#   P(WetGrass=True  | Cloudy=True) = 0.189

We see that the probability of the grass being wet given that it is cloudy is 0.189, or about 0.19: on cloudy days the sprinkler is rarely on, so the grass is usually dry.
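This value can be checked by hand. Conditioning on Cloudy=True and summing out Sprinkler gives P(WetGrass=True | Cloudy=True) = 0.1 · 0.99 + 0.9 · 0.1 = 0.189, using P(Sprinkler=True | Cloudy=True) = 0.1 and the WetGrass CPT from Section 2.2. The sketch below does the same computation by brute-force enumeration: sum the joint distribution over all assignments consistent with the evidence, then normalize. Variable elimination produces the same answer more efficiently by summing out one variable at a time. `query_by_enumeration` is a hypothetical helper, not a pgmpy API.

```python
from itertools import product

# CPT values from Section 2.2 (hypothetical plain-dict representation).
P_CLOUDY_TRUE = 0.5
P_SPRINKLER_TRUE = {False: 0.5, True: 0.1}
P_WET_TRUE = {(False, False): 0.1, (False, True): 0.8,
              (True, False): 0.1, (True, True): 0.99}
VARIABLES = ('Cloudy', 'Sprinkler', 'WetGrass')


def bernoulli(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(a):
    """P(Cloudy, Sprinkler, WetGrass) for a full assignment dict `a`."""
    return (bernoulli(P_CLOUDY_TRUE, a['Cloudy'])
            * bernoulli(P_SPRINKLER_TRUE[a['Cloudy']], a['Sprinkler'])
            * bernoulli(P_WET_TRUE[(a['Cloudy'], a['Sprinkler'])], a['WetGrass']))


def query_by_enumeration(target, evidence):
    """Return {value: P(target=value | evidence)} by summing the joint over
    all assignments consistent with the evidence, then normalizing."""
    scores = {False: 0.0, True: 0.0}
    for values in product([False, True], repeat=len(VARIABLES)):
        a = dict(zip(VARIABLES, values))
        if all(a[var] == val for var, val in evidence.items()):
            scores[a[target]] += joint(a)
    total = sum(scores.values())  # equals P(evidence)
    return {value: score / total for value, score in scores.items()}


result = query_by_enumeration('WetGrass', {'Cloudy': True})
print({k: round(v, 3) for k, v in result.items()})  # {False: 0.811, True: 0.189}
```

Enumeration visits every joint assignment, so its cost grows exponentially with the number of variables; it is only practical as a cross-check on tiny networks like this one.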

5. Conclusion

This tutorial provided a basic introduction to creating Bayesian networks, learning their parameters, and running inference in Python using pgmpy. The library offers much more functionality for defining, manipulating, learning, and performing inference on these probabilistic models.

