
A recent SEED Group paper, "Bayesian Belief Networks for Predicting Drinking Water Distribution System Pipe Breaks," was presented at PSAM11/ESREL12 in Helsinki, Finland. This peer-reviewed conference paper was co-authored by Dr. Francis with JHU collaborators Dr. Seth Guikema and Lucas Henneman. The abstract of the paper follows:

In this project, we use Bayesian Belief Networks (BBNs) to construct a knowledge model for pipe breaks in a water zone.  BBN modeling is a critical step towards real-time distribution system management.  Development of expert systems for analyzing real-time data is not only important for pipe break prediction, but is also a first step in preventing water loss and water quality deterioration through the application of machine learning techniques to facilitate real-time distribution system monitoring and management.  Our model will be based on pipe breaks and covariate data from a mid-Atlantic United States (U.S.) drinking water distribution system network. The expert model is learned using a conditional independence test method, a score-based method, and a hybrid method, then subjected to 10-fold cross validation based on log-likelihood scores.
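To make the last step of the abstract concrete, here is a minimal Python sketch of k-fold cross-validation scored by held-out log-likelihood for a discrete Bayesian network. It is not the paper's code or bnlearn's implementation. The variables (`old_pipe`, `cold_month`, `break`) and the data are hypothetical and synthetic. The two candidate structures are given rather than learned, and the conditional probability tables use add-one (Laplace) smoothing so that a held-out case never gets probability zero.

```python
import math
import random
from collections import Counter
from itertools import product


def fit_cpts(rows, parents, levels, alpha=1.0):
    """Estimate P(var | parents) for every node, with add-alpha (Laplace) smoothing."""
    cpts = {}
    for var, pa in parents.items():
        counts = Counter((tuple(r[p] for p in pa), r[var]) for r in rows)
        cpt = {}
        # product() over zero parents yields one empty tuple, so root nodes work too
        for pa_vals in product(*(levels[p] for p in pa)):
            total = sum(counts[(pa_vals, v)] for v in levels[var]) + alpha * len(levels[var])
            for v in levels[var]:
                cpt[(pa_vals, v)] = (counts[(pa_vals, v)] + alpha) / total
        cpts[var] = cpt
    return cpts


def log_likelihood(rows, parents, cpts):
    """log P(data) = sum over rows and nodes of log P(node | its parents)."""
    return sum(
        math.log(cpts[var][(tuple(r[p] for p in pa), r[var])])
        for r in rows
        for var, pa in parents.items()
    )


def cv_log_likelihood(rows, parents, levels, k=10, seed=0):
    """Mean held-out log-likelihood per observation over k folds."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [rows[i] for i in idx if i not in held_out]
        test = [rows[i] for i in fold]
        cpts = fit_cpts(train, parents, levels)  # fit on the other k - 1 folds
        scores.append(log_likelihood(test, parents, cpts) / len(test))
    return sum(scores) / k


# Hypothetical synthetic data: break probability rises with pipe age and cold weather.
rng = random.Random(42)
rows = []
for _ in range(1000):
    old = rng.random() < 0.4
    cold = rng.random() < 0.3
    p_break = 0.05 + 0.3 * old + 0.2 * cold
    rows.append({"old_pipe": old, "cold_month": cold, "break": rng.random() < p_break})

levels = {v: (False, True) for v in ("old_pipe", "cold_month", "break")}
empty = {"old_pipe": (), "cold_month": (), "break": ()}
v_structure = {"old_pipe": (), "cold_month": (), "break": ("old_pipe", "cold_month")}

cv_empty = cv_log_likelihood(rows, empty, levels)
cv_v = cv_log_likelihood(rows, v_structure, levels)
print(f"empty graph: {cv_empty:.3f}  old_pipe -> break <- cold_month: {cv_v:.3f}")
```

A higher (less negative) average held-out log-likelihood means a structure generalizes better to unseen data. Here the structure with edges into `break` should win, since the synthetic breaks were generated to depend on both parents.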

A short report from PSAM11/ESREL12 will follow in a later post.

Bayesian networks are remarkable graphical models that organize a joint probability distribution according to the conditional independence relationships present in a dataset. Another way of saying this: people usually process information according to hypotheses that link the objects they observe, and they don't connect objects intellectually unless a hypothesis connects them. One of my PhD advisors at Carnegie Mellon, Mitchell Small, introduced me to Bayesian networks as a way to combine information from health effects studies to support risk assessment, but as a postdoctoral fellow I started to look at them as a potential data mining tool. However you look at them, they are elegant computational models with a compelling axiomatic basis for philosophical reasoning, to boot. When I was a graduate student, they helped me understand and visualize Bayes' rule, and now I'm hoping to use them more as a data mining technique to model drinking water distribution system reliability. For these applications, I expect that learning the networks from my datasets will be indispensable.
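The factorization idea can be made concrete with a toy three-node chain, `old_pipe → break → report`, in which a hypothetical customer report depends only on whether a break occurred. The minimal Python sketch below uses made-up probabilities (not from the paper or any dataset), builds the joint distribution as a product of conditional tables, and applies Bayes' rule by enumeration. Exact `Fraction` arithmetic keeps the results readable.

```python
from fractions import Fraction
from itertools import product

# Hypothetical conditional probability tables for old_pipe -> break -> report
P_OLD = Fraction(2, 5)
P_BREAK_GIVEN_OLD = {True: Fraction(3, 10), False: Fraction(1, 20)}
P_REPORT_GIVEN_BREAK = {True: Fraction(9, 10), False: Fraction(1, 10)}


def bernoulli(p, value):
    """P(X = value) for a binary variable with P(X = True) = p."""
    return p if value else 1 - p


def joint(old, brk, rep):
    """Chain factorization: P(old, break, report) = P(old) P(break | old) P(report | break)."""
    return (
        bernoulli(P_OLD, old)
        * bernoulli(P_BREAK_GIVEN_OLD[old], brk)
        * bernoulli(P_REPORT_GIVEN_BREAK[brk], rep)
    )


def prob_old_given(**evidence):
    """P(old = True | evidence), summing the joint over worlds consistent with the evidence."""
    numerator = denominator = Fraction(0)
    for old, brk, rep in product((True, False), repeat=3):
        world = {"brk": brk, "rep": rep}
        if any(world[name] != value for name, value in evidence.items()):
            continue  # skip worlds that contradict what was observed
        p = joint(old, brk, rep)
        denominator += p
        if old:
            numerator += p
    return numerator / denominator


print(prob_old_given(brk=True))            # 4/5
print(prob_old_given(rep=True))            # 34/55
print(prob_old_given(brk=True, rep=True))  # 4/5
```

The last two lines show the conditional independence the graph encodes: once the break is observed, also observing the report leaves the posterior on pipe age unchanged.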

OK, so learning Bayesian networks hasn't been the exclusive focus of my thoughts in preparing this research. I've spent most of the past month on a more thorough reading of the first few parts of Judea Pearl's Probabilistic Reasoning in Intelligent Systems, but I have also found a really cool paper by an Italian geneticist who has integrated several of the most popular network learning algorithms into an R package (bnlearn) for learning both the structure and the parameters of a Bayesian network. I originally came across his article around last March, when I was working with some JHU colleagues on using Bayesian networks to predict missing data in a public hurricane loss model database; however, we didn't learn our network from data, and we made some simplifying assumptions that did not require the sophisticated set of techniques in the linked paper. Having reread Marco Scutari's paper several times in the past week, I'm very impressed by the resource he has constructed. It also helps a lot that it runs in my favorite programming environment. Many tools learn either the structure or the parameters of a Bayesian network, but learning both at the same time has generally gone unaddressed. While Scutari's package doesn't do both simultaneously, a researcher can come close, especially with the bootstrapping and cross-validation utilities included in bnlearn.
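As a rough illustration of why bootstrapping helps with structure learning, here is a minimal Python sketch (not bnlearn's code or API) that scores one candidate edge between two binary variables with the Bayesian information criterion (BIC), a common score in score-based learning, and then counts how often the edge is kept across bootstrap resamples. The data are hypothetical and synthetic. BIC gives X → Y and Y → X the same score, so the sketch only tests for an undirected link.

```python
import math
import random
from collections import Counter


def bic_gain(pairs):
    """BIC of the graph X -> Y minus BIC of the empty graph, for binary (x, y) pairs.

    The log-likelihood gain of the edge is N times the empirical mutual
    information; for binary variables the edge adds one free parameter,
    which costs (log N) / 2 under BIC.
    """
    n = len(pairs)
    joint = Counter(pairs)
    cx = Counter(x for x, _ in pairs)
    cy = Counter(y for _, y in pairs)
    ll_gain = sum(c * math.log(c * n / (cx[x] * cy[y])) for (x, y), c in joint.items())
    return ll_gain - 0.5 * math.log(n)


def edge_frequency(pairs, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which the BIC score keeps the edge."""
    rng = random.Random(seed)
    kept = 0
    for _ in range(n_boot):
        resample = [rng.choice(pairs) for _ in pairs]  # draw N rows with replacement
        if bic_gain(resample) > 0:
            kept += 1
    return kept / n_boot


# Hypothetical data: in `linked`, y depends on x; in `unlinked`, y ignores x.
rng = random.Random(1)
linked, unlinked = [], []
for _ in range(1000):
    x = int(rng.random() < 0.4)
    linked.append((x, int(rng.random() < (0.35 if x else 0.05))))
    unlinked.append((x, int(rng.random() < 0.2)))

freq_linked = edge_frequency(linked)
freq_unlinked = edge_frequency(unlinked)
print("linked:", freq_linked, "unlinked:", freq_unlinked)
```

A frequency near 1 suggests the data consistently support the link, while a low frequency flags a fragile one. Tools such as bnlearn apply resampling to entire learned networks rather than a single pair of variables; this sketch only shows the underlying idea.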

Because of Scutari and bnlearn, I am excited to move further with the modeling I’m doing.  As an environmental engineer who wants to use computer science, not necessarily create it, I’m very pleased he’s made this tool available.