Bayesian networks are remarkable graphical models that organize a joint probability distribution according to the conditional independence relationships present in a dataset. Put another way, they mirror how people usually process information: according to hypotheses linking the objects they observe. People don’t connect objects intellectually unless a hypothesis connects them. One of my PhD advisors at Carnegie Mellon, Mitchell Small, introduced me to Bayesian networks as a way to combine information from health effects studies to support risk assessment, but as a postdoctoral fellow I started to look at them as a potential data mining tool. However you look at them, they are elegant computational models with a compelling axiomatic basis for philosophical reasoning, to boot. They helped me understand and visualize Bayes’ rule when I was a graduate student, and now I’m hoping to use them more as a data mining technique to model drinking water distribution system reliability. For these applications, I think learning the networks from my datasets will be indispensable.
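For readers who prefer to see it written out, the way a Bayesian network organizes a joint distribution is usually expressed as a product of local conditional distributions, one per node. The notation here is mine, not taken from any particular source: X_1, ..., X_n are the variables and pa(X_i) denotes the parents of X_i in the network's directed acyclic graph.

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)
% Example for a three-node chain A -> B -> C:
% P(A, B, C) = P(A)\, P(B \mid A)\, P(C \mid B)
```

In the chain example, C is conditionally independent of A given B, which is why the last factor needs only B rather than both A and B.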
OK, so learning Bayesian networks hasn’t been the exclusive focus of my thoughts in preparing this research. I have spent most of the past month on a more thorough reading of the first few parts of Judea Pearl’s Probabilistic Reasoning in Intelligent Systems, but I have also found a really cool paper by an Italian geneticist, Marco Scutari, who has integrated several of the most popular network learning algorithms into an R package (bnlearn) for learning both the structure and the parameters of a Bayesian network. I originally came across his article last March or so when working with some JHU colleagues on using Bayesian networks to predict missing data in a public hurricane loss model database, but we didn’t learn our network from data, and we made some simplifying assumptions that did not require the sophisticated set of techniques in the linked paper. Having read Scutari’s paper several more times in the past week, I’m very impressed by the resource that he’s constructed. It also helps a lot that it is in my favorite programming environment. There are many tools that learn either the structure or the parameters of a Bayesian network, but learning both at the same time has generally been left alone. While Scutari’s package doesn’t do both at the same time either, a researcher can come close, especially when using the bootstrapping or cross-validation utilities included in bnlearn.
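To make the distinction between structure and parameters a little more concrete, here is a minimal sketch in plain Python. It is not bnlearn and does not use any of the algorithms in Scutari's paper. The structure (a hypothetical rain/sprinkler/wet-grass graph) is fixed by hand rather than learned, the eight records are made up for illustration, and the names `PARENTS`, `DATA`, `fit_cpts`, and `joint_prob` are my own. Only the parameter step is shown: each conditional probability table is estimated by simple counting (maximum likelihood).

```python
from collections import Counter

# Assumed structure, written by hand (NOT learned): rain -> wet <- sprinkler.
# Each node maps to a tuple of its parents.
PARENTS = {"rain": (), "sprinkler": (), "wet": ("rain", "sprinkler")}

# Made-up toy records for illustration only; not real data.
DATA = [
    {"rain": 1, "sprinkler": 0, "wet": 1},
    {"rain": 1, "sprinkler": 0, "wet": 1},
    {"rain": 1, "sprinkler": 1, "wet": 1},
    {"rain": 0, "sprinkler": 1, "wet": 1},
    {"rain": 0, "sprinkler": 1, "wet": 0},
    {"rain": 0, "sprinkler": 0, "wet": 0},
    {"rain": 0, "sprinkler": 0, "wet": 0},
    {"rain": 1, "sprinkler": 0, "wet": 0},
]


def fit_cpts(data, parents):
    """Maximum-likelihood estimate of each conditional probability table.

    P(node = v | parents = cfg) = count(cfg, v) / count(cfg)
    """
    cpts = {}
    for node, pa in parents.items():
        # Count each (parent configuration, node value) pair.
        joint = Counter((tuple(r[p] for p in pa), r[node]) for r in data)
        # Count each parent configuration on its own.
        marginal = Counter(tuple(r[p] for p in pa) for r in data)
        cpts[node] = {
            (cfg, value): n / marginal[cfg] for (cfg, value), n in joint.items()
        }
    return cpts


def joint_prob(record, cpts, parents):
    """Probability of one full record: product of each node's local conditional."""
    p = 1.0
    for node, pa in parents.items():
        cfg = tuple(record[q] for q in pa)
        # Combinations never seen in the data get probability 0 (no smoothing).
        p *= cpts[node].get((cfg, record[node]), 0.0)
    return p


CPTS = fit_cpts(DATA, PARENTS)
EXAMPLE = joint_prob({"rain": 1, "sprinkler": 0, "wet": 1}, CPTS, PARENTS)
```

Structure learning is the harder half of the problem: it would replace the hand-written `PARENTS` dictionary with a search over candidate graphs, and that is the part a package like bnlearn takes care of.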
Because of Scutari and bnlearn, I am excited to move further with the modeling I’m doing. As an environmental engineer who wants to use computer science, not necessarily create it, I’m very pleased he’s made this tool available.