Statistical Learning – Strategic [Urban] Ecologies, Engineering, and Decision-Making

New SEED paper reporting on use of Bayesian Networks to model water pipe breaks!

seed, May 13, 2014

We have recently had our article, "Bayesian belief networks for predicting drinking water distribution system pipe breaks," accepted for publication in Reliability Engineering and System Safety. It is now available online from the publisher.

This was one of the most rewarding papers I've written, because it allowed me to learn much more about one of my favorite modeling techniques, the Bayesian Network. Specifically, the challenge in this paper lies in learning the network structure from the data, instead of taking the more popular approach of assuming a network structure a priori. I have not yet finished investigating the use of Bayesian Networks for infrastructure data problems, but I'm excited about this first step.
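To make "learning the network from the data" a little more concrete, here is a minimal sketch of one generic score-based approach: greedy hill climbing over directed acyclic graphs, where each candidate graph is scored with the BIC of discrete data. The BIC score, the add/remove-only move set (no edge reversals), and all function names are assumptions chosen for illustration; this is not the procedure or software used in the paper.

```python
import math
from collections import Counter
from itertools import permutations


def family_bic(rows, child, parents):
    """BIC contribution of one node given a parent set (discrete data, MLE fit)."""
    n = len(rows)
    parents = sorted(parents)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    # Free parameters: (child states - 1) for every parent configuration,
    # using the number of states observed in the data for each variable.
    q = 1
    for p in parents:
        q *= len({r[p] for r in rows})
    k = len({r[child] for r in rows})
    return loglik - 0.5 * math.log(n) * q * (k - 1)


def is_acyclic(parents):
    """Kahn's algorithm: a DAG iff every node can be removed in topological order."""
    indegree = {v: len(ps) for v, ps in parents.items()}
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    ready = [v for v, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return removed == len(parents)


def hill_climb(rows, variables):
    """Repeatedly apply the single edge addition or removal that most improves
    the total BIC; stop when no move improves it."""
    parents = {v: frozenset() for v in variables}
    while True:
        best_gain, best_move = 0.0, None
        for a, b in permutations(variables, 2):
            candidate = parents[b] ^ {a}  # toggle the edge a -> b
            trial = {**parents, b: candidate}
            if not is_acyclic(trial):
                continue
            # BIC decomposes over families, so only node b's term changes.
            gain = family_bic(rows, b, candidate) - family_bic(rows, b, parents[b])
            if gain > best_gain + 1e-9:
                best_gain, best_move = gain, (b, candidate)
        if best_move is None:
            return parents
        b, candidate = best_move
        parents[b] = candidate
```

On a toy data set where B copies A and C is independent of both, for example rows = [{"A": i % 2, "B": i % 2, "C": (i // 2) % 2} for i in range(200)], the search should end with a single edge between A and B. Note that BIC cannot tell A -> B from B -> A here, which is one small illustration of why it can be hard to choose among learned structures.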

The abstract is quoted below:

In this paper, we use Bayesian Belief Networks (BBNs) to construct a knowledge model for pipe breaks in a water zone. To the authors’ knowledge, this is the first attempt to model drinking water distribution system pipe breaks using BBNs. Development of expert systems such as BBNs for analyzing drinking water distribution system data is not only important for pipe break prediction, but is also a first step in preventing water loss and water quality deterioration through the application of machine learning techniques to facilitate data-based distribution system monitoring and asset management. Due to the difficulties in collecting, preparing, and managing drinking water distribution system data, most pipe break models can be classified as “statistical-physical” or “hypothesis-generating.” We develop the BBN with the hope of contributing to the “hypothesis-generating” class of models, while demonstrating the possibility that BBNs might also be used as “statistical-physical” models. Our model is learned from pipe breaks and covariate data from a mid-Atlantic United States (U.S.) drinking water distribution system network. BBN models are learned using a constraint-based method, a score-based method, and a hybrid method. Model evaluation is based on log-likelihood scoring. Sensitivity analysis using mutual information criterion is also reported. While our results indicate general agreement with prior results reported in pipe break modeling studies, they also suggest that it may be difficult to select among model alternatives. This model uncertainty may mean that more research is needed for understanding whether additional pipe break risk factors beyond age, break history, pipe material, and pipe diameter might be important for asset management planning.
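The abstract mentions a sensitivity analysis based on mutual information. As a rough sketch of the underlying quantity (not the paper's code), the empirical mutual information between two discrete columns is I(X;Y) = sum over (x, y) of p(x,y) log[p(x,y) / (p(x) p(y))], and covariates can then be ranked by how much information they share with the break variable. The function names, the column names, and the four toy records in the usage note are invented for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information, in nats, between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count(x) * count(y)).
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def rank_by_mutual_information(records, target):
    """Rank every other column by its mutual information with `target`."""
    ys = [r[target] for r in records]
    scores = {
        col: mutual_information([r[col] for r in records], ys)
        for col in records[0]
        if col != target
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With made-up records in which "material" determines "broke" and "diameter" is unrelated to it, rank_by_mutual_information(records, "broke") puts material first (about log 2 nats) and diameter last (0 nats).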

SEED Group Presentation at Concordia University Today, 11AM

seed, July 6, 2012

Today, Dr. Francis is giving a talk titled "Two Studies in Using Graphical Model for Infrastructure Risk Models," discussing recent peer-reviewed conference papers presented at ICVRAM and PSAM11/ESREL12. The abstract for today's talk is:

In this talk, I will discuss the use of Bayesian Belief Networks (BBNs) and Classification and Regression Trees (CART) for infrastructure risk modeling. In the first case study, we focus on supporting risk models used to quantify economic risk due to damage to building stock attributable to hurricanes. The increasingly complex interaction between natural hazards and human activities requires more accurate data to describe the regional exposure to potential loss from physical damage to buildings and infrastructure. While databases contain information on the distribution and features of the building stock, infrastructure, transportation, etc., it is not unusual that portions of the information are missing from the available databases. Missing or low quality data compromise the validity of regional loss projections. Consequently, this paper uses Bayesian Belief Networks and Classification and Regression Trees to populate the missing information inside a database based on the structure of the available data. In the second case study, we use Bayesian Belief Networks (BBNs) to construct a knowledge model for pipe breaks in a water zone. BBN modeling is a critical step towards real-time distribution system management. Development of expert systems for analyzing real-time data is not only important for pipe break prediction, but is also a first step in preventing water loss and water quality deterioration through the application of machine learning techniques to facilitate real-time distribution system monitoring and management. Our model is based on pipe breaks and covariate data from a mid-Atlantic United States (U.S.) drinking water distribution system network. The expert model is learned using a conditional independence test method, a score-based method, and a hybrid method, then subjected to 10-fold cross validation based on log-likelihood scores.
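The first case study fills in missing building-stock attributes from the structure of the available data. The BBN and CART models in the talk are more sophisticated, but the core idea can be sketched with the simplest conditional model: replace each missing value with the most frequent value seen among complete records that share the same predictor values, falling back to the overall most frequent value. This conditional-mode rule is a deliberately simplified stand-in, and the function name and field names (occupancy, roof) are hypothetical.

```python
from collections import Counter, defaultdict


def impute_by_conditional_mode(records, target, predictors):
    """Return copies of `records` with None values of `target` filled in.

    Assumes at least one record has an observed `target` value.
    """
    by_key = defaultdict(Counter)  # predictor values -> counts of observed target values
    overall = Counter()            # fallback when a predictor combination was never observed
    for r in records:
        if r[target] is not None:
            key = tuple(r[p] for p in predictors)
            by_key[key][r[target]] += 1
            overall[r[target]] += 1
    filled = []
    for r in records:
        r = dict(r)  # copy so the caller's data is not modified
        if r[target] is None:
            key = tuple(r[p] for p in predictors)
            source = by_key.get(key) or overall
            r[target] = source.most_common(1)[0][0]
        filled.append(r)
    return filled
```

A tree or network model improves on this by choosing which predictors matter and by sharing strength across sparse combinations, which is exactly where a lookup table like this one breaks down.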

This talk is hosted by Ketra Schmitt of the Center for Engineering in Society in the Faculty of Engineering and Computer Science.

New SEED Group Paper Presented at PSAM11/ESREL12 in Helsinki, Finland on 27 June 2012

seed, July 4, 2012

A recent SEED Group paper, "Bayesian Belief Networks for Predicting Drinking Water Distribution System Pipe Breaks," was presented at PSAM11/ESREL12 in Helsinki, Finland. This peer-reviewed conference paper was co-authored by Dr. Francis with JHU collaborators Dr. Seth Guikema and Lucas Henneman. The abstract of the paper follows:

In this project, we use Bayesian Belief Networks (BBNs) to construct a knowledge model for pipe breaks in a water zone. BBN modeling is a critical step towards real-time distribution system management. Development of expert systems for analyzing real-time data is not only important for pipe break prediction, but is also a first step in preventing water loss and water quality deterioration through the application of machine learning techniques to facilitate real-time distribution system monitoring and management. Our model will be based on pipe breaks and covariate data from a mid-Atlantic United States (U.S.) drinking water distribution system network. The expert model is learned using a conditional independence test method, a score-based method, and a hybrid method, then subjected to 10-fold cross validation based on log-likelihood scores.
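Cross-validating a fixed network structure by held-out log-likelihood can be sketched as follows. Assumptions not taken from the paper: all variables are discrete, conditional probability tables are estimated with additive (Laplace) smoothing so that a parent configuration unseen in a training fold does not produce log(0), folds are formed after a seeded shuffle, and the function names are hypothetical.

```python
import math
import random
from collections import Counter


def fit_counts(rows, parents):
    """Count (parent configuration, child value) pairs for every node."""
    tables = {}
    for child, pa in parents.items():
        pa = sorted(pa)
        joint = Counter((tuple(r[p] for p in pa), r[child]) for r in rows)
        marginal = Counter(tuple(r[p] for p in pa) for r in rows)
        tables[child] = (pa, joint, marginal)
    return tables


def log_likelihood(rows, tables, cardinality, alpha=1.0):
    """Log-likelihood of `rows` under smoothed estimates of P(child | parents)."""
    total = 0.0
    for r in rows:
        for child, (pa, joint, marginal) in tables.items():
            key = tuple(r[p] for p in pa)
            num = joint[(key, r[child])] + alpha
            den = marginal[key] + alpha * cardinality[child]
            total += math.log(num / den)
    return total


def cross_validated_log_likelihood(rows, parents, k=10, alpha=1.0, seed=0):
    """Mean held-out log-likelihood over k folds for a fixed DAG `parents`."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    # State spaces are taken from the full data set, not just the training fold.
    cardinality = {v: len({r[v] for r in rows}) for v in parents}
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(log_likelihood(test, fit_counts(train, parents), cardinality, alpha))
    return sum(scores) / k
```

Structures are passed as a mapping from each node to its parent list, for example {"A": [], "B": ["A"], "C": []}; the structure with the higher (less negative) mean held-out log-likelihood generalizes better under this criterion.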

A short report from PSAM11/ESREL12 will follow in a later post.

New SEED Group Paper w/JHU, TAMU Collaborators!

seed, August 5, 2011

We have just published a paper, "Characterizing the Performance of the Conway-Maxwell Poisson Generalized Linear Model," in the journal Risk Analysis.

Here is the abstract:

Count data are pervasive in many areas of risk analysis; deaths, adverse health outcomes, infrastructure system failures, and traffic accidents are all recorded as count events, for example. Risk analysts often wish to estimate the probability distribution for the number of discrete events as part of doing a risk assessment. Traditional count data regression models of the type often used in risk assessment for this problem suffer from limitations due to the assumed variance structure. A more flexible model based on the Conway-Maxwell Poisson (COM-Poisson) distribution was recently proposed, a model that has the potential to overcome the limitations of the traditional model. However, the statistical performance of this new model has not yet been fully characterized. This article assesses the performance of a maximum likelihood estimation method for fitting the COM-Poisson generalized linear model (GLM). The objectives of this article are to (1) characterize the parameter estimation accuracy of the MLE implementation of the COM-Poisson GLM, and (2) estimate the prediction accuracy of the COM-Poisson GLM using simulated data sets. The results of the study indicate that the COM-Poisson GLM is flexible enough to model under-, equi-, and overdispersed data sets with different sample mean values. The results also show that the COM-Poisson GLM yields accurate parameter estimates. The COM-Poisson GLM provides a promising and flexible approach for performing count data regression.
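For readers unfamiliar with the distribution: the COM-Poisson probability mass function is P(Y = y) = λ^y / ((y!)^ν Z(λ, ν)), with normalizing constant Z(λ, ν) = sum over j ≥ 0 of λ^j / (j!)^ν. Setting ν = 1 recovers the Poisson distribution, ν < 1 gives overdispersion, and ν > 1 gives underdispersion. The sketch below evaluates the pmf by truncating the infinite sum at a cutoff max_y (an assumption; it must be large relative to the typical counts) and works in log space to avoid overflow. It covers only the distribution, not the GLM or the maximum likelihood fitting studied in the article, and the function names are hypothetical.

```python
import math


def com_poisson_pmf(lam, nu, max_y=200):
    """Return [P(Y=0), ..., P(Y=max_y)] for COM-Poisson(lam, nu); requires lam > 0."""
    # log of the unnormalized term lam**y / (y!)**nu
    log_terms = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_y + 1)]
    m = max(log_terms)  # subtract the largest term before exponentiating
    weights = [math.exp(t - m) for t in log_terms]
    z = sum(weights)    # truncated normalizing constant (up to the factor exp(m))
    return [w / z for w in weights]


def mean_var(pmf):
    """Mean and variance of a pmf given as a list indexed by count."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var
```

For example, mean_var(com_poisson_pmf(3.0, 1.0)) returns a mean and variance both equal to 3 up to rounding (the Poisson case), while ν = 0.5 yields a variance above the mean and ν = 2 a variance below it.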

Enjoy!

Does Big Data make the scientific method obsolete? I hope not!

seed, July 26, 2011

I came across a link posted by @urbandata on Twitter asking the question, "Does 'big data' make the scientific method obsolete?" My immediate response, before clicking the link, was "I sure hope not." After reading the article, I think the issue may be a bit more complex than that, but I stand by my original impression.

The article "The end of theory: The data deluge makes the scientific method obsolete" can be found here: "Does big data make the scientific method obsolete?"

I think the author, Chris Anderson, rightly points out that correlation must not be confused with causation, but he goes on without exploring the full meaning of this statement. As a result, he unintentionally builds an argument that rests on the wisdom of this traditional warning.

For example, Anderson cites Craig Venter's successful "shotgun sequencing" approach to DNA sequencing, yet doesn't recognize that it is the established theory that species are uniquely identified by their genome that makes this approach valid. More than that, this theory lends credence to the author's later observation that organisms don't need to be directly observed for us to learn about their characteristics. The author can make this claim for the same mechanistic reason the shotgun approach works.

This is not to say that using statistical and mathematical models to analyze the ubiquitous data around us does not extend the scientific method in ways we don't yet imagine. It does. However, science not only provides the foundation for the mathematical theories underlying statistical methods, but also helps us interpret the data streams and statistical results. Yes, we should strive to change the way science works, but we should not abdicate responsibility for inquiry and investigation to the black box.

[This post also appears on my personal blog, the fertile paradox...]