
Blog

A recent SEED Group paper, "Bayesian Belief Networks for Predicting Drinking Water Distribution System Pipe Breaks," was presented at PSAM11/ESREL12 in Helsinki, Finland.  This peer-reviewed conference paper was co-authored by Dr. Francis with JHU collaborators Dr. Seth Guikema and Lucas Henneman.  The abstract of this paper follows:

In this project, we use Bayesian Belief Networks (BBNs) to construct a knowledge model for pipe breaks in a water zone.  BBN modeling is a critical step towards real-time distribution system management.  Development of expert systems for analyzing real-time data is not only important for pipe break prediction, but is also a first step in preventing water loss and water quality deterioration through the application of machine learning techniques to facilitate real-time distribution system monitoring and management.  Our model will be based on pipe breaks and covariate data from a mid-Atlantic United States (U.S.) drinking water distribution system network. The expert model is learned using a conditional independence test method, a score-based method, and a hybrid method, then subjected to 10-fold cross validation based on log-likelihood scores.
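As a rough illustration of the workflow described in the abstract, the sketch below uses the bnlearn package in R (discussed in a later post) to learn a network structure with a constraint-based, a score-based, and a hybrid algorithm, and then to score each strategy by 10-fold cross-validation with a log-likelihood loss. The data frame pipe.data and its contents are hypothetical placeholders, not the mid-Atlantic utility dataset used in the paper.

    # Sketch of the learning workflow described above, using the bnlearn R package.
    # 'pipe.data' is a hypothetical data frame of pipe break indicators and covariates.
    library(bnlearn)

    bn.ci    <- gs(pipe.data)      # constraint-based (conditional independence tests)
    bn.score <- hc(pipe.data)      # score-based (hill climbing)
    bn.hyb   <- mmhc(pipe.data)    # hybrid (max-min hill climbing)

    # 10-fold cross-validation of each learning strategy, scored by log-likelihood loss
    cv.ci    <- bn.cv(pipe.data, bn = "gs",   k = 10, loss = "logl")
    cv.score <- bn.cv(pipe.data, bn = "hc",   k = 10, loss = "logl")
    cv.hyb   <- bn.cv(pipe.data, bn = "mmhc", k = 10, loss = "logl")

The "logl" loss shown here applies to discrete networks; bnlearn uses "logl-g" for Gaussian networks, so the appropriate choice depends on how the covariates are encoded.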

A short report from PSAM11/ESREL12 will follow in a later post.

Today, I'm presenting a guest post from Behailu Bekera, a first-year EMSE PhD student working in the SEED Group.  He is studying the relationship between risk-based and resilience-based approaches to systems analysis.

Resilience is defined as the capability of a system, with specific characteristics before, during, and after a disruption, to absorb the disruption, recover to an acceptable level of performance, and sustain that level for an acceptable period of time. Resilience is an emerging approach to safety. Conventional risk assessment methods are typically used to determine the negative consequences of potential undesired events, to understand the nature of the risk involved, and to reduce its level. In contrast, the resilience approach emphasizes anticipating potential disruptions, giving appropriate attention to perceived danger, and establishing response behaviors aimed either at building the capacity to withstand a disruption or at recovering as quickly as possible after an impact. Anticipation refers to the ability of a system to know what to expect and to prepare itself accordingly so that it can effectively withstand disruptions. The ability to detect the signals of an imminent disruption is captured by the attentive property of resilience. Once the impact takes place, the system must know how to respond efficiently with the aim of a quick rebound.

Safety, as we traditionally know it, is usually considered to be something a system or an organization possesses, as evidenced by measurements of failure probability, risk, and so on. Under the new approach, Hollnagel and Woods argue that safety is something an organization or a system does. Seen from a resilience point of view, safety is a characteristic of how a system performs in the face of disruptions: how it can absorb or dampen impacts, and how quickly it can reinstate itself after suffering a perturbation.

Resilience may allow for a more proactive approach to handling risk. It puts the system on a path of continuous performance evaluation to ensure safety at all times. Resilient systems will be flexible enough to accommodate the different safety issues that may arise across multiple dimensions, and robust enough to maintain acceptable performance.

In the world of chemical or human health risk analysis, several clouds seem to be forming on the horizon: mixture-based toxicology and its interpretation, data-poor extrapolation to human exposure, and extrapolation of dose-response relationships from high-dose chronic studies to sub-chronic, low-dose exposures.  These opportunities force us to approach risk analysis as an art, and they necessitate the inclusion of decision analysis in chemical screening procedures.

One problem whose urgency is increasing is data-poor extrapolation from animal to human dose-response relationships.  Not only are there tens of thousands of compounds that are not regulated and have no publicly available data, but there are also entirely new types of chemicals produced by technological innovation for which existing toxicological approaches may not be appropriate.

Traditionally, risk scientists make this approximation (and similar ones) by proposing a reference dose.  The reference dose (RfD) is an unenforceable standard postulating a daily oral human exposure at which no appreciable risk of adverse effects attributable to the given compound is likely to exist.  The RfD is obtained from a point of departure, either the lowest dose observed to produce adverse effects (the LOAEL) or the highest dose at which no adverse effects have been observed (the NOAEL), divided by uncertainty factors reflecting the uncertainties introduced by extrapolation across species and data-quality contexts. Roger Cooke (and several commentators) discuss the RfD, concluding that the approach needs to be updated to incorporate a probabilistic interpretation of these uncertainties, but there seems to be disagreement on how to update it. In his Risk Analysis article, “Conundrums with Uncertainty Factors,” Cooke argues that this approach not only relies on inappropriate statistical independence assumptions, but is also analogous to the application of safety factors in engineering design.  By not employing a probabilistic approach, we promulgate uneconomic guidelines at best, while at worst we are overconfident in our risk mitigation.
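To make the arithmetic behind the RfD concrete, here is a minimal sketch in R of the deterministic division-by-uncertainty-factors approach that Cooke critiques; the point of departure and factor values are purely hypothetical and do not correspond to any real compound.

    # Minimal sketch of the deterministic RfD calculation described above.
    # All values are hypothetical illustrations, not data for any real compound.
    pod <- 10                       # point of departure (e.g., a NOAEL), mg/kg-day
    uf  <- c(interspecies = 10,     # animal-to-human extrapolation
             intraspecies = 10,     # variability within the human population
             database     = 3)      # e.g., data-quality or study-duration concerns
    rfd <- pod / prod(uf)           # 10 / 300, about 0.033 mg/kg-day
    rfd

Cooke's objection is that treating each factor as an independent, fixed safety margin in this way obscures the joint uncertainty that a probabilistic analysis would represent explicitly.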

Cooke’s paper illustrates a probabilistic approach to estimating dose-response relations by combining animal, human, data-poor, and data-rich results in a chemical toxicity knowledge base founded on Bayesian Belief Networks (in his example, non-parametric, continuous BBNs).  He demonstrates the possibility of employing nonparametric or generalizable statistical methods to obtain a probabilistic understanding of the response of interest in the context of the chemical’s toxicological knowledge base.  This is in contrast to the uncertainty factor approach, which presupposes that only a limited understanding of the dose-response relationship at relevant human exposures can be obtained.  While we are a ways away from abandoning the RfD approach, Cooke acknowledges that it may be difficult to rely only on dose-response modeling.  His approach initializes on current practice, while promising a rapid and simple inference mechanism capable of deriving toxicological indicators and amenable to inclusion in broader decision-making models.

The Deepwater Horizon oil spill in the Gulf of Mexico this past year brought to light one of the most unfortunate aspects of the socio-technical systems that define our society. Because of the complexity and technical sophistication of our most critical infrastructures and crucial goods and services, the parties responsible for making regulatory decisions often do not possess the data required to make the risk mitigation and management decisions that offer the most public protection, especially in the context of disaster response and risk management.  This becomes more of a problem when the environment in which these decisions are promulgated is characterized by a lack of trust between the regulator, the regulated, and third-party beneficiaries.

In an environment where trust exists between the regulated and the regulator, opportunities for mutual collaboration towards broader social goals may be more prevalent.  These opportunities may also be more likely to be identified, formulated, and implemented in ways that may promote more trust and improve overall efficiencies, both regulatory and economic. But when trust is broken, the adversarial nature of the regulatory relationship can bring gridlock.

We are very familiar with the image of gridlock in a transportation network from our time stuck in rush-hour traffic in many of our North American cities, and 2011 has made us more and more acquainted with partisan gridlock in Congress, but what about regulatory gridlock?  I am still thinking this one through, but I am borrowing from the idea of economic gridlock developed by Daniel Heller to construct these ideas. In my opinion, regulatory gridlock occurs when, in an adversarial arrangement, the intended consequences of a complex technical system (CTS) are well known and integrated, while the undesirable consequences of a CTS’s deployment are unpredictable and fragmentary.  The adversarial relationship makes it nearly impossible to facilitate effective communication between the owners of a CTS that has failed and the stakeholders who are affected.  In addition, the adversarial relationship activates a feedback loop between the perceived transparency of the CTS innovation cycle among the CTS ownership and the willingness of stakeholders to accept non-zero risk.  As this feedback loop promotes an increasingly negative perception of transparency and a decreased willingness to accept risk, risk mitigation becomes less economically effective while the overall costs to society of CTS management and innovation increase.

In 2012, as economic and political pressure to make government more efficient and promote economic recovery increases, will we see the need for navigating this potential gridlock increase?  How will we address this challenge, ensuring that the potential for disasters doesn’t divert our focus from the important work of improving our economic and social welfare through technological innovation within our lifeline infrastructures?

During the holiday break, I have had the opportunity to do some undirected reading in a variety of areas. One of the topics I’ve browsed is urban data. My favorite source for this type of work, IBM Smarter Planet, indirectly led me to a transcript of a talk by an IBM Distinguished Engineer, Colin Harrison. He was discussing the advent of Urban Information Networks during the Paris 2030 Colloque.

Harrison focused specifically on the types of data used to link urban services and their users to each other, grouping them into three classes: the invisible rendered visible, information for resource management, and open data 2.0. Urban information networks are tearing down the boundaries between citizens and the pragmatic management of their own urban resources by increasing process transparency at the same time that exclusivity of information access is reduced.

One of the possibilities emblematic of the types of problems I hope to address in this space is an anecdote Harrison gives concerning CalTrans:

An example of how information enables the inhabitants to most effectively use the immediately available capacity of the total, multi-modal transportation system comes from our work with CalTrans in the San Francisco bay area. Here inhabitants with smart mobile telephones can subscribe to a service that enables CalTrans to observe their journeys based on the GPS reading from the telephone. From these observations CalTrans can determine the individual user’s common journeys. When the system sees the user beginning a familiar journey, for example commuting from home to the workplace, it looks at the multi-modal choices available to the traveller and the operational status of each of those systems along the required paths, and then makes a recommendation to the traveller for the optimal way to make this journey at this time. The traveller thus makes the journey with the minimum delays and disruptions and the transportation systems’ loads can be balanced.

The opportunities in understanding the impacts of the interplay between user behaviors and system properties are truly awesome. Let us work together to keep deepening our understanding of these emerging problems.

Admittedly, I am having trouble developing a title for this post. Broadly speaking, the idea behind my thoughts is that most of my education has taken place under a paradigm characterized by two things:

  1. I am a consumer of information.
  2. I am a producer of information only for the professor.

Now, obviously most professors and students alike understand that the professor is ultimately not the final evaluator of the quality of a student's preparations. This privilege belongs to those who will consume whatever the student produces in their professional (or personal) life. Moreover, the student will probably have the ability to largely select the audience to whom their work will be offered. Although many professors and students will readily acknowledge these truths, ultimately, our approach to classroom pedagogy does not prepare students to be producers of information for the audience of their choosing.

Honestly, there seems to be much room to modify the classroom experience for students such that they become not merely consumers of knowledge, but also producers of knowledge for an audience of their choosing. Certainly, there is not total freedom in this respect, as one of the most important responsibilities of a professor is ensuring that students have mastered the requisite body of knowledge. So there will probably remain some aspects of the "professor as audience" character of the current classroom. I do believe, however, that education can be greatly enhanced if the transition from receptor to transmitter can be facilitated in the classroom. In this regard, I am reminded of something I read recently that makes this point: "Program or be programmed..." [OK, so the idea is the title of Douglas Rushkoff's book... Reviews welcome!]

To this end, I wonder if there's not also room for explicitly incorporating production into more engineering classes. We have some aspects of the production model in place for capstone and senior design courses, and much ado has been made about problem- and project-based learning. But what about the use of communications as a way to ensure mastery? I was recently reading an old blog post on Academhack about changing the approach to teaching writing. In this post, a professor had been assigned to a class that had shown difficulty learning to write effectively. While the professor had been advised to provide "more structure" for the students, the students were directed instead to write and produce a short documentary. [This killed about 50 birds with one stone, but I won't get into that here...] Because the students were given control in an assignment that built on their skills and interests, their attention remained sharp during the entire semester and the pedagogical results were encouraging.

I'm thinking that requiring students to produce media for the broader public, or for whatever audience interests them, will help them internalize mastery of the subject material in a way that is well situated within their increasingly media- and information-saturated lifestyles. In my experience, it is widely accepted that teaching a subject is the most effective way to ensure mastery of it. Why not incorporate some of these aspects into students' experiences in the lower-level or fundamental engineering classes? Admittedly, engineering curricula are very demanding, and such approaches may be too risky. In the interest of full disclosure, I do not plan on routinely using these ideas to teach my courses [yet]. Furthermore, we often have design aspects explicitly incorporated into our curricula, so this may be a moot point in the eyes of many students and professors. But certainly, such approaches may hold promise for ensuring students understand the broader social, policy, and economic implications of the technologies they are developing...

The SEED group is actively involved in the Society for Risk Analysis, and we take this opportunity to list a few brief highlights from this year’s meeting:

  1. A paper by Dr. Francis and collaborators was presented in the preference elicitation and benefits assessment symposium.  The abstract is provided below.
  2. Dr. Francis began serving as the Chair of the Engineering and Infrastructure Specialty Group of the SRA.  EISG will be emphasizing linkages with other specialty groups, while also increasing our presence in the Risk Analysis journal.
  3. Expert elicitation and evidence synthesis were big topics this year, and Drs. Francis and Gray will be hoping to build on this interest within the SRA by applying innovative evidence synthesis techniques to chemical risk assessment problems.
  4. The plenary sessions this year were excellent, including a discussion of the Deepwater Horizon oil spill by Admiral Thad Allen, and a tribute to Carnegie Mellon’s Lester Lave.  Look out for class material from both of these knowledge bases to appear in Dr. Francis’s future courses.

This past week the Annual Meeting of the Society for Risk Analysis took place in Charleston, SC. Dr. Francis and collaborators George Gray, John Carruthers, and Robert Lee presented a paper titled "Preferences related to urban sustainability under risk, uncertainty, and dynamics." The abstract is included here:

Numerous older cities in the US are experiencing a state of decline, due to shrinking populations, economic hardship, and many other factors. Large areas of these cities are comprised of contaminated and vacant land. We explore the decision context around land redevelopment approaches focused upon reducing risk, improving quality of life, and fostering sustainability. Characterizing the preferences and objectives of diverse stakeholders in a multi-attribute framework may improve decisions and planning. However, traditional decision analytic approaches tend to be ‘static’, and do not capture the temporal and spatial dynamics of this problem. We propose a framework that integrates stated and revealed preferences in a dynamic modeling environment designed to capture key attributes of urban sustainability identified by stakeholders. The utility of this model will be demonstrated through an observational experiment. Key attributes and preferences will be elicited from a population of stakeholders in a Web environment. After eliciting these preferences, the participants will then engage in a dynamic modeling exercise in which they are able to interactively explore land use decisions considering the complexities of urban dynamics; the numerous tradeoffs, risks, and uncertainties; the resource constraints; and so on. We call this model DMASE (for Dynamic/Multi-Attribute/Spatially-Explicit). Preferences over the key attributes will then be elicited again. We hypothesize that the key attributes and preferences will change appreciably based upon interaction with the DMASE model. Additionally, the model can be modified in an iterative fashion to capture the decision context and preferences of the participants in a more meaningful way. This work will lead to a decision support tool that will allow stakeholders and decision-makers in declining cities to make more informed decisions about changes in the complex urban environment.

Bayesian networks are remarkable graphical models for organizing joint probability distributions according to the conditional independence relationships extant in a dataset.  Another way of saying this is that people usually process information according to hypotheses linking the objects they observe; people don’t connect objects intellectually unless the hypotheses connect them.  One of my PhD advisors at Carnegie Mellon, Mitchell Small, introduced me to the Bayesian network as a way to combine information from health effects studies to support risk assessment, but as a postdoctoral fellow I started to look at Bayesian networks as a potential data mining tool.  However you look at them, they are elegant computational models with a compelling axiomatic basis for philosophical reasoning, to boot.  They helped me understand and visualize Bayes’ rule as a graduate student, and now I’m hoping to use them more as a data mining technique to model drinking water distribution system reliability.  For these applications, I am thinking that learning the networks from my datasets will be indispensable.

OK, so learning Bayesian networks hasn’t been the exclusive focus of my thoughts in preparing this research.  Most of the past month has gone to a more thorough reading of the first few parts of Judea Pearl’s Probabilistic Reasoning in Intelligent Systems, but I have found a really cool paper by an Italian geneticist who has integrated several of the most popular network learning algorithms into an R package (bnlearn) for learning both the structure and the parameters of a Bayesian network.  I originally came across his article last March or so, when working with some JHU colleagues on using Bayesian networks to predict missing data in a public hurricane loss model database, but we didn’t learn our network from data, and we made some simplifying assumptions that did not require the sophisticated set of techniques in the linked paper.  Having read Marco Scutari’s paper several more times in the past week, I’m very impressed by the resource he has constructed.  It also helps a lot that it is in my favorite programming environment. There are many tools that learn either the structure or the parameters of Bayesian networks, but doing both at the same time has generally been left alone.  While Scutari’s package doesn’t do both at the same time, a researcher can come close, especially when using the bootstrapping or cross-validation utilities included in bnlearn.
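Since this post mentions structure learning, parameter learning, and the bootstrapping and cross-validation utilities, here is a minimal sketch, using bnlearn's bundled learning.test example data, of what that combined workflow can look like; it is only an illustration of the package's general pattern, not the distribution system model itself.

    # Minimal bnlearn sketch: learn a structure, fit its parameters, and use
    # bootstrapping to gauge confidence in the learned arcs.
    library(bnlearn)
    data(learning.test)                      # small discrete example dataset shipped with bnlearn

    dag    <- hc(learning.test)              # score-based structure learning
    fitted <- bn.fit(dag, learning.test)     # parameter learning given that structure

    # Bootstrapped arc strengths: how often each arc appears across resampled fits
    arcs <- boot.strength(learning.test, R = 200, algorithm = "hc")
    subset(arcs, strength > 0.85 & direction > 0.5)

This two-step pattern, structure first and then parameters, is what I mean by coming close: the two are not estimated jointly, but the bootstrap gives a sense of how stable the learned structure is before the parameters are fit.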

Because of Scutari and bnlearn, I am excited to move further with the modeling I’m doing.  As an environmental engineer who wants to use computer science, not necessarily create it, I’m very pleased he’s made this tool available.

We have just published a paper, "Characterizing the Performance of the Conway-Maxwell Poisson Generalized Linear Model," in the journal Risk Analysis.

Here is the abstract:

Count data are pervasive in many areas of risk analysis; deaths, adverse health outcomes, infrastructure system failures, and traffic accidents are all recorded as count events, for example. Risk analysts often wish to estimate the probability distribution for the number of discrete events as part of doing a risk assessment. Traditional count data regression models of the type often used in risk assessment for this problem suffer from limitations due to the assumed variance structure. A more flexible model based on the Conway-Maxwell Poisson (COM-Poisson) distribution was recently proposed, a model that has the potential to overcome the limitations of the traditional model. However, the statistical performance of this new model has not yet been fully characterized. This article assesses the performance of a maximum likelihood estimation method for fitting the COM-Poisson generalized linear model (GLM). The objectives of this article are to (1) characterize the parameter estimation accuracy of the MLE implementation of the COM-Poisson GLM, and (2) estimate the prediction accuracy of the COM-Poisson GLM using simulated data sets. The results of the study indicate that the COM-Poisson GLM is flexible enough to model under-, equi-, and overdispersed data sets with different sample mean values. The results also show that the COM-Poisson GLM yields accurate parameter estimates. The COM-Poisson GLM provides a promising and flexible approach for performing count data regression.
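To make the abstract's point about dispersion concrete, here is a short base-R sketch of the COM-Poisson probability mass function, P(X = x) = lambda^x / ((x!)^nu Z(lambda, nu)), where Z is the normalizing constant; the function name and the truncation of Z are illustrative choices, not the implementation evaluated in the paper.

    # Base-R sketch of the Conway-Maxwell Poisson pmf discussed above.
    # nu = 1 recovers the Poisson; nu > 1 gives underdispersion, nu < 1 overdispersion.
    dcompois <- function(x, lambda, nu, max.terms = 500) {
      j <- 0:max.terms                       # truncate the infinite normalizing sum
      logZ <- log(sum(exp(j * log(lambda) - nu * lfactorial(j))))
      exp(x * log(lambda) - nu * lfactorial(x) - logZ)
    }

    # Variance-to-mean ratio under different dispersion parameters
    x <- 0:100
    for (nu in c(0.5, 1, 2)) {
      p <- dcompois(x, lambda = 4, nu = nu)
      m <- sum(x * p); v <- sum((x - m)^2 * p)
      cat(sprintf("nu = %.1f: mean = %.2f, var/mean = %.2f\n", nu, m, v / m))
    }

Running this shows a variance-to-mean ratio above 1 for nu = 0.5, equal to 1 for nu = 1, and below 1 for nu = 2, which is the flexibility across under-, equi-, and overdispersed data that the abstract refers to.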

Enjoy!