
During the holiday break, I have had the opportunity to do some undirected reading in a variety of areas. One of the topics I’ve browsed is urban data. My favorite source for this type of work, IBM Smarter Planet, indirectly led me to a transcript of a talk by an IBM Distinguished Engineer, Colin Harrison. He was discussing the advent of Urban Information Networks during the Paris 2030 Colloque.

Harrison focused specifically on the types of data used to link urban services and their users, grouping them into three classes: the invisible rendered visible, information for resource management, and open data 2.0. Urban information networks are breaking down the barriers between citizens and direct participation in the pragmatic management of their own urban resources, increasing process transparency while reducing the exclusivity of information access.

An anecdote Harrison gives concerning CalTrans is emblematic of the types of problems I hope to address in this space:

An example of how information enables the inhabitants to most effectively use the immediately available capacity of the total, multi-modal transportation system comes from our work with CalTrans in the San Francisco bay area. Here inhabitants with smart mobile telephones can subscribe to a service that enables CalTrans to observe their journeys based on the GPS reading from the telephone. From these observations CalTrans can determine the individual user’s common journeys. When the system sees the user beginning a familiar journey, for example commuting from home to the workplace, it looks at the multi-modal choices available to the traveller and the operational status of each of those systems along the required paths, and then makes a recommendation to the traveller for the optimal way to make this journey at this time. The traveller thus makes the journey with the minimum delays and disruptions and the transportation systems’ loads can be balanced.
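The service Harrison describes is, at heart, a pattern-matching and ranking loop: learn a traveller's habitual journeys from GPS traces, detect when one is starting, and recommend whichever mode currently looks best. A minimal sketch of that loop in Python follows; the names and numbers are hypothetical, and it is my own illustration of the idea rather than CalTrans's actual system.

    from dataclasses import dataclass

    @dataclass
    class Journey:
        origin: str               # a place label derived from clustered GPS traces
        destination: str
        typical_start_hour: int   # hour of day this journey usually begins

    # Hypothetical live operational status per mode, as expected delay in minutes.
    mode_delays = {"drive": 25, "bart": 5, "caltrain": 12, "bus": 18}

    # Hypothetical baseline door-to-door times (minutes) for the same journey.
    mode_base_times = {"drive": 35, "bart": 45, "caltrain": 50, "bus": 60}

    def matches_known_journey(current_origin, current_hour, known):
        """Crude check that the traveller appears to be starting a familiar trip."""
        return (current_origin == known.origin
                and abs(current_hour - known.typical_start_hour) <= 1)

    def recommend_mode():
        """Pick the mode with the smallest expected travel time right now."""
        return min(mode_base_times,
                   key=lambda m: mode_base_times[m] + mode_delays[m])

    commute = Journey(origin="home", destination="work", typical_start_hour=8)

    if matches_known_journey("home", 8, commute):
        print(f"Suggested mode for this trip: {recommend_mode()}")

With the made-up numbers above, the sketch would suggest BART (50 expected minutes) over driving (60), which is exactly the kind of load-balancing nudge the anecdote describes.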

The opportunities in understanding the impacts of the interplay between user behaviors and system properties are truly awesome. Let us work together to continue deepening our understanding of these emerging problems.

I came across a link posted by @urbandata on Twitter asking the question, "Does 'big data' make the scientific method obsolete?" My immediate response before clicking the link was, "I sure hope not." After reading the article, I think it may be a bit more complex than that, but I stand by my original impression.

The article, "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete," can be found here.

I think the author, Chris Anderson, rightly points out that correlation must not be confused with causation, but he continues without exploring the full meaning of this statement. As a result, he builds an argument that, without intending to, rests on the wisdom of this traditional warning.

For example, Anderson cites Craig Venter's successful "shotgun sequencing" approach to DNA sequencing, yet doesn't realize that the established theory that species are uniquely identified by their genome is what makes this approach valid. More than that, the same theory lends credence to the author's later observation that organisms don't need to be directly observed to learn about their characteristics. The author can make this claim for the same mechanistic reason the shotgun approach works.
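To make the traditional warning concrete: a hidden confounder can produce a strong correlation between two quantities that have no causal link at all. The toy simulation below, my own illustration in Python and not anything from Anderson's article, has temperature drive both ice-cream sales and sunburn counts; the two correlate strongly even though neither causes the other, and only the underlying theory, their shared dependence on temperature, explains why.

    import random

    random.seed(0)

    # Hidden confounder: daily temperature drives both quantities.
    temps = [random.gauss(20, 8) for _ in range(1000)]

    # Each series depends on temperature plus noise; neither causes the other.
    ice_cream = [2.0 * t + random.gauss(0, 5) for t in temps]
    sunburns = [0.5 * t + random.gauss(0, 2) for t in temps]

    def pearson(xs, ys):
        """Pearson correlation coefficient, computed from scratch."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
        sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
        sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
        return cov / (sx * sy)

    # Prints a strong positive correlation despite the absence of any causal link.
    print(f"correlation: {pearson(ice_cream, sunburns):.2f}")

A correlation of that strength would leap out of any data-mining pipeline, but no amount of additional data tells you whether banning ice cream would reduce sunburns; only the theory behind the data can.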

This is not to say that the use of statistical and mathematical models to analyze the ubiquitous data around us does not extend the scientific method in ways we have yet to imagine. It does. However, science provides not only the foundation for the mathematical theories underlying statistical methods; it also helps us interpret the data streams and statistical results. Yes, we should strive to change the way science works, but we should not abdicate responsibility for inquiry and investigation to the black box.

[This post also appears on my personal blog, the fertile paradox...]