May 15, 2003

Some thoughts on Determinism

I once spent a fair amount of time investigating the behavior of a certain kind of computer model. These models consisted of simple grids of squares. Each square had a value associated with it that would increase slowly. If at some time the values of two adjacent cells differed by more than some threshold, the values of all the cells would be redistributed until the entire system was once again below the threshold. It turns out that such a model can produce very sophisticated patterns of behavior (at the time, I was thinking about the sizes of earthquakes) if there is also a certain amount of randomness in the system (for example, error in how the system re-equilibrates when the threshold is crossed).

In the models that I was working with, all of the values were integers (whole numbers like 1, 2, 3...). In re-equilibrating, the value from a cell is distributed among its neighbors, and thus there is a division. I rounded all of my divisions back to integers, and it turns out that this rounding was enough randomness to drive the model into the region of interesting behavior.
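A minimal sketch of this kind of model in Python follows. The specifics are assumptions chosen for illustration, not the rules of the models described above: the per-cell growth rates, the pairwise transfer of half the difference, the default sizes, and the names `growth_rates`, `relax` and `run` are all hypothetical. What it keeps from the description is the integer grid, the slow increase, the threshold between adjacent cells, and the integer (rounded) division.

```python
def growth_rates(n):
    """Assumed loading: each cell grows by a fixed integer rate between 1 and 4."""
    return [[1 + (3 * r + 5 * c) % 4 for c in range(n)] for r in range(n)]


def relax(grid, threshold):
    """Shift value between adjacent cells until no pair differs by more than threshold.

    Returns the number of transfers, used here as the "size" of the event.
    Requires threshold >= 1 so that every transfer moves at least one unit.
    """
    n = len(grid)
    transfers = 0
    changed = True
    while changed:
        changed = False
        for r in range(n):
            for c in range(n):
                # Look at the right and lower neighbours so each pair is checked once per pass.
                for rr, cc in ((r, c + 1), (r + 1, c)):
                    if rr >= n or cc >= n:
                        continue
                    diff = grid[r][c] - grid[rr][cc]
                    if abs(diff) > threshold:
                        # Integer division: this is the rounding discussed in the text.
                        amount = abs(diff) // 2
                        if diff > 0:
                            grid[r][c] -= amount
                            grid[rr][cc] += amount
                        else:
                            grid[r][c] += amount
                            grid[rr][cc] -= amount
                        transfers += 1
                        changed = True
    return transfers


def run(n=8, threshold=10, steps=200):
    """Load the grid slowly, relax after every step, and record the event sizes."""
    rates = growth_rates(n)
    grid = [[0] * n for _ in range(n)]
    sizes = []
    for _ in range(steps):
        for r in range(n):
            for c in range(n):
                grid[r][c] += rates[r][c]
        sizes.append(relax(grid, threshold))
    return grid, sizes


if __name__ == "__main__":
    _, sizes = run()
    print("largest event:", max(sizes), "transfers")
```

There is no random number generator anywhere in this sketch, so two runs with the same settings give exactly the same sequence of event sizes. Each transfer strictly lowers the sum of squared cell values, which is why `relax` always finishes.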

So what does this have to do with determinism? The behavior that was interesting is commonly associated with complex systems, sometimes called the edge of chaos. This behavior is very difficult or impossible to predict; it has sensitive dependence on initial conditions. In my models (and many others) it was also completely deterministic. That is, from the moment I started the program running, the entire path of its evolution was determined (using integers made this transparent, well, to me anyway), even though the next step cannot be predicted.

This is interesting in the context of the machine / organism debate. Clearly the models I was working with were machine-like: while they were interesting, they were completely deterministic. On the other hand, looking at them from the outside, you would not necessarily know that. Furthermore, computer models can be constructed that modify their internal workings and the relationships among their pieces as they attempt to achieve certain (externally set) goals. One class of these codes is called genetic algorithms.

It is also interesting in the context of the "what do we do now" question. Consider humans as part of the system. We can change how we interact with each other and with the natural systems. But given the difficulty in predicting or understanding the impact of any given change, how should we decide what to do? In many ways answering this question is the task that I have set myself. I have the following thoughts as a place to start:

  • Incrementalism is probably a good default approach.
  • Plans should include monitoring impacts and contingencies for when things go wrong.
  • We must recognize that Earth systems are dynamic. To the extent that there is any balance in nature, that balance is likely due to tension rather than stasis.

May 14, 2003

Microwave Popcorn

Begin Aside
a bit of late night contemplation...
End Aside

Consider microwave popcorn:

$0.80 / bag (cheap, what is the margin on this stuff?)
the bag (let steam out, don't burn etc)
the popcorn (probably hybrid)
the grease / flavor (I don't really want to think about it)

not to mention the ubiquitous oven itself

it is all technology...

The Idea of Wilderness

I am reading a book called The Idea of Wilderness by Max Oelschlaeger (1991, Yale Univ Press). Max spends a chapter on "Ancient Mediterranean Ideas", but I jumped straight from the paleo and neolithic to his discussion of the Modern. In making that jump it seems that one of his main points is that the earliest humans did not separate themselves from the wild, but saw themselves as part of the natural cycles. While he doesn't say so explicitly, part of the cycle image is also a sense that our linear or forward notion of progress was absent from the earliest human cultures.

Jumping to the Modern as I did, I missed a lot of the transition, but here is what I have gleaned about the difference (remember this is what I think Max thinks and I am still early on in the book...):

  • Christianity has a dominate-the-Earth element to it. How that manifests has evolved, but it has been an important part of the development of Western thinking about the relationship between humans and nature. In general, though, it requires that humans understand and dominate nature to know God or to return to a state of grace.
  • Within the Modern there is an important split between nature-as-machine and nature-as-organism. This split is presented as an evolution from organismic to mechanistic and Oelschlaeger traces its origins back to Galileo. In part Galileo's use of the telescope introduced science as measurement and nature as the thing to be measured.
  • Of course a crucial element of the Modern is that humans stand outside of Nature. Even the romantic poets stood outside. They longed to get back in and they looked to Modernity to return them to the Garden.
  • The nature-as-machine metaphor is traced back to Descartes and his mind / body split.
  • The finalization of the split between civilization and wilderness is laid at Adam Smith's feet.
    Wealth of Nations represents the realization of Merlin's dream: the base and valueless could now, with the facility of natural science and industrial technology, be transformed into a Heaven on earth. Consumption, and its never-ending growth, is the summum bonum of the Wealth of Nations, an ideal yet living today in the relentless pursuit of economic development. Through legerdemain, Smith transformed the first world from which humankind came into a standing reserve - a nature of significance only within a human matrix of judgment, devoid of intrinsic value. (p.94)
  • The machine vs organism view continues to be important. In the machine causal relationships are linear and direct. In the organism they can be non-linear and complex.
More on this soon...

May 13, 2003

Data, Results and Observations

Tonight's entry is an edited excerpt from a paper I wrote for my PhD thesis, but in the end it wasn't included there.

The terms, "data", "results" and "observations" are often used interchangeably. This reflects the fact that in the evolution of a field of study, the distinction between data, results and observations can become blurred. As work on a topic progresses, assumptions that were clearly stated in the beginning become taken for granted; they are no longer stated and exceptions to them are easily dismissed as mistakes. In an effort to see through the blurring, the following section presents some definitions ...

Begin Aside
The following are proposed definitions. They are how I would like to see the terms used, not necessarily how they actually are used. In particular, computer modelers often talk about the data that their models produce.
End Aside

Data are things that are accepted as being facts; this is analogous to being accepted as independent of any model. It is this characteristic which makes data useful in constraining models. Unfortunately, as noted yesterday, it is not possible for anything to be completely model independent; the very act of perception involves assumptions about what is likely to be seen. To accommodate this fact, data can be defined as things whose model we are willing to ignore or at least to accept without question. What is and is not data is a decision that is made in the context of the problem that is being considered. An example of something that I would consider data is the set of CO2 concentration measurements in the Keeling curve.

Results are the output of some operation on data; results cannot be model free. Any sort of manipulation of data is done with some purpose and that purpose is determined by the choice of model. In the case of data reduction (e.g., averaging several measurements) the model is usually so widely accepted that the results are once again considered data. The fact remains that even averaging assumes something about the nature of the process which is being measured. An example of results would be the average temperature of Earth.

Observations are generalizations from data or results and, like data, they are meant to be model free. Also like data, it is not possible for them to be so. Observations are the most pernicious of model hiders. It is in generalizing or expounding upon data or results that our preconceived notions are the most invisible. An example of an observation would be to note that both CO2 concentration and the average temperature of Earth are increasing.

End Excerpt

So there you have it - some definitions. In the remainder of the paper that those come from, I worked very hard to be consistent in how I used the terms. The problem is that the rest of the world hasn't read my paper and doesn't maintain the same vigilance in how they use these terms. I think what I am hovering around here is that even as we investigate how the world works, we make assumptions about how it works. Those assumptions influence what we find, and so on.

In reaction to my Eqn 0 piece last night a reader pointed out that politics can also have a strong influence on what is model and what is data. (Is that a fair paraphrase David?) No question about that, and that is why this whole problem is so insidious. Politics are buried even deeper than assumptions about linearity or homogeneity. Obviously there is much more to say on this ...

May 12, 2003

Equation 0

My favorite equation is the following:

data = model + residual (Eqn 0)

When it was first presented to me it was in the form:

residual = data - model (Eqn 0')

In the 0' form it is a way to compare your understanding of a problem (model) with the way that the world actually works (data). Another way of thinking about eqn 0' is that the model is the part of the world that you understand and the residual is the part of the world that remains to be explained. Your model of the world is acceptable to the extent that your residual is acceptable. In general, acceptable residuals are very much like low volume white noise (they have no structure and low amplitude).
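As a rough sketch of what "acceptable" might mean in practice, the Python below computes the residual from Eqn 0' and two simple summaries of it. The choice of summaries is an assumption for illustration: root-mean-square size stands in for "low amplitude", and lag-1 autocorrelation stands in for "no structure" (values near 0 suggest none, values near 1 or -1 suggest a pattern the model has missed). The function names are hypothetical, and where to draw the line between acceptable and unacceptable is left to the reader.

```python
def residuals(data, model):
    """Eqn 0': residual = data - model, point by point."""
    return [d - m for d, m in zip(data, model)]


def rms(values):
    """Root-mean-square size of the residuals (a measure of amplitude)."""
    return (sum(v * v for v in values) / len(values)) ** 0.5


def lag1_autocorrelation(values):
    """Correlation between each residual and the next one (a measure of structure)."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0  # constant residuals: nothing to correlate
    num = sum(a * b for a, b in zip(centered, centered[1:]))
    return num / denom
```

A residual that drifts steadily from negative to positive, for example, has a small mean but a clearly positive lag-1 autocorrelation, which is a hint that the model is missing a trend.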

Begin Aside
Notice that I have not said that the model is true or false, it is only acceptable or unacceptable.
End Aside

If your model is not acceptable there are two things that can be done. The first is to refine the parameters of the existing model. Let's say that our model of how much CO2 a car puts into the atmosphere is a linear function of how many miles it is driven. We might write that down as follows:

CO2 = a * miles + b (Eqn 1)

a and b are the parameters of the model. a is the slope of the line and b is the "0 intercept". The values of a and b are choices we make and can be adjusted based on the make and model of the particular car. Cars with better gas mileage will have lower values of a. The intercept value, b, will be very close to 0 and will vary with the driver of the car; in my quick thinking tonight it might reflect the time that a driver allows her car to warm up before starting off.
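One common way to refine a and b is ordinary least squares, which picks the values that make the sum of squared residuals as small as possible. That choice is an assumption here; it is only one of several ways to adjust the parameters. A minimal Python sketch, with a hypothetical function name:

```python
def fit_line(miles, co2):
    """Least-squares estimates of a (slope) and b (intercept) in Eqn 1."""
    n = len(miles)
    mean_x = sum(miles) / n
    mean_y = sum(co2) / n
    sxx = sum((x - mean_x) ** 2 for x in miles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(miles, co2))
    a = sxy / sxx            # slope: CO2 per mile
    b = mean_y - a * mean_x  # intercept: CO2 at zero miles
    return a, b
```

With made-up measurements that happen to lie exactly on a line, say miles of 0, 10, 20, 30 and CO2 of 1, 5, 9, 13 in whatever units are being used, the fit returns a slope of 0.4 and an intercept of 1, and the residual is zero everywhere. Real measurements would leave something behind.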

The second option if your residuals are not acceptable is to change models. In the context of the example above perhaps the amount of CO2 emitted by a car is some more complicated function of its average speed:

CO2 = c * sqrt(avg speed) (Eqn 2)

In this case c is our adjustable parameter but we have also introduced a non-linear element (the square root) and an aggregate factor (average speed). (I am not going to go into this further tonight; the important point is that there are alternate possibilities for our explanations of how the world works.)
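Even though Eqn 2 is non-linear in speed, it is still linear in its one parameter, so c can be estimated in closed form. The sketch below again assumes least squares as the fitting rule; the function names are hypothetical.

```python
import math


def fit_sqrt_model(avg_speeds, co2):
    """Least-squares estimate of c in Eqn 2: CO2 = c * sqrt(avg speed)."""
    roots = [math.sqrt(s) for s in avg_speeds]
    # Minimizing sum((y - c * root)^2) over c gives c = sum(root * y) / sum(root^2).
    return sum(r * y for r, y in zip(roots, co2)) / sum(r * r for r in roots)


def sqrt_model_residuals(avg_speeds, co2, c):
    """Eqn 0' applied to Eqn 2: what the square-root model leaves unexplained."""
    return [y - c * math.sqrt(s) for s, y in zip(avg_speeds, co2)]
```

Comparing the residuals from this model with those from Eqn 1, on the same set of measurements, is one way to decide which explanation is the more acceptable.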

OK, that is all fine and good, but what does it have to do with my preferred formulation of this equation, Eqn 0? Well, my preferred form suggests that the data we actually collect reflects what we expect to find plus some surprises. This is a variation on Kuhn's ideas of a paradigm and paradigm shifts. In times of normal science, experiments are designed to explore the details (refine the values of a and b in Eqn 1) of the paradigm (model); we only look for what we expect to find. In times of paradigm shift, the surprise part cannot be ignored and we must replace our models (Eqn 1 vs Eqn 2).

The key issue here is that Eqn 0 and Eqn 0' are the same equation. Each form has surprise in it and models and data are acceptable to the extent that our level of surprise remains acceptable.

May 11, 2003

SARS slight return

I am getting some great feedback on the modeling stuff, but I am feeling a bit frazzled tonight so I am going to shift gears briefly and let my thoughts on modeling simmer for a while.

SARS is back in the news. It seems like the disease has the potential for a bit of a second wind. The figure below summarizes my understanding of the disease. In individuals, there were reports of recurrence if the disease was not fully conquered in the first go-round. In communities, there seems to be the potential for new outbreaks as well. Toronto is seeing a pretty good resurgence. Part of the bump in Toronto seems to have to do with how SARS is defined.

What is the relationship between the progress of the disease in individuals and in populations? The evolution of the disease in individuals is the domain of medicine; the evolution of the disease in populations is the domain of public health. My guess is that the stubbornness of the virus in individuals is related to the potential for new outbreaks. In part I think this because of the importance of restricting exposure in controlling spread - in order to contain the disease, each victim must, on average, infect fewer than one other person.
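The arithmetic behind that threshold can be sketched in a few lines of Python. This is a deliberately crude picture and all of it is assumption: every victim infects the same average number of others (often called the reproduction number, written `r` here), and generations of infection do not overlap. The function names are hypothetical.

```python
def expected_cases_by_generation(r, generations, initial=1.0):
    """Expected new cases in each generation when each victim infects r others on average."""
    cases = [initial]
    for _ in range(generations):
        cases.append(cases[-1] * r)
    return cases


def expected_total_cases(r, initial=1.0):
    """Expected outbreak size when r < 1: the geometric series initial / (1 - r)."""
    if r >= 1:
        raise ValueError("the outbreak only dies out on its own when r < 1")
    return initial / (1 - r)
```

With r = 0.5, one case leads to 0.5, then 0.25, then 0.125 expected new cases, and the outbreak totals 2 cases on average. With any r above 1 the same multiplication grows without limit, which is why pushing r below 1 is the goal of restricting exposure.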

When I wrote about SARS previously, it looked like the mortality rate was reasonably low. It turns out that the mortality rate is higher than previously thought; it may be as high as 17 - 20%. As might be expected, mortality is higher in younger and older people. For people over 60, the mortality is quite high. I am quite interested in the age structure of the mortality - is it typical for infectious diseases or is it unusual?

As I noted earlier, SARS is a coronavirus and it does seem to have moved between animals and humans.