May 13, 2003

Data, Results and Observations

Tonight's entry is an edited excerpt from a paper I wrote for my PhD thesis, but in the end it wasn't included there.

The terms "data", "results" and "observations" are often used interchangeably. This reflects the fact that in the evolution of a field of study, the distinction between data, results and observations can become blurred. As work on a topic progresses, assumptions that were clearly stated in the beginning become taken for granted; they are no longer stated and exceptions to them are easily dismissed as mistakes. In an effort to see through the blurring, the following section presents some definitions ...

Begin Aside
The following are proposed definitions. They are how I would like to see the terms used, not necessarily how they actually are used. In particular, computer modelers often talk about the data that their models produce.
End Aside

Data are things that are accepted as being facts; this is analogous to being accepted as independent of any model. It is this characteristic which makes data useful in constraining models. Unfortunately, as noted yesterday, it is not possible for anything to be completely model independent; the very act of perception involves assumptions about what is likely to be seen. To accommodate this fact, data can be defined as things whose model we are willing to ignore or at least to accept without question. What is and is not data is a decision that is made in the context of the problem that is being considered. An example of something that I would consider data is the set of CO2 concentration measurements in the Keeling curve.

Results are the output of some operation on data; results cannot be model free. Any sort of manipulation of data is done with some purpose and that purpose is determined by the choice of model. In the case of data reduction (e.g., averaging several measurements) the model is usually so widely accepted that the results are once again considered data. The fact remains that even averaging assumes something about the nature of the process which is being measured. An example of results would be the average temperature of Earth.
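As a small illustration of the point about averaging, here is a minimal sketch in Python. The readings are invented for this example and are not real measurements, and the two summary statistics are just one way to show how the choice of reduction carries a model with it.

```python
from statistics import mean, median

# Hypothetical repeated readings of one quantity (values invented for illustration).
readings = [10.0, 10.0, 10.0, 10.0, 60.0]

# The mean treats every reading as the same underlying value plus noise,
# so the odd reading pulls the result upward.
mean_result = mean(readings)

# The median treats the odd reading as an outlier and effectively ignores it.
median_result = median(readings)

print(mean_result, median_result)
```

Both numbers are "results" in the sense used above. Which one gets reported depends on what is assumed about the process being measured, and that assumption is the model.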

Observations are generalizations from data or results and, like data, they are meant to be model free. Also like data, it is not possible for them to be so. Observations are the most pernicious of model hiders. It is in generalizing or expounding upon data or results that our preconceived notions are the most invisible. An example of an observation would be to note that both CO2 concentration and the average temperature of Earth are increasing.

End Excerpt

So there you have it - some definitions. In the remainder of the paper that those come from, I worked very hard to be consistent in how I used the terms. The problem is that the rest of the world hasn't read my paper and doesn't maintain the same vigilance in how they use these terms. I think what I am hovering around here is that even as we investigate how the world works, we make assumptions about how it works. Those assumptions influence what we find, and so on.

In reaction to my Eqn 0 piece last night, a reader pointed out that politics can also have a strong influence on what is model and what is data. (Is that a fair paraphrase, David?) No question about that, and that is why this whole problem is so insidious. Politics are buried even deeper than assumptions about linearity or homogeneity. Obviously there is much more to say on this ...