Saturday, December 15, 2018

When big data are bad data

As archaeologists turn increasingly to the analysis of large, systematic databases, we need to confront an epistemological problem: How do we identify bad data, and what can we do about it? Economic historians and others are becoming consumers of archaeological data, and they are quick to jump on new databases. They seldom ask about the quality of the data, and this can result in sophisticated analyses of bad data. But, as we all know, “Garbage in, garbage out.”

I blogged about this a couple of years ago in reference to Tertius Chandler’s list of city sizes through history, from both archaeological and historical sources (Link is here). Those data (Chandler 1987) are considered shockingly bad and worthless by most historical demographers and historians. In technical terms, they may be "bullshit" (see my post on bullshit).
Yet some urban scholars merrily use the data for studies today. I consider this a real problem, and said so in my review of a manuscript for a journal titled “Scientific Data” (my blog post was an elaboration on that review). But data quality evidently mattered less to the authors and journal editors; the paper was published with only a few weak caveats about the data (Reba et al. 2016).

Another recent case concerns the identification of plague and other diseases in historical sources. It centers on a database of historical occurrences of plague in France and the Mediterranean area, compiled in 1975 by Jean-Noël Biraben. Specialists recognize numerous biases and problems with the basic data. But once the data were digitized, non-historians readily used them without question, leading to problematic results. The basic problem was pointed out by Jones and Nevell (2016), and elaborated on by Roosen and Curtis (2018).
Plague or not plague? (From Jones and Nevell)

I highly recommend Roosen and Curtis (and thanks to my excellent colleague Monica Green for sending this my way). Their remarks parallel my views of the Chandler city-size data, but they do a better job of articulating the historiographical issues involved when subsequent scholars used these data (badly):

“When scholars fail to apply source criticism or do not reflect on the content of the data they use, the reliability of their results becomes highly questionable.” (p. 103).

Jones and Nevell deplore “the loosening of the rigorous standards of evidence and interpretation scientific researchers typically demand within their own disciplines.” (p. 103)

Roosen and Curtis list three problems with recent analyses of the Biraben data (including a paper in PNAS): “First, reflection on the data collection process has been improper; second, what the data represent has not been recognized; and third, critique of the original sources has been inadequate. We argue that a critical consideration of any of these three elements would have led to the conclusion that the data set should not have been used at face value.” (p. 105)

“However, through digitization and subsequent publication in a top-ranked journal, the 4-decade-old data set was imbued with a false aura of trustworthiness and the impression of being new historical research.” (p. 104)

Roosen and Curtis make some recommendations for improving the systematic analysis of historical disease data. First, scholars need to employ basic historiographical techniques when they use data. That is, scholars need to subject data to source criticism, comparison of sources, and contextual analysis. Second, hypothesis testing should be done on limited regions, rather than Europe as a whole (because of regional biases in Biraben’s material). And third, scholars should compile new, better databases. According to Monica Green, this is now in process with a new international collaborative project.

I think I’ve run out of steam; I’ll have to discuss Binford’s hunter-gatherer database in a separate post. This is a complicated and very troubling case, and I’ve been delaying writing about it for some time now, so I guess I’ll wait a bit longer.

So, what about big-data archaeology? Do we have any bad data? “Not me!” we all exclaim. Nevertheless, we all know that some of our colleagues are sloppier than others, and that some data are more reliable than others. Here are two suggestions.

First, take a historiographic approach to data, both your own and those of others. Historiography refers to the basic methods that historians use to evaluate their sources. Compare sources. Analyze possible biases in the sources. Analyze the context, purpose, and situation of the process of generating or gathering the data. Good metadata standards are a start, but I think we need to go further. Check out some of the methodological literature in history: Henige (2005), Hansen and Hansen (2016), Kipping et al. (2014), and an old favorite of mine, Fischer (1970); I still kick myself for not taking a class with Fischer when I was at Brandeis!
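To make this concrete, here is a minimal sketch of what source-criticism metadata might look like if attached to each data point rather than to a database as a whole. All of the field names (and the example values) are my own illustrative assumptions, not an existing metadata standard:

```python
# Illustrative sketch: recording basic source criticism alongside a single
# datum, so that later users of the database can see how the figure was
# produced. Every field name here is hypothetical, not a real standard.

datum = {
    "value": 30000,                          # e.g., an estimated population
    "source": "Chandler 1987",               # where the figure was taken from
    "source_type": "secondary compilation",  # primary document vs. compilation
    "original_basis": "unknown",             # what the compiler relied on
    "purpose_of_source": "historical census compilation",
    "known_biases": ["uneven regional coverage"],
    "cross_checked_against": [],             # independent sources compared
}

# A later analyst could then filter out figures that were never cross-checked:
needs_scrutiny = len(datum["cross_checked_against"]) == 0
```

The point of the sketch is simply that each value travels with its own evaluative record; when the `original_basis` is unknown and nothing has been cross-checked, that should be visible to anyone who queries the database.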

Second, archaeologists should work out methods of categorizing the reliability of our data and findings, and these should become part of the metadata of any database. There are basic methods of assessing the reliability of data and findings (6 and Bellamy 2012; Gerring 2012; Hruschka et al. 2004). The Intergovernmental Panel on Climate Change has worked out a useful system for coding the reliability of findings by different authors. They suggest evaluating independently the strength of the evidence, and the agreement among authorities (Adler and Hirsch Hadorn 2014).

These suggestions will have to be adapted for use with archaeological data. But they can help us avoid some of the problems that have arisen with the faulty databases on city size and plague occurrence described above. As archaeologists rush ahead into the brave new world of big data, we should try to fix data problems sooner rather than later. We should try to ensure that big data are not bad data.

6, Perri and Christine Bellamy
2012 Principles of Methodology: Research Design in Social Science. Sage, New York.

Adler, Carolina E. and Gertrude Hirsch Hadorn
2014 The IPCC and Treatment of Uncertainties: Topics and Sources of Dissensus. Wiley Interdisciplinary Reviews: Climate Change 5 (5): 663-676.

Chandler, Tertius
1987 Four Thousand Years of Urban Growth: An Historical Census. St. David's University Press, Lewiston, NY.

Fischer, David Hackett
1970 Historians' Fallacies: Toward a Logic of Historical Thought. Harper, New York.

Gerring, John
2012 Social Science Methodology: A Unified Framework. 2nd ed. Cambridge University Press, New York.

Hansen, Bradley A. and Mary Eschelbach Hansen
2016 The historian's craft and economics. Journal of Institutional Economics 12 (2): 349-370.

Henige, David P.
2005 Historical Evidence and Argument. University of Wisconsin Press, Madison.

Hruschka, Daniel J., Deborah Schwartz, Daphne Cobb St. John, Erin Picone-Decaro, Richard A. Jenkins, and James W. Carey
2004 Reliability in coding open-ended data: Lessons learned from HIV behavioral research. Field Methods 16 (3): 307-331.

Jones, Lori and Richard Nevell
2016 Plagued by doubt and viral misinformation: the need for evidence-based use of historical disease images. The Lancet Infectious Diseases 16 (10): e235-e240.

Kipping, Matthias, R. Daniel  Wadhwani, and Marcelo Bucheli
2014 Analyzing and Interpreting Historical Sources: A Basic Methodology. In Organizations in Time: History, Theory, Methods, edited by Marcelo Bucheli and R. Daniel Wadhwani, pp. 305-330. Oxford University Press, New York.

Reba, Meredith, Femke Reitsma, and Karen C. Seto
2016 Data Descriptor: Spatializing 6,000 years of global urbanization from 3700 BC to AD 2000. Scientific Data 3 (160034).

Roosen, Joris and Daniel R. Curtis
2018 Dangers of Noncritical Use of Historical Plague Data. Emerging Infectious Diseases 24 (1): 103-110.
