As archaeologists turn increasingly to
the analysis of large, systematic databases, we need to confront an epistemological
problem: How do we identify bad data, and what can we do about it? Economic
historians and others are becoming consumers of archaeological data, and they
are quick to jump on new databases. They seldom ask about the quality of the
data, and this can result in sophisticated analyses of bad data. But, as we all
know, “Garbage in, garbage out.”
I blogged about this a couple of years
ago in reference to Tertius Chandler’s list of city sizes through history, from both archaeological and historical
sources (Link is here). Those data (Chandler 1987)
are considered shockingly bad and worthless by most historical demographers and
historians. In technical terms, they may be "bullshit" (see my post on bullshit).
Yet some urban scholars merrily use the data for studies today. I
consider this a real problem, and said so in my review of a manuscript for a
journal titled “Scientific Data” (my blog post was an elaboration on that
review). But data quality issues were evidently not as important to the authors
and journal editors; the paper was published with only a few weak caveats about
the data (Reba et al. 2016).
Another recent case concerns the identification of plague and other diseases in historical sources. It centers on a database of historical plague occurrences in France and the Mediterranean area, compiled in 1975 by Jean-Noël Biraben. Specialists recognize numerous biases and problems with the basic data. But once the data were digitized, non-historians readily used them without question, leading to problematic results. The basic problem was pointed out by Jones and Nevell (2016) and elaborated on by Roosen and Curtis (2018).
[Figure: Plague or not plague? From Jones and Nevell (2016)]
I highly recommend Roosen and Curtis
(and thanks to my excellent colleague Monica Green for sending this my way). Their
remarks parallel my views of the Chandler city-size data, but they do a better
job of articulating the historiographical issues involved when subsequent
scholars used these data (badly):
“When scholars fail to apply source
criticism or do not reflect on the content of the data they use, the
reliability of their results becomes highly questionable.” (p. 103).
Jones and Nevell deplore “the loosening of the rigorous standards of evidence and interpretation scientific researchers typically demand within their own disciplines” (p. 103).
Roosen and Curtis list three problems
with recent analyses of the Biraben data (including a paper in PNAS): “First,
reflection on the data collection process has been improper; second,
what the data represent has not been recognized; and third,
critique of the original sources has been inadequate. We argue that a critical
consideration of any of these three elements would have led to the conclusion
that the data set should not have been used at face value.” (p. 105)
“However, through digitization and
subsequent publication in a top-ranked journal, the 4-decade-old data set was
imbued with a false aura of trustworthiness and the impression of being new
historical research.” (p. 104)
Roosen and Curtis make some recommendations for improving the systematic analysis of historical disease data. First, scholars need to employ basic historiographical techniques when they use data; that is, they need to subject the data to source criticism, comparison of sources, and contextual analysis. Second, hypothesis testing should
be done on limited regions, rather than Europe as a whole (because of regional
biases in Biraben’s material). And third, scholars should compile new, better
databases. According to Monica Green, this is now in progress, with a new international collaborative project.
I think I’ve run out of steam; I’ll
have to discuss Binford’s hunter-gatherer database in a separate post. This is
a complicated and very troubling case, and I’ve been delaying writing about it
for some time now, so I guess I’ll wait a bit longer.
So, what about big-data archaeology? Do we have any bad data? “Not me!”
we all exclaim. Nevertheless, we all know that some of our colleagues are
sloppier than others, and that some data are more reliable than others. Here
are two suggestions.
First, take a historiographic approach
to data, both your own and those of others. Historiography refers to the basic
methods that historians use to evaluate their sources. Compare sources. Analyze
possible biases in the sources. Analyze the context, purpose, and situation of
the process of generating or gathering the data. Good metadata standards is a
start, but I think we need to go further. Check out some of the methodological
literature in history: (Henige 2005), (Hansen and Hansen 2016), (Kipping et al. 2014), and an old favorite of
mine: (Fischer 1970); I still kick myself
for not taking a class with Fischer when I was at Brandeis!
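To make this concrete, here is a minimal sketch of what source-criticism-aware metadata might look like for a single database record. The schema and field names are my own invention for illustration, not part of any existing metadata standard:

```python
from dataclasses import dataclass, field

@dataclass
class SourceRecord:
    """Historiographic metadata for one data source (hypothetical schema)."""
    source_id: str
    description: str
    original_purpose: str       # why the data were originally collected
    collection_method: str      # excavation, survey, archival compilation, etc.
    known_biases: list[str] = field(default_factory=list)
    corroborating_sources: list[str] = field(default_factory=list)

# A hypothetical entry for one of Chandler's city-size estimates
record = SourceRecord(
    source_id="chandler-1987-example",
    description="Population estimate for a single city, ca. AD 1000",
    original_purpose="Popular historical census, not a scholarly demographic study",
    collection_method="Compilation from secondary literature of uneven quality",
    known_biases=["No explicit criteria for estimates", "Uneven regional coverage"],
    corroborating_sources=[],  # none cited -- itself a red flag
)
```

The point is not these particular fields but the habit: record why and how the data were gathered, and what is known to be wrong with them, right alongside the data themselves.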
Second, we archaeologists should work out methods of categorizing the reliability of our data and findings, and these should
become part of the metadata of any database. There are basic methods of
assessing the reliability of data and findings (6
and Bellamy 2012; Gerring 2012; Hruschka et al. 2004). The
Intergovernmental Panel on Climate Change has worked out a useful system for
coding the reliability of findings by different authors. They suggest
evaluating independently the strength of the evidence, and the agreement among
authorities (Adler and Hirsch Hadorn 2014).
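To show how simple such a coding could be in practice, here is a toy sketch of that two-axis scheme in Python. Collapsing the two axes into a single label is my own simplification; the IPCC itself reports evidence and agreement separately and assigns confidence qualitatively:

```python
from enum import Enum

class Evidence(Enum):    # axis 1: strength of the evidence
    LIMITED = 1
    MEDIUM = 2
    ROBUST = 3

class Agreement(Enum):   # axis 2: agreement among authorities
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def confidence(evidence: Evidence, agreement: Agreement) -> str:
    """Collapse the two axes into one confidence label (my own shortcut)."""
    score = evidence.value + agreement.value
    if score <= 2:
        return "very low"
    elif score == 3:
        return "low"
    elif score == 4:
        return "medium"
    elif score == 5:
        return "high"
    return "very high"

# e.g., robust evidence but low agreement among specialists -> "medium"
print(confidence(Evidence.ROBUST, Agreement.LOW))
```

Attaching even a crude code like this to each finding in a database would tell later users which entries rest on solid ground and which on a single shaky source.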
These suggestions will have to be
adapted to use with archaeological data. But they can help us avoid some of the
problems that have arisen with the use of the faulty databases on city size and
plague occurrence described above. As archaeologists rush ahead into the brave
new world of big data, we should try to fix data problems sooner rather than
later. We should try to ensure that big data are not bad data.
6, Perri and Christine Bellamy
2012 Principles of Methodology: Research Design in Social Science. Sage, New York.
Adler, Carolina E. and Gertrude Hirsch Hadorn
2014 The IPCC and Treatment of Uncertainties: Topics and Sources of Dissensus. Wiley Interdisciplinary Reviews: Climate Change 5 (5): 663-676.
Chandler, Tertius
1987 Four Thousand Years of Urban Growth: An Historical Census. St. David's University Press, Lewiston, NY.
Fischer, David Hackett
1970 Historians' Fallacies: Toward a Logic of Historical Thought. Harper, New York.
Gerring, John
2012 Social Science Methodology: A Unified Framework. 2nd ed. Cambridge University Press, New York.
Hansen, Bradley A. and Mary Eschelbach Hansen
2016 The Historian's Craft and Economics. Journal of Institutional Economics 12 (2): 349-370.
Henige, David P.
2005 Historical Evidence and Argument. University of Wisconsin Press, Madison.
Hruschka, Daniel J., Deborah Schwartz, Daphne Cobb St. John, Erin Picone-Decaro, Richard A. Jenkins, and James W. Carey
2004 Reliability in Coding Open-Ended Data: Lessons Learned from HIV Behavioral Research. Field Methods 16 (3): 307-331.
Jones, Lori and Richard Nevell
2016 Plagued by Doubt and Viral Misinformation: The Need for Evidence-Based Use of Historical Disease Images. The Lancet Infectious Diseases 16 (10): e235-e240.
Kipping, Matthias, R. Daniel Wadhwani, and Marcelo Bucheli
2014 Analyzing and Interpreting Historical Sources: A Basic Methodology. In Organizations in Time: History, Theory, Methods, edited by Marcelo Bucheli and R. Daniel Wadhwani, pp. 305-330. Oxford University Press, New York.
Reba, Meredith, Femke Reitsma, and Karen C. Seto
2016 Data Descriptor: Spatializing 6,000 Years of Global Urbanization from 3700 BC to AD 2000. Scientific Data 3 (160034).
Roosen, Joris and Daniel R. Curtis
2018 Dangers of Noncritical Use of Historical Plague Data. Emerging Infectious Diseases 24 (1): 103-110. http://wwwnc.cdc.gov/eid/article/24/1/17-0477.