So, I found this cartoon and think it’s pretty funny.
I asked myself what it means in our discipline. Everyone is talking about “data”. Or “databases”. Or “big data”. So, what the heck is this “data”? Or are these “data”, because, well, shouldn’t the word be a plural, from “datum”?
So, according to Jorge Cham’s cartoon, either data are facts or data is information.
In archaeology we use data as information, because we don’t excavate “facts”: at least in processual thinking, facts are identified either as past processes, which we cannot observe directly, or as artefacts, which we cannot record as such. So in archaeology, “documentation” is what we use for analysis. Of course, it can be the documentation of a physical fact, such as the length of a pin or the weight of a lead ingot (cheers, Mr. Hanel), but quite often it is the documentation of an archaeological observation or interpretation (“this pot was found in a hearth”, “this pot belongs to type XY”, “the height of this pot is z cm”). It is absolutely impossible to document everything there is to know about this pot: it would be an infinite task, and spending infinite time on data gathering would leave no time for data analysis. We don’t want to do this. Also, this would be far too much information, and much of it would probably be completely irrelevant. That’s why the archaeologist decides what information might be interesting or relevant for his/her question and what information is to be gathered. This is one reason why we will never get rid of subjectivity in research.
So data does not mean “this pot”; data is the information we gather about this pot. Thus a data set is a collection of information. Data sets may be small, e.g. a simple table giving the length of three Iron Age swords. Data sets may also be huge and organized in databases, with several interlinked tables and tens of thousands of entries. That is actually what a database is: an organized collection of data in which every piece of information is stored only once. In relational databases especially, which object relates to which, and how, is described as efficiently as possible. For an introduction to databases, see the book by Rolland, in English or German.
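To make the “stored only once” idea a bit more concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. The two tables (`site` and `sword`), their columns and all the values are hypothetical and only serve as an illustration, not as real excavation data. Each site is recorded once, each sword merely points to its site by an id, and a join reassembles the combined view when we need it for analysis.

```python
import sqlite3

# In-memory database; tables, columns and values are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the links between tables

# Each site is stored exactly once ...
conn.execute("CREATE TABLE site (site_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# ... and each sword refers to its site by id instead of repeating the site's details.
conn.execute(
    "CREATE TABLE sword ("
    " sword_id INTEGER PRIMARY KEY,"
    " length_cm REAL NOT NULL,"
    " site_id INTEGER NOT NULL REFERENCES site(site_id))"
)

conn.executemany("INSERT INTO site (site_id, name) VALUES (?, ?)",
                 [(1, "Site A"), (2, "Site B")])
conn.executemany("INSERT INTO sword (sword_id, length_cm, site_id) VALUES (?, ?, ?)",
                 [(1, 72.5, 1), (2, 68.0, 1), (3, 80.2, 2)])

# A join describes which sword relates to which site and reassembles one flat table.
QUERY = """
    SELECT sword.sword_id, sword.length_cm, site.name
    FROM sword JOIN site ON sword.site_id = site.site_id
    ORDER BY sword.sword_id
"""
rows = conn.execute(QUERY).fetchall()

# Because the site name is stored only once, one correction updates every related sword.
conn.execute("UPDATE site SET name = ? WHERE site_id = ?", ("Site A (revised)", 1))
rows_revised = conn.execute(QUERY).fetchall()

for row in rows_revised:
    print(row)
```

The design choice is the point here: if the site name were typed into every sword record, a correction would have to be made in many places, and sooner or later the copies would disagree.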
Aggregating different databases can lead to … well… big data sets (I’d say Jorge does not have a big data set in his cartoon). “Big Data” has been a buzzword these last few years, but archaeologists are not yet sure how to use Big Data, or whether we want to use it at all, because such data sets are so big that we cannot analyze them in conventional ways (that is actually one way to define “Big Data”: “a massive volume of both structured and unstructured data that is so large that it’s difficult to process using traditional database and software techniques”). So this is not just about big data sets; it is about a method to analyze huge and very complex data sets comprising various kinds of data, which differ in scale, precision and quality. The advantage would be the possibility to use “all data”, everything that has been gathered, provided it is shared by the person who gathered it (which is why Open Data is a topic of concern as well). This sounds like a job just for data scientists, software developers or statisticians, but the interpretation and analysis of Big Archaeological Data is still the task of an archaeologist: nobody else knows about the practical, methodological and theoretical implications of “our kind of” data.
I haven’t yet read enough about Big Data in archaeology to know how well these approaches work to answer our research questions. But I like the idea of using every scrap of information accessible to us, don’t you?