*Why use such complicated stuff, which is concerned with numbers, probability, density, tests, and all that which we hated in school? Why oh why?!*

*Fact is, archaeologists produce huge amounts of data.*

At least we in prehistoric archaeology do (this is not exactly a jibe… it is me being uninformed 😉). So, what do we do with the data?

Of course we have a “humanistic” research question, one which may be concerned with human behavior in the past, maybe with interaction between groups, maybe with adaptation to certain environments, maybe with social relations, whatever.

On the other hand, we have several thousand ceramic sherds. We have several thousand slices of flint. We have a few hundred sites.

Statistics, sometimes called “quantitative methods”, offers tools for working with big data sets. Statistics is by itself such a huge field that it is impossible to describe everything in one small blog post. But I want to try and give a simple little overview of what may be useful for an archaeologist.

Almost everyone knows **descriptive statistics**. As the name says, you can use it to describe the data you gathered, which is called a *sample*. These are some of the things you can do with it:

- Produce lovely tables and graphics, which inform about the composition of your data, e.g.: How many of which type do you have? How many in which region? Do certain measurements (e.g. length and width of spear points) scatter differently in two groups? Is there a decrease or an increase in certain things over time?
- Get some measures of central tendency and spread, which tell you about the distribution of your data (mean, median, standard deviation, …). Too complicated? You can say something like: half of *all* my flint objects are smaller than 4 cm (and half are bigger), but if I look at those *from a certain site*, half of those are smaller than 1.5 cm. Does that mean something?
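
To make the flint example concrete, here is a minimal sketch using only Python's standard library. The site names and lengths are invented for illustration (they are not real data), and `describe` is just a hypothetical helper name.

```python
from statistics import mean, median, stdev

# Hypothetical flint lengths in cm, grouped by made-up site names.
flint_lengths_cm = {
    "site_a": [1.0, 1.2, 1.5, 1.8, 2.0],
    "site_b": [3.5, 4.0, 4.5, 5.0, 6.0],
    "site_c": [4.0, 4.2, 5.5, 6.5, 7.0],
}

# Pool all sites into one sample.
all_lengths = [x for lengths in flint_lengths_cm.values() for x in lengths]


def describe(values):
    """Return a few basic descriptive measures for a list of numbers."""
    return {
        "n": len(values),
        "mean": round(mean(values), 2),
        "median": median(values),          # half the values lie below, half above
        "stdev": round(stdev(values), 2),  # sample standard deviation (spread)
    }


summary_all = describe(all_lengths)
summary_site_a = describe(flint_lengths_cm["site_a"])

print("all sites:", summary_all)
print("site_a:   ", summary_site_a)
```

With these made-up numbers the median is 4 cm for all objects and 1.5 cm for `site_a`. Whether such a difference *means* something is exactly the kind of question descriptive statistics raises but cannot answer on its own.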

**Testing statistics** are about checking whether what you find is statistically significant. Here we work with probabilities: how likely is it that the pattern I observe in my sample came about by chance alone, rather than through some underlying structure in the data? If chance alone is an unlikely explanation, we call the result statistically significant. This also means we try to transfer our results from the sample (the data we gathered) to the population (the group we actually want to talk about, which is described in our research question). Here it is important to think about sampling strategies, sample sizes and correct testing methods for your data. Though this is not trivial, it is quite important not to simply trust your gut feeling, but to *test*.

- Quite often we use it to look at correlation: you can test whether two variables really occur together more often than would be expected by chance. Are there really more female than male graves oriented in a certain way or is this pure chance?
- You can test whether the trend you thought you saw in the graphs of your sample is likely to be real. Does the circumference of the ceramic vessels really increase with their size or is my sample too small to say anything about this?
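
The grave orientation question can be sketched as a test on a 2×2 table. The counts below are invented, and I picked Fisher's exact test simply because it can be written with the standard library alone; it is one reasonable choice, not the only one (a chi-squared test is another common option). The function name `fisher_exact_two_sided` is my own.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all row and column totals held fixed,
    of seeing a table at least as unlikely as the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)

    p_value = 0.0
    for x in range(lowest, highest + 1):
        p = prob(x)
        # Small tolerance so ties with the observed table are included.
        if p <= p_observed * (1 + 1e-9):
            p_value += p
    return p_value


# Hypothetical counts: rows = female / male graves,
# columns = oriented east / oriented otherwise.
p = fisher_exact_two_sided(18, 7, 8, 17)
print(f"p-value: {p:.4f}")
```

A small p-value here would say that such a lopsided table is unlikely if orientation and sex were unrelated. It would not say *why* they are related; that part is still archaeology.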

**Multivariate statistics**: Actually, multivariate statistics are quite often purely descriptive. The aim here is to find structures in your data set using not just one or two variables but many at the same time, which is quite different from simply doing several bivariate analyses. Multivariate approaches are among the most widely used forms of statistics in the archaeological literature by now, I reckon.

- Typical analyses are ordering algorithms: I want to know if there is a sequence in the assemblages of grave goods that I could interpret as a chronological or social order (using seriation or correspondence analysis). People rarely dare to create new chronologies without one of these methods nowadays.
- Other well-known applications are clustering algorithms, whose job it is to group your data according to similarity, and discriminant analysis, which checks whether the groups you identified are well separated or not. These are often used for provenance analysis based on the chemical composition of ceramics or for finding subgroups in your data. Very useful, really!
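
To give a feeling for what "grouping by similarity" means, here is a small sketch of single-linkage agglomerative clustering, written with the standard library only. The sherd names and the two chemical values per sherd are invented, and `single_linkage` is a hypothetical helper. Real analyses use many more variables, usually standardise them first, and rely on an established statistics package.

```python
from itertools import combinations
from math import dist

# Hypothetical sherds with two made-up chemical measurements each
# (say, percentage of two oxides). Not real data.
sherds = {
    "s1": (5.1, 1.0),
    "s2": (5.3, 1.2),
    "s3": (4.9, 0.9),
    "s4": (2.0, 6.1),
    "s5": (2.2, 5.8),
    "s6": (1.9, 6.3),
}


def single_linkage(points, n_clusters):
    """Merge the two closest clusters until only n_clusters remain.

    'Closest' is the smallest Euclidean distance between any member of
    one cluster and any member of the other (single linkage).
    """
    clusters = [{name} for name in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: min(
                dist(points[a], points[b])
                for a in clusters[pair[0]]
                for b in clusters[pair[1]]
            ),
        )
        clusters[i] |= clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


groups = single_linkage(sherds, n_clusters=2)
for group in groups:
    print(sorted(group))
```

With these invented values the sherds fall into two groups, `s1` to `s3` and `s4` to `s6`, which one might then try to interpret as two clay sources. Note that the number of clusters is something I chose here, not something the algorithm found out.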

So, there are loads of things you can do with statistics. It is, admittedly, not easy to learn about them all. I definitely recommend taking courses and not trying to tackle it all by yourself. Nonetheless, it is very worthwhile, because it is a way to objectify your results. Of course you can cheat using statistics: if you want to cheat, you’ll always find a way, and there is no true impartiality in science or anywhere, really. But by giving your colleagues the data and good documentation of the statistical methods you used, you can enable them to reproduce your analysis. Which *is* needed for objective and good scientific practice.

So. Statistics. Not something I enjoyed in school, I’ll admit freely. At first it was boring to throw the dice a few hundred times, later it was too abstract (I cannot even remember what we did). **But knowing now what I can learn by applying these methods to my archaeological data and research questions, I really really like it.** Quantitative methods enable me to say things with a certain certainty, even if, of course, I’ll never be 100% sure. Nobody ever is. But I believe being able to say that a pattern like the one I found would turn up by pure chance only 1% of the time is totally worth the time spent learning the methods.