Archaeological Data Analysis Cheat Sheet: Methods, Tools, and Best Practices

Introduction to Archaeological Data Analysis

Archaeological data analysis encompasses the systematic examination and interpretation of material remains to understand past human activities, behaviors, and cultural processes. It bridges quantitative and qualitative approaches to transform archaeological finds into meaningful insights about past societies. Effective data analysis is essential for rigorous archaeological interpretation, helping archaeologists move beyond descriptive inventories toward explanatory models and cultural reconstructions.

Core Data Types in Archaeology

Artifact Data

| Data Type | Description | Common Variables | Analysis Methods |
|---|---|---|---|
| Lithics | Stone tools and debitage | Raw material, technology, typology, dimensions, use-wear | Reduction sequence analysis, functional analysis, sourcing studies |
| Ceramics | Pottery and fired clay objects | Fabric, form, decoration, firing technique, dimensions | Seriation, typological analysis, use-alteration analysis |
| Faunal Remains | Animal bones and shells | Species, element, age, sex, taphonomic indicators | Zooarchaeology, hunting/husbandry patterns, seasonality |
| Botanical Remains | Seeds, pollen, phytoliths | Species, part, preservation type | Paleoenvironmental reconstruction, subsistence patterns |
| Human Remains | Skeletal material | Age, sex, pathology, metric/non-metric traits | Paleodemography, paleopathology, activity patterns |
| Metallurgical | Metal objects and production debris | Material, manufacturing technique, typology | Compositional analysis, production techniques, trade patterns |

Spatial Data

| Type | Description | Common Formats | Analysis Applications |
|---|---|---|---|
| Site Locations | Geographical coordinates of archaeological sites | Point data (XYZ) | Settlement patterns, predictive modeling |
| Excavation Plans | Detailed maps of excavated areas | Vector data (polygons) | Intra-site spatial analysis, feature relationships |
| Artifact Distribution | Spatial patterns of artifact recovery | Point data, density maps | Activity areas, site formation processes |
| Environmental Data | Landscape features relevant to site location | Raster data, vector layers | Site catchment analysis, land use reconstruction |
| Remote Sensing Data | Satellite imagery, geophysical survey results | Raster imagery, point clouds | Site discovery, non-invasive investigation |

Temporal Data

| Type | Description | Common Methods | Applications |
|---|---|---|---|
| Absolute Dates | Calendar dates from scientific methods | Radiocarbon, dendrochronology, OSL | Chronological frameworks, rates of change |
| Relative Dates | Temporal ordering without specific years | Stratigraphy, seriation, typology | Sequence development, phase identification |
| Duration | Time spans of occupation or activity | Date ranges, Bayesian modeling | Occupation intensity, abandonment processes |
| Periodization | Cultural-historical time divisions | Phase assignment, cultural attribution | Regional comparisons, cultural histories |

Quantitative Analysis Methods

Descriptive Statistics

| Statistic | Application in Archaeology | Common Tools |
|---|---|---|
| Mean/Median/Mode | Central tendency in artifact dimensions | Excel, R, SPSS |
| Standard Deviation | Variation in assemblage characteristics | Excel, R, SPSS |
| Frequency Distributions | Artifact type proportions across contexts | Excel, R, SPSS |
| Ratios | Tool-to-debitage ratios, ceramic form proportions | Excel, calculators |
| Density Measures | Artifacts per unit volume, features per area | GIS software, Excel |
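
The statistics in this table are also easy to script. The following minimal Python sketch (standard library only) summarizes artifact lengths per excavation unit and converts counts to a volumetric density. The unit names, measurements, and volumes are hypothetical placeholders, not data from any real project.

```python
from statistics import mean, median, stdev

# Hypothetical data: artifact lengths (mm) and excavated volume (m^3) per unit
units = {
    "U1": {"lengths_mm": [32.1, 28.4, 41.0, 35.5, 30.2], "volume_m3": 0.5},
    "U2": {"lengths_mm": [22.7, 25.1, 24.9], "volume_m3": 0.25},
}


def summarize(lengths_mm, volume_m3):
    """Central tendency, spread, and density for one excavation unit."""
    return {
        "n": len(lengths_mm),
        "mean_mm": mean(lengths_mm),
        "median_mm": median(lengths_mm),
        "sd_mm": stdev(lengths_mm),  # sample SD (n - 1); needs at least 2 values
        "artifacts_per_m3": len(lengths_mm) / volume_m3,
    }


for unit, record in units.items():
    print(unit, summarize(record["lengths_mm"], record["volume_m3"]))
```

Reporting density per cubic meter rather than raw counts keeps units of different excavated volume comparable.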

Exploratory Data Analysis

```r
# Example R code for basic exploratory analysis of lithic dimensions
# Assumes a CSV with columns raw_material, tool_type, length_mm, width_mm
library(tidyverse)  # also loads ggplot2, dplyr, and readr

# Load data
lithics <- read.csv("lithic_assemblage.csv")

# Summary statistics for every column
summary(lithics)

# Boxplot of length by raw material
ggplot(lithics, aes(x = raw_material, y = length_mm)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Lithic Length Distribution by Raw Material",
       x = "Raw Material Type", y = "Length (mm)")

# Scatter plot of length vs. width, colored by tool type
ggplot(lithics, aes(x = length_mm, y = width_mm, color = tool_type)) +
  geom_point(alpha = 0.7) +
  theme_minimal() +
  labs(title = "Length vs. Width by Tool Type",
       x = "Length (mm)", y = "Width (mm)")
```

Inferential Statistics

| Test | Archaeological Application | When to Use |
|---|---|---|
| Chi-square | Compare artifact type frequencies between contexts | Categorical (count) data; compares observed with expected frequencies |
| t-test | Compare mean artifact dimensions between two assemblages | Continuous, roughly normal data; two groups |
| ANOVA | Compare mean artifact dimensions across several assemblages | Continuous data; three or more groups |
| Correlation (Pearson/Spearman) | Relationship between artifact attributes | Strength of association between two variables (Pearson: linear; Spearman: rank-based) |
| Regression Analysis | Predicting site locations from environmental variables | Modeling a dependent variable from one or more independent variables |
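
As a sketch of the chi-square row, the function below computes Pearson's chi-square statistic for a contexts-by-types table of counts, together with its degrees of freedom and Cramér's V as an effect size. It is a minimal standard-library version that assumes no row or column total is zero. It stops short of a p-value, which could be obtained with `scipy.stats.chi2.sf(chi2, df)` if SciPy is available. The test becomes unreliable when many expected counts are small (a common rule of thumb is below 5).

```python
from math import sqrt


def chi_square_independence(table):
    """Pearson chi-square for an r x c table of counts (rows = contexts, columns = types).

    Returns (chi2, degrees_of_freedom, cramers_v). Assumes no row or column total is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    r, c = len(table), len(table[0])
    df = (r - 1) * (c - 1)
    cramers_v = sqrt(chi2 / (n * (min(r, c) - 1)))  # 0 = no association, 1 = perfect
    return chi2, df, cramers_v


# Hypothetical counts of three ceramic types in two contexts
counts = [[30, 12, 8],
          [14, 20, 16]]
chi2, df, v = chi_square_independence(counts)
print(f"chi2 = {chi2:.2f}, df = {df}, Cramér's V = {v:.2f}")
```

Reporting Cramér's V alongside the test result shows how strong an association is, not only whether it is statistically detectable.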

Multivariate Analysis

| Method | Archaeological Application | Key Considerations |
|---|---|---|
| Principal Component Analysis (PCA) | Reducing dimensionality in complex artifact datasets | Identifies the main sources of variation in continuous variables; standardize variables measured in different units |
| Correspondence Analysis (CA) | Seriation, identifying chronological patterns | Suited to count (contingency) tables, including presence/absence data |
| Cluster Analysis | Grouping similar artifacts or assemblages | Results depend on the chosen distance measure and clustering method |
| Discriminant Function Analysis | Classifying artifacts into predefined groups | Needs training data with known classifications |
| K-means Clustering | Identifying spatial clusters of artifacts | Requires specifying the number of clusters in advance |
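
```r
# Example R code for PCA of ceramic attributes
# Load libraries
library(dplyr)       # provides %>% and select()
library(FactoMineR)
library(factoextra)

# Load data (assumes columns rim_diameter, wall_thickness, height, weight, vessel_type)
ceramics <- read.csv("ceramic_assemblage.csv")

# Select numeric variables for PCA
ceramic_vars <- ceramics %>%
  select(rim_diameter, wall_thickness, height, weight)

# Run PCA; scale.unit = TRUE standardizes variables measured in different units
ceramic_pca <- PCA(ceramic_vars, scale.unit = TRUE, graph = FALSE)

# Visualize results, grouping vessels by type
fviz_pca_biplot(ceramic_pca,
                habillage = as.factor(ceramics$vessel_type),
                palette = "jco",
                addEllipses = TRUE,
                title = "PCA - Ceramic Vessel Attributes")
```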
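
The K-means row in the table above can be illustrated without any library. This sketch runs Lloyd's algorithm on 2-D coordinates (for example, piece-plotted finds in projected meters). Here k is fixed by the starting centroids you supply, which makes the run reproducible; library implementations such as scikit-learn's `KMeans` instead try several starts, which is generally preferable in practice.

```python
from math import dist


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means for 2-D points; k equals the number of initial centroids.

    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its member points
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return labels, centroids
```

Coordinates should be in a projected system so that Euclidean distance is meaningful; trying several values of k and comparing results is a sensible check.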

Spatial Analysis Methods

Site-Level Analysis

| Method | Description | Tools | Applications |
|---|---|---|---|
| Kernel Density Estimation | Creates a smoothed density surface of finds | QGIS, ArcGIS, R | Identifying activity areas, artifact concentrations |
| Nearest Neighbor Analysis | Measures clustering or dispersion of points | QGIS, ArcGIS, R | Structure placement, burial patterns |
| Viewshed Analysis | Models the area visible from a given point | GIS platforms | Defensive positioning, monument visibility |
| Cost Surface Analysis | Models travel costs across a landscape | GIS platforms | Access routes, territorial boundaries |
| Spatial Autocorrelation | Measures similarity of nearby observations | GeoDa, R (spdep) | Identifying spatial patterns and clusters |
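
Nearest neighbor analysis is often summarized with the Clark-Evans ratio R: the observed mean nearest-neighbor distance divided by the mean expected under complete spatial randomness, 0.5 / sqrt(n / A). The sketch below is a minimal version that assumes coordinates and study-area size use the same units (e.g., meters and square meters) and applies no edge correction.

```python
from math import dist, sqrt


def clark_evans_ratio(points, area):
    """Clark-Evans nearest-neighbor ratio R for points in a study area of known size.

    R < 1 suggests clustering, R close to 1 a random pattern, R > 1 regular spacing.
    No edge correction is applied, which can bias R when many points lie near the boundary.
    """
    n = len(points)
    nn_distances = [
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed_mean = sum(nn_distances) / n
    expected_mean = 0.5 / sqrt(n / area)  # expected mean distance under complete spatial randomness
    return observed_mean / expected_mean
```

For example, four structures on a regular 10 m grid inside a 20 m × 20 m area give R = 2.0, indicating regular spacing.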

Regional Analysis

| Method | Description | Applications |
|---|---|---|
| Site Catchment Analysis | Examines resources within reach of sites | Subsistence strategies, territory modeling |
| Predictive Modeling | Estimates where sites are likely to occur, based on environmental variables | Cultural resource management (CRM) surveys, research design, site discovery |
| Least Cost Path Analysis | Models optimal routes between points across a cost surface | Trade networks, movement corridors |
| Thiessen Polygons | Creates territories based on proximity to each site | Approximate political territories, service areas |
| Point Pattern Analysis | Statistical evaluation of point distributions | Settlement hierarchies, site interrelationships |
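
Least cost path analysis is, at its core, a shortest-path search over a cost raster. The sketch below uses Dijkstra's algorithm with 4-neighbor moves and assumes that moving into a cell adds that cell's cost. This is a simplification: GIS tools such as GRASS GIS's `r.cost` typically add diagonal moves and distance-weighted costs. The small grid is hypothetical.

```python
import heapq


def least_cost_path(cost, start, goal):
    """Dijkstra search over a raster of per-cell travel costs (4-neighbor moves).

    cost: list of rows; moving into a cell adds that cell's cost.
    start, goal: (row, col) tuples. Returns (total_cost, path as list of cells).
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        d, cell = heapq.heappop(queue)
        if cell == goal:
            break  # the first time the goal is popped, its cost is minimal
        if d > best.get(cell, float("inf")):
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = nd
                    previous[(nr, nc)] = cell
                    heapq.heappush(queue, (nd, (nr, nc)))
    # Walk back from the goal to the start
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[goal], path[::-1]


# Hypothetical cost surface: a steep (costly) cell in the middle
grid = [[1, 1, 1],
        [1, 9, 1],
        [1, 1, 1]]
print(least_cost_path(grid, (0, 0), (2, 2)))
```

The route avoids the high-cost center cell, which is the behavior a cost surface is meant to capture.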

GIS Operations for Archaeology

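```python
# Example Python code using ArcPy for archaeological site predictive modeling
import arcpy
from arcpy.sa import Raster, Reclassify, RemapRange, Slice, ZonalStatisticsAsTable

# Set workspace and enable the Spatial Analyst extension
arcpy.env.workspace = "C:/ArchaeologyProject/GIS"
arcpy.CheckOutExtension("Spatial")

# Environmental factors (input rasters)
slope = Raster("slope.tif")                  # degrees
dist_to_water = Raster("dist_to_water.tif")  # meters
elevation = Raster("dem.tif")
aspect = Raster("aspect.tif")                # degrees; -1 = flat

# Reclassify every factor to the same 1-10 suitability scale.
# "NODATA" stops values outside the remap ranges from passing through unchanged;
# make sure the ranges span the data, or those cells drop out of the model.
slope_reclass = Reclassify(slope, "Value",
                           RemapRange([[0, 5, 10], [5, 10, 8], [10, 15, 6],
                                       [15, 25, 3], [25, 90, 1]]), "NODATA")

water_reclass = Reclassify(dist_to_water, "Value",
                           RemapRange([[0, 100, 10], [100, 500, 8], [500, 1000, 5],
                                       [1000, 2000, 2], [2000, 10000, 1]]), "NODATA")

# Illustrative assumptions, to be adjusted for the study area:
# elevation split into 10 equal-interval classes (1-10), and
# south-facing slopes scored highest (Northern Hemisphere).
elev_reclass = Slice(elevation, 10, "EQUAL_INTERVAL", 1)
aspect_reclass = Reclassify(aspect, "Value",
                            RemapRange([[-1, 0, 5], [0, 45, 2], [45, 135, 5],
                                        [135, 225, 10], [225, 315, 5], [315, 360, 2]]),
                            "NODATA")

# Weight and combine the reclassified factors (weights sum to 1.0)
predictive_model = ((slope_reclass * 0.3) + (water_reclass * 0.5) +
                    (elev_reclass * 0.1) + (aspect_reclass * 0.1))

# Save output
predictive_model.save("site_prediction_model.tif")

# Validate with known sites: summarize model scores at site locations
arcpy.MakeFeatureLayer_management("known_sites.shp", "sites_lyr")
ZonalStatisticsAsTable("sites_lyr", "OBJECTID", predictive_model,
                       "validation_stats.dbf", "DATA", "ALL")
```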

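The zonal statistics table reports model scores at known sites. One common way to condense that into a single performance figure is Kvamme's gain, computed for the zone of highest predicted potential. The sketch below assumes you have already counted the raster cells in that zone and the known sites falling inside it (for example, by thresholding the saved model raster); the threshold and counts shown are hypothetical. Ideally, the validation sites should not have been used to choose the weights.

```python
def kvamme_gain(zone_cells, total_cells, sites_in_zone, total_sites):
    """Kvamme's gain for one zone of a predictive model.

    gain = 1 - (proportion of study area in zone / proportion of known sites in zone)
    Values near 1: the zone captures many sites in little area.
    Values near 0 or below: no better than chance.
    """
    if sites_in_zone == 0:
        raise ValueError("gain is undefined when the zone contains no sites")
    area_share = zone_cells / total_cells
    site_share = sites_in_zone / total_sites
    return 1 - area_share / site_share


# Hypothetical counts: high-potential zone = 200 of 1,000 cells, holding 40 of 50 sites
print(kvamme_gain(200, 1000, 40, 50))  # about 0.75
```

A zone covering 20% of the area but containing 80% of the sites yields a gain of 0.75.
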
Chronological Analysis Methods

Seriation Techniques

| Method | Description | Applications |
|---|---|---|
| Frequency Seriation | Orders assemblages based on changing type frequencies | Relative chronology development |
| Occurrence Seriation | Orders assemblages based on presence/absence of types | Broad chronological frameworks |
| Battleship Curves | Visualizes changing type frequencies over time | Displaying chronological trends |
| Correspondence Analysis | Statistical approach to seriation | Ordering complex assemblages |
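
Frequency seriation rests on the expectation that each type's frequency rises to a single peak and then declines (the battleship-curve shape). The sketch below checks a proposed ordering against that criterion. It is an idealized, strict check: real assemblages carry sampling noise, and the reversed order is equally valid because seriation does not fix direction. The percentages are hypothetical.

```python
def is_unimodal(values):
    """True if values rise (or stay level) to one peak and then only fall or stay level."""
    i, n = 0, len(values)
    while i + 1 < n and values[i + 1] >= values[i]:  # climb to the peak
        i += 1
    while i + 1 < n and values[i + 1] <= values[i]:  # descend from the peak
        i += 1
    return i == n - 1  # unimodal only if the descent reaches the last assemblage


def non_unimodal_types(order, percentages):
    """Indices of types whose frequencies are not unimodal in the proposed order.

    percentages maps assemblage name -> list of type percentages (same type order).
    An empty result means the order satisfies the frequency seriation criterion.
    """
    n_types = len(next(iter(percentages.values())))
    return [t for t in range(n_types)
            if not is_unimodal([percentages[a][t] for a in order])]


# Hypothetical type percentages for three assemblages
pct = {"A": [60, 30, 10], "B": [30, 50, 20], "C": [10, 40, 50]}
print(non_unimodal_types(["A", "B", "C"], pct))  # []
print(non_unimodal_types(["B", "A", "C"], pct))  # [1, 2]
```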

Bayesian Chronological Modeling

```
/* Example OxCal (CQL2) model for Bayesian radiocarbon modeling.
   This is OxCal syntax, to be run in OxCal itself rather than in R.
   R_Date(name, radiocarbon age in years BP, 1-sigma error)
   Two contiguous phases in sequence, separated by boundaries. */
Plot()
{
 Sequence("Site A")
 {
  Boundary("Start of occupation");
  Phase("Early occupation")
  {
   R_Date("Sample 1", 5000, 30);
   R_Date("Sample 2", 4950, 35);
   R_Date("Sample 3", 4920, 40);
  };
  Boundary("Transition");
  Phase("Late occupation")
  {
   R_Date("Sample 4", 4800, 30);
   R_Date("Sample 5", 4750, 35);
   R_Date("Sample 6", 4700, 25);
  };
  Boundary("End of occupation");
 };
};
```

Duration and Event Analysis

| Method | Description | Applications |
|---|---|---|
| Aoristic Analysis | Spreads each item's probability across fixed time blocks according to its date range | Activity patterns with imprecise dating |
| Phase Probability Modeling | Models the likely temporal distribution of a phase | Site occupation spans |
| Event Detection | Identifies short-term events in chronological data | Identifying abandonment, disasters, rapid changes |
| Tempo Analysis | Examines rates of change | Cultural transformation processes |
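
Aoristic analysis can be sketched in a few lines: each object's date range is split across fixed time blocks in proportion to overlap, and the shares are summed across objects. This version assumes each object is equally likely to date anywhere within its range (the usual aoristic simplification) and that years are given on a single numeric axis. The blocks and feature dates are hypothetical.

```python
def aoristic_weights(start, end, blocks):
    """Share of one object's probability falling in each time block.

    start, end: the object's date range in years (start < end).
    blocks: list of (block_start, block_end) tuples covering the study period.
    Assumes the object is equally likely to date anywhere within its range.
    """
    span = end - start
    return [max(0, min(end, b_end) - max(start, b_start)) / span
            for b_start, b_end in blocks]


def aoristic_sum(objects, blocks):
    """Add up the weights of many objects to build an aoristic time profile."""
    totals = [0.0] * len(blocks)
    for start, end in objects:
        for k, weight in enumerate(aoristic_weights(start, end, blocks)):
            totals[k] += weight
    return totals


# Hypothetical 100-year blocks and three loosely dated features
blocks = [(0, 100), (100, 200), (200, 300)]
features = [(50, 150), (0, 300), (120, 180)]
print(aoristic_sum(features, blocks))
```

Each object contributes a total weight of 1 when its range lies inside the study period, so the profile's sum equals the number of objects.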

Compositional and Materials Analysis

Statistical Methods for Compositional Data

| Method | Description | Applications |
|---|---|---|
| Hierarchical Cluster Analysis | Groups similar compositions | Sourcing studies, workshop identification |
| Discriminant Function Analysis | Classifies samples into known groups | Provenance (source) determination |
| Mahalanobis Distance | Measures multivariate distance, accounting for correlations between elements | Outlier detection, group membership |
| Log-ratio Transformation | Addresses the constant-sum (closure) constraint of compositional data | Proper statistical treatment of percentage data |
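
The centered log-ratio (clr) is one common log-ratio transformation: each part is divided by the geometric mean of all parts and then logged. The minimal sketch below assumes strictly positive parts; zeros and values below detection limits need separate treatment first. Additive (alr) and isometric (ilr) log-ratios are alternatives not shown here. The sample values are hypothetical.

```python
from math import log


def clr(composition):
    """Centered log-ratio transform of one composition (all parts must be > 0).

    Each part is divided by the geometric mean of all parts and then logged,
    so the result is the same whether the data are in ppm, percent, or proportions.
    """
    if any(x <= 0 for x in composition):
        raise ValueError("clr needs strictly positive parts; treat zeros and values below detection first")
    logs = [log(x) for x in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [lx - mean_log for lx in logs]


# Hypothetical oxide percentages for one sample
print(clr([55.0, 20.0, 15.0, 10.0]))
```

The clr values of a sample always sum to zero, and rescaling the raw data leaves them unchanged, which is why the transformed values can be passed to PCA or cluster analysis.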

Visualization of Compositional Data

| Plot Type | Description | Best For |
|---|---|---|
| Bivariate Plots | Plots two elements against each other | Simple relationships, initial exploration |
| Ternary Diagrams | Three-variable plots summing to 100% | Three-component systems (e.g., clay mineralogy) |
| Spider Diagrams | Multi-element patterns normalized to a standard | Comparing overall compositional signatures |
| PCA Biplots | Reduced-dimensionality visualization | Complex multi-element patterns |
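
When the plotting tool has no ternary option, a ternary diagram can be drawn on ordinary x-y axes after converting each composition to Cartesian coordinates. The sketch below assumes one particular vertex layout (A at bottom left, B at bottom right, C at the apex); other layouts simply permute the components.

```python
from math import sqrt


def ternary_xy(a, b, c):
    """Map a three-part composition (a, b, c) to 2-D plotting coordinates.

    Parts are first closed to sum to 1. Vertex A is at (0, 0), B at (1, 0),
    and C at (0.5, sqrt(3)/2), so the plot is an equilateral triangle.
    """
    total = a + b + c
    b, c = b / total, c / total  # a's share is implied by 1 - b - c
    return b + c / 2, c * sqrt(3) / 2
```

A sample with equal parts plots at the triangle's centroid, (0.5, sqrt(3)/6).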

Dealing with Common Data Challenges

Missing Data Strategies

| Challenge | Solutions |
|---|---|
| Incomplete Artifacts | Use ratios or indices that don’t require complete specimens |
| Preservation Bias | Apply correction factors, focus on well-preserved categories |
| Sampling Gaps | Interpolation techniques, predictive modeling |
| Unrecorded Variables | Proxy measures, statistical estimation |
| Documentation Gaps | Literature review, re-examination of collections if possible |

Small Sample Size Approaches

| Challenge | Solutions |
|---|---|
| Limited Statistical Power | Non-parametric tests, bootstrap resampling |
| Outlier Sensitivity | Robust statistical methods, careful outlier evaluation |
| Representativeness Issues | Clear acknowledgment of limitations, contextual interpretation |
| Inability to Subdivide | Broader analytical categories, qualitative supplementation |
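
Bootstrap resampling, listed above for limited statistical power, can be sketched with the standard library: resample the data with replacement many times, recompute the statistic each time, and read a confidence interval off the sorted results. This is the simple percentile method with a fixed seed; with very small samples the interval itself is only approximate. The sherd thicknesses are hypothetical.

```python
import random
from statistics import median


def bootstrap_ci(values, stat=median, n_resamples=5000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for a statistic of a small sample.

    Returns (estimate, (lower, upper)). The fixed seed makes results reproducible.
    """
    rng = random.Random(seed)
    estimates = sorted(
        stat(rng.choices(values, k=len(values)))  # resample with replacement
        for _ in range(n_resamples)
    )
    alpha = (1 - level) / 2
    lower = estimates[int(alpha * n_resamples)]
    upper = estimates[int((1 - alpha) * n_resamples) - 1]
    return stat(values), (lower, upper)


# Hypothetical thicknesses (mm) of eight sherds from one pit
sherds = [6.1, 7.4, 5.8, 6.9, 8.2, 6.5, 7.0, 5.5]
print(bootstrap_ci(sherds))
```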

Taphonomic Bias Corrections

| Bias Type | Analytical Approaches |
|---|---|
| Differential Preservation | MNI/NISP adjustments, preservation indices |
| Size Sorting (Water Transport) | Size distribution analysis, spatial pattern evaluation |
| Cultural Selection | Comparison with reference assemblages, ethnographic analogy |
| Excavation Recovery | Screen size corrections, recovery rate estimates |
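
NISP and MNI are the baseline counts that such adjustments start from. The sketch below computes both per taxon from a list of identified specimens. Its MNI is deliberately simplified (the most frequent element-side combination) and ignores age, size, and fusion evidence that analysts use to refine MNI. The specimen list is hypothetical.

```python
from collections import Counter, defaultdict


def nisp_and_mni(specimens):
    """Count NISP and a simplified MNI per taxon.

    specimens: iterable of (taxon, element, side) tuples, one per identified specimen.
    NISP = number of identified specimens per taxon.
    MNI  = count of the most frequent element-side combination per taxon
           (ignores age, size, and fusion, which refine MNI in practice).
    """
    specimens = list(specimens)  # allow generators, which can only be read once
    nisp = Counter(taxon for taxon, _, _ in specimens)
    per_element = defaultdict(Counter)
    for taxon, element, side in specimens:
        per_element[taxon][(element, side)] += 1
    mni = {taxon: max(counts.values()) for taxon, counts in per_element.items()}
    return dict(nisp), mni


# Hypothetical identified specimens
bones = [("Bos", "femur", "L"), ("Bos", "femur", "L"), ("Bos", "femur", "R"),
         ("Bos", "tibia", "L"), ("Ovis", "humerus", "R")]
print(nisp_and_mni(bones))
```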

Integrated Approaches and Interpretation

Combining Multiple Data Types

| Approach | Description | Examples |
|---|---|---|
| Triangulation | Using multiple methods to address the same question | Combining zooarchaeology, isotopes, and residue analysis for diet |
| Complementary Analysis | Different methods addressing different aspects | Settlement patterns plus ceramic analysis for social complexity |
| Sequential Analysis | Results from one method informing application of another | Initial survey followed by targeted geophysics |
| Nested Scales | Integrating site, local, and regional analyses | Household activities within settlement patterns |

Interpretive Frameworks

| Framework | Key Concepts | Analytical Focus |
|---|---|---|
| Process-Function | Systems, adaptation, optimization | Environmental relationships, subsistence strategies |
| Structuralism | Binary oppositions, mental templates | Symbolic aspects, spatial organization |
| Agency-Practice | Individual choice, habitus, structuration | Variation, innovation, resistance |
| Behavioral Archaeology | Formation processes, behavioral chains | Site formation, technological organization |
| Historical Ecology | Human-environment interaction, landscape history | Long-term environmental relationships |

Digital Tools and Software

Statistical and Data Analysis Software

| Software | Strengths | Common Archaeological Applications |
|---|---|---|
| R | Free, extensive statistical capabilities, reproducible scripts | Multivariate analysis, Bayesian modeling, data visualization |
| PAST | Free, user-friendly; developed for paleontology, with many methods applicable to archaeology | Basic statistics, seriation, biodiversity measures |
| SPSS | User-friendly interface, comprehensive statistics | Descriptive statistics, hypothesis testing |
| Excel | Widely available, good for basic analysis | Data organization, simple statistics, charts |
| Python (pandas, SciPy) | Powerful, flexible, good for automation | Custom analytical pipelines, machine learning applications |

Spatial Analysis Tools

| Software | Strengths | Common Archaeological Applications |
|---|---|---|
| QGIS | Free, extensive plugin ecosystem | Site mapping, spatial analysis, predictive modeling |
| ArcGIS | Comprehensive toolset, strong support | Complex spatial analysis, professional mapping |
| GRASS GIS | Free, powerful raster analysis | Terrain analysis, viewsheds, cost surfaces |
| R (sf, spatstat, spdep packages) | Integration of statistics and spatial analysis | Point pattern analysis, spatial statistics |
| GeoDa | Specialized for spatial statistics | Spatial autocorrelation, cluster analysis |

Visualization and Presentation Tools

| Software | Best For | Archaeological Applications |
|---|---|---|
| ggplot2 (R) | Statistical visualizations, publication-quality graphs | Artifact attribute distributions, seriation diagrams |
| QGIS Print Layout / ArcGIS layouts | Map layouts, spatial data visualization | Site distribution maps, excavation plans |
| Inkscape/Illustrator | Vector graphics, diagram creation | Artifact illustrations, stratigraphic sections |
| Blender | 3D modeling and visualization | Artifact reconstruction, landscape visualization |
| WebGL/Three.js | Interactive 3D visualization for the web | Online artifact galleries, virtual site tours |

Best Practices for Archaeological Data Analysis

Data Management

  • Create comprehensive data dictionaries documenting all variables and coding systems
  • Implement consistent measurement protocols to ensure comparability within and between projects
  • Maintain original (raw) data separate from processed/analyzed data
  • Use version control for analytical scripts and databases
  • Follow data citation standards when using datasets from other sources
  • Plan for long-term archiving in sustainable formats and repositories
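
A data dictionary is most useful when it is machine-readable, so that records can be checked against it automatically before analysis. The sketch below assumes a simple, hypothetical dictionary format (allowed codes for categorical fields, a numeric range for measurements); the field names and codes are illustrative only.

```python
# Hypothetical data dictionary: field -> allowed codes or a numeric (min, max) range
DATA_DICTIONARY = {
    "raw_material": {"codes": {"CHT", "OBS", "QTZ"}},  # hypothetical codes
    "length_mm": {"range": (0.0, 500.0)},
}


def validate_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of problems in one record; an empty list means it passed."""
    problems = []
    for field, rule in dictionary.items():
        if field not in record:
            problems.append(f"{field}: missing")
            continue
        value = record[field]
        if "codes" in rule and value not in rule["codes"]:
            problems.append(f"{field}: unknown code {value!r}")
        if "range" in rule:
            low, high = rule["range"]
            if not isinstance(value, (int, float)) or not low <= value <= high:
                problems.append(f"{field}: {value!r} outside {low}-{high}")
    return problems
```

Running every record through such a check, and keeping the check with the analysis scripts under version control, documents the coding system and catches entry errors early.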

Analytical Approach

  • Begin with clear research questions rather than applying methods for their own sake
  • Use exploratory analysis before confirmatory statistics
  • Consider multiple working hypotheses rather than single hypothesis testing
  • Acknowledge and quantify uncertainty in measurements and interpretations
  • Combine quantitative and qualitative approaches for robust interpretations
  • Document all analytical steps to ensure reproducibility

Reporting and Publication

  • Provide access to raw data when possible through repositories or supplements
  • Clearly describe methods including software versions and analytical parameters
  • Visualize data effectively with appropriate chart types and clear labeling
  • Report appropriate statistical details (sample sizes, p-values, effect sizes)
  • Acknowledge limitations of data and methods
  • Use open-access formats where possible for maximum accessibility

Resources for Further Learning

Textbooks and References

  • Baxter, M. (2003). Statistics in Archaeology. London: Arnold.
  • Conolly, J., & Lake, M. (2006). Geographical Information Systems in Archaeology. Cambridge: Cambridge University Press.
  • Drennan, R. D. (2009). Statistics for Archaeologists: A Common Sense Approach. New York: Springer.
  • Lock, G. (2003). Using Computers in Archaeology: Towards Virtual Pasts. London: Routledge.
  • VanPool, T. L., & Leonard, R. D. (2011). Quantitative Analysis in Archaeology. Chichester: Wiley-Blackwell.

Journals with Strong Methodological Focus

  • Journal of Archaeological Science
  • Journal of Archaeological Science: Reports
  • Archaeometry
  • Archaeological and Anthropological Sciences
  • Advances in Archaeological Practice

Online Resources and Communities

Training Opportunities

Remember that archaeological data analysis should always serve the broader goals of archaeological interpretation and understanding past human societies. The most sophisticated analyses still require thoughtful archaeological interpretation grounded in solid theoretical frameworks.
