About & sources · Colonial Office & India Office Lists

An Atlas of Imperial Careers

The working lives of roughly 46,000 officials of two civil services, 1820–1966.

This atlas reconstructs the careers of the officials of two British administrative services — the Colonial Office and the India Office — and maps each official by the moves they made between colonies, presidencies and provinces. Each arc is one recorded transfer; follow a single name and you can watch a working life travel the empire, decade by decade.

It is a solo project by Jim Clifford, an environmental and digital historian of Greater London, the British world and Canada in the Department of History at the University of Saskatchewan. It grew out of the recognition that large language models now make feasible the kind of large-scale text mining and entity linkage once attempted in Trading Consequences, and that the Colonial Office and India Office Lists are unusually strong foundational documents from which to begin building a knowledge graph of the British world system. A companion view, Schools of Empire, extends it to the institutions that educated each service's officials.

The sources

The atlas draws on two printed serials. Across their many annual editions, both Lists carried a “Record of Services” for each serving official: a terse paragraph of postings, dates, honours and schooling. That paragraph is the raw material for everything shown here.

The Colonial Office List (CO), digitised from the Internet Archive.

The India Office List (IO), digitised from the British Library.

The method

Each volume was run through OCR, and every biographical paragraph was parsed into structured career events (person, position, place, employer, year). The duplicate entries that an official accumulates across decades of editions were then resolved into a single life. Places, roles, institutions and people were grounded to Wikidata, so that “Calcutta”, “C.O.” and “the N.W. Provinces” resolve to stable entities, and places to coordinates (Wikidata property P625, coordinate location, retrieved via the QLever SPARQL engine). Only located postings appear on the map; the fuller record of roles, honours and education lives in an underlying knowledge graph.
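A minimal Python sketch of the grounding and filtering steps follows, under stated assumptions. The `CareerEvent` fields, the helper names and the naive name-plus-birth-year merge rule are all hypothetical; the project itself resolved duplicates with local language models, so the rule here is only a simple stand-in for illustration. The one detail taken from Wikidata's own conventions is that P625 values come back as WKT literals with longitude before latitude, e.g. `Point(88.36 22.57)`. The query string could be sent to any Wikidata SPARQL endpoint, QLever's included; no network call is made here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Wikidata serialises P625 (coordinate location) as a WKT literal,
# longitude first: "Point(<lon> <lat>)".
_NUM = r"(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
_WKT_POINT = re.compile(r"Point\(\s*" + _NUM + r"\s+" + _NUM + r"\s*\)")


def parse_wkt_point(literal: str) -> tuple[float, float]:
    """Return (latitude, longitude) from a literal like 'Point(88.36 22.57)'."""
    match = _WKT_POINT.search(literal)
    if match is None:
        raise ValueError(f"not a WKT point: {literal!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


def coordinate_query(qids: list[str]) -> str:
    """Build a SPARQL query asking for the P625 coordinates of Wikidata items."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        f"SELECT ?item ?coord WHERE {{ VALUES ?item {{ {values} }} "
        "?item wdt:P625 ?coord . }"
    )


@dataclass(frozen=True)
class CareerEvent:
    # Hypothetical schema; the person is carried by the grouping key below.
    position: str
    place_qid: Optional[str]  # Wikidata QID, or None if the place was not grounded
    employer: str
    year: int


def person_key(name: str, birth_year: Optional[int]) -> tuple[str, Optional[int]]:
    """Naive identity key: lower-cased name without punctuation, plus birth year."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split()), birth_year


def _event_order(event: CareerEvent) -> tuple:
    # Chronological, with deterministic tie-breaking for same-year events.
    return (event.year, event.position, event.employer, event.place_qid or "")


def merge_editions(entries):
    """Collapse per-edition entries (name, birth_year, events) into one life each.

    Identical events repeated across editions are kept once (set union).
    """
    lives: dict = {}
    for name, birth_year, events in entries:
        lives.setdefault(person_key(name, birth_year), set()).update(events)
    return {key: sorted(evs, key=_event_order) for key, evs in lives.items()}


def located_postings(events, coords):
    """Keep only events whose place has coordinates, pairing each with (lat, lon)."""
    return [(e, coords[e.place_qid]) for e in events if e.place_qid in coords]
```

A merge keyed on exact normalised names will miss variants such as "J. Smith" and "John Smith"; that kind of variation is exactly what a fixed rule cannot resolve, and why a model-based approach suits the real records better.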

How it was built

Work at this scale — tens of thousands of biographies across more than a century of editions — was possible because Claude Code orchestrated the pipeline end to end: running the Chandra 2 OCR model on the Nibi cluster (Digital Research Alliance of Canada), driving local language models for the parsing, deduplication and entity-grounding, and building the interactive visualisation itself. What would once have been years of manual transcription and record linkage became a tractable solo project — a small illustration of how AI tooling is changing what a single historian can attempt.

Building the visualisation was itself a core part of the method, not merely its output. Plotting careers on a map made errors legible that no table of numbers would surface — an official marooned in the wrong hemisphere, two people fused into one, a place matched to its modern namesake. Each such anomaly was traced back through the pipeline and corrected at the source, then the atlas rebuilt. Reading the map, in other words, is how the data behind it was debugged — and that work is ongoing. If you spot a mistake — a misplaced posting, a wrong identification, a career that looks wrong — I would be glad to hear it: email me.
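The anomalies described above were found by reading the map, not by an automated rule. As a hypothetical complement, a simple screen could pre-sort candidates for that kind of reading. The sketch below uses invented names (`flag_spikes`, a 3,000 km default threshold) and flags a posting that lies far from both its neighbours while the neighbours sit close together: the out-and-back spike that a place matched to a distant namesake tends to draw. Distances use the standard haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def flag_spikes(path: list[tuple[float, float]], threshold_km: float = 3000.0) -> list[int]:
    """Return indices of postings that jump far away and straight back.

    Posting i is flagged when it lies more than threshold_km from both
    neighbours while those neighbours are within threshold_km of each other.
    """
    flagged = []
    for i in range(1, len(path) - 1):
        prev, cur, nxt = path[i - 1], path[i], path[i + 1]
        if (haversine_km(prev, cur) > threshold_km
                and haversine_km(cur, nxt) > threshold_km
                and haversine_km(prev, nxt) <= threshold_km):
            flagged.append(i)
    return flagged
```

A flag is a prompt for inspection, not a verdict: a spell of home leave in London between two postings in Bengal produces the same spike.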

Open data

The data behind the atlas is free for anyone to use, released into the public domain under Creative Commons CC0 — copy it, remix it, build on it, for any purpose, no permission needed. The full knowledge graph for both services — every person, posting, place, school and honour, grounded to Wikidata — is published as plain data files in the project repository, ready to load into a graph database (LadybugDB or Neo4j) or read directly in Python, R or a spreadsheet. Attribution is appreciated but not required.
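Because the files are plain data, rebuilding the arcs drawn on the map needs only the Python standard library. The sketch below assumes a hypothetical CSV layout (one row per posting, with `person_id`, `place_qid` and `year` columns); the repository's actual file names and columns may differ, so check them before adapting it. Each arc is a move between consecutive located postings, following the atlas's rule that one arc is one recorded transfer.

```python
import csv
import io
from collections import defaultdict
from typing import TextIO


def load_postings(handle: TextIO) -> dict[str, list[tuple[int, str]]]:
    """Read a postings table with hypothetical columns person_id, place_qid, year.

    Rows without a place are skipped, mirroring the map's located-only view.
    Each person's postings are returned in chronological order.
    """
    careers = defaultdict(list)
    for row in csv.DictReader(handle):
        if row["place_qid"]:
            careers[row["person_id"]].append((int(row["year"]), row["place_qid"]))
    for postings in careers.values():
        postings.sort()
    return dict(careers)


def transfers(postings: list[tuple[int, str]]) -> list[tuple[str, str, int]]:
    """Turn a chronological career into arcs: (from_place, to_place, year_of_move)."""
    arcs = []
    for (_, here), (year, there) in zip(postings, postings[1:]):
        if here != there:  # staying in the same place is not a transfer
            arcs.append((here, there, year))
    return arcs


if __name__ == "__main__":
    # Inline sample with placeholder IDs; a real file would be opened with
    # open(path, newline="", encoding="utf-8"), as the csv docs recommend.
    sample = io.StringIO("person_id,place_qid,year\np1,Q2,1860\np1,Q1,1850\np1,Q1,1855\n")
    for person, postings in load_postings(sample).items():
        print(person, transfers(postings))
```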

Citable archive: doi.org/10.5281/zenodo.21079938.

This is research in progress, built from imperfect OCR of nineteenth- and twentieth-century print; names, dates and places contain errors, and coverage is uneven. A fuller account of the sources, models and method will appear in a forthcoming journal article.