About / Methodology
What this is
The HGIS Canada Knowledge Graph is an academic resource that publishes the Canadian Census Subdivision (CSD) record, 1851–1921, as a queryable knowledge graph and a browseable corpus of per-place web pages. It is designed to serve three audiences:
- Historians and historical geographers who want to look up a township, follow boundary changes across the eight censuses, or compare neighbours, without downloading and parsing the raw boundary files
- The Linked Open Data community, who can consume the same data as CIDOC-CRM-compliant Turtle published for LINCS interoperability
- AI assistants and citation-grounded RAG systems (Gemini, Claude, ChatGPT, Copilot), which can read the per-place pages — with their prose summaries and Schema.org structured data — to answer citizen questions about Canadian local history with citable sources
Relationship to hgiscanada.usask.ca
This project builds directly on the Canadian Peoples / TCP project hosted at the HGIS Lab, University of Saskatchewan. That project provides:
- Georeferenced polygon boundaries for every Canadian Census Subdivision in each census from 1851 through 1921
- Transcribed census tables (population, ethnic origin, religion, agriculture, etc.) joined to the polygons by TCP UID
- The original record-pages with raw key-value census data for each CSD-year
The HGIS Canada Knowledge Graph reorganises those source files into:
- A property-graph knowledge base (KuzuDB) with explicit Place / Presence / Measurement / CensusVariable nodes and typed edges (OBSERVED_IN, BORDERS, CONTINUES_AS, OVERLAPS_TEMPORALLY, SPLIT_FROM, MERGED_INTO, MEASURED_AT, OF_VARIABLE, PART_OF_COUNTY)
- Per-place prose pages with the full census record, neighbours, cross-year continuity, and Wikidata grounding
- A CIDOC-CRM RDF/Turtle export for LINCS publication
The TCP project is the data source; this project is a knowledge-graph layer on top of it. Use this site to browse, to cite, or as a source for AI assistants; use the parent project for the underlying GIS layers and primary record pages.
Data model
Each page describes a Place (the enduring concept, e.g. "Westmeath Township" as a thing that existed across multiple censuses) in one specific Presence (for example its 1871 census manifestation, with that year's polygon, tabulated values, and neighbours).
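The Place / Presence split can be pictured with a minimal Python sketch. The class and field names below are assumptions chosen for illustration, not the actual KuzuDB schema, and the UIDs are placeholders rather than real TCP identifiers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Presence:
    """One census-year manifestation of a place."""
    tcp_uid: str                      # TCP UID joining polygon and census tables
    year: int                         # census year, 1851 to 1921
    name: str                         # name as recorded in that census
    neighbours: Tuple[str, ...] = ()  # TCP UIDs of bordering CSDs in that year


@dataclass
class Place:
    """The enduring concept that persists across censuses."""
    label: str
    presences: List[Presence] = field(default_factory=list)

    def presence_in(self, year: int) -> Optional[Presence]:
        """Return the Presence observed in the given census year, if any."""
        return next((p for p in self.presences if p.year == year), None)


# Placeholder UIDs, not real TCP identifiers.
westmeath = Place(
    label="Westmeath Township",
    presences=[
        Presence(tcp_uid="UID-A-1871", year=1871, name="Westmeath"),
        Presence(tcp_uid="UID-A-1881", year=1881, name="Westmeath"),
    ],
)
```

In this sketch a page corresponds to one (Place, Presence) pair, so `westmeath.presence_in(1871)` stands for the 1871 page of Westmeath Township.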
Identity across years
Identity from one census to the next is established by spatial polygon overlap, not by name match. A place that was renamed (Berlin → Kitchener, 1916) or whose census-name spelling drifted (Pembroke / Pembroke Town–Ville) still threads through to the same enduring Place if its polygons in adjacent censuses, for example 1871 and 1881, overlap with an intersection-over-union (IoU) of at least 0.98, forming a strict SAME_AS chain. Overlap is more reliable than name matching for historical data because names and spellings change between censuses while the area a place covers usually does not.
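As a rough illustration of the IoU test, the sketch below uses axis-aligned rectangles as stand-ins for CSD polygons. That simplification, and every name in the code, is an assumption made for the example; the real pipeline works on georeferenced polygons, and only the 0.98 threshold comes from the text above.

```python
from typing import NamedTuple

SAME_AS_THRESHOLD = 0.98  # strict threshold quoted in the text


class Box(NamedTuple):
    """Axis-aligned rectangle standing in for a CSD polygon (illustration only)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @property
    def area(self) -> float:
        return max(0.0, self.max_x - self.min_x) * max(0.0, self.max_y - self.min_y)


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by the two rectangles, or 0.0 if they do not overlap."""
    width = min(a.max_x, b.max_x) - max(a.min_x, b.min_x)
    height = min(a.max_y, b.max_y) - max(a.min_y, b.min_y)
    return max(0.0, width) * max(0.0, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union: shared area divided by combined area."""
    inter = intersection_area(a, b)
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0


def is_same_place(a: Box, b: Box) -> bool:
    """True when two footprints are close enough to continue a SAME_AS chain."""
    return iou(a, b) >= SAME_AS_THRESHOLD


earlier = Box(0, 0, 10, 10)
nearly_identical = Box(0, 0, 10, 9.9)  # loses 1% of its area
after_split = Box(0, 0, 10, 8)         # loses 20% of its area
```

Note that no name appears anywhere in the comparison, which is why a rename alone cannot break the chain: here `is_same_place(earlier, nearly_identical)` holds, while `is_same_place(earlier, after_split)` does not.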
When boundaries shifted enough that the strict SAME_AS chain broke (typically because a township split off a town or annexed neighbouring land), the Boundary continuity section on each year-page surfaces the partial overlaps (CONTAINS, WITHIN, OVERLAPS) to adjacent census years. The Pembroke 1851 → Pembroke 1861 + Pembroke Town–Ville 1861 split is a representative example.
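One way to picture how a pair of polygons might be sorted into these relations is the sketch below, which works only from three areas: the earlier polygon, the later polygon, and their shared region. The 0.98 coverage cut-off for CONTAINS and WITHIN, the direction of each label (earlier polygon relative to later), and the function itself are assumptions for illustration; the source does not state how the pipeline draws these lines.

```python
from typing import Optional


def classify_overlap(
    area_earlier: float,
    area_later: float,
    area_shared: float,
    same_as: float = 0.98,   # IoU threshold quoted in the text
    coverage: float = 0.98,  # assumed cut-off for "almost entirely inside"
) -> Optional[str]:
    """Label the relation of an earlier polygon to a later one."""
    if area_earlier <= 0 or area_later <= 0:
        raise ValueError("polygon areas must be positive")
    if area_shared <= 0:
        return None  # no spatial relation to report

    union = area_earlier + area_later - area_shared
    if area_shared / union >= same_as:
        return "SAME_AS"

    later_inside_earlier = area_shared / area_later
    earlier_inside_later = area_shared / area_earlier
    if later_inside_earlier >= coverage:
        return "CONTAINS"  # the earlier polygon contains the later one
    if earlier_inside_later >= coverage:
        return "WITHIN"    # the earlier polygon lies within the later one
    return "OVERLAPS"
```

With made-up areas for a township of 100 units that splits off a town of 5 units, both `classify_overlap(100, 95, 95)` and `classify_overlap(100, 5, 5)` fall under CONTAINS in this sketch: neither successor reaches the SAME_AS threshold, but each lies inside the earlier boundary.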
Wikidata grounding
Where a persistent Place can be identified with a modern Wikidata entity, the owl:sameAs link to that Wikidata QID becomes the canonical external identifier for the place. Questions about the deep-time meaning of a place (its modern location, its current administrative status, its name in other languages) are therefore answered from Wikidata rather than modelled here. Where Wikidata's coverage is thin (pre-1850 phases, small townships that were dissolved, conflations of city / metro / county boundaries), this graph inherits the thinness: we do not attempt to reconstruct what Wikidata does not provide.
Grounding is performed by an MCP-assisted disambiguation pipeline that validates Wikidata candidates using spatial distance, name similarity, and entity-type filtering. This pipeline produced the more than 1,000 verified Ontario matches; the rest of Canada is in progress. Where no Wikidata match exists, a place gets a permanent minted URI following LINCS conventions (this work is staged separately).
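The three validation signals can be sketched in a few lines of Python. This is not the project's pipeline: it assumes the candidates have already been retrieved, and the thresholds, allowed types, field names, QIDs, and coordinates are all hypothetical values chosen for the example.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, Optional

ALLOWED_TYPES = frozenset({"township", "town", "village", "city"})  # assumed


@dataclass(frozen=True)
class Candidate:
    qid: str          # placeholder identifier, not a real Wikidata QID
    label: str
    lat: float
    lon: float
    entity_type: str


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_candidate(
    name: str, lat: float, lon: float, candidates: Iterable[Candidate],
    max_km: float = 25.0, min_similarity: float = 0.8,  # assumed thresholds
) -> Optional[Candidate]:
    """Keep candidates passing all three filters; prefer similar, then near."""
    scored = []
    for c in candidates:
        if c.entity_type not in ALLOWED_TYPES:
            continue  # entity-type filter
        distance = haversine_km(lat, lon, c.lat, c.lon)
        similarity = name_similarity(name, c.label)
        if distance <= max_km and similarity >= min_similarity:
            scored.append((-similarity, distance, c))
    if not scored:
        return None
    return min(scored, key=lambda item: (item[0], item[1]))[2]


# Illustrative candidates with made-up coordinates and placeholder QIDs.
nearby = Candidate("Q-placeholder-1", "Westwood", 44.30, -78.05, "township")
far_away = Candidate("Q-placeholder-2", "Westwood", 34.06, -118.44, "town")
wrong_type = Candidate("Q-placeholder-3", "Westwood Lake", 44.32, -78.04, "lake")
```

Returning None when nothing passes mirrors the fallback described above, where an unmatched place receives a minted URI instead of a Wikidata link.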
How to cite
For the project as a whole:
Clifford, J. (2026). HGIS Canada Knowledge Graph: Census Subdivisions, 1851–1921 [Web resource]. Built on the Canadian Peoples / TCP project. Available at https://jimclifford.ca/hgiscanada/.
For a specific place-year, cite the page directly using its canonical URL. Every page has a stable URL of the form https://jimclifford.ca/hgiscanada/places/<prov>/<name>-<tcpuid>-<year>/.
For the underlying TCP source data, cite:
St-Hilaire, M., Sweeney, S., Inwood, K., et al. Canadian Census Subdivisions, 1851–1921 [Dataset]. Borealis Dataverse. borealisdata.ca/dataverse/canadiansubdivisions.
Reproducibility & source code
The full pipeline is in the Canada History Knowledge Graph repository. It covers boundary processing, persistent-place identification via spatial overlap, Wikidata grounding, knowledge-graph construction in KuzuDB, RDF/Turtle export, and the page generator behind this site. The pipeline is parameterised by province and census year; running it against an updated TCP release refreshes the site in approximately 30 seconds for page generation plus a few minutes for the KuzuDB rebuild.
Limitations
- Boundaries are derived from TCP's polygon files; any errors there propagate here. Boundary uncertainty for some pre-1881 northern and prairie CSDs is significant.
- Census tabulations are transcribed by the TCP project from the original schedules; transcription errors and OCR artifacts in some early censuses propagate here. Where we found systematic OCR errors in CSD names (e.g. "Wesfwd" → "Westwood"), we corrected them via a canonical-name pass; row-level census values are unchecked.
- Wikidata grounding coverage varies by province — Ontario is essentially complete; other provinces are in progress.
- Major cities (Toronto, Montreal, Halifax) are fragmented across many ward-level CSDs in the source data, so a single "Toronto" page does not exist — instead, dozens of ward pages link to each other. A future iteration may add city-aggregate pages.
- The knowledge graph models 1851–1921 only; pre-1850 history is delegated to Wikidata's QID-level depth, and post-1921 census records are out of scope.
Acknowledgments
This project would not exist without the Canadian Peoples / TCP project at the HGIS Lab, University of Saskatchewan, and the years of GIS, transcription, and methodological work behind it. Wikidata grounding uses the WikidataMCP service developed by the Wikimedia Foundation. The knowledge graph is built in KuzuDB. CIDOC-CRM modelling follows the LINCS application profile.