London’s Text-Mined Hinterlands for the Social Science History Association

The map below visualizes the text-mined data produced by the Trading Consequences project. We queried the database to identify all the commodities with a strong relationship to London and then found every other location where the text mining pipeline identified a relationship with those commodities at least 10 times in a given year. This produced 111,977 rows of data, each representing between 10 and 2,841 commodity-place relationships. I will present this data visualization at the Social Science History Association meeting in Toronto this November.
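For readers curious about the shape of that query, here is a minimal Python sketch of the two-step filter, assuming a hypothetical flat list of (year, place, commodity) tuples, one per mined relationship. The real Trading Consequences database has a different schema, and `min_count` is an assumed stand-in for "a strong relationship to London", since the criterion is not spelled out here. The function names are hypothetical.

```python
from collections import Counter

# Hypothetical record shape: one (year, place, commodity) tuple per mined
# sentence-level relationship. The real database schema is different.
Mention = tuple[int, str, str]


def london_commodities(mentions: list[Mention], min_count: int) -> set[str]:
    """Commodities mentioned alongside London at least `min_count` times.

    `min_count` is an assumed stand-in for "a strong relationship";
    the actual criterion used for the map is not specified here.
    """
    counts = Counter(c for _, place, c in mentions if place == "London")
    return {c for c, n in counts.items() if n >= min_count}


def yearly_rows(
    mentions: list[Mention], commodities: set[str], threshold: int = 10
) -> list[tuple[int, str, str, int]]:
    """One (year, place, commodity, count) row for every non-London place
    whose relationship with a London commodity reached `threshold` in a year."""
    counts = Counter(
        (year, place, c)
        for year, place, c in mentions
        if c in commodities and place != "London"
    )
    return sorted(
        (year, place, c, n) for (year, place, c), n in counts.items() if n >= threshold
    )
```

Each tuple returned by `yearly_rows` plays the role of one row in the downloadable data, with the count column corresponding to the 10 to 2,841 relationships a row can represent.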

The map above uses CartoDB’s Torque Cat animation to visualize how the data changes over time. It distinguishes only 10 commodities, which is already too many to follow easily, and groups the remaining commodities in an Other category. The word cloud below shows all of the commodities and ranks them by the number of places and the number of years in which they met the threshold of 10 relationships (i.e. a word is bigger if the commodity had many mined relationships with different places and those relationships persisted across the whole century).
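One way to compute word-cloud weights from thresholded rows is sketched below in Python. It assumes hypothetical (year, place, commodity, count) rows in which every count already meets the 10-relationship threshold. The post does not say exactly how places and years were combined into a single word size, so the ordering here (breadth first, then persistence) is an assumption for illustration, and `cloud_ranking` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical row shape, already thresholded:
# (year, place, commodity, count), where count >= 10.
Row = tuple[int, str, str, int]


def cloud_ranking(rows: list[Row]) -> list[tuple[str, int, int]]:
    """Return (commodity, distinct places, distinct years), best first."""
    places: dict[str, set[str]] = defaultdict(set)
    years: dict[str, set[int]] = defaultdict(set)
    for year, place, commodity, _count in rows:
        places[commodity].add(place)
        years[commodity].add(year)
    ranked = [(c, len(places[c]), len(years[c])) for c in places]
    # Assumed ordering: breadth (places) first, then persistence (years),
    # then name to break ties deterministically.
    ranked.sort(key=lambda t: (-t[1], -t[2], t[0]))
    return ranked
```

A different weighting, such as counting distinct (place, year) pairs, would reward the same two qualities of breadth and persistence while blending them into one number.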

It is also possible to look at the data for the whole nineteenth century at once to see the locations with a high intensity of relationships with numerous commodities that also have a strong relationship with London.


[This map looks better when you zoom in.]
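Collapsing the yearly rows into a single century-wide view amounts to a simple aggregation by place. The Python sketch below assumes the same hypothetical (year, place, commodity, count) rows and reports, for each place, how many distinct commodities it was linked to and the total number of relationships. Treating "intensity" as these two numbers is an assumption made for illustration; `century_intensity` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical row shape: (year, place, commodity, count).
Row = tuple[int, str, str, int]


def century_intensity(rows: list[Row]) -> list[tuple[str, int, int]]:
    """Collapse all years into (place, distinct commodities, total relationships)."""
    totals: dict[str, int] = defaultdict(int)
    commodities: dict[str, set[str]] = defaultdict(set)
    for _year, place, commodity, count in rows:
        totals[place] += count
        commodities[place].add(commodity)
    summary = [(p, len(commodities[p]), totals[p]) for p in totals]
    # Places linked to the most commodities first, then by total volume.
    summary.sort(key=lambda t: (-t[1], -t[2], t[0]))
    return summary
```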
I should note that this data does not confirm a direct relationship with London, and not all of these locations were part of the city’s increasingly global hinterlands. Some would have been competing markets sourcing the same materials or producing the same goods as London. British ports were also waystations where goods from around the world were transhipped and sent on to other European centres. The text mining identified when a commodity term, like sugar, appeared in the same sentence as a place name. It shows a strong correlation between London and sugar and a strong correlation between Cuba and sugar. In this case, I know from other sources that Cuba was among the numerous suppliers of sugar to London. We cannot simply assume, however, that the strong correlation between leather and Calais in 1822 means the French port supplied London with leather in that year; Calais could have been a market for London’s leather or a competitor. To focus the map exclusively on London’s hinterlands, I would need to filter the results based on additional research and an extensive ground-truthing exercise. It would probably be more accurate to say these maps help illuminate the geography of commodities related to London in the nineteenth century, and that this data and the visualizations remain a starting point for further research (like the research I’m doing with Andrew Watson on leather).
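To see why a sentence-level match cannot tell a supplier from a market or a competitor, consider a toy version of co-occurrence counting. This Python sketch is not the Trading Consequences pipeline: it swaps in a naive regular-expression sentence splitter and two tiny hypothetical lexicons in place of the project's commodity lexicon and the Edinburgh Geoparser, purely to show the principle.

```python
import re
from collections import Counter
from itertools import product

# Toy lexicons standing in for the project's commodity lexicon and the
# Edinburgh Geoparser's place recognition; both are far richer in reality.
COMMODITIES = {"sugar", "leather"}
PLACES = {"London", "Cuba", "Calais"}


def sentence_cooccurrences(text: str) -> Counter:
    """Count (commodity, place) pairs that appear in the same sentence."""
    counts: Counter = Counter()
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[A-Za-z]+", sentence))
        found_commodities = COMMODITIES & {w.lower() for w in words}
        found_places = PLACES & words
        for pair in product(sorted(found_commodities), sorted(found_places)):
            counts[pair] += 1
    return counts
```

Running it on "Sugar from Cuba reached London." credits both (sugar, Cuba) and (sugar, London) equally; nothing in the count records which place supplied which, which is exactly why the results need ground-truthing.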

You can download the data as a CSV file with this link.

Here is the abstract for the SSHA paper I’m co-authoring with Bea Alex and Uta Hinrichs:
Visualizing Text Mined Geospatial Results: Exploring the Trading Consequences Database.

Trading Consequences, a Digging Into Data-funded project, extracted 150 gigabytes of data from a corpus of more than 10 million pages of digitized historical documents. Using the Edinburgh Geoparser and a newly created lexicon of nineteenth-century commodities, we identified relationships between mentions of raw materials (e.g. coal, wheat, cinchona, gum arabic or tallow) and place names (e.g. Canada, London, Lagos, the Chincha Islands or Ceylon). We also extracted dates, information on whether each location is a place of origin, transit or destination, and the sentences from which these named entities were extracted. The Trading Consequences team created a number of web visualizations using the D3.js JavaScript library to allow historians to explore the database, discover interesting relations within the data and, potentially, develop new historical research questions and findings. These visualizations enable users to drill down into the data and identify historical documents for close reading, but they also highlight trends and particular relations between commodities and geographic locations. For instance, a word cloud timeline visualization was built to show overall trends in the changing geographies of a particular commodity over the course of the nineteenth century. The size of the database, however, poses challenges for the speed of dynamic, web-based interactive visualizations. For some queries, the data must be preprocessed to provide instant results, which limits the questions historians can explore using the web visualizations. Importing the data into an HGIS provides an alternative means of visualizing it. When bringing the text-mined data into an HGIS, we can query the database directly and need not worry if it takes a considerable time to extract interesting subsets of the data.
Moreover, we can build upon existing HGIS methodologies and layer the text-mined data with other resources, including scanned historical maps along with vectors and attribute data digitized from documents found in the archives. This interdisciplinary paper will use a historical case study focused on the nineteenth-century expansion of the global supply of fats for British industry to discuss the different approaches developed by the Trading Consequences team to explore big data geospatially for historical research. http://tradingconsequences.blogs.edina.ac.uk/
