The ChartEx gateway is a platform for research on large-scale analysis of digital historical records called charters. These charters are hand-written documents that record legal transactions of property, such as houses, fields and meadows. They have survived in abundance, and are one of the richest sources for studying the lives of people between the 12th and 16th centuries: they allow us to trace how cities, towns and villages developed over time, who the key characters in this process were, and how these people lived.
In recent years, tens of thousands of such charters from all over Europe have been digitized, making them more easily accessible. Still, working with these charters is fraught with difficulty.
First, they are dispersed across different archives all over Europe. Through the ChartEx gateway, however, historians can upload their collections and organize the information they contain in a central database.
Second, traditional search engines are not very useful on this type of data. Current digital catalogues allow historians to search for place-names (e.g., ‘York'), but such a search easily returns many hundreds of charters, too many to read. Likewise, since there were no family names or birth registers in that era, searching for a person (e.g., John of York) will again yield too many results to manage, or conversely, miss many spelling variations. However, using Natural Language Processing (NLP), we can extract information from these charters automatically, for instance that ‘John' lives in ‘York' and has sold a specific piece of land to another person. These semantic annotations of charters enable more powerful search interfaces. While NLP processes are quite efficient and can handle small batches of documents in a reasonable amount of time, a more scalable, distributed approach is needed to run them over many thousands of documents. These NLP processes are developed in the ChartEx project (http://www.chartex.org/).
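To illustrate the kind of semantic annotation described above, the toy sketch below pulls structured fields out of a (made-up) charter sentence with a hand-written pattern. This is not the ChartEx NLP pipeline: real charters are in Latin or Middle English and require trained models, and all names and the pattern here are illustrative assumptions.

```python
import re

# A made-up charter sentence, for illustration only.
CHARTER = "John of York grants to William of Pocklington one toft in Micklegate."

# Hypothetical pattern: "<Person> grants to <Person> <property> in <Place>."
PATTERN = re.compile(
    r"(?P<grantor>[A-Z]\w+ of [A-Z]\w+) grants to "
    r"(?P<grantee>[A-Z]\w+ of [A-Z]\w+) "
    r"(?P<property>.+?) in (?P<place>[A-Z]\w+)\."
)

def annotate(text):
    """Return a dict of semantic annotations, or None if no match."""
    m = PATTERN.search(text)
    return m.groupdict() if m else None

print(annotate(CHARTER))
```

The output maps each mention to a role (grantor, grantee, property, place); storing such annotations alongside the charter is what makes role-aware searches like "John as a seller in York" possible.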
Third, even if the correct charters are found, it is often difficult to establish significant links between them. For instance, how do we know that the ‘John' appearing in one charter is the same person as the ‘John' appearing in a later charter? Using machine learning (ML) techniques, we can reason over large numbers of charters at once. In practice, this is done by storing all annotations in a large graph (network) and looking for patterns within that graph. Indeed, if a person has the same connections as another person, e.g., living in the same town during the same time period, owning the same land and/or being related to the same people, it is likely that the two are the same person. However, these ML techniques need to reason over the information as a whole, which means running these processes over and over again, on more and more data, as charters are added to the system. This again calls for a scalable approach, and a convenient interface for end users to start and restart these processes.
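The graph-matching idea above can be sketched as follows: each person mention is represented by the set of nodes it connects to in the annotation graph, and two mentions are proposed as the same person when those neighbourhoods overlap strongly. The mentions, attributes, similarity measure (Jaccard) and threshold are illustrative assumptions, not the actual ChartEx algorithm.

```python
def jaccard(a, b):
    """Overlap between two sets of graph neighbours (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical mentions: (charter id, name) -> connected nodes in the graph.
mentions = {
    ("charter_12", "John"): {"place:York", "period:1270s",
                             "land:Micklegate_toft", "kin:William"},
    ("charter_47", "John"): {"place:York", "period:1270s",
                             "land:Micklegate_toft", "kin:Agnes"},
    ("charter_90", "John"): {"place:Beverley", "period:1340s",
                             "land:Eastgate_croft", "kin:Robert"},
}

THRESHOLD = 0.5  # illustrative cut-off for proposing a match

def same_person_candidates(mentions, threshold=THRESHOLD):
    """Yield pairs of mentions whose graph neighbourhoods overlap enough."""
    keys = sorted(mentions)
    for i, k1 in enumerate(keys):
        for k2 in keys[i + 1:]:
            score = jaccard(mentions[k1], mentions[k2])
            if score >= threshold:
                yield k1, k2, score

for k1, k2, score in same_person_candidates(mentions):
    print(k1, "<->", k2, f"(similarity {score:.2f})")
```

Here the two ‘John' mentions from charters 12 and 47 share a place, period and parcel of land and are proposed as one person, while the Beverley ‘John' is not. Because every new charter changes the graph, such a pairwise pass must be rerun as the collection grows, which is precisely why a scalable, restartable setup is needed.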