Archiving data is straightforward. These data are gathered from contributors and placed online for public consumption. But you may ask, “Why? Aren’t there a million places that already do this?” Yes, there are (well, maybe not a million), but it is our end goal that separates our project from other archives.
So far we have only a handful of these data on display, but they serve as a starting point for a larger project. We hope one day to transform these data so that they can be linked together (a.k.a. data federation). It is a very ambitious project, but I believe it is feasible and could be a game changer for researchers. In my opinion, two primary things have to take place for this project to be successful:
- We need to identify common denominators for our collection of data. To tie these data sets together, we need a common denominator that we can use in a programming algorithm. For this purpose, I feel that GIS coordinates make the most sense, because almost all data are tied to a location. We’ve also thought about tying data temporally and by discipline, and are hard at work gathering these metadata.
Of course, tying data to any common denominator will not be a simple task, but this is what separates our project from most other archives and keeps us motivated to push ahead with the project. That said, we are already in the process of coding our data with GIS attributes (along with adding temporal and thematic metadata). I hope to cover these side elements in a separate post.
- We have to start small. We should approach the programming task using a select group of datasets that are similar. Doing so helps:
  - The project to focus on programming.
If we use a set of data that is too diverse at the outset, we might not be able to find a sensible common denominator. For example, if we tried to combine silver data from the 18th century with poverty data from the 20th century, we might face a common denominator issue: even though both data sets are tied to places, the names of those places differ between the two centuries.
On the other hand, if we focus on a handful of similar data sets, we should be able to give priority to programming. Admittedly, this time-place issue will have to be addressed one day, but to really kick-start the project we have to start with a small and similar set of data.
  - Promote our project to other contributors.
Researchers may be more inclined to donate their datasets when they see a niche project that fits their work. (One of the biggest issues with data collection is gaining contributions from others; I may write about the issue of incentives in a separate post.)
We have already begun to build such a niche collection. Our commodity-in-focus side project has begun collecting opium data sets. Next year, we hope to focus on silver.
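To make the place-name problem concrete, here is a minimal Python sketch of how coordinates could serve as the common denominator. Everything in it is an assumption for illustration: the record layout, the `link_by_location` helper, the 25 km tolerance, and the sample values are invented and are not taken from our archive or our actual code. It pairs records from two data sets by great-circle distance rather than by name, so a place listed as Canton in one set can still be matched to Guangzhou in another.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def link_by_location(left, right, max_km=25.0):
    """Pair each record in `left` with its nearest record in `right`.

    Records farther apart than `max_km` (an arbitrary tolerance chosen
    for this sketch) are left unlinked.
    """
    links = []
    for a in left:
        best, best_km = None, max_km
        for b in right:
            d = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
            if d <= best_km:
                best, best_km = b, d
        if best is not None:
            links.append((a["place"], best["place"], round(best_km, 1)))
    return links


# Hypothetical records: coordinates are approximate and the "value"
# fields are placeholders, not real silver or poverty figures.
silver_1700s = [
    {"place": "Canton", "lat": 23.13, "lon": 113.26, "value": 1.0},
    {"place": "Bombay", "lat": 18.96, "lon": 72.82, "value": 2.0},
]
poverty_1900s = [
    {"place": "Guangzhou", "lat": 23.13, "lon": 113.26, "value": 3.0},
    {"place": "Mumbai", "lat": 19.08, "lon": 72.88, "value": 4.0},
    {"place": "London", "lat": 51.51, "lon": -0.13, "value": 5.0},
]

links = link_by_location(silver_1700s, poverty_1900s)
for old_name, new_name, km in links:
    print(f"{old_name} <-> {new_name} ({km} km apart)")
```

A nearest-neighbour match within a tolerance is only one possible approach; a real version would also have to handle regions rather than points, and the temporal and thematic metadata mentioned above.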
There has been a lot of trial and error in the work that we’ve done so far. And perhaps we’re wrong to use GIS as a common denominator. But this won’t deter us from working on this ambitious project. We will adapt to new challenges and continue to press on with the creation of a World-Historical Dataverse.