
Thursday, April 15, 2010

Our Data Archive – Current and Future Work

“So what is it that you do there?” is a typical question I get asked by my friends. To answer this question would require a five-page essay, so I usually tell them the abbreviated version: that I work with data archives.

Archiving data is straightforward. These data are gathered from contributors and placed online for public consumption. But you may ask, “Why? Aren’t there a million places that already do this?” Yes, there are (well, maybe not a million), but it is our end goal that separates our project from other archives.

So far we only have a handful of these data on display, but they serve as a starting point for a larger project. We hope one day to transform these data so that they can be linked together (a.k.a. data federation). It is a very ambitious project, but I believe it is feasible and could be a game changer for researchers. In my opinion, two things have to happen for this project to succeed:

  1. We need to identify common denominators for our collection of data. To tie these data sets together, we need a shared attribute that a program can use to match records across them. For this purpose, I feel that GIS coordinates make the most sense, because almost all data are tied to a location. We’ve also thought about tying data together temporally and by discipline, and we are hard at work gathering these metadata.

    Of course, tying data to any common denominator will not be a simple task, but this is what separates our project from most other archives and keeps us motivated to push ahead. That said, we are already coding our data with GIS attributes (along with adding temporal and thematic metadata). I hope to cover these side elements in a separate post.

  2. We have to start small. We should approach the programming task using a select group of datasets that are similar. Doing so helps:

    • Focus the project on programming.
      If we use a set of data that is too diverse at the outset, we might not be able to find a sensible common denominator. For example, if we tried to combine silver data from the 18th century with poverty data from the 20th century, we might face a common denominator issue: even though both data sets are tied to places, the names of those places differ between the two centuries.

      On the other hand, if we focus on a handful of similar data sets, we should be able to give priority to programming. Admittedly, this time-place issue will have to be addressed one day, but to really kick-start the project we have to begin with a small set of similar data.

    • Promote our project to other contributors.
      Researchers may be more inclined to donate their datasets when they see a niche project that fits their work. (One of the biggest issues with data collection is gaining contributions from others. I may write about the issue of incentives in a separate post.)

      We have already begun to build such a niche collection. Our “commodity in focus” side project has begun collecting opium data sets. Next year, we hope to focus on silver.
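
To make the common denominator idea a little more concrete, here is a minimal Python sketch of linking two data sets by location. It is only an illustration under stated assumptions, not our actual design: the `GAZETTEER` lookup table, the `Record` type, and the `geocode` and `federate` functions are all hypothetical names, the coordinates are approximate, and the values in the sample records are made-up placeholders. It also shows the place-name problem from point 2: a historical name and a modern name only link up if both resolve to the same coordinates.

```python
from dataclasses import dataclass

# Hypothetical gazetteer: maps a place name (historical or modern) to one
# approximate (latitude, longitude) pair. Entries are illustrative only.
GAZETTEER = {
    "Canton": (23.13, 113.26),
    "Guangzhou": (23.13, 113.26),
    "Batavia": (-6.21, 106.85),
    "Jakarta": (-6.21, 106.85),
}


@dataclass(frozen=True)
class Record:
    place: str
    year: int
    value: float  # placeholder number, not a real observation


def geocode(records):
    """Group records by coordinates; collect records the gazetteer cannot place."""
    by_coords, unmatched = {}, []
    for record in records:
        coords = GAZETTEER.get(record.place)
        if coords is None:
            unmatched.append(record)
        else:
            by_coords.setdefault(coords, []).append(record)
    return by_coords, unmatched


def federate(left, right):
    """Pair up the records from two data sets that share a location."""
    left_by_coords, left_unmatched = geocode(left)
    right_by_coords, right_unmatched = geocode(right)
    shared = left_by_coords.keys() & right_by_coords.keys()
    linked = {c: (left_by_coords[c], right_by_coords[c]) for c in sorted(shared)}
    return linked, left_unmatched + right_unmatched


# Two tiny made-up data sets that use different names for the same places.
silver_18c = [
    Record("Canton", 1750, 1.0),
    Record("Batavia", 1760, 2.0),
    Record("Malacca", 1770, 3.0),  # not in the gazetteer, so it stays unlinked
]
poverty_20c = [
    Record("Guangzhou", 1990, 10.0),
    Record("Jakarta", 1995, 20.0),
]

linked, unmatched = federate(silver_18c, poverty_20c)
```

In this sketch the records for Canton and Guangzhou end up under the same coordinates, while the Malacca record is returned as unmatched because the gazetteer has no entry for it. A real version would need a far richer gazetteer and a tolerance for nearby coordinates, which is part of why we want to start with a small, similar set of data.
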

This blog entry started off as a way for me to explain what the archive section of the World-Historical Dataverse project is about. What I ended up doing, though not on purpose, was explaining the linkages between the three main components of our project: the data archive, the commodity in focus, and the GIS project. I don’t think I can explain one without the others. The one element that is beyond my technical expertise is the programming part of the project. That is why we hope to bring experts on board (to consult on project details) and are searching for a programmer (to begin the work of data federation).

There has been a lot of trial and error in the work that we’ve done so far. And perhaps we’re wrong to use GIS as a common denominator. But this won’t deter us from working on this ambitious project. We will adapt to new challenges and continue to press on with the creation of a World-Historical Dataverse.
