Wednesday, August 31, 2011

Wednesday, August 10, 2011

Merged Data Set Documents

We've updated our Merged Data Sets page to include key documents and presentations on our current work related to merging data. Additionally, we've uploaded all other merged data set presentations via Google Docs.

Saturday, December 11, 2010

The New and Improved World-Historical Dataverse Archive

The World-Historical Dataverse has relocated its data archives to Harvard's IQSS (Institute for Quantitative Social Science) Dataverse website.

The design of the site is still consistent with the rest of Pitt's Dataverse site, but now we have the added benefit of Harvard's server space and archival organization.

The new link can be found here. You can also access the archive through our main site, under "The Archive" tab.

Additionally, we've revamped the blog so that it now uses the same format as the World-Historical Dataverse website.

Friday, April 30, 2010

Of Ipods and Microforms

There are times as a Research Assistant that I’m very thankful for my iPod. I’ve been spending a lot of time in the Hillman Library lately, researching the imports and exports of opium from the British Parliamentary Papers. I’m not going to lie. It can get a little tedious, and I’m glad I have the music and podcasts coming from my headphones to keep me company. Sometimes I wonder what research assistants did before they had this helpful piece of technology to get them through times like this.

I am getting quite a view of some of the technology they had to deal with, though. There’s this machine way in the back recesses of the microfiche room called a Readex Microprint reader. I’ve been using it to view the vast amounts of material provided by the British Parliamentary Papers. The process is painstaking. After consulting an index and locating the right microprint card out of tens of thousands, I scroll through the statistical tables of imports/exports for a particular colony for a particular year until I locate opium. Then, after squinting for quite some time at the screen, I make out the quantity and value of opium and copy it into an Excel document. I then double-check the numbers by adding up the figures for each country being imported from or exported to and making sure the sum matches the total listed in the original document. You can view the initial results here. (Be sure there is more to come. A lot more.)
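For anyone doing the same kind of transcription, the double-check step can be automated once the figures are typed in. The following is only a minimal sketch, not the spreadsheet I actually use: the function name `totals_match`, the dictionary layout, and all of the figures are made up for illustration.

```python
from decimal import Decimal


def totals_match(figures_by_country, listed_total, tolerance=Decimal("0")):
    """Return True if the per-country figures add up to the printed total.

    Values are converted through str() to Decimal so that fractional
    quantities do not pick up floating-point rounding errors.
    """
    computed = sum(
        (Decimal(str(value)) for value in figures_by_country.values()),
        Decimal("0"),
    )
    return abs(computed - Decimal(str(listed_total))) <= tolerance


# Hypothetical figures, invented for this example (not transcribed data).
imports_by_country = {"Country A": 120, "Country B": 45, "Country C": 10}

print(totals_match(imports_by_country, 175))  # True: 120 + 45 + 10 = 175
print(totals_match(imports_by_country, 170))  # False: a digit was misread somewhere
```

A mismatch does not say which figure is wrong, only that the card needs another look. The `tolerance` argument is there in case the printed totals were rounded.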

I’ve been in the library enough in the past month that the staff members in the Microform room know me by sight. They smile and say hi. Eventually one of them came up to ask me how many cards I had used so far. That’s when I realized why I was so recognizable. I was the only person who ever used the Microprint Reader. In fact, they needed to know how many cards I had used so they could tell their administrators that people still used it.

The Readex Microprint Reader is somewhat old and cumbersome. In fact, I came across an article in The Journal of Documentation where D.T. Richnell noted that “In academic libraries, in particular, it has been felt that the advantage of possessing in compact form such material as the Microprint edition of the Three Centuries of Drama has been outweighed by the fact that members of the academic staff were unwilling to subject themselves to the strain of prolonged reading on these reading machines.” That article was written in 1957.

Sometimes I think about what this kind of research will be like years from now. Will humble research assistants still rely on their iPods to get them through the more monotonous parts of the day? Or will there be some other form of technology they’ll be thanking? Our hope with the Dataverse is that, with the help of many contributors, there will be a massive depository of data freely available in an easy-to-use format, allowing scholars and researchers to get on with the important work of testing hypotheses about large-scale historical patterns.

I also wonder how long that Readex Microprint Reader will be in the Hillman Library. And what will happen to all the information that can be read from it if we don’t get it online fast enough…

Thursday, April 22, 2010

World Bank's Open Data Initiative

The World Bank recently opened up public access to tons of previously private data.

Thursday, April 15, 2010

Our Data Archive – Current and Future Work

“So what is it that you do there?” is a typical question I get asked by my friends. To answer this question would require a five-page essay, so I usually tell them the abbreviated version: that I work with data archives.

Archiving data is straightforward. These data are gathered from contributors and placed online for public consumption. But you may ask, “Why? Aren’t there a million places that already do this?” Yes, there are (well, maybe not a million), but it is our end goal that separates our project from other archives.

So far we only have a handful of these data on display, but they serve as a starting point for a larger project. We hope to one day transform these data so that they can be linked together (a.k.a. data federation). It is a very ambitious project, but I believe it is feasible, and could be a game changer for researchers. In my opinion, two primary things have to take place in order for this project to be successful:

  1. We need to identify common denominators for our collection of data. In order to tie these data sets together, we need a common denominator that we can use in a programming algorithm. For this purpose, I feel that GIS coordinates make the most sense, because almost all data are tied to a location. We’ve also thought about tying data temporally and by discipline, and are hard at work gathering these metadata.

    Of course tying data to any common denominator will not be a simple task, but this is what separates our project from most other archives and keeps us motivated to push ahead with the project. That said, we are already in the process of coding our data with GIS attributes (along with adding temporal and thematic metadata). I hope to cover these side elements in a separate post.

  2. We have to start small. We should approach the programming task using a select group of datasets that are similar. Doing so helps:

    • The project to focus on programming.
      If we use a set of data that is too diverse at the outset, we might not be able to obtain a sensible common denominator. For example, if we looked to combine silver data in the 18th century with poverty data in the 20th century, we might face a common denominator issue: even though these data are tied to a place, the names of these places are different in the two centuries.

      On the other hand, if we focus on a handful of similar data sets, we should be able to give priority to programming. Admittedly, this time-place issue would have to be addressed one day, but to really kick-start the project we have to start with a small and similar set of data.

    • Promote our project to other contributors.
      Researchers may be more inclined to donate their datasets when they see a niche project that fits their work. (One of the biggest issues with data collection is gaining contributions from others; I may write about the issue of incentives in a separate post.)

      We have already begun to address the issue of this niche dataset. Our “commodity in focus” side project has begun collecting opium data sets. Next year, we hope to focus on silver.
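To make the common-denominator idea a little more concrete, here is a rough sketch of how two data sets might be linked by GIS coordinates when their place names do not agree. It is only an illustration under stated assumptions, not the project's federation code (which has not been written yet): the record layout, the function names `haversine_km` and `link_by_location`, and the 25 km threshold are all hypothetical, and the coordinates are approximate.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def link_by_location(left, right, max_km=25.0):
    """Pair each record in `left` with the nearest record in `right`.

    Records farther apart than `max_km` are left unlinked. The threshold
    is an arbitrary assumption chosen for this example.
    """
    links = []
    for a in left:
        best, best_km = None, max_km
        for b in right:
            distance = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
            if distance <= best_km:
                best, best_km = b, distance
        if best is not None:
            links.append((a, best))
    return links


# Illustrative records only; coordinates are approximate.
older_records = [
    {"place": "Bombay", "lat": 18.97, "lon": 72.83},
    {"place": "Calcutta", "lat": 22.57, "lon": 88.36},
]
newer_records = [
    {"place": "Mumbai", "lat": 19.08, "lon": 72.88},
    {"place": "Kolkata", "lat": 22.57, "lon": 88.36},
    {"place": "Delhi", "lat": 28.61, "lon": 77.21},
]

pairs = link_by_location(older_records, newer_records)
print([(a["place"], b["place"]) for a, b in pairs])
# [('Bombay', 'Mumbai'), ('Calcutta', 'Kolkata')]
```

Matching on coordinates sidesteps the renaming problem in this small case, but it does nothing for places whose boundaries moved or that were recorded only as regions, which is part of why a small, similar set of data is the right place to start.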

This blog entry started off as a way for me to explain what the archive section of the World-Historical Dataverse project is about. What I ended up doing – and this was not on purpose – was basically explaining the linkages between the three main components of our project: the data archive, the commodity in focus, and the GIS project. I don’t think I can explain one without the others. The one element that is beyond my technical expertise is the programming part of the project. That is why we hope to bring experts on board (to consult on project details) and are searching for a programmer (to begin the work of data federation).

There has been a lot of trial and error in the work that we’ve done so far. And perhaps we’re wrong to use GIS as a common denominator. But this won’t deter us from working on this ambitious project. We will adapt to new challenges and continue to press on with the creation of a World-Historical Dataverse.