Archive for the ‘OAI-PMH’ Category

XQuery and OAI

September 25, 2007

It has been a while since I wrote this post about creating an XQuery OAI data provider. I haven’t done much work on the code since early spring, but Mike Giarlo, who was also working on it, has taken it much further and has published a version of the code here. It validates, and I believe the only missing piece is resumption tokens, which may not be too hard to add. Because Princeton uses X-Hive as their database, the code is X-Hive specific and may not work in eXist without some modifications. I haven’t tried it with our collections yet, but I hope to in the next week or two.

OAI data provider

February 7, 2007

I’ve pretty much finished writing my XQuery OAI data provider. The process has taken longer than I expected, particularly since the original XQuery I was using was mostly complete. However, I ended up rewriting most of it, partly to ensure that I had a thorough understanding of the code, and partly to add some features. For example, I wanted to be able to provide unqualified Dublin Core records for all the metadata types we hold in the repository, with the flexibility to easily add more types. Currently the query supports Qualified Dublin Core, MODS, and EAD records, and adding further types should be trivial.
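One way to keep new metadata types cheap to add is a small registry that maps each source format to a function producing unqualified Dublin Core. The following is a minimal sketch of that idea in Python rather than the XQuery of the actual provider. The names (`CROSSWALKS`, `crosswalk`, `to_oai_dc`) and the flat dictionary record shape are assumptions made for illustration, not the provider's real structure.

```python
from typing import Callable, Dict

# Hypothetical record shape: a flat dict of fields already pulled from the XML.
Record = Dict[str, str]
Crosswalk = Callable[[Record], Dict[str, str]]

# Registry of source format -> function that returns unqualified DC fields.
CROSSWALKS: Dict[str, Crosswalk] = {}


def crosswalk(source_format: str):
    """Decorator that registers a crosswalk for one source format."""
    def register(func: Crosswalk) -> Crosswalk:
        CROSSWALKS[source_format] = func
        return func
    return register


@crosswalk("mods")
def mods_to_dc(record: Record) -> Dict[str, str]:
    # The keys are illustrative paths, not a complete MODS mapping.
    return {
        "title": record.get("titleInfo/title", ""),
        "creator": record.get("name/namePart", ""),
    }


@crosswalk("ead")
def ead_to_dc(record: Record) -> Dict[str, str]:
    # The keys are illustrative element names, not a complete EAD mapping.
    return {
        "title": record.get("unittitle", ""),
        "creator": record.get("origination", ""),
    }


def to_oai_dc(source_format: str, record: Record) -> Dict[str, str]:
    """Look up the registered crosswalk and apply it to one record."""
    try:
        convert = CROSSWALKS[source_format]
    except KeyError:
        raise ValueError(f"no crosswalk registered for {source_format!r}")
    return convert(record)
```

With this shape, supporting another metadata type means writing one more decorated function; nothing else in the dispatch code changes.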

Implementing the data provider is also bringing up some organizational questions. For example, how do I want to support deleted records? How about sets? For our collections I think defining sets as collections of records (rather than an item and its component parts, which is what the METS records do) makes the most sense. For deleted records I’m setting the RECORDSTATUS attribute in the METS header (mets:metsHdr) to “deleted” and deleting the actual content, metadata, and full text. I haven’t decided how to implement deleted records for the EADs yet, though I think it is unlikely they will be deleted. I will probably use the revisiondesc tag, with the value of the item tag set to “deleted.”
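In OAI-PMH a deleted record is returned as a header carrying status="deleted" and no metadata, so the RECORDSTATUS flag has to be translated into that header. Here is a minimal sketch of the translation in Python rather than XQuery. The function name `oai_header_from_mets` is hypothetical, and taking the datestamp from the LASTMODDATE attribute is an assumption; the real provider may derive it differently.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def oai_header_from_mets(mets_xml: str, identifier: str) -> ET.Element:
    """Build an OAI-PMH <header> element from the header of a METS record."""
    root = ET.fromstring(mets_xml)
    hdr = root.find(f"{{{METS_NS}}}metsHdr")
    status = hdr.get("RECORDSTATUS", "") if hdr is not None else ""
    # Assumption: LASTMODDATE already holds a UTC datestamp in an OAI-friendly form.
    datestamp = hdr.get("LASTMODDATE", "") if hdr is not None else ""

    header = ET.Element("header")
    if status.lower() == "deleted":
        # Harvesters use this attribute to recognize a deleted record.
        header.set("status", "deleted")
    ET.SubElement(header, "identifier").text = identifier
    ET.SubElement(header, "datestamp").text = datestamp
    return header
```

A record flagged this way would be served with the header alone, which matches removing the content, metadata, and full text from the repository.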

I’m also getting a little hung up on resumptionTokens. I have simple paging in place, but I have started to wonder if there might be a better method. The guidelines are a little vague on this.
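The protocol leaves the token format up to the repository, so one common approach is to pack the paging state into an opaque string and hand it back to the harvester. The sketch below shows that approach in Python; it is not the paging used in the XQuery provider, and `encode_token`, `decode_token`, `list_page`, and the page size of 100 are all assumptions made for illustration.

```python
import base64
import json
from typing import List, Optional, Tuple

PAGE_SIZE = 100  # assumed page size; the protocol does not prescribe one


def encode_token(cursor: int, metadata_prefix: str, set_spec: Optional[str] = None) -> str:
    """Pack the paging state into an opaque, URL-safe string."""
    state = {"cursor": cursor, "prefix": metadata_prefix, "set": set_spec}
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token: str) -> dict:
    """Recover the paging state from a token produced by encode_token."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def list_page(
    identifiers: List[str],
    metadata_prefix: str,
    token: Optional[str] = None,
    page_size: int = PAGE_SIZE,
) -> Tuple[List[str], Optional[str]]:
    """Return one page of identifiers plus the token for the next page, if any."""
    cursor = decode_token(token)["cursor"] if token else 0
    page = identifiers[cursor:cursor + page_size]
    next_cursor = cursor + len(page)
    if next_cursor < len(identifiers):
        return page, encode_token(next_cursor, metadata_prefix)
    # None here stands for the empty resumptionToken that ends a list.
    return page, None
```

Because the token carries the original request arguments along with the cursor, the server holds no session state between requests. The trade-off is that records added or deleted during a harvest can shift the offsets, which is one reason to consider a datestamp-based cursor instead.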

Here are a few of the most helpful resources that I used while putting together the data provider.

  1. The Open Archives Initiative Protocol for Metadata Harvesting – I found this resource to be the most helpful for the actual implementation; there are lots of examples.
  2. OAI Best Practices [NSDL]
  3. Open Archives Forum Online Tutorial
  4. Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol: A Tutorial
  5. Proai 1.0 – “Proai is a repository-neutral, Java web application supporting the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) version 2.0.” This may be a reasonable alternative to the XQuery that I’m working on, though I’d like to finish it anyway.

There have also been some recent discussions in my department about being an aggregator for Vermont-based digital collections, as there are several institutions that have expressed interest in collaborative projects, or that have content that would mesh really well with the Vermont-centric nature of our current content. So there may be more fun with OAI in my future.