OAI data provider

I’ve pretty much finished writing my XQuery OAI data provider. The process has taken longer than I expected, particularly since the original XQuery I was using was mostly complete. However, I ended up rewriting most of it, partly to ensure that I had a thorough understanding of the code, and partly to add some features. For example, I wanted to be able to provide unqualified Dublin Core records for all the metadata types we hold in the repository, with the flexibility to easily add more types. Currently the query supports Qualified Dublin Core, MODS, and EAD records, and adding further types should be trivial.
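One way to keep new metadata types easy to add is a small registry that maps each source type to its own crosswalk function. The sketch below is in Python rather than XQuery and is only an illustration, not the actual query: the names `CROSSWALKS` and `to_simple_dc` are hypothetical, only the title is mapped, and the EAD path assumes DTD-style EAD 2002 with no namespace.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"
DC = "{http://purl.org/dc/elements/1.1/}"


def _text(element):
    """Return the element's text, or None if the element was not found."""
    return element.text if element is not None else None


def mods_title(root):
    # MODS: mods/titleInfo/title
    return _text(root.find(f"{MODS}titleInfo/{MODS}title"))


def ead_title(root):
    # EAD 2002 (assumed un-namespaced): eadheader/filedesc/titlestmt/titleproper
    return _text(root.find("eadheader/filedesc/titlestmt/titleproper"))


def qdc_title(root):
    # Qualified DC: a dc:title child of whatever wrapper element is used
    return _text(root.find(f"{DC}title"))


# Hypothetical registry: supporting a new type means adding one entry here.
CROSSWALKS = {"mods": mods_title, "ead": ead_title, "qdc": qdc_title}


def to_simple_dc(source_type, xml_text):
    """Map a source record to a (title-only) unqualified DC dictionary."""
    extract_title = CROSSWALKS[source_type]
    root = ET.fromstring(xml_text)
    return {"title": extract_title(root)}
```

A real crosswalk would cover the other Dublin Core elements as well; the point here is only that each type's mapping stays isolated behind one lookup.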

Implementing the data provider is also bringing up some organizational questions. For example, how do I want to support deleted records? How about sets? For our collections, I think defining sets as collections of records (rather than an item and its component parts, which is what the METS records do) makes the most sense. For deleted records, I’m setting the RECORDSTATUS attribute in the mets:metsHdr to “deleted” and deleting the actual content, metadata, and full text. I haven’t decided how to implement deleted records for the EADs yet; I think it is unlikely they will be deleted. I will probably use the revisiondesc tag, with the value of the item tag as “deleted.”
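To make the deleted-record handling concrete, here is a minimal sketch in Python (not the XQuery itself). It assumes the METS record carries RECORDSTATUS on its metsHdr element; the function names `is_deleted` and `oai_header` are hypothetical. In OAI-PMH a deleted record is reported as a header with status="deleted" and no metadata part.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def is_deleted(mets_xml):
    """True if the METS header is flagged RECORDSTATUS="deleted"."""
    root = ET.fromstring(mets_xml)
    hdr = root.find("{%s}metsHdr" % METS_NS)
    if hdr is None:
        return False
    return hdr.get("RECORDSTATUS", "").lower() == "deleted"


def oai_header(identifier, datestamp, deleted):
    """Build an OAI-PMH <header>, marking deleted records with status."""
    header = ET.Element("{%s}header" % OAI_NS)
    if deleted:
        # Harvesters use this attribute to drop the record on their side.
        header.set("status", "deleted")
    ET.SubElement(header, "{%s}identifier" % OAI_NS).text = identifier
    ET.SubElement(header, "{%s}datestamp" % OAI_NS).text = datestamp
    return header
```

The datestamp of a deleted record should be the date of the deletion, so that incremental harvests pick up the change.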

I’m also getting a little hung up on ResumptionTokens. I have simple paging in place, but have started to wonder if there might be a better method; the guidelines are a little vague on this.
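For comparison, one common alternative to server-side paging state is a stateless token that carries the request state itself. The Python sketch below is an assumption-laden illustration, not what the XQuery does: `PAGE_SIZE`, `encode_token`, `decode_token`, and `list_page` are hypothetical names, and the token is simply base64-encoded JSON holding a cursor and the metadataPrefix.

```python
import base64
import json

PAGE_SIZE = 100  # hypothetical; the protocol does not mandate a page size


def encode_token(state):
    """Pack the paging state into an opaque, URL-safe string."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token):
    """Recover the paging state from a token made by encode_token."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    return json.loads(raw.decode("utf-8"))


def list_page(identifiers, metadata_prefix=None, token=None, page_size=PAGE_SIZE):
    """Return (page, next_token); next_token is None on the last page."""
    if token is not None:
        state = decode_token(token)
    else:
        state = {"cursor": 0, "metadataPrefix": metadata_prefix}
    start = state["cursor"]
    page = identifiers[start:start + page_size]
    end = start + len(page)
    if end >= len(identifiers):
        return page, None  # list complete: no further token needed
    return page, encode_token(dict(state, cursor=end))
```

Because the token holds everything needed to resume, the provider keeps no per-harvester state. The trade-off is that records added or deleted between requests can shift the cursor, so a real implementation would likely also store the from/until bounds and an expiration date in the token.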

Here are a few of the most helpful resources that I used while putting together the data provider.

  1. The Open Archives Initiative Protocol for Metadata Harvesting – I found this resource to be the most helpful in actual implementation; there are lots of examples.
  2. OAI Best Practices [NSDL]
  3. Open Archives Forum Online Tutorial
  4. Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol: A Tutorial
  5. Proai 1.0 – “Proai is a repository-neutral, Java web application supporting the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) version 2.0.” This may be a reasonable alternative to the XQuery that I’m working on, though I’d like to finish it anyway.

There have also been some recent discussions in my department about being an aggregator for Vermont-based digital collections, as there are several institutions that have expressed interest in collaborative projects, or that have content that would mesh really well with the Vermont-centric nature of our current content. So there may be more fun with OAI in my future.


7 Responses to “OAI data provider”

  1. mjgiarlo Says:


    You can see why I punted on resumption tokens, eh? :)

  2. wsalesky Says:

    Indeed! ResumptionTokens and Sets are requiring way more organizational thought than I had originally anticipated. Anyway, look for a copy of the query (as it currently stands) in your mailbox sometime this afternoon (still doing a little clean up and commenting work).

  3. Clay Redding Says:

    Kudos for properly implementing deleted records! For sets, I’d have standalone XMLs somewhere in the db that could somehow help grab the content you need. The Perl-based VT OAI provider had a neat static set.xml file that influenced my way of thinking on how to ever deliver this aspect with nxdb/xq.

  4. wsalesky Says:

    Thanks Clay.
    Currently the script does support sets, but doesn’t support nested hierarchy. Each item refers to its parent collection, and each sub-collection refers to its parent collection; however, if I have nested sub-collections I don’t really track that in the record (currently). I guess I could; I’m still thinking about what would be the best way. Maybe a separate XML document would be the easiest way, and then I could update it as new collections were added. It is something to think about.

  5. Clay Redding Says:

    Oh, I forgot the obvious question: are you gonna GPL or distribute this somehow, whether with an open source license or not? I wanna use it!

  6. wsalesky Says:

    I am planning on distributing it eventually. I think the code still needs some work and I don’t really have the time to devote to it right now, but after I get a chance to clean it up and test it a little more thoroughly I’ll make it available. If you’d like a copy of it as is, just let me know; I’ll be happy to send it to you.

  7. XQuery and OAI « the DIL Says:

    […] and OAI It has been a while since I wrote this post about creating an XQuery OAI-data provider. I haven’t done much work on the code since early […]
