METS, MODS, MIX, PREMIS, MARCXML, VRA Core, TEI, EAD, DDI… My life is alphabet soup.
There are a lot of options for metadata. For the last project I worked on, we archived social science survey data and used the DDI (Data Documentation Initiative) specification heavily. I got used to the convenience of having a metadata standard tailored to the content type. The DDI is a pretty complex specification, but it allowed us to build and use some cool tools for end users, and to preserve the numerical data and export it to a large array of statistical packages.
The descriptive metadata I’m looking at now is a different story. The choices are less obvious and range from the complex MARCXML standard to the very simple Dublin Core. For my current project (links coming soon, I hope) I have many document formats to support and several types of metadata: EAD finding aids, some form of TEI encoded text for full text or transcriptions, METS for structural metadata, and a lot of descriptive metadata.
The structural metadata was a pretty easy decision: I think METS is widely accepted and used in the library community. Greenstone 3 will use METS, both Fedora and XTF use METS or a form of METS for structural metadata, and even ContentDM will soon be able to export METS (though not import it). Descriptive metadata is a stickier issue. I limited my options to Dublin Core and MODS. Dublin Core is accepted by all of the systems I investigated and is widely used. However, the simplicity of Dublin Core has some serious drawbacks when working with complex data. In addition, it is easy to go from something more complex (MODS/MARCXML) to a simple Dublin Core record (for OAI harvesting, for example), but a little less easy to go the other way. On the other hand, the simplicity of Dublin Core makes original cataloging easy. There is also the added benefit that our cataloger is familiar with and supportive of Dublin Core.
So I chose Dublin Core. The kind of descriptive metadata we were able to implement suggested that this would be perfectly adequate, and it is. But… I want to change my mind.
We immediately ran into limitations with the subject headings. We are using LCSH, which I’m ambivalent about, because browsing by this:
“Women’s rights — Law and legislation — United States — History”
is not intuitive. Also the number of items using that exact subject heading will most likely be pretty small, at least in our collections. I could be missing the point of subject headings here, but I’m looking at this from a usability perspective, and they are not so useful as is.
I don’t really want to parse out each piece; I would rather have a more structured browse interface. I’m a fan of this browse interface, but I would like to generate it dynamically. MODS has a nice subject heading structure where you can enter each part of the subject heading into its own element and specify that it is a topic, a geographic location, a name, or one of several other options.
<subject>
  <topic>Women’s rights</topic>
  <topic>Law and legislation</topic>
  <geographic>United States</geographic>
</subject>
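As a rough illustration of how a browse interface could be generated dynamically from this structure, here is a minimal Python sketch. It assumes each record is available as a MODS XML string in the standard MODS namespace; the `build_browse_index` function, the `item-1` identifier, and the wrapping `<mods>` root are hypothetical and only there to make the example self-contained.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical record: the subject shown above, wrapped in a namespaced
# <mods> root so that it can be parsed on its own.
RECORD = """<mods xmlns="http://www.loc.gov/mods/v3">
  <subject>
    <topic>Women's rights</topic>
    <topic>Law and legislation</topic>
    <geographic>United States</geographic>
  </subject>
</mods>"""


def build_browse_index(records):
    """Map each part type (topic, geographic, ...) to {term: [record ids]}."""
    index = defaultdict(lambda: defaultdict(list))
    for record_id, xml_text in records.items():
        root = ET.fromstring(xml_text)
        for subject in root.findall("mods:subject", MODS_NS):
            for part in subject:
                # Tags look like "{namespace}topic"; keep only the local name.
                part_type = part.tag.split("}", 1)[-1]
                term = (part.text or "").strip()
                if term and record_id not in index[part_type][term]:
                    index[part_type][term].append(record_id)
    return index


index = build_browse_index({"item-1": RECORD})
for part_type, terms in index.items():
    for term, record_ids in terms.items():
        print(part_type, "|", term, "|", record_ids)
```

Because every part of the heading is already typed, the index can offer separate "browse by topic" and "browse by place" lists without splitting the full LCSH string.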
This kind of structure can be partially implemented in Dublin Core, using dcterms:temporal and dc:coverage, but it takes some extra work and customization to get subject headings for names. Also, unlike MODS, Dublin Core has no way to group a set of dc:coverage elements with specific dc:subject elements.
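To make the grouping problem concrete, here is a small Python sketch that flattens MODS subjects into simple Dublin Core. The mapping table is an illustrative assumption (topics to dc:subject, places and time periods to dc:coverage), and the second subject in the sample record, the `<record>` wrapper, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}
DC = "http://purl.org/dc/elements/1.1/"

# Illustrative mapping only: which simple DC element each MODS part lands in.
PART_TO_DC = {"topic": "subject", "geographic": "coverage", "temporal": "coverage"}

# Hypothetical record with two separate subject headings.
RECORD = """<mods xmlns="http://www.loc.gov/mods/v3">
  <subject>
    <topic>Women's rights</topic>
    <topic>Law and legislation</topic>
    <geographic>United States</geographic>
  </subject>
  <subject>
    <topic>Suffrage</topic>
    <geographic>Great Britain</geographic>
  </subject>
</mods>"""


def subjects_to_dc(mods_xml):
    """Flatten MODS subjects into a flat list of (dc element, value) pairs."""
    root = ET.fromstring(mods_xml)
    pairs = []
    for subject in root.findall("mods:subject", MODS_NS):
        for part in subject:
            local = part.tag.split("}", 1)[-1]
            if local in PART_TO_DC and part.text:
                pairs.append((PART_TO_DC[local], part.text.strip()))
    return pairs


def to_dc_xml(pairs):
    """Serialize the pairs as dc:* elements under a placeholder <record> root."""
    ET.register_namespace("dc", DC)
    root = ET.Element("record")
    for name, value in pairs:
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


pairs = subjects_to_dc(RECORD)
dc_xml = to_dc_xml(pairs)
print(dc_xml)
```

In the flattened output, nothing records that "United States" belonged with "Women's rights" rather than with "Suffrage". That is the information lost when going from MODS to simple Dublin Core, and it is why the reverse direction is harder.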
I really appreciate the use of attributes and nodes in MODS. In the case of subject headings I would be able to specify the vocabulary we are using and then use multiple controlled vocabularies if needed. There are a lot of other features in MODS that I think would be useful, and could probably all happen behind the scenes so that the data entry form does not get overly complex.
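In MODS the vocabulary is named with the authority attribute on the subject element, for example authority="lcsh". The sketch below shows one way the headings might be separated by vocabulary behind the scenes. The sample record, the "local" authority value, and the `subjects_by_authority` helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical record mixing an LCSH heading with a locally defined term.
RECORD = """<mods xmlns="http://www.loc.gov/mods/v3">
  <subject authority="lcsh">
    <topic>Women's rights</topic>
    <geographic>United States</geographic>
  </subject>
  <subject authority="local">
    <topic>Campus history</topic>
  </subject>
</mods>"""


def subjects_by_authority(mods_xml):
    """Group subject headings by the vocabulary named in @authority."""
    root = ET.fromstring(mods_xml)
    grouped = {}
    for subject in root.findall("mods:subject", MODS_NS):
        # Headings without an authority attribute go into a catch-all bucket.
        authority = subject.get("authority", "uncontrolled")
        parts = [
            (part.tag.split("}", 1)[-1], (part.text or "").strip())
            for part in subject
        ]
        grouped.setdefault(authority, []).append(parts)
    return grouped


grouped = subjects_by_authority(RECORD)
for authority, headings in grouped.items():
    print(authority, headings)
```

A data entry form could set the attribute from a single drop-down, so the cataloger never has to see the XML.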
Is the additional functionality worth the additional work? I’m not sure, but if it is, I need to make the change sooner rather than later.