Troubleshooting eXist

January 28, 2008

For the past few months I’ve been experiencing some mysterious (to me at any rate) problems with eXist crashing. Generally speaking, the problems seemed to be due to corrupted index files and/or issues with temporary fragments. The problems seem to intensify the more the metadata editor is used, and I noticed that when I deleted a collection the database would invariably crash, usually the following night. I’ve been limping along by surreptitiously restarting and re-indexing the database, but there have been several instances where the database could not be recovered this way and I had to restore from backup. This meant that the site was down and completely non-functional while I restored the files.

I think the issue is pressing enough that I’m going to have to put a lot of other things on hold to sort it out. I’ve found a few threads (1,2,3) on the mailing list that seem to address the issue. It looks like some of it can be solved by improving my XQueries, and the issue of temporary fragments is on the minds of the developers.

I think I’ll start going over my XQueries to eliminate the creation of temporary fragments where I can. This will be a useful exercise anyway, as many of the queries were written before I had a thorough understanding of XQuery. Hopefully this will significantly improve performance. Another option could be to move the metadata processing to the development server, and have a master/slave configuration to update the live site every night. Since the biggest problems with crashing and corrupted indexes seem to happen when metadata creation work is highest, this could be a way to cut down interruptions in service on the public side.

I like that UVM has given me the freedom to experiment and use technologies that are not being used elsewhere in the library; however, with the freedom have come some headaches. I do not have colleagues to fall back on for troubleshooting. I understand why small digital projects go with systems like ContentDM: the trade-off in flexibility may be well worth the time saved in other areas.


Quick update

January 8, 2008

I have a steadily growing list of things I need to get finished, but I finally managed to get around to adding RSS (actually Atom) feeds for tracking search results and also for tracking collections. This will be particularly handy for a collection like the McAllister Photographs, which will be a work in progress for quite some time. (It is currently growing at a rate of about 50-100 photos a week, but I imagine this will slow down once the cataloging staff has caught up to our part-time scanning tech.)
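
For anyone curious what a feed like this boils down to, here is a minimal sketch in Python (not the code running on the site, which is built on eXist). It assembles a bare-bones Atom document for a list of search results. The function name `build_atom_feed`, the dictionary keys, and the example URLs are all made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def _atom(tag):
    """Return the namespace-qualified name for an Atom element."""
    return f"{{{ATOM_NS}}}{tag}"


def build_atom_feed(title, feed_id, updated, author, entries):
    """Build a minimal Atom feed as a string.

    `entries` is a list of dicts with hypothetical keys:
    'title', 'id', 'updated' (RFC 3339 timestamp) and 'link'.
    """
    # Serialize Atom as the default namespace (no prefix).
    ET.register_namespace("", ATOM_NS)

    feed = ET.Element(_atom("feed"))
    ET.SubElement(feed, _atom("title")).text = title
    ET.SubElement(feed, _atom("id")).text = feed_id
    ET.SubElement(feed, _atom("updated")).text = updated
    author_el = ET.SubElement(feed, _atom("author"))
    ET.SubElement(author_el, _atom("name")).text = author

    for item in entries:
        entry = ET.SubElement(feed, _atom("entry"))
        ET.SubElement(entry, _atom("title")).text = item["title"]
        ET.SubElement(entry, _atom("id")).text = item["id"]
        ET.SubElement(entry, _atom("updated")).text = item["updated"]
        ET.SubElement(entry, _atom("link"), href=item["link"])

    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    # Example values only; the URLs are placeholders.
    print(build_atom_feed(
        title="Search results: barns",
        feed_id="http://example.org/feeds/search?q=barns",
        updated="2008-01-08T12:00:00Z",
        author="CDI",
        entries=[{
            "title": "Barn, Burlington",
            "id": "http://example.org/item/1",
            "updated": "2008-01-07T09:30:00Z",
            "link": "http://example.org/item/1",
        }],
    ))
```

A search-results feed is then just this function applied to whatever the search returns, sorted by the date each record was added or updated.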

In the process of adding the feeds I also took the opportunity to upgrade to Solr 1.2, which was a pretty painless upgrade with some nice additional functionality. I hope to get a chance to install 1.3 on my development machine next week to explore the MoreLikeThis functionality. I’d like to use this feature on the item pages, allowing users to get some immediate related items, in addition to using the subject and geographic headings to get related items.
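
As a rough idea of what a MoreLikeThis request for an item page might look like, here is a small Python sketch that only builds the query URL. The `mlt`, `mlt.fl`, `mlt.count`, `mlt.mintf` and `mlt.mindf` parameters belong to Solr's MoreLikeThis component; the base URL, the `id` unique-key field, and the `subject` and `geographic` field names are assumptions for illustration, not my actual schema.

```python
from urllib.parse import urlencode


def more_like_this_url(base_url, doc_id, fields, count=5):
    """Build a Solr select URL asking for items similar to one document.

    base_url: root of the Solr core, e.g. "http://localhost:8983/solr"
              (placeholder).
    doc_id:   value of the unique key field, assumed here to be "id".
    fields:   fields to compare on (hypothetical names).
    """
    params = {
        "q": f'id:"{doc_id}"',       # find the item being displayed
        "mlt": "true",               # turn on MoreLikeThis for the results
        "mlt.fl": ",".join(fields),  # fields used to judge similarity
        "mlt.count": count,          # number of similar items to return
        "mlt.mintf": 1,              # minimum term frequency in the source doc
        "mlt.mindf": 1,              # minimum number of docs containing the term
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(more_like_this_url(
        "http://localhost:8983/solr",
        "example-item-001",
        ["subject", "geographic"],
    ))
```

Setting `mlt.mintf` and `mlt.mindf` to 1 is a guess that suits short metadata records, where a heading rarely appears more than once per record; larger full-text fields would probably want higher values.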

Wrapping up

November 16, 2007

As of November 1st the CDI has been flying sans grant money. I was hired onto the project 18 months ago to start up a digital initiatives program and launch a pilot project with a website, preliminary collections and the infrastructure for future growth. It has been a busy 18 months, but I think we accomplished a lot, although perhaps not everything that was in the original proposal. This past week has been filled with administrative wrap-up details, including writing a final report on our activities during the grant period. I tend to think more in lists, so here is the wrap-up of what the CDI has gotten done in the past 18 months.

  • CDI office space and digital photography studio designed and built
    • Equipment selected, purchased and installed
  • Backend –
    • Evaluation of available Digital Asset Management Systems (DAMS)
      • Selection of eXist
    • Built metadata administrative interface
      • Dublin Core XForm
      • MODS XForm
      • Solr XForm
      • Authority Control with MADS
    • Creation of metadata workflow (still a bit of a work in progress)
    • Data dictionaries for Dublin Core, MODS and TEI template for transcribed letters.
    • OAI-PMH data provider (with some help)
  • Built web interface
    • Initial design of web interface
    • Site designed and implemented for the EAD collection
    • User testing and design adjustments
    • Implementation of Solr for faceted searching and browsing
    • RSS for news, and search results (still in progress)
  • Steering committee formed
  • Metadata working group formed
  • Content selection committee formed
    • Creation of content development policy and evaluation matrix
    • 6 live collections with around 670 completed records and more than 8000 pages scanned
    • Scanning work on two additional high use photograph collections
  • Several presentations, including an upcoming presentation at the New England Archivists meeting in March 2008
  • A paper in the fall 2007 issue (not out yet) of Microform & Imaging Review

In addition to the final report I’ve been working on getting the MODS editor into production for our next collection. We are trying a new approach to metadata and bringing in more staff from cataloging to work on metadata creation, so it will be a good test of the forms. I’ve also been exploring what it would take to make the MODS XForm available as open source. It seems to involve some paperwork, some waiting, and assurance to the University that the code is not commercially viable. It has been a busy week.

An XForms evening

October 17, 2007

XML 2007 is in Boston this year, and part of the lineup is an XForms evening event. The list of speakers looks great and includes many of the people whose work I have relied heavily on in exploring XForms during the past year (in particular: John Boyer, Erik Bruchez, and Mark Birbeck).

I’m not sure I will be able to make it to the conference. Registration costs are pretty high, and I have been saving my travel money for code4lib2008. Hopefully there will be some good post-conference blogging on the event.

*Updated: Looks like I will be able to go after all. The XForms evening is a free event.