Archive for May, 2007

Solr/XForms Continued

May 31, 2007

Once I got down to implementing my Solr/XForms integration I realized how many options I really have. In addition to the two mentioned in my last post, I could also populate my Solr instance by sending a GET request that would replace a dummy instance with a Solr record (generated by an XQuery). This request could be triggered when the user marks the form as complete. Then the instance could be sent to Solr when the user clicks the publish button. Unfortunately I can’t seem to get this option working. I’m not sure if it is because of the way my instances are set up, or if I need an additional model for the Solr data. I’m using this tutorial as an example, and will keep plugging away at it. But for now I’m using the pop-up window method.
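For reference, here is a rough sketch of the GET-request approach I was attempting. The submission ids and the XQuery endpoint name (getSolrRec.xql) are made up for illustration; this is the shape of the idea, not the working code (obviously, since I can’t get it to work yet):

```xml
<!-- Dummy instance, to be replaced by the Solr record from the XQuery -->
<xforms:instance id="solr-rec">
  <add xmlns=""/>
</xforms:instance>

<!-- GET the Solr record (generated by the XQuery) into the dummy instance.
     This would be triggered when the user marks the form as complete. -->
<xforms:submission id="fetchSolrRec" method="get" replace="instance"
  instance="solr-rec" action="{concat('getSolrRec.xql?pid=', $pid)}"/>

<!-- POST the fetched record to Solr when the user clicks publish -->
<xforms:submission id="publishSolr" method="post" replace="none"
  ref="instance('solr-rec')" action="http://localhost/solr/update"/>
```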

Here is what my “publish” button looks like:

<xforms:trigger ref="//dc:status[. = 'complete']">

  <xforms:label>Publish</xforms:label>

  <xforms:action ev:event="DOMActivate">

    <xforms:send submission="submitMetadata" />

    <xforms:load resource="javascript:openWin('addRecs.xql?pid={$pid}')" />

  </xforms:action>

</xforms:trigger>

This creates a pop-up XForm window generated by addRecs.xql. This XQuery grabs all the descriptive metadata, along with any transcripts if they are available, and transforms it into Solr-style XML. It can be used for a single record or for all the records in the database. The pop-up XForm is very simple; it looks like this:
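For the curious, addRecs.xql looks roughly like this. The collection path, record element names, and Solr field names below are placeholders for illustration, not the actual ones:

```xquery
(: Sketch of addRecs.xql — paths and field names are illustrative :)
declare namespace dc="http://purl.org/dc/elements/1.1/";

let $pid := request:get-parameter("pid", ())
(: grab one record, or all of them when no pid is passed :)
let $recs := if ($pid)
             then collection("/db/cdi")//record[@pid = $pid]
             else collection("/db/cdi")//record
return
  <add>
  {
    for $rec in $recs
    return
      <doc>
        <field name="pid">{string($rec/@pid)}</field>
        <field name="title">{$rec//dc:title/text()}</field>
        {
          (: pull in the transcript/OCR text when there is one :)
          for $t in $rec//transcript
          return <field name="text">{normalize-space($t)}</field>
        }
      </doc>
  }
  </add>
```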

<html xmlns:xf="http://www.w3.org/2002/xforms"

xmlns:ev="http://www.w3.org/2001/xml-events"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml"

xmlns:dc="http://purl.org/dc/elements/1.1/" xsl:version="2.0">

  <head>

    <title>Send Data to Indexer</title>

    <meta http-equiv="Pragma" content="no-cache"/>

      <xf:model>

        <xf:instance>

   	  <add><!--Your Solr Record goes here --></add>

	</xf:instance>

	<xf:instance id="solr-resp">

	  <results xmlns=""/>

	</xf:instance>

	<xf:instance id="sendCommit">

	  <commit/>

	</xf:instance>

	<xf:submission id="submit" method="post" replace="instance"
	  instance="solr-resp" action="http://localhost/solr/update"/>

	<xf:submission id="submitCommit" method="post" replace="instance"
	  instance="solr-resp" ref="instance('sendCommit')" action="http://localhost/solr/update"/>

      </xf:model>

    </head>

    <body>

      <div id="main">

   	<div id="formInfo">

	  <xf:switch>

	   <xf:case id="case1" selected="true()">

	    <h2>Title of Record</h2>

		<div>

		<xf:trigger>

		  <xf:label>Add</xf:label>

		  <xf:action ev:event="DOMActivate">

		  <xf:send submission="submit" />

		  <xf:toggle case="case2" ev:event="DOMActivate"/>

		  </xf:action>

	   	</xf:trigger>

		</div>

	  </xf:case>

	  <xf:case id="case2">

	    <h2>Item submitted!</h2>

	    <div>

	     <xf:trigger>

	      <xf:label>Okay</xf:label>

		<xf:action ev:event="DOMActivate">

		<xf:send submission="submitCommit" />

		<xf:load resource="javascript:window.close()" />

	        </xf:action>

	    </xf:trigger>

	   </div>

	 </xf:case>

	</xf:switch>

	</div>

     </div>

    </body>

</html>

When the user presses the “Add” button, the main instance is posted to Solr and the form toggles to the commit view, which says “Item submitted!” The user presses “Okay” to send the commit command and close the pop-up window. It may not be the most elegant solution, but it works for now.

I also took this opportunity to revamp the user interface. Previously it was one long form with multiple save buttons scattered throughout. This was unwieldy and really didn’t make much sense, so I’ve switched to a tabbed UI (like this) with five tabs: “Citation Information,” “Resource Description,” “Subject Analysis,” “Related Items,” and “View.” The tabs eliminate the need for multiple save buttons throughout the form, and make the data more compact and, I hope, easier to navigate. Here is a snapshot of what the new form looks like:

XForm metadata editor

As you can see, there are five tabs and two buttons. The “View” tab first saves the record and then toggles to an xforms:case with an xforms:output for each element. This view also gives a link to the “public view” of the record so the metadata librarian can see the page images and the data in context. It is a little garish, but the multi-colored tabs save me from writing any JavaScript to highlight the active tab.

*Oops, the code for the pop-up window was missing a crucial part: the ref attribute in the second submission. I added it to the code above, so everything should work now. The ref attribute isn’t necessary (although it may be good practice) in the first submission, because if no instance is specified the submission defaults to the first instance. (So yesterday’s code sent the record to Solr twice, but never sent a commit.)


XForms Sending Multiple Instances/Actions

May 29, 2007

Well, apparently I didn’t look very hard for this solution, because this morning I found a whole list of examples. These are the two most useful that I found:

Turns out it is really simple, and I probably should have figured it out without any examples. Essentially, you can use the XForms send action as many times as you like in a single trigger (I should have known this, because I have used multiple inserts in a single trigger before, which is a similar idea).

First in the XForms model:

(:Submit metadata to xquery for processing and updating current record in eXist:)
<xforms:submission id="submitMetadata" method="post" replace="all"
  action="{concat('submitXForm.xql?pid=', $id)}"/>

(:Send solr instance directly to solr:)

<xforms:submission id="submitSolr" method="post" replace="all"
  action="http://localhost/solr/update" ref="instance('solr')"/>

Then in the user interface:

<xforms:trigger>
  <xforms:label>Save Both</xforms:label>
  <xforms:action ev:event="DOMActivate">

<xforms:send submission="submitMetadata" />

<xforms:send submission="submitSolr"/>

</xforms:action>

</xforms:trigger>

This posts the metadata instance to an XQuery that updates the record in eXist, and simultaneously posts the Solr instance directly to Solr.

I have a few options on how to do this. I could have two instances, one for metadata editing and one for the Solr indexing. The Solr instance would be tied to the data in the first by a series of bind statements, and would get updated as the user edits the metadata record.

Example:

<xforms:bind id="title"
  nodeset="instance('solr')/descendant-or-self::*[@name='title']"
  calculate="instance('metadata')//dc:title"
  relevant="string-length(instance('metadata')//dc:title) != 0"/>

I would then need to make this submit button conditional, only available when the record is marked as complete. This can be accomplished with a ref="//dc:status[. = 'complete']" attribute on the trigger, or by using bind/relevant in the XForms model. This submission would send both instances, one to eXist and one to Solr, as shown above.
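The bind/relevant version of the conditional trigger would look something like this (the bind id is just illustrative). A trigger bound to a non-relevant node is hidden or disabled, so the button only appears once the status is set to complete:

```xml
<!-- In the model: the status node is only relevant once marked complete -->
<xforms:bind id="publish-ready" nodeset="//dc:status"
  relevant=". = 'complete'"/>

<!-- In the UI: the trigger stays hidden until the bind is relevant -->
<xforms:trigger bind="publish-ready">
  <xforms:label>Save Both</xforms:label>
  <xforms:action ev:event="DOMActivate">
    <xforms:send submission="submitMetadata"/>
    <xforms:send submission="submitSolr"/>
  </xforms:action>
</xforms:trigger>
```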

The other option would be to have the second submit call an XQuery that creates the Solr index record (in a new XForm). This would require an additional click: for example, when the user saves a completed record, a pop-up could ask if the user really wants to “publish” this data; when they click yes, this XForm would POST to Solr and close the pop-up window (except I still need a way to submit the <commit/> to Solr). This is the method I will most likely use, for a few reasons. First, I already have an XQuery written to create the index records, so it would be faster to put into place. Second, when an item has an OCR/transcript or other full-text version, the XQuery will grab that text and submit it to Solr as well. I could probably finagle this into my main XForm, but I’m not sure it is worth the effort, as I don’t need to pull in the full text for metadata editing. I also like the idea of keeping my two forms separate, so that as/if I add Solr fields I won’t need to tinker with the main metadata form.

Getting back on track

May 25, 2007

The combination of the launch craziness (I know, it was over a month ago), a week of vacation, and the R2 recommendations has left me a little disorganized. I still have lots to do, but now that the launch is over I’m having some trouble prioritizing.

Here’s what my list looks like so far (in no particular order):

  • Get back to the metadata processing side of the CDI
    • Create a METS editor so we can finally get rid of that Access DB
    • Continue work on the MODS XForm in order to liberate our descriptive metadata from Dublin Core. (Check out the xforms@code4lib wiki to see some forms in process from UVM (Firefox extension) and Princeton (Orbeon))
    • Create a one/two button method for sending completed records to Solr, from the descriptive metadata form. Currently I send either a collection at a time, or the entire database at once. This was fine for getting started, and I could have the script run every night to collect newly added items but I would rather have the records get added to the index when their status is changed to “complete.” I would really like to be able to submit two instances simultaneously in my XForm, so that when an item is saved, and marked as complete it saves the record to eXist and also sends it to Solr. Unfortunately I haven’t found any examples of this, and am not sure it can actually be done (with XForms), so we may end up with a two button approach.
    • Create some interface for indexing and managing EADs
    • Test the XQuery OAI data provider, and register the CDI collections with OAI harvesters
    • Solve the pesky URL issue. I’d like to set the eXist webapp as my root directory, thus eliminating it from the URL altogether. I have no problem doing this with Tomcat, but once you add Apache into the mix bad things happen. In general my URLs are not very user friendly, and I’m wondering about fixing that… I’m not sure what I would need to do, but I should at least look into it. Obviously this needs to be done sooner rather than later (and should have been resolved before the launch).
  • Finish the new Finding Aids site (which had a stealth release a few weeks ago)
    • Finish configuring Solr for the EADs
    • Add a FOP processor to the server so we can use the XSL-FO stylesheets that I spent so much time on at PU.
    • Fix the problem of really large EAD files causing out of memory errors (maybe by breaking up the files, or by increasing the memory allocation in the eXist config file)
    • Work with the Curator of Manuscripts to create additional browsing/searching options for the new finding aids site.
  • Work on new additions to the front end
    • Allow users to remove filters from their “narrowed” Solr searches (kind of like this)
    • Add faceted browsing to all the browse pages, including the browse collections page
    • Add a news feed
    • Investigate image zooming options: JPEG2000, Zoomify, etc.
    • Add user generated tags
    • Add commenting
  • Find a web stats program that I like, and figure out how I want it configured (happily I don’t actually have to do the configuring)
  • Workflow management – This is a big one, but not something that I can do alone.
  • Look at integrating JHOVE into our workflow
  • Get a development server up and running (high priority, but heavily dependent on the next point)
  • Get a budget quote, and work on purchasing a server (for image storage) and a book scanner. This is a group project and may require some field trips. Fun!
  • Start working on partnerships with interested departments/faculty/organizations
  • Start working on migrating legacy projects into the CDI
  • Clean up the mess I made in developing the CDI so that it would be possible to pack up the whole system for other people to take a look at. CTL and Academic Computing here at UVM have expressed an interest in using the eXist XForms combo for some of their projects.
  • Continue working on information architecture for the library wide redesign
  • Eat chocolate, lots of chocolate

I was also strongly encouraged by my supervisor to take a day a week to work on “scholarship and creative activities” (trying to get published). I’m kind of ambivalent about publishing, but there is no ambivalence about it at my library; if you want a promotion you need to publish. Preferably you will publish (in peer-reviewed journals), give talks, and serve on several regional/national committees. So in addition to the list above, I guess I’ll be trying to put together a paper or two.

Library Realignment

May 11, 2007

The library recently hired a consulting group to help us evaluate our workflow and think about restructuring and repurposing existing staff openings to best fit the changing library model. We got the report back yesterday.

I’m intrigued by the recommendations, and by what seem to me to be a few curiously large holes in them. In particular, there was a heavy emphasis on the library’s move toward more digital content, but no mention of a library webmaster. There was a significant amount of discussion about digital access, and there is a recommendation for a “Discovery and Delivery” group, which would investigate additional ways of meeting virtual information needs, but I fail to see how these can be implemented with the current staffing structure. Currently the library website is maintained by committee, and while the committee manages to keep the website mostly up to date, there is no one dedicated to implementing new features or staying on top of web technologies. I think it is very important for the library to have a full-time professional dedicated to the library’s virtual presence, and by virtual presence I mean more than just updating the website. There are a lot of possibilities for getting content to users in different ways, remixing current library content to be more context relevant, and improving existing interfaces and tools. (Check out this post to see some interesting developments in libraries.) This kind of work cannot be done by part-time members of a committee who all have other jobs and professional interests to keep up with. Although this hole in the recommendations doesn’t really affect the CDI, it does have a huge impact on my workload, as I’m on the web team and part of the current redesign efforts.

The report was generally very positive for the CDI. The CDI is listed as a “strategic initiative,” which indicates a continued commitment from the library. They recommended that my position be made permanent (yay, because funding runs out soon), with the addition of two new positions: a metadata librarian and a programmer. They also suggested the possibility of repurposing a copy cataloger to do metadata work, which means I need to get those XForms polished and ready for prime time. (We have also had some interest in our architecture of eXist, XForms, and Solr from the Center for Teaching and Learning.)

But here is where things get a little wonky. Currently the CDI is situated under Special Collections (actually, I believe it is now called Research Collections). I had mentioned in my interview that I thought this might not be the best place for the CDI, as it could give the impression that the CDI was a Special Collections project rather than a university-wide resource. I suggested that the CDI should be its own department. The reason I thought, and still think, the CDI could be its own department is that, as a digital library, the CDI has many of the same operations a physical library has (although we don’t do much in the way of reference service). We have cataloging (metadata), collection development, systems, and some unique CDI functions as well. I also mentioned in my interview that the CDI in its current incarnation has some organizational issues. Because there isn’t a clear (in my mind) head of the CDI, there are a lot of loose ends and some unsupervised workflows.

In the report the consultants recommended the CDI be gradually moved under Collection Development. They went on to qualify that this would only be the collection part of the CDI, the rest could live… elsewhere.

“As the grant funding that enabled CDI development wanes, it will be important for UVM to decide how it wants to use these new capabilities. At bottom, decisions related to content and priorities are collection development decisions, and we believe the CDI program should be driven by Collection Development. (We’re referring specifically to content decisions; the actual operation and technical infrastructure of the CDI could reside elsewhere.)”

Huh? I don’t understand how this solves the organizational problems I mentioned in my interview with the consultants. As a matter of fact, I think it confuses rather than clarifies the organization, essentially further diversifying CDI functions and farming them out all over the library. I suppose this is one way of running the center, but I think a diversified model will only exacerbate our organizational and management issues.

The more I think about where the CDI should live in the organizational chart, the more agnostic I become. I’m not sure it matters so much. We will still need to interface with systems, collection development, technical services, and reference; what we actually need is internal clarity in our management structure: someone who is in charge and can oversee all the different aspects of CDI operations (scanning, metadata, collection development, relationships with faculty, and policy and procedure creation and management). Who is managing the project is kind of a touchy subject, and I don’t really care who does it (well, maybe I do, a little); I just think there needs to be someone who can devote the necessary time and has the right skills to oversee the project. A lot of this I have been doing myself, with the metadata portion handled part time by the Curator of Manuscripts, but if it is my job (unclear) then I think I need more of a mandate, and also more time to devote to project management.

One other issue the report raises is that of an institutional repository. The report assumes a natural evolution of the CDI from special grant-funded project to “something more like an institutional repository.” I’m not sure the consultants understand the implications of an institutional repository, but I’ve been very careful about not throwing around the phrase “institutional repository” in relation to the CDI. The CDI was built as a digital library project, and while it is pretty flexible, I’m fairly confident it is not heavy-hitting enough to function as an institutional repository. Nor do we have the mandate to ensure we get participation in an institutional repository from the university administration. Not to mention all of the other issues associated with an IR. (And I follow Dorthea‘s blog, so I have at least a vague idea of the craziness we could be getting into.) I have a feeling this is more a misunderstanding of what an IR is than of what the CDI is. We have talked a lot about the CDI being a place for faculty research collections and for creating long-term classroom-use collections, all of which I think the CDI is poised to accomplish, but that is a far cry from an IR.

I guess the recommendations are generally very positive for the CDI, and I look forward to seeing where the discussion in the library goes from here.

Other discussions on the R2 recommendations can be found here, and here.