Archive for the ‘code4lib2007’ Category

XForms for Code4lib update

April 20, 2007

Check out the XForms wiki: I posted an example of a DC XForm, and Parmit has posted an example of the MODS editor that they are working on. I have the DC form in a few flavors (XHTML, XSL, and XQuery) but have only posted the XHTML version. The XSL and XQuery versions are pretty easy to derive from the XHTML version, but I could post those as well if there is interest. The Princeton MODS editor uses Orbeon and is still a work in progress. It looks great though, so check it out.

I’m hoping to have some time in the next few weeks to get back to working on XForms. I’ll try to post my work to the wiki, so check back every once in a while.

Solr, finally

April 4, 2007

It took me about 3 weeks from the Solr preconference event at code4lib, but I finally have Solr running semi-smoothly with my web application using Cocoon. I didn’t expect it to take so long, but most of that time was spent learning how to use Cocoon (and trying to learn Java). Ideally I would like to have my XQueries send POST and GET requests to Solr, which can be done using Java. However, the Java solution has a much steeper learning curve than the Cocoon solution that I currently have in place. Because the release is only two weeks away, I’m sticking with Cocoon for now, with an eventual move to a Java/XQuery solution. Here is what my setup currently looks like:

1) A Solr instance on port 8983, with my website running on port 80 on the same machine. Port 8983 is firewalled so no one can come along and wipe out my index with a delete request.

2) An XQuery that pulls data from my METS records for indexing, either a single record or multiple records, depending on the parameters. Using an XSL stylesheet, I generate an XForm (with the XQuery results as the instance data section of the form). This form then uses POST to send the data to the Solr index. A second button on the form sends a commit command to Solr.

3) A Cocoon pipeline that sends GET requests to Solr and transforms the response using XSL. This feature took me a depressingly long time to figure out, in spite of the fact that I found this thread pretty early on.
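To make step 2 more concrete, here is a minimal sketch of what such a generated XForm might look like. It is not my actual form: the instance ids, the submission ids, the "id" field, and the sample values are all placeholders, and the field names have to match whatever is defined in your Solr schema. It assumes Solr's XML update handler at /solr/update on the same port as above.

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xf="http://www.w3.org/2002/xforms">
   <head>
      <title>Index records in Solr</title>
      <xf:model>
         <!-- Instance data: in the real form this would be the XQuery result.
              The field names and values here are placeholders. -->
         <xf:instance id="records">
            <add xmlns="">
               <doc>
                  <field name="id">example-0001</field>
                  <field name="title">Example title</field>
                  <field name="creator">Example, Author</field>
               </doc>
            </add>
         </xf:instance>
         <!-- A second instance holding the commit message -->
         <xf:instance id="commit">
            <commit xmlns=""/>
         </xf:instance>
         <!-- POST the records to Solr's update handler -->
         <xf:submission id="send-records" ref="instance('records')"
            method="post" action="http://localhost:8983/solr/update"
            replace="none"/>
         <!-- POST the commit so the new records become searchable -->
         <xf:submission id="send-commit" ref="instance('commit')"
            method="post" action="http://localhost:8983/solr/update"
            replace="none"/>
      </xf:model>
   </head>
   <body>
      <xf:submit submission="send-records">
         <xf:label>Index records</xf:label>
      </xf:submit>
      <xf:submit submission="send-commit">
         <xf:label>Commit</xf:label>
      </xf:submit>
   </body>
</html>
```

The xmlns="" on the instance roots keeps the Solr messages out of the XHTML namespace, and replace="none" leaves the form on screen after each submission.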

One of the problems that I was running into was that I had changed my XSLT transformer from Xalan to Saxon (so I could use XSLT 2.0). In my setup, Saxon did not allow daisy chaining (pulling results from one pipeline through another pipeline, or applying multiple transformations). I adjusted my cocoon.xconf and sitemap.xmap to use Xalan as an additional transformer and only call it when using the pipeline below.

The pipeline for handling search requests looks like this:

<map:match pattern="search">
   <map:generate type="request">
      <map:parameter name="generate-attributes" value="true"/>
   </map:generate>
   <map:transform type="xslt-xsltc" src="solr.xsl">
      <map:parameter name="use-request-parameters" value="true"/>
   </map:transform>
   <map:transform type="cinclude" />
   <map:transform type="xslt-xsltc" src="searchResults.xsl" />
   <map:serialize type="xml"/>
</map:match>

solr.xsl transforms the parameters sent from the search form into Solr-style parameters. The cinclude is passed from solr.xsl to Solr as a GET request (you can also use cincludes to POST data, but I found it more difficult than posting from the XForm). The final XSL stylesheet transforms the results into something attractive for the user.
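The final stylesheet, searchResults.xsl, is not shown in this post. As a rough idea of what that last step involves, here is a minimal sketch, not the real stylesheet. It assumes that after the cinclude step the document root is Solr's standard XML response (a response element containing a result element with doc children), and that the index has title and creator fields like the ones solr.xsl searches. The page layout is purely illustrative.

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns="http://www.w3.org/1999/xhtml">
   <xsl:output method="xml"/>

   <!-- Wrap the Solr response in a bare-bones XHTML page -->
   <xsl:template match="/">
      <html>
         <head><title>Search results</title></head>
         <body>
            <xsl:apply-templates select="response/result"/>
         </body>
      </html>
   </xsl:template>

   <!-- Solr reports the total hit count in the numFound attribute -->
   <xsl:template match="result">
      <p><xsl:value-of select="@numFound"/> records found</p>
      <ol>
         <xsl:apply-templates select="doc"/>
      </ol>
   </xsl:template>

   <!-- Each stored field is a child element with a name attribute;
        matching on * covers both str and arr (multi-valued) fields -->
   <xsl:template match="doc">
      <li>
         <xsl:value-of select="*[@name='title']"/>
         <xsl:if test="*[@name='creator']">
            <xsl:text> / </xsl:text>
            <xsl:value-of select="*[@name='creator']"/>
         </xsl:if>
      </li>
   </xsl:template>
</xsl:stylesheet>
```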

Here is what my solr.xsl looks like:

<xsl:stylesheet xmlns:h="http://cocoon.apache.org/h"
   xmlns:cinclude="http://cocoon.apache.org/"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns="http://www.w3.org/1999/xhtml" version="1.0">

<xsl:strip-space elements="*"/>
<xsl:output media-type="text/xml" method="xml"/>
<xsl:param name="term1"/>
<xsl:param name="field1"/>
<xsl:param name="term2"/>
<xsl:param name="field2"/>
<xsl:param name="term3"/>
<xsl:param name="field3"/>
<xsl:param name="bool1"/>
<xsl:param name="bool2"/>
<xsl:param name="start"/>
<xsl:param name="rows"/>
<xsl:param name="indent"/>
<xsl:template match="/">
   <xsl:variable name="param1">
      <xsl:choose>
         <xsl:when test="string-length(normalize-space($term1)) > 1">
            <xsl:choose>
               <xsl:when test="$field1 = 'kw'">
                  <xsl:value-of select="$term1"/></xsl:when>
               <xsl:when test="$field1 = 'ti'">
                  <xsl:value-of select="concat('title:','(',$term1,')')"/></xsl:when>
               <xsl:when test="$field1 = 'au'">
                  <xsl:value-of select="concat('creator:','(',$term1,')')"/></xsl:when>
               <xsl:when test="$field1 = 'su'">
                  <xsl:value-of select="concat('subject:','(',$term1,')')"/></xsl:when>
               <xsl:when test="$field1 = 'ab'">
                  <xsl:value-of select="concat('text:','(',$term1,')')"/></xsl:when>
               <xsl:otherwise><xsl:value-of select="$term1"/></xsl:otherwise>
            </xsl:choose>
         </xsl:when>
      </xsl:choose>
   </xsl:variable>
   <xsl:variable name="param2">
      <!-- same as param1, using field2 and term2 -->
   </xsl:variable>
   <xsl:variable name="param3">
      <!-- same as param1, using field3 and term3 -->
   </xsl:variable>
   <xsl:variable name="boolean1">
      <xsl:choose>
         <xsl:when test="string-length(normalize-space($term2)) > 1">
            <xsl:choose>
               <xsl:when test="$bool1 = 'and'"> AND </xsl:when>
               <xsl:when test="$bool1 = 'or'"> OR </xsl:when>
               <xsl:when test="$bool1 = 'not'"> NOT </xsl:when>
               <xsl:otherwise> AND </xsl:otherwise>
            </xsl:choose>
         </xsl:when>
         <xsl:otherwise> </xsl:otherwise>
      </xsl:choose>
   </xsl:variable>
   <xsl:variable name="boolean2">
      <!-- same as boolean1, using term3 and bool2 -->
   </xsl:variable>
   <!-- pulling all the params together -->
   <xsl:variable name="params">
      <xsl:value-of select="concat($param1,' ',$boolean1,' ',$param2,' ',$boolean2,' ',$param3)"/>
   </xsl:variable>
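   <!-- Curly braces (attribute value templates) insert the variable values;
        each & in the URL must be written as &amp; inside an XML attribute -->
   <ci:include
      xmlns:ci="http://apache.org/cocoon/include/1.0"
      src="http://localhost:8983/solr/select/?q={$params}&amp;version=2.2&amp;start={$start}&amp;rows={$rows}&amp;indent={$indent}"/>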
</xsl:template>
</xsl:stylesheet>
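Since param1, param2, and param3 repeat the same logic, one way to avoid copying the block three times is a named template. The following is only a sketch of that idea, not what I am running: the template name solr-clause and the clauses wrapper element are made up for illustration, only two of the three terms are shown, and the boolean handling and the cinclude are left out to keep it short.

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml"/>
   <xsl:param name="term1"/>
   <xsl:param name="field1"/>
   <xsl:param name="term2"/>
   <xsl:param name="field2"/>

   <!-- Hypothetical helper: turns one field code and one search term
        into a Solr clause such as title:(prairie) -->
   <xsl:template name="solr-clause">
      <xsl:param name="field"/>
      <xsl:param name="term"/>
      <!-- Same length check as the original param1 variable -->
      <xsl:if test="string-length(normalize-space($term)) &gt; 1">
         <xsl:choose>
            <xsl:when test="$field = 'ti'">
               <xsl:value-of select="concat('title:(', $term, ')')"/>
            </xsl:when>
            <xsl:when test="$field = 'au'">
               <xsl:value-of select="concat('creator:(', $term, ')')"/>
            </xsl:when>
            <xsl:when test="$field = 'su'">
               <xsl:value-of select="concat('subject:(', $term, ')')"/>
            </xsl:when>
            <xsl:when test="$field = 'ab'">
               <xsl:value-of select="concat('text:(', $term, ')')"/>
            </xsl:when>
            <!-- 'kw' and any unrecognized code pass the term through as-is -->
            <xsl:otherwise>
               <xsl:value-of select="$term"/>
            </xsl:otherwise>
         </xsl:choose>
      </xsl:if>
   </xsl:template>

   <xsl:template match="/">
      <!-- Each variable is now a single call instead of a copied block -->
      <xsl:variable name="param1">
         <xsl:call-template name="solr-clause">
            <xsl:with-param name="field" select="$field1"/>
            <xsl:with-param name="term" select="$term1"/>
         </xsl:call-template>
      </xsl:variable>
      <xsl:variable name="param2">
         <xsl:call-template name="solr-clause">
            <xsl:with-param name="field" select="$field2"/>
            <xsl:with-param name="term" select="$term2"/>
         </xsl:call-template>
      </xsl:variable>
      <!-- Illustrative output only, so the clauses can be inspected -->
      <clauses>
         <clause><xsl:value-of select="$param1"/></clause>
         <clause><xsl:value-of select="$param2"/></clause>
      </clauses>
   </xsl:template>
</xsl:stylesheet>
```

Run against any XML input with field1 set to ti and term1 set to prairie, the first clause should come out as title:(prairie).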

For other approaches using Cocoon, check out SolrForrest, flowscripts, or try using the webdav module to talk to REST interfaces.

Resources:

Solr

Cocoon

Getting the word out

March 9, 2007

Karen Schneider gave the opening address at the Code4lib conference (you can get a copy of the presentation or read about it here, here or here). Part of her talk discussed restoring the balance of power between libraries and vendors, urging libraries to take back control of their content and tools. This set the tone for the conference, which was an exposé of exciting new developments in library/information software. Libraries are getting creative: those who can’t replace their “geezy” old ILS systems are remixing the data, creating mashups, and adding new features to old data.

Additionally, Karen discussed the problem of developers marketing these new open source tools. She made some great points, but I’d like to add to that list: getting the word out. In my library, most of this is falling below the radar (I’m guessing many libraries share this problem), in part because we don’t have a developer or webmaster on staff. I’m planning on doing a “report-back session” here at UVM, but until then I’m posting a short list of things to check out for those who might be interested. You may also find me proselytizing in the hallways…

The List:

  • Solr [http://lucene.apache.org/solr/] – A customizable, open source, full-text search server that is easy to implement and enables hit highlighting, faceted searching, caching of results, and much more. I’m in the process of implementing this to work with our eXist database. I’ll be using it for full-text searching and probably for faceting (I’ve been doing some faceting with XQuery, but I have a feeling Solr will be much faster).

Some Solr examples:

  • Nines/Collex [http://nines.org/collex] – Check out how you can add and remove constraints from your current results set, and the ability to facet results by genre, year, and site.
  • MyResearch Portal [http://research.library.villanova.edu/] – Do a search in the catalog and check out the left menu options for narrowing your search by facet. Another nice feature is the limit by options directly under the search box. This is built off of Voyager (the data is exported from Voyager into the Solr index).
  • Peel’s Prairie Provinces [http://peel.library.ualberta.ca/index.html] – Try a search for “prairie.” In addition to faceting, this site uses tag clouds, Google Maps, and a visual timeline feature.

Other cool (non-Solr) stuff:

A few projects in development

Not mentioned at code4lib but also interesting:

Also, check out code4lib’s Open Source Software Directory for some additional projects. Feel free to add projects I’ve omitted or forgotten in the comments section.

XForms resources

March 3, 2007

Update: The XForms wiki at code4lib is back up. (Thanks Kevin!)

The code4lib server that is hosting the XForms wiki (mentioned in my last post) is down and may be down for a while. Here is the list of resources I had posted on the wiki. This is not meant to be an exhaustive list, more of a “getting started with XForms” list. I’ll post an announcement here when the wiki is back up. Hopefully we (specifically the people from the code4lib breakout session, but anyone is welcome) can at least keep up a conversation about XForms for metadata editors until then. Feel free to post additional resources in the comments.

About XForms

Tutorials

Blogs

Other Resources

Tools