Archive for the ‘Solr’ Category

Update on CDI XForms

June 14, 2007

Well, the new version of the Firefox XForms extension came out last week, and all my forms are working again. I am working on a version for Orbeon as well, and I have a lot of thoughts about Orbeon vs. Firefox, but I will save those for a later post.

I have posted a version of my metadata “processing pages” on xforms@code4lib. This version is in XQuery and includes the new tabbed version of the DC forms as well as a version of my Solr form. I have also included a sample record. If you want to take it for a test drive, you can install eXist and Solr in Tomcat and add the sample record to eXist under /db/mets. You may have to edit addIndex.xql so that it points to your Solr instance.

Solr/XForms Continued

May 31, 2007

Once I got down to implementing my Solr/XForms integration, I realized how many options I really have. In addition to the two mentioned in my last post, I could also populate my Solr instance by sending a GET request that replaces a dummy instance with a Solr record (generated by an XQuery). This request could be triggered when the user marks the form as complete, and the instance could then be sent to Solr when the user clicks the Publish button. Unfortunately I can’t seem to get this option working. I’m not sure whether it is because of the way my instances are set up or because I need an additional model for the Solr data. I’m using this tutorial as an example and will keep plugging away at it, but for now I’m using the pop-up window method.

Here is what my “publish” button looks like:

<xforms:trigger ref="//dc:status[. = 'complete']">
  <xforms:label>Publish</xforms:label>
  <xforms:action ev:event="DOMActivate">
    <xforms:send submission="submitMetadata" />
    <xforms:load resource="javascript:openWin('addRecs.xql?pid={$pid}')" />
  </xforms:action>
</xforms:trigger>

This creates a pop-up XForm window generated by addRecs.xql. This XQuery grabs all the descriptive metadata, plus any transcripts that are available, and transforms them into Solr-style XML. It can be used for a single record or for all the records in the database. The pop-up XForm is very simple; it looks like this:

<html xmlns:xf="http://www.w3.org/2002/xforms"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/" xsl:version="2.0">
  <head>
    <title>Send Data to Indexer</title>
    <meta http-equiv="Pragma" content="no-cache"/>
    <xf:model>
      <!-- Main (first) instance: the Solr record; xmlns="" keeps it out of the XHTML namespace -->
      <xf:instance>
        <add xmlns=""><!-- Your Solr record goes here --></add>
      </xf:instance>
      <!-- Receives Solr's response so the pop-up page itself is not replaced -->
      <xf:instance id="solr-resp">
        <results xmlns=""/>
      </xf:instance>
      <xf:instance id="sendCommit">
        <commit xmlns=""/>
      </xf:instance>
      <!-- No ref: submits the first instance in the model, i.e. the record -->
      <xf:submission id="submit" method="post" replace="instance"
        instance="solr-resp" action="http://localhost/solr/update"/>
      <!-- * ref added so the commit instance, not the record, is sent;
           the response goes into the existing solr-resp instance -->
      <xf:submission id="submitCommit" method="post" replace="instance"
        instance="solr-resp" ref="instance('sendCommit')"
        action="http://localhost/solr/update"/>
    </xf:model>
  </head>
  <body>
    <div id="main">
      <div id="formInfo">
        <xf:switch>
          <xf:case id="case1" selected="true">
            <h2>Title of Record</h2>
            <div>
              <xf:trigger>
                <xf:label>Add</xf:label>
                <xf:action ev:event="DOMActivate">
                  <xf:send submission="submit"/>
                  <xf:toggle case="case2"/>
                </xf:action>
              </xf:trigger>
            </div>
          </xf:case>
          <xf:case id="case2">
            <h2>Item submitted!</h2>
            <div>
              <xf:trigger>
                <xf:label>Okay</xf:label>
                <xf:action ev:event="DOMActivate">
                  <xf:send submission="submitCommit"/>
                  <xf:load resource="javascript:window.close()"/>
                </xf:action>
              </xf:trigger>
            </div>
          </xf:case>
        </xf:switch>
      </div>
    </div>
  </body>
</html>

When the user presses the “Add” button, the main instance is posted to Solr and the form toggles to the commit view, which says “Item submitted!” The user presses “Okay” to send the commit command and close the pop-up window. It may not be the most elegant solution, but it works for now.
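The pop-up boils down to two POSTs to Solr's XML update handler: first an `<add>` document, then an empty `<commit/>`. The added record only becomes searchable after the commit. Here is a minimal Python sketch of the same exchange for scripting it outside the browser. The update URL is copied from the form and will likely need adjusting. The `(name, value)` field list is a hypothetical input format, not something addRecs.xql produces.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOLR_UPDATE_URL = "http://localhost/solr/update"  # assumption: adjust to your Solr instance


def build_add(fields):
    """Build <add><doc><field name="...">value</field>...</doc></add> from (name, value) pairs."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields:
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


def build_commit():
    """The commit message is just an empty <commit/> element."""
    return ET.tostring(ET.Element("commit"), encoding="unicode")


def post_xml(url, xml_text):
    """POST an XML payload and return Solr's raw response body."""
    req = urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def publish(fields, url=SOLR_UPDATE_URL):
    """Same order as the pop-up: send the record ("Add"), then the commit ("Okay")."""
    add_response = post_xml(url, build_add(fields))
    commit_response = post_xml(url, build_commit())
    return add_response, commit_response
```

Keeping the commit as a separate request mirrors the form. It also means several records could be added before a single commit.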

I also took this opportunity to revamp the user interface. Previously it was one long form with multiple save buttons scattered throughout. This was unwieldy and didn’t make much sense, so I’ve switched to a tabbed UI (like this) with tabs for “Citation Information,” “Resource Description,” “Subject Analysis,” “Related Items,” and “View.” The tabs eliminate the need for multiple save buttons throughout the form and make the data more compact and, I hope, easier to navigate. Here is a snapshot of what the new form looks like:

XForm metadata editor

As you can see, there are 4 tabs and two buttons. The “View” tab first saves the record and then toggles to an xforms:case with an xforms:output for each element. It also gives a link to the “public view” of the record so the metadata librarian can see the page images and the data in context. It is a little garish, but the multi-colored tabs save me from writing any JavaScript to highlight the active tab.

*Oops, the code for the pop-up window was missing a crucial part: the ref attribute in the second submission. I have added it to the code above, so everything should work now. The ref attribute isn’t necessary in the first submission (although it may be good practice), because when no ref is given the submission uses the first instance in the model. (So yesterday’s code sent the record to Solr twice but never sent a commit.)

XForms Sending Multiple Instances/Actions

May 29, 2007

Well, apparently I didn’t look very hard for this solution, because this morning I found a whole list of examples. These are the most useful two that I found:

It turns out to be really simple, and I probably should have figured it out without any examples. Essentially, you can use the xforms:send action as many times as you like in a single trigger. (I should have known this, because I have used multiple inserts in a single trigger before, which is a similar idea.)

First in the XForms model:

<!-- Submit metadata to an XQuery that updates the current record in eXist -->
<xforms:submission id="submitMetadata" method="post" replace="all"
  action="{concat('submitXForm.xql?pid=', $id)}"/>

<!-- Send the Solr instance directly to Solr; instance() takes the id as a string,
     and replace="none" keeps Solr's status response from replacing the editor -->
<xforms:submission id="submitSolr" method="post" replace="none"
  action="http://localhost/solr/update" ref="instance('solr')"/>

Then in the user interface:

<xforms:trigger>
  <xforms:label>Save Both</xforms:label>
  <xforms:action ev:event="DOMActivate">
    <xforms:send submission="submitMetadata"/>
    <xforms:send submission="submitSolr"/>
  </xforms:action>
</xforms:trigger>

This posts the metadata instance to an XQuery that updates the record in eXist and, in the same click, posts the Solr instance directly to Solr.

I have a few options for how to do this. I could have two instances, one for metadata editing and one for Solr indexing. The Solr instance would be tied to the data in the first by a series of bind statements and would be updated as the user edits the metadata record.

Example:

<xforms:bind id="title"
  nodeset="instance('solr')/descendant-or-self::*[@name='title']"
  calculate="instance('metadata')//dc:title"
  relevant="string-length(instance('metadata')//dc:title) != 0"/>
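To make the bind's two halves concrete, here is a small Python sketch of what it computes. `calculate` copies the first dc:title into the Solr `field name="title"`. `relevant` switches that field off when the title is empty, and XForms leaves non-relevant nodes out of the submitted data. The `<record>` wrapper used in the usage lines is a hypothetical stand-in for the real metadata instance.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"


def mirror_title(metadata_xml):
    """Return the Solr <field name="title"> element, or None when dc:title is empty."""
    root = ET.fromstring(metadata_xml)
    # Like instance('metadata')//dc:title: the first dc:title anywhere below the root.
    title = root.find(f".//{{{DC}}}title")
    text = (title.text or "") if title is not None else ""
    if len(text) == 0:
        # relevant="string-length(...) != 0" is false: the field is not submitted.
        return None
    field = ET.Element("field", name="title")
    field.text = text  # calculate="instance('metadata')//dc:title"
    return field
```

For example, `mirror_title('<record xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title>Vermont maps</dc:title></record>')` returns a `field` element whose text is `Vermont maps`.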

I would then need to make this submit button conditional, available only when the record is marked as complete. This can be accomplished with a ref="//dc:status[. = 'complete']" attribute on the trigger, or by using bind/relevant in the XForms model. This submission would send both instances, one to eXist and one to Solr, as shown above.
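The publish condition itself is a single XPath test. Here is a Python sketch of the same check, assuming dc:status holds plain text. It is useful, for example, when a batch job needs to pick out the records that are ready for the index.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"


def is_publishable(metadata_xml):
    """True if any dc:status element reads exactly 'complete',
    the same test as ref="//dc:status[. = 'complete']" on the trigger."""
    root = ET.fromstring(metadata_xml)
    # iter() includes the root and all descendants, like the // step.
    return any((s.text or "") == "complete" for s in root.iter(f"{{{DC}}}status"))
```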

The other option would be to have the second submit call an XQuery that creates the Solr index record (in a new XForm). This would require an additional click. For example, when the user saves a completed record, a pop-up could ask whether the user really wants to “publish” this data. When they click yes, this XForm would POST to Solr and close the pop-up window (except I still need a way to submit the <commit> to Solr). This is the method I will most likely use, for a few reasons. First, I already have an XQuery written to create the index records, so it would be faster to put into place. Second, when an item has OCR, a transcript, or another full-text version, the XQuery will grab that text and submit it to Solr as well. I could probably finagle this into my main XForm, but I’m not sure it is worth the effort, as I don’t need the full text for metadata editing. I also like the idea of keeping my two forms separate so that as/if I add Solr fields, I will not need to tinker with the main metadata form.
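The separate index-record builder described here can be sketched in a few lines of Python. It collects Dublin Core fields into a Solr `<add>` document and attaches the full text only when a transcript exists. The Solr field names (`id`, `title`, `creator`, `topic`, `fulltext`) and the DC-to-Solr mapping are hypothetical placeholders, not the actual schema.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical mapping from DC element names to Solr field names.
DC_TO_SOLR = {"title": "title", "creator": "creator", "subject": "topic"}


def build_index_record(record_id, metadata_xml, transcript=None):
    """Return an <add><doc>...</doc></add> string for one record."""
    root = ET.fromstring(metadata_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = record_id
    for dc_name, solr_name in DC_TO_SOLR.items():
        # Repeated DC elements (e.g. several subjects) become repeated Solr fields.
        for el in root.iter(f"{{{DC}}}{dc_name}"):
            if el.text and el.text.strip():
                ET.SubElement(doc, "field", name=solr_name).text = el.text.strip()
    if transcript:
        # Only records with OCR or a transcript get a full-text field.
        ET.SubElement(doc, "field", name="fulltext").text = transcript
    return ET.tostring(add, encoding="unicode")
```

Keeping this outside the editing form matches the reasoning above: adding a Solr field means changing only the mapping, not the metadata form.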

Solr revisited

April 23, 2007

Pretty much everything I wrote in my previous post about Solr is now obsolete. Up until last Sunday evening I had Solr running with Cocoon. However, I had all sorts of problems with Cocoon. Some stemmed from my complete inability to go back to XSLT 1.0 (which I needed to do in order to take advantage of daisy-chaining). Others stemmed from bad (non-HTML) characters in our metadata, most likely from pasting from Word documents.

At the same time I was struggling with Cocoon, this conversation was happening on the eXist listserv. It reminded me that I could use the eXist doc() function to send Solr requests and transform the resulting response. I’m blaming overwork for the time I wasted with Cocoon, since I already use this function to retrieve XSL stylesheets for transformations in nearly every XQuery I write.

So now my requests to Solr are sent via an XQuery that looks like this:

xquery version "1.0";

declare namespace util="http://exist-db.org/xquery/util";
declare namespace request="http://exist-db.org/xquery/request";
declare namespace x="http://exist.sourceforge.net/dc-ext";
declare namespace xlink = "http://www.w3.org/1999/xlink";
declare namespace xslt="http://exist-db.org/xquery/transform";
declare namespace bh = "http://cdi.uvm.edu/cdi/ns";

(: Fields for limiting search :)
declare variable $field1 := request:request-parameter('field1', 'ft');
declare variable $field2 := request:request-parameter('field2', 'ft');
declare variable $field3 := request:request-parameter('field3', 'ft');

(: Search terms; single quotes are swapped for double quotes :)
declare variable $term1 := replace(request:request-parameter('term1', ''), "'", '"');
declare variable $term2 := replace(request:request-parameter('term2', ''), "'", '"');
declare variable $term3 := replace(request:request-parameter('term3', ''), "'", '"');

(: Boolean operators :)
declare variable $bool1 := request:request-parameter('bool1', 'and');
declare variable $bool2 := request:request-parameter('bool2', 'and');

(: Variables for paging through results :)
declare variable $start := request:request-parameter('start', 0) cast as xs:integer;
declare variable $rows := request:request-parameter('rows', 25) cast as xs:integer;

(: Filters applied to search results, separated by semicolons :)
declare variable $filter := request:request-parameter('filter', '');
(: Applies correct Solr field for fielded searching :)
declare function bh:field($field as xs:string) as xs:string {
   if ($field = "au") then "creator:"
   else if ($field = "ti") then "title:"
   else if ($field = "ab") then "abstract_text:"
   else if ($field = "su") then "topic_text:"
   else ''
};

(: Builds query parameters as a string :)
declare function bh:build-query() as xs:string {
  let $queryString :=
    concat(
      if ($term1 != '') then concat(bh:field($field1), $term1) else '',
      if ($term2 != '') then
        concat(
          if ($bool1 = 'and' and $term1 != '') then ' AND '
          else if ($bool1 = 'or' and $term1 != '') then ' OR '
          else if ($bool1 = 'not' and $term1 != '') then ' NOT '
          else ' ',
          bh:field($field2), $term2)
      else '',
      if ($term3 != '') then
        concat(
          (: the parentheses matter: "and" binds more tightly than "or" :)
          if ($bool2 = 'and' and ($term1 != '' or $term2 != '')) then ' AND '
          else if ($bool2 = 'or' and ($term1 != '' or $term2 != '')) then ' OR '
          else if ($bool2 = 'not' and ($term1 != '' or $term2 != '')) then ' NOT '
          else ' ',
          bh:field($field3), $term3)
      else '',
      if ($term1 = '' and $term2 = '' and $term3 = '') then '/no-search-terms '
      else '')
  return encode-for-uri($queryString)
};
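(: Appends any filters, turning the semicolon-separated list into spaces :)
declare function bh:filter() as xs:string {
  if ($filter != '') then
    encode-for-uri(concat(' ', translate($filter, ';', ' ')))
  else ''
};

(: Builds the full Solr request URL. Inside an XQuery string literal '&' must be
   written as &amp;, and the URL must not contain line breaks. topic_facet gets
   its own minimum count through the per-field f.topic_facet.facet.mincount :)
declare function bh:fullQuery() as xs:string {
  concat('http://pathtoSolr/solr/select/?q=', bh:build-query(), bh:filter(),
    '&amp;version=2.2&amp;start=', $start, '&amp;rows=', $rows,
    '&amp;facet=true&amp;facet.limit=-1&amp;facet.sort=true&amp;facet.zeros=false',
    '&amp;facet.mincount=1&amp;f.topic_facet.facet.mincount=2',
    '&amp;facet.field=parent_facet&amp;facet.field=creator_facet',
    '&amp;facet.field=coverage_facet&amp;facet.field=genre_facet',
    '&amp;facet.field=topic_facet')
};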
(: Stylesheet used for display :)
let $xsl := doc('/path/search.xsl')
(: Format results :)
let $results :=
  <query-results term1="{$term1}" field1="{$field1}" bool1="{$bool1}"
     term2="{$term2}" field2="{$field2}" bool2="{$bool2}"
     term3="{$term3}" field3="{$field3}" filter="{bh:filter()}">
    {
      if ($term1 = '' and $term2 = '' and $term3 = '') then
        <response hits="0">Your search returned 0 results</response>
      else doc(bh:fullQuery())/child::*
    }
  </query-results>
return xslt:stream-transform($results, $xsl, ())

The results are transformed using XSLT (2.0).
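To make the query-building rules easier to check, here is a Python mirror of bh:field, bh:build-query, bh:filter and bh:fullQuery. `SOLR_SELECT` keeps the post's placeholder host. The facet settings are copied from the XQuery, with topic_facet's higher minimum expressed through Solr's per-field `f.topic_facet.facet.mincount` parameter. This is a sketch of the logic, not a replacement for the eXist code.

```python
from urllib.parse import quote, urlencode

SOLR_SELECT = "http://pathtoSolr/solr/select/"  # placeholder host, as in the post

# Same mapping as bh:field; any other code ('ft') searches Solr's default field.
FIELDS = {"au": "creator:", "ti": "title:", "ab": "abstract_text:", "su": "topic_text:"}
OPS = {"and": " AND ", "or": " OR ", "not": " NOT "}


def build_query(terms, fields, bools):
    """terms and fields hold three values each; bools holds bool1 and bool2."""
    terms = [t.replace("'", '"') for t in terms]  # same quote swap as $term1..$term3
    parts = []
    for i, (term, code) in enumerate(zip(terms, fields)):
        if not term:
            continue
        if i == 0:
            joiner = ""
        elif any(terms[:i]):
            # An operator is only used when some earlier term is present.
            joiner = OPS.get(bools[i - 1], " ")
        else:
            joiner = " "
        parts.append(joiner + FIELDS.get(code, "") + term)
    return "".join(parts) if parts else "/no-search-terms "


def filter_clause(filter_param):
    """bh:filter: semicolon-separated filters become space-separated clauses."""
    return " " + filter_param.replace(";", " ") if filter_param else ""


def full_query_url(terms, fields, bools, filter_param="", start=0, rows=25):
    q = build_query(terms, fields, bools) + filter_clause(filter_param)
    params = [
        ("q", q), ("version", "2.2"), ("start", start), ("rows", rows),
        ("facet", "true"), ("facet.limit", "-1"), ("facet.sort", "true"),
        ("facet.zeros", "false"), ("facet.mincount", "1"),
        ("f.topic_facet.facet.mincount", "2"),  # per-field override for topics
    ]
    for name in ("parent", "creator", "coverage", "genre", "topic"):
        params.append(("facet.field", name + "_facet"))
    # A list of pairs (not a dict) keeps the repeated facet.field parameters.
    return SOLR_SELECT + "?" + urlencode(params, quote_via=quote)
```

For instance, `build_query(["maple", "sugar", ""], ["ti", "su", "ft"], ["or", "and"])` gives `title:maple OR topic_text:sugar`. Building the URL from a list of pairs avoids the line-break and `&` escaping problems that a hand-written URL string runs into.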

I find this works pretty well, but I’m also very interested in exploring this new HTTP extension model which is pretty much what I was hoping for back when I started exploring the Solr/eXist combination. (Which just demonstrates once again what a great community of developers eXist has.)

Documents are still added to the index using a combination of XQuery and XForms. Next week I’ll be refining our editor to make submitting completed records to the index a one- (maybe two-) button process. I’m pretty pleased with Solr and have gotten a very positive response to the browsing and limiting features. I still have some features to work on. For example, while my users can add filters to search results, they cannot remove them. This seems like a pretty easy JavaScript fix, but I haven’t had the time to implement it yet.
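Removing a filter could also be handled when the result page's links are built, rather than in JavaScript. Here is a Python sketch of that logic. It assumes filters travel in the semicolon-separated `filter` request parameter that bh:filter reads. The example URL and filter values are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def remove_filter(url, unwanted):
    """Return the same search URL with one entry dropped from the filter list."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    rebuilt = []
    for key, value in params:
        if key == "filter":
            kept = [f for f in value.split(";") if f and f != unwanted]
            if not kept:
                continue  # drop the parameter entirely when no filters remain
            value = ";".join(kept)
        rebuilt.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(rebuilt)))
```

Each active filter on the results page could then link to `remove_filter(current_url, that_filter)`.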

Solr, finally

April 4, 2007

It took me about 3 weeks after the Solr preconference at code4lib, but I finally have Solr running semi-smoothly with my web application using Cocoon. I didn’t expect it to take so long, but most of that time was spent learning how to use Cocoon (and trying to learn Java). Ideally I would like to have my XQueries send POST and GET requests to Solr, which can be done using Java. However, the Java solution has a much steeper learning curve than the Cocoon solution I currently have in place. Because the release is only two weeks away, I’m sticking with Cocoon for now, with an eventual move to a Java/XQuery solution. Here is what my setup currently looks like:

1) A Solr instance on port 8983, with my website running on port 80 on the same machine. Port 8983 is firewalled so no one can come along and wipe out my index with a delete request.

2) An XQuery that pulls data from my METS records for indexing, either a single record or multiple records, depending on the parameters. Using an XSL stylesheet, I generate an XForm (with the XQuery results as the instance data of the form). This form then uses POST to send the data to the Solr index. A second button on the form sends a commit command to Solr.

3) A Cocoon pipeline that sends GET requests to Solr and transforms the response using XSL. This feature took me a depressingly long time to figure out, in spite of the fact that I found this thread pretty early on.

One of the problems I was running into was that I had changed my XSLT transformer from Xalan to Saxon (so I could use XSLT 2.0). Saxon does not allow daisy-chaining (pulling results from one pipeline through another pipeline, or applying multiple transformations). I adjusted my cocoon.xconf and sitemap.xmap to add Xalan as an additional transformer and call it only in the pipeline below.

The pipeline for handling search requests looks like this:

<map:match pattern="search">
   <map:generate type="request">
      <map:parameter name="generate-attributes" value="true"/>
   </map:generate>
   <map:transform type="xslt-xsltc" src="solr.xsl">
      <map:parameter name="use-request-parameters" value="true"/>
   </map:transform>
   <map:transform type="cinclude" />
   <map:transform type="xslt-xsltc" src="searchResults.xsl" />
   <map:serialize type="xml"/>
</map:match>

solr.xsl transforms the parameters sent from the search form into Solr-style parameters and outputs a cinclude element whose src is the Solr request URL. The cinclude transformer then fetches that URL from Solr as a GET request. (You can also use cinclude to POST data, but I found it more difficult than posting from the XForm.) The final XSL stylesheet transforms the results into something attractive for the user.
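The final stylesheet works from Solr's XML response. In that format, hits sit in `<result name="response" numFound="…">` and each `<doc>` holds typed elements (`str`, `arr`, `int`, …) carrying a `name` attribute. Here is a small Python sketch that reads the same structure, for testing what the display step will see. The default field name `title` is a placeholder.

```python
import xml.etree.ElementTree as ET


def summarize_response(xml_text, field="title"):
    """Return (numFound, [first value of `field` for each doc])."""
    root = ET.fromstring(xml_text)
    result = root.find("result")  # <result name="response" numFound=".." start="..">
    if result is None:
        return 0, []
    hits = int(result.get("numFound", "0"))
    values = []
    for doc in result.findall("doc"):
        node = doc.find(f"*[@name='{field}']")  # str, arr, int... all carry @name
        if node is None:
            values.append(None)
        elif node.tag == "arr":  # multi-valued field: take the first value
            first = node.find("*")
            values.append(first.text if first is not None else None)
        else:
            values.append(node.text)
    return hits, values
```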

Here is what my solr.xsl looks like:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns="http://www.w3.org/1999/xhtml" version="1.0">

<xsl:strip-space elements="*"/>
<xsl:output media-type="text/xml" method="xml"/>
<xsl:param name="term1"/>
<xsl:param name="field1"/>
<xsl:param name="term2"/>
<xsl:param name="field2"/>
<xsl:param name="term3"/>
<xsl:param name="field3"/>
<xsl:param name="bool1"/>
<xsl:param name="bool2"/>
<xsl:param name="start"/>
<xsl:param name="rows"/>
<xsl:param name="indent"/>
<xsl:template match="/">
   <xsl:variable name="param1">
      <xsl:choose>
         <xsl:when test="string-length(normalize-space($term1)) > 1">
            <xsl:choose>
               <xsl:when test="$field1 = 'kw'">
                  <xsl:value-of select="$term1"/></xsl:when>
               <xsl:when test="$field1 = 'ti'">
                  <xsl:value-of select="concat('title:','(',$term1,')')"/></xsl:when>
               <xsl:when test="$field1 = 'au'">
                  <xsl:value-of select="concat('creator:','(',$term1,')')"/></xsl:when>
               <xsl:when test="$field1 = 'su'">
                  <xsl:value-of select="concat('subject:','(',$term1,')')"/></xsl:when>
               <xsl:when test="$field1 = 'ab'">
                  <xsl:value-of select="concat('text:','(',$term1,')')"/></xsl:when>
               <xsl:otherwise><xsl:value-of select="$term1"/></xsl:otherwise>
            </xsl:choose>
         </xsl:when>
      </xsl:choose>
   </xsl:variable>
   <xsl:variable name="param2">
      <!-- same as param1, using field2 and term2 -->
   </xsl:variable>
   <xsl:variable name="param3">
      <!-- same as param1, using field3 and term3 -->
   </xsl:variable>
   <xsl:variable name="boolean1">
      <xsl:choose>
        <xsl:when test="string-length(normalize-space($term2)) > 1">
         <xsl:choose>
          <xsl:when test="$bool1 = 'and'"> AND </xsl:when>
          <xsl:when test="$bool1 = 'or'"> OR </xsl:when>
          <xsl:when test="$bool1 = 'not'"> NOT </xsl:when>
          <xsl:otherwise> AND </xsl:otherwise>
         </xsl:choose>
       </xsl:when>
       <xsl:otherwise> </xsl:otherwise>
     </xsl:choose>
   </xsl:variable>
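   <xsl:variable name="boolean2">
      <!-- same as boolean1, testing term3 and using bool2 -->
   </xsl:variable>
   <!-- pulling all the params together; normalize-space removes the gaps left by empty parts -->
   <xsl:variable name="params"
      select="normalize-space(concat($param1,' ',$boolean1,' ',$param2,' ',$boolean2,' ',$param3))"/>
   <!-- Variables in an attribute need {} (attribute value templates), '&' must be
        written as &amp;, and spaces in the query are sent as '+' -->
   <ci:include
      xmlns:ci="http://apache.org/cocoon/include/1.0"
      src="http://localhost:8983/solr/select/?q={translate($params, ' ', '+')}&amp;version=2.2&amp;start={$start}&amp;rows={$rows}&amp;indent={$indent}"/>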
</xsl:template>
</xsl:stylesheet>
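The Cocoon-era stylesheet maps parameters a little differently from the later XQuery. Fielded terms are wrapped in parentheses, as in `title:(maple syrup)`, so a multi-word term stays inside one field, and terms of one character or less are ignored. Here is a Python mirror for experimenting with the rules. The stylesheet elides param2, param3 and boolean2, so the sketch assumes they repeat the param1/boolean1 logic with the next field, term and operator.

```python
from urllib.parse import quote_plus

# Field mapping taken from solr.xsl; 'kw' and unknown codes search the default field.
XSL_FIELDS = {"ti": "title", "au": "creator", "su": "subject", "ab": "text"}
OPS = {"and": "AND", "or": "OR", "not": "NOT"}


def long_enough(term):
    """string-length(normalize-space($term)) > 1"""
    return len(" ".join(term.split())) > 1


def param(term, code):
    if not long_enough(term):
        return ""
    name = XSL_FIELDS.get(code)
    return f"{name}:({term})" if name else term


def boolean(op, next_term):
    """An operator is emitted only when the following term is present."""
    return OPS.get(op, "AND") if long_enough(next_term) else ""


def solr_q(terms, fields, bools):
    raw = " ".join([
        param(terms[0], fields[0]), boolean(bools[0], terms[1]),
        param(terms[1], fields[1]), boolean(bools[1], terms[2]),
        param(terms[2], fields[2]),
    ])
    return " ".join(raw.split())  # like normalize-space()


def select_url(q, start=0, rows=25, indent="on"):
    return ("http://localhost:8983/solr/select/?q=" + quote_plus(q)
            + f"&version=2.2&start={start}&rows={rows}&indent={indent}")
```

Running it shows one quirk carried over from the stylesheet. Because boolean1 only checks the following term, an empty first term with a second term present yields a leading operator, for example `AND title:(maple)`.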

For other approaches using Cocoon, check out SolrForrest, flowscripts, or try using the WebDAV module to talk to REST interfaces.

Resources:

Solr

Cocoon