Archive for the ‘XQuery’ Category

Fixing temporary fragments

February 12, 2008

Surprise! Or at least it was to me… My XForms are not the source of most of my temporary fragments, at least as far as I can tell. This is good news, because the XForms are the most complicated part of my web app and would take the longest to troubleshoot and fix.

I’ve followed the trail of temporary fragments on my development machine by tracking the eXist logs. After every query/page request I check the logs to see whether a temporary fragment has been created. In addition, I’ve set up a little xquery that pulls the results out of /db/system/temp so I can see exactly what each temporary fragment is, which lets me pinpoint the problems in my xqueries.

The results surprised me, although after a little more reading on eXist and temporary fragments, they make sense. In particular, this thread on the eXist mailing list was enlightening. I use the doc() function to return the XML response from Solr and then transform it with XSL stored in eXist. For some reason I was calling the results like this: doc('solrResults')/child::* As noted in the thread, applying an xpath step to the returned fragment causes it to be stored as a temporary fragment. The fix was easy enough: I simply removed the child::* step and adjusted my XSL.

I have a few other queries that were also creating temporary fragments, and the issues are very similar: the use of xpath on a returned fragment, rather than using XSL, or even creating variables in the xquery to get the values. For the most part these queries have been pretty simple to rewrite, and doing so has also given me a chance to clean up some of my code. In addition, I’ve allocated more memory to Tomcat. Hopefully these two adjustments will alleviate the issues we have been having for the past few weeks.

XQuery and OAI

September 25, 2007

It has been a while since I wrote this post about creating an XQuery OAI data provider. I haven’t done much work on the code since early spring, but Mike Giarlo, who was also working on the code, has taken it much further and has published a version of the code here. It validates, and I believe the only missing piece is resumption tokens, which may not be too hard to add. Because Princeton uses X-Hive as their database, the code is X-Hive specific and may not work in eXist without some modifications. I haven’t tried it with our collections yet, but hope to in the next week or two.

Update on CDI XForms

June 14, 2007

Well, the new version of the Firefox XForms extension came out last week, and all my forms are working again. I am working on a version for Orbeon as well, and I have a lot of thoughts about Orbeon vs. Firefox, but will save them for a later post.

I have posted a version of my metadata “processing pages” on xforms@code4lib. This version is in xquery and includes the new tabbed version of the DC forms and also a version of my Solr form. I have also included a sample record. If you want to take it for a test drive, you can install eXist and Solr in Tomcat and add the sample record to eXist under /db/mets. You may have to make some changes to addIndex.xql to specify your Solr instance.

Solr revisited

April 23, 2007

Pretty much everything I wrote in my previous post about Solr is now obsolete. Up until last Sunday evening I had Solr running with Cocoon. However, I had all sorts of problems with Cocoon, some stemming from my complete inability to go back to using XSLT 1.0 (which I needed to do in order to take advantage of daisy-chaining), and some stemming from bad (non-HTML) characters in our metadata, most likely pasted in from Word documents.

At the same time I was struggling with Cocoon, this conversation was happening on the eXist listserv, which reminded me that I could use the eXist doc() function to send Solr requests, and transform the resulting response. I’m blaming being overworked as the reason I wasted so much time with Cocoon when I already use this function for retrieving XSL stylesheets for doing transformations in nearly every XQuery that I write.

So now my requests to Solr are sent via an xquery that looks like this:

xquery version "1.0";

declare namespace util="http://exist-db.org/xquery/util";

declare namespace request="http://exist-db.org/xquery/request";

declare namespace x="http://exist.sourceforge.net/dc-ext";

declare namespace xlink = "http://www.w3.org/1999/xlink";

declare namespace xslt="http://exist-db.org/xquery/transform";

declare namespace bh = "http://cdi.uvm.edu/cdi/ns";
(: Fields for limiting search :)

declare variable $field1 := request:request-parameter('field1', 'ft');

declare variable $field2 := request:request-parameter('field2', 'ft');

declare variable $field3 := request:request-parameter('field3', 'ft');

(: Search terms :)

declare variable $term1 := replace(request:request-parameter('term1', ''), "'", '"');

declare variable $term2 := replace(request:request-parameter('term2', ''), "'", '"');

declare variable $term3 := replace(request:request-parameter('term3', ''), "'", '"');

(: Boolean operators :)

declare variable $bool1 := request:request-parameter('bool1', 'and');

declare variable $bool2 := request:request-parameter('bool2', 'and');

(: Variables for paging through results :)

declare variable $start := request:request-parameter('start', 0) cast as xs:integer;

declare variable $rows := request:request-parameter('rows', 25) cast as xs:integer;

(: Filters applied to search results :)

declare variable $filter := request:request-parameter('filter', '');

(: Applies correct Solr field for fielded searching :)
declare function bh:field($field as xs:string) as xs:string {
   if ($field = "au") then
      "creator:"
   else if ($field = "ti") then
      "title:"
   else if ($field = "ab") then
      "abstract_text:"
   else if ($field = "su") then
      "topic_text:"
    else ''
};
(: Builds query parameters as a string :)
declare function bh:build-query() as xs:string {
let $queryString :=
   concat(
        if ($term1 != '') then
          concat(bh:field($field1), $term1)
        else '',
        if ($term2 != '') then
           concat(
            if ($bool1 = 'and' and $term1 != '') then ' AND '
            else if ($bool1 = 'or' and $term1 != '') then ' OR '
            else if ($bool1 = 'not' and $term1 != '') then ' NOT '
            else ' ', bh:field($field2), $term2)
        else '',
        if ($term3 != '') then
           concat(
             (: parentheses matter here: "and" binds more tightly than "or" :)
             if ($bool2 = 'and' and ($term1 != '' or $term2 != '')) then ' AND '
             else if ($bool2 = 'or' and ($term1 != '' or $term2 != '')) then ' OR '
             else if ($bool2 = 'not' and ($term1 != '' or $term2 != '')) then ' NOT '
             else ' ', bh:field($field3), $term3)
        else '',
        if ($term1 = '' and $term2 = '' and $term3 = '') then
           concat('/no-search-terms', ' ')
        else '')
  return encode-for-uri($queryString)
 };
declare function bh:filter(){
 if($filter != '') then
    encode-for-uri(concat(' ',translate($filter,';',' ')))
 else ''
};
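declare function bh:fullQuery() as xs:string {
  (: A bare ampersand is not allowed in an XQuery string literal, so each one is
     written as &amp;. Building the URL with concat() keeps line breaks and
     indentation out of the request string. :)
  concat('http://pathtoSolr/solr/select/?q=', bh:build-query(), bh:filter(),
    '&amp;version=2.2&amp;start=', $start, '&amp;rows=', $rows,
    '&amp;facet=true&amp;facet.limit=-1&amp;facet.sort=true&amp;facet.zeros=false',
    '&amp;facet.field=parent_facet&amp;facet.mincount=1',
    '&amp;facet.field=creator_facet&amp;facet.mincount=1',
    '&amp;facet.field=coverage_facet&amp;facet.mincount=1',
    '&amp;facet.field=genre_facet&amp;facet.mincount=1',
    '&amp;facet.field=topic_facet&amp;facet.mincount=2')
};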
(: Stylesheet used for display :)
let $xsl := doc('/path/search.xsl')
(: Format results :)
let $results :=
<query-results term1="{$term1}" field1="{$field1}" bool1="{$bool1}"
   term2="{$term2}" field2="{$field2}" bool2="{$bool2}"
   term3="{$term3}" field3="{$field3}" filter="{bh:filter()}">
      {
         if ($term1 = '' and $term2 = '' and $term3 = '') then
             <response hits="0">Your search returned 0 results</response>
         else doc(bh:fullQuery())/child::*
      }
</query-results>
return xslt:stream-transform($results, $xsl, () )

The results are transformed using XSLT (2.0).
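If you want to experiment with the Solr side outside eXist, here is a minimal Python sketch of the same query construction. The field codes and Solr field names come from bh:field above, and the base URL is the placeholder from the query. The function names are hypothetical. One deliberate swap: the sketch uses Solr's per-field `f.topic_facet.facet.mincount` override in place of repeated `facet.mincount` parameters. Letting `urlencode` do the escaping means no ampersands or whitespace have to be assembled by hand.

```python
from urllib.parse import urlencode

# Field codes from the search form mapped to Solr field prefixes (mirrors bh:field).
FIELD_PREFIX = {"au": "creator:", "ti": "title:", "ab": "abstract_text:", "su": "topic_text:"}
BOOL_OP = {"and": " AND ", "or": " OR ", "not": " NOT "}
FACET_FIELDS = ["parent_facet", "creator_facet", "coverage_facet", "genre_facet", "topic_facet"]


def build_query(terms, fields, bools):
    """Join non-empty (field, term) clauses; bools[i] joins the clause before terms[i + 1]."""
    parts = []
    for i, term in enumerate(terms):
        if not term:
            continue
        if parts:  # an operator is only needed when an earlier clause exists
            parts.append(BOOL_OP.get(bools[i - 1], " "))
        parts.append(FIELD_PREFIX.get(fields[i], "") + term)
    return "".join(parts)


def solr_url(base, terms, fields, bools, start=0, rows=25):
    """Return a Solr select URL with paging and facet parameters like the XQuery's."""
    params = [
        ("q", build_query(terms, fields, bools)),
        ("version", "2.2"), ("start", start), ("rows", rows),
        ("facet", "true"), ("facet.limit", -1), ("facet.sort", "true"),
        ("facet.zeros", "false"), ("facet.mincount", 1),
    ]
    params += [("facet.field", f) for f in FACET_FIELDS]
    # Per-field override: topic_facet only lists values with at least 2 hits.
    params.append(("f.topic_facet.facet.mincount", 2))
    return base + "?" + urlencode(params)
```

For example, `build_query(["cats", "", "smith"], ["ti", "ft", "au"], ["and", "not"])` yields `title:cats NOT creator:smith`. The empty second term is skipped, and the second operator still applies to the third clause.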

I find this works pretty well, but I’m also very interested in exploring this new HTTP extension model which is pretty much what I was hoping for back when I started exploring the Solr/eXist combination. (Which just demonstrates once again what a great community of developers eXist has.)

Documents are still added to the index using a combination of XQuery and XForms. Next week I’ll be refining our editor to make submitting completed records to the index a one (maybe two) button process. I’m pretty pleased with Solr and have gotten a very positive response to the browsing and limiting features. I still have some features to work on. For example, while my users can add filters to search results, they cannot remove them. This seems like a pretty easy JavaScript fix, but I haven’t had the time to implement it yet.
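Instead of a JavaScript fix, you could also generate a “remove” link for each active filter on the server. Below is a minimal Python sketch of that alternative. It assumes filters travel in the `filter` parameter as ';'-separated clauses, which is how bh:filter splits them. The function names are hypothetical.

```python
from urllib.parse import urlencode


def remove_filter(filter_param, clause):
    """Return the ';'-separated filter string without one clause."""
    return ";".join(f for f in filter_param.split(";") if f and f != clause)


def removal_links(params):
    """Map each active filter clause to a query string that drops just that clause."""
    links = {}
    for clause in [f for f in params.get("filter", "").split(";") if f]:
        new_params = dict(params)
        remaining = remove_filter(params["filter"], clause)
        if remaining:
            new_params["filter"] = remaining
        else:
            del new_params["filter"]  # last filter removed: drop the parameter entirely
        links[clause] = "?" + urlencode(new_params)
    return links
```

Each link keeps the rest of the search (terms, fields, paging) intact, so the user lands on the same results minus one filter.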

A Good Year… for XML

February 16, 2007

Elliotte Rusty Harold over at IBM developerWorks is predicting an exciting year for XML (Ten predictions for XML in 2007), including a nod to XQuery, native XML databases, and XForms, all of which I’m rather heavily invested in. I’ve been pretty comfortable with the choices I made for the CDI; still, Harold’s predictions are reassuring. It is nice to hear that someone else predicts a bright future for the limb I’ve climbed out onto.

OAI data provider

February 7, 2007

I’ve pretty much finished writing my XQuery OAI data provider. The process has taken longer than I expected (particularly since the original XQuery I was using was mostly complete). However, I ended up rewriting most of it, partly to ensure that I had a thorough understanding of the code and partly to add some additional features. For example, I wanted to be able to provide unqualified Dublin Core records for all the metadata types we hold in the repository, with the flexibility to easily add additional types. Currently the query supports Qualified Dublin Core, MODS, and EAD records, and adding additional types should be trivial.
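One way to keep new metadata types easy to add is a registry of crosswalk functions. Each function turns one source format into unqualified Dublin Core. Below is a Python sketch of that pattern, not the XQuery itself. It registers only a deliberately tiny MODS mapping (title and creator), and the names are hypothetical. Real crosswalks for MODS, EAD, and qualified DC would map many more elements.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

DC = "{http://purl.org/dc/elements/1.1/}"
OAI_DC = "{http://www.openarchives.org/OAI/2.0/oai_dc/}"
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Registry: metadata type -> function returning {dc element name: [values]}.
CROSSWALKS: Dict[str, Callable[[ET.Element], Dict[str, List[str]]]] = {}


def crosswalk(kind: str):
    """Decorator that registers a crosswalk for one source metadata type."""
    def register(fn):
        CROSSWALKS[kind] = fn
        return fn
    return register


@crosswalk("mods")
def mods_to_dc(root: ET.Element) -> Dict[str, List[str]]:
    # Deliberately tiny mapping: titles and name parts only.
    return {
        "title": [t.text for t in root.findall(".//mods:titleInfo/mods:title", MODS_NS) if t.text],
        "creator": [n.text for n in root.findall(".//mods:name/mods:namePart", MODS_NS) if n.text],
    }


def to_oai_dc(kind: str, root: ET.Element) -> ET.Element:
    """Run the registered crosswalk and wrap the result in an oai_dc:dc element."""
    out = ET.Element(f"{OAI_DC}dc")
    for name, values in CROSSWALKS[kind](root).items():
        for value in values:
            ET.SubElement(out, f"{DC}{name}").text = value
    return out
```

Adding EAD support would then mean writing one more decorated function; the OAI output code stays untouched.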

Implementing the data provider is also bringing up some organizational questions. For example, how do I want to support deleted records? How about sets? For our collections I think defining sets as collections of records (rather than an item and its component parts, which is what the METS records do) makes the most sense. For deleted records I’m setting the RECORDSTATUS attribute of the mets:metsHdr element to “deleted” and deleting the actual content, metadata, and full text. I haven’t decided how to implement deleted records for the EADs yet; I think it is unlikely they will be deleted. I will probably use the revisiondesc tag, with the value of the item tag as “deleted.”
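In OAI-PMH, a deleted record is reported as a `<header status="deleted">` with no metadata. Below is a minimal Python sketch of that mapping. It assumes the METS record carries RECORDSTATUS and LASTMODDATE on its mets:metsHdr element; the identifier values in any example are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def is_deleted(mets_root):
    """True when the METS header marks the record as deleted."""
    hdr = mets_root.find(f"{METS}metsHdr")
    return hdr is not None and hdr.get("RECORDSTATUS") == "deleted"


def oai_header(mets_root, identifier):
    """Build the OAI-PMH <header>; deleted records get status="deleted"."""
    header = ET.Element(f"{OAI}header")
    if is_deleted(mets_root):
        header.set("status", "deleted")
    ET.SubElement(header, f"{OAI}identifier").text = identifier
    hdr = mets_root.find(f"{METS}metsHdr")
    lastmod = hdr.get("LASTMODDATE", "") if hdr is not None else ""
    # Day granularity (YYYY-MM-DD) taken from an ISO 8601 timestamp.
    ET.SubElement(header, f"{OAI}datestamp").text = lastmod[:10]
    return header
```

A GetRecord or ListRecords response would then emit only this header, and no `<metadata>` element, whenever `is_deleted` is true.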

I’m also getting a little hung up on resumption tokens. I have simple paging in place, but have started to wonder if there might be a better method; the guidelines are a little vague on this.
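OAI-PMH treats a resumption token as an opaque string. One common approach is to pack the harvest state (verb, metadataPrefix, set, and position) into the token itself. Below is a minimal Python sketch under that assumption; the page size and token format are hypothetical. Per the protocol, the response that completes a list carries an empty resumptionToken element. The optional `cursor` and `completeListSize` attributes tell harvesters where they are. A list that fits in one response needs no token at all.

```python
import base64
import json
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
PAGE_SIZE = 100  # hypothetical page size


def make_token(state):
    """Encode the harvest state as an opaque, URL-safe string."""
    return base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode()).decode()


def read_token(token):
    """Decode a token produced by make_token back into its state dict."""
    return json.loads(base64.urlsafe_b64decode(token.encode()).decode())


def list_page(records, verb, prefix, cursor=0, set_spec=None):
    """Return one page of records and the <resumptionToken> element (or None)."""
    chunk = records[cursor:cursor + PAGE_SIZE]
    next_cursor = cursor + PAGE_SIZE
    if cursor == 0 and next_cursor >= len(records):
        return chunk, None  # complete list in a single response: no token needed
    token_el = ET.Element(f"{OAI}resumptionToken",
                          completeListSize=str(len(records)), cursor=str(cursor))
    if next_cursor < len(records):
        token_el.text = make_token({"verb": verb, "prefix": prefix,
                                    "set": set_spec, "cursor": next_cursor})
    return chunk, token_el  # an empty token element marks the final page
```

Because the state lives in the token, the provider needs no server-side session storage. The trade-off is that a harvest can shift if records are added or deleted mid-harvest.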

Here are a few of the most helpful resources that I used while putting together the data provider.

  1. The Open Archives Initiative Protocol for Metadata Harvesting – I found this resource to be the most helpful for the actual implementation; there are lots of examples.
  2. OAI Best Practices [NSDL]
  3. Open Archives Forum Online Tutorial
  4. Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol: A Tutorial
  5. Proai 1.0 – “Proai is a repository-neutral, Java web application supporting the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) version 2.0.” This may be a reasonable alternative to the XQuery that I’m working on, though I’d like to finish it anyway.

There have also been some recent discussions in my department about being an aggregator for Vermont-based digital collections, as there are several institutions that have expressed interest in collaborative projects, or that have content that would mesh really well with the Vermont-centric nature of our current content. So there may be more fun with OAI in my future.

Getting xforms and eXist talking

October 23, 2006

I used three different methods for implementing XForms with eXist (I am currently only using #3).

  1. Orbeon Presentation Server
  2. An XQuery passed through a Cocoon pipeline, which transformed the XML results using an XSL stylesheet. The edited documents were saved to eXist using REST PUT.
  3. An XQuery that first authenticates the user and then outputs an XForm, using the retrieved document as the XForms instance and REST PUT for submission.

I looked at Orbeon about six months ago. It seems like a great product, especially in light of the poor browser support for XForms, and if I needed to make a commercially viable XForms application I would probably spend more time playing with Orbeon. However, I felt that Orbeon added yet another layer of complication to my application, and it was not necessary in our situation because our XForms are used in a controlled environment (and I don’t have the time). I can be assured that everyone using the editing interface will be using Firefox: not a great solution, but so far so good.

I generally prefer XSL to XQuery, so my XQueries have, in the past, been pretty simple, returning very raw results to an XSL stylesheet. My first attempt at XForms used XQuery to call the XML document to be edited (or to create a new document). Then I had a Cocoon pipeline specify the XSL stylesheet for transformation and the serialization option.

<map:match pattern="getDC.xq">
  <map:generate src="getDC.xq" type="xquery"/>
  <map:transform src="stylesheets/dcForms.xsl"/>
  <map:serialize type="xhtml11"/>
</map:match>

In the main sitemap.xmap file (included with eXist), xhtml11 is the serializer that outputs mime-type="application/xhtml+xml".

The XSL stylesheet then selects the node I’m interested in and uses xsl:copy-of to populate the XForms instance. The submission element uses PUT to replace the existing document with the edited document.

<xf:model>
  <xf:instance id="metadata">
    <xsl:copy-of select="/child::*"/>
  </xf:instance>

  <xf:submission id="submit" method="put" replace="all">
    <xsl:attribute name="action">
      <xsl:value-of   select="concat
           ('/exist/servlet/db/mets/collections/',
           $filename,'.mets.xml')"/>
    </xsl:attribute>
  </xf:submission>

</xf:model>
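The submission is just an HTTP PUT of the edited document to eXist’s REST servlet, at the path built in the action attribute above. To test the server side outside the browser, here is a minimal Python sketch that builds the same request. The host is a placeholder, the function name is hypothetical, and authentication is left out.

```python
import urllib.request
from urllib.parse import quote

EXIST_BASE = "http://localhost:8080"  # placeholder host; adjust for your install


def put_request(filename, xml_bytes):
    """Build (but do not send) a PUT replacing /db/mets/collections/<filename>.mets.xml."""
    url = f"{EXIST_BASE}/exist/servlet/db/mets/collections/{quote(filename)}.mets.xml"
    return urllib.request.Request(url, data=xml_bytes, method="PUT",
                                  headers={"Content-Type": "application/xml"})


# To actually send it: urllib.request.urlopen(put_request("abc123", document_bytes))
```

If the PUT works from a script but not from the form, the problem is on the XForms side rather than in eXist.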

One of the features that I found the most useful in XForms is the ability to repeat elements as needed. Here is an example from a very simple form that we used to edit Dublin Core records with XForms repeat. This field populates the dc:subject element and adds a type attribute selected from a drop-down menu. New subject elements are added or deleted when users click the add or delete buttons created by xf:trigger. I use xf:setvalue to make each new element blank; otherwise the new element would simply be a copy of an existing subject element, data included.

<xf:repeat id="repeat.subject" nodeset="//dc:subject">
  <xf:input ref=".">
   <xf:label>Subject Headings:</xf:label>
  </xf:input>

  <xf:select1 ref="@type">
   <xf:label>Type: </xf:label>
   <xf:item>
    <xf:label>topic</xf:label>
    <xf:value>topic</xf:value>
   </xf:item>
   <xf:item>
    <xf:label>name</xf:label>
    <xf:value>name</xf:value>
   </xf:item>
  </xf:select1>
 <xf:trigger class="delete" appearance="minimal">
    <xf:label>Remove</xf:label>
    <xf:action ev:event="DOMActivate">
     <xf:delete nodeset="instance('metadata')//dc:subject"
         at="index('repeat.subject')"/>
    </xf:action>
  </xf:trigger>
 <xf:trigger class="add" appearance="minimal">
    <xf:label>Add a subject field</xf:label>
    <xf:action ev:event="DOMActivate">
      <xf:insert nodeset="instance('metadata')//dc:subject"
           at="index('repeat.subject')" position="after"/>
      <xf:setvalue ref="(instance('metadata')//dc:subject)[index('repeat.subject')]" value="''"/>
    </xf:action>
  </xf:trigger>
</xf:repeat>
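The insert-then-blank pattern can be modelled outside the browser to see why the xf:setvalue step matters: inserting clones an existing node, value and attributes included. Below is a minimal Python/ElementTree sketch. It assumes the dc:subject elements are direct children of a single dc:dc element. The function names are hypothetical, and the repeat index is 1-based, as in XForms.

```python
import copy
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"


def insert_blank_subject(record, index):
    """Insert an empty dc:subject after the 1-based repeat index; return its new index."""
    subjects = record.findall(f"{DC}subject")
    new = copy.deepcopy(subjects[-1])  # clone an existing subject (here the last one)
    new.text = ""                      # the xf:setvalue step: blank the copied value
    pos = list(record).index(subjects[index - 1]) + 1
    record.insert(pos, new)
    return index + 1


def delete_subject(record, index):
    """Remove the dc:subject at the 1-based repeat index."""
    record.remove(record.findall(f"{DC}subject")[index - 1])
```

Note that the copy keeps the prototype’s type attribute. The drop-down starts on that value until the user changes it.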

This method works fine, but, as I stated in an earlier post, I wanted to route all my metadata processing XQueries through a password-authenticating XQuery. This query calls a series of metadata administrative tasks, including the XForms.

I added the following namespaces for XForms to my xquery:

declare namespace xf="http://www.w3.org/2002/xforms";
declare namespace ev="http://www.w3.org/2001/xml-events";

And the exist:serialize option:

declare option exist:serialize "method=xhtml media-type=application/xhtml+xml";

The XForm is contained in a function called by an authenticating function. I made very few changes to the XForm code I had been using in my XSL stylesheets to get it working when called by the XQuery.

Here is a round-up of some of the resources I have found to be most useful in building these forms:

  1. eXist mailing list – search the archives for XForms
  2. eXist – XQuery examples
  3. Cocoon website (pipelines)
  4. XForms tutorial – Adriaan de Jonge’s blog
  5. XForms – Tutorials and Cookbook – Wikibooks
  6. XForms for HTML Authors – W3C
  7. O’Reilly XForms Essentials by Micah Dubinko

Function(ing)

October 17, 2006

Well, it turns out that I can no longer get by without understanding how to write my own xquery functions. I finished my simple search xquery, which searches items across collections within the database. I ended up following these tips pretty closely, except that I do not store the results in an HTTP session. I plan on updating the search so that it does, but I had a little trouble writing this part of the query. I also added a function so that I could page through the results.

For the curious, my version of the simple search looks something like this:

xquery version "1.0";

declare namespace request="http://exist-db.org/xquery/request";
declare namespace mets="http://www.loc.gov/METS/";
declare namespace dc="http://purl.org/dc/elements/1.1/";
declare namespace bh="http://cdi.uvm.edu/cdi/ns";

(: calculates the number of the last result shown on the current page :)
declare function bh:getEnd($max as xs:integer, $start as xs:integer,
                           $total as xs:integer) as xs:integer {
  min(($start + $max - 1, $total))
};
let $max := 50
(: external parameters; start is a 1-based result number :)
let $query := request:request-parameter("query", "")
let $start := xs:integer(request:request-parameter("start", "1"))
(: the search :)
let $results :=
  for $hits in collection('/collection')/mets:mets/mets:dmdSec[@ID='dmdDC']
        //descendant::dc:dc[. &= $query]
  let $title := $hits/dc:title[1]
  let $id := $hits/dc:identifier
  let $author := $hits/dc:creator[1]
  let $description := $hits/dc:description
  let $type := $hits/ancestor::mets:mets/@TYPE
  return
     <dc:dc type="{string($type)}">
      {$id, $title, $author, $description}
     </dc:dc>
let $totalResults := count($results)
let $end := bh:getEnd($max, $start, $totalResults)
(: variables used for paging through the results :)
let $prevPg :=
  if ($start le 1) then ''
  else max((1, $start - $max))
let $nextPg :=
  if ($end ge $totalResults) then ''
  else $end + 1
(: putting it all together :)
let $searchResults :=
  <results query="{$query}"
  prevPg="{$prevPg}"
  nextPg="{$nextPg}"
  total="{$totalResults}" count="{$max}">
    {
     for $i in $start to $end
     let $current := $results[$i]
     return
      <result number="{$i}">{$current}</result>
    }
  </results>

return $searchResults
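Paging arithmetic is easy to get off by one, so it helps to check it in isolation. Below is a minimal Python sketch with 1-based result numbers, where a page shows results `start` through `end` inclusive. The function names are hypothetical.

```python
def page_bounds(start, page_size, total):
    """Return (end, prev_start, next_start) for the 1-based window beginning at start."""
    end = min(start + page_size - 1, total)
    prev_start = max(1, start - page_size) if start > 1 else None
    next_start = end + 1 if end < total else None
    return end, prev_start, next_start


def window(results, start, page_size):
    """Pair each result on the current page with its 1-based number."""
    end, _, _ = page_bounds(start, page_size, len(results))
    return list(enumerate(results[start - 1:end], start=start))
```

With 120 hits and 50 per page, the pages run 1–50, 51–100, and 101–120. The last page has no “next” link, and an empty result set produces no pages at all.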

I’m moving on to the advanced search tomorrow. I also have several interface design issues outstanding that need to be addressed, some content to create and some sort of news feed to implement, and November is only two weeks away.

One step forward, one step back

October 13, 2006

eXist crashed on Wednesday. Actually, crash is probably the wrong word: it seemed to be running fine but then failed to restart when I restarted Tomcat. We have backups, run early every morning, but that doesn’t help with the data that was entered during the day on Wednesday. More worrisome is that it is still unclear to me what caused the corruption in the database. I found a few discussions on the eXist mailing list that seemed to be about similar problems, but without any satisfactory answers as to why the corruption occurred.

http://thread.gmane.org/gmane.text.xml.exist/5254/focus=5254

http://thread.gmane.org/gmane.text.xml.exist/7161/focus=7248

After a day and a half of trying to figure out what went wrong, I caved and wrote to the list. I try to put that off for as long as possible, because while the list is very active and generally helpful, I hate asking a question and then figuring out the answer myself later (or, I’ll be honest, getting an answer back that makes me feel stupid). I’ve been unable to reproduce the error after replacing the corrupt instance with the one from the backup. I have a feeling it was something I was working on Wednesday morning, which means either my search xquery (which was outputting some Java exceptions) or perhaps some of the xupdates I was using to add new elements to a few hundred documents at once.

Now that we are back up and running I’m returning to the question of my search xqueries, which I think need to be a little more sophisticated, the heart of which looks like this:

let $results :=
for $hits in collection('/db/mets')/mets:mets/mets:dmdSec[@ID='dmdDC']
    //descendant::dc:dc/child::*[self::* |= $_query]
let $type := $hits/ancestor::mets:mets/@TYPE
let $title := $hits/parent::*/dc:title[1]
let $id := $hits/parent::*/dc:identifier
return
<item id="{string($id)}" type="{string($type)}">
  <title>{string($title)}</title>
  {$hits}
</item>

It is problematic because it returns multiple hits for a single document. This is a pretty easy fix, but I also ran into a problem with this query when I had over 1000 hits: I encountered a Java error (as noted above), so I will need to rework this. I can limit the number of results returned, or I could use this search to search only collection-level records rather than item-level records; most likely I will go with the first option. I have also had a request to include the author/creator field in the results, which is a minor fix.
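Collapsing field-level hits into one entry per record, with an optional cap, is a small grouping step. Below is a minimal Python sketch of the idea. It assumes each hit carries the record’s identifier, type, and title, plus the name of the matching field; that shape is hypothetical. In XQuery the same effect can come from putting the predicate on dc:dc itself, so that each record is returned once.

```python
def group_hits(hits, limit=None):
    """Collapse field-level hits into one entry per record id, keeping first-seen order."""
    grouped = {}
    for hit in hits:
        entry = grouped.setdefault(hit["id"], {"id": hit["id"], "type": hit["type"],
                                               "title": hit["title"], "fields": []})
        entry["fields"].append(hit["field"])  # remember which fields matched
    records = list(grouped.values())
    return records if limit is None else records[:limit]
```

Capping after grouping means the limit counts records rather than raw field matches.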

Update: My answer from the eXist list about the database corruption:

I fixed your issue. It wasn’t a “real” corruption, just removing the .lck files would have helped. As the exception shows, the lock files were damaged.

> org.exist.storage.lock.FileLock.read(FileLock.java:208)
> at org.exist.storage.lock.FileLock.tryLock(FileLock.java:108)
> at org.exist.storage.BrokerPool.canReadDataDir(BrokerPool.java:596)

Anyway, the startup process should handle this. After a database crash, the file locks might be incomplete. eXist will now check this.

So, that is good to know. Also eXist 1.0 and 1.1 final have just been released, I may take some time this week to upgrade to 1.1 final.

Recovered and moving on

October 5, 2006

Well, the weekend was enough time for me to recover from the extreme frustration I experienced on Friday, and I’m happy to say that my xforms are now being called by a password protected xquery.

As mentioned in my last post the big hang up was the mime type (declared in the HTTP header) for the xforms, it has to be application/xhtml+xml (which by the way IE 6 does not support). I had already written a very simple xquery and then transformed the data with an XSL stylesheet. However when I tried the transform:stream-transform using this xquery and stylesheet I had no luck forcing the HTTP header to output the correct mime type. So, much to my disappointment, I ended up having to write the entire xform in xquery. There really isn’t any drawback to this other than that I’m much more comfortable coding with XSL than xquery. The forms are working and are now password protected, so I’m feeling pretty good about that. I can continue research into the problematic mime types at a later date if I really want to go back to using XSL.

So what I have ended up with is one long xquery that authenticates the user and then initiates the various pieces of the metadata processing interface. Each portion of the interface is called by a different function in this xquery, which has let me learn user-defined xquery functions without too much pain, as I had originally written all of these as separate xqueries. I defined each action (for example: view metadata queue, create new collection, and edit records) as a separate function and call them from a main function, which first tests for session authentication and, if it doesn’t find it, presents the user with a log-on screen.
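The structure described here is a dispatch table: one entry point checks the session, then hands off to a function per action. Below is a minimal Python sketch of the pattern. The action names, session key, and returned markup are hypothetical stand-ins for the XQuery functions.

```python
from typing import Callable, Dict, Optional


def login_form() -> str:
    return "<form id='login'></form>"


def view_queue() -> str:
    return "<h1>Metadata queue</h1>"


def new_collection() -> str:
    return "<h1>Create new collection</h1>"


def edit_record() -> str:
    return "<h1>Edit record</h1>"


# Each administrative action maps to one function, like the separate XQuery functions.
ACTIONS: Dict[str, Callable[[], str]] = {
    "queue": view_queue,
    "new-collection": new_collection,
    "edit": edit_record,
}


def main(session: dict, action: Optional[str]) -> str:
    """Show the log-on screen unless the session is authenticated, then dispatch."""
    if not session.get("user"):
        return login_form()
    return ACTIONS.get(action or "queue", view_queue)()
```

Because every action passes through `main`, no page of the interface can be reached without the authentication check.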

I’m feeling pretty good about my new understanding of user created functions, and may be ready to try something a little more complex, like recursive functions.

Here is the short list of resources I found most useful for learning functions, and also for writing and troubleshooting my xforms:

Xquery Functions (http://www.stylusstudio.com/xquery/xquery_functions.html)

A “how to” for xforms (http://adriaandej.blogspot.com/)

W3C intro to xforms (http://www.w3.org/MarkUp/Forms/2003/xforms-for-html-authors)

And the eXist listserv archives (actually there seems to be a lot of talk on the listserv just in the past few weeks about eXist and xforms).

I also have a rudimentary page turning application up and running, more about that later.

Update: I just wanted to clarify where the problem with the mime type was cropping up. As one commenter noted, it is possible to change the mime type in eXist with declare option exist:serialize "method=xhtml media-type=application/xhtml+xml"; which is what I ended up using for my xquery-based xform. I had originally tried to transform the output from this xquery with an XSL stylesheet, and this is where I “lost” the application/xhtml+xml mime type. I also had an xsl:output statement in my XSL stylesheet that looked like this: <xsl:output omit-xml-declaration="yes" encoding="UTF-8" method="xhtml" indent="yes" media-type="application/xhtml+xml"/>. I got two different results. When I used transform:transform, which goes through eXist, I got the correct mime type but could not get my namespace declarations into the html tag. When I used transform:stream-transform, which bypasses eXist, I had all the correct namespaces in my html tags but the mime type was text/html. I’m guessing the problem lies in a default output setting for eXist and perhaps Tomcat.

