From my OAI Diaries

I am trying to learn OAI for the MIMO project. I am currently looking at Jeff Young's oaicat. The problem with it is that, when I started to look at it, I had only a vague idea, or none at all, of the OAI metadata harvesting protocol, Java servlets, Tomcat, Ant, etc.

Tomcat

Now I am a little more familiar with Tomcat, but I still have problems configuring it on some of my test servers. On my laptop I use it with Cygwin, and it seems to hang after a while. So far it runs best on my web server. I need to look into this in more detail to describe the problem better.

edit: my experiments on this front

Documentation

I find it a bit confusing that the oaicat documentation is spread across several different pages, and it is not really detailed enough for a servlet newbie like me:

Code

Like the documentation, the code is spread across several places and comes in several forms, and at least the war file seems to be outdated:

Starting oaicat

This is a quote from the oaicat demo. I have been starting Tomcat from that directory, and it has worked correctly:

Caveat! This webapp assumes that Tomcat will be started from the CATALINA_HOME subdirectory. If these pages appear, but the servlets invoked from the forms below fail, chances are you started Tomcat from a different location. In this case, edit the webapps/oaicat/WEB-INF/web.xml file and change the following relative reference to an absolute file path.
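Why does the starting directory matter? In Java, a relative path is resolved against the JVM's working directory (the `user.dir` system property), which is normally the directory Tomcat was started from, not CATALINA_HOME. Presumably that is what happens to oaicat's relative references. A minimal sketch, using the relative etdms2dc.xsl path from the demo's oaicat.properties and a hypothetical helper `resolveAgainst`:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class RelativePathDemo {

    // Relative reference as it appears in the demo's oaicat.properties.
    static final String XSLT_NAME = "webapps/oaicat/WEB-INF/etdms2dc.xsl";

    // Hypothetical helper: anchor a relative reference at an explicit base
    // directory (e.g. CATALINA_HOME) instead of the JVM's working directory.
    static Path resolveAgainst(String baseDir, String relative) {
        return Paths.get(baseDir).resolve(relative).normalize();
    }

    public static void main(String[] args) {
        // Implicit behaviour: relative paths are resolved against user.dir,
        // i.e. the directory the JVM (Tomcat) was started from.
        System.out.println("user.dir = " + System.getProperty("user.dir"));
        System.out.println("  -> " + Paths.get(XSLT_NAME).toAbsolutePath());

        // The same reference anchored at CATALINA_HOME, if that variable is set.
        String catalinaHome = System.getenv("CATALINA_HOME");
        if (catalinaHome != null) {
            System.out.println("CATALINA_HOME = " + catalinaHome);
            System.out.println("  -> " + resolveAgainst(catalinaHome, XSLT_NAME));
        }
    }
}
```

Running this on a server where Tomcat misbehaves should quickly show whether the relative references point where they are supposed to.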

Deploying

Today I finally deployed the war file from http://code.google.com/p/oaicat successfully. The whole thing runs fine, and I can change oaicat.properties and restart Tomcat. I don't really like the HTML that comes with oaicat, but it's okay. I would rather be able to switch between the naked XML and the HTML view, and I think it would help to give more information on the kind of data I have to fill in. Some of this should be trivial to adjust; if nothing else works out, I could run oaicat without oaicat.xsl.
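One way to look at the naked XML is to send OAI-PMH requests to the servlet directly. If, as I assume, the HTML view comes from the browser applying oaicat.xsl to the XML response, then "view source" in most browsers shows the raw XML. A small sketch that builds such request URLs; the base URL `http://localhost:8080/oaicat/OAIHandler` is an assumption (default Tomcat port and the demo's servlet mapping), so adjust it to your deployment:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class OaiRequests {

    // Assumption: default Tomcat port and the demo's OAIHandler servlet mapping.
    static final String BASE_URL = "http://localhost:8080/oaicat/OAIHandler";

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Build an OAI-PMH GET request: "verb" plus optional name/value argument pairs.
    // A trailing name without a value is ignored.
    static String request(String baseUrl, String verb, String... nameValuePairs) {
        StringJoiner query = new StringJoiner("&", baseUrl + "?", "");
        query.add("verb=" + enc(verb));
        for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
            query.add(enc(nameValuePairs[i]) + "=" + enc(nameValuePairs[i + 1]));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Paste these into a browser and use "view source" to see the raw XML.
        System.out.println(request(BASE_URL, "Identify"));
        System.out.println(request(BASE_URL, "ListMetadataFormats"));
        System.out.println(request(BASE_URL, "ListRecords", "metadataPrefix", "oai_dc"));
        System.out.println(request(BASE_URL, "ListRecords", "metadataPrefix", "oai_etdms"));
    }
}
```

Identify, ListMetadataFormats and ListRecords are standard OAI-PMH verbs; ListRecords requires a metadataPrefix, which is why oai_dc and oai_etdms appear as arguments.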

Updating

For updating, one of the documentation pages (not the README in src) says to copy the new oaicat.jar into the deployed directory. I did this on my PC at work today and it worked like a dream. What I kind of miss is a way to see which version of oaicat I am running.
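If a jar records its version at all, it usually does so in META-INF/MANIFEST.MF as Implementation-Version or Specification-Version. I have not checked whether oaicat.jar sets either attribute; this sketch just reads whatever is there. The default path is only an example (jars in a webapp normally live under WEB-INF/lib):

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class JarVersion {

    // Return Implementation-Version (falling back to Specification-Version)
    // from the jar's manifest, or null if neither is present.
    static String versionOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return null;
            }
            Attributes main = manifest.getMainAttributes();
            String version = main.getValue(Attributes.Name.IMPLEMENTATION_VERSION);
            return version != null ? version : main.getValue(Attributes.Name.SPECIFICATION_VERSION);
        }
    }

    public static void main(String[] args) throws IOException {
        // Example location of the deployed jar; pass the real path as an argument.
        String path = args.length > 0 ? args[0] : "webapps/oaicat/WEB-INF/lib/oaicat.jar";
        String version = versionOf(path);
        System.out.println(version != null ? version : "no version recorded in manifest");
    }
}
```

If the manifest turns out to be empty, the same check could at least be run against a jar I build myself, once the build adds a version attribute.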

Ant

The next step seems to be understanding the build process. I can check out a more current version via svn; the procedure is described on Google Code. As far as I can tell, this build process is a prerequisite for what I am planning in the next section, so the next task is to set up a working oaicat from source via Ant.

edit: I convinced Ant to build something, but now I am not sure what to do with it. It gives me many warnings (mainly deprecation warnings), but at the end it says BUILD SUCCESSFUL. The result looks very much like the .tar.gz that I can also download. I am not sure which file I have to copy to the web server. It would probably be best if I could use Eclipse to build and run on the same machine, but currently I can't because of this weird Tomcat problem on my laptop. So this is still on my to-do list.

Netbeans

Update: I have now been able to build oaicat from netbeans.

Setting up oaicat - The plan

My idea at the moment is to set up an OAI server that understands the MPX I get from the database, and then to use various XSLTs to transform the data into museumdat and LIDO.

The customization notes are in the README (see above).

Looking at the demo config I: Classes

Essential classes (from oaicat.properties):

AbstractCatalog.oaiCatalogClassName=ORG.oclc.oai.server.catalog.FileSystemOAICatalog
AbstractCatalog.recordFactoryClassName=ORG.oclc.oai.server.catalog.FileRecordFactory

Can I just use them, or do I need to change them? I will need to adapt the RecordFactory to my needs, and possibly also the FileSystem class. Also, I suppose I had better use the NewFileSystem and NewFileRecordFactory.
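Because the properties only name the classes, I would expect to be able to point them at my own subclasses (for example a RecordFactory adapted to MPX) without editing oaicat itself. The sketch below shows the generic Java pattern for loading a class named in a properties file and checking its type; I have not checked how oaicat itself instantiates these classes (e.g. which constructor it calls), so it stops at the lookup. The default properties path is only an example:

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

public class ConfiguredClasses {

    // Look up a class name in the properties and load it, checking that it is a
    // subtype of the expected type (asSubclass throws ClassCastException otherwise).
    static <T> Class<? extends T> classFor(Properties props, String key, Class<T> expected)
            throws ClassNotFoundException {
        String className = props.getProperty(key);
        if (className == null) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return Class.forName(className.trim()).asSubclass(expected);
    }

    public static void main(String[] args) throws IOException {
        // Path to oaicat.properties; the default is only an example.
        String path = args.length > 0 ? args[0] : "webapps/oaicat/WEB-INF/oaicat.properties";
        Properties props = new Properties();
        try (Reader in = new FileReader(path)) {
            props.load(in);
        }
        for (String key : new String[] {
                "AbstractCatalog.oaiCatalogClassName",
                "AbstractCatalog.recordFactoryClassName"}) {
            try {
                System.out.println(key + " -> " + classFor(props, key, Object.class).getName());
            } catch (ClassNotFoundException e) {
                // Expected if oaicat.jar (or my own subclass) is not on the classpath.
                System.out.println(key + " -> not on classpath: " + e.getMessage());
            }
        }
    }
}
```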

Data and Transformations

The oaicat demo comes with five metadata records describing publications. The native format is oai_etdms, the Electronic Theses and Dissertations Metadata Standard. It is not a spectacular format, which makes it a good demonstration. The native format is mapped to oai_dc by an XSLT.

So how is this reflected in oaicat.properties?

FileSystemOAICatalog.homeDir=webapps/oaicat/WEB-INF/META/

The line above seems self-explanatory; the lines below are worth a closer look:

Crosswalks.oai_dc=ORG.oclc.oai.server.crosswalk.FileMap2oai_dc
Crosswalks.oai_etdms=ORG.oclc.oai.server.crosswalk.FileMap2oai_etdms
FileMap2oai_dc.xsltName=webapps/oaicat/WEB-INF/etdms2dc.xsl
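My reading of these lines, which I have not verified in the oaicat source: each `Crosswalks.<metadataPrefix>` entry registers a metadata format the server offers and names the crosswalk class that produces it, and `FileMap2oai_dc.xsltName` tells that class which stylesheet to use. This sketch only parses the three demo lines with `java.util.Properties` to make that structure visible:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class CrosswalkConfig {

    static final String PREFIX = "Crosswalks.";

    // Collect every "Crosswalks.<metadataPrefix>=<class>" entry into a
    // metadataPrefix -> crosswalk class name map (sorted for stable output).
    static Map<String, String> crosswalks(Properties props) {
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                result.put(key.substring(PREFIX.length()), props.getProperty(key).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // The three lines quoted from the demo's oaicat.properties.
        String demo = String.join("\n",
            "Crosswalks.oai_dc=ORG.oclc.oai.server.crosswalk.FileMap2oai_dc",
            "Crosswalks.oai_etdms=ORG.oclc.oai.server.crosswalk.FileMap2oai_etdms",
            "FileMap2oai_dc.xsltName=webapps/oaicat/WEB-INF/etdms2dc.xsl");
        Properties props = new Properties();
        props.load(new StringReader(demo));

        crosswalks(props).forEach((prefix, cls) -> System.out.println(prefix + " -> " + cls));
        System.out.println("oai_dc stylesheet: " + props.getProperty("FileMap2oai_dc.xsltName"));
    }
}
```

For MIMO I would expect to add entries of the same shape for museumdat and LIDO, pointing at crosswalk classes and stylesheets of my own; none of those exist yet.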

The XSLT named in the FileMap2oai_dc.xsltName line converts from the native format (etdms) to DC, as the filename suggests. Here it is, shortened a bit:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                              xmlns:etdms="http://www.ndltd.org/standards/metadata/etdms/1.0/"
                              xmlns:dc="http://purl.org/dc/elements/1.1/"
                              exclude-result-prefixes="etdms">
  <xsl:output method="xml"
        omit-xml-declaration="yes"
        encoding="utf-8"/>

  <xsl:template match="/etdms:thesis">
    <oai_dc:dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <xsl:apply-templates/>
    </oai_dc:dc>
  </xsl:template>

  <xsl:template match="etdms:title">
    <dc:title><xsl:apply-templates/></dc:title>
  </xsl:template>

...

  <xsl:template match="etdms:degree">
  </xsl:template>

</xsl:stylesheet>
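To try a stylesheet like this outside Tomcat (for example, a future MPX-to-LIDO one), the JDK's `javax.xml.transform` API is enough. The sketch uses a trimmed copy of the stylesheet above (title and degree templates only, without the schemaLocation attribute) and a made-up etdms record; the record content is purely illustrative:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CrosswalkTest {

    // Trimmed copy of etdms2dc.xsl: only the title and degree templates.
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"",
        "    xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\"",
        "    xmlns:etdms=\"http://www.ndltd.org/standards/metadata/etdms/1.0/\"",
        "    xmlns:dc=\"http://purl.org/dc/elements/1.1/\"",
        "    exclude-result-prefixes=\"etdms\">",
        "  <xsl:output method=\"xml\" omit-xml-declaration=\"yes\" encoding=\"utf-8\"/>",
        "  <xsl:template match=\"/etdms:thesis\">",
        "    <oai_dc:dc><xsl:apply-templates/></oai_dc:dc>",
        "  </xsl:template>",
        "  <xsl:template match=\"etdms:title\">",
        "    <dc:title><xsl:apply-templates/></dc:title>",
        "  </xsl:template>",
        "  <xsl:template match=\"etdms:degree\"/>",
        "</xsl:stylesheet>");

    // Made-up etdms record, for illustration only.
    static final String RECORD = String.join("\n",
        "<etdms:thesis xmlns:etdms=\"http://www.ndltd.org/standards/metadata/etdms/1.0/\">",
        "  <etdms:title>A Sample Thesis</etdms:title>",
        "  <etdms:degree><etdms:name>PhD</etdms:name></etdms:degree>",
        "</etdms:thesis>");

    // Apply an XSLT 1.0 stylesheet to an XML document, both given as strings.
    static String transform(String xslt, String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(XSLT, RECORD));
    }
}
```

The empty etdms:degree template is what keeps the degree out of the DC output; any element without a matching template would have its text copied through by XSLT's built-in rules.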

What next?

So much for my analysis of the oaicat demo installation. I guess in the next post I need to go through the customization notes provided by oaicat.

Attachment: README.txt (4.34 KB)