Exporting MPX, Part II

This is the second part of my blog series on exporting MuseumPlus (M+) to XML. I am writing this part for Frank. Here I describe how to get simple XML out of the RTF that was produced in the first part. I wrote the Perl, batch and XSLT years ago and was really curious whether it all still works, and yes, it does! The outcome of this step is nice XML in a structure that mimics the "internal" structure of the database. I put "internal" in quotes because my XML format uses the screen names of the fields; it does not really look into the representation inside the database. The next transformation I usually run is a sanity check on the data. I also correct known bugs: for example, if one collection always confuses "Hersteller" and "Produzent", I replace one with the other so that the result looks better. I call this third step a fix, because the input format and the output format remain the same. As the fourth step, I used to transform this data to museumdat.
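To make the idea of a "fix" more concrete, here is a minimal sketch in Python. It is not the code I actually use (that is done with XSLT); it only illustrates a transformation whose input and output are in the same format. The function name fix_funktion is made up for this example, and it assumes that the collection can be recognised by the verantwortlichkeit element, which may not hold for your data.

```python
import xml.etree.ElementTree as ET

NS = "http://www.mpx.org/mpx"  # namespace used in the MPX sample output


def fix_funktion(root, collection, wrong="Hersteller", right="Produzent"):
    """Replace a wrongly used funktion value for all objects of one collection.

    Assumption: the collection is identified by <verantwortlichkeit>.
    Returns the number of attributes that were changed.
    """
    changed = 0
    for obj in root.iter(f"{{{NS}}}sammlungsobjekt"):
        if obj.findtext(f"{{{NS}}}verantwortlichkeit") != collection:
            continue  # leave other collections untouched
        for ref in obj.iter(f"{{{NS}}}personKörperschaftRef"):
            if ref.get("funktion") == wrong:
                ref.set("funktion", right)
                changed += 1
    return changed


# Tiny hand-made input, not real export data.
SAMPLE = """<museumPlusExport xmlns="http://www.mpx.org/mpx">
  <sammlungsobjekt objId="1">
    <personKörperschaftRef funktion="Hersteller">Gramophone Concert Record</personKörperschaftRef>
    <verantwortlichkeit>EM-Medienarchiv</verantwortlichkeit>
  </sammlungsobjekt>
</museumPlusExport>"""

root = ET.fromstring(SAMPLE)
print(fix_funktion(root, "EM-Medienarchiv"))
```

Because only attribute values change, the result can still be checked against the same schema as the input.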

How it works

So, I unzip the contents of the archive, go to that directory in a DOS prompt and execute the stupid batch file (levelup.bat), which runs my Perl program and some XSLT transformations. If you want to do it too, you need to adapt some paths in the batch file. It should be self-explanatory. I attach the relevant XSLT. You need Saxon as an XSLT processor.

I attach the test data, the stupid batch file, which you might need to adapt to your system, and the simple Perl script.

C:
cd temp
levelup

Main Results

The levelup command should produce nice XML output in the format I call MPX-lvl2 (level 2). It gets there through a number of intermediate steps, which sometimes go wrong and which I should explain some time, but not now.

The main result is a file called "join.lvl2.mpx", which contains the combined data in MPX-lvl2. It is so short that I can quote it here (I leave out some tags, indicated by [...]; the full version is attached). Please note that the Unicode characters are correct in the file itself; they may just not display correctly here, because I pasted it directly from Notepad:

<?xml version="1.0" encoding="UTF-8"?>
<museumPlusExport xmlns="http://www.mpx.org/mpx"
                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xsi:schemaLocation="http://www.mpx.org/mpx File://C:/Perl/site/lib/XSD/mpx-lvl2.v2.xsd">
   <personKörperschaft kueId="4207" exportdatum="2010-03-11T17:36:49">
      <bearbDatum>13.06.2003</bearbDatum>
      <bemerkungen>DÜ-Vermerk: Original - Archiv</bemerkungen>
      <name>Kubik, Gerhard</name>
      <nennform>Kubik, Gerhard</nennform>
      <typ>Person</typ>
      <verantwortlichkeit>EM-Medienarchiv</verantwortlichkeit>
   </personKörperschaft>
 [...]
   <sammlungsobjekt objId="240472" exportdatum="2010-03-11T17:28:19">
      <andereNr art="Produktions-Nr.">GC 2-12539</andereNr>
      <andereNr art="ID">4</andereNr>
      <andereNr art="Alte Inv. Nr.">P 0133</andereNr>
      <andereNr art="DAT-Nr.">1</andereNr>
      <bearbDatum>13.06.2003</bearbDatum>
      <besetzung>vokal/instr.: Sängerin, Mandoline, Tamburin</besetzung>
      <credits>Staatliche Museen zu Berlin, Preußischer Kulturbesitz, Ethnologisches Museum</credits>
      <geogrBezug bezeichnung="Ethnie">Perser</geogrBezug>
      <geogrBezug bezeichnung="Land">UdSSR</geogrBezug>
      <geogrBezug bezeichnung="Ort">Tiflis</geogrBezug>
      <geogrBezug art="Region">Georgien</geogrBezug>
      <identNr art="Ident. Nr.">VII 78/0004</identNr>
      <objekttyp>Audio</objekttyp>
      <personKörperschaftRef funktion="Produzent">Gramophone Concert Record</personKörperschaftRef>
      <sachbegriff>Schellackplatte</sachbegriff>
      <titel art="Titel">Titel A: [Persisch]</titel>
      <verantwortlichkeit>EM-Medienarchiv</verantwortlichkeit>
      <verwaltendeInstitution>Ethnologisches Museum, Staatliche Museen zu Berlin</verwaltendeInstitution>
   </sammlungsobjekt>
   <sammlungsobjekt objId="1432496" exportdatum="2010-03-11T17:28:19">
      <andereNr>M30-37281/2</andereNr>
      <bearbDatum>22.10.2009</bearbDatum>
      <identNr>VII LP 5264</identNr>
      <maßangabe typ="Geschwindigkeit (Schallplatte)">33 U/min</maßangabe>
      <objekttyp>Audio</objekttyp>
      <personKörperschaftRef>kein Eintrag</personKörperschaftRef>
      <sachbegriff>Schallplatte</sachbegriff>
      <verantwortlichkeit>EM-Medienarchiv</verantwortlichkeit>
      <verwaltendeInstitution>Ethnologisches Museum, Staatliche Museen zu Berlin</verwaltendeInstitution>
   </sammlungsobjekt>
</museumPlusExport>
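If you want to check quickly what ended up in such a file, something like the following Python sketch might do. It is only an illustration and not part of my toolchain; the function name summarize is made up, and the embedded sample is a shortened copy of the output above. The one thing to watch out for is the default namespace, which has to be given explicitly when searching.

```python
import xml.etree.ElementTree as ET

NS = {"mpx": "http://www.mpx.org/mpx"}  # default namespace of the export


def summarize(root):
    """Return (objId, identNr, sachbegriff) for every sammlungsobjekt."""
    rows = []
    for obj in root.findall("mpx:sammlungsobjekt", NS):
        rows.append((
            obj.get("objId"),
            obj.findtext("mpx:identNr", default="", namespaces=NS),
            obj.findtext("mpx:sachbegriff", default="", namespaces=NS),
        ))
    return rows


# Shortened copy of the sample; for a real file use
# ET.parse("join.lvl2.mpx").getroot() instead.
SAMPLE = """<museumPlusExport xmlns="http://www.mpx.org/mpx">
  <sammlungsobjekt objId="240472" exportdatum="2010-03-11T17:28:19">
    <identNr art="Ident. Nr.">VII 78/0004</identNr>
    <sachbegriff>Schellackplatte</sachbegriff>
  </sammlungsobjekt>
  <sammlungsobjekt objId="1432496" exportdatum="2010-03-11T17:28:19">
    <identNr>VII LP 5264</identNr>
    <sachbegriff>Schallplatte</sachbegriff>
  </sammlungsobjekt>
</museumPlusExport>"""

root = ET.fromstring(SAMPLE)
for row in summarize(root):
    print(row)
```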

TODO: Intermediary steps

Explain the intermediate steps.

Attachment            Size
levelup.bat           5.27 KB
rtftable2xml.pl.txt   18.6 KB
newJoin.xsl           1.01 KB
lupmpx.4thGen.xsl     38 KB
join.lvl2.mpx         6.5 KB