Recently I installed Search Files (http://drupal.org/project/search_files) and noticed that I still cannot search my docx files. So here is my first attempt to solve the problem: a little Perl script that reads in a docx file, unzips it, and displays the content in no particular order using libxml. Call it version 0.001, written in a few minutes.
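The steps described above (open the docx as a zip archive, pull out the XML, print the text) can be sketched roughly as follows. This is not the attached Perl script. It is a minimal illustration in Python using only the standard library, and the function name docx_to_text is made up for the example. It assumes the body text lives in the archive member word/document.xml and ignores headers, footnotes, tabs, and line breaks. Text boxes nested inside a paragraph may show up twice.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the body text of a .docx file, one line per paragraph.

    `source` may be a file name or a binary file-like object.
    (Hypothetical helper, not part of any library.)
    """
    with zipfile.ZipFile(source) as archive:
        # Assumption: the main document body is stored in this member.
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    lines = []
    for paragraph in root.iter(W_NS + "p"):
        # Text is split across runs; each run keeps its text in <w:t> elements.
        pieces = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        lines.append("".join(pieces))
    return "\n".join(lines)
```

Calling print(docx_to_text("report.docx")), where report.docx is a placeholder file name, would print one line per paragraph. Unlike the Perl script, this walks the paragraphs in document order.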
I found a similar project, http://docx2txt.sourceforge.net, which
- uses /usr/bin/unzip,
- does manual XML conversion (regular expressions),
- does manual Unicode conversion, and
- is not available on CPAN.
I still cannot hide the exceptions from Archive::Zip.
Edit: I have attached a version that should run easily enough as a script. It is now called catdocx.
| Attachment | Size |
|---|---|
| docx2txt.pl.txt | 1.52 KB |
| catdocx_simple.pl.txt | 1.78 KB |
