docx2txt.pl

Recently, I installed Search Files (http://drupal.org/project/search_files) and noticed that I still cannot search my docx files. So here is my first attempt to solve the problem: a little Perl script that reads in a docx file, unzips it, and displays the content in no particular order using libxml. This is version 0.001, written in a few minutes.
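
For readers who want to see the idea without downloading the attachment, here is a minimal sketch of the same steps (open the docx as a zip archive, parse the XML, print the text). It is written in Python using only the standard library, so it is not the attached Perl script and does not use Archive::Zip or libxml. The function name docx_to_text is made up for this illustration. It assumes the main text lives in word/document.xml inside the archive, which is the usual layout for files written by Word, and it prints paragraphs in document order.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(source):
    """Return the text of a .docx file, one line per paragraph.

    `source` is a file path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    # A .docx file is a zip archive; the body text is assumed
    # to be stored in the member word/document.xml.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # <w:p> is a paragraph, <w:t> holds a run of text inside it.
    for para in root.iter(f"{{{W_NS}}}p"):
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(docx_to_text(path))
```

Saved under a name of your choice, it would be called with one or more docx paths as arguments. Headers, footers and footnotes live in other members of the archive and are ignored by this sketch.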

I found a similar project, http://docx2txt.sourceforge.net, which

  • uses /usr/bin/unzip,
  • does its XML conversion manually (regular expressions),
  • does its Unicode conversion manually, and
  • is not available on CPAN.

I still cannot hide the exceptions from Archive::Zip.

 

Edit: I have attached a version that should be easy enough to run as a script. It is now called catdocx.

 

Attachment               Size
docx2txt.pl.txt          1.52 KB
catdocx_simple.pl.txt    1.78 KB