Convert Files to Text

Post by s243a

I wrote this script because LibreOffice doesn't have an option to write the converted output to stdout. I'm using text conversion as part of an indexer (see post).

Here is some simple code (first stab), which works:

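Code:

#!/bin/bash
# Convert a document to plain text on stdout using LibreOffice.
if command -v soffice >/dev/null 2>&1; then
  tmpdir="./tmp_s243a_convert_$$"
  mkdir -p "$tmpdir"
  # soffice can only write into a directory, so convert into a temporary one.
  # Its own messages are silenced so only the converted text reaches stdout.
  soffice --headless --convert-to txt:Text --outdir "$tmpdir" "$1" >/dev/null 2>&1
  cat "$tmpdir"/*.txt
  rm -rf "$tmpdir"
fi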
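
A variation on the same idea, as a rough sketch: let mktemp -d pick the temporary directory name instead of building one from $$, and wrap the "convert into a directory, then cat" step in a function. The names with_tmpdir and soffice_to_dir are made up for this example, and it assumes the converter accepts the output directory as its last argument.

```shell
#!/bin/bash
# Sketch: run a converter that can only write into a directory, then stream
# whatever it produced to stdout. with_tmpdir is a hypothetical helper name.
with_tmpdir() {
  local tmpdir status
  tmpdir="$(mktemp -d)" || return 1
  # Call the given command with the temporary directory appended as last argument.
  if "$@" "$tmpdir" >/dev/null 2>&1; then
    cat "$tmpdir"/*
    status=$?
  else
    status=1
  fi
  # Clean up whether or not the conversion succeeded.
  rm -rf "$tmpdir"
  return "$status"
}

# Hypothetical wrapper: $1 is the input file, $2 the output directory.
soffice_to_dir() {
  soffice --headless --convert-to txt:Text --outdir "$2" "$1"
}

# Usage: with_tmpdir soffice_to_dir "report.odt"
```

Because the converter is passed in as a command, the same helper could be reused for any other tool that insists on writing to a directory.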

Possible improvements: allow the filename to be read from stdin, and perhaps add an option to also pass through the converter's standard error output.
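
Those two ideas could look roughly like this. It is only a sketch: get_input_name, run_converter and the SHOW_ERRORS variable are hypothetical names chosen for the example, not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: print the name of the file to convert.
# Uses the first argument if one was given, otherwise reads one line from stdin.
get_input_name() {
  local name
  if [ -n "${1:-}" ]; then
    printf '%s\n' "$1"
  else
    IFS= read -r name && printf '%s\n' "$name"
  fi
}

# Hypothetical helper: run a command, hiding its stderr unless SHOW_ERRORS=1.
run_converter() {
  if [ "${SHOW_ERRORS:-0}" = 1 ]; then
    "$@"
  else
    "$@" 2>/dev/null
  fi
}

# Usage:
#   file_name="$(get_input_name "$@")"
#   SHOW_ERRORS=1 run_converter antiword "$file_name"
```

Reading with IFS= and read -r keeps leading spaces and backslashes in the filename intact.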

I would like to expand this to use other conversion utilities and also use "file" to do some MIME type checking. Here is some draft code (not tested):

Code:

#!/bin/bash
# Draft: pick a converter based on what is installed and on the file's MIME type.
mime="$(file -b --mime-type "$1")"
if command -v soffice >/dev/null 2>&1; then
  tmpdir="./tmp_s243a_convert_$$"
  mkdir -p "$tmpdir"
  soffice --headless --convert-to txt:Text --outdir "$tmpdir" "$1" >/dev/null 2>&1
  cat "$tmpdir"/*.txt
  rm -rf "$tmpdir"
elif command -v unoconv >/dev/null 2>&1; then
  unoconv --stdout -f txt "$1"
elif [[ "$mime" == */rtf ]]; then
  echo "Not implemented yet" >&2
  # Possible utilities:
  #   textutil
  #   unrtf https://superuser.com/questions/243084/rtf-to-txt-on-unix
  #   wv
elif [[ "$mime" == */vnd.oasis.opendocument* ]]; then
  if command -v odf2txt >/dev/null 2>&1; then
    odf2txt "$1" # Not sure if this can handle all OpenDocument formats
  fi
elif command -v antiword >/dev/null 2>&1; then
  # Doesn't work for RTF files, which may have a .doc extension.
  antiword "$1"
fi


#--filter=application/msword:'unoconv --stdout -f text'
#http://hitekhedhelp.blogspot.com/2011/08/omega-overview.html
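
The MIME type check could also be pulled out into a small lookup with case, which keeps the pattern matching in one place. This is an untested-in-practice sketch: pick_converter is a hypothetical name, the tool names are simply the ones mentioned in this post, and falling back to soffice for everything else is my assumption.

```shell
#!/bin/bash
# Sketch: map a MIME type string to the name of a converter.
# pick_converter is a hypothetical helper, not an existing tool.
pick_converter() {
  case "$1" in
    */rtf)                     echo "unrtf" ;;
    application/msword)        echo "antiword" ;;
    */vnd.oasis.opendocument*) echo "odf2txt" ;;
    *)                         echo "soffice" ;;  # assumed fallback
  esac
}

# Usage: -b makes file print only the MIME type, without the file name prefix.
#   mime="$(file -b --mime-type "$1")"
#   converter="$(pick_converter "$mime")"
```

The */rtf pattern covers both text/rtf and application/rtf, since different versions of file may report either.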