Since version 2.1 indexer can use external parsers to index different file
types (mime types).

A parser can be any program that converts one of the mime types to plain
text or HTML. For example, if you have PostScript files, you can use the
ps2ascii parser (filter), which reads a PostScript file from stdin and
writes ASCII to stdout.

We assume that the parser sends its output to stdout. If it does not, you
have to write a small shell script that puts the results on stdout. Please
feel free to contribute your scripts and parser configurations to
devel@search.udm.net.

Many parsers cannot read from stdin and require a file instead. In this
case indexer creates a temporary file in /tmp and removes it after the
parser exits. Use the $1 macro in the parser command line to substitute
the file name. For example, the command line for one of the MS Word to
ASCII converters looks like this:

    /usr/bin/catdoc -a $1

Some parsers produce output in an encoding other than your default one.
Specify the parser's output encoding so that indexer converts it to the
right one.

The parser command line is optional. Without it, a Mime line only changes
the encoding or the mime type. For example, to change the mime type
text/tab-separated-values to text/plain:

    # Note - we do not use parser command line
    Mime text/tab-separated-values text/plain

The better parser you use the better result you get.

-----------------------------------------------------------------------------

How to set up parsing - two steps.

1. Configure web server

   Configure your web server to send the appropriate Content-Type: header.
   For apache, have a look at the mime.types file; most mime types are
   already defined there.

2. Edit indexer.conf

   Uncomment or add lines with parser definitions. Lines have the
   following format:

    # Parser definition format
    Mime <from_mime> <to_mime>[;charset] ["command line [$1]"]

    <from_mime>     source mime type
    <to_mime>       output mime type, text/plain or text/html
    charset         parser's output character set
    command line    full UNIX command line
    $1              temporary file name

   For example, the following line defines a parser for man pages:

    # I use deroff for parsing man pages ( *.man )
    Mime application/x-troff-man text/plain "deroff"

   One more example:

    # I like catdoc, but sometimes it produces garbage.
    Mime application/msword text/plain;cp1251 "catdoc -a $1"
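
The text above notes that a parser which does not write to stdout needs a
small shell wrapper. The following is only a minimal sketch of such a
wrapper. It assumes a hypothetical converter called doc2txt that takes an
input file and an output file as arguments; doc2txt, the CONVERTER variable
and the to_stdout function name are illustrative and are not part of the
indexer distribution.

```shell
#!/bin/sh
# Sketch of a wrapper for a converter that can only write to a file.
# Assumption: the converter is invoked as  <converter> <input> <output>.
# "doc2txt" is a hypothetical name; override it with CONVERTER.

to_stdout() {
    in="$1"
    # Scratch file for the converter's output
    out=$(mktemp "${TMPDIR:-/tmp}/parser.XXXXXX") || return 1
    if "${CONVERTER:-doc2txt}" "$in" "$out"; then
        # Copy the converted text to stdout, where indexer reads it
        cat "$out"
        status=0
    else
        status=1
    fi
    rm -f "$out"
    return "$status"
}

# When run as a script, $1 is the temporary file name passed by indexer
if [ "$#" -gt 0 ]; then
    to_stdout "$1"
fi
```

Saved under an assumed path such as /usr/local/bin/to_stdout.sh, the
wrapper could then be referenced from indexer.conf like any other parser
(the mime type here is a placeholder):

    Mime application/x-example text/plain "/usr/local/bin/to_stdout.sh $1"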