Setting MIME Types

You can use the MIME type criteria options -mimeinclude, -indmimeinclude, -mimeexclude, and -indmimeexclude to include or exclude files based on their MIME types.

Syntax restrictions

When you specify MIME type criteria, keep in mind the following restrictions.

Using the wildcard character (*)

The asterisk (*) wildcard character is not treated as a regular expression in MIME type criteria values. Instead, you can use it only to replace the entire MIME type or the entire MIME subtype.

For example, the following value is a valid substitute for text/html:

text/*

The following value is NOT a valid substitute for text/html:

text/h*
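The following Python sketch models the matching rule just described. It is not the Verity Spider's implementation; the function name mime_matches and the case-insensitive comparison are assumptions made for illustration.

```python
def mime_matches(pattern: str, mime_type: str) -> bool:
    """Return True if mime_type satisfies a MIME type criterion.

    A bare '*' stands for the whole type or the whole subtype. Any other
    component, including one such as 'h*', is compared literally, so it is
    never treated as a prefix or a regular expression.
    """
    # Case-insensitive comparison is an assumption of this sketch.
    p_type, _, p_sub = pattern.lower().partition("/")
    m_type, _, m_sub = mime_type.lower().partition("/")
    type_ok = p_type == "*" or p_type == m_type
    sub_ok = p_sub == "*" or p_sub == m_sub
    return type_ok and sub_ok


print(mime_matches("text/*", "text/html"))      # True
print(mime_matches("*/*", "application/pdf"))   # True
print(mime_matches("text/h*", "text/html"))     # False: 'h*' is not a wildcard
```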

Multiple parameter values

When you specify several values for a single instance of a MIME type option and you use quotes, you must enclose each value in its own pair of single quotes.

For example:

-mimeinclude 'text/plain' 'application/*'

If you enclose the entire sequence of values in one pair of quotes, the Verity Spider treats the whole expression as a single value:

-mimeinclude 'text/plain application/*'

You can also use multiple instances of the MIME type criteria, each with a single parameter value, where quotes are necessary only if you use the wildcard character (*).

For example:

-mimeinclude text/plain

-mimeinclude 'application/*'
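To see why the placement of quotes matters, the sketch below uses Python's shlex.split, which tokenizes a string the way a POSIX shell does, to show which values end up attached to -mimeinclude. The helper mimeinclude_values is hypothetical and only approximates option parsing. On UNIX shells, quoting a wildcard also keeps the shell from expanding it against local filenames, which is presumably why the quotes are required there.

```python
import shlex


def mimeinclude_values(command_line: str) -> list[str]:
    """Collect every value that follows a -mimeinclude flag.

    shlex.split tokenizes the string the way a POSIX shell would, so quoted
    text becomes one argument even if it contains spaces.
    """
    values: list[str] = []
    collecting = False
    for token in shlex.split(command_line):
        if token.startswith("-"):
            # A new option starts; keep collecting only if it is -mimeinclude.
            collecting = token == "-mimeinclude"
        elif collecting:
            values.append(token)
    return values


print(mimeinclude_values("-mimeinclude 'text/plain' 'application/*'"))
# ['text/plain', 'application/*']
print(mimeinclude_values("-mimeinclude 'text/plain application/*'"))
# ['text/plain application/*']
print(mimeinclude_values("-mimeinclude text/plain -mimeinclude 'application/*'"))
# ['text/plain', 'application/*']
```

The second call shows the problem case: one quoted string produces a single value containing a space, which matches no MIME type.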

MIME types and Web crawling

When you index a Web site, the Verity Spider evaluates your MIME Type criteria against the "Content-Type" HTTP headers sent by the Web server hosting that Web site. That Web server passes along MIME Type information based on its own internal tables.

If files are being dropped because of their MIME type, make sure the Web server you are indexing has the necessary MIME type information. See the documentation for your Web server for information about specifying MIME types.

You can examine the indexing job's log files for indications that files are being skipped because of their MIME type. For example, a typical ASCII file you might want indexed is a log file (filename.log). Unless the Web server recognizes files with the .LOG extension as ASCII text (MIME type text/plain), the indexing job log will show that .LOG files are skipped because of their MIME type, even if you use:

-mimeinclude 'text/*'
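The following Python sketch illustrates the check described above: it reduces a Content-Type header to its type/subtype and compares it with 'text/*'. The headers, the application/octet-stream fallback for .log files, and the printed status messages are hypothetical; they are not Verity log output, and real servers differ in how they label unmapped files. Dropping header parameters such as charset before matching is also an assumption.

```python
from email.message import Message


def content_type_of(header_value: str) -> str:
    """Return the lowercase type/subtype from a Content-Type header value,
    dropping parameters such as '; charset=UTF-8'."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type()


def matches(pattern: str, mime_type: str) -> bool:
    """Whole-component wildcard match, e.g. 'text/*' against 'text/plain'."""
    p_type, _, p_sub = pattern.partition("/")
    m_type, _, m_sub = mime_type.partition("/")
    return p_type in ("*", m_type) and p_sub in ("*", m_sub)


# Hypothetical headers from a server that has no MIME mapping for .log files
# and falls back to application/octet-stream.
responses = {
    "/index.html": "text/html; charset=UTF-8",
    "/notes.txt": "text/plain",
    "/filename.log": "application/octet-stream",
}

for path, header in responses.items():
    ctype = content_type_of(header)
    status = "indexed" if matches("text/*", ctype) else "skipped (MIME type)"
    print(f"{path}: {ctype} -> {status}")
```

Because the decision depends only on what the server sends, fixing the server's MIME table is the remedy, not changing the criteria.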

MIME types and file system indexing

When you index a file system, the Verity Spider reads filenames and evaluates your MIME Type criteria against an internal, compiled list of known MIME Types and associated file extensions. You cannot edit this list. However, you can use the -mimemap option to create a custom MIME Type mapping.

If files are being dropped because of their MIME type, check whether the Verity Spider recognizes that MIME type. For details, see the table "Known MIME types for file system indexing."

You can examine the indexing job's log files for indications that files are being skipped because of their MIME type. For example, a typical ASCII file you might want indexed is a log file (filename.log). Because the Verity Spider does not recognize files with the .LOG extension as ASCII text (MIME type text/plain), the indexing job log will show that .LOG files are skipped because of their MIME type, even if you use:

-mimeinclude 'text/*'
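As a rough model of file system indexing, the sketch below looks up a file's MIME type by extension, using a subset of the mappings from the table of known MIME types. The function mime_for, the lowercasing of extensions, and the printed messages are assumptions for illustration; the Spider's compiled list cannot be inspected or edited this way.

```python
import os
from typing import Optional

# Extension-to-type pairs copied from the table of known MIME types for
# file system indexing (only a subset is shown).
KNOWN_TYPES = {
    "htm": "text/html", "html": "text/html",
    "txt": "text/plain", "text": "text/plain",
    "c": "text/plain", "h": "text/plain", "cpp": "text/plain", "cxx": "text/plain",
    "pdf": "application/pdf",
    "doc": "application/msword",
    "rtf": "application/rtf",
}


def mime_for(filename: str) -> Optional[str]:
    """Look up a file's MIME type by extension; None if it is unknown.
    Lowercasing the extension is an assumption of this sketch."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return KNOWN_TYPES.get(ext)


for name in ["readme.txt", "main.cpp", "report.pdf", "filename.log"]:
    ctype = mime_for(name)
    if ctype is None:
        print(f"{name}: skipped, unknown extension")
    elif ctype.startswith("text/"):  # what -mimeinclude 'text/*' would accept
        print(f"{name}: {ctype}, indexed")
    else:
        print(f"{name}: {ctype}, skipped by 'text/*'")
```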

Indexing unknown MIME types

Whenever you find files being dropped because of their MIME type, or you know you will be indexing files whose extensions the Verity Spider does not recognize by default, use the -mimemap option to point to a file that contains your own custom mappings of file extensions to MIME types.

You can also use the wildcard value '*/*' for your MIME type criteria to match any MIME type.

For example:

-mimeinclude '*/*'

Remember, regardless of platform, you must enclose values that contain wildcard characters in single quotes.

You should also combine inclusion and exclusion criteria to control precisely what is indexed.
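The sketch below combines the two approaches: a hypothetical custom mapping (standing in for the file you pass with -mimemap, whose format is not covered here) adds .log as text/plain, and the criteria include '*/*' while excluding application/pdf. The rule that a file must match an include criterion and no exclude criterion, and the choice to skip files with no known type, are assumptions of this sketch, not documented Verity behavior.

```python
from typing import List, Optional

BUILT_IN = {"txt": "text/plain", "html": "text/html", "pdf": "application/pdf"}

# Hypothetical custom mapping, standing in for the file named by -mimemap.
CUSTOM = {"log": "text/plain"}


def lookup(ext: str) -> Optional[str]:
    """Return the MIME type for an extension; custom entries override built-in ones."""
    merged = {**BUILT_IN, **CUSTOM}
    return merged.get(ext.lower())


def matches(pattern: str, mime_type: str) -> bool:
    """Whole-component wildcard match, e.g. '*/*' or 'text/*'."""
    p_type, _, p_sub = pattern.partition("/")
    m_type, _, m_sub = mime_type.partition("/")
    return p_type in ("*", m_type) and p_sub in ("*", m_sub)


def selected(ext: str, includes: List[str], excludes: List[str]) -> bool:
    """Assumed rule: select a file if its type matches an include criterion
    and no exclude criterion. Files with no known type are not selected."""
    ctype = lookup(ext)
    if ctype is None:
        return False
    included = any(matches(p, ctype) for p in includes)
    excluded = any(matches(p, ctype) for p in excludes)
    return included and not excluded


for ext in ["txt", "log", "pdf"]:
    print(ext, selected(ext, ["*/*"], ["application/pdf"]))
# txt True
# log True
# pdf False
```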

Known MIME types for file system indexing

The MIME Types which the Verity Spider recognizes when indexing file systems are listed in the following table.
Format               MIME Type                      Extension
HTML                 text/html                      htm, html
ASCII                text/plain                     txt, text
ASCII, source files  text/plain                     c, h, cpp, cxx
PDF                  application/pdf                pdf
MS Word              application/msword             doc
MS Excel             application/excel              xls
MS PowerPoint        application/vnd.ms-powerpoint  ppt
WordPerfect 5.1      application/wordperfect5.1     wpd
RTF                  application/rtf                rtf
FrameMaker MIF       application/vnd.mif            mif


