Getting Started with the Verity mkvdk Utility

The basic mkvdk syntax is as follows:

mkvdk -collection path [option] [...] [filespec] [...]

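Where path is the path of the collection directory, option is one of the options described in the tables below, and filespec specifies the document(s) to process (or, with -bulk, a bulk submit file listing them). All options must precede the first filespec parameter.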
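You can check the ordering rule before invoking mkvdk. The following Python sketch uses a hypothetical helper that is not part of Verity, and it makes three assumptions. First, each option shown with an argument placeholder in the tables below (such as -collection path) takes exactly one argument. Second, every other token starting with a hyphen is a flag. Third, any remaining token is a filespec.

```python
from typing import List

# Options that the tables in this section show with an argument placeholder.
# Assumption: each takes exactly one argument. -common is omitted because its
# table entry shows no placeholder.
OPTIONS_WITH_ARG = {
    "-collection", "-style", "-description", "-datapath", "-topicset",
    "-mode", "-charmap", "-locale", "-datefmt", "-servlev",
    "-outlevel", "-logfile", "-loglevel",
}


def check_option_order(args: List[str]) -> None:
    """Raise ValueError if an option appears after the first filespec.

    args is the mkvdk argument list without the program name itself.
    """
    seen_filespec = False
    i = 0
    while i < len(args):
        token = args[i]
        if token.startswith("-"):
            if seen_filespec:
                raise ValueError(f"option {token} appears after a filespec")
            # Skip over the option's argument, if it takes one.
            i += 2 if token in OPTIONS_WITH_ARG else 1
        else:
            seen_filespec = True
            i += 1
```

For example, `check_option_order(["-create", "-collection", "coll", "doc1.txt"])` passes. `check_option_order(["-collection", "coll", "doc1.txt", "-synch"])` raises ValueError because -synch follows a filespec.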

Steps for building a collection

Building a collection with mkvdk involves setting up a collection directory structure and inserting documents into this structure. You can build a collection in two steps, using two separate mkvdk commands, as follows.

  1. Set up a collection using this syntax:
    mkvdk -create -collection collectionname

    Where collectionname is the path to the collection directory. This command creates the collection directory, including style files that contain configuration information.

  2. Insert documents using this syntax:
    mkvdk -collection collectionname -bulk -insert filespec

    Where filespec is the name of a bulk insert file that specifies the documents to index and insert into the collection.

Alternatively, you can set up a collection and insert documents in one mkvdk command, using this syntax:

mkvdk -create -collection collectionname -bulk -insert filespec
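If you script collection builds, you can generate and run the two-step and one-step forms above from Python. This is a minimal sketch under stated assumptions. It assumes mkvdk is on the executable search path and reports failure with a nonzero exit status, and the helper names build_args and run_mkvdk are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_args(collection: str, bulkfile: Optional[str] = None,
               create: bool = False) -> List[str]:
    """Return the mkvdk argument list for creating and/or bulk-inserting."""
    args = ["mkvdk"]
    if create:
        args.append("-create")  # use only once per collection
    args += ["-collection", collection]
    if bulkfile is not None:
        # The filespec comes last: all options precede the first filespec.
        args += ["-bulk", "-insert", bulkfile]
    return args


def run_mkvdk(args: List[str]) -> None:
    """Run mkvdk; assumes a nonzero exit status signals failure."""
    subprocess.run(args, check=True)


# Steps 1 and 2 as separate commands:
step1 = build_args("collectionname", create=True)
step2 = build_args("collectionname", bulkfile="filespec")
# The same work as a single command:
one_step = build_args("collectionname", bulkfile="filespec", create=True)
```

Pass create=True only when the collection does not exist yet. For later updates, build the command without it.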


Note

The -create option can be used only once, to create the collection directory structure. After the structure has been created, do not use the -create option to update the collection.


Accessing online help for mkvdk

To display a list of mkvdk command-line options, enter:

mkvdk -help

Collection setup options

mkvdk provides a variety of collection setup options, described in the following table:
Option
Description
-create
This option creates a collection in the specified -collection directory. It creates the directory structure, determines the index contents and sets up the documents table schema according to the style files used. If the specified collection already exists, mkvdk exits rather than overwriting the existing collection.
-style dir
This option specifies the style directory that contains the style files to use in creating a collection. This option can only be used with the -create option. If you do not specify this option when you use mkvdk to create a collection, mkvdk uses the style files in the common/style directory.
-description desc
This option sets the collection's description, which can be any alphanumeric text. Enclose the text in quotation marks, for example: "This collection contains electronic mail from ABC Company."
-words
This option builds the word list for all partitions in the collection.

Examples: Setting up collections

Creating a collection

The following command creates a collection in path_2 using the style files in path_1, and submits and indexes the document(s) in filespec.

mkvdk -create -style path_1 -collection path_2 filespec

Building the word list

The following command builds the word list in the collection residing in the path directory.

mkvdk -words -collection path

General processing options

mkvdk provides a variety of general processing options, described in the following table:
Option
Description
-collection path
This option specifies the path of the collection to create or open. This is required to execute mkvdk.
-nolock
This option turns off file locking. Locking is on by default.
-synch
This option performs work immediately. If this option is not used, indexing work is done in the background, as time permits.
-about
This option shows information about the collection, such as its description and the date when it was last modified.
-datapath path
This option specifies the datapath to use to find documents being added to the specified collection. All relative document paths will be relative to this setting. If you do not set this option, mkvdk looks for documents next to the collection directory.
-topicset path
This option creates a topic index for the collection based on the specified topic set and stores it in the collection directory. This facilitates quick and efficient searches over the collection data when using topics.
-mode mode
This option sets the indexing mode. Values are case insensitive. Valid settings are:
  • Generic
  • FastSearch
  • NewsfeedIdx
  • NewsfeedOpt
  • BulkLoad
  • ReadOnly
  • Any custom mode defined in the style.plc file.
The default is Generic mode.
-common
This option specifies the path of the Verity common directory. If you do not use this option, the Verity engine looks for the common directory in the directory containing the mkvdk executable, and then along the executable search path. The executable search path is determined by your operating system environment settings. It is the path used by the OS to find the programs you run.
-help
This option displays mkvdk syntax options.
-debug
This option runs mkvdk in debugging mode.
-nooptimize
This option prevents optimization by this instance of mkvdk. Using this option turns off the service level VdkServiceType_Optimize. The service types determine what type of work the Verity engine and its self-administration features will execute on a collection.
-nohousekeep
This option prevents housekeeping by this instance of mkvdk. Housekeeping includes deleting files that are no longer needed. Using this option turns off the service level VdkServiceType_DBA. (Service types are described under -nooptimize.)
-noindex
This option prevents indexing by this instance of mkvdk. Documents will not be inserted or deleted. Using this option turns off the service level VdkServiceType_Index. (Service types are described under -nooptimize.)
-charmap name
The name of the character set to which all strings are mapped for your application. Set this to a character set that your system can display properly. With the English locale, any version of Windows displays the 8859 character set, and a Macintosh displays mac or mac1. Note that this is NOT the character set of the documents being indexed; it is only the character set that your display can handle properly. (The character set of the documents is set in the style.dft file using the /charmap option, which is described in Chapter 9.)
Valid options are 850, 8859, mac. The default is no mapping.
-locale name
The name of the Verity locale to be used by mkvdk. The name must match an existing locale directory in install_dir/common/locale. Valid options are english, deutsch, and francais. The default is english.
-datefmt format
This option specifies how date field values are converted into Verity's internal date representation. It can be used with the -extract option (for the field extraction feature) and the -bulk option (for the bulk submit feature). The format tells the date parsing routines the order in which dates are written when a date consists only of numbers (for example, 03/03/96). Valid options are described in "Date format options". The default is MDY.
-servlev level
This option sets the service level. The level specifier is a string of keywords separated by hyphens, such as search-index-optimize. Valid keywords are described in "Service level keywords".

Examples: Processing documents

Using the Default Options

By default, mkvdk submits and indexes documents specified in the command, and services the specified collection. The following command executes the default options:

mkvdk -collection path filespec

Servicing only

The following command performs servicing only. Use this command if you only want to index submitted documents and service the collection.

mkvdk -collection path

Deleting documents from a collection

The following command deletes documents from a collection.

mkvdk -delete -collection path filespec

Bulk inserting or deleting

The following command specifies bulk insertion of a list of documents:

mkvdk -collection coll -bulk -insert filespec

Here, filespec is a bulk submit file listing the documents to insert. Because -insert is the default, the following command is equivalent to the preceding one:

mkvdk -collection coll -bulk filespec

The following command specifies bulk deletion of a list of documents:

mkvdk -collection coll -bulk -delete filespec

Here, filespec is a bulk submit file listing the documents to delete. It can be the same file used to insert the documents; the only difference is that the command specifies -delete instead of -insert (or instead of relying on the default).

Date format options

The Verity engine supports many input date formats. In addition to the numeric dates in XX-YY-ZZ format listed below, many textual date formats are supported. For more information, see Appendix A.
Format   Description
MDY      Dates written as month-day-year (US format, the default)
DMY      Dates written as day-month-year (European format)
YMD      Dates written as year-month-day (ISO international format)
YDM      Dates written as year-day-month (Swedish format)
USA      Dates written in US format (the same as MDY)
EUR      Dates written in European format (the same as DMY)
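To see why the setting matters, the following Python sketch interprets the same all-numeric string under each format. It only illustrates the component orders in the table and is not Verity's date parser. The handling of two-digit years (mapped to 19xx) and of separators (-, /, .) is an assumption made for the example.

```python
import re
from datetime import date

# Component order for each -datefmt value, taken from the table above.
ORDERS = {
    "MDY": "mdy", "USA": "mdy",
    "DMY": "dmy", "EUR": "dmy",
    "YMD": "ymd",
    "YDM": "ydm",
}


def interpret(numeric_date: str, fmt: str = "MDY") -> date:
    """Interpret an all-numeric date such as 03/04/96 under a -datefmt value."""
    parts = [int(p) for p in re.split(r"[-/.]", numeric_date)]
    fields = dict(zip(ORDERS[fmt], parts))  # e.g. {"m": 3, "d": 4, "y": 96}
    year = fields["y"]
    if year < 100:
        # Assumption for illustration only: two-digit years are 19xx.
        year += 1900
    return date(year, fields["m"], fields["d"])
```

Under the default MDY, interpret("03/04/96") is March 4, 1996. Under DMY, the same string is April 3, 1996.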

Service level keywords

The following table describes the valid keywords for the -servlev option:
Keyword    Description
search     Enable search and retrieval
insert     Enable adding and updating documents
optimize   Enable opportunistic collection optimization
assist     Enable building of the word list
housekeep  Enable housekeeping of unneeded files
delete     Enable document deletion (see Chapter 3)
backup     Enable backup
purge      Enable background purging
repair     Enable collection repair
dataprep   Same as search-index-optimize-assist-housekeep
index      Same as insert-delete
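Because dataprep and index are shorthands for other keywords, expanding a -servlev string shows exactly which services it enables. The following is a minimal Python sketch. The function name is hypothetical, the keyword sets come from the table above, and treating unknown keywords as errors is an assumption of the sketch.

```python
from typing import Set

# Shorthand keywords from the table above, written in terms of other keywords.
COMPOSITE = {
    "dataprep": ["search", "index", "optimize", "assist", "housekeep"],
    "index": ["insert", "delete"],
}
BASIC = {"search", "insert", "optimize", "assist", "housekeep",
         "delete", "backup", "purge", "repair"}


def expand_servlev(level: str) -> Set[str]:
    """Expand a hyphen-separated -servlev string into basic keywords."""
    enabled: Set[str] = set()
    pending = level.split("-")
    while pending:
        keyword = pending.pop()
        if keyword in COMPOSITE:
            pending.extend(COMPOSITE[keyword])  # expand the shorthand
        elif keyword in BASIC:
            enabled.add(keyword)
        else:
            raise ValueError(f"unknown service level keyword: {keyword}")
    return enabled
```

For example, dataprep enables every basic service except backup, purge, and repair.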

Messaging options

mkvdk provides a variety of messaging options, described in the following table:

Option
Description
-quiet
This option displays only fatal and error messages to the console. It overrides the -outlevel setting. For a list of message types, refer to "Message Types."
-outlevel (num)
This option indicates which message types to display to the console. Valid values are determined by adding numbers together that correspond to the desired message types. The default value is 15. For more information, refer to "Message Types."
-logfile filename
This option saves messages in the specified file.
-loglevel (num)
This option indicates which message types to route to the optional log file. Valid values are determined by adding numbers together that correspond to the desired message types. The default value is 15. For more information, refer to "Message Types."

Message types

Message types and their corresponding numbers are listed in the table below. To set the -outlevel or -loglevel option, add up the numbers for the message types you want to include. For example, to tell mkvdk to display all messages except debug messages, set -outlevel to 1+2+4+8+16+32=63. The default for both -outlevel and -loglevel is 15, which selects fatal, error, warning, and status messages (15=1+2+4+8).
Type      Number
Fatal     1
Error     2
Warning   4
Status    8
Info      16
Verbose   32
Debug     64
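Each message type number is a distinct power of two, so adding the numbers is the same as combining bits, and a level can be decoded back into its types. The small Python sketch below reproduces the arithmetic above; the helper names are hypothetical.

```python
from typing import List

# Message type numbers from the table above.
MESSAGE_TYPES = {
    "fatal": 1, "error": 2, "warning": 4, "status": 8,
    "info": 16, "verbose": 32, "debug": 64,
}


def level_for(*types: str) -> int:
    """Return the -outlevel/-loglevel value that selects the given types."""
    value = 0
    for name in types:
        value |= MESSAGE_TYPES[name.lower()]  # OR: naming a type twice is harmless
    return value


def types_in(level: int) -> List[str]:
    """List the message types selected by an -outlevel/-loglevel value."""
    return [name for name, bit in MESSAGE_TYPES.items() if level & bit]
```

For example, level_for("fatal", "error", "warning", "status") returns the default 15, and types_in(63) lists every type except debug.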

Document processing options

mkvdk provides a variety of document processing options, described in the following table:
Option
Description
-extract
This option extracts field values from documents, using the field extraction rules specified in the style.tde file. For more information, refer to Chapter 9.
-insert
This option adds documents to the collection. This is the default option for mkvdk.
-update
This option adds the specified documents to the collection, replacing all previous information about them.
-delete
This option marks the specified documents as deleted and makes them unavailable for searches. To actually remove deleted documents from the collection's internal documents table and word indexes, use the squeeze keyword.
-nosave
When the -extract option is used, mkvdk automatically generates a work list and, by default, saves it in the collection directory in a file called worklist (in the Verity bulk submit file format). This option specifies that the work list is not saved.
-nosubmit
Specifies that the work list, which mkvdk generates automatically when the -extract option is used, is not submitted to the indexing engine; instead, it is saved in the collection directory in a file called worklist (in the Verity bulk submit file format). This option allows mkvdk to process field extraction separately from other indexing tasks.


