Advanced ColdFusion Administration
Managing Verity Collections with the mkvdk Utility
Collection Maintenance Options
mkvdk provides a variety of collection maintenance options, described in the following table:
Option | Description
-backup dir | This option backs up the collection into the specified directory. Note that the backup does not include the tde subdirectory, which is created by and for Topic Document Entry if Topic Document Entry is used to create or maintain the collection.
-repair | This option repairs the collection by means of an API call.
-purge | This option waits the amount of time specified by the -purgewait option and then deletes all documents in the collection, but not the collection itself; the collection directory structure is left intact. To specify a different wait period, use the -purgewait option instead of -purge. If you do not use -purgewait, the default wait is 600 seconds.
-purgeback | This option, used with the -purge option, performs the purge in the background.
-purgewait sec | This option specifies how many seconds the -purge option waits. If you do not specify sec, the default is 600.
-noservice | This option prevents this instance of mkvdk from servicing the collection (servicing includes indexing), by means of an API call.
-persist | This option services the collection repeatedly, at a default interval of 30 seconds. Use the -sleeptime option to set a different interval.
-sleeptime sec | This option specifies the interval between service calls when mkvdk is run with the -persist option.
-optimize spec | This option performs various optimizations on the collection, depending on the value of spec. The specifier spec is a string of keywords separated by hyphens, such as maxmerge-squeeze-readonly. Valid keywords are described under "Optimization Keywords."
-noexit | Windows only. This option keeps the I/O window open after the program finishes. By default, the window closes and the program exits so that scripts that call mkvdk do not hang.
Examples: Maintaining collections
Repairing a collection
The following command automatically repairs a collection, or enables it after manual repairs.
mkvdk -repair -collection path
Backing up a collection
The following command backs up a collection to the specified directory.
mkvdk -backup path_1 -collection path_2
Deleting a collection
To delete a collection, use the appropriate command for your operating system. For example, to remove the collection directory structure and control files on a UNIX system, use the following command.
rm -r collection_path
Purging a collection
The following command deletes all documents from a collection, but does not delete the collection itself.
mkvdk -purge -collection path
Purging in the background
The following command purges the specified collection in the background.
mkvdk -purge -purgeback -collection path
Persistent service
The following command runs mkvdk as a persistent process, so that servicing is performed repeatedly, each time after num idle seconds.
mkvdk -persist -sleeptime num -collection path
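If you drive mkvdk from a script, the command lines above can also be assembled programmatically. The following Python sketch is a hypothetical helper (build_maintenance_argv is not part of ColdFusion or Verity). It uses only flags from the maintenance options table, enforces the two dependencies the table states (-purgeback is used with -purge, and -sleeptime applies when -persist is given), and places -collection last, as the examples do. It only builds the argument list; it does not run mkvdk.

```python
from typing import List, Optional


def build_maintenance_argv(
    collection: str,
    *,
    backup_dir: Optional[str] = None,
    repair: bool = False,
    purge: bool = False,
    purge_background: bool = False,
    persist: bool = False,
    sleep_time: Optional[int] = None,
    optimize: Optional[str] = None,
    program: str = "mkvdk",
) -> List[str]:
    """Return an mkvdk argument list (hypothetical helper); nothing is executed."""
    # Dependencies stated in the maintenance options table.
    if purge_background and not purge:
        raise ValueError("-purgeback is used with -purge")
    if sleep_time is not None and not persist:
        raise ValueError("-sleeptime applies only when -persist is given")

    argv = [program]
    if backup_dir is not None:
        argv += ["-backup", backup_dir]
    if repair:
        argv.append("-repair")
    if purge:
        argv.append("-purge")
        if purge_background:
            argv.append("-purgeback")
    if persist:
        argv.append("-persist")
        if sleep_time is not None:
            argv += ["-sleeptime", str(sleep_time)]
    if optimize is not None:
        argv += ["-optimize", optimize]
    # The examples in this section put -collection last.
    argv += ["-collection", collection]
    return argv
```

Assuming mkvdk is on the PATH, the resulting list could be passed to subprocess.run(argv, check=True). -purgewait and the remaining flags are omitted to keep the sketch short.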
Deleting a Collection
Note that -purge deletes all documents in a collection, but does not delete the collection itself. To delete a collection, use operating system commands such as the rm command on UNIX to remove the collection directory structure and control files.
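On systems without rm, or in a cross-platform script, the same deletion can be sketched in Python with shutil.rmtree. delete_collection is a hypothetical helper, not part of ColdFusion or Verity. The sketch assumes that no mkvdk process or search is using the collection while it is deleted, and it refuses to run on anything that is not a directory.

```python
import shutil
from pathlib import Path


def delete_collection(collection_path: str) -> None:
    """Remove a collection's directory structure and control files, like `rm -r`.

    Assumption: nothing (mkvdk, searches) is using the collection right now.
    """
    path = Path(collection_path)
    if not path.is_dir():
        # Guard against deleting a file or a mistyped path.
        raise NotADirectoryError(f"not a collection directory: {path}")
    shutil.rmtree(path)  # deletes the directory tree recursively
```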
Optimization Keywords
Optimization keywords for the -optimize option are described below.
Keyword | Description
maxclean | This keyword performs the most comprehensive housekeeping possible and removes out-of-date collection files. This optimization is recommended only when you are preparing an isolated collection for publication: if the collection is being searched while it runs, files can be deleted too early, which affects search results.
maxmerge | This keyword performs maximal merging on the partitions to create partitions that are as large as possible, each holding up to 64,000 documents.
readonly | This keyword makes the collection read-only. When it is used, mkvdk marks the collection as read-only and unchanging after the function call is done. This is appropriate for CD-ROM collections.
spanword | This keyword creates a spanning word list across all of the collection's partitions. A collection consists of numerous smaller units called partitions, each of which includes its own word list. Optionally, a spanning word list can be built with an ngram index.
ngramindex | This keyword builds an ngram index for the collection. An ngram index is designed to improve search performance for queries that use the <TYPO> operator, the <WILDCARD> operator, or both. An ngram index cannot be built without a spanning word list. You can build a spanning word list and an ngram index in the same command, for example:
mkvdk -collection collname -optimize spanword-ngramindex
squeeze | This keyword squeezes deleted documents out of the collection. Squeezing deleted documents recovers space in the collection and improves search performance, but it can invalidate search results that users are reviewing when the squeeze occurs.
vdbopt | Each collection consists of smaller units called Verity databases (VDBs). The vdbopt keyword optimizes the collection's VDBs: it linearizes the data in each VDB, streamlines the collection metadata that the VDB contains, and allows the VDB to grow to a much larger size.
tuneup | This keyword is a convenience keyword that includes maxmerge, vdbopt, and spanword.
publish | This keyword is a convenience keyword that includes all of the optimization types. Use it to optimize the collection for the best possible retrieval performance, for example for publication to a network server or on a CD-ROM.
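The following Python sketch shows how a hyphen-separated spec breaks down into the keywords in the table, expanding tuneup and publish as described there and applying the rule that an ngram index needs a spanning word list. expand_spec and its has_spanning_wordlist parameter are hypothetical: mkvdk does its own parsing, publish is assumed to expand to every non-convenience keyword in the table, and the order of the returned list is not necessarily the order in which mkvdk performs the optimizations.

```python
from typing import List

# Keyword names taken from the Optimization Keywords table.
BASIC_KEYWORDS = ["maxclean", "maxmerge", "readonly", "spanword",
                  "ngramindex", "squeeze", "vdbopt"]
CONVENIENCE = {
    "tuneup": ["maxmerge", "vdbopt", "spanword"],
    "publish": BASIC_KEYWORDS,  # "all of the optimization types"
}


def expand_spec(spec: str, has_spanning_wordlist: bool = False) -> List[str]:
    """Split a hyphen-separated spec and expand convenience keywords."""
    expanded: List[str] = []
    for word in spec.split("-"):
        if word in CONVENIENCE:
            candidates = CONVENIENCE[word]
        elif word in BASIC_KEYWORDS:
            candidates = [word]
        else:
            raise ValueError(f"unknown optimization keyword: {word!r}")
        for kw in candidates:
            if kw not in expanded:  # drop duplicates, keep first-seen order
                expanded.append(kw)

    # An ngram index cannot be built without a spanning word list.
    needs_wordlist = "ngramindex" in expanded and "spanword" not in expanded
    if needs_wordlist and not has_spanning_wordlist:
        raise ValueError("ngramindex needs spanword or an existing spanning word list")
    return expanded
```

For example, expand_spec("tuneup-ngramindex") is accepted because tuneup already supplies spanword.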
About squeezing deleted documents
When a document is deleted from a collection, its space is not recovered. The document is merely marked as deleted and is not available for subsequent searches. Squeezing actually removes deleted documents from the collection's internal documents table and word indexes, which makes the collection smaller and reduces the disk space it occupies. A smaller collection has a more efficient structure that makes searching slightly faster and uses slightly less memory.
When can you squeeze deleted documents? You can safely squeeze deleted documents from a collection at any time, because mkvdk uses its self-administration features to keep the collection available for searching and servicing. The application does not need to disable the collection temporarily, because when a squeeze request is made, mkvdk assigns a new revision code to the collection. The next time the application accesses the collection after a squeeze, the Verity engine notifies the application that dramatic changes have been made and points it to the new collection data.
Before squeezing deleted documents, be aware of the effects of doing so. Squeezing deleted documents out of a collection is a significant update to the collection. If users are reviewing search results when the squeeze occurs, those search results may be invalidated.
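The toy Python model below illustrates the behavior described in this section: deleting only marks a row, squeezing removes the marked rows and assigns a new revision code, and row numbers returned by earlier searches stop lining up with the data. ToyCollection is hypothetical and does not reflect Verity's actual documents table or word indexes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyCollection:
    docs: List[str] = field(default_factory=list)
    deleted: List[bool] = field(default_factory=list)
    revision: int = 1

    def add(self, doc: str) -> int:
        self.docs.append(doc)
        self.deleted.append(False)
        return len(self.docs) - 1  # row number that search results refer to

    def delete(self, row: int) -> None:
        # Deletion only marks the row; its space is not recovered.
        self.deleted[row] = True

    def search(self, term: str) -> List[int]:
        return [i for i, d in enumerate(self.docs)
                if term in d and not self.deleted[i]]

    def squeeze(self) -> None:
        # Physically drop deleted rows. Surviving rows are renumbered,
        # so row numbers from earlier searches no longer match.
        keep = [i for i, gone in enumerate(self.deleted) if not gone]
        self.docs = [self.docs[i] for i in keep]
        self.deleted = [False] * len(keep)
        self.revision += 1  # new revision code signals the change to readers
```

An application that stores the revision alongside its results can compare it with the current revision to detect that the results are stale, loosely mirroring how the Verity engine notifies applications after a squeeze.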
About optimized Verity databases
The Verity Database (VDB) is the fundamental storage mechanism that supports dynamic access to documents in collections. A VDB consists of simple tables with rows and columns that relate to each other by row position. VDB tables are not relational, and their architecture supports quick and efficient searching over textual data. A VDB consists of segments that are packed into a single file. One advantage of a single packed VDB file is optimized search performance: the fewer files that need to be opened during search processing, the faster the search.
The VDB optimization option (the vdbopt keyword) optimizes the packing of a collection's VDBs. When VDBs are built during normal indexing operations, the segments are not stored sequentially in the one-file VDB file system. VDB optimization re-serializes the packed segments so that all segments are contiguous, which can improve performance, and it allows VDBs to grow larger: an optimized VDB can grow to 2 gigabytes, compared with a maximum of 64 megabytes for an unoptimized one.
Using this option may degrade your indexing performance when certain indexing modes are set for the collection.
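As a rough illustration of re-serialization, the following Python toy treats a packed VDB file as a list of (segment, chunk) pieces in write order and rewrites it so that each segment's pieces are contiguous. The piece representation and the functions are hypothetical and say nothing about the real VDB file format; the size limits are the figures quoted above, taken here as binary units.

```python
from typing import Dict, List, Tuple

Piece = Tuple[str, bytes]  # (segment name, chunk of segment data)

# Size limits quoted in the text (binary units assumed).
UNOPTIMIZED_LIMIT = 64 * 1024 ** 2   # 64 megabytes
OPTIMIZED_LIMIT = 2 * 1024 ** 3      # 2 gigabytes


def reserialize(pieces: List[Piece]) -> List[Piece]:
    """Group pieces by segment, keeping first-seen segment order."""
    by_segment: Dict[str, List[bytes]] = {}
    for seg, chunk in pieces:
        # Appending keeps each segment's chunks in their original order.
        by_segment.setdefault(seg, []).append(chunk)
    return [(seg, chunk) for seg, chunks in by_segment.items() for chunk in chunks]


def runs(pieces: List[Piece]) -> int:
    """Count contiguous runs; one run per segment means fully contiguous."""
    return sum(1 for i, (seg, _) in enumerate(pieces)
               if i == 0 or pieces[i - 1][0] != seg)


def fits(size_bytes: int, optimized: bool) -> bool:
    """Check a VDB size against the limit for its state."""
    return size_bytes <= (OPTIMIZED_LIMIT if optimized else UNOPTIMIZED_LIMIT)
```

In this toy, interleaved writes from normal indexing produce many short runs, and reserialize reduces them to one run per segment, so each segment can be read in a single sequential pass.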
Performance tuning options
mkvdk provides performance tuning options, described in the following table:
Option | Description
-maxfiles num | This option sets the maximum number of files that mkvdk can have open at once. The default is 50.
-diskcache num | This option sets the size of the mkvdk disk cache, in kilobytes. The default is 128.
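A small sketch of how a script might produce these tuning flags: tuning_flags (hypothetical) emits a flag only when the requested value differs from the defaults in the table, and mb_to_kb converts a size in megabytes to the kilobyte value that -diskcache expects, assuming 1 MB = 1024 KB. The positivity check is an assumption of the sketch, not a documented mkvdk rule.

```python
from typing import List

# Defaults given in the performance tuning table.
DEFAULT_MAXFILES = 50
DEFAULT_DISKCACHE_KB = 128


def tuning_flags(maxfiles: int = DEFAULT_MAXFILES,
                 diskcache_kb: int = DEFAULT_DISKCACHE_KB) -> List[str]:
    """Return -maxfiles/-diskcache flags for values that differ from the defaults."""
    if maxfiles < 1 or diskcache_kb < 1:
        raise ValueError("values must be positive integers")
    flags: List[str] = []
    if maxfiles != DEFAULT_MAXFILES:
        flags += ["-maxfiles", str(maxfiles)]
    if diskcache_kb != DEFAULT_DISKCACHE_KB:
        flags += ["-diskcache", str(diskcache_kb)]
    return flags


def mb_to_kb(megabytes: float) -> int:
    # -diskcache takes kilobytes; 1 MB is taken as 1024 KB here.
    return int(megabytes * 1024)
```

The returned list could be added to an mkvdk argument list such as the maintenance command lines shown earlier in this section.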
Copyright © 2001, Macromedia Inc. All rights reserved.