Specifies the value for the agent name field that is part of the HTTP request. Since Web servers can be configured to return different versions of the same page depending on the requesting agent, you can use -agentname to impersonate a browser client.
Use double-quotes if the name contains a space. Use -cmdfile if the agent name you want to use contains forbidden characters such as slashes or backslashes.
Syntax: -connections num_connections
Details: Specifies the maximum number of simultaneous socket connections to make to Web sites for indexing. Each connection uses a separate thread.
Note: Verity Spider's dynamic flow control makes full use of all available connections when indexing Web sites. If you are indexing multiple sites, you may want to increase this number. Increasing the number of connections does not always help, because throughput also depends on your network connection and the capabilities of the remote hosts.
Syntax: -delay num_milliseconds
Details: Specifies the minimum time between HTTP requests, in milliseconds. The default value is 0 (no delay).
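The effect of a minimum delay between requests can be sketched in Python. This is an illustration only, not Verity Spider's implementation: the class name RequestThrottle and its parameters are hypothetical, and the clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time


class RequestThrottle:
    """Enforce a minimum gap between successive requests (hypothetical helper)."""

    def __init__(self, delay_ms=0, clock=time.monotonic, sleep=time.sleep):
        self.delay_s = delay_ms / 1000.0  # the option is given in milliseconds
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request; None before the first

    def wait(self):
        """Block until at least delay_s has elapsed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling wait() before each request keeps requests at least delay_ms apart. With the default of 0, wait() never sleeps.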
Specifies an HTTP header to be added to the spidering request. For example:
-header "Referer: http://www.verity.com/"
Verity Spider sends some predefined headers, such as Accept and User-Agent among others, by default. Special headers are sometimes necessary to correctly index a site.
For example, previous versions of Verity Spider did not support the "Host" header, which is needed for Virtual Host indexing. Also, a "Proxy-authentication" header was needed to pass a username and password to a proxy server.
In Verity Spider V3.7, the "Host" header is supported by default, and the -proxyauth option is available for proxy server authentication. Therefore, the -header option is maintained only for backwards compatibility and possible future enhancements.
Note: Misuse of this option causes the spider to fail. If this happens, rerun the indexing task with corrected -header values.
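To show what an extra header means at the HTTP level, the following Python sketch splits a "Name: value" string of the kind passed to -header and attaches it to a request object from the standard urllib library. It does not use or reproduce Verity Spider itself. The function names parse_header and build_request are hypothetical, and no network request is sent.

```python
from urllib.request import Request


def parse_header(arg):
    """Split a 'Name: value' string into a (name, value) pair."""
    name, sep, value = arg.partition(":")  # split at the first colon only
    if not sep or not name.strip():
        raise ValueError("expected 'Name: value', got %r" % arg)
    return name.strip(), value.strip()


def build_request(url, header_args):
    """Build (but do not send) a request carrying the extra headers."""
    request = Request(url)
    for arg in header_args:
        name, value = parse_header(arg)
        request.add_header(name, value)
    return request


request = build_request(
    "http://www.verity.com/",
    ["Referer: http://www.verity.com/"],
)
```

A malformed value, such as a string with no colon, is rejected before any request is built. This mirrors the advice above to correct the -header value and rerun.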
Syntax: -hostcache num_hostnames
Specifies the number of hostnames to cache to avoid DNS lookups. Without this option, the host cache will continue to grow.
Disables round-robin indexing of Web sites with network flow control.
By default, Verity Spider uses round-robin indexing of Web sites to avoid overwhelming a Web server and to improve indexing performance. Verity Spider connects to each Web server in a round-robin manner, using up to the value for -connections. This means one URL is fetched from each Web server in turn.
Note: Using -noflowctrl may result in a significant drop in performance.
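The round-robin ordering described above, one URL from each Web server in turn, can be sketched in Python. This is a simplified model under stated assumptions: it only computes a fetch order, it ignores the -connections limit and any concurrency, and the function name round_robin_order is hypothetical.

```python
from collections import deque
from urllib.parse import urlsplit


def round_robin_order(urls):
    """Return urls reordered so that one URL is taken from each host in turn."""
    queues = {}  # host -> pending URLs; dict keeps hosts in first-seen order
    for url in urls:
        queues.setdefault(urlsplit(url).netloc, deque()).append(url)

    ordered = []
    hosts = deque(queues)
    while hosts:
        host = hosts.popleft()
        ordered.append(queues[host].popleft())
        if queues[host]:
            hosts.append(host)  # host still has work; visit it again next round
    return ordered
```

In this model a host with many pending URLs is never hit twice in a row while other hosts still have work, which is the property that avoids overwhelming a single server.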
Syntax: -noproxy name_1 [name_n] ...
Used in conjunction with -proxy, -noproxy specifies that the Verity Spider directly access the hosts whose names match those specified. By default, when -proxy is specified, the Verity Spider first tries to access every host with the proxy information. To improve performance, use -noproxy for those hosts you know can be accessed without a proxy host. For the name variable, you can use the asterisk ( * ) wildcard for text strings. For example:
'*.verity.com'
You cannot use the question mark ( ? ) wildcard with -noproxy, and the -regexp option does not enable regular expressions for this argument.
On Windows NT, you should include double quotes around the argument to protect the special character ( * ). On UNIX, you should use single quotes. Note that this is only required when you run the indexing job from a command line. Quotes are not necessary within a command file (-cmdfile).
Note: You must have valid Verity Spider licensing capability to use this option.
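The asterisk-only matching described for -noproxy can be modeled in Python as follows. This is a sketch, not the product's matching code: the function name bypasses_proxy is hypothetical, and case-insensitive comparison is an assumption based on hostnames being case-insensitive. The question mark is treated as a literal character, consistent with the restriction above.

```python
import re


def _compile(pattern):
    # Escape every character, then turn the escaped asterisk back into
    # "match any text". A question mark stays a literal character.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(regex, re.IGNORECASE)  # assumption: case-insensitive


def bypasses_proxy(host, noproxy_patterns):
    """Return True if host matches any pattern and should skip the proxy."""
    return any(_compile(p).fullmatch(host) for p in noproxy_patterns)
```

With the pattern '*.verity.com', the host www.verity.com matches, while www.example.com does not.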
Specifies host and port for proxy server.
Note: You must have valid Verity Spider licensing capability to use this option.
See also -proxyauth for proxy servers that require authentication, and -noproxy for hosts which you know are accessible without having to go through a proxy server.
Syntax: -proxyauth login:password
Specifies login information for proxy server connections that require authorization to get outside the firewall. Used in conjunction with -proxy.
Note: You must have valid Verity Spider licensing capability to use this option. Information Server V3.7 does not support retrieving documents for viewing through secure proxy servers. Do not use -proxyauth to index documents that are to be viewed through Information Server V3.7.
Specifies the number of times the Verity Spider should attempt to access a URL. Use -retry when an unstable network connection is likely to give false rejections.
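The idea of repeating an access attempt a fixed number of times can be sketched in Python. This is an illustration under stated assumptions, not Verity Spider's behavior: the function name fetch_with_retry is hypothetical, the fetch callable is supplied by the caller, and only OSError (the base class of Python's network errors) is treated as a retryable failure.

```python
def fetch_with_retry(fetch, url, attempts):
    """Call fetch(url) up to `attempts` times; re-raise the last failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch(url)
        except OSError as err:  # assumption: network failures are OSError
            last_error = err
    raise last_error
```

A URL that fails transiently on the first tries but succeeds within the allowed number of attempts is returned normally. A URL that fails every time is reported as failed only after the final attempt.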
Specifies the time period, in seconds, that the Verity Spider should wait before timing out on a network connection and on accessing data. The data access value is automatically twice the value you specify for the network connection timeout.
The default value for the network connection timeout is 30 seconds, and therefore the value for the data access timeout is 60 seconds.
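The relationship between the two timeouts is simple arithmetic, shown here as a small Python helper. The function name timeouts is hypothetical. The default of 30 seconds and the factor of two come from the description above.

```python
def timeouts(connect_timeout_s=30):
    """Return (network connection timeout, data access timeout) in seconds."""
    return connect_timeout_s, 2 * connect_timeout_s
```

For example, timeouts() gives (30, 60), and a connection timeout of 45 seconds gives a data access timeout of 90 seconds.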