Networking Options

-agentname

Syntax: -agentname string

Type: Web crawling only.

Specifies the value for the agent name field that is part of the HTTP request. Since Web servers can be configured to return different versions of the same page depending on the requesting agent, you can use -agentname to impersonate a browser client.

Use double quotes if the name contains a space. If the agent name contains forbidden characters such as slashes or backslashes, specify it in a command file (-cmdfile) instead.
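
To make the agent name field concrete at the HTTP level, the following Python sketch builds (but does not send) a request whose User-Agent header carries a custom value. It is not Verity Spider code. The URL and agent string are placeholders, and treating the agent name field as the User-Agent header is an assumption based on the headers listed under -header.

```python
from urllib.request import Request


def build_request(url: str, agent_name: str) -> Request:
    """Build an HTTP request object that identifies itself as agent_name."""
    # Assumption: the "agent name field" corresponds to the User-Agent header.
    return Request(url, headers={"User-Agent": agent_name})


# Placeholder values, chosen only for illustration.
req = build_request("http://www.example.com/", "Mozilla/4.0 (compatible)")

# urllib stores header names capitalized, so the lookup key is "User-agent".
print(req.get_header("User-agent"))
```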

-connections

Syntax: -connections num_connections

Specifies the maximum number of simultaneous socket connections to make to Web sites for indexing. Each connection uses a separate thread.

The default value is 6.


Note

Verity Spider's dynamic flow control makes full use of all available connections when indexing Web sites. If you are indexing multiple sites, you may want to increase this number. Increasing the number of connections does not always help, because throughput also depends on your network connection and on the capabilities of the remote hosts.


-delay

Syntax: -delay num_milliseconds

Type: Web crawling only.

Specifies the minimum time between HTTP requests, in milliseconds. The default value is 0 (no delay).
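
The idea of a minimum time between requests can be sketched in Python as follows. This is only an illustration of the arithmetic, not how Verity Spider implements -delay. The function name and the use of millisecond timestamps are assumptions made for the example.

```python
def wait_before_next_request(last_request_ms: float, now_ms: float, delay_ms: float) -> float:
    """Return how many milliseconds to wait so requests are at least delay_ms apart."""
    elapsed = now_ms - last_request_ms
    # Never wait a negative amount: if enough time has passed, wait 0.
    return max(0.0, delay_ms - elapsed)


# 200 ms have passed since the last request and the minimum delay is 500 ms.
print(wait_before_next_request(1000, 1200, 500))

# With the default delay of 0, no wait is ever required.
print(wait_before_next_request(1000, 1200, 0))
```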

-header

Syntax: -header string

Type: Web crawling only.

Specifies an HTTP header to be added to the spidering request. For example:

-header "Referer: http://www.verity.com/"

Verity Spider sends some predefined headers, such as Accept and User-Agent among others, by default. Special headers are sometimes necessary to correctly index a site.

For example, previous versions of Verity Spider did not support the "Host" header, which is needed for virtual host indexing. Also, a "Proxy-Authorization" header was needed to pass a username and password to a proxy server.

In Verity Spider V3.7, the "Host" header is supported by default, and the -proxyauth option is available for proxy server authentication. Therefore the -header option is maintained only for backwards compatibility and possible future enhancements.


Note

Misuse of this option will cause Verity Spider to fail. If this happens, rerun the indexing task with corrected -header values.


-hostcache

Syntax: -hostcache num_hostnames

Specifies the number of hostnames to cache to avoid DNS lookups. Without this option, the host cache will continue to grow.

The default value is 256.

-noflowctrl

Type: Web crawling only.

Disables round-robin indexing of Web sites with network flow control.

By default, Verity Spider uses round-robin indexing of Web sites to avoid overwhelming a Web server and to improve indexing performance. Verity Spider connects to each Web server in a round-robin manner, using up to the value for -connections. This means one URL is fetched from each Web server in turn.
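
The round-robin ordering described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Verity Spider's implementation: it ignores the -connections limit and any dynamic flow control, and the host names and URLs are placeholders.

```python
from collections import deque


def round_robin(urls_by_host: dict) -> list:
    """Return URLs in an order that takes one URL from each host in turn."""
    # One queue of pending URLs per host; hosts with nothing pending are skipped.
    queues = deque(deque(urls) for urls in urls_by_host.values() if urls)
    order = []
    while queues:
        host_queue = queues.popleft()
        order.append(host_queue.popleft())
        # Put the host back at the end of the rotation if it has URLs left.
        if host_queue:
            queues.append(host_queue)
    return order


# Placeholder hosts and URLs for illustration.
pending = {
    "a.example.com": ["http://a.example.com/1", "http://a.example.com/2"],
    "b.example.com": ["http://b.example.com/1"],
    "c.example.com": ["http://c.example.com/1", "http://c.example.com/2"],
}
for url in round_robin(pending):
    print(url)
```

In this ordering no single server receives two consecutive requests while other servers still have pending URLs, which is the behavior that -noflowctrl turns off.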


Note

Using -noflowctrl may result in a significant drop in performance.


-noproxy

Syntax: -noproxy name_1 [name_n] ...

Type: Web crawling only.

Used in conjunction with -proxy, -noproxy specifies that the Verity Spider directly access the hosts whose names match those specified. By default, when -proxy is specified, the Verity Spider first tries to access every host with the proxy information. To improve performance, use -noproxy for those hosts you know can be accessed without a proxy host.

For the name variable, you can use the asterisk ( * ) wildcard for text strings. For example:

'*.verity.com'

You cannot use the question mark ( ? ) wildcard. Regular expressions are also not supported for this option, even when the -regexp option is specified.

On Windows NT, you should include double quotes around the argument to protect the special character ( * ). On UNIX, you should use single quotes. Note that this is only required when you run the indexing job from a command line. Quotes are not necessary within a command file (-cmdfile).
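
As a rough model of asterisk-only matching, the following Python sketch treats `*` as "any text" and every other character, including `?`, literally. The function name is hypothetical, and the case-insensitive comparison is an assumption. How Verity Spider handles edge cases, such as a bare domain with no host prefix, is not stated here and may differ.

```python
import re


def matches_noproxy(hostname: str, pattern: str) -> bool:
    """Match hostname against a pattern in which only '*' is a wildcard."""
    # Escape each literal piece so '.' and '?' have no special meaning,
    # then join the pieces with '.*' to stand in for each asterisk.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Assumption: host names are compared case-insensitively.
    return re.fullmatch(regex, hostname, flags=re.IGNORECASE) is not None


print(matches_noproxy("www.verity.com", "*.verity.com"))
print(matches_noproxy("www.example.com", "*.verity.com"))
```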


Note

You must have valid Verity Spider licensing capability to use this option.


-proxy

Syntax: -proxy proxyhost:port

Type: Web crawling only.

Specifies host and port for proxy server.

Note

You must have valid Verity Spider licensing capability to use this option.

See also -proxyauth for proxy servers that require authentication, and -noproxy for hosts which you know are accessible without having to go through a proxy server.

-proxyauth

Syntax: -proxyauth login:password

Type: Web crawling only.

Specifies login information for proxy server connections that require authorization to get outside the firewall. Used in conjunction with -proxy.
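
For background, the sketch below shows how a login:password pair is commonly turned into a Proxy-Authorization header value under the HTTP Basic scheme. Whether Verity Spider uses the Basic scheme is an assumption, since this reference does not say. The credentials are placeholders.

```python
import base64


def basic_proxy_authorization(login: str, password: str) -> str:
    """Build a Proxy-Authorization header value using the HTTP Basic scheme."""
    # Basic credentials are the string "login:password", base64-encoded.
    credentials = f"{login}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Placeholder credentials for illustration only.
value = basic_proxy_authorization("spider", "secret")
print("Proxy-Authorization:", value)
```

Base64 is an encoding, not encryption, so credentials sent this way are readable by anyone who can observe the request.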


Note

You must have valid Verity Spider licensing capability to use this option. Information Server V3.7 does not support retrieving documents for viewing through secure proxy servers. Do not use -proxyauth for indexing documents that are to be viewed through Information Server V3.7.


-retry

Syntax: -retry num_retries

Type: Web crawling only.

Specifies the number of times the Verity Spider should attempt to access a URL. Use -retry when an unstable network connection is likely to cause false rejections.

The default value is 4.
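
The retry behavior can be pictured with the following Python sketch. It assumes the value counts total attempts, as the wording above suggests, and that only network-level errors are retried. Both are assumptions, and `fetch_with_retry` and `flaky_fetch` are hypothetical names used only for illustration.

```python
def fetch_with_retry(fetch, url, max_attempts=4):
    """Call fetch(url) up to max_attempts times and return the first success."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return fetch(url)
        except OSError as error:  # network-level failures, e.g. ConnectionError
            last_error = error
    # Every attempt failed: report the most recent error.
    raise last_error


# Hypothetical fetch function that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return f"contents of {url}"


print(fetch_with_retry(flaky_fetch, "http://www.example.com/"))
```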

-timeout

Syntax: -timeout num_seconds

Type: Web crawling only.

Specifies the time period, in seconds, that the Verity Spider should wait before timing out on a network connection and on accessing data. The data access value is automatically twice the value you specify for the network connection timeout.

The default value for the network connection timeout is 30 seconds, and therefore the value for the data access timeout is 60 seconds.
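
The relationship between the two timeouts is simple arithmetic, shown here as a small Python sketch. The function name is hypothetical.

```python
def effective_timeouts(connection_timeout_s: int = 30) -> tuple:
    """Return (connection timeout, data access timeout) in seconds."""
    # The data access timeout is always twice the connection timeout.
    return connection_timeout_s, 2 * connection_timeout_s


print(effective_timeouts())    # the defaults
print(effective_timeouts(45))  # as if -timeout 45 were specified
```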


