Using Query Expressions

When you search a Verity collection, you use the cfsearch tag in a ColdFusion application page. Use the criteria attribute to specify the query expression you want to pass to the search engine.

You can build two types of query expressions: simple and explicit. A simple query expression is typically a word or words. An explicit query expression can employ a number of operators and modifiers to refine the search, and you must invoke all aspects of the search explicitly. A simple query expression employs operators by default. You can assemble an explicit query expression programmatically, or you can pass a simple query expression to the search engine directly from an HTML input form.

The Verity query language provides many operators and modifiers for composing queries. You can use the following search techniques to search a Verity collection:

Simple query expressions

Simple queries let end users enter simple, comma-delimited strings and use wildcard characters. Users can enter multiple words separated by commas, in which case the comma is treated like a logical OR. If a user omits the commas, the query expression is treated as a phrase.

Ordinarily, operators are employed in explicit query expressions. Operators are normally surrounded by angle brackets (< >). However, a simple query expression can include the AND, OR, and NOT operators without angle brackets.

A simple query automatically employs the STEM operator and the MANY modifier. STEM searches for words that derive from those entered in the query expression, so entering "find" returns documents that contain "find," "finding," "finds," and so on. The MANY modifier presents the documents returned in the search as a list based on a relevancy score.
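As an illustration, a simple query can be passed straight from a form field to the cfsearch tag. This is a minimal sketch: the form field name searchTerms and the query name SimpleSearch are hypothetical, and the collection testcollection is assumed to exist and to be indexed already.

```cfml
<!--- Hypothetical form field; defaults to a single word if the form was not submitted --->
<cfparam name="form.searchTerms" default="find">

<!--- type="Simple" lets the engine apply STEM and MANY implicitly --->
<cfsearch name="SimpleSearch"
  collection="testcollection"
  type="Simple"
  criteria="#form.searchTerms#">

<!--- List each hit with its relevancy score --->
<cfoutput query="SimpleSearch">
  #SimpleSearch.Title# (score: #SimpleSearch.Score#)<br>
</cfoutput>
```

With the default value, documents containing "find," "finding," or "finds" would all be candidates for the result list.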

Explicit query expressions

You can construct explicit queries using a variety of operators, which are described later in this section. Most operators in an explicit query expression must be surrounded by angle brackets < >. You can use the AND, OR, and NOT operators without angle brackets.

Expression syntax

You can use either simple or explicit syntax when stating a query expression. The syntax you use determines whether the search words you enter are stemmed, and whether the words that are found contribute to relevance-ranked scoring.

Simple syntax

When you use simple syntax, the search engine implicitly interprets single words as if they were modified by the STEM operator and the MANY modifier. By implicitly applying the MANY modifier, the search engine calculates each document's score based on the density of the search term in the searched documents. The more often a word occurs in a document, the higher the document's score.

As a result, the search engine ranks documents according to word density as it searches for the word you specify, as well as words that have the same stem. For example, "films," "filmed," and "filming" are stemmed variations of the word "film." To search for documents containing the word "film" and its stem words, you can enter the word "film" without modification. When documents are ranked by relevance, they appear in a list with the most relevant documents at the top.

Explicit syntax

When you use explicit syntax, the search engine interprets the search terms you enter as literals. For example, if you enter the word "film" (including quotation marks) using explicit syntax, the stemmed variations "films," "filmed," and "filming" are ignored.
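The difference between the two syntaxes can be sketched with two explicit searches. The query names are hypothetical and the collection testcollection is assumed to exist. The second search spells out the operator and modifier that simple syntax applies implicitly.

```cfml
<!--- Literal match: the quoted word is not stemmed --->
<cfsearch name="LiteralSearch"
  collection="testcollection"
  type="Explicit"
  criteria='"film"'>

<!--- Stemmed, density-ranked match requested explicitly --->
<cfsearch name="StemmedSearch"
  collection="testcollection"
  type="Explicit"
  criteria="<MANY><STEM>film">

<cfoutput>
  Literal matches: #LiteralSearch.RecordCount#<br>
  Stemmed matches: #StemmedSearch.RecordCount#<br>
</cfoutput>
```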

The following table shows all operators available for conducting searches of ColdFusion Verity collections.
Verity Search Operators
<         CONTAINS    PHRASE
<=        ENDS        SENTENCE
=         MATCHES     STARTS
>         NEAR        STEM
>=        NEAR/N      SUBSTRING
ACCRUE    OR          WILDCARD
AND       PARAGRAPH   WORD

Special characters

The search engine handles a number of characters in particular ways as described in the following table:
Characters
Description
, ( ) [
These characters end a text token.
= > < !
These characters also end a text token. They are terminated by an associated end character.
' @ ` < { [ !
These characters signify the start of a delimited token. They are terminated by an associated end character.

A backslash (\) removes special meaning from whatever character follows it. To enter a literal backslash in a query, use two in succession; for example:

<FREETEXT>("\"Hello\", said Packard.")

"backslash (\\)"

Composing search expressions

The following rules apply to the composition of search expressions.

Precedence rules

Expressions are read from left to right. The AND operator takes precedence over the OR operator. However, terms enclosed in parentheses are evaluated first. When the search engine encounters nested parentheses, it starts with the innermost term.

Prefix and infix notation

You can use either prefix notation or infix notation to define search strings that use any operator other than an evidence operator; both forms are valid.

When you use prefix notation, the expression specifies precedence explicitly. The following example means: Look for documents that contain b and c first, then documents that contain a:

OR (a, AND (b,c))

When you use infix notation, precedence is implicit in the expression. For example, the AND operator takes precedence over the OR operator.
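The two notations can be compared side by side. In this sketch the search words, the query names, and the collection testcollection are assumptions chosen for illustration. If the precedence rules apply as described, both searches should select the same documents.

```cfml
<!--- Infix: AND binds more tightly than OR, so java AND applet is evaluated first --->
<cfsearch name="InfixSearch"
  collection="testcollection"
  type="Explicit"
  criteria="server OR java AND applet">

<!--- Prefix: the same grouping written out explicitly --->
<cfsearch name="PrefixSearch"
  collection="testcollection"
  type="Explicit"
  criteria="OR (server, AND (java, applet))">

<cfoutput>
  Infix matches: #InfixSearch.RecordCount#<br>
  Prefix matches: #PrefixSearch.RecordCount#<br>
</cfoutput>
```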

Commas in expressions

If an expression includes two or more search terms within parentheses, a comma is required as a separator between the elements. The following example means: Look for documents that contain any combination of a and b together.

<OR> (a, b)

Note that in this example, angle brackets are used with the OR operator.

Delimiters in expressions

You use angle brackets (< >), double quotation marks ("), and backslashes (\) to delimit various elements in a query expression, as described in the following table:
Angle brackets
Left and right angle brackets are reserved for designating operators and modifiers. They are optional for the AND, OR, and NOT operators, but required for all other operators.
Double quotation marks
You use double quotation marks in expressions to search for a word that is otherwise reserved as an operator, such as AND, OR, and NOT.
Backslashes
To include a backslash in a search expression, insert two backslashes for each backslash character you want included in the search; for example, C:\\CFUSION\\BIN.

Searching with wildcards

The following table shows the wildcard characters that you can use to search Verity collections:
Wildcard
Description
?
Question mark. Matches any single alphanumeric character.
*
Asterisk. Matches zero or more alphanumeric characters. Avoid using the asterisk as the first character in a search string. The asterisk is ignored in a set ([ ]) or an alternative pattern ({ }).
[ ]
Square brackets. Matches any one of the characters in the brackets, as in "sl[iau]m", which locates "slim," "slam," and "slum." Square brackets indicate an implied OR.
{ }
Curly braces. Matches any one of a set of patterns separated by a comma, as in "hoist{s, ing, ed}", which locates "hoists," "hoisting," and "hoisted".
^
Caret. Matches any character not in the set, as in "sl[^ia]m", which locates "slum" but not "slim" or "slam."
-
Hyphen. Specifies a range of characters in a set, as in "c[a-r]t", which locates every word beginning with "c," ending with "t," and containing any letter from "a" to "r."

Searching for wildcards as literals

To search for a wildcard character in your collection, you need to escape the character with a backslash (\); for example:

Searching for special characters as literals

You must precede the following nonalphanumeric characters with a backslash character (\) in a search string:

In addition to the backslash character, you can use paired backquotes (` `) to interpret special characters as literals. For example, to search for the wildcard string "a{b" you can surround the string with backquotes, as follows:

`a{b`

To search for a wildcard string that includes the literal backquote character (`) you must use two backquotes together and surround the whole string in backquotes:

`*n``t` 

You can use paired backquotes or backslashes to escape special characters. There is no functional difference between the two. For example, you can query for the term: <DDA> in the following ways:

\<DDA\> or `<DDA>`
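When search text comes from a user, both escaping styles can be applied in code before the text is passed to the criteria attribute. The following is a minimal sketch that handles only the backslash, question mark, asterisk, and backquote. The variable names and the sample input are hypothetical.

```cfml
<!--- Hypothetical user input whose wildcard characters are meant literally --->
<cfset rawTerm = "what?*">

<!--- Backslash style: escape the backslash first so that the
      backslashes added in later steps are not doubled --->
<cfset escapedTerm = Replace(rawTerm, "\", "\\", "all")>
<cfset escapedTerm = Replace(escapedTerm, "?", "\?", "all")>
<cfset escapedTerm = Replace(escapedTerm, "*", "\*", "all")>

<!--- Backquote style: double any literal backquote, then wrap the whole string --->
<cfset quotedTerm = "`" & Replace(rawTerm, "`", "``", "all") & "`">

<!--- escapedTerm is now what\?\* and quotedTerm is `what?*` --->
<cfoutput>#escapedTerm# or #quotedTerm#</cfoutput>
```

Either variable could then be supplied as the value of the criteria attribute.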

Operators and modifiers

The power of the cfsearch tag is in the control it provides over the Verity search engine. The engine offers users a high degree of specificity in setting search parameters.

Operators

An operator represents logic to be applied to a search element. This logic defines the qualifications that a document must meet to be retrieved. You can use operators to refine your search or to influence the results in other ways. For example, you could construct an HTML form for conducting searches. In the form, a user could perform a search for a single term: server. You can refine your search by limiting the search scope in a number of ways. Operators are available for limiting a query to a sentence or paragraph, and you can search words based on proximity.

Ordinarily, you use operators in explicit searches, as shown here:

"<operator>search_string"

The following operator types are available:
Operator type
Purpose
Evidence
Specifies basic and intelligent word searches.
Proximity
Specifies the relative location of words in a document.
Relational
Searches fields in a collection.
Concept
Identifies a concept in a document by combining the meanings of search elements.
Score
Manipulates the score returned by a search element. You can set the score percentage display to as many as four decimal places.
Natural language
Allows the use of natural language expressions in forming queries.

Evidence operators

Evidence operators let you specify a basic word search or an intelligent word search. A basic word search finds documents that contain only the word or words specified in the query. An intelligent word search expands the query terms to create an expanded word list so that the search returns documents that contain variations of the query terms.

Documents retrieved using evidence operators are not ranked by relevance unless you use the MANY modifier.

The following table describes the evidence operators:
Operator
Description
STEM
Expands the search to include the word you enter and its variations. The STEM operator is automatically implied in any simple query. For example, the explicit query expression:
<STEM>believe
yields matches such as "believe," "believing," and "believer".
WILDCARD
Matches wildcard characters included in search strings. Certain characters automatically indicate a wildcard specification, such as the asterisk (*) and the question mark (?). For example, the query expression:
spam*
yields matches such as "spam," "spammer," and "spamming".
WORD
Performs a basic word search, selecting documents that include one or more instances of the specific word you enter. The WORD operator is automatically implied in any simple query.
THESAURUS
Expands the search to include the word you enter and its synonyms.
SOUNDEX
Expands the search to include the word you enter and one or more words that "sound like," or whose letter pattern is similar to, the word specified. Collections do not have sound-alike indexes by default; to use this feature you must build sound-alike indexes.
TYPO/N
Expands the search to include the word you enter plus words that are similar to the query term. This operator performs "approximate pattern matching" to identify similar words. The optional N variable in the operator name expresses the maximum number of errors between the query term and a matched term, a value called the error distance. If N is not specified, an error distance of 2 is used.

Proximity operators

Proximity operators specify the relative location of specific words in the document. Specified words must be in the same phrase, paragraph, or sentence for a document to be retrieved. In the case of NEAR and NEAR/N operators, retrieved documents are ranked by relevance based on the proximity of the specified words. Proximity operators can be nested; phrases or words can appear within SENTENCE or PARAGRAPH operators, and SENTENCE operators can appear within PARAGRAPH operators.

The following table describes the proximity operators:
Operator
Description
NEAR
Selects documents containing specified search terms. The closer the search terms are to one another within a document, the higher the document's score. The document with the smallest possible region containing all search terms always receives the highest score. Documents whose search terms are not within 1000 words of each other are not selected.
NEAR/N
Selects documents containing two or more search terms within N number of words of each other, where N is an integer between 1 and 1024. NEAR/1 searches for two words that are next to each other. The closer the search terms are within a document, the higher the document's score.
You can specify multiple search terms using multiple instances of NEAR/N as long as the value of N is the same:
commute <NEAR/10> bicycle <NEAR/10> train
PARAGRAPH
Selects documents that include all of the words you specify within the same paragraph. To search for three or more words or phrases in a paragraph, you must use the PARAGRAPH operator between each word or phrase.
<PARAGRAPH> (mission, goal).
PHRASE
Selects documents that include a phrase you specify. A phrase is a grouping of two or more words that occur in a specific order. Examples:
mission oak
"mission oak"
mission <PHRASE> oak
SENTENCE
Selects documents that include all of the words you specify within the same sentence. Examples:
jazz <SENTENCE> musician
<SENTENCE> (jazz, musician)
IN
Selects documents that contain specified values in one or more document zones. A document zone represents a region of a document, such as the document's summary, date, or body text. The IN operator can be qualified with the WHEN operator to search for a term only within one or more zones upon which certain conditions have been placed.
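A proximity search can be run from an application page as in the following sketch. The query name is hypothetical and the collection testcollection is assumed to exist. The criteria string reuses the search terms from the NEAR/N example in the table.

```cfml
<!--- Select documents in which the two words occur within 10 words of each other --->
<cfsearch name="ProximitySearch"
  collection="testcollection"
  type="Explicit"
  criteria="commute <NEAR/10> bicycle">

<!--- NEAR/N results are scored: closer terms yield a higher score --->
<cfoutput query="ProximitySearch">
  #ProximitySearch.Title#: #ProximitySearch.Score#<br>
</cfoutput>
```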

Relational operators

Relational operators search document fields that you defined in the collection. Documents containing specified field values are returned. Documents retrieved using relational operators are not ranked by relevance, and you cannot use the MANY modifier with relational operators.

You use the following operators for numeric and date comparisons:
Operator
Description
=
Equals
>
Greater than
>=
Greater than or equal to
<
Less than
<=
Less than or equal to

The following relational operators compare text and match words and parts of words:
Operator
Description
CONTAINS
Selects documents by matching the word or phrase you specify with the values stored in a specific document field. Documents are selected only if the search elements specified appear in the same sequential and contiguous order in the field value; for example, "god" matches "God in heaven," "a god among men," or "good god" but not "godliness," or "gods."
MATCHES
Selects documents by matching the query string with values stored in a specific document field. Documents are selected only if the search elements specified match the field value exactly. If a partial match is found, a document is not selected; for example, "god" matches a document field containing only "god" and does not match "gods," "godliness," or "a god among men."
STARTS
Selects documents by matching the character string you specify with the starting characters of the values stored in a specific document field.
ENDS
Selects documents by matching the character string you specify with the ending characters of the values stored in a specific document field.
SUBSTRING
Selects documents by matching the query string you specify with any portion of the strings in a specific document field; for example, "god" matches "godliness," "a god among men," "godforsaken," and so on.

Document fields

You can specify the values for the cfindex attributes TITLE, KEY, URL, and CUSTOM as document fields for use with relational operators in the criteria attribute. Document fields are referenced in text comparison operators. They are identified as:

For more information on this topic, see the Knowledge Base article, "Verity: Using Document Fields To Narrow Down Searches" (ID# 1082) on our Web site at http://www.coldfusion.com/Support/KnowledgeBase/SearchForm.cfm.

The SUBSTRING operator

You can use the SUBSTRING operator to match a character string with data stored in a specified data source. In the example described in this section, a data source called TEST1 contains the table YearPlaceText, which itself contains three columns: Year, Place, and Text. Year and Place make up the primary key. The following table shows the contents of the YearPlaceText table:
Year    Place     Text
1990    Utah      Text about Utah 1990
1990    Oregon    Text about Oregon 1990
1991    Utah      Text about Utah 1991
1991    Oregon    Text about Oregon 1991
1992    Utah      Text about Utah 1992

The following application page matches records that have 1990 in the TEXT column and are in the Place Utah. The search is performed against the collection that contains the TEXT column and then is narrowed further by searching for the string "Utah" in the CF_TITLE document field. Recall that document fields are defaults defined in every collection corresponding to the values you define for URL, TITLE, and KEY in the cfindex tag.

<cfquery name="GetText"
  datasource="TEST1">
  SELECT Year+Place
    AS Identifier, text
    FROM YearPlaceText
</cfquery>

<cfindex collection="testcollection"
  action="Update"
  type="Custom"
  title="Identifier"
  key="Identifier"
  body="TEXT"
  query="GetText">

<cfsearch name="GetText_Search"
  collection="testcollection"
  type="Explicit"
  criteria="1990 and CF_TITLE <SUBSTRING> Utah">
<cfoutput>
  Record Counts: <br>
  #GetText.RecordCount# <br>
  #GetText_Search.RecordCount# <br>
</cfoutput>

Query Results --- Should be 5 rows <br>
<cfoutput query="Gettext">
  #Identifier# <br>
</cfoutput>

Search Results -- should be 1 row <br>
<cfoutput query="GetText_Search">
  #GetText_Search.TITLE# <br>
</cfoutput>
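The other text comparison operators can be applied to the same document field in the same way. The following sketch uses ENDS and assumes that the indexed titles end with the place name, which depends on how the data source concatenates Year and Place. The query name UtahTitles is hypothetical.

```cfml
<!--- Select documents whose CF_TITLE field value ends with "Utah" --->
<cfsearch name="UtahTitles"
  collection="testcollection"
  type="Explicit"
  criteria="CF_TITLE <ENDS> Utah">

<cfoutput query="UtahTitles">
  #UtahTitles.Title# <br>
</cfoutput>
```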

Concept operators

Concept operators combine the meaning of search elements to identify a concept in a document. Documents retrieved using concept operators are ranked by relevance. The following table describes each concept operator:
Operator
Description
AND
Selects documents that contain all the search elements you specify.
OR
Selects documents that show evidence of at least one of the search elements you specify.
ACCRUE
Selects documents that include at least one of the search elements you specify. Documents are ranked based on the number of search elements found.
ALL
Selects documents that contain all of the search elements you specify. A score of 1.00 is assigned to each retrieved document. ALL and AND retrieve the same results, but queries using ALL are always assigned a score of 1.00.
ANY
Selects documents that contain at least one of the search elements you specify. A score of 1.00 is assigned to each retrieved document. ANY and OR retrieve the same results, but queries using ANY are always assigned a score of 1.00.
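For example, an ACCRUE search might look like the following sketch. The search words and the query name are hypothetical, and the collection testcollection is assumed to exist.

```cfml
<!--- Select documents containing at least one of the three words;
      documents that contain more of them should rank higher --->
<cfsearch name="AccrueSearch"
  collection="testcollection"
  type="Explicit"
  criteria="<ACCRUE> (computers, laptops, mainframe)">

<cfoutput query="AccrueSearch">
  #AccrueSearch.Title#: #AccrueSearch.Score#<br>
</cfoutput>
```

Replacing ACCRUE with ANY in the same expression would be expected to select the same kind of documents but assign each a score of 1.00.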

Score operators

Score operators govern how the search engine calculates scores for retrieved documents. The maximum score that a returned search element can have is 1.000. You can set the score percentage display to as many as four decimal places.

When you use a score operator, the search engine first calculates a separate score for each search element found in a document, and then performs a mathematical operation on the individual element scores to arrive at the final score for each document.

Note that the document's score is available as a result column. You can use the SCORE result column to get the relevancy score of any document retrieved. For example:

<cfoutput>

  <a href="#Search1.URL#">#Search1.Title#</a><br>

  Document Score=#Search1.SCORE#<BR>

</cfoutput> 

The following table describes the score operators:
Operator
Description
YESNO
Forces the score of an element to 1 if the element's score is non-zero:
<YESNO>mainframe
If the retrieval result of the search on "mainframe" is 0.75, the YESNO operator forces the result to 1. You can use YESNO to avoid relevance ranking.
PRODUCT
Multiplies the scores for the search elements in each document matching a query:
<PRODUCT>(computers, laptops)
Takes the product of the resulting scores.
SUM
Adds together the scores for the search elements in each document matching a query, up to a maximum value of 1:
<SUM>(computers, laptops)
Takes the sum of the resulting scores.
COMPLEMENT
Calculates scores for documents matching a query by taking the complement (subtracting from 1) of the scores for the query's search elements. The new score is 1 minus the search element's original score.
<COMPLEMENT>computers
If the search element's original score is .785, the COMPLEMENT operator recalculates the score as .215.
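The arithmetic behind the four score operators can be worked through with a pair of made-up element scores. This sketch only reproduces the calculations described in the table; it does not call the search engine, and the variable names and score values are hypothetical.

```cfml
<!--- Hypothetical element scores for one document --->
<cfset scoreComputers = 0.75>
<cfset scoreLaptops = 0.5>

<!--- YESNO: any non-zero score is forced to 1 --->
<cfif scoreComputers GT 0>
  <cfset yesNoScore = 1>
<cfelse>
  <cfset yesNoScore = 0>
</cfif>

<!--- PRODUCT: multiply the element scores (0.75 * 0.5 = 0.375) --->
<cfset productScore = scoreComputers * scoreLaptops>

<!--- SUM: add the element scores, capped at 1 (0.75 + 0.5 exceeds 1) --->
<cfset sumScore = Min(1, scoreComputers + scoreLaptops)>

<!--- COMPLEMENT: 1 minus the original score (1 - 0.75 = 0.25) --->
<cfset complementScore = 1 - scoreComputers>

<cfoutput>
  YESNO: #yesNoScore#<br>
  PRODUCT: #productScore#<br>
  SUM: #sumScore#<br>
  COMPLEMENT: #complementScore#<br>
</cfoutput>
```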

Modifiers

You combine modifiers with operators to change the standard behavior of an operator in some way. For example, you can use the CASE modifier with an operator to specify that you want to match the case of the search word.

The following table describes the available modifiers.
Modifier
Description
CASE
Specifies a case-sensitive search. Normally, Verity searches are case-insensitive for search text entered in all uppercase or all lowercase, and case-sensitive for mixed-case search strings.
The expression:
<CASE>J[JAVA, java]
Searches for "JAVA" and "Java."
MANY
Counts the density of words, stemmed variations, or phrases in a document and produces a relevance-ranked score for retrieved documents. Use with the following operators:
  • WORD
  • WILDCARD
  • STEM
  • PHRASE
  • SENTENCE
  • PARAGRAPH
Here is an example:
<PARAGRAPH><MANY>javascript <AND> vbscript
You cannot use the MANY modifier with the following:
  • AND
  • OR
  • ACCRUE
  • Relational operators
NOT
Use to exclude documents that contain the specified word or phrase. Use only with the AND and OR operators.
Here is an example:
Java <AND> programming <NOT> coffee
ORDER
Use to specify that the search elements must occur in the same order in which they are specified in the query. Use with the following operators:
  • PARAGRAPH
  • SENTENCE
  • NEAR/N
Place the ORDER modifier before any operator, as follows:
<ORDER><PARAGRAPH>("server", "Java")
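A modifier is passed to the search engine as part of the criteria string. The following sketch wraps the ORDER example in a cfsearch tag. The query name is hypothetical and the collection testcollection is assumed to exist. The attribute value is enclosed in single quotation marks so that the double quotation marks in the expression are preserved.

```cfml
<!--- Match only paragraphs in which "server" occurs before "Java" --->
<cfsearch name="OrderedSearch"
  collection="testcollection"
  type="Explicit"
  criteria='<ORDER><PARAGRAPH>("server", "Java")'>

<cfoutput>
  Ordered matches: #OrderedSearch.RecordCount#<br>
</cfoutput>
```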


