You can search for any word or phrase on a Web site by typing the word
or phrase into a query form and clicking the button to execute the query
(for example, the Execute Query button on the sample query form). This
section covers the following topics:
- Boolean and Proximity Operators:
Shows how to make more precise queries by inserting Boolean and proximity
operators.
- Wildcards: Helps you find
pages containing words similar to a given word.
- Free-Text Queries:
Describes how to formulate a query based on the meaning of a phrase
rather than the exact wording.
- Vector Space Queries:
Explains how to get query results that match a list of words and phrases.
- Property Value Queries:
Tells how to query for the property values of a file.
- Query Examples: Gives examples
of various queries.
Searches produce a list of files that contain the word or phrase, no
matter where it appears in the text. The following list gives the rules for
formulating queries:
- Consecutive words are treated as a phrase; they must appear in the
same order within a matching document.
- Queries are case-insensitive, so you can type your query in uppercase
or lowercase.
- You can search for any word except for those in the exception list
(for English, this includes a, an, and, as,
and other common words), which are ignored during a search.
- Words in the exception list are treated as placeholders in phrase
and proximity queries. For example, if you searched for Word for
Windows, the results could give you Word for Windows
and Word and Windows, because for is a noise word
and appears in the exception list.
- Punctuation marks such as the period (.), colon (:), semicolon (;),
and comma (,) are ignored during a search.
- To use specially treated characters such as &, |, ^, #, @, $,
(, ), in a query, enclose your query in quotation marks (").
- To search for a word or phrase containing quotation marks, enclose
the entire phrase in quotation marks and then double the quotation marks
around the word or words you want to surround with quotes. For example,
"World-Wide Web or ""Web""" searches for
World-Wide Web or "Web".
- You can insert Boolean operators (AND,
OR, and NOT) and the proximity
operator (NEAR) to specify additional search information.
- The wildcard character (*) can match words
with a given prefix. The query esc* matches the terms ESC,
escape, and so on.
- Free-text queries can be specified
without regard to query syntax.
- Vector space queries can be specified.
- ActiveX (OLE) and file attribute property
value queries can be issued.
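The quoting rule in the list above can be sketched in a few lines of Python. This is a minimal illustration, not part of the product; the function name quote_phrase is hypothetical.

```python
def quote_phrase(phrase: str) -> str:
    """Wrap a phrase in quotation marks, doubling any embedded ones.

    Hypothetical helper: it only builds the query text described by the
    quoting rule; it does not send anything to a query engine.
    """
    return '"' + phrase.replace('"', '""') + '"'


quoted = quote_phrase('World-Wide Web or "Web"')
plain = quote_phrase("Abbott and Costello")
```

Here quoted holds "World-Wide Web or ""Web""", and plain holds the phrase wrapped in a single pair of quotation marks.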
Boolean and proximity operators can create a more precise query.
To Search For | Example | Results |
Both terms in the same page | access and basic -or- access & basic | Pages with both the words access and basic |
Either term in a page | cgi or isapi -or- cgi | isapi | Pages with the words cgi or isapi |
The first term without the second term | access and not basic -or- access & ! basic | Pages with the word access but not basic |
Pages not matching a property value | not @size = 100 -or- ! @size = 100 | Pages that are not 100 bytes |
Both terms in the same page, close together | excel near project -or- excel ~ project | Pages with the word excel near the word project |
Hints:
- You can add parentheses to nest expressions within a query. The expressions
in parentheses are evaluated before the rest of the query.
- Use double quotes (") to indicate that a Boolean or NEAR
operator keyword should be ignored in your query. For example, "Abbott
and Costello" will match pages with the phrase, not pages that
match the Boolean expression. In addition to being an operator, the
word and is a noise word in English.
- The NEAR operator is similar to the AND
operator in that NEAR returns a match if both words
being searched for are in the same page. However, the NEAR
operator differs from AND because the rank assigned
by NEAR depends on the proximity of words. That is,
the rank of a page with the searched-for words closer together is greater
than or equal to the rank of a page where the words are farther apart.
If the searched-for words are more than 50 words apart, they are not
considered near enough, and the page is assigned a rank of zero.
- The NOT operator can be used only after an AND
operator in content queries; it can be used only to exclude pages that
match a previous content restriction. For property value queries, the
NOT operator can be used apart from the AND
operator.
- The AND operator has a higher precedence than OR.
For example, the first three queries are equivalent, but the fourth is not:
a AND b OR c
c OR a AND b
c OR (a AND b)
(c OR a) AND b
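The precedence hint can be checked with a small truth table. The sketch below relies on the fact that Python's own and operator also binds more tightly than or, so the four queries can be written almost literally; the names q1 through q4 are illustrative only.

```python
from itertools import product


def q1(a, b, c):
    return a and b or c      # a AND b OR c


def q2(a, b, c):
    return c or a and b      # c OR a AND b


def q3(a, b, c):
    return c or (a and b)    # c OR (a AND b)


def q4(a, b, c):
    return (c or a) and b    # (c OR a) AND b


# Every combination of "term a/b/c is present in the page".
rows = list(product([False, True], repeat=3))

# The first three queries agree on every row.
same_123 = all(q1(*r) == q2(*r) == q3(*r) for r in rows)

# The fourth differs whenever c is present but b is not.
differs_4 = [r for r in rows if q4(*r) != q1(*r)]
```

A page that contains c but not b satisfies the first three queries and fails the fourth.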
Note: The symbols (&, |, !, ~) and
the English keywords AND, OR, NOT,
and NEAR work the same way in all languages supported
by Index Server. Localized keywords are also available when the browser
locale is set to one of the following six languages:
Language | Keywords |
German | UND, ODER, NICHT, NAH |
French | ET, OU, SANS, PRES |
Spanish | Y, O, NO, CERCA |
Dutch | EN, OF, NIET, NABIJ |
Swedish | OCH, ELLER, INTE, NÄRA |
Italian | E, O, NO, VICINO |
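The keyword table can be read as a lookup from each localized keyword to its English equivalent. The sketch below assumes that each row lists its keywords in the order AND, OR, NOT, NEAR (the order used in the note above); the names LOCALIZED and to_english are hypothetical.

```python
# Assumption: every row of the table is ordered AND, OR, NOT, NEAR.
CANONICAL = ("AND", "OR", "NOT", "NEAR")

LOCALIZED = {
    "German": ("UND", "ODER", "NICHT", "NAH"),
    "French": ("ET", "OU", "SANS", "PRES"),
    "Spanish": ("Y", "O", "NO", "CERCA"),
    "Dutch": ("EN", "OF", "NIET", "NABIJ"),
    "Swedish": ("OCH", "ELLER", "INTE", "NÄRA"),
    "Italian": ("E", "O", "NO", "VICINO"),
}


def to_english(keyword: str, language: str) -> str:
    """Return the English operator for a localized keyword.

    Words that are not operator keywords in that language are returned
    unchanged, because they are ordinary search terms.
    """
    table = dict(zip(LOCALIZED[language], CANONICAL))
    return table.get(keyword.upper(), keyword)
```

For example, to_english("und", "German") gives AND, while to_english("excel", "German") returns the word untouched.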
Note: The NEAR operator can be applied
only to words or phrases.
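The ranking behavior of NEAR described in the hints can be pictured with a toy scoring function. Only two properties come from the text: the rank never increases as the words move apart, and it is zero when they are more than 50 words apart. The linear falloff, the max_rank default of 1000, and the name near_rank are assumptions made for illustration, not the formula the query engine uses.

```python
MAX_DISTANCE = 50  # from the text: more than 50 words apart is not "near"


def near_rank(pos_a: int, pos_b: int, max_rank: int = 1000) -> int:
    """Toy rank for two word positions (counted in words) on one page.

    Assumption: a linear falloff. The real ranking formula is not given
    in the text; this only reproduces its stated properties.
    """
    distance = abs(pos_a - pos_b)
    if distance > MAX_DISTANCE:
        return 0
    return max_rank * (MAX_DISTANCE + 1 - distance) // (MAX_DISTANCE + 1)


close = near_rank(10, 12)   # words 2 apart
far = near_rank(10, 40)     # words 30 apart
too_far = near_rank(0, 51)  # words 51 apart
```

With this sketch, close is at least as large as far, and too_far is zero.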
Wildcard operators help you find pages containing
words similar to a given word.
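As a rough sketch of what a prefix wildcard such as esc* means, the following Python function compares a query term against a single word. It assumes case-insensitive matching, as stated in the query rules; the function name matches_prefix is hypothetical.

```python
def matches_prefix(query: str, word: str) -> bool:
    """Return True if word matches the query term.

    A trailing * makes the term a prefix match; without it the whole
    word must match. Comparison is case-insensitive.
    """
    if query.endswith("*"):
        return word.lower().startswith(query[:-1].lower())
    return word.lower() == query.lower()


hits = [w for w in ["ESC", "escape", "rescue", "Escher"] if matches_prefix("esc*", w)]
```

Here hits contains ESC, escape, and Escher, but not rescue, because the wildcard matches only at the end of the prefix.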
The query engine finds pages that best match
the words and phrases in a free-text query. This is done by automatically
finding pages that match the meaning, not the exact wording, of the query.
Boolean, proximity, and wildcard operators are ignored within a free-text
query. Free-text queries are prefixed with $contents.
The query engine supports vector space queries. Vector queries return
pages that match a list of words and phrases. The rank of each page indicates
how well the page matched the query.
To Search For | Example | Results |
Pages that contain specific words | light, bulb | Files with words that best match the words being searched for |
Pages that contain weighted prefixes, words, and phrases | invent*, light[50], bulb[10], "light bulb"[400] | Files that contain words prefixed by invent, the words light, bulb, and the phrase light bulb (the terms are weighted) |
- Components in vector queries are separated by commas.
- Components in vector queries can be weighted by using the [weight]
syntax.
- Pages returned by vector queries do not necessarily match every term
in the query.
- Vector queries work best when the results are sorted by rank.
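The comma-separated, optionally weighted syntax of a vector query can be assembled programmatically. This is a minimal sketch that only builds the query string; the function name vector_query and the choice to quote any component containing a space are assumptions.

```python
def vector_query(components):
    """Build a vector query from (term, weight) pairs.

    weight may be None for an unweighted component. Components that
    contain a space are treated as phrases and wrapped in quotes.
    """
    parts = []
    for term, weight in components:
        text = f'"{term}"' if " " in term else term
        if weight is not None:
            text += f"[{weight}]"
        parts.append(text)
    return ", ".join(parts)


query = vector_query(
    [("invent*", None), ("light", 50), ("bulb", 10), ("light bulb", 400)]
)
```

The resulting query is the weighted example from the table: invent*, light[50], bulb[10], "light bulb"[400].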
With property value queries, you can find files that have property values
that match given criteria. The properties over which you can query include
basic file information like file name and file size, and ActiveX properties
including the document summary (information) that is stored in files created
by ActiveX-aware applications.
There are two types of property queries:
- Relational property queries
consist of an at character (@), a property
name, a relational operator,
and a property value. For example, to
find all of the files larger than one million bytes, issue the query
@size > 1000000.
- Regular expression property queries consist of a number sign
(#), a property name, and a regular expression
for the property value. For example, to find all of the video
(.avi) files, issue the query #filename *.avi. Regular expressions will
never match the special properties contents (#contents) and all (#all).
Properties that are not retrievable at query time cannot be used in
# queries. These include HTML META properties not stored in the property
cache.
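The two query shapes can be summarized as string builders. The sketch below is illustrative only: the function names are hypothetical, and the set of relational operators is limited to the comparison operators used in this document.

```python
RELATIONAL_OPERATORS = {"<", "<=", "=", "!=", ">=", ">"}


def relational_query(prop: str, op: str, value) -> str:
    """Build '@property operator value', for example '@size > 1000000'."""
    if op not in RELATIONAL_OPERATORS:
        raise ValueError(f"unknown relational operator: {op!r}")
    return f"@{prop} {op} {value}"


def regex_query(prop: str, pattern: str) -> str:
    """Build '#property pattern', for example '#filename *.avi'."""
    if prop.lower() in ("contents", "all"):
        # The text states that regular expressions never match these.
        raise ValueError("regular expressions never match contents or all")
    return f"#{prop} {pattern}"


big_files = relational_query("size", ">", 1000000)
videos = regex_query("filename", "*.avi")
```

These calls produce @size > 1000000 and #filename *.avi, the two examples given in the list above.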
Property names are preceded by either the at (@) or number
sign (#) character. Use @ for relational queries, and # for regular expression
queries.
If no property name is specified, @contents is assumed.
Properties available for all files include:
Property Name | Description |
All | Matches words, phrases, and any property |
Contents | Words and phrases in the file |
Filename | Name of the file |
Size | File size |
Write | Last time the file was modified |
ActiveX property values can also be used in queries. Web sites with files
created by most ActiveX-aware applications can be queried for these properties:
Property Name | Description |
DocTitle | Title of the document |
DocSubject | Subject of the document |
DocAuthor | Author of the document |
DocKeywords | Keywords for the document |
DocComments | Comments about the document |
Relational operators are used in relational property queries.
To Search For | Example | Results |
Property values in relation to a fixed value | @size < 100; @size <= 100; @size = 100; @size != 100; @size >= 100; @size > 100 | Files whose size matches the query |
Property values with all of a set of bits on | @attrib ^a 0x820 | Compressed files with the archive bit on |
Property values with some of a set of bits on | @attrib ^s 0x20 | Files with the archive bit on |
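The difference between ^a (all of a set of bits) and ^s (some of a set of bits) comes down to two bitwise tests. The sketch below uses the Windows file attribute values 0x20 (archive) and 0x800 (compressed), which together make up the 0x820 mask in the table; the function names are hypothetical.

```python
FILE_ATTRIBUTE_ARCHIVE = 0x20
FILE_ATTRIBUTE_COMPRESSED = 0x800


def all_bits_on(value: int, mask: int) -> bool:
    """^a: every bit in mask is set in value."""
    return (value & mask) == mask


def some_bits_on(value: int, mask: int) -> bool:
    """^s: at least one bit in mask is set in value."""
    return (value & mask) != 0


mask = FILE_ATTRIBUTE_COMPRESSED | FILE_ATTRIBUTE_ARCHIVE  # 0x820
archived_only = FILE_ATTRIBUTE_ARCHIVE
```

A file with only the archive bit set fails all_bits_on(archived_only, mask) but passes some_bits_on(archived_only, mask).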
To Search For | Example | Results |
A specific value | @DocAuthor = Bill Barnes | Files authored by Bill Barnes |
Values beginning with a prefix | #DocAuthor George* | Files whose author property begins with George |
Files with any of a set of extensions | #filename *.|(exe|,dll|,sys|) | Files with .exe, .dll, or .sys extensions |
Files modified after a certain date | @write > 96/2/14 10:00:00 | Files modified after February 14, 1996 at 10:00 GMT |
Files modified after a relative date | @write > -1d2h | Files modified in the last 26 hours |
Vectors matching a vector | @vectorprop = { 10, 15, 20 } | ActiveX documents with a vectorprop value of { 10, 15, 20 } |
Vectors where each value matches a criterion | @vectorprop >^a 15 | ActiveX documents with a vectorprop value in which all values in the vector are greater than 15 |
Vectors where at least one value matches a criterion | @vectorprop =^s 15 | ActiveX documents with a vectorprop value in which at least one value is 15 |
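The last two rows of the table compare a single value against every element of a vector. In Python terms, ^a corresponds to all() and ^s to any(). The sketch below is an illustration with a hypothetical function name, not the engine's implementation.

```python
import operator

OPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}


def vector_matches(values, op: str, quantifier: str, operand) -> bool:
    """Evaluate 'op^a operand' (all of) or 'op^s operand' (some of)."""
    compare = OPS[op]
    results = (compare(v, operand) for v in values)
    if quantifier == "^a":
        return all(results)
    if quantifier == "^s":
        return any(results)
    raise ValueError(f"unknown quantifier: {quantifier!r}")


vectorprop = [10, 15, 20]
all_greater = vector_matches(vectorprop, ">", "^a", 15)  # @vectorprop >^a 15
one_equal = vector_matches(vectorprop, "=", "^s", 15)    # @vectorprop =^s 15
```

For the vector { 10, 15, 20 }, the >^a 15 test fails because 10 and 15 are not greater than 15, while the =^s 15 test succeeds.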
- Be sure to use the pound (#) character before the property name when
using a regular expression in a property value, and an at
(@) character otherwise. The equal (=) relational operator is assumed
for regular-expression queries.
- File name (#filename) is the only property that efficiently supports
regular expressions with wildcards to the left of text.
- Date and time values are of the form yyyy/mm/dd hh:mm:ss
or yyyy-mm-dd hh:mm:ss. The first two characters of the year
and the entire time can be omitted. If you omit the first two characters
of the year, then a value of 29 or less is interpreted as a year from
2000 on (20yy), and a value of 30 or greater is interpreted as a year
in the 1900s (19yy). All dates and times are
in Greenwich Mean Time (GMT).
- Dates and times relative to the current time can be expressed with
a minus (-) character followed by zero or more integer and time
unit pairs. Time units are expressed as: (y) for years, (m) for months,
(w) for weeks, (d) for days, (h) for hours, (n) for minutes, and (s)
for seconds. A three-digit millisecond value can be optionally specified
after the seconds value in date expressions. For example, 1997/12/8
10:10:03:452.
- Currency values are of the form x.y, where x is
the whole value amount and y is the fractional amount. There
is no assumption about units.
- Boolean values are (t) or (true) for TRUE and (f)
or (false) for FALSE.
- Vectors (VT_VECTOR) are expressed as an opening brace ({), followed
by a comma-separated list of values, then a closing brace (}).
- Single-value expressions that are compared against vectors are expressed
as a relational operator, followed by
(^a) for all of or (^s) for some of.
- Numeric values can be in decimal or hexadecimal (preceded by 0x).
- The contents property does not support relational operators.
If a relational operator is specified, no results will be found. For
example, @contents Microsoft will find documents containing Microsoft,
but @contents=Microsoft will find none.
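Two of the rules above, the two-digit year cutoff and the relative time syntax, can be sketched in Python. The function names are hypothetical. The relative-time parser handles only the w, d, h, n, and s units, because years and months have no fixed length in datetime.timedelta; that restriction belongs to this sketch, not to the query language.

```python
import re
from datetime import timedelta


def expand_year(two_digit: int) -> int:
    """Apply the cutoff from the text: 29 or less is 20yy, 30 or more is 19yy."""
    return 2000 + two_digit if two_digit <= 29 else 1900 + two_digit


_UNITS = {"w": "weeks", "d": "days", "h": "hours", "n": "minutes", "s": "seconds"}


def relative_offset(expr: str) -> timedelta:
    """Convert an expression such as '-1d2h' into a timedelta before now."""
    if not expr.startswith("-"):
        raise ValueError("relative times start with a minus sign")
    body = expr[1:]
    pairs = re.findall(r"(\d+)([wdhns])", body)
    # Reject anything the pairs do not fully account for (y, m, stray text).
    if "".join(amount + unit for amount, unit in pairs) != body:
        raise ValueError(f"unsupported relative time: {expr!r}")
    total = timedelta()
    for amount, unit in pairs:
        total += timedelta(**{_UNITS[unit]: int(amount)})
    return total


offset = relative_offset("-1d2h")
```

Here offset equals timedelta(hours=26), which matches the 26 hours given for @write > -1d2h in the examples.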
Regular expressions in property queries are defined as follows:
- Any character except asterisk (*), period (.), question mark (?),
and vertical bar (|) defaults to matching just itself.
- Regular expressions can be enclosed in matching quotes ("), and
must be enclosed in quotes if they contain a space ( ) or closing parenthesis
()).
- The characters *, ., and ? behave as they behave in Windows; they
match any number of characters, match (.) or end of string, and match
any one character, respectively.
- The character | is an escape character. After |, the following characters
have special meaning:
  ( opens a group. Must be followed by a matching ).
  ) closes a group. Must be preceded by a matching (.
  [ opens a character class. Must be followed by a matching (un-escaped) ].
  { opens a counted match. Must be followed by a matching }.
  } closes a counted match. Must be preceded by a matching {.
  , separates OR clauses.
  * matches zero or more occurrences of the preceding expression.
  ? matches zero or one occurrence of the preceding expression.
  + matches one or more occurrences of the preceding expression.
  Anything else, including |, matches itself.
- Between square brackets ([]) the following characters have special
meaning:
  ^ matches everything but the following classes. Must be the first character.
  ] matches ]. May only be preceded by ^; otherwise it closes the class.
  - (hyphen) is the range operator. Must be preceded and followed by normal characters.
  Anything else matches itself (or begins or ends a range at itself).
- Between curly braces ({}) the following syntax applies:
  |{m|} matches exactly m occurrences of the preceding expression (0 < m < 256).
  |{m,|} matches at least m occurrences of the preceding expression (1 < m < 256).
  |{m,n|} matches between m and n occurrences of the preceding expression, inclusive (0 < m < 256, 0 < n < 256).
- To match *, ., and ?, enclose them in brackets (for example, |[*]sample
will match *sample).
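To make these rules concrete, the following sketch translates a property regular expression into a pattern for Python's re module. It is a simplified illustration, not the engine's matcher. The function names are hypothetical, and matching the whole property value case-insensitively is an assumption. It needs Python 3.7 or later, where re.escape leaves commas unescaped.

```python
import re


def to_python_regex(expr: str) -> str:
    """Translate the |-escaped syntax described above into Python re syntax."""
    out = []
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "*":
            out.append(".*")            # any number of characters
        elif ch == "?":
            out.append(".")             # any one character
        elif ch == ".":
            out.append(r"(?:\.|$)")     # a period or end of string
        elif ch == "|" and i + 1 < len(expr):
            i += 1
            nxt = expr[i]
            if nxt in "(){}*?+":
                out.append(nxt)         # group, counted match, or repetition
            elif nxt == ",":
                out.append("|")         # OR clause separator
            elif nxt == "[":
                j = i + 1
                if expr.startswith("^]", j):
                    j += 2              # "]" right after "^" is a literal
                end = expr.index("]", j)  # ValueError if the class is unclosed
                out.append("[" + expr[i + 1:end].replace("\\", "\\\\") + "]")
                i = end
            else:
                out.append(re.escape(nxt))  # anything else matches itself
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def matches(expr: str, value: str) -> bool:
    """Assumption: the expression must match the whole value, ignoring case."""
    return re.fullmatch(to_python_regex(expr), value, re.IGNORECASE) is not None


pattern = to_python_regex("*.|(exe|,dll|,sys|)")
```

With this translation, matches("*.|(exe|,dll|,sys|)", "SETUP.EXE") is true and matches("|[*]sample", "*sample") is true, while a name such as readme.txt does not match the extension query.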
Example | Results |
@size > 1000000 | Pages larger than one million bytes |
@write > 95/12/23 | Pages modified after December 23, 1995 |
Apple tree | Pages with the phrase apple tree |
"apple tree" | Same as above |
@contents apple tree | Same as above |
Microsoft and @size > 1000000 | Pages with the word Microsoft that are larger than one million bytes |
"microsoft and @size > 1000000" | Pages with the phrase specified (not the same as above) |
#filename *.avi | Video files (the # prefix is used because the query contains a regular expression) |
@attrib ^s 32 | Pages with the archive attribute bit on |
@docauthor = John Smith | Pages with the given author |
$contents why is the sky blue? | Pages that match the query |
@size < 100 & #filename *.gif | Graphics Interchange Format (GIF) files less than 100 bytes in size |