© Copyright 1995 - 2016 Archive Power Systems Inc./ DocuXplorer Software
Conditional Drop Characters - Conditional drop characters are a special type of drop character. These characters are dropped
only if they are at the beginning or the end of a word, so they are kept when they appear inside a word. The default
conditional drop characters are , . ? ! ; : @ # $ % ^ & ( ) - _
For example:
One of the default conditional drop characters is the period. This means that periods at the ends of words are
stripped from the text, but periods in the middle of a word (e.g., in a number such as 48.5) are kept.
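The conditional-drop rule can be illustrated with a short Python sketch. It is a hypothetical illustration, not DocuXplorer's implementation; the names `DEFAULT_CONDITIONAL_DROP` and `strip_conditional` are invented for this example. It removes the default conditional drop characters from the start and end of a word while leaving interior occurrences untouched.

```python
# Hypothetical sketch of the conditional-drop rule; not DocuXplorer's code.
# Default conditional drop characters, as listed above.
DEFAULT_CONDITIONAL_DROP = ",.?!;:@#$%^&()-_"

def strip_conditional(word: str, chars: str = DEFAULT_CONDITIONAL_DROP) -> str:
    """Remove conditional drop characters from the start and end of a word only."""
    # str.strip() removes any run of the listed characters from both ends and
    # never touches characters in the interior of the word.
    return word.strip(chars)

print(strip_conditional("48.5."))    # 48.5
print(strip_conditional("(Smith)"))  # Smith
print(strip_conditional("co-op"))    # co-op
```

Because the check only looks at the ends of the word, the trailing period in "48.5." is removed while the decimal point survives.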
Drop Characters - Drop characters are a set of characters that are simply ignored by the FTS (Full Text Search) engine. Drop
characters are ignored in both the text and in search strings. The default drop characters are the double quote, the apostrophe
(single quote), and the back quote (also known as the grave accent). Administrators can specify more drop characters in the
Drop Characters/Additional field.
For example:
If the defaults are used and a name such as O'Malley is in the text, the FTS engine stores OMalley (without
the apostrophe) as the key value. A search word of either OMalley or O'Malley will find it, because the
apostrophe is stripped from the search word as well.
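A minimal Python sketch of the drop-character rule follows. It is hypothetical and not DocuXplorer's code; `DEFAULT_DROP` and `remove_drop_chars` are names made up for illustration. Unlike conditional drop characters, drop characters are deleted wherever they occur, and the same rule is applied to both the stored text and the search word.

```python
# Hypothetical sketch of the drop-character rule; not DocuXplorer's code.
# Defaults from the text: double quote, apostrophe, back quote.
DEFAULT_DROP = "\"'`"

def remove_drop_chars(text: str, drop: str = DEFAULT_DROP) -> str:
    """Delete every drop character, wherever it appears in the text."""
    return text.translate(str.maketrans("", "", drop))

indexed_key = remove_drop_chars("O'Malley")
for query in ("OMalley", "O'Malley"):
    # The same rule is applied to the search word, so both forms match.
    print(query, remove_drop_chars(query) == indexed_key)
```

Additional drop characters could be modeled by passing a longer `drop` string, for example `DEFAULT_DROP + "~"` (an arbitrary example character).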
Noise Words - The noise words are words that are ignored by the FTS engine. Once a word is recognized according to the other
rules (after obeying delimiters, drop characters, minimum word length, etc.), the word is checked against the noise word list and
is ignored if it is found in that list.
The default noise word list includes the following words:
about after all also and another any are because been before being between both but came can come could did
does each else for from get got had has have her here him himself his how into its just like make many might
more most much must never now only other our out over said same see should since some still such take than
that the their them then there these they this those through too under use very want was way well were what
when where which while who will with would you your
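The noise-word check can be sketched in Python as a set lookup applied to words that have already passed the other rules. This is a hypothetical illustration: `NOISE_WORDS` and `remove_noise` are invented names, and the case-insensitive comparison (lowercasing before the lookup) is an assumption, not documented DocuXplorer behavior.

```python
# Hypothetical sketch of noise-word filtering; not DocuXplorer's code.
# The default noise word list, copied from above.
NOISE_WORDS = frozenset("""
about after all also and another any are because been before being between both but came can come could did
does each else for from get got had has have her here him himself his how into its just like make many might
more most much must never now only other our out over said same see should since some still such take than
that the their them then there these they this those through too under use very want was way well were what
when where which while who will with would you your
""".split())

def remove_noise(words: list[str]) -> list[str]:
    """Drop words found in the noise list (ASSUMPTION: case-insensitive match)."""
    return [w for w in words if w.lower() not in NOISE_WORDS]

print(remove_noise(["The", "invoice", "from", "Acme"]))  # ['invoice', 'Acme']
```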
Delimiters - Delimiters are the set of characters that define word boundaries. The default set of delimiters includes the
space (ASCII decimal 32), backspace (ASCII decimal 8), tab (ASCII decimal 9), newline (ASCII decimal 10), vertical tab
(ASCII decimal 11), form feed (ASCII decimal 12), and carriage return (ASCII decimal 13). These work for most standard
text documents. Some document formats, however, use other characters to mark word boundaries. WordPerfect documents,
for example, use the high-bit character €, and DocuXplorer has been programmed to recognize this character as a
boundary.
Delimiters are always case sensitive. If, for example, you want to use "x" (ASCII decimal 120) and "X" (ASCII decimal 88) as
delimiters, you must specify both characters as delimiters, regardless of the case sensitivity option.
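Delimiter-based word splitting, including its case sensitivity, can be shown with a small Python sketch. It is hypothetical rather than DocuXplorer's implementation; `DEFAULT_DELIMITERS` and `split_words` are illustrative names. Membership is tested on the exact character, so adding "x" does not make "X" a delimiter.

```python
# Hypothetical sketch of delimiter-based word splitting; not DocuXplorer's code.
# Default delimiters from the text: space, backspace, tab, newline,
# vertical tab, form feed, carriage return.
DEFAULT_DELIMITERS = frozenset(chr(c) for c in (32, 8, 9, 10, 11, 12, 13))

def split_words(text: str, delimiters: frozenset = DEFAULT_DELIMITERS) -> list[str]:
    """Split text into words at delimiter characters (exact, case-sensitive match)."""
    words: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch in delimiters:
            # A delimiter ends the current word; runs of delimiters yield no empty words.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words

print(split_words("alpha\tbeta\r\ngamma"))                    # ['alpha', 'beta', 'gamma']
print(split_words("AxBXC", DEFAULT_DELIMITERS | {"x"}))       # ['A', 'BXC']
print(split_words("AxBXC", DEFAULT_DELIMITERS | {"x", "X"}))  # ['A', 'B', 'C']
```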
Index Options
Minimum Word Length - This option specifies a minimum cut-off point for word recognition. Any word that is shorter than
the specified minimum length is simply ignored: such words are not stored in the FTS index and are not used if they
appear in a search condition. When creating an FTS index via SQL, the default minimum word length is 3.
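As a hypothetical Python sketch (not DocuXplorer's code; `apply_min_length` is an invented name), the minimum-length rule is a simple filter. The default of 3 below is the SQL default stated above.

```python
# Hypothetical sketch of the minimum-word-length rule; not DocuXplorer's code.
def apply_min_length(words: list[str], min_len: int = 3) -> list[str]:
    """Drop words shorter than min_len; they are neither indexed nor searched."""
    return [w for w in words if len(w) >= min_len]

print(apply_min_length(["an", "FTS", "index", "is", "fast"]))  # ['FTS', 'index', 'fast']
```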
Maximum Word Length - The maximum word length specifies the maximum word size that can be stored in the FTS
index. This is effectively the key length of the index. In general, you should choose a length that is longer than
most or all words in the information being indexed. The default maximum word length is 30. If a user enters a
word longer than the maximum, the word is truncated to the maximum length.
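Truncation can be sketched in Python as follows. This is a hypothetical illustration (not DocuXplorer's code; `apply_max_length` is an invented name) that uses a deliberately small limit to make the effect visible. It shows why the limit should exceed the length of most indexed words: presumably, two long words that share the same leading characters would reduce to the same key.

```python
# Hypothetical sketch of the maximum-word-length rule; not DocuXplorer's code.
def apply_max_length(word: str, max_len: int = 30) -> str:
    """Truncate a word to at most max_len characters (the index key length)."""
    return word[:max_len]

# With a deliberately small limit, two different long words share a key:
print(apply_max_length("documents", 8))      # document
print(apply_max_length("documentation", 8))  # document
```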
Protect Numbers - This option covers a very specific situation. If this option is enabled and the comma and/or period is
defined as a delimiter character, numbers that contain commas and/or periods are not broken into multiple words at
those delimiters. With "normal" text, the default delimiters and conditional drop characters will suffice. Using all
default settings, the comma and period are not delimiter characters (they are conditional drop characters), so text such
as "1,423.99" is treated as a single word. If you created an FTS index with the period and comma as delimiter
characters, that text would be broken into three words: "1", "423", and "99". The Protect Numbers option prevents this.
The option can be useful, for example, if the text contains words separated only by commas (with no other delimiters).
In that case, it may be desirable to treat the comma as a delimiter while still keeping numbers intact.
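The effect of Protect Numbers can be approximated with a regular-expression tokenizer in Python. This is a hypothetical sketch, not DocuXplorer's algorithm: the `tokenize` function, the `protect_numbers` flag, and the pattern for a "number" (digit groups joined by commas or periods) are all assumptions made for illustration. The default `delimiters=" ,."` models an index where the comma and period have been added as delimiters, which is not the DocuXplorer default.

```python
import re

# Hypothetical sketch of the Protect Numbers behavior; not DocuXplorer's code.
# ASSUMPTION: comma and period have been added to the delimiters.
def tokenize(text: str, delimiters: str = " ,.", protect_numbers: bool = False) -> list[str]:
    """Split text on delimiters, optionally keeping numbers such as 1,423.99 whole."""
    # A word is a run of non-delimiter characters.
    word = "[^" + re.escape(delimiters) + "]+"
    if protect_numbers:
        # Try "digits joined by commas/periods" first so the number stays whole.
        word = r"\d+(?:[.,]\d+)+|" + word
    return re.findall(word, text)

print(tokenize("1,423.99"))                                      # ['1', '423', '99']
print(tokenize("1,423.99", protect_numbers=True))                # ['1,423.99']
print(tokenize("red,green,blue 1,423.99", protect_numbers=True)) # ['red', 'green', 'blue', '1,423.99']
```

The last call shows the scenario described above: comma-separated words are split, while the number is kept as one word.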
Maintain Automatically - DocuXplorer can automatically maintain the Index Set data.
In the <Tools/Options/Index Set Search Dialog>, this option is "Checked" by default, so new Cabinets automatically
inherit the "Checked" setting. The Drawer object of any new Cabinet, however, is automatically set to "Unchecked"
so that documents can be added more quickly to Drawers that do not require content indexing of electronic
documents. The administrator will need to set a Drawer's Document Content Search Property to Enable and Maintain