Note: The Run Export button at the bottom of this window will remain grayed out until you have entered and connected to the Target File/URI. However, you should
(at a minimum) go to the Properties and Records tabs and make any needed selections prior to exporting the data.
Sample Extraction (Optional)
On the Sample Extraction tab, you may specify a subset of the extracted data records to be exported. This is convenient if you need a random sample of the records,
or in cases where you simply want to export a few records for testing. The default selection is All Records. To select one of the other options, click the radio button to
the left of the desired option. The options are:
Filter Extraction (Optional)
On the Filter Extraction tab, you may specify a subset of the extracted data records to be exported, based on record-filtering criteria. This is convenient if you need to
export only the data records that contain specific information in a data field. The options are:
Properties
The options on the Properties tab are properties that you may need to specify for your output data file. After you have selected all the correct choices from the list of
options below, click the Apply button.
Delimited ASCII data has special characters between fields and records. Another name for delimited ASCII files is CSV (comma-separated values) text.
When delimited ASCII is your export file format, the default field separators are commas (,) and the field delimiters are quotation marks ("). The default record
separator is a carriage return-line feed (CR-LF). If different separators and/or delimiters are needed, you can specify them by selecting from the options in the
Properties tab and choosing the desired separators and delimiters for your target file. When you are creating a delimited ASCII file that will be imported into another
application, you should set each of the Properties options as required by that application before exporting the data.
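The effect of the separator and delimiter properties can be sketched outside the product with Python's standard csv module. This is only an illustration of the file layout described above; the function name write_delimited and its parameters are assumptions made for this example and are not part of the Data Extractor.

```python
import csv
import io

def write_delimited(rows, field_sep=",", delimiter='"', record_sep="\r\n"):
    """Return rows rendered as delimited ASCII text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=field_sep,        # plays the role of FieldSeparator
        quotechar=delimiter,        # plays the role of the field delimiters
        lineterminator=record_sep,  # plays the role of RecordSeparator
        quoting=csv.QUOTE_ALL,      # delimit every field
    )
    writer.writerows(rows)
    return buffer.getvalue()

rows = [["Smith", "12345", "Houston"]]

# Defaults: comma between fields, quotation marks around fields, CR-LF after records.
default_text = write_delimited(rows)

# Alternate settings: pipe between fields, line feed after records.
pipe_text = write_delimited(rows, field_sep="|", record_sep="\n")
```

Comparing default_text and pipe_text shows why the settings should match what the importing application expects before the export is run.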
CodePage
The selected CodePage Translation Table determines which code table to use when writing out the target data. The default is ANSI, which is the standard in the US.
You may select any of the other code page options from the picklist.
RecordSeparator
A delimited ASCII file is presumed to have a carriage return-line feed (CR-LF) between records. To specify a different record separator for your output data file,
click in the RecordSeparator cell and click once. Then click the down arrow to the right of the box and click the desired record separator in the list box. The list box
choices are: carriage return-line feed (default), line feed, carriage return, line feed-carriage return, form feed, empty line, ctrl-E and no record separator. If you have or
need an alternate record separator other than one from the list box, you can type it here.
• If the record separator is not one of the choices from the list box and is a printable character, highlight the CR-LF and then type the correct character. For
example, if the separator is a pipe ( | ), type a pipe from the keyboard.
• If the record separator is not one of the choices from the list box and is not a printable character, highlight the CR-LF and then enter a backslash ( \ ), an 'X' and
then the hex value for the separator. For example, if the separator is a check mark, type \XFB. For a list of the 256 standard and extended ASCII characters,
see Hex Values Reference Chart in the Data Parser User's Guide.
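As a rough illustration of the backslash-X notation, the sketch below converts a typed separator into the single byte it stands for. The function decode_separator is hypothetical and only mirrors the rule described above; how the product itself parses the value is not documented here. Which glyph a byte such as FB displays as depends on the code page in use.

```python
def decode_separator(text):
    """Convert a typed separator into its byte value (hypothetical helper).

    A value such as \\XFB is read as backslash, X, and two hex digits.
    Any other value is taken as the literal character that was typed.
    """
    if len(text) == 4 and text[:2].upper() == "\\X":
        return bytes([int(text[2:], 16)])
    return text.encode("latin-1")

pipe = decode_separator("|")        # printable character, used as typed
check = decode_separator("\\XFB")   # non-printable, given as a hex value
```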
FieldSeparator
Delimited ASCII files are presumed to have a comma between each field. To specify a different field separator, click in the FieldSeparator Current Value box and click
once. Then click the down arrow to the right of the box to display the list of options. The list box options are: comma (default), tab, space, carriage return-line feed, line
feed, carriage return, line feed-carriage return, ctrl-R, a pipe (|) and no field separator. If you need an alternate field separator other than one from the list box, you can
type it here.
• If the field separator is not one of the choices from the list box and is a printable character, highlight the current value and then type the correct character. For
example, if the separator is an asterisk (*), type an asterisk from the keyboard.
• If the field separator is not one of the choices from the list box and is not a printable character, highlight the current value and then enter a backslash (\), an 'X' and the
hex value for the separator. For example, if the separator is a check mark, type \XFB.
FieldStartDelimiter
Delimited ASCII files are presumed to have start-of-field and end-of-field delimiters. The default delimiter is a quotation mark ( " ) because it is the most common.
However, some files do not contain field delimiters, so this option is available for you to choose. To create an output data file with no delimiters, click in the
FieldStartDelimiter Current Value box and click once. Then click the down arrow to the right of the box and select None.
A Range of Records
Select this option when you want to specify the starting and ending record numbers to export. Then enter the desired starting and ending record
numbers in their respective boxes.
Every Nth Record
Select this option when you want to export a sample made up of every Nth record. Then enter the desired value in the "N" box.
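The two sampling options can be pictured with a short Python sketch. The function names are hypothetical, and the sketch assumes that records are numbered from 1 and that Every Nth Record selects records N, 2N, 3N and so on; the product may count differently.

```python
from itertools import islice

def range_of_records(records, start, end):
    """Return records numbered start through end, counting from 1 (assumed)."""
    return list(islice(records, start - 1, end))

def every_nth_record(records, n):
    """Return records N, 2N, 3N, ... (assumed starting point)."""
    return list(islice(records, n - 1, None, n))

records = [f"rec{i}" for i in range(1, 11)]   # ten stand-in records

subset = range_of_records(records, 3, 5)      # records 3, 4 and 5
sample = every_nth_record(records, 4)         # records 4 and 8
```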
Source Field
From the picklist, select the source data field that contains the information on which you want to filter the data records.
Operator
From the picklist, select the operator for your filter. The options are: < (less than), <= (less than or equal to), = (equal to), <> (not equal to), >= (greater
than or equal to), and > (greater than).
Value
In this box, enter the information on which you want to filter the data records. Remember that the selected operator may determine exactly what you
enter here.
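The Source Field, Operator and Value settings combine into a single comparison that is applied to each record. The sketch below shows that idea in Python; the function filter_records and the sample field names are assumptions made for illustration only.

```python
import operator

# The six operators offered in the Operator picklist.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "<>": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}

def filter_records(records, source_field, op, value):
    """Keep only the records whose source_field satisfies the comparison."""
    compare = OPERATORS[op]
    return [record for record in records if compare(record[source_field], value)]

# Hypothetical extracted records.
records = [
    {"Name": "Smith", "City": "Houston", "Amount": 120},
    {"Name": "Jones", "City": "Austin", "Amount": 75},
    {"Name": "Lee", "City": "Houston", "Amount": 300},
]

houston = filter_records(records, "City", "=", "Houston")
not_houston = filter_records(records, "City", "<>", "Houston")
large = filter_records(records, "Amount", ">=", 120)
```

Note how the operator shapes the value: an equality test needs the exact field contents, while the ordering operators need a value of a comparable type.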
FieldEndDelimiter
Delimited ASCII files are presumed to have beginning-of-field and end-of-field delimiters. The default delimiter is a quote ( " ) because it is the most common.
However, some files do not contain field delimiters, so this option is available for you to choose, if needed. To create an output file with no delimiters, click in the
FieldEndDelimiter Current Value box and click once. Then click the down arrow and select None.
Header
This option uses the field names you created in your Data Field Definitions to automatically create a header record in your output CSV data file. To create a header
record, click in the Header Current Value box, click once and then click True. The default setting is False.
Note: If you are appending data to an existing file (Output Mode is Append), leave the setting for Header as False.
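The reason for leaving Header as False when appending can be seen in a small Python sketch. The function export_records and the field names are hypothetical, and an in-memory buffer stands in for the target file.

```python
import csv
import io

def export_records(target, field_names, rows, header=False):
    """Write rows to target, preceded by a header record when header is True."""
    writer = csv.writer(target, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    if header:
        writer.writerow(field_names)
    writer.writerows(rows)

target = io.StringIO()              # stands in for the target file
fields = ["Name", "Zip", "City"]    # hypothetical data field names

# The first export creates the file, so a header record is written.
export_records(target, fields, [["Smith", "12345", "Houston"]], header=True)

# The second export appends, so no second header record is written.
export_records(target, fields, [["Jones", "67890", "Austin"]], header=False)

output = target.getvalue()
```

If the second call also used header=True, the field names would appear again in the middle of the data.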
Field Delimit Style
This option determines whether the specified FieldStartDelimiter and FieldEndDelimiter are used for all fields, only fields containing a separator, or only text fields, as
follows:
StripLeadingBlanks
For an ASCII Target file, leading blanks are stripped by default. If you want to leave the leading blanks, click in the StripLeadingBlanks Current Value box and click
once. Then click the down arrow to the right of the box and click False.
StripTrailingBlanks
For an ASCII Target file, trailing blanks are stripped by default. If you want to leave the trailing blanks, click in the StripTrailingBlanks Current Value box and click
once. Then click the down arrow to the right of the box and click False.
TransliterationIn
Allows you to specify a character, or a set of characters, to be filtered out of the Source data. For any character in Transliterate In, the corresponding character from
the Transliterate Out property is substituted. If there is no corresponding character, the Source character is filtered out completely. Transliterate In supports C-style
escape sequences such as: \n (new line), \r (carriage return) and \t (tab).
TransliterationOut
Allows you to specify a character to be substituted for another character from the Source data. For any character in Transliterate In, the corresponding character from
the Transliterate Out property is substituted. If you wish the Source character to be filtered out completely, leave this field blank. If there are no characters to be
transliterated, this field should be left blank. Transliterate Out supports C-style escape sequences such as: \n (new line), \r (carriage return) and \t (tab).
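The pairing of the two transliteration properties can be sketched with Python's built-in str.maketrans. The function build_transliteration is hypothetical; it assumes that characters are paired by position and that any character in the In list without a partner in the Out list is removed, as described above.

```python
def build_transliteration(translit_in, translit_out):
    """Build a translation table from the In and Out character lists."""
    paired = translit_in[:len(translit_out)]     # characters that have a substitute
    unpaired = translit_in[len(translit_out):]   # characters to filter out
    return str.maketrans(paired, translit_out[:len(paired)], unpaired)

# Replace hyphens with periods and slashes with spaces, and drop tabs entirely.
table = build_transliteration("-/\t", ". ")
cleaned = "11-13/04\tHouston".translate(table)

# An empty Out list filters out every character in the In list.
stripped = "a*b*c".translate(build_transliteration("*", ""))
```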
MaximumDataLen
This option allows you to specify the maximum number of characters to write to a field. If this value is set to 0 (the default), the number of characters written to
a field is determined by the field length. If you set this value to any value OTHER THAN zero, data may be truncated.
NullIndicator
This option allows you to specify whether "blank" target data fields will contain an empty string (nothing) or a Null character (00). The default for most CSV text files is
None.
Record
On the Record tab, the picklist contains the name(s) of one or more record types, depending on the number of ACCEPT records you defined when defining the data
extraction rules, as follows:
• If you defined only one line style as the ACCEPT record, the picklist will contain only one selection.
• If you defined two or more ACCEPT records, the picklist will contain the names of each of the ACCEPT records. You may export only one type of record for
each export, therefore you must perform multiple exports if you have defined multiple record types in your data extraction rules. See "Tip" below.
Tip: If your source report/text file contains parent and child records for which you need to retain the relationships, you should include a key field in each of the record
types (ACCEPT record line styles) when specifying the fields to include in each. The shared key field makes it possible to join the data across the multiple exported files.
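As an illustration of the tip, the sketch below joins a parent export and a child export on a shared key field. The file contents, the field names (OrderID, Customer, Item) and the function join_on_key are all hypothetical; they only show why the key field must be present in both record types.

```python
import csv
import io

# Two hypothetical exports, one per record type, each carrying the key field OrderID.
parent_csv = '"OrderID","Customer"\r\n"1001","Smith"\r\n"1002","Jones"\r\n'
child_csv = '"OrderID","Item"\r\n"1001","Widget"\r\n"1001","Gadget"\r\n"1002","Sprocket"\r\n'

def join_on_key(parent_text, child_text, key):
    """Attach the matching parent fields to every child record."""
    parent_rows = csv.DictReader(io.StringIO(parent_text, newline=""))
    parents = {row[key]: row for row in parent_rows}
    joined = []
    for child in csv.DictReader(io.StringIO(child_text, newline="")):
        parent = parents.get(child[key])
        if parent is not None:          # skip children with no matching parent
            joined.append({**parent, **child})
    return joined

joined = join_on_key(parent_csv, child_csv, "OrderID")
```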
Run Export Button
All
Places the delimiters specified in FieldStartDelimiter and FieldEndDelimiter before and after every field. "All" is the default setting. For example:
"Smith","12345","Houston"
Partial
Places the specified delimiters before and after fields only where necessary. A field that contains a character that is the same as the field separator
would have the field delimiters placed around it. A common occurrence of this is where there is a "memo" field that contains quotes within the text of
data. For example: "Customer responded with "No thank you" to my offer".
Text
Places delimiters before and after Text and Name fields (non-numeric fields). Numeric and Date fields have no FieldStartDelimiter or
FieldEndDelimiter. For example: "Smith", 12345,"Houston", 11/13/04
NonNumeric
Places delimiters before and after all non-numeric types, such as Date fields. An important difference between Non-Numeric and Text: Non-Numeric
delimits Date fields, while Text does not delimit Date fields.
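Three of these styles have close counterparts in Python's csv module, which makes the differences easy to see side by side. This is a comparison with a different tool, not a description of how the Data Extractor is implemented; the Text style has no direct counterpart there and is left out. The helper render and the sample row are assumptions for this example.

```python
import csv
import io

def render(row, quoting):
    """Render one record with the given quoting style (hypothetical helper)."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting, lineterminator="\n").writerow(row)
    return buffer.getvalue().rstrip("\n")

# The city contains the field separator and the date is held as text.
row = ["Smith", 12345, "Houston, TX", "11/13/04"]

all_style = render(row, csv.QUOTE_ALL)                 # comparable to All
partial_style = render(row, csv.QUOTE_MINIMAL)         # comparable to Partial
non_numeric_style = render(row, csv.QUOTE_NONNUMERIC)  # comparable to NonNumeric
```

In this sketch the three results are "Smith","12345","Houston, TX","11/13/04" for All, Smith,12345,"Houston, TX",11/13/04 for Partial, and "Smith",12345,"Houston, TX","11/13/04" for NonNumeric.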
directory path, and filename you specified on the Connect Info tab.
How to Save an Extract Script
After you have defined your data extraction rules (line styles and data fields definitions), save your extract script for reuse.
Before saving an extract script, notice that new extract scripts show as "Extract: Extract1" on the title bar of the main window.
You might want to save your script for a variety of reasons, including:
• You may have a complex report file that requires more than one session for script designing.
• You might receive a similar report file in the future, in which case you could open this one and modify it, rather than starting from the beginning.
To save an extract script, click Save Extract in the toolbar or select File > Save Extract in the menu. Both of the options display the Save Extract window.
Save your extract script in the same path as your report, and have it share the name of your report, with a CXL extension. This is consistent with the standard naming
for Data Extractor scripts. The Data Extractor automatically saves your file with a .CXL extension.
The Author name is defaulted for you but can be changed.
Enter a description if desired and click OK. Your new extract script is saved.
Alternatively, use the Save Extract As option to move an extract script file to a new location or save it as just a name, without a path.
How to Reuse a Saved Extract Script
You can open a previously-designed script with a different report or text file as follows:
1. Open the saved extract by double clicking on it in the grid that displays your saved extracts when you first open the Data Extractor.
2. Select Source > Options from the menu.
3. Click the File Properties tab.
4. Click the Text File down arrow and browse to choose the new report file.
Tip: If you modify the extract script to work with the new report/text file, you may want to use the Save As option to save the modified script with a new script
filename, rather than overwriting the original script.
Tool Bar Buttons
There is a toolbar located just below the main menu toward the top of the Extract Schema Designer window. The buttons in the toolbar provide quick access to the
most commonly used commands and windows.
Each of the buttons on the toolbar is described below:
Button Label
Description
New Extract
Selects a representative text file or HTTP address and creates a new extract script file.
Open Extract
Opens a script file and report file that already exist.
Close Extract
Closes the current report and script file. If you have made changes to the open script, you will be prompted to save the changes.
Save Extract
Saves the extraction script on which you are working. If you are creating a new script, a dialog box will open in which you may enter or change the
name of the script file, the author name, and a description. If you are modifying an existing script, clicking on this button will overwrite the script. To
save another script with a different script name, select Save As from the File menu.
Validate
Verifies that at least one accept record is defined and all data fields are assigned to at least one accept record. If these conditions are met, a dialog
box confirms "Validation Successful". Otherwise, a dialog box warns you of these errors in your script.
Source Options
Opens a dialog box that gives you some information about the current report. This is also where you can make selections that affect the way the
Extract Schema Designer works with your report. See Source Options Window.
Source Font
Changes the screen font in the Extract Schema Designer window.
Underline Fields
Turns underlining of data fields ON or OFF. When underlining is turned ON, a broken line will display in the Data Panel under each defined fixed
position data field. To turn Underline Fields OFF, click this button again.
Delete All Line Styles
Deletes all defined line styles in the current script.
Delete All Fields
Deletes all defined data fields in the current script.
Clear All Accept Record Fields
Clears the fields in all ACCEPT records. This allows you to completely scrap your entire export mapping and start over. When you choose this option,
you will be asked if you want to Clear Fields in All ACCEPT Records. If you choose Yes, all of your fields become unassigned again (note the icon
change in the Line Style column). If you choose No, you will return to the Extract Schema Designer window. For more details, see ACCEPT
Record Definition Window.
Search Text
Searches for some specific text string in your report. A dialog box in which you enter the search string will open. See Find Text.
Extract Script Manager Window
The Extract Script Manager is the first window the user sees after initiating a new session of Data Parser for Unstructured Text. This is where saved scripts are listed.
However, when you start the Data Parser for the first time, the list will be empty.
To create a new extraction script, click the New Extract button in the toolbar or select File > New Extract. For details, see Open a Text File, Report File or URI.
After one or more scripts have been created and saved, the following information about each appears: Extract Script Name, File Name, Author, Description and Last
Updated.
The columns in the Extract Script Manager can be sized larger or smaller so the user can view the entire contents of each field. To size a column, place the mouse
pointer directly over the vertical line between two of the column headings. The pointer will become a bold cross with arrows pointing left and right. Press and hold
down the left mouse button and drag the pointer in either direction until you have sized the column to the desired width. Then release the mouse button.
The horizontal scroll bar just below the table allows the user to scroll left and right within Extract Script Manager to view additional columns of information. You may
also use the mouse pointer to highlight the script name, and then move left or right using the left or right arrow keys.
If you have more Extracts than will display on one screen, you can use the vertical scroll bar to view additional scripts.
To highlight an extraction script (for viewing, modifying, or deleting), place the mouse pointer just to the left of the extraction script number. The pointer will change to a
right-pointing arrow. Click once to highlight the entire line. The extraction script name appears in the Extract box centered just below the Extract Manager table.
To open a saved extraction script and load the original report, highlight the extraction script as described above and double-click the extract name or select File >
Open Extract.
To open a saved extraction script and load a different report, open the extraction script as described above. Then select Source > Options. Choose the File Properties
tab. Click the down arrow next to text file, browse to select the new file and click Open. Click OK.
To delete a saved extraction script, highlight the extraction script as described above, then press Delete.
Pattern Builder Window
The Pattern Builder window is where you specify the exact criteria for which the Data Extractor is searching when it is determining whether or not a line of text in the
report should be defined as a particular line style.
The Pattern Builder Window is accessed from the Look For? cell in the Line Style Definition Window when you click the down arrow.
Available options in the Pattern Builder window are:
Edit
The menu in the upper left corner allows you to paste text from the white pane of the data display into the Value field. The paste function is only available when the
Type is literal.
Type
There are five types of criteria the Data Extractor looks for. The five options are literal, character class, negated character class, mask, and regular expression.
literal
Select this option to have the Data Extractor search for some specific string of text. The string will be entered in the Value column.
character class
Select this option to have the Data Extractor search for some classification of characters, digits, or other type of criteria. A character class may be any of the classes or
types of information listed below in the Value section, or a user may enter any valid string to specify their own character class.
Debug Extract
Lets you verify that your line styles and data field definitions are either correct or need some additional work. For more details, see Debug Extract Design
Window.
External Source Data Viewer
Displays your report in another application, the application it was created in, the application that Windows associates with files of this type, or any
other application of your choice. You can set a default application to use for viewing all of your files in the Source Options window. See Source
Options Window.
Browse Data Records
Opens the browser after defining line styles and data fields. The Browser will display data from your report in a "flattened" column and row tabular
format. This allows you to verify that your line style and data field definitions are either correct or need some additional work. For more details, see
Record Browser Window.
Vertical Spacer
Shows or hides a red vertical positioning bar in the Data Panel in the main window. The positioning bar is helpful in determining whether or not
particular text characters start and/or stop in the same position throughout the report. After turning the Positioning Bar ON, click the mouse to
position the bar in the Data Panel. To turn the Positioning Bar OFF, click this button again.
Toggle Space/Tab Symbol
Shows or hides the space and tab symbols. Default symbols are a small gray dot to show where spaces exist in the source file and a double right
pointing arrow for tabs. In Source > Options > Display Choices, you can choose different symbols if you wish. For details, see Source Options
Window. Space and tab symbols can also be toggled on and off by selecting Preferences > Show Space Symbol from the menu.
Help
Accesses the documentation contents screen.
negated character class
Select this option to have the Data Extractor search for anything EXCEPT some classification of characters, digits, or other type of criteria. A character class may be
any of the classes or types of information listed below in the Value section, or a user may enter any valid string to specify their own character class.
mask
Select this option if you need to enter a special expression to define the search criteria. There are three special symbols that you may use along with any printable
character to build the mask. The special symbols are @, #, and *. Each is explained below.
Any printable ASCII character (except the three special characters shown above) and spaces can be used as a literal in combination with the three special characters
to build the mask.
Some mask examples:
To set up the criteria for any Canadian postal code, enter a mask of @#@ #@#. This mask looks for [A-Za-z][0-9][A-Za-z] (space) [0-9][A-Za-z][0-9].
To set up the criteria for any social security number, enter a mask of ###-##-####. This mask looks for [0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9].
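The mask examples above can be checked with a short Python sketch that expands each special symbol into the character range it stands for and treats every other character as a literal. The function mask_to_regex is hypothetical and is only a model of the documented behavior, not the Data Extractor's own code.

```python
import re

# The three special mask symbols and the ranges they search for.
MASK_SYMBOLS = {"@": "[A-Za-z]", "#": "[0-9]", "*": "[A-Za-z0-9]"}

def mask_to_regex(mask):
    """Expand a mask into an equivalent regular expression string."""
    return "".join(MASK_SYMBOLS.get(ch, re.escape(ch)) for ch in mask)

postal_code = re.compile(mask_to_regex("@#@ #@#"))
social_security = re.compile(mask_to_regex("###-##-####"))

postal_ok = postal_code.fullmatch("K1A 0B1") is not None
postal_bad = postal_code.fullmatch("123 456") is not None
ssn_ok = social_security.fullmatch("123-45-6789") is not None
```

Here postal_ok and ssn_ok are True, while postal_bad is False because the first position holds a digit where the mask requires a letter.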
regular expression
Select this option if you want to use a language that allows you to specify a string of characters that defines a set of rules for matching character strings. A regular
expression will match a field whether it matches the whole field or just a small sequence within the field.
The following are the special characters that can be used in Data Extractor regular expressions:
| ( ) * + ? [ ] - . \ ^ $
To use the literal value of a special character within a regular expression, you must precede the special character with a backslash ( \ ). For example, to enter a literal
backslash, you must type it twice ( \\ ); to enter a literal dollar sign, you must type backslash and then dollar sign ( \$ ).
Example:
To set up the criteria for a four-column line position test, enter (0[0-9])(1[1-5]) or 0[0-9]1[1-5]. Either expression finds a line number with 0 in the first position,
0-9 in the second position, 1 in the third position, and 1-5 in the fourth position, yielding up to 50 different hits (10 x 5 combinations).
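The count of 50 hits and the backslash rule can both be confirmed with Python's re module, whose syntax for these particular constructs is the same. This is an independent check and assumes nothing about the Data Extractor's regular expression engine beyond what is described above.

```python
import re

pattern = re.compile(r"0[0-9]1[1-5]")
grouped = re.compile(r"(0[0-9])(1[1-5])")

# Try every four-digit line number from 0000 to 9999.
candidates = [f"{n:04d}" for n in range(10000)]
hits = [c for c in candidates if pattern.fullmatch(c)]
grouped_hits = [c for c in candidates if grouped.fullmatch(c)]

# A regular expression also matches a short sequence inside a longer field.
inside = pattern.search("Line 0513 total").group()

# A special character needs a leading backslash to be taken literally.
dollar = re.search(r"\$", "Total $5").group()
```

The list hits holds 50 values, from 0011 through 0915, and the grouped form finds exactly the same values.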
Value
If you selected literal in the Type column, the value entered here is the exact string for which you want the Data Extractor to search. When entering the value, type only
the exact string. This is case sensitive. The Data Extractor automatically encloses the value in quotation marks. Also see "Set Value = Selected Report Text".
If you highlighted a section of the text before opening the Line Style Definition window, that value is generally entered here automatically.
When you select character class, negated character class or regular expression in the Type column you must either make a selection from the list box in the Value
column or enter your own string. Remember, negated character class means anything EXCEPT what you specify in the Value column.
The options in the Value column list box follow:
Symbol
Description
@
Use the @ (at sign) when you want the Data Extractor to search for any alpha character. For each @ entered in the mask, the Data Extractor searches for
[A-Za-z].
#
Use the # (pound sign) when you want the Data Extractor to search for any digit. For each # entered in the mask, the Data Extractor searches for [0-9].
*
Use the * (asterisk) when you want the Data Extractor to search for any letter or digit. For each * entered in the mask, the Data Extractor
searches for [A-Za-z0-9].
Option
Description
any character
Select this option to have the Data Extractor search for any printable or non-printable character.
digits
Select this option to have the Data Extractor search for any of the digits zero (0) through nine (9).
digits/
Select this option to have the Data Extractor search for any of the digits zero (0) through nine (9) or a forward slash (/).
digits-
Select this option to have the Data Extractor search for any of the digits zero (0) through nine (9) or a hyphen (-).
digits/-
Select this option to have the Data Extractor search for any of the digits zero (0) through nine (9) or a forward slash (/) or a hyphen (-).
letters
Select this option to have the Data Extractor search for any printable alphabetical character (a - z or A - Z).
upper case letters
Select this option to have the Data Extractor search for any printable upper case alphabetical character (A - Z).
lower case letters
Select this option to have the Data Extractor search for any printable lower case alphabetical character (a - z).
alphanumeric characters
Select this option to have the Data Extractor search for any printable alphabetical character (a - z or A - Z) or any digit from zero (0) through
nine (9).
letters and white space
Select this option to have the Data Extractor search for any printable upper or lower case letter (A-Z or a-z) and white space.
upper case letters and white space
Select this option to have the Data Extractor search for any printable upper case letter (A-Z) and white space.
Count
The count determines how many of the specified string(s) the Data Extractor searches for.
It is important to note that Count searches for consecutive values. For example, if you search on the string MM/DD/YY using the following settings in the Pattern
Builder, no matching results display:
• Type=character class
• Value=alphanumeric
• Count=3
There is no match because MM/DD/YY does not contain 3 consecutive alphanumeric characters.
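The consecutive-value behavior is the same as a quantifier applied to a character class in a regular expression, so the example can be reproduced in Python. The function build_pattern and the two lookup tables are hypothetical; they model only the alphanumeric, digits and letters values and the Count options named in this guide.

```python
import re

# A few Value options and the character ranges they stand for.
CLASSES = {"alphanumeric": "[A-Za-z0-9]", "digits": "[0-9]", "letters": "[A-Za-z]"}

# Count options expressed as regular expression quantifiers.
COUNTS = {"0 - many": "*", "1 - many": "+", "0 - 1": "?"}

def build_pattern(char_class, count):
    """Combine a character class with a count (a named option or a number)."""
    quantifier = COUNTS.get(count, "{%s}" % count)
    return re.compile(CLASSES[char_class] + quantifier)

three_in_a_row = build_pattern("alphanumeric", "3").search("MM/DD/YY")
two_in_a_row = build_pattern("alphanumeric", "2").search("MM/DD/YY")
```

The first search returns no match because each slash interrupts the run after two characters; the second matches MM.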
Here are the Count options:
Begin
This is the column number in which the Data Extractor should start searching for the specified string. If you modify the string in the Value column, you may need to enter
a new column number here. Values only appear in the Begin cell when Column is selected in Search What?. For more information about the Begin option, see Begin
(line or column).
End
This is the column number in which the Data Extractor should stop searching for the specific string. If you modify the string in the Value column, you may need to enter
a new column number here. Values only appear in the End cell when Column is selected in Search What?. For more information about the End option, see End (line or
column).
Extract Script Designer
Extract Script Designer is the working area where you will use the mouse pointer, shortcut menus, and dialog boxes to define specific parts of the report text that you
want to convert to another format.
The main window may be sized, minimized, or maximized to suit your needs.
A bold Vertical Splitter Bar splits this window into two panes. The left pane is the Line Style Column. The right pane is the Data Panel. Details about this window are
discussed below.
Line Style Column
This is the lightly tinted yellow pane to the left of the Vertical Splitter Bar. When you first open a new report file, the Line Style Column will be empty. As you define
lines of text, the line style names will display in the Line Style Column. Along with the line style names, one or more symbols may also display in the Line Style Column
to designate the line action and/or other information about a particular line in your report. See Line Action.
The main purpose of the Line Style Column is to display the line style name and line action symbols of each defined line of text. The line style names give you a visual
way to verify that each line of text in your report file matches the appropriate line style recognition pattern. The line action icons give you a visual way to identify how
the Data Parser for Unstructured Text is going to process the line of text and any data fields you have defined within that line.
The mouse behavior is slightly different in the Line Style Column as opposed to the Data Panel. If you click the mouse in the Line Style Column, the entire line will be
selected. When you highlight some of the data and right click in the Line Style Column, a different shortcut menu will display from the one that would display if you
highlighted some data and right clicked in the Data Panel. Details about the shortcut menus are described in Shortcut Menus. For details about how to define line styles,
see Defining Line Styles.
After a line of text has been defined, you can double-click a line style name in the Line Style Column to open the Line Style Definition window for viewing and/or
modifying the recognition rule. For details, see Line Style Definition Window.
There is a horizontal scroll bar at the bottom of the Line Style Column. If the width of the Line Style Column is less than the longest line style name, you may scroll to
lower case letters and white space
Select this option to have the Data Extractor search for any printable lower case letter (a-z) and white space.
white space
Select this option to have the Data Extractor search for a Tab (hex 09 or ANSI 009) or some designated number of spaces.
space
Select this option to have the Data Extractor search for a single space.
tab
Select this option to have the Data Extractor search for a tab character (hex 09 or ANSI 009). Tab expansion must be set to 0 on the Printer
Emulation tab in the Source Options window in order for the Data Extractor to detect Tab characters in the file.
carriage return
Select this option to have the Data Extractor search for a carriage return (hex 0D or ANSI 013).
Option
Description
0 - many
Select this option if you want the Data Extractor to search for "None" to "Many" of the specified string(s).
1 - many
Select this option if you want the Data Extractor to search for "One" to "Many" of the specified string(s).
0 - 1
Select this option if you want the Data Extractor to search for "None" to "One" of the specified string(s).
1 thru 9
Select any of these options when you want the Data Extractor to search for a specific number of the specified string(s). If you want the Data Extractor to
search for more than 9 of a specified value, type the number desired in this cell.
Data Panel
This is the large white pane to the right of the bold Vertical Splitter Bar. The Data Panel is the main work area where you highlight selected text that will be used to
define line styles and/or data fields.
To indicate that you want to define a line style or data field, highlight a particular selection from a line of text and click the right mouse button to bring up a series of
shortcut menus from which to work. For details, see Shortcut Menus.
As data fields are defined, the text that is included in the various fields will change from black to red, green, blue, or magenta if the fields are fixed length and fixed
position. Where there is more than one data field on a line of text, the colors will alternate so you can distinguish one field from another on that line. The individual
colors have no particular significance. You may also have the Data Parser for Unstructured Text display a colored, dashed line under each defined data field by turning
Underline Fields ON in the Preferences menu. For details about other display options, see Source Options Window.
For details about defining lines of text, see Defining Line Styles.
For details about defining data fields, see Define Data Fields.
There is a horizontal scroll bar at the bottom of the Data Panel. If the width of the Data Panel is less than the longest line of text in your report file, you may scroll to the
right to view the portion of the text that does not display. You may also adjust the width of the main Data Parser for Unstructured Text window by dragging its right
border to the right, or by maximizing it.
There is a vertical scroll bar on the right side of the Data Panel when the report is longer than will display in the window. You may scroll up and down in the Data Panel
to view the portions of the report that do not display. The Source Sample setting in Source Options will determine how much of your report file actually shows in the
Data Panel. For details, see Source Options Window.
Vertical Splitter Bar
The bold vertical splitter bar between the Line Style Column and Data Panel may be dragged left and right to adjust the sizes of the two panes of the main window.
Place the mouse pointer directly on the splitter bar. The pointer will change to a bold cross with left- and right-pointing arrows. Click and hold down the left mouse
button and drag the splitter bar to the desired position. Then release the mouse button.
This bold black Vertical Splitter Bar between the two panes of the Script Designer window should not be confused with the thin red Vertical Positioning Bar that can
be placed in the Data Panel by clicking the Vertical Positioning Bar button on the toolbar and then clicking in the Data Panel.
Cursor Position Boxes
In the lower left of this window, below the Line Style Column, there are two boxes that indicate at which text line and column the blinking cursor is positioned. A text or
report file must be opened in the Data Parser for Unstructured Text for this to be active.
Field Name Indicator
When you move the mouse pointer over a defined data field in the Data Panel, the mouse pointer changes to a hand. In the lower left of this window, below the Data Panel and Line Style Column, the Field Name Indicator displays the name of the data field over which the mouse pointer is located.
There are three exceptions:
• Data fields that are defined as Floating Tags
• Data fields that are defined as Relative Word Position
• Subsequent lines of a data field that continues across multiple lines of text
These fields are neither fixed length nor fixed position, and therefore are not colored in the Data Panel.
Mouse Position Box
In the lower right of this window, below the Data Panel, there is a box that indicates at which text line and column the mouse is positioned. The number to the left of the
comma indicates the text line. The number to the right of the comma indicates the column. A text or report file must be opened in the Data Parser for Unstructured Text
for this to be active. When you move the mouse pointer beyond the End of Line marker, the Mouse Position Box remains blank.
Hex and Decimal Value Box
In the lower right corner of this window, below the Data Panel, there is a box that indicates the hex and decimal value for any character that the mouse pointer is positioned over. This box does not display a value for end-of-line characters. For a table of values and their associated characters, see Decimal and Hexadecimal Values.
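As a rough illustration of what this box reports, the short Python sketch below computes the hex and decimal value of a single character. The `char_values` helper is a hypothetical name, and the two-digit uppercase hex format is an assumption about the display, not taken from the product.

```python
def char_values(ch):
    """Return (hex, decimal) for one character, for example ('41', 65) for 'A'."""
    code = ord(ch)  # decimal code of the character
    return format(code, "02X"), code  # two-digit uppercase hex, then decimal
```

Calling `char_values(" ")` gives `('20', 32)`, which is the kind of lookup that helps when a report file contains characters that look alike on screen, such as spaces and tabs.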
Source Options Window
The Source Options window opens each time you begin a new extract script, unless you go to the Display Choices tab and clear the Display Source Options with New
Extract check box. Otherwise, to open this window, click the Source Options button in the Tool Bar, or select Options from the Source menu. The purpose of the
Source Options window is to allow you to change the way the Data Parser for Unstructured Text reads your text file.
If you are familiar with the text or report file with which you are working, the Source Options window can be opened and some selections made before opening the file.
Other options can be changed to meet the requirements of the report file as you are parsing it.
If you make changes to the settings in this window after opening the report file, the Data Parser for Unstructured Text may reread and reload your file. This may take a
few seconds.
The window is divided into seven tabs. The options in each tab are discussed in the following topics:
• Extract Design Choices
• Display Choices
• File Properties
• Printer Emulation
• Character Set
• Character Filters
• External Viewer
Extract Design Choices
This topic covers the settings under Extract Design Choices.
Tag Separator
The tag separator selected here tells the Data Parser for Unstructured Text how to distinguish a field tag from the data field when analyzing a line of text. This is only
relevant when you are making use of the Parse Tagged Data shortcut menu option.
The tag separator choices are described in the Tag Separator table below.
For details on tagged report data, see Define Data Fields.
Column Separator
The Column Separator selected here tells the Data Parser for Unstructured Text how to distinguish one column of data from the next when analyzing a line or block of
text. This setting is only relevant when you are using the Parse Columnar Data or Parse Columnar w/ Heading shortcut menu options. Each column within a line of text
will become a data field.
The Column Separator choices are described in the Column Separator table below.
Flush Field Contents on Accept default
The Data Parser for Unstructured Text outputs each record type as a fixed length, consistent structure record. This means that even if a field or line does not exist in a
Tag Separator
Description
ColonSpace (: )
Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a colon and a space. Example: Name: John M. Smith
Colon (:)
This is the default setting. This option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a colon only. Example: Name: John M. Smith
SpaceColon ( :)
Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a space and a colon. Example: Name :John M. Smith
Dash (-)
Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a dash. Example: Name-John M. Smith
Comma (,)
Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a comma. Example: Name, John M. Smith
# of Spaces
Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a specified number of spaces. When you select this option, another box appears to the right of the tag separator box, in which you type the number of spaces. Example with 3 spaces specified: Name   John M. Smith
# of Spaces +
Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by two or more spaces. Example: Name  John M. Smith
Vertical Bar ( | )
Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a vertical bar or pipe ( | ). Example: Name|John M. Smith
Other
If the character that separates the field tags from the field data in your report file is not on the list of choices, you can highlight the character shown and type in any single printable character. Example: Name*John M. Smith
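The effect of the tag separator choices can be sketched in Python as a split on the first occurrence of the separator. This is a minimal sketch of the idea, not the parser's actual logic: the `TAG_SEPARATORS` table, the `split_tagged` function, and its `spaces` and `other` parameters are all hypothetical names introduced for illustration.

```python
import re

# Hypothetical regex patterns for the separator options in the table above.
TAG_SEPARATORS = {
    "ColonSpace": r": ",
    "Colon": r":",
    "SpaceColon": r" :",
    "Dash": r"-",
    "Comma": r",",
    "# of Spaces +": r" {2,}",
    "Vertical Bar": r"\|",
}

def split_tagged(line, separator="Colon", spaces=None, other=None):
    """Split a line into (tag, value) at the first separator, or return None."""
    if separator == "# of Spaces":
        pattern = " {%d}" % spaces   # exact number of spaces typed by the user
    elif separator == "Other":
        pattern = re.escape(other)   # any single printable character
    else:
        pattern = TAG_SEPARATORS[separator]
    parts = re.split(pattern, line, maxsplit=1)
    if len(parts) != 2:
        return None                  # separator not found on this line
    tag, value = parts
    return tag.strip(), value.strip()
```

With the default Colon setting, `split_tagged("Name: John M. Smith")` yields the tag `Name` and the value `John M. Smith`. Splitting only at the first separator keeps later occurrences of the same character inside the field value.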
Column Separator
Description
(2+) Spaces
This is the default setting. This option tells the Data Parser for Unstructured Text to distinguish between two columns of data by two or more spaces. Example: 10019  John M. Smith. The account number 10019 is one column, or data field, and the person's full name, John M. Smith, which contains single spaces, is another column, or data field.
(1) Space
Selecting this option tells the Data Parser for Unstructured Text to distinguish between two columns of data by a single space.
Tab
Selecting this option tells the Data Parser for Unstructured Text to distinguish between two columns of data by a tab.
Vertical Bar ( | )
Selecting this option tells the Data Parser for Unstructured Text to distinguish between two columns of data by a vertical bar or pipe ( | ).
Other
If the character that separates columns of data in your report file is not on the list of choices, you can highlight the character shown and type in any single printable character. Example: 10019 - John M. Smith
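The column separator choices can be illustrated the same way. The Python sketch below splits one line of text into columns and assumes nothing about how the product does it internally. `COLUMN_SEPARATORS`, `split_columns`, and the `other` parameter are hypothetical names.

```python
import re

# Hypothetical regex patterns for the column separator options above.
COLUMN_SEPARATORS = {
    "(2+) Spaces": r" {2,}",
    "(1) Space": r" ",
    "Tab": r"\t",
    "Vertical Bar": r"\|",
}

def split_columns(line, separator="(2+) Spaces", other=None):
    """Split one line of text into a list of column values."""
    if separator == "Other":
        pattern = re.escape(other)  # any single printable character
    else:
        pattern = COLUMN_SEPARATORS[separator]
    # Leading and trailing whitespace on the line is dropped before splitting.
    return [column.strip() for column in re.split(pattern, line.strip())]
```

With the default (2+) Spaces setting, `split_columns("10019  John M. Smith")` returns two columns, `10019` and `John M. Smith`. The single spaces inside the name do not start a new column, which is why this setting suits reports whose values contain spaces.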