• How to create an extract script using automatic processes
• How to save the extract design as a script file
• New terms used throughout the Data Parser documentation
Procedure
These steps should be completed in the order shown.
Define Data Fields
After selecting the tutorial file and setting up basic options, the first step in defining most extract scripts is to determine the line of data that marks the end of a record. In
the TUTOR1 data file, the line of text that contains "Category:" marks the end of each record.
1. Highlight the line that contains the string "Category:", up to column 45. Check the indicator in the lower right corner of the screen for column locations.
2. Right-click anywhere in the Data Panel (the large white area of the screen) and select Define Data Field > Parse Tagged Data. Note: Data Parser automatically
defines a Line Style with the string "Category:" in columns 15 through 23 as the recognition pattern, and a Line Action of Collect Fields, and names it "Category".
It also creates a Data Field that collects any data on that line beginning at column 24, one space after the colon, and going to column 45, and names it
"Category". The field is now defined, and the text turns red on the screen.
3. If you wish to check the Data Field definition, you can double-click on the field itself (the red text) and the Field Definition window opens. Make any necessary
changes, then click Update.
4. Proceed to Define the Line Style - Accept Record.
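The Line Style and Data Field that Parse Tagged Data generates can be pictured as a column-range test followed by a column-range slice. The Python sketch below only illustrates that idea and is not Data Parser's implementation; the helper names and the sample line are hypothetical, and columns are 1-based as in the tutorial.

```python
def column_slice(line, begin, end):
    """Return the text in 1-based, inclusive columns begin..end."""
    return line[begin - 1:end]


def parse_category(line):
    """Return the Category field if the line matches the Line Style, else None."""
    # Recognition pattern: the literal "Category:" in columns 15 through 23.
    if column_slice(line, 15, 23) != "Category:":
        return None
    # Data Field: columns 24 through 45, with surrounding blanks trimmed.
    return column_slice(line, 24, 45).strip()


# Hypothetical sample line: 14 blanks place the tag in columns 15-23.
sample = " " * 14 + "Category: Conversion"
print(parse_category(sample))
```

Lines that do not carry the tag in those columns return None, which mirrors a line that does not match the Line Style.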
Define the Line Style - Accept Record
Since the Category Line Style is the last line of the record, the Line Action should be Accept Record. When Data Parser creates a line style automatically, it sets the
Line Action to Collect Fields, so the Line Action needs to be changed.
1. Double-click on the Line Style Name, "Category" in this case, in the Line Style Column, the yellow column on the left part of your screen. The Line Style
Definition window appears.
2. Click on the Line Action tab and select ACCEPT Record [Including] This Line’s Fields from the list of choices.
3. Click Update.
4. View the data record by clicking on the Browse Data Record button in the button bar.
5. Proceed to Adjust Data Field Definition.
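The difference between the two Line Actions can be sketched as follows: Collect Fields adds values to the record in progress, while Accept Record adds its own line's fields and then emits the finished record. This is a minimal Python illustration under that reading of the tutorial; the function name, the input shape (a list of action/fields pairs), and the sample values are assumptions.

```python
def build_records(parsed_lines):
    """Group per-line fields into records.

    parsed_lines is a list of (action, fields) pairs, where action is
    "collect" or "accept" and fields is a dict of field name -> value.
    """
    records = []
    current = {}
    for action, fields in parsed_lines:
        current.update(fields)          # both actions keep this line's fields
        if action == "accept":          # the accept line closes the record
            records.append(current)
            current = {}
    return records


# Hypothetical values for two records that each end on the Category line.
lines = [
    ("collect", {"Problem_No": "1001"}),
    ("accept", {"Category": "Conversion"}),
    ("collect", {"Problem_No": "1002"}),
    ("accept", {"Category": "Install"}),
]
for record in build_records(lines):
    print(record)
```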
Adjust Data Field Definition
1. Select the entire Problem No line by left-clicking on that line in the Line Style Column (the left yellow column).
2. Right-click in the Data Panel (the large white part on the right) and select Define Data Field > Parse Tagged Data. The Line Style pattern that Data Parser
automatically creates looks for Problem No: in positions 13 through 23.
3. Double-click on Problem_No in the Line Style Column if you want to check the Line Style definition.
4. Click Close to close the Line Style Definition window.
5. To display the Field Definition window to view the information for the Problem_No: Data Field that was automatically generated, double-click anywhere in the
Data Field where the text is red.
6. Click the End Rule tab. Notice that the end rule is 52. This is larger than the Problem No: Data Field needs to be, because it is defining the size of the Data Field
all the way to the right margin of the report.
7. Change the end rule of the Problem_No: field to 30.
8. Click Update. Notice the selected area on the Data panel for the Problem_No: Data Field is much smaller after the update.
9. Proceed to Define the Header Information.
Define the Header Information
For this exercise, assume that the first line of the report contains information you want.
1. Highlight the report name WINTECH on line 8 in positions 11 through 17.
2. Right-click in the Data panel, and select Define Data Field > New Data Field. The Field Definition window appears.
3. The default Data Field Name is highlighted. Since there is no tag on this line, Data Parser used the data itself as the Line Style name and Data Field name.
Change the field name to ReportName by typing it in the Field Name box.
4. Click Add.
5. To define the report date Data Field, repeat steps 1 through 4, except highlight from columns 11 to 19 and name the field ReportDate.
6. Proceed to Update Line Style.
Update Line Style
The purpose of this exercise is to update the automatically generated "Jul95" Line Style to make it more generic for different report dates.
1. To edit the "Jul95" Line Style, double-click on Jul95 in the Line Style Column. The Line Style Definition window displays. Notice that the Pattern for this Line
Style looks for 13-Jul-95 in columns 11 to 19.
2. Size the cells in the grid to view the information better, by following these steps:
a. Position the mouse over the line in the header row of the grid where the column headings are. The mouse pointer becomes a bold vertical bar with arrows
pointing to the left and right.
b. Hold down the mouse button and drag the edge of the column to the left or right.
c. Release the mouse button when the column is the desired size.
d. If desired, adjust the height the same way using the gray border to the left where the triangle and asterisk are located.
3. To change the pattern to look for a line with any date with the dd-mmm-yy format, click once in the Look For? cell on the first row of the grid where 13-Jul-95
is currently displayed. A down arrow appears on the right side of that cell.
5. TAB to the Value cell, delete the original value, and type a dash (-).
6. Change the values of both the Begin and End cells to 13 by tabbing to them and typing in the correct number.
7. Click OK. Notice that the Look For?, Begin, and End values have changed in the Line Style Definition window to reflect the changes made in the Pattern
Builder window.
8. Add a new row to the Line Style Definition grid by clicking in the And/Or cell in the second row. Accept the value default of And.
9. Click in the Search What? cell of the second row and click the down arrow.
10. Select Column Range (m-n) from the displayed list.
11. Select Contains from the list displayed in the Operator cell of the second row.
12. Click on the arrow in the Look For? cell of the second row to display the Pattern Builder window again.
13. TAB to the Value cell, delete the original value, and type a dash (-).
14. Change the Begin and End values to 17. Be careful to only enter a dash in the Value cell and do not leave any spaces around it.
15. Click OK. The line style definition should now match any line with a dash in positions 13 and 17.
16. Click Update to save the changes to the ReportDate Line Style.
17. Proceed to Define Remaining Data Fields and Line Styles.
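The finished Line Style combines two Column Range rules with And: a dash in position 13 and a dash in position 17. A rough Python equivalent, assuming 1-based positions and using a hypothetical function name and made-up sample lines, shows why this matches any dd-mmm-yy date in columns 11 through 19 rather than only 13-Jul-95.

```python
def is_report_date_line(line):
    """True when columns 13 and 17 (1-based) both hold a dash."""
    def char_at(pos):
        return line[pos - 1] if len(line) >= pos else ""
    return char_at(13) == "-" and char_at(17) == "-"


# Hypothetical lines: ten blanks put the text in column 11 onward.
print(is_report_date_line(" " * 10 + "13-Jul-95"))   # True
print(is_report_date_line(" " * 10 + "02-Aug-95"))   # True
print(is_report_date_line(" " * 10 + "WINTECH"))     # False
```

Any line with dashes in those two positions matches, so the rule is only as specific as the report layout allows.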
Define Remaining Data Fields and Line Styles
In this exercise, you will Define Data Fields and Line Styles for the Techie, Status, MM/DD/YY, Time, Ser #, Version, Customer Name, Company Name, Phone #,
Source Type, and Target Type Tagged Data Fields.
1. Highlight the Field Tag, the Tag Separator, and the data by dragging the mouse with the left mouse button depressed from the beginning of the Tag to the end of
the Data Field. Remember to extend out to the right to catch wider data in subsequent records.
2. Right-click in the Data Panel and select Define Data Field > Parse Tagged Data. Data Parser creates a Line Style Definition and a Data Field Definition for you.
OR
3. Click the line in the Line Style column to select it.
4. Select Parse Tagged Data.
5. Open the Field Definition window.
6. Adjust settings.
7. Click Update and Close.
Note: Data Parser named the MM/DD/YY, Ser #, and Phone # Data Fields and corresponding Line Styles MMDDYY, Ser, and Phone. Also, Data Fields with
embedded spaces are named with the spaces removed. This was done because Field Names can only contain letters, digits and underscores. Scroll down in the
Data panel and see how the rest of the data is being defined.
8. Browse the data records to see how your data has changed.
9. If desired, rearrange the data fields as needed to meet your export file requirements.
10. Save and close your script.
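The naming rule described in the note above (Field Names may contain only letters, digits, and underscores) can be approximated in a few lines of Python. This is a sketch of the stated rule, not Data Parser's actual naming logic; the function name is hypothetical, and the product may treat some tags differently.

```python
import re


def default_field_name(tag):
    """Drop every character that is not a letter, digit, or underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "", tag)


for tag in ["MM/DD/YY", "Ser #", "Phone #", "Customer Name"]:
    print(tag, "->", default_field_name(tag))
```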
Tip: This file can be parsed even more automatically. If you wish to try it, follow these steps:
1. Click the Clear Line Styles icon in the button bar.
2. Highlight all the tagged data lines in the entire first record, beginning with the Problem No line and highlighting all the way down and including the Category line.
Be sure to catch all the field tags and data plus some extra space to the right.
3. Right-click in the data panel and select Define Data Field > Parse Tagged Data. Data Parser creates several new line styles and data fields at once. This
method works only with highly structured, consistent data, but it can be a great time saver when conditions are ideal.
Data Parser for Unstructured Text - Tutorial 3 - Columnar Data
Tutorial 3 guides you through the steps to create and save a script file in Data Parser for Unstructured Text that reads and flattens a report containing columnar data.
In Tutorial 3, you convert the data in a columnar report file to a flattened format using the more automatic features of Data Parser.
This tutorial introduces more of the time-saving features of Extract Schema Designer. Since a great many report formats contain columnar data of some kind, it is highly
useful to anyone who wants to use Data Parser.
By following the steps outlined below, you become familiar with both the process of creating an extract script and the terms used throughout the documentation.
Unlike the previous tutorials, this file has multiple Accept Record line styles in a single page of the report. The primary data record information is in the table detail lines.
Each line is essentially a record. Each of these is an Accept Record line.
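Flattening a columnar report usually means that every detail line becomes one output row and the header values are repeated on each row. The Python sketch below illustrates that idea only; the tuple-based input and the DEVELOPMENT amount are hypothetical, and it assumes header fields carry over to every accepted record.

```python
def flatten(parsed_lines):
    """Turn header and detail lines into flat rows.

    parsed_lines is a list of (kind, fields) pairs, where kind is
    "header" (fields are remembered) or "detail" (a row is emitted).
    """
    header = {}
    rows = []
    for kind, fields in parsed_lines:
        if kind == "header":
            header.update(fields)
        elif kind == "detail":
            rows.append({**header, **fields})   # each detail line is one record
    return rows


report = [
    ("header", {"ReportTitle": "ABC CORPORATION BUDGET"}),
    ("header", {"ReportDate": "10/26/95"}),
    ("detail", {"Department": "SALES/MARKETING", "Team1": "75,249"}),
    ("detail", {"Department": "DEVELOPMENT", "Team1": "100"}),
]
for row in flatten(report):
    print(row)
```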
Tutorial Goals
In this tutorial, you will learn:
• How to create a script that reads and flattens a report with columnar data
• How to use more automatic features of Data Parser
• How to save the script file
Procedure
The following steps should be completed in the order shown:
Define Line Styles and Data Fields for Detail Lines
• Divides the line into seven Data Fields using spaces as a column separator. The Data Fields are given default field names SALES/MARKETING_1 through SALES/MARKETING_7.
• Creates a Line Style for the line. The Line Style that is automatically created has a default Line Name of SALESMARKETING. It identifies all lines in the report that have the string SALES/MARKETING in positions 1 through 16.
To define the line styles and data fields for detail lines:
1. Select the first detail line (it begins with SALES/MARKETING) by clicking in the Line Style column (the narrow yellow stripe on the left) immediately to the left
of the line. This highlights the entire line of text.
2. Right-click in the Line Style Column (the yellow part of the screen on the left) and select Parse Columnar Data.
3. From the menu, select Preferences and click once on Close Definition Dialogs on Add/Update to disable the option.
4. To view the definitions of the Data Fields created, double-click on the colored sections of the line. For example, double-clicking on the green numbers 75,249 in
the Data Panel brings up the SALES/MARKETING_2 Data Field in the Field Definition window. SALES/MARKETING_2 is the default Data Field name
given to the second Data Field in the SALESMARKETING line. It starts in position 20 and ends in position 27. Since it is defined for the Line Style
SALESMARKETING, only lines that match that recognition pattern contain this Data Field in positions 20 through 27.
5. Proceed to Change Data Field Names.
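What Parse Columnar Data does to the selected line can be approximated by splitting on runs of spaces and recording where each piece starts and ends. The following Python sketch is an illustration under several assumptions, not the product's algorithm: the function name and the sample line are made up, and the real feature may choose wider column boundaries than the text itself occupies.

```python
import re


def parse_columnar(line, prefix):
    """Split a line on whitespace; return (name, begin, end, text) tuples.

    begin and end are 1-based, inclusive column positions.
    """
    fields = []
    for number, match in enumerate(re.finditer(r"\S+", line), start=1):
        name = f"{prefix}_{number}"
        fields.append((name, match.start() + 1, match.end(), match.group()))
    return fields


# Hypothetical detail line with a department name and three amounts.
sample = " SALES/MARKETING     75,249    81,300   156,549"
for field in parse_columnar(sample, "SALES/MARKETING"):
    print(field)
```

Each tuple pairs a default field name with the column range it was found in, which is the same information the Field Definition window shows for a generated Data Field.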
Change Data Field Names
The Browse Data Record uses the Data Field names as column headings for the Data Fields, so it is a good idea to change the Data Field names for
SALES/MARKETING_1 through SALES/MARKETING_7 to more descriptive field names.
To change Data Field names:
1. Double-click on one of the Data Fields in the SALESMARKETING line to open the Field Definition window.
2. In the Field Definition window, highlight the default Field Name and replace it with a corresponding descriptive name. See table 3-3 below.
3. Click Update.
4. To select the next Data Field, click the Field Name arrow to display a drop-down list of Data Fields that have been defined for the current Line Style.
5. Select the next Data Field and continue until you have renamed all the fields. Close the Field Definition window when finished.
6. Proceed to Change Line Style Name and Definition.
Table 3-3: Tutorial 3 - Suggested Data Field Names
Change Line Style Name and Definition
To view the new Line Style SALESMARKETING, double-click on the name SALESMARKETING in the Line Style column (the yellow column on the left of your
screen). The Line Style Definition window appears.
Notice the SALESMARKETING Line Style is recognized by a pattern where columns 1 to 16 contain the string SALES/MARKETING.
To change the Line Style Name and Line Action:
1. In the Line Style Definition window, change the Line Style Name by highlighting SALESMARKETING in the Line Style Name box and replacing it with Detail.
2. Also in the Line Style Definition window, click the Line Action tab and select the ACCEPT Record Including option.
3. Click Update.
4. Proceed to Define Line Recognition Rules.
Define Line Recognition Rules
The Detail Line Style only matches lines that have SALES/MARKETING in columns 2 through 16. That is the recognition pattern that Data Parser created
automatically, but it is not the pattern that is needed in this case. The pattern needs to be general enough to match all of the detail lines in the text, but specific enough to
match ONLY the detail lines. Update the Line Pattern so that the Line Style matches all of the detail lines but excludes the TEAM TOTALS line.
Analyze the detail lines to find what makes them unique in comparison to other lines in the text. Things to look for include the positions of the Data Fields, the contents
of the Data Fields, and anything else that is consistent for each of the detail lines but not contained in non-detail lines. For example, the detail lines contain:
• Commas in positions 24, 34 and 75 on every line
• Only letters, white space, and a "/" in columns 2 through 18
• Only digits, white space, and commas in columns 20 through 79
• A digit in position 78
• An upper case letter in each of the first 5 positions
Of all of the above observations, creating a pattern to look for uppercase letters in the first five positions is the best way to go. Here are some reasons why:
Default Name         Suggested Name
SALES/MARKETING_1    Department
SALES/MARKETING_2    Team1
SALES/MARKETING_3    Team2
SALES/MARKETING_4    Team3
SALES/MARKETING_5    Team4
SALES/MARKETING_6    Team5
SALES/MARKETING_7    DepartmentTotal
• subsequent reports. Suppose in this same report (created a week later) Team 2 of the Development department went to a pre-paid weeklong class and they only spent 100 dollars on supplies for the class. This means that a comma would not be in position 34 of that detail line, so it would not match the Line Style, and the essential data on that line would be lost.
• Defining a pattern to check for letters, white space, and a "/" in columns 2 through 18 would require three pattern lines and would also match the column heading line.
• Defining a pattern to match lines that contain at least one digit in positions 20 through 79 and do not contain letters or "/" would require three pattern lines and would match the detail lines. However, it would also match the Team Totals line.
• Defining a pattern to match lines that contain a digit in position 79 would match the detail lines and the Team Totals line.
To define a pattern that looks for upper case letters in positions 2 through 6:
1. Click the Line Recognition Rules tab in the Line Style Definition window.
2. Click once in the Look For? cell in the first row of the grid and click the down arrow. The Pattern Builder window appears.
3. In the Pattern Builder window, click in the Type cell and click the down arrow to display the allowable values for the Type field.
4. Select character class from the list. This tells Extract Schema Designer what kind of data it needs to match for that line style to be valid.
5. Tab to the Value cell and click the arrow to display the allowable values for the Value field.
6. Select upper case letters from the list. This tells Data Parser the specific data it needs to match for that line style to be valid.
7. Change the value in the Count cell to 5 by highlighting the value there and typing a 5.
8. Change the value of the Begin field to 2 and the End field to 6. This tells Data Parser where to look for the data you specified and how many matching characters
must be found for the line style to match that line.
9. Click OK.
10. Click Update to save the modified line style definition.
11. Proceed to Define Data Fields.
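The recognition rule built in these steps (character class, upper case letters, Count 5, Begin 2, End 6) behaves roughly like the check below. This Python sketch is illustrative only; the function name and the sample lines are hypothetical, the text is assumed to start in column 2, and only ASCII upper case letters are considered.

```python
import re


def is_detail_line(line):
    """True when columns 2 through 6 (1-based) are five upper case letters."""
    return re.fullmatch(r"[A-Z]{5}", line[1:6]) is not None


# Hypothetical lines, each starting in column 2.
print(is_detail_line(" SALES/MARKETING     75,249"))   # True
print(is_detail_line(" DEVELOPMENT            100"))   # True
print(is_detail_line(" TEAM TOTALS        175,349"))   # False: column 6 is a blank
```

In this assumed layout, TEAM TOTALS fails because the fifth character of its label is a blank, which is how a count of 5 can separate it from the detail lines.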
Define Data Fields
In this part of the exercise, you will define the rest of the data in the record, starting with the report title.
To define the ReportTitle Data Field:
1. Select the report title ABC CORPORATION BUDGET on line 1 by highlighting it in the Data Panel.
2. Right-click in the Data Panel and select Define Data Field > New Data Field. The Field Definition window appears.
3. Change the default name to ReportTitle.
4. Click Add. Data Parser takes the selected text and Data Field name to automatically define a Data Field named ReportTitle and a line style as well, named
ABC_CORPORATION_BUDG.
5. Click Close.
6. Double-click on ABC_CORPORATION_BUDG in the Line Style Column to display the Line Style Definition window. Notice that Data Parser automatically
creates a recognition pattern that looks for the literal ABC CORPORATION BUDGET in positions 27 through 48.
7. In the Line Style Name box, type ReportTitle.
8. Click Update and Close.
9. Proceed to Define Line Styles.
Define Line Styles
1. Select the report date 10/26/95 on line 2 by highlighting the text with the mouse.
2. Right-click in the Data Panel and select Define Data Field > New Data Field. The Field Definition window appears.
3. Change the default name to ReportDate.
4. Click Add and then Close. Data Parser takes the selected text and the entered Data Field name and automatically defines a Data Field and a Line Style.
5. Double-click Style1 in the Line Style Column to display the Line Style Definition window. Notice that Data Parser automatically created a recognition pattern
that looks for the literal "/" in positions 35 and 38.
6. Rename the Line Style to ReportDate.
7. Click Update and Close.
8. Browse the data records to see how your data has changed.
9. If desired, rearrange the data fields as needed to meet your export file layout requirements.
10. Save and close your script.
Data Parser for Unstructured Text - Tutorial 4 - Floating Tags
Tutorial 4 guides you through the steps to create and save a script in Data Parser for Unstructured Text that reads and flattens a data file containing floating tag data
in a variable-length ASCII report.
This tutorial is useful to anyone likely to be working with floating tag data. By following the steps outlined below, you become familiar with both the process of creating
an extract script and the terms used throughout the documentation.
Tutorial Goals
In this tutorial, you will learn:
• How to create a script that reads and flattens an ASCII report with floating tag data
• How to save the script file
• New terms located throughout the documentation
Procedure