Content

Overview

styles

General Information

Simple Grid

Crosstab Grid

Freeform

Composite

 

Fields

Creating and Removing Fields

Field Types

Data Types

Field Properties

Validation Rules

 

Overview

Top ../images/arwup.gif (846 bytes)

The Data File Parser is a graphical user interface for parsing data files that are processed according to a Data File Definition.

In this context, "parsing" refers to creating Fields associated with data (in a cell or range of cells) and mapping the data to the Fields. The Processing script (Groovy or Action Block) references these Fields and mapping information when processing the data file, ie., importing the data into LabVantage.

After loading an example file, click the File from the list to open the Data File Parser. The user interface is similar to the Form Builder, i.e., Objects in the upper left frame, Properties (for Fields) in the bottom left frame, and Grid (spreadsheet) in the right frame. The Grid shows data from the example file in rows-by-columns format. You can select cells, rows, columns, and ranges (similar to Excel).

 

 

Frame Description
Grid
(right)
Spreadsheet-like representation of data in the example file.
Properties Frame
(lower left)
Lets you view and set properties of the Field selected in the Objects frame.
Objects Frame
(upper left)
Shows fields defined on the Grid. Clicking a Field shows in properties in the Properties frame.

Icons in the Data File Parser page

Icons Description
Loads an Example File.
Maximizes/Minimizes the editor.

 

styles

Top ../images/arwup.gif (846 bytes)
General Information

These acronyms are displayed in the Grid to define a cell's position relative to the data:

Acronym Meaning Description
EOF End of File Last column or row containing data.
FBC First Blank Column First blank column encountered.
FBR First Blank Row First blank row encountered.

For example, selecting from cell "A1" to "A EOF" means that the selection is in column A, and from row 1 down to the physical end of rows.

Simple Grid

A "Simple Grid" is a basic grid as shown in the example below. Your Example File must match this format.

  A B C D E
1 SampleId SampleType Description AnalystId CreatedDt
2 S-001 Finished For production. GK 2/2/2048
3 S-002 QC For test. LP 2/2/2048
4 S-003 QC For test. US 2/2/2048
5 S-004 Finished For production. TP 2/2/2048

You can define "Column" Fields (a Range of rows in one column), "Row" Fields (a Range of columns in one row), or "Cell" Fields (individual cells).

Crosstab Grid

A "Crosstab" is a more sophisticated grid with headers as shown in the example below. A Crosstab can consist of:

A"Range" Field
Zero or more "header Row" Fields
Zero or more "header Column" Fields
Zero or more "Cell" Fields

Below is an example of a Crosstab layout with 3 header Rows, 2 header Columns, and a Range Field consisting of cells B4 through C9. In general, your Example File must match this kind of format (although in practice you can have a different number of headers, such as 1 header Column and 1 header Row).

    header
Column
header
Column
   
  A B C
1 Metals Metals
2 Au (1) Zn (1)
3 Standard Standard
4 Blank 10 70
5 Unknown 20 80
6 Unknown 30 90
7 Dup 40 100
8 Unknown 50 110
9 Unknown 60 120
       
   

Example of a Range:
Cells B4 - C9

 
 
 
 

 
 
 
 

header Row

header Row

header Row

During processing (when data are created for the Fields), a Crosstabbed grid is "de-crosstabbed". The example below shows how this is done. First, the Range Field is linearized (column B below right). Next, the header Column is replicated once for each column in the Range (column A). Then the header cells Au(1) and Zn(1) are replicated once for each item in the Target Range (column C).

1   Metals Metals
2   Au (1) Zn (1)
3   Standard Standard
4 Blank 10 70
5 Unknown 20 80
6 Unknown 30 90
7 Dup 40 100
8 Unknown 50 110
9 Unknown 60 120
Crosstabbed

A B   C  
Blank 10 Metals Au (1) Standard
Unknown 20 Metals Au (1) Standard
Unknown 30 Metals Au (1) Standard
Dup 40 Metals Au (1) Standard
Unknown 50 Metals Au (1) Standard
Unknown 60 Metals Au (1) Standard
Blank 70 Metals Zn (1) Standard
Unknown 80 Metals Zn (1) Standard
Unknown 90 Metals Zn (1) Standard
Dup 100 Metals Zn (1) Standard
Unknown 110 Metals Zn (1) Standard
Unknown 120 Metals Zn (1) Standard

"Decrosstabbed"

Freeform

This can have any type of Fields. It encompasses functionality of Crosstab grids, but allows other undetermined options to be exercised.

 

Composite

A "Composite" lets you import multiple Excel Worksheets (of dependent data) within the same Data File. Composite DFDs consist of an ordered sequence of child DFDs, each corresponding to a different Worksheet within the same Data File. Since Composite DFDs rely on Excel Worksheets to provide the child DFD import, only Excel file formats are supported (not text/csv files).

Choosing the "Composite" style removes the Parser and Processing details and adds the Data File Definition detail upon Saving.

Define the Data File

The Data File must be a multi-worksheet Excel file that includes the required sheet names. You will then create a DFD file for each sheet.

For example, MasterDataImport requires two worksheets Param and ParamList.

 

List the child DFDs (sheets)

For each sheet, include a child Data File. In the Data File Definition detail, include each child DFD (a child DFD must not be a "Composite" DFD).

Be sure the DFDs are listed in the same order as the sheets of the Excel file, Child DFDs are imported in the exact order specified in the Data File Definition detail. You can change the sheet Name specified in the Child DFD.

NOTE:   If you attempt to import a file that does not have the required worksheets, you will not be allowed to proceed. Click the Back button to upload another file.

Define the Commit Scope

"Process Block Size" and "On Failure" action are defined within each child DFD and can not be defined for the Composite DFD. You can however define the Commit Scope for the Composite DFD.

Option Description
At end of each worksheet This option requires each child DFD to ignore its processing options commit size. Each child DFD will commit its data at the end of the import of a single worksheet.
Composite DFD import, calls the ImportDataFile for each child DFD with an overriding property called COMMITSCOPE = "ALL". This indicates that the transaction is to be committed at the end of "ALL" the records in the worksheet being processed.
Defer to child commit rules This option is the default for all Composite processing. Each child DFD determines the Commit Scope based on the child DFD definition. The Composite does not drive the Commit scope.
When action is committed This option commits the entire import operation when all the import actions are committed.
Composite DFD import, calls the ImportDataFile for each child DFD with an overriding property called COMMITSCOPE = "NONE". This makes sure the child DFD import is not committed until all the child DFDs are committed.

During Import, if the uploaded file has all the required worksheets, the wizard indicates that the check has passed.

Validation Log

A Validation log is displayed in a single table for all worksheets. An additional column indicates which worksheet contains errors or warnings (if any). If any of the worksheets have Errors, the import cannot proceed.

A single message log entry is created for a Composite DFD import. The Validation Log and Process Log display the Message Log for each child DFD import.

 

Fields

Top ../images/arwup.gif (846 bytes)

Creating and Removing Fields

 

To create a Field:

1.   Select data in accordance with the style, right click within the selection, then select "Add To New Field".
2.   In the resulting dialog, specify a Field Id (identifier for the Field), a title (user-friendly version of the Field Id), title and other necessary properties. The dialog automatically detects the type of selection (such as column, row, headercolumn).
3.   Click OK. The Field is created and appears in the Objects frame.

You can right-click a Field object in the Objects frame to create a field either from a selection ("Add Field from Range"), or from scratch ("Add Empty Fields").

To highlight all Fields in the grid, right-click the Field, then select "Highlight Fields".In addition you can also create fields of type "input" by choosing the option "Add Input Field". See Field Types for details.

To remove an existing Field, right-click the Field, then select "Remove Field".

When you click a Field in the Objects frame, the range is selected in the grid and the Field properties are loaded into the Properties frame. You can then edit the properties for that Field.

Field Types

 

The Types of Fields you can create depend on the style you select in the Data File Definition Maintenance page:

style Allowable Field Types
Simple Grid Cell
Row
Column
Input
Crosstab Grid Cell
Range
header Row
header Column
Input
Freeform All

Note the "Input" Field Types. "Input" fields correspond to additional properties that will be passed on to the ImportDataFile Action. Input fields are available in the Processing Rules definition (as is the case with the other fields). The name of the input field should correspond to the name of the input property passed to ImportDataFile. At run time, the value of the input fields is determined from the input properties to the ImportDataFile Action.

Data Types

 

Several of the Field Properties below let you declare a Data Type. The Data File Parser drives the parsing of data from Excel files as follows:

Excel Cell Format LabVantage Data Type Data File Parser Conversion
String String The Excel string is parsed without modification.
String Number The Excel string must correspond to a valid numeric string in the User's locale. Parsing errors are reported at runtime.
String Date The Excel string must correspond to a valid date string in the User's locale. Parsing errors are reported at runtime.
Number String The Excel numeric value is converted to a string. Formatting specified in Excel is retained.
Number Number The Excel numeric value is converted to a numeric string in the User's locale, then passed to the Actions in the processing script.
Number Date The Data File Parser throws an error.
Number
DateFormatted
String The Excel date value is converted to a string. Formatting specified in Excel is retained.
Number
DateFormatted
Number The Data File Parser throws an error.
Number
DateFormatted
Date The Excel date value is converted to a date string in the User's locale, then passed to the Actions in the processing script.

Field Properties

 

Here are the Field Properties supported by each Field Type and style.

Cell (Simple Grid and Crosstab Grid)

Field Details

Property Description
Id Identifier of the Field.
title User-friendly alias to display in place of the Id (above).
Data Type LabVantage Data Type that defines the selected data (string, number, or date).
Validation Define Validation Rules for this field.
Error Prefix In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property Description
Type Field Type (defaults to "cell").
Column Coordinates of the cell defined for the Field (number of the column and row).
Row
Row (Simple Grid)

Field Details

Property Description
Id Identifier of the Field.
title User-friendly alias to display in place of the Id (above).
Data Type LabVantage Data Type that defines the selected data (string, number, or date).
Validation Define Validation Rules for this field.
Error Prefix In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property Description
Type Field Type (defaults to "row").
Row Number of the row defined for the Field.
Start Column Type Defines how the first column in the Range is determined:
 First column in the Range is:
absolute "Start Column" (below).
relativeNot supported.
Start Column First column in the Range.
End Column Type Defines how the last column in the Range is determined:
 Last column in the Range is:
firstblankcolumn First blank column encountered.
endoffileLast column containing data.
absolute"End Column" (below).
End Column Last column in the Range. Applies only if "End Column Type" (above) is "absolute".
Column (Simple Grid)

Field Details

Property Description
Id Identifier of the Field.
title User-friendly alias to display in place of the Id (above).
Data Type LabVantage Data Type that defines the selected data (string, number, or date).
Validation Define Validation Rules for this field.
Error Prefix In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property Description
Type Field Type (defaults to "column").
Column Number of the column defined for the Field.
Start Row Type Defines how the first row in the Range is determined:
 First row in the Range is:
absolute "Start Row" (below).
relativeNot supported.
Start Row First row in the Range.
End Row Type Defines how the last row in the Range is determined:
 Last row in the Range is:
firstblankrow First blank row encountered.
endoffileLast row containing data.
absolute"End Row" (below).
End Row Last row in the Range. Applies only if "End Row Type" (above) is "absolute".
Range (Crosstab Grid)

Field Details

Property Description
Id Identifier of the Field.
title User-friendly alias to display in place of the Id (above).
Data Type LabVantage Data Type that defines the selected data (string, number, or date).
Validation Define Validation Rules for this field.
Error Prefix In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property Description
Type Field Type (should default to "range", but currently says "cellrange").
Start Column Type Defines how the first column in the Range is determined:
 First column in the Range is:
absolute "Start Column" (below).
relativeNot supported.
Start Column First column in the Range.
Start Row Type Defines how the first row in the Range is determined:
 First row in the Range is:
absolute "Start Row" (below).
relativeNot supported.
Start Row First row in the Range.
End Column Type Defines how the last column in the Range is determined:
 Last column in the Range is:
firstblankcolumn First blank column encountered.
endoffileLast column containing data.
absolute"End Column" (below).
End Column Last column in the Range. Applies only if "End Column Type" (above) is "absolute".
End Row Type Defines how the last row in the Range is determined:
 Last row in the Range is:
firstblankrow First blank row encountered.
endoffileLast row containing data.
absolute"End Row" (below).
End Row Last row in the Range. Applies only if "End Row Type" (above) is "absolute".
header Row (Crosstab Grid)

A header Row Field defines a single row.

Field Details

Property Description
Id Identifier of the Field.
title User-friendly alias to display in place of the Id (above).
Data Type LabVantage Data Type that defines the selected data (string, number, or date).
Validation Define Validation Rules for this field.
Error Prefix In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property Description
Type Field Type (defaults to "headerrow").
Row Number of the row defined for the Field.
Target Range Field to which the header Row applies.

Example

header Column (Crosstab Grid)

A header Column Field defines a single column.

Field Details

Property Description
Id Identifier of the Field.
title User-friendly alias to display in place of the Id (above).
Data Type LabVantage Data Type that defines the selected data (string, number, or date).
Validation Define Validation Rules for this field.
Error Prefix In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property Description
Type Field Type (defaults to "headercolumn").
Column Number of the column defined for the Field.
Target Range Field to which the header Column applies.

 

Input Field (SimpleGrid)



Field Details

Property Description
Id Identifier of the Field.
title User-friendly alias to display in place of the Id (above).
Data Type LabVantage Data Type that defines the selected data (string, number, or date).
Validation Define Validation Rules for this field.
Error Prefix In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property Description
Type Field Type (defaults to "input").
Default Value Default Value of the input field, if any.

 

Input Field (CrosstabGrid)



Field Details

Property Description
Id Identifier of the Field.
title User-friendly alias to display in place of the Id (above).
Data Type LabVantage Data Type that defines the selected data (string, number, or date).
Validation Define Validation Rules for this field.
Error Prefix In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property Description
Type Field Type (defaults to "input").
Default Value Default Value of the input field, if any.

 

Input Field (Freeform)

 

Field Details

Property Description
Id Identifier of the Field.
title User-friendly alias to display in place of the Id (above).
Data Type LabVantage Data Type that defines the selected data (string, number, or date).
Validation Define Validation Rules for this field.
Error Prefix In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property Description
Type Field Type (defaults to "input").
Default Value Default Value of the input field, if any.

 

Validation Rules

Top ../images/arwup.gif (846 bytes)

Define Validation Rules when adding new or editing existing field properties. Click the ellipsis in the "Validation" field of the Field Properties to open the Validation Editor.

For each Validation Rule, specify Validation Error Behavior as one of the following (checking the Rule shows the Error Behavior options):

Option Description
Error The import cannot proceed until the file is fixed. The error displays in "Red" in the Validation Log.
Skip The "record" should not be processed if the success criteria is not met, the rest of the file is processed. A Warning is added to the Validation Log.
NOTE:  A "record" corresponds to a row in a Simple Grid, or a row of the "de-crosstabbed" grid in the case of range fields.
Substitute Specify a replacement value for "rogue" values that did not meet the "Success Criteria" so that the import can proceed safely. A Warning is added to the Validation Log.

Following is a list of each Validation Rule and associated options:

Rule Description
Mandatory Indicates whether or not a field value is mandatory. Choose to always require a value for this field, or click "Add Condition" to conditionally require a value.

 

When a Condition is defined (and exists), a value is required. For example, you might require that "Cigarettes per day" is required if the Subject is a Smoker.

Each condition includes three parts:

 

value1 Comparison Operator value 2
[<fieldid>.value] or [<fieldid>.length] where <fieldid> corresponds to a field in the DFD.  Depends on the selection of value1. When comparing field values, the available Comparison Operators is determined by the Data Type of the field selected for value1.
Data Type Comparison Operators
String Not Equals, Contains, Does Not Contain, Starts With, Ends With
DateBefore, After, On or Before, On or After
NumberEquals, Not Equals, Greater Than, Greater than Or Equals, Less Than, Less Than Or Equals
 <literal> or [<fieldid>.value] or [<fieldid>.length] where <fieldid> corresponds to a field in the DFD .


If the condition evaluates to true, the field is mandatory.

Literal Values

Choose to include literal values in conditions for comparison. The following rules apply when specifying literals:

For numeric fields, the value should specify the number in US Locale using "." as the decimal separator
For date fields, the value should be one of the following (you cannot specify a specific date string).
NOW, TODAY, MIDDAYTODAY, ENDTODAY, YESTERDAY, ENDYESTERDAY, TOMORROW, MIDDAYTOMORROW, ENDTOMORROW, STARTWEEK, ENDWEEK, STARTMONTH, ENDMONTH, STARTPREVMONTH, ENDPREVMONTH, STARTNEXTMONTH, ENDNEXTMONTH, STARTYEAR, ENDYEAR, STARTNEXTHOUR, ENDNEXTHOUR

 

NOTE:   If multiple conditions are specified, a logical AND of all conditions is performed.
Distinct Check The Distinct Check rule lets you flag duplicate values for a field, or a combination of fields, in the file being imported.


In the above example, duplicate firstname/lastname values cannot exist (defined against the firstname field).

This rule applies for all Data Types.

Value Check The Value Check rule allows you to compare the current field value against fixed values or values of other fields.

For example:

Signup date can be today, or any date in the past.
Age should be between 18 and 100.
Study start date should be before the end date.

Each condition includes three parts:

value1 Comparison Operator value 2
Select whether you are checking the current fields value or length (choose between [this.value] and [this.length]. Depends on the Data Type of the field.
Data Type Comparison Operators
[this.value]Equals, Not Equals
[this.length]Equals, Not Equals, Greater Than, Greater than Or Equals, Less Than, Less Than Or Equals
 <literal> or [<fieldid>.value] where <fieldid> corresponds to a field in the DFD. The Literals Rules are the same as those detailed in "Mandatory Conditions".
Value List This rule lets you check the current field value against a pre-defined list of values, then optionally substitute a string for each.

For example, you might create a list such as { "Male", "Female" } for the Gender field.

The validation logic checks whether the field Value matches one of the Values in the list. Specify "Substitute" strings for each of the Values. When the field Values are passed to the Actions in the processing script, they are substituted as "M" or "F" instead of passing Male or Female (as is).

Value RefType This rule lets you associate a Reference Type (ReftypeId) with the field.

Select the Reference Type with which the field value corresponds. Note that RefType includes the list "RefValues" and corresponding "RefDisplayValue".

Option The field is validated...
RefType Value To ensure it matches one of the Reference Values (M/F/U).
RefType Display ValueTo ensure it matches one of the Reference Display Values (Male/Female/Unknown).
RefType Display Value, substitute with RefType ValueAgainst the Reference Display Values (Male/Female/Unknown), and it replaces the value with the RefType Value when processing.
Groovy Check Lets you perform additional groovy checks that evaluate to "Y" or "N" (True/False).


For every record being validated, the field values can be referenced using [fieldname.value] in the groovy code. The current field value can be referenced using [this.value].

For example:

For the field "Age" of type "number", you can have the following Check:

 [this.value] > 25 && [Sex.value] == 'Female'
NOTE:  The groovy script is pre-processed to convert the [xxx.value] references to refer to fields.fieldname.value objects before evaluating the groovy script.
Regular Expression Validates the field value against a regular expression.


For example:

Valid Email ID:

^(.+)@(.+)$

If the value of the field matches the pattern, the validation is successful.

SDI Check This validation rule lets you check whether the SDI corresponding to the current field value exists (or not) in the database.
A Field in the data file may correspond to a keycolid of an SDC, or an alternate key column.
The rule definition includes the sdcid, and identifying the sdc column that corresponds to the current field.
You can extend the rule to include additional child tables and a where clause.

For example, you might want to check that:

A productid exists before associating it with a Sample.
Material with a specified ID does not already exist before attempting to create a new one.
A product code exists and we need to replace it with a productid when processing.

The SDI check is made against the database, if a particular SDI is being inserted in the same import operation, it may result in false positives when running in validateonly mode.

 

Option Description
Type Specifies whether the validation is to determine that the SDI "Exists" or "Not Exists".

"Exists" checks whether exactly one row is returned for the SDI selection criteria. This rule is equivalent to running the following query:

select count(*) from <From clause> where <where clause>
Exists is true if count(*) != 0

"Not Exists" is similar to the "Exists" option except it looks for the count(*) == 0.

SDC Identify the SDC to be searched.
From Automatically populates the table, corresponding to the primary table, of the SDC. Additional detail tables can be specified by editing the From table.
Field ColumnIdentifies which column in the SDC the field corresponds to. It could be the primary key column or alternate key column. If the Field Column is not the primary key, specify the "Substitute column" as the primary key column.
WhereSpecifies a Where Clause. This is automatically populated when a Field column is selected:

<columnid>=[this value]

Choose to edit and add more conditions to the Where Clause.

NOTE:  If the Import Action is called in Validateonly mode, SDI checks may show the SDIs (being created in the same DFD import) as "not exists", since the processing is skipped in that mode.

Substitute Keycolid

This option can be used in conjunction with the "Exists" check, where you not only want to check for an SDI by alternate keycolid, you want the value to be replaced by keycolid when processing.
The query executed is:

select <keycolid> from <from clause> where <where clause>

If no rows are returned, there is a validation failure. Otherwise, the returned keycolid is substituted for the column value when processing.

Number Format When the Data Type is number, the Number Format validation is available.

Number Format lets you specify:

A number, using any group separator or decimal separator that can be parsed, provided the field is configured to specify these two properties.
A date string using any format, in any locale, in any timezone can be parsed into a Calendar object and processed.

In the case of numeric columns, the generated data file may contain numbers generated using a particular Decimal Separator, and Group Separator.
For example, assume the generated number is:

1'234'567.89

The validation editor allows you to configure the Decimal Separator and Group Separator. If the parsing fails, the Error Actions are considered. You can choose to throw an "Error", "Skip" the record, or "Substitute" with a default value.

Date Format When the Data Type is "date", the Date Format validation is available.

Date fields in the Data File may have a date format that is different from the date format of the User Importing the file. Date Format validation lets you parse the date correctly from any format by specifying a "Locale" and "Format" string. Additionally, a Time zone can be specified for the date field so that the time is adjusted before processing.

 

If the date field string cannot be parsed into an actual date, a default replacement can be specified as one of the Date Literals.