Overview

The Data File Parser is a graphical user interface for parsing data files that are processed according to a Data File Definition.

In this context, "parsing" refers to creating Fields associated with data (in a cell or range of cells) and mapping the data to the Fields. The Processing script (Groovy or Action Block) references these Fields and mapping information when processing the data file, ie., importing the data into LabVantage.

After loading an example file, click the File from the list to open the Data File Parser. The user interface is similar to the Form Builder, i.e., Objects in the upper left frame, Properties (for Fields) in the bottom left frame, and Grid (spreadsheet) in the right frame. The Grid shows data from the example file in rows-by-columns format. You can select cells, rows, columns, and ranges (similar to Excel).

Frame	Description
Grid (right)	Spreadsheet-like representation of data in the example file.
Properties Frame (lower left)	Lets you view and set properties of the Field selected in the Objects frame.
Objects Frame (upper left)	Shows fields defined on the Grid. Clicking a Field shows in properties in the Properties frame.

Icons in the Data File Parser page

Icons	Description
	Loads an Example File.
	Maximizes/Minimizes the editor.

styles

Top

General Information

These acronyms are displayed in the Grid to define a cell's position relative to the data:

Acronym	Meaning	Description
EOF	End of File	Last column or row containing data.
FBC	First Blank Column	First blank column encountered.
FBR	First Blank Row	First blank row encountered.

For example, selecting from cell "A1" to "A EOF" means that the selection is in column A, and from row 1 down to the physical end of rows.

Simple Grid

A "Simple Grid" is a basic grid as shown in the example below. Your Example File must match this format.

	A	B	C	D	E
1	SampleId	SampleType	Description	AnalystId	CreatedDt
2	S-001	Finished	For production.	GK	2/2/2048
3	S-002	QC	For test.	LP	2/2/2048
4	S-003	QC	For test.	US	2/2/2048
5	S-004	Finished	For production.	TP	2/2/2048

You can define "Column" Fields (a Range of rows in one column), "Row" Fields (a Range of columns in one row), or "Cell" Fields (individual cells).

Crosstab Grid

A "Crosstab" is a more sophisticated grid with headers as shown in the example below. A Crosstab can consist of:

•	A"Range" Field
•	Zero or more "header Row" Fields
•	Zero or more "header Column" Fields
•	Zero or more "Cell" Fields

Below is an example of a Crosstab layout with 3 header Rows, 2 header Columns, and a Range Field consisting of cells B4 through C9. In general, your Example File must match this kind of format (although in practice you can have a different number of headers, such as 1 header Column and 1 header Row).

		header Column	header Column
		↓	↓
	A	B	C
1		Metals	Metals
2		Au (1)	Zn (1)
3		Standard	Standard
4	Blank	10	70
5	Unknown	20	80
6	Unknown	30	90
7	Dup	40	100
8	Unknown	50	110
9	Unknown	60	120

		Example of a Range: Cells B4 - C9

←

header Row

During processing (when data are created for the Fields), a Crosstabbed grid is "de-crosstabbed". The example below shows how this is done. First, the Range Field is linearized (column B below right). Next, the header Column is replicated once for each column in the Range (column A). Then the header cells Au(1) and Zn(1) are replicated once for each item in the Target Range (column C).

1		Metals	Metals
2		Au (1)	Zn (1)
3		Standard	Standard
4	Blank	10	70
5	Unknown	20	80
6	Unknown	30	90
7	Dup	40	100
8	Unknown	50	110
9	Unknown	60	120

Crosstabbed

→

A	B		C
Blank	10	Metals	Au (1)	Standard
Unknown	20	Metals	Au (1)	Standard
Unknown	30	Metals	Au (1)	Standard
Dup	40	Metals	Au (1)	Standard
Unknown	50	Metals	Au (1)	Standard
Unknown	60	Metals	Au (1)	Standard
Blank	70	Metals	Zn (1)	Standard
Unknown	80	Metals	Zn (1)	Standard
Unknown	90	Metals	Zn (1)	Standard
Dup	100	Metals	Zn (1)	Standard
Unknown	110	Metals	Zn (1)	Standard
Unknown	120	Metals	Zn (1)	Standard

"Decrosstabbed"

Freeform

This can have any type of Fields. It encompasses functionality of Crosstab grids, but allows other undetermined options to be exercised.

Composite

A "Composite" lets you import multiple Excel Worksheets (of dependent data) within the same Data File. Composite DFDs consist of an ordered sequence of child DFDs, each corresponding to a different Worksheet within the same Data File. Since Composite DFDs rely on Excel Worksheets to provide the child DFD import, only Excel file formats are supported (not text/csv files).

Choosing the "Composite" style removes the Parser and Processing details and adds the Data File Definition detail upon Saving.

Define the Data File

The Data File must be a multi-worksheet Excel file that includes the required sheet names. You will then create a DFD file for each sheet.

For example, MasterDataImport requires two worksheets Param and ParamList.

List the child DFDs (sheets)

For each sheet, include a child Data File. In the Data File Definition detail, include each child DFD (a child DFD must not be a "Composite" DFD).

Be sure the DFDs are listed in the same order as the sheets of the Excel file, Child DFDs are imported in the exact order specified in the Data File Definition detail. You can change the sheet Name specified in the Child DFD.

NOTE:

If you attempt to import a file that does not have the required worksheets, you will not be allowed to proceed. Click the Back button to upload another file.

Define the Commit Scope

"Process Block Size" and "On Failure" action are defined within each child DFD and can not be defined for the Composite DFD. You can however define the Commit Scope for the Composite DFD.

Option	Description
At end of each worksheet	This option requires each child DFD to ignore its processing options commit size. Each child DFD will commit its data at the end of the import of a single worksheet. Composite DFD import, calls the ImportDataFile for each child DFD with an overriding property called COMMITSCOPE = "ALL". This indicates that the transaction is to be committed at the end of "ALL" the records in the worksheet being processed.
Defer to child commit rules	This option is the default for all Composite processing. Each child DFD determines the Commit Scope based on the child DFD definition. The Composite does not drive the Commit scope.
When action is committed	This option commits the entire import operation when all the import actions are committed. Composite DFD import, calls the ImportDataFile for each child DFD with an overriding property called COMMITSCOPE = "NONE". This makes sure the child DFD import is not committed until all the child DFDs are committed.

During Import, if the uploaded file has all the required worksheets, the wizard indicates that the check has passed.

Validation Log

A Validation log is displayed in a single table for all worksheets. An additional column indicates which worksheet contains errors or warnings (if any). If any of the worksheets have Errors, the import cannot proceed.

A single message log entry is created for a Composite DFD import. The Validation Log and Process Log display the Message Log for each child DFD import.

Fields

Top

Creating and Removing Fields

To create a Field:

1.		Select data in accordance with the style, right click within the selection, then select "Add To New Field".
2.		In the resulting dialog, specify a Field Id (identifier for the Field), a title (user-friendly version of the Field Id), title and other necessary properties. The dialog automatically detects the type of selection (such as column, row, headercolumn).
3.		Click OK. The Field is created and appears in the Objects frame.

You can right-click a Field object in the Objects frame to create a field either from a selection ("Add Field from Range"), or from scratch ("Add Empty Fields").

To highlight all Fields in the grid, right-click the Field, then select "Highlight Fields".In addition you can also create fields of type "input" by choosing the option "Add Input Field". See Field Types for details.

To remove an existing Field, right-click the Field, then select "Remove Field".

When you click a Field in the Objects frame, the range is selected in the grid and the Field properties are loaded into the Properties frame. You can then edit the properties for that Field.

Field Types

The Types of Fields you can create depend on the style you select in the Data File Definition Maintenance page:

style	Allowable Field Types
Simple Grid	Cell
	Row
	Column
	Input
Crosstab Grid	Cell
	Range
	header Row
	header Column
	Input
Freeform	All

Note the "Input" Field Types. "Input" fields correspond to additional properties that will be passed on to the ImportDataFile Action. Input fields are available in the Processing Rules definition (as is the case with the other fields). The name of the input field should correspond to the name of the input property passed to ImportDataFile. At run time, the value of the input fields is determined from the input properties to the ImportDataFile Action.

Data Types

Several of the Field Properties below let you declare a Data Type. The Data File Parser drives the parsing of data from Excel files as follows:

Excel Cell Format	LabVantage Data Type	Data File Parser Conversion
String	String	The Excel string is parsed without modification.
String	Number	The Excel string must correspond to a valid numeric string in the User's locale. Parsing errors are reported at runtime.
String	Date	The Excel string must correspond to a valid date string in the User's locale. Parsing errors are reported at runtime.
Number	String	The Excel numeric value is converted to a string. Formatting specified in Excel is retained.
Number	Number	The Excel numeric value is converted to a numeric string in the User's locale, then passed to the Actions in the processing script.
Number	Date	The Data File Parser throws an error.
Number DateFormatted	String	The Excel date value is converted to a string. Formatting specified in Excel is retained.
Number DateFormatted	Number	The Data File Parser throws an error.
Number DateFormatted	Date	The Excel date value is converted to a date string in the User's locale, then passed to the Actions in the processing script.

Field Properties

Here are the Field Properties supported by each Field Type and style.

Cell (Simple Grid and Crosstab Grid)

Field Details

Property	Description
Id	Identifier of the Field.
title	User-friendly alias to display in place of the Id (above).
Data Type	LabVantage Data Type that defines the selected data (string, number, or date).
Validation	Define Validation Rules for this field.
Error Prefix	In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property	Description
Type	Field Type (defaults to "cell").
Column	Coordinates of the cell defined for the Field (number of the column and row).
Row

Row (Simple Grid)

Field Details

Property	Description
Id	Identifier of the Field.
title	User-friendly alias to display in place of the Id (above).
Data Type	LabVantage Data Type that defines the selected data (string, number, or date).
Validation	Define Validation Rules for this field.
Error Prefix	In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property

Description

Type

Field Type (defaults to "row").

Row

Number of the row defined for the Field.

Start Column Type

Defines how the first column in the Range is determined:

	First column in the Range is:
absolute	"Start Column" (below).
relative	Not supported.

Start Column

First column in the Range.

End Column Type

Defines how the last column in the Range is determined:

	Last column in the Range is:
firstblankcolumn	First blank column encountered.
endoffile	Last column containing data.
absolute	"End Column" (below).

End Column

Last column in the Range. Applies only if "End Column Type" (above) is "absolute".

Column (Simple Grid)

Field Details

Property	Description
Id	Identifier of the Field.
title	User-friendly alias to display in place of the Id (above).
Data Type	LabVantage Data Type that defines the selected data (string, number, or date).
Validation	Define Validation Rules for this field.
Error Prefix	In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property

Description

Type

Field Type (defaults to "column").

Column

Number of the column defined for the Field.

Start Row Type

Defines how the first row in the Range is determined:

	First row in the Range is:
absolute	"Start Row" (below).
relative	Not supported.

Start Row

First row in the Range.

End Row Type

Defines how the last row in the Range is determined:

	Last row in the Range is:
firstblankrow	First blank row encountered.
endoffile	Last row containing data.
absolute	"End Row" (below).

End Row

Last row in the Range. Applies only if "End Row Type" (above) is "absolute".

Range (Crosstab Grid)

Field Details

Property	Description
Id	Identifier of the Field.
title	User-friendly alias to display in place of the Id (above).
Data Type	LabVantage Data Type that defines the selected data (string, number, or date).
Validation	Define Validation Rules for this field.
Error Prefix	In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property

Description

Type

Field Type (should default to "range", but currently says "cellrange").

Start Column Type

Defines how the first column in the Range is determined:

	First column in the Range is:
absolute	"Start Column" (below).
relative	Not supported.

Start Column

First column in the Range.

Start Row Type

Defines how the first row in the Range is determined:

	First row in the Range is:
absolute	"Start Row" (below).
relative	Not supported.

Start Row

First row in the Range.

End Column Type

Defines how the last column in the Range is determined:

	Last column in the Range is:
firstblankcolumn	First blank column encountered.
endoffile	Last column containing data.
absolute	"End Column" (below).

End Column

Last column in the Range. Applies only if "End Column Type" (above) is "absolute".

End Row Type

Defines how the last row in the Range is determined:

	Last row in the Range is:
firstblankrow	First blank row encountered.
endoffile	Last row containing data.
absolute	"End Row" (below).

End Row

Last row in the Range. Applies only if "End Row Type" (above) is "absolute".

header Row (Crosstab Grid)

A header Row Field defines a single row.

Field Details

Property	Description
Id	Identifier of the Field.
title	User-friendly alias to display in place of the Id (above).
Data Type	LabVantage Data Type that defines the selected data (string, number, or date).
Validation	Define Validation Rules for this field.
Error Prefix	In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property	Description
Type	Field Type (defaults to "headerrow").
Row	Number of the row defined for the Field.
Target	Range Field to which the header Row applies.

Example

header Column (Crosstab Grid)

A header Column Field defines a single column.

Field Details

Property	Description
Id	Identifier of the Field.
title	User-friendly alias to display in place of the Id (above).
Data Type	LabVantage Data Type that defines the selected data (string, number, or date).
Validation	Define Validation Rules for this field.
Error Prefix	In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property	Description
Type	Field Type (defaults to "headercolumn").
Column	Number of the column defined for the Field.
Target	Range Field to which the header Column applies.

Input Field (SimpleGrid)

Field Details

Property	Description
Id	Identifier of the Field.
title	User-friendly alias to display in place of the Id (above).
Data Type	LabVantage Data Type that defines the selected data (string, number, or date).
Validation	Define Validation Rules for this field.
Error Prefix	In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property	Description
Type	Field Type (defaults to "input").
Default Value	Default Value of the input field, if any.

Input Field (CrosstabGrid)

Field Details

Property	Description
Id	Identifier of the Field.
title	User-friendly alias to display in place of the Id (above).
Data Type	LabVantage Data Type that defines the selected data (string, number, or date).
Validation	Define Validation Rules for this field.
Error Prefix	In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property	Description
Type	Field Type (defaults to "input").
Default Value	Default Value of the input field, if any.

Input Field (Freeform)

Field Details

Property	Description
Id	Identifier of the Field.
title	User-friendly alias to display in place of the Id (above).
Data Type	LabVantage Data Type that defines the selected data (string, number, or date).
Validation	Define Validation Rules for this field.
Error Prefix	In the Validation Log, this prefix is appended to any generated Validation errors. For example, if validation requires the Subject Id to be unique, the Error Prefix for the subjectid field might be "This Subject must be unique.". This prefix becomes part of the validation log to help identify the validation error.

Data Range

Property	Description
Type	Field Type (defaults to "input").
Default Value	Default Value of the input field, if any.

Validation Rules

Top

Define Validation Rules when adding new or editing existing field properties. Click the ellipsis in the "Validation" field of the Field Properties to open the Validation Editor.

For each Validation Rule, specify Validation Error Behavior as one of the following (checking the Rule shows the Error Behavior options):

Option

Description

Error

The import cannot proceed until the file is fixed. The error displays in "Red" in the Validation Log.

Skip

The "record" should not be processed if the success criteria is not met, the rest of the file is processed. A Warning is added to the Validation Log.

NOTE:

A "record" corresponds to a row in a Simple Grid, or a row of the "de-crosstabbed" grid in the case of range fields.

Substitute

Specify a replacement value for "rogue" values that did not meet the "Success Criteria" so that the import can proceed safely. A Warning is added to the Validation Log.

Following is a list of each Validation Rule and associated options:

Rule

Description

Mandatory

Indicates whether or not a field value is mandatory. Choose to always require a value for this field, or click "Add Condition" to conditionally require a value.

When a Condition is defined (and exists), a value is required. For example, you might require that "Cigarettes per day" is required if the Subject is a Smoker.

Each condition includes three parts:

value1

Comparison Operator

value 2

[<fieldid>.value] or [<fieldid>.length] where <fieldid> corresponds to a field in the DFD.

Depends on the selection of value1. When comparing field values, the available Comparison Operators is determined by the Data Type of the field selected for value1.

Data Type	Comparison Operators
String	Not Equals, Contains, Does Not Contain, Starts With, Ends With
Date	Before, After, On or Before, On or After
Number	Equals, Not Equals, Greater Than, Greater than Or Equals, Less Than, Less Than Or Equals

<literal> or [<fieldid>.value] or [<fieldid>.length] where <fieldid> corresponds to a field in the DFD .

If the condition evaluates to true, the field is mandatory.

Literal Values

Choose to include literal values in conditions for comparison. The following rules apply when specifying literals:

•	For numeric fields, the value should specify the number in US Locale using "." as the decimal separator
•	For date fields, the value should be one of the following (you cannot specify a specific date string). NOW, TODAY, MIDDAYTODAY, ENDTODAY, YESTERDAY, ENDYESTERDAY, TOMORROW, MIDDAYTOMORROW, ENDTOMORROW, STARTWEEK, ENDWEEK, STARTMONTH, ENDMONTH, STARTPREVMONTH, ENDPREVMONTH, STARTNEXTMONTH, ENDNEXTMONTH, STARTYEAR, ENDYEAR, STARTNEXTHOUR, ENDNEXTHOUR

NOTE:

If multiple conditions are specified, a logical AND of all conditions is performed.

Distinct Check

The Distinct Check rule lets you flag duplicate values for a field, or a combination of fields, in the file being imported.

In the above example, duplicate firstname/lastname values cannot exist (defined against the firstname field).

This rule applies for all Data Types.

Value Check

The Value Check rule allows you to compare the current field value against fixed values or values of other fields.

For example:

•	Signup date can be today, or any date in the past.
•	Age should be between 18 and 100.
•	Study start date should be before the end date.

Each condition includes three parts:

value1

Comparison Operator

value 2

Select whether you are checking the current fields value or length (choose between [this.value] and [this.length].

Depends on the Data Type of the field.

Data Type	Comparison Operators
[this.value]	Equals, Not Equals
[this.length]	Equals, Not Equals, Greater Than, Greater than Or Equals, Less Than, Less Than Or Equals

<literal> or [<fieldid>.value] where <fieldid> corresponds to a field in the DFD. The Literals Rules are the same as those detailed in "Mandatory Conditions".

Value List

This rule lets you check the current field value against a pre-defined list of values, then optionally substitute a string for each.

For example, you might create a list such as { "Male", "Female" } for the Gender field.

The validation logic checks whether the field Value matches one of the Values in the list. Specify "Substitute" strings for each of the Values. When the field Values are passed to the Actions in the processing script, they are substituted as "M" or "F" instead of passing Male or Female (as is).

Value RefType

This rule lets you associate a Reference Type (ReftypeId) with the field.

Select the Reference Type with which the field value corresponds. Note that RefType includes the list "RefValues" and corresponding "RefDisplayValue".

Option	The field is validated...
RefType Value	To ensure it matches one of the Reference Values (M/F/U).
RefType Display Value	To ensure it matches one of the Reference Display Values (Male/Female/Unknown).
RefType Display Value, substitute with RefType Value	Against the Reference Display Values (Male/Female/Unknown), and it replaces the value with the RefType Value when processing.

Groovy Check

Lets you perform additional groovy checks that evaluate to "Y" or "N" (True/False).

For every record being validated, the field values can be referenced using [fieldname.value] in the groovy code. The current field value can be referenced using [this.value].

For example:

For the field "Age" of type "number", you can have the following Check:

[this.value] > 25 && [Sex.value] == 'Female'

NOTE:

The groovy script is pre-processed to convert the [xxx.value] references to refer to fields.fieldname.value objects before evaluating the groovy script.

Regular Expression

Validates the field value against a regular expression.

For example:

Valid Email ID:

^(.+)@(.+)$

If the value of the field matches the pattern, the validation is successful.

SDI Check

This validation rule lets you check whether the SDI corresponding to the current field value exists (or not) in the database.

•	A Field in the data file may correspond to a keycolid of an SDC, or an alternate key column.
•	The rule definition includes the sdcid, and identifying the sdc column that corresponds to the current field.
•	You can extend the rule to include additional child tables and a where clause.

For example, you might want to check that:

•	A productid exists before associating it with a Sample.
•	Material with a specified ID does not already exist before attempting to create a new one.
•	A product code exists and we need to replace it with a productid when processing.

The SDI check is made against the database, if a particular SDI is being inserted in the same import operation, it may result in false positives when running in validateonly mode.

Option

Description

Type

Specifies whether the validation is to determine that the SDI "Exists" or "Not Exists".

"Exists" checks whether exactly one row is returned for the SDI selection criteria. This rule is equivalent to running the following query:

select count(*) from <From clause> where <where clause>
Exists is true if count(*) != 0

"Not Exists" is similar to the "Exists" option except it looks for the count(*) == 0.

SDC

Identify the SDC to be searched.

From

Automatically populates the table, corresponding to the primary table, of the SDC. Additional detail tables can be specified by editing the From table.

Field Column

Identifies which column in the SDC the field corresponds to. It could be the primary key column or alternate key column. If the Field Column is not the primary key, specify the "Substitute column" as the primary key column.

Where

Specifies a Where Clause. This is automatically populated when a Field column is selected:

<columnid>=[this value]

Choose to edit and add more conditions to the Where Clause.

NOTE:

If the Import Action is called in Validateonly mode, SDI checks may show the SDIs (being created in the same DFD import) as "not exists", since the processing is skipped in that mode.

Substitute Keycolid

This option can be used in conjunction with the "Exists" check, where you not only want to check for an SDI by alternate keycolid, you want the value to be replaced by keycolid when processing.
The query executed is:

select <keycolid> from <from clause> where <where clause>

If no rows are returned, there is a validation failure. Otherwise, the returned keycolid is substituted for the column value when processing.

Number Format

When the Data Type is number, the Number Format validation is available.

Number Format lets you specify:

•	A number, using any group separator or decimal separator that can be parsed, provided the field is configured to specify these two properties.
•	A date string using any format, in any locale, in any timezone can be parsed into a Calendar object and processed.

In the case of numeric columns, the generated data file may contain numbers generated using a particular Decimal Separator, and Group Separator.
For example, assume the generated number is:

1'234'567.89

The validation editor allows you to configure the Decimal Separator and Group Separator. If the parsing fails, the Error Actions are considered. You can choose to throw an "Error", "Skip" the record, or "Substitute" with a default value.

Date Format

When the Data Type is "date", the Date Format validation is available.

Date fields in the Data File may have a date format that is different from the date format of the User Importing the file. Date Format validation lets you parse the date correctly from any format by specifying a "Locale" and "Format" string. Additionally, a Time zone can be specified for the date field so that the time is adjusted before processing.

If the date field string cannot be parsed into an actual date, a default replacement can be specified as one of the Date Literals.

Data File Parser

Content

Overview

Icons in the Data File Parser page

styles

Define the Data File

List the child DFDs (sheets)

Define the Commit Scope

Validation Log

Fields

Creating and Removing Fields

Field Types

Data Types

Field Properties

Validation Rules

Substitute Keycolid