Transforming Data in Mappings
Using the Anzo for Office plugin for Microsoft Excel, you can transform data to further define relationships between data elements, perform lightweight data preparation, or create sophisticated transformations. This topic provides information about creating advanced mappings and using Excel-like functions to transform data during the ETL process. This topic also describes the most commonly used mapping functions.
If the data source is a database, you can typically achieve better overall ETL pipeline performance by using schema queries to join and/or filter data rather than configuring mappings to perform those types of operations. For more information, see Performance Considerations for Database Pipelines.
For instructions on creating a new mapping, see Creating a New Mapping. For instructions on setting up parameters to ingest a subset of the source data, see Configuring Mappings to Ingest a Subset of the Source Data.
- Configuring Groups, Filters, Joins, Updates, and Merges
- Using Functions to Transform Source Data
- Commonly Used Functions
Configuring Groups, Filters, Joins, Updates, and Merges
This section provides information about configuring groups, joins, filters, references, and merges at the mapping level so that they can be used by any functions that you use to transform the source data. The table below the steps describes each of the mapping level configuration options.
To implement a mapping level configuration:
- Click the cell to the left of the target table name to open the menu. For example:
- Click the Cog icon () in the menu to open the configuration section of the mapping.
- In the configuration section, click the cell that contains the join, filter, group by... text in the Target column. If necessary, click the drop-down arrow next to the cell to open the Configure text box.
- In the Configure text box, start typing the option that you want to configure. Anzo displays the options that match the text. Select an option to add it to the cell. For example:
- Click the cell in the Source column that corresponds to the option you entered in the Target column. Anzo populates the Source text box with the appropriate keywords and arguments. For example:
- In the Source text box, click next to an argument or under a keyword and start typing column names for the columns that you want to add. Press Ctrl + to enter multiple columns. Then click the up arrow to enter the columns in the cell. For example:
The table below describes the mapping configuration options:
Option & Arguments | Description |
---|---|
FILTER | The FILTER keyword restricts the results that the mapping functions return. FILTER supports a single expression, and the expression must return a boolean value. |
GROUP BY | The GROUP BY clause designates data groups and is required for aggregate functions. When an aggregate function is used, the solution is first divided into the groups defined by the GROUP BY clause, and then the aggregate value is calculated for each group. |
JOIN join |
A JOIN combines rows from two tables based on related columns. You can specify joins when you map two sources to one target.
For example, the following join uses an inner join to join all of the records from the MovieActors1 and MovieActors2 tables when the MovieID is the same in both tables. When including joins in mappings, do not create joins that result in multiple primary tables. Mappings with two or more primary tables are invalid. For example, the following mapping is invalid because it has two primary tables, orders and categories. |
MERGE BY | The MERGE BY clause enables you to merge multiple source rows into a single target row. |
REFERENCE element |
The REFERENCE keyword enables you to create a referential join between two tables. |
UPDATE action |
Using Functions to Transform Source Data
This section provides information about how to add functions to perform operations on source data.
The Source Mapping text box that you use to select source fields also includes a list of functions that you can apply to transform the source data. When you type in the Source Mapping box, in addition to available source fields, the mapping tool also displays a list of the functions that match the text you type. For example:
To add a function, select it from the list. The mapping tool adds the function to the cell and the Source Mapping box shows the arguments for the function that you chose. For example:
In the Source Mapping text box, enter the source field name, text, or additional functions that you want to include for the arguments. Enter arguments by typing to the right of the argument name. When entering literal values, press Enter to submit the value.
For example, the DATEPARSE function below converts MovieReleaseDate values from string types to dates in dd-MM-yyyy format:
The format that you specify for dates is flexible. For example, typing the format "dd-MMM-yy" displays values such as "01-JAN-19."
Commonly Used Functions
The table below describes the functions that are commonly used in mappings. For a complete list of the supported functions, see Supported Mapping Functions.
Any time you type a literal value into a function argument, press Enter to submit the value.
Function & Arguments | Description |
---|---|
IF test
|
This function evaluates the condition in the test argument and assigns the value in value if true or value if false based on the results.
|
DATEPARSE date text
|
This function converts a string that contains a date value (date text) to the specified date format.
For example, the source mapping below converts the MovieReleaseDate values from strings to dates in the format "dd-MM-yyyy": The format that you specify for dates is flexible. For example, typing the format "dd-MMM-yy" displays values such as "01-JAN-19." |
DATETIMEPARSE date text
|
This function converts a string that contains a datetime value (date text) to the specified date format.
For example, the source mapping below converts the PATIENTLASTPMODATE from a string value to a datatime value in the format "MM-dd-yyyy HH:mm:ss": |
UPPER value |
This function converts a string value to upper case letters. |
LOWER value |
This function converts a string value to lower case letters. |
REPLACEIFNULL expression
|
This function evaluates the expression. If the result is null, Anzo replaces the null with the value in if null expression.
For example, the source mapping below replaces any null values in the PATIENTID integer column with the integer 999: |
REPLACEIFNULLOREMPTY string expression
|
This function evaluates the string expression. If the result is null or empty (""), Anzo replaces the empty or null with the value in if null or empty expression.
For example, the source mapping below replaces any null or empty values in the GENDER column with "Not Specified": |
ISNULL expression |
This function evaluates the source column values in expression and returns "true" if the value is null and "false" if it is not null. You must choose a column in the expression argument; do not type a literal value or a function. |
SPLIT string
|
This function splits a string value into multiple values based on the specified delimiter.
|
SPLITARRAY string
|
This function splits a string value into an array based on the delimiter. From the array, the function retrieves only the portion of the value that you specify in the index.
For example, the following source mapping retrieves only the last four digits of social security numbers: |
IN value |
This function checks whether a given value exists in a set of values (set to check). If the value exists in the set, IN returns "true." If the value does not exist in the set, IN returns "false." IN does not do comparisons on string values.
For example, the following source mapping checks to see if PATIENTID falls in the set of 1, 100, 1000: |
MAKELIST expression |
This function maps multiple source columns to a single target property. The function does not create a list; it creates new rows, one for each column that is mapped to the target.
|
REGEX input |
This function finds all patterns in the input string that match the specified regular expression (regex). It replaces the input patterns with the value in replace and returns the resulting string.
For example, the source mapping below uses the REGEX function to search for the pattern "PS" in the COMPLAINTSTRING values and replaces each PS with a hyphen (-): |
CONCATENATE text |
This function concatenates multiple string values (text) and returns a single string.
For example, the source mapping below concatenates PATIENTHOMESTATE and PATIENTHOMEZIP: |
EQUAL value1 |
This function compares numeric values and returns "true" if value1 is equal to value2 and "false" if the values are not equal (value1 = value2).
|
NOT_EQUAL value1 |
This function compares numeric values and returns "true" if value1 does not equal value2 and "false" if the values are equal (value1 != value2).
|
GE value1 |
This function compares numeric values and returns "true" if value1 is greater than or equal to value2 and "false" if value1 is less than value2 (value1 >= value2).
|
GT value1 |
This function compares numeric values and returns "true" if value1 is greater than value2 and "false" if value1 is less than or equal to value2 (value1 > value2).
|
LE value1 |
This function compares numeric values and returns "true" if value1 is less than or equal to value2 and "false" if value1 is greater than value2 (value1 <= value2).
|
LT value1 |
This function compares numeric values and returns "true" if value1 is less than value2 and "false" if value1 is greater than or equal to value2 (value1 < value2).
|
AND logical1 |
This logical function evaluates two or more logical statements (logical1, logical2) and returns "true" if all conditions are met or "false" if any condition is not met. All logical statements must evaluate to the same data type.
|
OR logical1 |
This logical function evaluates two or more logical statements (logical1, logical2) and returns "true" if any of the conditions are met or "false" if none of the conditions are met. All logical statements must evaluate to the same data type.
|
NOT logical |
This logical function evaluates whether data does not meet the condition (logical) that you specify.
|
NUMERIC_ADD v1 |
This function adds the values of the numeric expressions that you specify (v1 + v2).
|
DIVIDE v1 |
This function divides the values of the numeric expressions that you specify (v1/v2).
|
MULTIPLY v1 |
This function multiplies the values of the numeric expressions that you specify (v1 x v2).
|
NUMERIC_SUBTRACT v1 |
This function subtracts the values of the numeric expressions that you specify (v1 - v2).
|
LOOKUP from |
This function enables you to look up values in a supplemental table. LOOKUP joins the from lookup table to a source table on the columns specified in the fields and values arguments. The function returns the value from get in the lookup table that corresponds to each row’s value in the values argument.
|
MAP value |
This function retrieves values from a map that you define. The map is a collection of key/value pairs. The function uses the specified value as a key in the map and returns the value associated with the key.
|
URI() | When specified in the Source column in mappings, this function transforms the property values to URI format by concatenating each of the components that you specify in the function. To ensure that values with spaces and other characters are encoded as valid URIs, the URI function is often used with the ENCODE function. For example, the following mapping for a "tickit_events" table transforms an "eventname" string to URI format by prepending "http://csi.com/" to the encoded event names: This example mapping results in triples such as: <tickit_events> <eventname> "http://csi.com/Rolling+Stones" You can also enter URI() in the Target column to specify that the Expression in the Source column should be the URI of the entity that is being created. For example, the mapping below generates an entity URI by prepending "http://csi.com/Flight" to the flight number value: The example URI specification results in triples such as: <http://csi.com/Flight1234> <FLIGHT_NUMBER> 1234 <http://csi.com/Flight1234> <ORIGIN_AIRPORT> "BOS" |