Transforming Data in Mappings

Using the Anzo for Office plugin for Microsoft Excel, you can transform data to further define relationships between data elements, perform lightweight data preparation, or create sophisticated transformations. This topic provides information about creating advanced mappings and using Excel-like functions to transform data during the ETL process. This topic also describes the most commonly used mapping functions.

If the data source is a database, you can typically achieve better overall ETL pipeline performance by using schema queries to join and/or filter data rather than configuring mappings to perform those types of operations. For more information, see Performance Considerations for Database Pipelines.

For instructions on creating a new mapping, see Creating a New Mapping. For instructions on setting up parameters to ingest a subset of the source data, see Configuring Mappings to Ingest a Subset of the Source Data.

Configuring Groups, Filters, Joins, Updates, and Merges

This section provides information about configuring groups, joins, filters, references, and merges at the mapping level so that they can be used by any functions that you use to transform the source data. The table below the steps describes each of the mapping level configuration options.

To implement a mapping level configuration:

  1. Click the cell to the left of the target table name to open the menu. For example:

  2. Click the Cog icon () in the menu to open the configuration section of the mapping.

  3. In the configuration section, click the cell that contains the join, filter, group by... text in the Target column. If necessary, click the drop-down arrow next to the cell to open the Configure text box.

  4. In the Configure text box, start typing the option that you want to configure. Anzo displays the options that match the text. Select an option to add it to the cell. For example:

  5. Click the cell in the Source column that corresponds to the option you entered in the Target column. Anzo populates the Source text box with the appropriate keywords and arguments. For example:

  6. In the Source text box, click next to an argument or under a keyword and start typing column names for the columns that you want to add. Press Ctrl + to enter multiple columns. Then click the up arrow to enter the columns in the cell. For example:

The table below describes the mapping configuration options:

Option & Arguments Description
FILTER The FILTER keyword restricts the results that the mapping functions return. FILTER supports a single expression, and the expression must return a boolean value.
GROUP BY The GROUP BY clause designates data groups and is required for aggregate functions. When an aggregate function is used, the solution is first divided into the groups defined by the GROUP BY clause, and then the aggregate value is calculated for each group.
JOIN

join
element
condition

A JOIN combines rows from two tables based on related columns. You can specify joins when you map two sources to one target.
  • join: The kind of join to use. Type one of the following options:
    • inner join: Returns only the records that have matching values in both tables.
    • outer join: Returns all records from both tables when there is a match in either the left or right table.
    • left join: Returns all records from the left table and joins only the records from the right table that match the condition.
    • right join: Returns all records from the right table and joins only the records from the left table that match the condition.
  • element: One of the tables to join. The table that you specify depends on the type of join you are creating. For right joins, choose the left table. For left joins, choose the right table.
  • condition: The condition to use to join the two tables.

For example, the following join uses an inner join to join all of the records from the MovieActors1 and MovieActors2 tables when the MovieID is the same in both tables.

When including joins in mappings, do not create joins that result in multiple primary tables. Mappings with two or more primary tables are invalid. For example, the following mapping is invalid because it has two primary tables, orders and categories.

MERGE BY The MERGE BY clause enables you to merge multiple source rows into a single target row.
REFERENCE

element
condition

The REFERENCE keyword enables you to create a referential join between two tables.
UPDATE

action
key

 

Using Functions to Transform Source Data

This section provides information about how to add functions to perform operations on source data.

The Source Mapping text box that you use to select source fields also includes a list of functions that you can apply to transform the source data. When you type in the Source Mapping box, in addition to available source fields, the mapping tool also displays a list of the functions that match the text you type. For example:

To add a function, select it from the list. The mapping tool adds the function to the cell and the Source Mapping box shows the arguments for the function that you chose. For example:

In the Source Mapping text box, enter the source field name, text, or additional functions that you want to include for the arguments. Enter arguments by typing to the right of the argument name. When entering literal values, press Enter to submit the value.

For example, the DATEPARSE function below converts MovieReleaseDate values from string types to dates in dd-MM-yyyy format:

The format that you specify for dates is flexible. For example, typing the format "dd-MMM-yy" displays values such as "01-JAN-19."

Commonly Used Functions

The table below describes the functions that are commonly used in mappings. For a complete list of the supported functions, see Supported Mapping Functions.

Any time you type a literal value into a function argument, press Enter to submit the value.

Function & Arguments Description
IF

test
value if true
value if false
value if error

This function evaluates the condition in the test argument and assigns the value in value if true or value if false based on the results.
  • test: Use boolean columns or functions that return boolean: LE, LT, GE, GT, EQUALS, NOT_EQUAL, ISNULL, NOT, IN.
  • value if true: The value to output if test returns true.
  • value if false: The value to output if test returns false.
  • value if error: Cambridge Semantics recommends that you leave this argument blank.
DATEPARSE

date text
date format

This function converts a string that contains a date value (date text) to the specified date format.
  • date text: The property that contains the date value in string format.
  • date format: The format that you want the date to follow. Specify days as "d," months as "M," and years as "y." For example, "yyyy-MM-dd."

For example, the source mapping below converts the MovieReleaseDate values from strings to dates in the format "dd-MM-yyyy":

The format that you specify for dates is flexible. For example, typing the format "dd-MMM-yy" displays values such as "01-JAN-19."

DATETIMEPARSE

date text
date format

This function converts a string that contains a datetime value (date text) to the specified date format.
  • date text: The property that contains the datetime value in string format.
  • date format: The format that you want the datetime to follow. For the date, specify days as "d," months as "M," and years as "y." For the time, specify "H" for hours, "m" for minutes, and "s" for seconds. For example, "yyyy-MM-dd HH:mm:ss."

For example, the source mapping below converts the PATIENTLASTPMODATE from a string value to a datatime value in the format "MM-dd-yyyy HH:mm:ss":

UPPER

value

This function converts a string value to upper case letters.
LOWER

value

This function converts a string value to lower case letters.
REPLACEIFNULL

expression
if null expression

This function evaluates the expression. If the result is null, Anzo replaces the null with the value in if null expression.
  • expression: The source column or expression to evaluate.
  • if null expression: The expression to replace null values with. The resulting value must be the same data type as the target. For example, if mapping to a target with a double data type, "10.01" is valid but the string "missing" is not.

For example, the source mapping below replaces any null values in the PATIENTID integer column with the integer 999:

REPLACEIFNULLOREMPTY

string expression
if null or empty expression

This function evaluates the string expression. If the result is null or empty (""), Anzo replaces the empty or null with the value in if null or empty expression.
  • string expression: The source column or expression that evaluates to string.
  • if null or empty expression: The expression to replace null or empty values with. The resulting value must be a string.

For example, the source mapping below replaces any null or empty values in the GENDER column with "Not Specified":

ISNULL

expression

This function evaluates the source column values in expression and returns "true" if the value is null and "false" if it is not null. You must choose a column in the expression argument; do not type a literal value or a function.
SPLIT

string
delimiter

This function splits a string value into multiple values based on the specified delimiter.
  • string: The source column or function that evaluates to a string.
  • delimiter: The character to use to delimit the string.
SPLITARRAY

string
delimiter
index

This function splits a string value into an array based on the delimiter. From the array, the function retrieves only the portion of the value that you specify in the index.
  • string: The source column or function that evaluates to a string.
  • delimiter: The character to use to delimit the string.
  • index: An integer that specifies the portion of the array to retrieve. Indexes start at zero. The first portion of the array is 0, the second is 1, and so on. Choose an index that you know exists or the mapping becomes invalid.

For example, the following source mapping retrieves only the last four digits of social security numbers:

IN

value
set to check

This function checks whether a given value exists in a set of values (set to check). If the value exists in the set, IN returns "true." If the value does not exist in the set, IN returns "false." IN does not do comparisons on string values.
  • value: The value to look for in the set.
  • set to check: The set of values to compare the value against. After typing a character, press Enter to submit the value, then press Ctrl + to add the next value. All items in the set must be the same data type.

For example, the following source mapping checks to see if PATIENTID falls in the set of 1, 100, 1000:

MAKELIST

expression

This function maps multiple source columns to a single target property. The function does not create a list; it creates new rows, one for each column that is mapped to the target.
  • expression: The list of columns that you want to map to the target. After adding a source column press Ctrl + to select the next column.
REGEX

input
regex
replace

This function finds all patterns in the input string that match the specified regular expression (regex). It replaces the input patterns with the value in replace and returns the resulting string.
  • input: The source column or expression that evaluates to a string.
  • regex: The regular expression to use to find matches in the input string. For information about REGEX syntax, see the W3C Regular Expression Syntax specification.
  • replace: The string that should replace the input patterns that match regex.

For example, the source mapping below uses the REGEX function to search for the pattern "PS" in the COMPLAINTSTRING values and replaces each PS with a hyphen (-):

CONCATENATE

text

This function concatenates multiple string values (text) and returns a single string.
  • text: The string values to concatenate, including any delimiters that you want to use. Press Ctrl + to enter multiple values.

For example, the source mapping below concatenates PATIENTHOMESTATE and PATIENTHOMEZIP:

EQUAL

value1
value2

This function compares numeric values and returns "true" if value1 is equal to value2 and "false" if the values are not equal (value1 = value2).
  • value1: The numeric value to compare to value2.
  • value2: The numeric value to compare to value1.
NOT_EQUAL

value1
value2

This function compares numeric values and returns "true" if value1 does not equal value2 and "false" if the values are equal (value1 != value2).
  • value1: The numeric value to compare to value2.
  • value2: The numeric value to compare to value1.
GE

value1
value2

This function compares numeric values and returns "true" if value1 is greater than or equal to value2 and "false" if value1 is less than value2 (value1 >= value2).
  • value1: The numeric value to compare to value2.
  • value2: The numeric value to compare to value1.
GT

value1
value2

This function compares numeric values and returns "true" if value1 is greater than value2 and "false" if value1 is less than or equal to value2 (value1 > value2).
  • value1: The numeric value to compare to value2.
  • value2: The numeric value to compare to value1.
LE

value1
value2

This function compares numeric values and returns "true" if value1 is less than or equal to value2 and "false" if value1 is greater than value2 (value1 <= value2).
  • value1: The numeric value to compare to value2.
  • value2: The numeric value to compare to value1.
LT

value1
value2

This function compares numeric values and returns "true" if value1 is less than value2 and "false" if value1 is greater than or equal to value2 (value1 < value2).
  • value1: The numeric value to compare to value2.
  • value2: The numeric value to compare to value1.
AND

logical1
logical2

This logical function evaluates two or more logical statements (logical1, logical2) and returns "true" if all conditions are met or "false" if any condition is not met. All logical statements must evaluate to the same data type.
  • logical1: The first logical condition to evaluate. This argument needs to include a logical function that returns a boolean value, such as AND, OR, GT, GE, LE, LT, EQUAL, NOT_EQUAL, ISNULL, NOT, IN.
  • logical2: The second logical condition to evaluate. This argument also needs to include a logical function that returns a boolean value.
OR

logical1
logical2

This logical function evaluates two or more logical statements (logical1, logical2) and returns "true" if any of the conditions are met or "false" if none of the conditions are met. All logical statements must evaluate to the same data type.
  • logical1: The first logical condition to evaluate. This argument needs to include a logical function that returns a boolean value, such as AND, OR, GT, GE, LE, LT, EQUAL, NOT_EQUAL, ISNULL, NOT, IN.
  • logical2: The second logical condition to evaluate. This argument also needs to include a logical function that returns a boolean value.
NOT

logical

This logical function evaluates whether data does not meet the condition (logical) that you specify.
  • logical: The logical condition to evaluate. This argument needs to include a logical function that returns a boolean value, such as AND, OR, GT, GE, LE, LT, EQUAL, NOT_EQUAL, ISNULL, NOT, IN.
NUMERIC_ADD

v1
v2

This function adds the values of the numeric expressions that you specify (v1 + v2).
  • v1: The first addend in the equation.
  • v2: The second addend in the equation.
DIVIDE

v1
v2

This function divides the values of the numeric expressions that you specify (v1/v2).
  • v1: The divisor in the equation.
  • v2: The dividend in the equation.
MULTIPLY

v1
v2

This function multiplies the values of the numeric expressions that you specify (v1 x v2).
  • v1: The multiplicand in the equation.
  • v2: The multiplier in the equation.
NUMERIC_SUBTRACT

v1
v2

This function subtracts the values of the numeric expressions that you specify (v1 - v2).
  • v1: The minuend in the equation.
  • v2: The subtrahend in the equation.
LOOKUP

from
get
fields
values

This function enables you to look up values in a supplemental table. LOOKUP joins the from lookup table to a source table on the columns specified in the fields and values arguments. The function returns the value from get in the lookup table that corresponds to each row’s value in the values argument.
  • from: The lookup table to perform the join against.
  • get: The field or property to retrieve from the lookup table in the from argument.
  • fields: The field or fields from the lookup table to compare with the values from the primary table.
  • values: The values from the primary table to compare with the fields from the lookup table.
MAP

value
map

This function retrieves values from a map that you define. The map is a collection of key/value pairs. The function uses the specified value as a key in the map and returns the value associated with the key.
  • value: The key or keys to use to look up the value from the map.
  • map: The map to use to look up the values. You can click the map argument name to open the Edit Map dialog box and define or change a map.
URI() When specified in the Source column in mappings, this function transforms the property values to URI format by concatenating each of the components that you specify in the function. To ensure that values with spaces and other characters are encoded as valid URIs, the URI function is often used with the ENCODE function.

For example, the following mapping for a "tickit_events" table transforms an "eventname" string to URI format by prepending "http://csi.com/" to the encoded event names:

This example mapping results in triples such as:

<tickit_events> <eventname> "http://csi.com/Rolling+Stones"

You can also enter URI() in the Target column to specify that the Expression in the Source column should be the URI of the entity that is being created. For example, the mapping below generates an entity URI by prepending "http://csi.com/Flight" to the flight number value:

The example URI specification results in triples such as:

<http://csi.com/Flight1234> <FLIGHT_NUMBER> 1234
<http://csi.com/Flight1234> <ORIGIN_AIRPORT> "BOS"
Related Topics