Perform Known Identity Resolution in Snowflake (US Data)

Abstract

LiveRamp's Known Identity Resolution capability in Snowflake allows you to resolve personally-identifiable information (PII) to LiveRamp's Known IDs. This allows you to consolidate PII to known identifiers for a person, place, or household.

Resolution of data in the known identity space allows for any known data (PII) to sit next to the identifier for consolidation (the Known ID). This opens up flexibility for managing customer profiles, consolidating data for enterprise use cases, and unlocking a consistent identity framework that integrates with marketing and CDP workflows as well as LiveRamp’s network.

Note

For more information about Known IDs, see "LiveRamp Known IDs".

The following identifiers can be resolved:

Names
Postal addresses
Email addresses
Phone numbers
Hashed email addresses

Based on the limit you set in the metadata table, you can either receive one Known ID per row (LiveRamp’s recommendation for the “best” ID) or up to ten Known IDs per row.

These capabilities are available within Snowflake through a native app, which creates a share to your account, opening up a view to query the reference data set from within your own Snowflake environment. See "LiveRamp Embedded Identity in Snowflake" for more information.

Performing identity resolution with the LiveRamp Identity native app requires the creation of two tables:

A metadata table, indicated in the sample SQL as customer_meta_table_name.
An input table, indicated in the sample SQL as customer_input_table_name.

Overall Steps

After you've set up the LiveRamp Identity native app in Snowflake (see "Set Up the LiveRamp Native App in Snowflake" for instructions), perform the following steps to perform known identity resolution:

Create the input table(s) for the appropriate identity resolution operation.
Specify the variables to be used in the calls.
Create the metadata table for the appropriate identity resolution operation.
Note
If resolving the same dataset for both known and pseudonymous resolution, note that, although the same input table can be used, the client credentials will differ between the two operations so a different metadata table will need to be used.
Set up permissions for the tables to be used for identity resolution.
Perform the appropriate identity resolution process, depending on the identifiers being resolved.
View the output table.

See the sections below for information on performing these tasks.

Note

The LiveRamp Identity native app is parameterized and relies on variables set by the user with the sample SQL in the Execution worksheet. When executing an identity resolution operation, those variables must be set during each active session.

Create the Input Table for Known Identity Resolution

An input table needs to be prepared for each known identity resolution operation.

When creating tables, keep the following guidelines in mind:

The column names for the input table can be whatever you want to use, as long as the names match the values specified in the metadata table.
Do not include both PII and hashed email addresses in the same input table. A separate input file per job type must be created.
All identifiers to be resolved should be included in the input table.
Do not use any column names that are the same as the column names returned in the output table for the identity resolution operation you're going to run.
Every column name must be unique in a table.
Try not to use additional columns in the input tables required for the identity resolution operation as having extra columns slows down processing.
Per Snowflake guidelines, table names cannot begin with a number.

See the sections below for suggested input table columns and descriptions for each resolution type.

The output table is created by the operation that you run. For an example, see the sections in "View the Output Table" below.

Input Table Columns for PII Resolution

Managing data in the known identity space does not require the same privacy protections as the pseudonymous RampID identity space and, as noted in prior sections, all attribute data included in the input table will pass through to the output table.

Note

This offering includes running our Identity Graph on demand. Our testing indicates that a warehouse size of 2XL that is not being used for any other operations performs best for average workloads, but this is dependent on your individual setup. Contact your LiveRamp representative to understand the performance implications of your setup.

These column names cannot be used in the input table for PII resolution:

knownId
__lr_rank
__lr_filter_ name
__lr_metadata

See the table below for a list of the suggested input table columns and descriptions for PII resolution.

Suggested Column Name	Example	Description
`first_name`	John	You can include separate first name and last name columns or you can combine first name and last name in one column (such as “name”).
`last_name`	Doe	You can include separate first name and last name columns or you can combine first name and last name in one column (such as “name”).
`address_1`	123 Main St	You can include separate address 1 and address 2 columns or you can combine all street address information in one column (such as “address”).
`address_2`	Apt 1	Include values in this column if you have additional street address info for a given row. You can include separate address 1 and address 2 columns or you can combine all street address information in one column (such as “address”).
`city`	Smalltown	When matching on address, `city` is optional.
`state`	CA	When matching on address, `state` is optional. If including State, must be a two-character, capitalized abbreviation ("CA", not "California" or "Ca").
`zip`	12345	Required when matching on addresses. Can be in 5-digit format or 9-digit format (ZIP+4).
`email`	john@email.com	Plaintext emails only. Only one plaintext email per input row is permitted. Other emails must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates. All emails must meet these requirements: Have characters before and after the "@" sign Contain a period character (".") Have characters after the period character Examples of valid emails include: a@a.com A@A.COM email@account.com EMAIL@ACCOUNT.COM email@sub.domain.com EMAIL@SUB.DOMAIN.COM
`phone`	555-123-4567	Plaintext phone numbers only. Only one phone number per input row is permitted. Other phone numbers must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates. All phone numbers must meet these requirements: Can be more than 10 characters if leading numbers over 10 characters are “0” or “1” If no leading numbers are used, must be 10 characters long Can contain hyphens ("-"), parentheses ("(" or ")"), plus signs ("+"), and periods (".") Examples of valid phone numbers include: 8668533267 866.853.3267 (866) 853-3267 8668533267 +1 (866) 853-3267 +18668533267 18668533267 1111111118668533267 08668533267 Examples of invalid phone numbers include: 987654321 (fewer than 10 characters) 98765432109 (more than 10 characters) 1234567890 (after removing the leading "1", less than 10 characters remain) 0987654321 (after removing the leading "0", less than 10 characters remain)
`attribute_1`		For PII resolution, any attribute data passed in will be included in the output table. Per LiveRamp guidelines, online behavioral data must be kept in the pseudonymous identity space, resolved via RampID.

Input Table Columns for Email-Only Resolution

The standard email-only resolution process operates similarly to PII resolution.

See the table below for a list of the suggested input table columns and descriptions for email-only resolution.

Suggested Column Name	Example	Description
`hashed_email`	8c9775a5999b5f0088008c0b26d7fe8549d5c80b0047784996a26946abac0cef	SHA-256, MD5, and SHA-1 hashed emails accepted. Email addresses should be lowercased and UTF-8 encoded prior to hashing. After hashing, convert the resulting hash into lowercase hexadecimal representation. For an example of hashing in Snowflake, see the "Snowflake Hashing Example" section below.
`attribute_1`	Male	For email address resolution, any attribute data passed in will be included in the output table. Per LiveRamp guidelines, online behavioral data must be kept in the pseudonymous identity space, resolved via RampID.

Snowflake Hashing Example

The following code snippet shows an example of hashing emails in Snowflake:

-- SHA-1
SELECT SHA1(TRIM(LOWER('  LiveRamp@example.com  '))); 
-- Expected result: 91ac4ee2ca1782581f12d865a6779eb179f8b22a

-- MD5
SELECT MD5(TRIM(LOWER('  LiveRamp@example.com  ')));
-- Expected result: 39c324aa0c7a3ee896884fe0cf086f0c

-- SHA256
SELECT SHA2(TRIM(LOWER('  LiveRamp@example.com  '))); 
-- Expected result: 28324c709525ec8eda8aac51dfb36730262bc3051402250131c4c81fa453df8c

Specify the Variables

To specify the variables to be used for the operation:

Open the Execution Steps worksheet with the sample SQL for execution.
Update the following variables in the sample SQL that is shown below and then run the SQL:
- DATABASE: The name of your database.
- PUBLIC: The name of the schema that holds the tables.
- INPUT_TABLE: The name of the input table(s) to use for the operation.
- META_TABLE: The name of the metadata table to use for the operation.
- OUTPUT_TABLE: The name of the output table that will be created after the operation has been run.
- IDENTITY_RESOLUTION_AND_TRANSCODING: The name of the database the native app is loaded to.

--Update this section with the appropriate variables
set customer_db_name = 'DATABASE';
set customer_schema_name = concat($customer_db_name, '.', 'PUBLIC');
set customer_input_table_name = concat($customer_schema_name, '.', 'INPUT_TABLE');
-- If there are multiple input tables add additional variables
set customer_input_table_name_2 = concat($customer_schema_name, '.', 'INPUT_TABLE_2');
set customer_meta_table_name = concat($customer_schema_name, '.', 'META_TABLE');
set output_table_name = 'OUTPUT_TABLE';

-- Name of the installed application
set application_name = 'IDENTITY_RESOLUTION_AND_TRANSCODING';

Create the Metadata Table

A metadata table can be reused for multiple operations of the same execution type (such as “pii”), but a separate metadata table must be prepared for each different identity resolution operation per identity space (“Known ID” vs. “RampID”). So even though the same input table can be used in both identity spaces, the credentials differ between identity spaces, so different metadata tables are required.

For known resolution, multiple identifiers and metadata options are available and can be configured in the metadata table. If no variables are configured, the default option will be returned, which includes identifiers and metadata as follows:

Known ID: The person-based identifier, including both maintained and derived identifier types.
Metadata:
- __lr_filter_name returns the filter name where the match occurred, which will be one of the following options:
  - name_address_zip
  - name_email
  - name_phone
  - partial_name_email
  - partial_name_phone
  - strict_name (name + zip)
  - email
  - phone
  - last_name_address
- __lr_rank provides insight on the match cascade level associated with the identifiers. If no maintained Known ID is found, this value will be "null".
Optional identifiers that can be configured to be returned in the output:
- Household ID: The household-based identifier associated with maintained entities. This represents the household grouping of individuals that live and move together.
- Place ID: A place identifier associated with an address entity in our graph.
- ConsumerLink: Another person-based identifier. For this deployment, only maintained ConsumerLinks can be returned.

Beyond configuring the identifier type, match metadata can also be configured. See the sections below for metadata identity configurability options.

See the sections below for instructions on creating the metadata table.

Best Contact Metadata

This set of optional metadata provides flags that can indicate whether a particular postal address, phone number, or email address is the primary one for that individual. Best Contact flags are Boolean operators (TRUE or FALSE) that indicate if the address, email, or phone that was sent matches the one that the Known Identity Graph has determined is the best one for that user.

Best contact flag use cases include:

Determine which record to designate as the primary one when making consolidation decisions.
Target campaigns to gather better contact data for existing customers.
Coordinate multiple touchpoint campaigns.

If a record is used that includes all of the best touchpoints, we will return "true" for all three flags. These flags will be returned in a _lr_metadata column.

The "clickVerifyDate" flag is a signal that is used in designating the best email address for a user that could have other uses in building data assets or scoring email address data. Because this signal relies on specific deterministic signals from a small subset of LiveRamp match data contributors, it appears on a subset of "Best Email" addresses and should not be used as the only filter for understanding active email addresses.

These flags are only available for Known ID person-based identifiers.

Best contact flags include the information shown in the table below:

Flag Name	Description	Example Value
isBestPostalTouchpoint	This flag will return "TRUE" if the address used for making a match is the best one for that consumer in the graph.	TRUE
isBestPhoneTouchpoint	This flag will return "TRUE" if the phone number used for making a match is the best one for that consumer in the graph.	TRUE
isBestEmailTouchpoint	This flag will return "TRUE" if the email address used for making a match is the best one for that consumer in the graph.	TRUE
clickVerifyDate	This flag indicates that a source in LiveRamp’s match network has verified a click on a link within a user’s email address.	TRUE

Create the Metadata Table for PII Resolution

To create the metadata table for PII resolution:

Update the following variables in the sample SQL from the Execution worksheet shown below and then run the SQL:
- <client_id>: Enter either an existing client ID or a new one provided in implementation.
- <client_secret>: Enter the password/secret for the client ID.
- Default output identifier:
  - knownId: A person-based identifier, used for consolidating fragmented consumer touchpoints
- Optional output identifiers:
  - householdId: The household representation of individuals that live and move together
  - placeId: The identifier that represents a place or address within the LiveRamp Known Identity Graph
  - consumerLink: An additional person-based identifier (only maintained identifiers available)
- Default output metadata:
  - matchLevel
  - __lr_rank: Provides insight on the match cascade level associated with the identifiers. If no maintained Known ID is found, this value will be "null".
- Optional output metadata:
  - rank: Provides insight on the match cascade level associated with the identifiers. If no maintained Known ID is found, this value will be "null".
  - clickVerifyDate: This flag indicates that a source in LiveRamp’s match network has verified a click on a link within a user’s email address.
  - isBestEmailTouchpoint: This flag will return "TRUE" if the email address used for making a match is the best one for that consumer in the graph.
  - isBestPhoneTouchpoint: This flag will return "TRUE" if the phone number used for making a match is the best one for that consumer in the graph.
  - isBestPostalTouchpoint: This flag will return "TRUE" if the postal address used for making a match is the best one for that consumer in the graph.
  - distinctMatch: The distinct match flag for person and place documents is only equal to true when there is only one distinct document ID for a given match cascade step.
- <targetColumns>:
  - <city column>: Enter the name of the column to be used for the "city" element.
  - <state column>: Enter the name of the column to be used for the "state" element.
  - <zipcode column>: Enter the name of the column to be used for the "zipcode" element.
  - <phone column>: Enter the name of the column to be used for the "phone" element.
  - <email column>: Enter the name of the column to be used for the "email" element.

Note

You can specify the number of Known IDs to be returned for each set of PII, to a maximum of 10 Known IDs per PII set. By default, 1 Known ID will be returned per PII set. To change this, replace the “1” in the “as limit_val” line at the bottom of the SQL sample shown below with your desired number.

create or replace table identifier($customer_meta_table_name) as
select
{
    'client_id': TO_VARCHAR(DECRYPT(ENCRYPT('liveramp_client', 'HideFromLogs'), 'HideFromLogs'), 'utf-8'),
    'client_secret': TO_VARCHAR(DECRYPT(ENCRYPT('84159be2-ab93-4bf8-24c9-2g123ef08815', 'HideFromLogs'), 'HideFromLogs'), 'utf-8'),
    'executionMode': 'resolution',
    'executionType': 'pii',
    'inputTable': $customer_input_table_name,
 'targetColumns': {
    'name': ['FIRSTNAME', 'LASTNAME'],
    'streetAddress': ['ADDRESSLINE', 'ADDRESSLINE2'],
    'city': 'CITY',
    'state': 'STATE',
    'zipCode': 'ZIPCODE',
    'phone': 'PHONE',
    'email': 'EMAIL'
  },
'outputTable': $output_table_name
    'limit': 1,
    'outputIdentifiers': [
	    'knownId',
	    'householdLink',
	    'placeID'
    ],
    'outputMetadata':[
	    'matchLevel',
	    'rank',
	    'isBestEmailTouchpoint'
    ]
}::variant as config;

Example of PII Resolution Metadata Table SQL

The populated SQL with the suggested input column names might look like the example shown below:

create or replace table identifier($customer_meta_table_name) as
select
{
    'client_id': TO_VARCHAR(DECRYPT(ENCRYPT('liveramp_client', 'HideFromLogs'), 'HideFromLogs'), 'utf-8'),
    'client_secret': TO_VARCHAR(DECRYPT(ENCRYPT('84159be2-ab93-4bf8-24c9-2g123ef08815', 'HideFromLogs'), 'HideFromLogs'), 'utf-8'),
    'executionMode': 'resolution',
    'executionType': 'pii',
    'inputTable': $customer_input_table_name,
    'limit': 1,
    'outputIdentifiers': [
	    'knownId',
	    'householdLink',
	    'placeID'
    ],
    'outputMetadata':[
	    'matchLevel',
	    'rank',
	    'isBestEmailTouchpoint'
    ],
    'outputTable': $output_table_name,
    'targetColumns': {
      'name': ['<up to 4 name column names>'],
      'streetAddress': ['<up to 7 address column names>'],
      'city': '<city column>',
      'state': '<state column>',
      'zipCode': '<zipcode column>',
      'phone': '<phone column>',
      'email': '<email column>'
    }
}::variant as config;

Create the Metadata Table for Email Resolution

To create the metadata table for email resolution:

Update the following variables in the sample SQL from the Execution worksheet shown below and then run the SQL:
- <client_id>: Enter either an existing client ID or a new one provided in implementation.
- <client_secret>: Enter the password/secret for the client ID.
- <column to be resolved>: Enter the name of the column containing the email addresses to be resolved.

Note

You can specify the number of Known IDs to be returned for each email, to a maximum of 10 Known IDs per email. By default, 1 Known ID will be returned per email. To change this, replace the "1" in the "as limit_val" line at the bottom of the SQL sample shown below with your desired number.

-- email
-- FOR EMAIL Update the parameters here for the metadata table
create or replace table identifier($customer_meta_table_name) as
select
    TO_VARCHAR(DECRYPT(ENCRYPT('<client_id>', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_id,
    TO_VARCHAR(DECRYPT(ENCRYPT('<client_secret>', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_secret,
    ‘executionMode': 'resolution',
        'executionType': 'email'
'inputTable': $customer_input_table_name,
 'targetColumn': '<column to be resolved>'
   'outputTable': $output_table_name,
'limit': 1,
'outputIdentifiers':[
	'knownId', 
	'householdLink', 
	'placeId'
],
'outputMetadata':[
	'matchLevel', 
	'rank', 
	'isBestEmailTouchpoint'
]
}::variant as config;

Set Up Permissions

To set up the permissions for the tables used for translation, run the SQL in the Execution Steps worksheet shown below:

Note

This SQL utilizes the variables that were set up in the “Specify the Variables” section above.

--The remainder of the commands should be run for ALL JOB TYPES, please switch to the Native App database and schema and execute the procedure.  Once completed, please run check_for_output for the output table to be written in the appropriate Job Schema

grant usage on database identifier ($customer_db_name) to application identifier($application_name);
grant usage on schema identifier ($customer_schema_name) to application identifier($application_name);
grant select on table identifier ($customer_input_table_name) to application identifier($application_name);
grant select on table identifier ($customer_meta_table_name) to application identifier($application_name);

use database identifier ($application_name);
use schema lr_app_schema;

Perform the Identity Resolution Operation

Once you’ve completed the previous steps, you’re ready to perform the identity resolution operation.

You perform an identity resolution operation by running the identifier resolution procedure shown below. You can then view the output table to check the results.

To perform the identity resolution operation:

Locate the lr_resolution_and_trancoding procedure shown below and run that SQL:
```
call lr_resolution_and_transcoding(
	$customer_meta_table_name
);
```

Once the app returns a success message, the output should be displayed in the native app database under lr_app_schema.

If Snowflake returns a status message of Error, check the error message for any information to help you fix the issue and then try running the operation again. For some issues, the error message will direct you to contact LiveRamp Support. For more information, see "Snowflake Operation Error Codes".

The results end up in the output table in the same database, with the fields shown in the appropriate section below.

View the Output Table

The identity resolution results end up in the output table in the dataset specified during the procedure call.

Once you've confirmed that the output table has been generated, see the appropriate section below for information on the output table format for the type of identity resolution operation that was run.

If for any reason you need to drop the output table, update the parameters in the following command and run:

call DROP_OUTPUT_TABLE(
    '<table_name>'
);

View the PII Resolution Output Table

For known resolution, the identifiers and metadata returned is configurable, as outlined in the "Create the Metadata Table" section above. Depending on the values included, the output table will vary based on the following:

Identifiers included. The default configuration is person-based Known ID, however other identifier types can be output including: Household ID, Place ID, and maintained ConsumerLinks.
Note
Both maintained and derived person-based Known IDs can be returned. Household IDs, Place IDs, and ConsumerLinks are only available for maintained entities.
Metadata included. Supplemental match metadata is included for additional insight into the linkage. This includes information on the match cascade level and filter where the match occurred. Additional metadata bundles are available for maintained Known IDs.

For PII resolution, the default output table includes the columns shown in the table below.

Column	Example	Description
`knownId`	T32100US00ySyxMl2h0ypHBXEymO2-1wl1vYkw	Returns the resolved person-based Known ID in your domain encoding.
`attribute_1`	Male	Any attribute columns passed through the service are returned.
`__lr_rank`	1	Provides insight on the match cascade level associated with the identifiers. If no maintained Known ID is found, this value will be "null".
`__lr_filter_name`	name_phone	Returns the filter name where the match occurred, which will be one of the following options: `name_address_zip` `name_email` `name_phone` `partial_name_email` `partial_name_phone` `strict_name` (name + zip) `email` `phone` `last_name_address` If no maintained Known ID is found, this value will be "null".

For PII resolution, additional configurability options (such as including additional identifiers and including additional metadata) could produce an output table that includes the following:

Column	Example	Description
`knownId`	T32100US00ySyxMl2h0ypHBXEymO2-1wl1vYkw	Returns the resolved person-based Known ID in your domain encoding.
`householdId`	T32100US031Kdcb5EcDgnly95h9ZMwKbl-TPYv	Returns the resolved household-based ID in your domain encoding.
`placeId`	T32100US02Das0oElIhVTaSQvxnnRauu3s2RYI	Returns the resolved place-based ID in your domain encoding.
`attribute_1`	Male	Any attribute columns passed through the service are returned.
`__lr_metadata`	{“rank”: “1”, "matchLevel": “name_address_zip”, “isbestemailtouchpoint”: “true”}	Provides output for the metadata configured. In this example: “rank” provides insight on the match cascade level associated with the identifiers for maintained Known IDs. “matchLevel” returns the filter name where the match occurred. `name_address_zip` `name_email` `name_phone` `partial_name_email` `partial_name_phone` `strict_name` (name + zip) `email` `phone` `last_name_address` “IsBestEmailTouchpoint” returns "TRUE" if the email address used for making a match is the best one for that consumer in the graph.

View the Email-Only Resolution Output Table

The email-only resolution process operates similarly to PII resolution. For known resolution, the identifiers and metadata returned is configurable, as outlined in the "Create the Metadata Table" section above. Depending on the values included, the output table will vary based on:

Identifiers included. The default configuration is person-based Known ID, however other identifier types can be output including: Household ID, Place ID, and maintained ConsumerLinks.
Note
Both maintained and derived person-based Known IDs can be returned. Household IDs, Place IDs, and ConsumerLinks are only available for maintained entities.
Metadata included. Supplemental match metadata is included for additional insight into the linkage. This includes information on the match cascade level and filter where the match occurred. Additional metadata bundles are available for maintained Known IDs.

For email-only resolution, the output table includes the columns shown in the table below.

Column	Example	Description
`knownId`	T32100US00ySyxMl2h0ypHBXEymO2-1wl1vYkw	Returns the resolved person-based Known ID associated with the email address in your domain encoding. If multiple Known IDs are associated with an email address and the `limit` parameter is set to greater than “1”, multiple lines will be created in the output file.
`attribute_1`	Male	Any attribute columns passed through the service are returned.

In this section:

Known IDsKnown IDs

Perform Known Identity Resolution in Snowflake (US Data)

Note

Overall Steps

Note

Note

Create the Input Table for Known Identity Resolution

Input Table Columns for PII Resolution

Note

Input Table Columns for Email-Only Resolution

Snowflake Hashing Example

Specify the Variables

Create the Metadata Table

Best Contact Metadata

Create the Metadata Table for PII Resolution

Note

Example of PII Resolution Metadata Table SQL

Create the Metadata Table for Email Resolution

Note

Set Up Permissions

Note

Perform the Identity Resolution Operation

View the Output Table

View the PII Resolution Output Table

Note

View the Email-Only Resolution Output Table

Note

Search results