Perform Identity Resolution in Snowflake
LiveRamp's Identity Resolution capability in Snowflake allows for the translation of various identifiers to RampIDs. This allows you to resolve personally-identifiable information (PII) or device identifiers to a persistent pseudonymous identifier for persons and households. You can also input an individual-based RampID and get back any household-based RampID that might be associated with that individual.
LiveRamp's Identity Resolution capability in Snowflake allows for the translation of various identifiers to RampIDs. This allows you to resolve personally-identifiable information (PII) or device identifiers to a persistent pseudonymous identifier for persons and households. You can also input an individual-based RampID and get back any household-based RampID that might be associated with that individual.
Caution
This content is for customers utilizing the latest version of the LiveRamp Identity Resolution and Translation native app in Snowflake (starting October 2023). For customers utilizing previous versions of LiveRamp's native apps in Snowflake (the LiveRamp Identity Resolution native app or the LiveRamp Translation native app), see this documentation.
Note
For more information about RampID, see "RampID Methodology".
Once you've translated your data to RampIDs, you can then share that data to your LiveRamp account for activation. For more information, see "Share Data from Snowflake to Your LiveRamp Account".
The following identifiers can be resolved:
Names
Postal addresses
Email addresses
Phone numbers
Cookies
MAIDs (mobile device IDs)
CTV IDs (Connected TV Device IDs)
CIDs (custom identifiers)
Person-based, maintained RampIDs (for resolution to household RampIDs)
Based on the type of identifier you’re resolving, you might receive one RampID per identifier or multiple RampIDs per identifier:
For PII touchpoints (in PII or email resolution), you can choose to receive from 1 to10 RampIDs (if available).
Typically for cookie and mobile device ID resolution, one RampID is returned, given that the devices are not normally shared.
For CTV identifiers it is common to receive multiple individual RampIDs per identifier.
For CIDs, the number of RampIDs will vary.
When resolving individual RampIDs to household RampIDs, only one household RampID is returned.
These capabilities are available within Snowflake through a native app, which creates a share to your account, opening up a view to query the reference data set from within your own Snowflake environment. See "LiveRamp Embedded Identity in Snowflake" for more information.
Performing identity resolution with the LiveRamp Identity native app requires the creation of two tables:
A metadata table, indicated in the sample SQL as
customer_meta_table_name
.An input table, indicated in the sample SQL as
customer_input_table_name
.
Overall Steps
After you've set up the LiveRamp Identity native app in Snowflake (see "Set Up the Native App" for instructions), perform the following steps to perform identity resolution:
Create the input table for the appropriate identity resolution operation.
Note
An input table needs to be prepared for each identity resolution operation and can only contain one type of identifiers.
Specify the variables to be used in the calls.
Create the metadata table for the appropriate identity resolution operation.
Note
A metadata table can be reused for multiple operations, but a separate metadata table must be prepared for each different identity resolution operation you want to perform. For example, if you’re going to perform identity resolution on both MAIDs and hashed emails, you’ll need a different metadata table for each operation.
Set up permissions for the tables to be used for identity resolution.
Perform the appropriate identity resolution process, depending on the identifiers being resolved.
See the sections below for information on performing these tasks.
Note
The LiveRamp Identity native app is parameterized and relies on variables set by the user with the sample SQL in the Execution worksheet. When executing an identity resolution operation, those variables must be set during each active session.
Create the Input Table for Identity Resolution
An input table needs to be prepared for each identity resolution operation.
When creating tables, keep the following guidelines in mind:
The column names for the input table can be whatever you want to use, as long as the names match the values specified in the metadata table.
Do not use any column names that are the same as the column names returned in the output table for the identity resolution operation you're going to run.
Every column name must be unique in a table.
Try not to use additional columns in the input tables required for the identity resolution operation as having extra columns slows down processing.
Per Snowflake guidelines, table names cannot begin with a number.
See the sections below for suggested input table columns and descriptions for each resolution type.
The output table is created by the operation that you run. For an example, see the sections in "View the Output Table" below.
Input Table Columns for PII Resolution
The PII resolution process passes the data through a privacy filter which removes the PII and reswizzles the table. Because of this, any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.
Note
This offering includes running our Identity Graph on demand. Our testing indicates that a warehouse size of 3XL that is not being used for any other operations performs best for average workloads, but this is dependent on your individual setup. Contact your LiveRamp representative to understand the performance implications of your setup.
When resolving email data only, using the email-only resolution operation can provide higher throughput compared to full PII resolution. This is dependent on warehouse and setup, so talk with your LiveRamp team to determine the best approach for the use case.
These column names cannot be used in the input table for PII resolution:
RampID
__lr_rank
__lr_filter_ name
See the table below for a list of the suggested input table columns and descriptions for PII resolution.
Suggested Column Name | Example | Description |
---|---|---|
| John | You can include separate First Name and Last Name columns or you can combine first name and last name in one column (such as “Name”). |
| Doe | You can include separate First Name and Last Name columns or you can combine first name and last name in one column (such as “Name”). |
| 123 Main St | |
| Apt 1 | You can include separate Address 1 and Address 2 columns or you can combine all street address information in one column (such as “Address”). |
| Smalltown | When matching on address, City is optional. |
| CA |
|
| 12345 |
|
| john@email.com |
|
| 555-123-4567 |
|
| For PII resolution, you can include columns with attribute data. These columns will be returned in the output table (for more information, see the "View the PII Resolution Output Table" section below). |
Input Table Columns for Email-Only Resolution
The email-only resolution process operates similarly to PII resolution. Any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.
Note
When resolving email data only, using email-only resolution can provide higher throughput compared to full PII resolution. Talk with your LiveRamp team to determine the best approach for the use case.
To perform identity resolution across additional PII touchpoints, see the “View the PII Resolution Output Table” section above.
See the table below for a list of the suggested input table columns and descriptions for email-only resolution.
Suggested Column Name | Example | Description |
---|---|---|
| 8c9775a5999b5f0088008c0b26d7fe8549d5c80b0047784996a26946abac0cef |
|
| Male | For email address resolution, you can include columns with attribute data. These columns will be returned in the output table (for more information, see the "View the Email-Only Resolution Output Table" section below). |
Input Table Columns for Device ID Resolution
The device ID resolution operation can be used for the following purposes:
To translate device identifiers (cookies, MAIDs, and CTV IDs) into individual RampIDs
To translate individual RampIDs into their associated household RampIDs
See the tables below for a list of the suggested input table columns and descriptions for these device ID resolution options.
Note
Each device ID resolution input table should contain only one identifier column (either a device identifier or a maintained RampID).
You can include columns with attribute data, but these columns will not be returned in the output table.
See the table below for a list of the suggested input table columns and descriptions for translating device identifiers.
Suggested Column Name | Example | Description |
---|---|---|
| 1f4d256c-1f08-41f6-a108-bbe511de9497 | Can be one of the following identifiers:
|
See the table below for a list of the suggested input table columns and descriptions for translating individual RampIDs into their associated household RampIDs.
Suggested Column Name | Example | Description |
---|---|---|
| XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd |
|
Input Table Columns for CID Resolution
See the table below for a list of the suggested input table columns and descriptions for CID resolution.
Note
You can include columns with attribute data, but these columns will not be returned in the output table.
Suggested Column Name | Example | Description |
---|---|---|
| b916clarib la1;blNj10gtQjQ3QUEwMTNEMTcaktboEc0g9022cxoiaklr20185 | The CID for translation to a RampID |
Specify the Variables
To specify the variables to be used for the operation:
Open the Execution Steps worksheet with the sample SQL for execution.
Update the following variables in the sample SQL shown below and then run the SQL:
DATABASE
: The name of your database.PUBLIC
: The name of the schema that holds the tables.INPUT_TABLE
: The name of the input table to use for the operation.META_TABLE
: The name of the metadata table to use for the operation.METRICS_TABLE
: The name of the metrics table to use for the operation.LOGGING_TABLE
: The name of the logging table to use for the operation.OUTPUT_TABLE
: The name of the output table that will be created after the operation has been run.IDENTITY_RESOLUTION_AND_TRANSCODING
: The name of the database the native app is loaded to.
--Update this section with the appropriate variables set customer_db_name = 'DATABASE'; set customer_schema_name = concat($customer_db_name, '.', 'PUBLIC'); set customer_input_table_name = concat($customer_schema_name, '.', 'INPUT_TABLE'); set customer_meta_table_name = concat($customer_schema_name, '.', 'META_TABLE'); set customer_metrics_table_name = concat($customer_schema_name, '.', 'METRICS_TABLE'); set customer_logging_table_name = concat($customer_schema_name, '.', 'LOGGING_TABLE'); set output_table_name = 'OUTPUT_TABLE'; -- Name of the installed application set application_name = 'IDENTITY_RESOLUTION_AND_TRANSCODING';
Create the Metadata Table
A metadata table can be reused for multiple operations, but a separate metadata table must be prepared for each different identity resolution operation you want to perform. For example, if you’re going to perform identity resolution on both MAIDs and hashed emails, you’ll need a different metadata table for each operation.
See the sections below for instructions on creating the metadata table for each identity resolution type.
Create the Metadata Table for PII Resolution
To create the metadata table for PII resolution:
Update the following variables in the sample SQL from the Execution worksheet shown below and then run the SQL:
<client_id>
: Enter either an existing client ID or a new one provided in implementation.<client_secret>
: Enter the password/secret for the client ID.<up to 4 name column names>
: Enter the names of the column(s) in the input table to be used for the “name” element. Each column name should be enclosed in double quotes. Enter a maximum of 4 name columns. If entering multiple column names, separate the column names with commas.<up to 7 address column names>
: Enter the names of the column(s) in the input table to be used for the “address” element. Each column name should be enclosed in double quotes. Enter a maximum of 7 address columns. If entering multiple column names, separate the column names with commas.<city column>
: Enter the name of the column to be used for the “city” element.<state column>
: Enter the name of the column to be used for the “state” element.<zipcode column>
: Enter the name of the column to be used for the “zipcode” element.<phone column>
: Enter the name of the column to be used for the “phone” element.<email column>
: Enter the name of the column to be used for the “email” element.
Note
You can specify the number of RampIDs to be returned for each set of PII, to a maximum of 10 RampIDs per PII set. By default, 1 RampID will be returned per PII set. To change this, replace the “1” in the “as limit” line at the bottom of the SQL sample shown below with your desired number.
-- FOR PII Update the parameters here for the metadata table -- Not all PII types need to be present, remove unused entries from the target_columns JSON create or replace table identifier($customer_meta_table_name) as select TO_VARCHAR(DECRYPT(ENCRYPT('<client_id>', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_id, TO_VARCHAR(DECRYPT(ENCRYPT('<client_secret>', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_secret, 'resolution' as execution_mode, 'pii' as execution_type, parse_json($$ { "name": ["<up to 4 name column names>"], "streetAddress": ["<up to 7 address column names>"], "city": "<city column>", "state": "<state column>", "zipCode": "<zipcode column>", "phone": "<phone column>", "email": "<email column>" } $$) as target_columns, 1 as limit;
The populated SQL with the suggested input column names might look like the example show below:
-- FOR PII Update the parameters here for the metadata table -- Not all PII types need to be present, remove unused entries from the target_columns JSON create or replace table identifier($customer_meta_table_name) as select TO_VARCHAR(DECRYPT(ENCRYPT('liveramp_client', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_id, TO_VARCHAR(DECRYPT(ENCRYPT('84159be2-ab93-4bf8-24c9-2g123ef08815', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_secret, 'resolution' as execution_mode, 'pii' as execution_type, parse_json($$ { "name": ["first_name",”last_name”], "streetAddress": ["address_1",”address_2], "city": "city", "state": "state", "zipCode": "zip", "phone": "phone", "email": "email" } $$) as target_columns, 1 as limit;
Create the Metadata Table for Email Resolution
To create the metadata table for email resolution:
Update the following variables in the sample SQL from the Execution worksheet shown below and then run the SQL:
<client_id>
: Enter either an existing client ID or a new one provided in implementation.<client_secret>
: Enter the password/secret for the client ID.<column to be resolved>
: Enter the name of the column containing the email addresses to be resolved.
Note
You can specify the number of RampIDs to be returned for each email, to a maximum of 10 RampIDs per email. By default, 1 RampID will be returned per email. To change this, replace the “1” in the “as limit” line at the bottom of the SQL sample shown below with your desired number.
-- email -- FOR EMAIL Update the parameters here for the metadata table create or replace table identifier($customer_meta_table_name) as select TO_VARCHAR(DECRYPT(ENCRYPT('<client_id>', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_id, TO_VARCHAR(DECRYPT(ENCRYPT('<client_secret>', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_secret, 'resolution' as execution_mode, 'email' as execution_type, '<column to be resolved>' as target_column, 1 as limit;
Create the Metadata Table for Device ID Resolution
To create the metadata table for device ID resolution:
Update the following variables in the sample SQL from the Execution worksheet shown below and then run the SQL:
<client_id>
: Enter either an existing client ID or a new one provided in implementation.<client_secret>
: Enter the password/secret for the client ID.<execution_type>
: Enter the execution type:Enter
Cookies
to resolve cookiesEnter
MAID
to resolve mobile device IDsEnter
CTV
to resolve CTV IDsEnter
HHLink
to resolve individual RampIDs into household RampIDs
<column to be resolved>
: Enter the name of the column containing the device identifiers (cookies, MAIDs, or CTV IDs) or RampIDs to be resolved.
--FOR DEVICE OR CID RESOLUTION Update the parameters here for the metadata table create or replace table identifier($customer_meta_table_name) as select TO_VARCHAR(DECRYPT(ENCRYPT('<client_id>', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_id, TO_VARCHAR(DECRYPT(ENCRYPT('<client_secret>', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_secret, 'resolution' as execution_mode, '<execution_type>' as execution_type, '<column to be resolved>' as target_column;
Create the Metadata Table for CID Resolution
To create the metadata table for CID resolution:
Update the following variables in the sample SQL from the Execution worksheet shown below and then run the SQL:
<client_id>
: Enter either an existing client ID or a new one provided in implementation.<client_secret>
: Enter the password/secret for the client ID.<execution_type>
: Enter the CID name provided to you by LiveRamp and specific to the mapping created.<column to be resolved>
: Enter the name of the column containing the CIDs to be resolved.
--FOR DEVICE OR CID RESOLUTION Update the parameters here for the metadata table create or replace table identifier($customer_meta_table_name) as select TO_VARCHAR(DECRYPT(ENCRYPT('<client_id>', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_id, TO_VARCHAR(DECRYPT(ENCRYPT('<client_secret>', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_secret, 'resolution' as execution_mode, '<execution_type>' as execution_type, '<column to be resolved>' as target_column;
Set Up Permissions
To set up the permissions for the tables used for translation, run the SQL in the Execution Steps worksheet shown below:
Note
This SQL utilizes the variables that were set up in the “Specify the Variables” section above.
--The remainder of the commands should be run for ALL JOB TYPES, please switch to the Native App database and schema and execute the procedure. Once completed, please run check_for_output for the output table to be written in the appropriate Job Schema grant usage on database identifier ($customer_db_name) to application identifier($application_name); grant usage on schema identifier ($customer_schema_name) to application identifier($application_name); grant select on table identifier ($customer_input_table_name) to application identifier($application_name); grant select on table identifier ($customer_meta_table_name) to application identifier($application_name); use database identifier ($application_name); use schema lr_app_schema;
Perform the Identity Resolution Operation
Once you’ve completed the previous steps, you’re ready to perform the identity resolution operation.
You perform an identity resolution operation by running the identifier resolution procedure shown below, which includes checking that the output has succeeded. You can then view the output table to check the results.
The output tables vary somewhat, depending on the type of identifiers being resolved.
To perform the identity resolution operation:
Locate the
lr_resolution_and_trancoding
procedure shown below and run that SQL.call lr_resolution_and_transcoding( $customer_input_table_name, $customer_meta_table_name, $output_table_name, $customer_logging_table_name, $customer_metrics_table_name );
The translate operation runs to completion.
Locate the
Check for output
procedure shown below and run that SQL.call check_for_output( $output_table_name );
If Snowflake returns a status message of
error
, run the procedure again until Snowflake returns a status message of "Success. Output Table: <OUTPUT_TABLENAME> is now available".Once the app returns a
success
message, the output should be displayed in the native app database underlr_app_schema
.
The results end up in the output table in the same database, with the fields shown in the appropriate section below.
View the Output Table
The identity resolution results end up in the output table in the same database you specified previously.
Once you've confirmed that the output table has been generated, see the appropriate section below for information on the output table format for the type of identity resolution operation that was run.
If for any reason you need to drop the output table, update the parameters in the following command and run:
call DROP_OUTPUT_TABLE( '<table_name>' );
View the PII Resolution Output Table
The PII resolution process passes the input table through a privacy filter which removes the PII and reswizzles the table (in addition to other operations). Because of this, any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.
Identity resolution of PII provides supplemental match metadata for additional insight into customer data that can provide powerful signals for making decisions based on RampIDs.
For PII resolution, the output table includes the fields shown in the table below.
Column | Example | Description |
---|---|---|
| XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd | Returns the resolved RampID in your domain encoding. |
| Male | Any attribute columns passed through the service are returned. |
| 1 | Provides insight on the match cascade level associated with the identifiers. If no maintained RampID is found, this value will be "null". |
| name_phone | Returns the filter name where the match occurred, which will be one of the following options:
If no maintained RampID is found, this value will be "null". |
View the Email-Only Resolution Output Table
The email-only resolution process operates similarly to PII resolution. Any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.
For email-only resolution, the results end up in the output table in the same database, with the following fields (as shown below):
RampID
(resolved email data)attributes
(based on other data passed through the service).
Column | Example | Description |
---|---|---|
| XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd | Returns the resolved RampID in your domain encoding. |
| Male | Any attribute columns passed through the service are returned. |
View the Device Identifier Resolution Output Table
For device identifier resolution, the results end up in the output table in the same database, with the following fields (as shown below):
dEVICE_id
(original input)RampID
(converted values).
Column | Example | Description |
---|---|---|
| 93abc799-a0a5-40b5-80dd-d2ab61d4d072 | |
| XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd |
View the CID Resolution Output Table
For CID resolution, the results end up in the output table in the same database, with the following fields (as shown below):
cid_id
(original input)RampID
(converted values).
Column | Sample | Description |
---|---|---|
| 93abc799-a0a5-40b5-80dd-d2ab61d4d072 | Original identifier passed in. |
| XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd | Returns the resolved RampID in your domain encoding. |
Privacy Filter
To minimize the risk of re-identification (the ability to tie PII directly to a RampID), the service includes the following processes when resolving PII identifiers (PII resolution or email-only resolution):
Column Values: The process evaluates the combination of all the column values on a per row basis for unique values. If a particular combination of column values occurs 3 or fewer times, the rows containing those column values will not be matchable and will not be returned in the output table.
>5% of the table unmatchable: If, based on column value uniqueness, >5% of the file rows are unmatchable, the job will fail.
Number of Unique RampIDs: If fewer than 100 unique RampIDs would be returned, the job will fail.
Reswizzle full table: Upon completion, the full table will be reswizzled to return the rows
RampID | attribute_1 | attribute_2 | attribute_n
in a different order than what was submitted in the input table.