
Perform Identity Resolution in Snowflake (Non-US Data)

Abstract

LiveRamp's Identity Resolution capability in Snowflake translates various identifiers to RampIDs. This allows you to resolve personally identifiable information (PII) or device identifiers to a persistent pseudonymous identifier for persons and households. You can also input an individual-based RampID and get back any household-based RampID that might be associated with that individual.

Note

The following identifiers can be resolved:

  • Names

  • Postal codes

  • Email addresses

  • Phone numbers

These capabilities are available in Snowflake through a native app. The app creates a share to your account that exposes a view, so you can query the reference data set from within your own Snowflake environment. See "LiveRamp Embedded Identity in Snowflake" for more information.

Performing identity resolution with the LiveRamp Identity native app requires the creation of two tables:

  • A metadata table, indicated in the sample SQL as customer_meta_table_name.

  • An input table, indicated in the sample SQL as customer_input_table_name.

Overall Steps

After you've set up the LiveRamp Identity native app in Snowflake (see "Set Up the LiveRamp Native App in Snowflake" for instructions), complete the following steps to perform identity resolution:

  1. Create the input table(s) for the appropriate identity resolution operation.

    Note

    An input table needs to be prepared for each identity resolution operation and can only contain one type of identifier.

  2. Specify the variables to be used in the calls.

  3. Create the metadata table for the appropriate identity resolution operation.

    Note

    A metadata table can be reused for multiple operations, but a separate metadata table must be prepared for each different job type you want to perform. For example, if you’re going to perform identity resolution on PII and on hashed emails, you’ll need a different metadata table for each operation.

  4. Set up permissions for the tables to be used for identity resolution.

  5. Perform the appropriate identity resolution process, depending on the identifiers being resolved.

  6. View the output table.

See the sections below for information on performing these tasks.

Note

The LiveRamp Identity native app is parameterized and relies on variables that you set with the sample SQL in the Execution worksheet. These are Snowflake session variables, so you must set them again in each new session before running an identity resolution operation.

Create the Input Table for Identity Resolution

An input table needs to be prepared for each identity resolution operation.

When creating tables, keep the following guidelines in mind:

  • The column names for the input table can be whatever you want to use, as long as the names match the values specified in the metadata table.

  • Do not use any column names that are the same as the column names returned in the output table for the identity resolution operation you're going to run.

  • Every column name must be unique in a table.

  • Include only the columns you need (the identifiers plus any attributes you want returned), because extra columns slow down processing.

  • Per Snowflake guidelines, table names cannot begin with a number.
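If you build input tables programmatically, a quick pre-flight check can catch guideline violations before you load data. The following Python sketch is a hypothetical helper (check_input_table is not part of the LiveRamp app) that checks two of the guidelines above: the table name must not begin with a number, and column names must be unique. It assumes unquoted Snowflake identifiers, which are case-insensitive.

```python
from typing import List


def check_input_table(table_name: str, columns: List[str]) -> List[str]:
    """Return a list of guideline problems; an empty list means none found.

    Hypothetical pre-flight helper, not part of the LiveRamp app.
    """
    problems = []
    # Guideline: table names cannot begin with a number.
    if table_name[:1].isdigit():
        problems.append(f"table name {table_name!r} must not begin with a number")
    # Guideline: every column name must be unique. Unquoted Snowflake
    # identifiers are case-insensitive, so compare names case-insensitively.
    seen = set()
    for col in columns:
        key = col.upper()
        if key in seen:
            problems.append(f"duplicate column name {col!r}")
        seen.add(key)
    return problems


print(check_input_table("2024_customers", ["email", "EMAIL"]))
```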

See the sections below for suggested input table columns and descriptions for each resolution type.

The output table is created by the operation that you run. For an example, see the sections in "View the Output Table" below.

Input Table Columns for PII Resolution

The standard PII resolution process passes the data through a privacy filter, which removes the PII and reswizzles (reorders) the table. Because of this, any attributes you need to keep associated with the identifier must be included in the input table. For more information, see the "Privacy Filter" section below.

Note

Utilizing hashed attributes requires a LiveRamp Data Ethics review and an attestation. We will also work with your team to confirm the separation of known and pseudonymous data prior to enabling permissions.

These column names cannot be used in the input table for PII resolution:

  • RampID

  • __lr_rank

  • __lr_filter_name
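As a sketch of how these reserved names could be checked before loading data, the hypothetical Python helper below (find_pii_column_conflicts is illustrative, not part of the app) flags input columns that reuse the output column names listed above. It also applies the rule from the attribute_1 row in the table below: a column you plan to hash is returned as hashed_<name>, so the input table must not already contain that name. Names are compared case-insensitively as a conservative assumption.

```python
from typing import Iterable, List

# Output column names for PII resolution (listed above); the input table
# must not reuse them. Compared case-insensitively as a conservative choice.
RESERVED_OUTPUT_COLUMNS = {"RAMPID", "__LR_RANK", "__LR_FILTER_NAME"}


def find_pii_column_conflicts(columns: List[str],
                              hashed_attributes: Iterable[str] = ()) -> List[str]:
    """Return names that would collide with columns in the output table.

    hashed_attributes: input columns you plan to list under
    "hashedAttributes"; each is returned as hashed_<name>, so the input
    table must not already contain a column with that name.
    """
    existing = {c.upper() for c in columns}
    conflicts = [c for c in columns if c.upper() in RESERVED_OUTPUT_COLUMNS]
    for attr in hashed_attributes:
        hashed_name = f"hashed_{attr}"
        if hashed_name.upper() in existing:
            conflicts.append(hashed_name)
    return conflicts


print(find_pii_column_conflicts(["first_name", "cid", "hashed_cid"], ["cid"]))
```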

See the table below for a list of the suggested input table columns and descriptions for PII resolution.

Suggested Column Name

Example

Description

first_name

John

You can include separate First Name and Last Name columns or you can combine first name and last name in one column (such as "Name").

last_name

Doe

You can include separate First Name and Last Name columns or you can combine first name and last name in one column (such as "Name").

zip 

110001

Do not include special characters or dashes.

email

john@email.com

  • Plaintext emails only.

  • Only one email per input row is permitted. Other emails must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates.

  • All emails must meet these requirements:

    • Have characters before and after the "@" sign

    • Contain a period character (".")

    • Have characters after the period character

  • Examples of valid emails include:

    • a@a.com

    • A@A.COM

    • email@account.com

    • EMAIL@ACCOUNT.COM

    • email@sub.domain.com

    • EMAIL@SUB.DOMAIN.COM

phone

+442012345678

  • Plaintext phone numbers only.

  • Only one phone number per input row is permitted. Other phone numbers must be dropped or included in an additional row.

  • Follow the ITU-T E.164 format for phone numbers shown below:

    • The structure should be “+[Country Code][Area Code][Subscriber Number]”

    • Maximum length is 15 digits.

    • The number should not contain spaces, parentheses, or dashes. It should only include the plus sign, country code, area code, and subscriber number.

    • The plus sign is used as a prefix to indicate an international number and replaces the international call prefix.

    • Country code (CC) is the 1 to 3 digit code assigned to each country.

    • The National Destination Code (NDC or “area code”) identifies a specific area or region within the country.

    • The subscriber number (SN) is the individual's unique phone number. 

attribute_1

  • For PII resolution, you can include columns with attribute data. These columns will be returned in the output table (for more information, see the "View the PII Resolution Output Table" section below).

  • If you specify that an attribute column should be hashed, it will appear in the output table with a prefix of "hashed_". The input table must not include a column with the same name as the name of the hashed column in the output table.
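The email and phone rules in the table above can be applied before loading the input table. The Python sketch below is a hypothetical pre-processing step, not LiveRamp's own validation: is_valid_email checks the listed email requirements (assuming the period is looked for after the "@"), normalize_phone strips spaces, parentheses, and dashes and then checks the E.164 shape, and one_email_per_row splits a record with several emails into one row per email while repeating the name fields. The "email" key stands in for whatever column name you choose.

```python
import re
from typing import Dict, List, Optional

# After cleanup: a plus sign, then digits only. E.164 allows at most 15
# digits, and country codes never begin with 0.
E164_PATTERN = re.compile(r"\+[1-9]\d{0,14}")


def is_valid_email(email: str) -> bool:
    """Check the documented rules: characters before and after "@",
    a period, and characters after the period.

    Assumption: the period is looked for in the part after the last "@".
    """
    local, at, domain = email.rpartition("@")
    if not (at and local and domain):
        return False
    _, dot, tail = domain.rpartition(".")
    return bool(dot and tail)


def normalize_phone(raw: str) -> Optional[str]:
    """Remove spaces, parentheses, and dashes; return the result if it is
    in +[country code][area code][subscriber number] form, else None."""
    candidate = re.sub(r"[\s()-]", "", raw)
    return candidate if E164_PATTERN.fullmatch(candidate) else None


def one_email_per_row(row: Dict[str, str], emails: List[str]) -> List[Dict[str, str]]:
    """Split a record with several emails into one row per email,
    repeating the other fields (such as the names) on every row."""
    return [{**row, "email": e} for e in emails]


print(normalize_phone("+44 (20) 1234-5678"))
```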

Input Table Columns for Email-Only Resolution

The standard email-only resolution process operates similarly to PII resolution. Any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

Note

To perform identity resolution across additional PII touchpoints, see the "Input Table Columns for PII Resolution" section above.

See the table below for a list of the suggested input table columns and descriptions for email-only resolution.

Suggested Column Name

Example

Description

hashed_email

8c9775a5999b5f0088008c0b26d7fe8549d5c80b0047784996a26946abac0cef

  • SHA-256, MD5, and SHA-1 hashed emails accepted.

  • Email addresses should be trimmed of leading and trailing whitespace, lowercased, and UTF-8 encoded prior to hashing.

  • After hashing, convert the resulting hash into lowercase hexadecimal representation.

  • For an example of hashing in Snowflake, see the "Snowflake Hashing Example" section below.

attribute_1

Male

For email address resolution, you can include columns with attribute data. These columns will be returned in the output table (for more information, see the "View the Email-Only Resolution Output Table" section below).

Snowflake Hashing Example

The following code snippet shows an example of hashing emails in Snowflake:

-- SHA-1
SELECT SHA1(TRIM(LOWER('  LiveRamp@example.com  '))); 
-- Expected result: 91ac4ee2ca1782581f12d865a6779eb179f8b22a

-- MD5
SELECT MD5(TRIM(LOWER('  LiveRamp@example.com  ')));
-- Expected result: 39c324aa0c7a3ee896884fe0cf086f0c

-- SHA-256 (256 is the default digest size for SHA2, shown explicitly)
SELECT SHA2(TRIM(LOWER('  LiveRamp@example.com  ')), 256);
-- Expected result: 28324c709525ec8eda8aac51dfb36730262bc3051402250131c4c81fa453df8c
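If you prefer to hash emails before loading them into Snowflake, the same preparation can be done in Python with the standard hashlib module. This is a minimal sketch: it trims, lowercases, UTF-8 encodes, and returns lowercase hexadecimal, so for the same input it should agree with the Snowflake functions above. Note that str.strip() removes all leading and trailing whitespace, which may be broader than the TRIM default in Snowflake.

```python
import hashlib


def normalize_email(email: str) -> bytes:
    # Mirror TRIM(LOWER(...)): trim surrounding whitespace, lowercase,
    # then UTF-8 encode before hashing.
    return email.strip().lower().encode("utf-8")


def hash_email(email: str, algorithm: str = "sha256") -> str:
    # algorithm is "sha256", "md5", or "sha1"; hexdigest() returns
    # lowercase hexadecimal.
    return hashlib.new(algorithm, normalize_email(email)).hexdigest()


for algo in ("sha1", "md5", "sha256"):
    print(algo, hash_email("  LiveRamp@example.com  ", algo))
```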

Specify the Variables

To specify the variables to be used for the operation:

  1. Open the Execution Steps worksheet with the sample SQL for execution.

  2. Update the following variables in the sample SQL that is shown below and then run the SQL:

    • DATABASE: The name of your database.

    • PUBLIC: The name of the schema that holds the tables.

    • INPUT_TABLE: The name of the input table(s) to use for the operation.

    • META_TABLE: The name of the metadata table to use for the operation.

    • OUTPUT_TABLE: The name of the output table that will be created after the operation has been run.

    • IDENTITY_RESOLUTION_AND_TRANSCODING: The name of the database the native app is loaded to.

--Update this section with the appropriate variables
set customer_db_name = 'DATABASE';
set customer_schema_name = concat($customer_db_name, '.', 'PUBLIC');
set customer_input_table_name = concat($customer_schema_name, '.', 'INPUT_TABLE');
-- If there are multiple input tables, add a variable for each additional table
set customer_input_table_name_2 = concat($customer_schema_name, '.', 'INPUT_TABLE_2');
set customer_meta_table_name = concat($customer_schema_name, '.', 'META_TABLE');
set output_table_name = 'OUTPUT_TABLE';

-- Name of the installed application
set application_name = 'IDENTITY_RESOLUTION_AND_TRANSCODING';

Create the Metadata Table

A metadata table can be reused for multiple operations, but a separate metadata table must be prepared for each different identity resolution operation you want to perform. For example, if you’re going to perform identity resolution on both PII and on hashed emails, you’ll need a different metadata table for each operation.

See the sections below for instructions on creating the metadata table.

Create the Metadata Table for PII Resolution

To create the metadata table for PII resolution:

  • Update the following variables in the sample SQL from the Execution worksheet shown below and then run the SQL:

    • <client_id>: Enter either an existing client ID or a new one provided in implementation.

    • <client_secret>: Enter the password/secret for the client ID.

    • <up to 4 name column names>: Enter the names of the columns in the input table to be used for the “name” element. Each input table column name should be enclosed in double-quotes. Enter a maximum of 4 name columns. If entering multiple column names, separate the column names with commas.

    • <zip column>: Enter the name of the column to be used for the "zip" element.

    • <phone column>: Enter the name of the column to be used for the "phone" element.

    • <email column>: Enter the name of the column to be used for the "email" element.

    • <up to 10 hashed attribute column names>: Enter the names of the attribute column(s) in the input table that should be passed through to the output in hashed format. Each input table column name should be enclosed in double quotes. Enter a maximum of 10 hashed attribute columns. If entering multiple column names, separate the column names with commas.

    • 'config': 'derived': This parameter is required to run a PII-based job on non-US data.

-- For PII: update the parameters here for the metadata table
-- Not all PII types need to be present, remove unused entries from the target_columns JSON
create or replace table identifier($customer_meta_table_name) as
select
    TO_VARCHAR(DECRYPT(ENCRYPT('<client_id>', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_id,
    TO_VARCHAR(DECRYPT(ENCRYPT('<client_secret>', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_secret,
    'resolution' as execution_mode,
    'pii' as execution_type,
    parse_json($$
    {
      "name": ["<up to 4 name column names>"],
      "zipCode": "<zip column>",
      "phone": "<phone column>",
      "email": "<email column>",
      "hashedAttributes": ["<up to 10 hashed attribute column names>"]
    }
    $$) as target_columns,
    1 as limit;

     'config': {'derived':};

The populated SQL with the suggested input column names might look like the example shown below:

-- FOR PII Update the parameters here for the metadata table
-- Not all PII types need to be present, remove unused entries from the target_columns JSON
create or replace table identifier($customer_meta_table_name) as
select
    TO_VARCHAR(DECRYPT(ENCRYPT('liveramp_client', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_id,
    TO_VARCHAR(DECRYPT(ENCRYPT('84159be2-ab93-4bf8-24c9-2g123ef08815', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_secret,
    'resolution' as execution_mode,
    'pii' as execution_type,
    parse_json($$
    {
      "name": ["first_name", "last_name"],
      "zipCode": "zip",
      "phone": "phone",
      "email": "email",
      "hashedAttributes": ["cid"]
    }
    $$) as target_columns,
    1 as limit;

     'config': {'derived':};
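When several jobs use different subsets of PII, it can help to generate the target_columns JSON rather than edit it by hand. The following Python sketch uses a hypothetical helper, build_target_columns, that omits unused PII types and enforces the limits described above (at most 4 name columns and 10 hashed attribute columns). Its output is the JSON text to paste between the $$ delimiters of parse_json; the key names are copied from the sample SQL.

```python
import json


def build_target_columns(name=(), zip_code=None, phone=None, email=None,
                         hashed_attributes=()):
    """Return target_columns JSON text for a PII metadata table.

    Hypothetical helper: PII types left as None/empty are omitted, and the
    limits follow the variable descriptions above.
    """
    if len(name) > 4:
        raise ValueError("enter at most 4 name columns")
    if len(hashed_attributes) > 10:
        raise ValueError("enter at most 10 hashed attribute columns")
    candidates = {
        "name": list(name),
        "zipCode": zip_code,
        "phone": phone,
        "email": email,
        "hashedAttributes": list(hashed_attributes),
    }
    # Keep only the entries that are actually in use.
    target = {key: value for key, value in candidates.items() if value}
    return json.dumps(target, indent=2)


print(build_target_columns(name=["first_name", "last_name"], zip_code="zip",
                           phone="phone", email="email", hashed_attributes=["cid"]))
```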

Create the Metadata Table for Email Resolution

To create the metadata table for email resolution:

  • Update the following variables in the sample SQL from the Execution worksheet shown below and then run the SQL:

    • <client_id>: Enter either an existing client ID or a new one provided in implementation.

    • <client_secret>: Enter the password/secret for the client ID.

    • <column to be resolved>: Enter the name of the column containing the hashed email addresses to be resolved (for example, hashed_email).

    • 'config': 'derived': This parameter is required to run an email-based job on non-US data.

-- email
-- FOR EMAIL Update the parameters here for the metadata table
create or replace table identifier($customer_meta_table_name) as
select
    TO_VARCHAR(DECRYPT(ENCRYPT('<client_id>', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_id,
    TO_VARCHAR(DECRYPT(ENCRYPT('<client_secret>', 'HideFromLogs'), 'HideFromLogs'), 'utf-8') as client_secret,
    'resolution' as execution_mode,
    'email' as execution_type,
    '<column to be resolved>' as target_column,
    1 as limit;

    'config': {'derived':};

Set Up Permissions

To set up the permissions for the tables used for identity resolution, run the SQL in the Execution Steps worksheet shown below:

Note

This SQL utilizes the variables that were set up in the “Specify the Variables” section above.

--The remainder of the commands should be run for ALL JOB TYPES, please switch to the Native App database and schema and execute the procedure.  Once completed, please run check_for_output for the output table to be written in the appropriate Job Schema

grant usage on database identifier ($customer_db_name) to application identifier($application_name);
grant usage on schema identifier ($customer_schema_name) to application identifier($application_name);
grant select on table identifier ($customer_input_table_name) to application identifier($application_name);
--If there are multiple input tables, grant permission for each of them here
grant select on table identifier ($customer_input_table_name_2) to application identifier($application_name);
grant select on table identifier ($customer_meta_table_name) to application identifier($application_name);


use database identifier ($application_name);
use schema lr_app_schema;
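If you script the setup, for example when you have several input tables, the GRANT statements above can be generated with one grant select per table. The Python sketch below is illustrative: grant_statements is a hypothetical helper that takes already-resolved, fully qualified names (rather than session variables) and assumes they come from a trusted source, since the names are interpolated without quoting.

```python
from typing import List


def grant_statements(app: str, db: str, schema: str,
                     input_tables: List[str], meta_table: str) -> List[str]:
    """Build the GRANT statements from the worksheet SQL, with one
    "grant select" per input table plus the metadata table.

    Arguments are resolved names such as "MY_DB.PUBLIC.INPUT_TABLE".
    """
    statements = [
        f"grant usage on database {db} to application {app};",
        f"grant usage on schema {schema} to application {app};",
    ]
    for table in [*input_tables, meta_table]:
        statements.append(f"grant select on table {table} to application {app};")
    return statements


for stmt in grant_statements("IDENTITY_RESOLUTION_AND_TRANSCODING", "MY_DB", "MY_DB.PUBLIC",
                             ["MY_DB.PUBLIC.INPUT_TABLE", "MY_DB.PUBLIC.INPUT_TABLE_2"],
                             "MY_DB.PUBLIC.META_TABLE"):
    print(stmt)
```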

Perform the Identity Resolution Operation

Once you’ve completed the previous steps, you’re ready to perform the identity resolution operation.

You perform an identity resolution operation by running the identity resolution procedure shown below. You can then view the output table to check the results.

The output tables vary somewhat, depending on the type of identifiers being resolved.

To perform the identity resolution operation:

  • Locate the lr_resolution_and_transcoding procedure shown below and run that SQL:

    call lr_resolution_and_transcoding(
        $customer_input_table_name,
        $customer_meta_table_name,
        $output_table_name
    );

    The operation runs to completion.

Once the app returns a success message, the output table should appear in the native app database under the lr_job_schema schema.

If Snowflake returns a status message of Error, check the error message for any information to help you fix the issue and then try running the operation again. For some issues, the error message will direct you to contact LiveRamp Support. For more information, see "Snowflake Operation Error Codes".

The results end up in the output table in the same database, with the fields shown in the appropriate section below.

View the Output Table

The identity resolution results end up in the output table in the application database under the schema “lr_job_schema”.

Once you've confirmed that the output table has been generated, see the appropriate section below for information on the output table format for the type of identity resolution operation that was run.

If for any reason you need to drop the output table, replace <table_name> in the following command with the name of the output table and run it:

call DROP_OUTPUT_TABLE(
    '<table_name>'
);

View the PII Resolution Output Table

The PII resolution process passes the input table through a privacy filter, which removes the PII and reswizzles (reorders) the table, in addition to other operations. Because of this, any attributes you need to keep associated with the identifier must be included in the input table. For more information, see the "Privacy Filter" section below.

For PII resolution, the output table includes the fields shown in the table below.

Column

Example

Description

RampID

Xi1005p_iYcKP7ZlvFwwK9EwR8GKl_VJqIWUhEaAFmHLAjNOQ9b6OQzSkA43XiVFcTYQ9X

Returns the derived RampID in your domain encoding.

attribute_1

Male

Any attribute columns passed through the service are returned.

hashed_cid 

63889cfb9d3cbe05d1bd2be5cc9953fd

Any hashed attribute columns passed through the service are returned with their values MD5 hashed.

__lr_rank

null

For non-US data workflows, this will always return “null” for the derived configuration. For US data workflows, this field provides insight on the match cascade level associated with the identifiers.

__lr_filter_name

name_phone

Returns the filter name where the match occurred, which will be one of the following options:

  • name_email

  • name_phone

  • strict_name (name + zip)

  • email

  • phone
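To get a quick view of how your records matched, you could tally the output rows by __lr_filter_name. The Python sketch below assumes you have already fetched that column's values from the output table (how you fetch them is up to you); summarize_filters is a hypothetical helper that counts rows per filter and raises an error for any name not in the list above.

```python
from collections import Counter
from typing import Iterable

# Filter names documented for the __lr_filter_name column.
KNOWN_FILTERS = {"name_email", "name_phone", "strict_name", "email", "phone"}


def summarize_filters(filter_names: Iterable[str]) -> Counter:
    """Count output rows per match filter; reject undocumented names."""
    counts = Counter(filter_names)
    unknown = set(counts) - KNOWN_FILTERS
    if unknown:
        raise ValueError(f"unexpected filter names: {sorted(unknown)}")
    return counts


print(summarize_filters(["email", "name_phone", "email", "strict_name"]))
```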

View the Email-Only Resolution Output Table

The email-only resolution process operates similarly to PII resolution. Any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

For email-only resolution without deconfliction, the output table includes the fields shown in the table below.

Column

Example

Description

RampID

Xi1005p_iYcKP7ZlvFwwK9EwR8GKl_VJqIWUhEaAFmHLAjNOQ9b6OQzSkA43XiVFcTYQ9X

Returns the resolved RampID in your domain encoding.

attribute_1

Male

Any attribute columns passed through the service are returned.

Privacy Filter

To minimize the risk of re-identification (the ability to tie PII directly to a RampID), the service includes the following processes when resolving PII identifiers (PII resolution or email-only resolution):

  • Column Values: The process counts how often each attribute value occurs within its column. If a value occurs 3 or fewer times, the rows containing that value are not matchable and are not returned in the output table.

    Note

    This check does not apply to hashed attributes.

  • More than 5% of the table unmatchable: If, based on column value uniqueness, more than 5% of the table's rows are unmatchable, the job fails.

  • Number of Unique RampIDs: If fewer than 100 unique RampIDs would be returned, the job will fail.

  • Reswizzle full table: Upon completion, the full table will be reswizzled to return the rows RampID | attribute_1 | attribute_2 | attribute_n in a different order than what was submitted in the input table.
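Because the column-value and 5% rules depend only on your input data, you can estimate their effect before running a job. The Python sketch below approximates the rules as described above and is not LiveRamp's implementation (for example, it does not model how nulls are handled): privacy_precheck marks rows containing any plain attribute value that occurs 3 or fewer times and reports whether more than 5% of rows would be unmatchable, and enough_rampids applies the 100-unique-RampID rule to a list of RampIDs. Both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def privacy_precheck(rows: List[Dict[str, str]], attribute_columns: List[str],
                     max_rare_count: int = 3,
                     max_unmatchable_pct: int = 5) -> Tuple[List[int], bool]:
    """Estimate the column-value and 5% rules.

    attribute_columns should list plain attribute columns only; hashed
    attributes are exempt from the column-value check. Returns the indexes
    of rows predicted to be unmatchable and whether the job would fail.
    """
    counts = {col: Counter(row[col] for row in rows) for col in attribute_columns}
    unmatchable = [
        i for i, row in enumerate(rows)
        if any(counts[col][row[col]] <= max_rare_count for col in attribute_columns)
    ]
    # Integer arithmetic avoids float rounding: fail only if MORE than 5%.
    would_fail = len(unmatchable) * 100 > len(rows) * max_unmatchable_pct
    return unmatchable, would_fail


def enough_rampids(rampids: Iterable[str], minimum: int = 100) -> bool:
    """The job fails if fewer than `minimum` unique RampIDs would be returned."""
    return len(set(rampids)) >= minimum
```

If the precheck predicts a failure, grouping rare attribute values into broader categories before resubmitting is one possible way to reduce the number of unmatchable rows.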