Skip to main content

Perform Identity Resolution in Snowflake

Abstract

LiveRamp's Identity Resolution application in Snowflake allows for the translation of various identifiers to RampIDs. This allows you to resolve personally-identifiable information (PII) or device identifiers to a persistent pseudonymous identifier for persons and households. You can also input an individual-based RampID and get back any household-based RampID that might be associated with that individual.

LiveRamp's Identity Resolution application in Snowflake allows for the translation of various identifiers to RampIDs. This allows you to resolve personally-identifiable information (PII) or device identifiers to a persistent pseudonymous identifier for persons and households. You can also input an individual-based RampID and get back any household-based RampID that might be associated with that individual.

Note

For more information about RampID, see "RampID Methodology".

Once you've translated your data to RampIDs, you can then share that data to your LiveRamp account for activation. For more information, see "Share Data from Snowflake to Your LiveRamp Account".

The following identifiers can be resolved:

  • Names

  • Postal addresses

  • Email addresses

  • Phone numbers

  • Cookies

  • MAIDs (mobile device IDs)

  • CTV IDs (Connected TV Device IDs)

  • CIDs (custom identifiers)

  • Person-based, maintained RampIDs (for resolution to household RampIDs)

These capabilities are available within Snowflake through a native app, which creates a share to your account, opening up a view to query the reference data set from within your own Snowflake environment. See "LiveRamp Embedded Identity in Snowflake" for more information.

Overall Steps

After you've set up the identity resolution native app in Snowflake (see "Set Up a Native App" for instructions), perform the following steps to perform identity resolution:

  1. Verify that you've performed all the native app setup tasks in the completion checklist.

  2. Prepare and deploy the appropriate input and metadata tables for identity resolution.

    • To resolve PII identifiers (such as name, address, phone, and/or email), follow the instructions on input table creation in the “Input Table Columns for PII Resolution” section below.

    • To resolve SHA-256 hashed email addresses only, follow the instructions on input table creation in the “Input Table Columns for Email-Only Resolution” section below.

      Note

      For email-only data, this execution type may be more performant than the PII resolution process, depending on your warehouse set up.

    • To resolve device identifiers (such as mobile device IDs, CTV IDs, or cookies) or to resolve an individual RampID to a household RampID, follow the instructions on input table creation in the “Input Table Columns for Device Resolution” section below.

    • To resolve custom identifiers (CIDs), follow the instructions on input table creation in the “Input Table Columns for CID Resolution” section below.

  3. Perform the appropriate identity resolution process, depending on the identifiers being resolved:

  4. View the output table.

See the sections below for information on performing these tasks.

Completion Checklist for Native App Setup

Before performing an identity resolution operation for the first time, verify that you've completed the following tasks to set up an identity resolution native app in Snowflake:

Note

For instructions on completing these tasks, see "Set Up a Native App".

  • Accepted the terms and conditions of the application.

  • Installed the appropriate LiveRamp Snowflake native application.

  • Updated the firewall configuration for authentication calls.

  • Created and granted permissions to a role that gives the App Share database access to the appropriate tables.

  • Created logging and metrics tables, added them to a share, and shared that share back to LiveRamp.

After these tasks have been completed, you are ready to prepare the tables and perform the operation. See the sections below for more information.

Prepare the Tables for Identity Resolution

Performing identity resolution requires two tables:

  • A metadata table, indicated in the code as $customer_meta_table_name.

    Note

    A metadata table can be reused for multiple operations, but a separate metadata table must be prepared for each different identity resolution operation you want to perform. For example, if you’re going to perform identity resolution on both MAIDs and hashed emails, you’ll need a different metadata table for each operation.

  • An input table, indicated in the code as $customer_input_table_name.

    Note

    An input table needs to be prepared for each identity resolution operation and can only contain one type of identifiers.

To create the input and metadata tables:

  1. Click in the Create input and metadata table procedure.

    set customer_db_name = 'RESOLUTION_DEMO';
    set customer_schema_name=concat($customer_db_name,'.','public');
    set customer_input_table_name=concat($customer_schema_name,'.','RESOLUTION_INPUT_TABLE');
    set customer_meta_table_name=concat($customer_schema_name>'.','RESOLUTION_META_TABLE');
    set customer_metrics_table_name=concat($customer_schema_name,'.','METRICS_TABLE');
    set customer_logging_table_name=concat($customer_schema_name,'.','LOGGING_TABLE');
    set customer_output_table_name='RESOLUTION_OUTPUT_TABLE';
    
  2. Make any necessary changes to the variables:

    • RESOLUTION_DEMO: The name of your database.

    • public: The name of the schema that holds the tables for identity resolution.

    • RESOLUTION_INPUT_TABLE: The name of the input table to use for the operation.

    • RESOLUTION_META_TABLE: The name of the metadata table to use for the operation.

    • METRICS_TABLE: The name of the metrics table to use for the operation.

    • LOGGING_TABLE: The name of the logging table to use for the operation.

    • RESOLUTION_OUTPUT_TABLE: The name of the output table that will be created after the identity resolution operation has been run.

  3. Click Run.

You can create these tables inside Snowflake or import the tables into your database using Snowflake’s standard methods. The variables should be replaced with your own values. Make sure to reference the names correctly in the metadata table and make sure that the column names also match up correctly.

When creating tables, keep the following guidelines in mind:

  • Every column name must be unique in a table.

  • Try not to use additional columns in the input tables required for the identity resolution operation.

  • Having extra columns slows down processing.

  • Per Snowflake guidelines, table names cannot begin with a number.

Metadata Table Columns and Descriptions

A metadata table can be reused for multiple operations, but a separate metadata table must be prepared for each different identity resolution operation you want to perform. For example, if you’re going to perform identity resolution on both MAIDs and hashed emails, you’ll need a different metadata table for each operation.

Metadata column names must match those shown in the table below. The column names are not case sensitive, and should not be enclosed in single or double quotation marks.

See the table below for a list of the metadata table columns and descriptions:

Example

CLIENT_ID

liveramp_client

CLIENT_SECRET

84159be2-ab93-4bf8-24c9-2g123ef08815

EXECUTION_MODE

Resolution

Resolution is the only option.

EXECUTION_TYPE

PII

Options include: 

  • PII

  • Cookies

  • MAID

  • CTV

  • Email

  • CID name (provided to you by LiveRamp and specific to the mapping created)

  • HHLink (to resolve individual RampIDs into household RampIDs)

Note

Currently, each identifier type has to be separated into its own input table (including CID types) and only one option above can be chosen for each metadata table.

TARGET_COLUMN

{ “name”: [ “FIRSTNAME”,  “LASTNAME” ],  “streetAddress”: [ “ADDRESSLINE” ], “city”: ”CITY”,  “state”: “STATE”, “zipCode”: “ZIPCODE” }

Enter the column name for the input table column(s) which contains the IDs to be resolved.

For PII resolution, enter a string that maps each identifier type to the column names for that identifier, separated by commas and enclosed with curly brackets, as shown in the example. When including multiple column names per identifier type, enclose those column names in straight brackets.

Inputs for PII can include:

  • name: [First Name, Middle Name, Last Name]

  • streetAddress: [ADDRESSLINE1, ADDRESSLINE2]

  • zipCode: [zipcode]

  • city: [city]

  • state: [state]

  • phone: [phone]

  • email: [email]

Note

If resolving street addresses, zipcode is a required field along with either name or address.

LIMIT

5

For email or PII resolution only, you can specify the maximum number of RampIDs to be returned.

Enter an integer between 1 and 10 to specify the maximum number of RampID results returned per input identifier (to return only the “best match”, returning 1 RampID is sufficient).

For other identifiers (device identifiers or CIDs), do not include this column.

Input Table Columns and Descriptions

The column names for the input table can be whatever you want to use, as long as the column name(s) for the identifiers matches the values specified in the TARGET_COLUMN column of the metadata table. Do not use any column names that are the same as the columns names returned in the output table for the identity resolution operation you're going to run.

See the sections below for suggested input table columns and descriptions for each resolution type.

The output table is created by the operation that you run. For an example, see the sections in "View the Output Table" below.

Input Table Columns for PII Resolution

The PII resolution process operates similarly to device identifier and CID resolution, with a key difference in the data output: instead of returning the original identifier mapped to its associated RampID, data running through PII resolution will pass through a privacy filter which removes the PII and reswizzles the table. Because of this, any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

Note

  • This offering includes running our Identity Graph on demand. Our testing indicates that a warehouse size of 4XL that is not being used for any other operations performs best for average workloads, but this is dependent on your individual setup. Contact your LiveRamp representative to understand the performance implications of your setup.

  • When resolving email data only, using the email-only resolution operation can provide higher throughput compared to full PII resolution. This is dependent on warehouse and setup, so talk with your LiveRamp team to determine the best approach for the use case.

These column names cannot be used in the input table for PII resolution:

  • RampID

  • Rank

  • Filter Name

See the table below for a list of the suggested input table columns and descriptions for PII resolution.

Suggested Column Name

Example

First Name

John

You can include separate First Name and Last Name columns or you can combine first name and last name in one column (such as “Name”).

Last Name

Doe

You can include separate First Name and Last Name columns or you can combine first name and last name in one column (such as “Name”).

Address 1

123 Main St

Address 2

Apt 1

You can include separate Address 1 and Address 2 columns or you can combine all street address information in one column (such as “Address”).

City

Smalltown

When matching on address, City is optional.

State

CA

  • When matching on address, State is optional.

  • If including State, must be a two-character, capitalized abbreviation ("CA", not "California" or "Ca").

ZIP

12345

  • Required when matching on addresses.

  • Can be in 5-digit format or 9-digit format (ZIP+4).

Email

john@email.com

  • Plaintext emails only.

  • Only one email per input row is permitted. Other emails must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates.

  • All emails must meet these requirements:

    • Have characters before and after the “@” sign

    • Contain a period character (“.”)

    • Have characters after the period character

  • Examples of valid emails include:

    • a@a.com

    • A@A.COM

    • email@account.com

    • EMAIL@ACCOUNT.COM

    • email@sub.domain.com

    • EMAIL@SUB.DOMAIN.COM

Phone

555-123-4567

  • Plain text phone numbers only.

  • Only one phone number per input row is permitted. Other phone numbers must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates.

  • All phone numbers must meet these requirements:

    • Can be more than 10 characters if leading numbers over 10 characters are “0” or “1”

    • If no leading numbers are used, must be 10 characters long

    • Can contain hyphens (“-”), parentheses (“(“ or “)”), plus signs (“+”), and periods (“.”)

  • Examples of valid phone numbers include:

    • 8668533267

    • 866.853.3267

    • (866) 853-3267

    • 8668533267

    • +1 (866) 853-3267

    • +18668533267

    • 18668533267

    • 1111111118668533267

    • 08668533267

  • Examples of invalid phone numbers include:

    • 987654321 (fewer than 10 characters)

    • 98765432109 (more than 10 characters)

    • 1234567890 (after removing the leading “1”, less than 10 characters remain)

    • 0987654321 (after removing the leading “0”, less than 10 characters remain)

Attribute 1

For PII resolution, you can include columns with attribute data. These columns will be returned in the output table (for more information, see the "View the PII Resolution Output Table" section below).

Input Table Columns for Email-Only Resolution

The email-only resolution process operates similarly to device identifier and CID resolution, with a key difference in the data output: instead of returning the original identifier mapped to its associated RampID, data running through email-only resolution will pass through a privacy filter which removes the PII and reswizzles the table. Because of this, any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

Note

  • When resolving email data only, using email-only resolution can provide higher throughput compared to full PII resolution. This is dependent on warehouse and setup, so talk with your LiveRamp team to determine the best approach for the use case.

  • To perform identity resolution across additional PII touchpoints, see the “View the PII Resolution Output Table” section above.

See the table below for a list of the suggested input table columns and descriptions for email-only resolution.

Suggested Column Name

Example

Description

Hashed Email

a64bleicQ3MEYck2YtNUIyco1500QjJGLUszlUGlNjgtQjQ3QUEwMTNEMTA1CgNE

  • SHA-256 hashed emails only.

  • Email addresses should be uppercased and UTF-8 encoded prior to hashing.

  • After hashing, convert the resulting hash into lowercase hexadecimal representation.

Attribute 1

Male

For email address resolution, you can include columns with attribute data. These columns will be returned in the output table (for more information, see the "View the Email-Only Resolution Output Table" section below).

Input Table Columns for Device Resolution

The device resolution operation can be used for the following purposes:

  • To translate device identifiers (cookies, MAIDs, and CTV IDs) into individual RampIDs

  • To translate individual RampIDs into their associated household RampIDs

See the tables below for a list of the suggested input table columns and descriptions for these device resolution options.

Note

  • Each device resolution input table should contain only one identifier column (either a device identifier or a maintained RampID).

  • You can include columns with attribute data, but these columns will not be returned in the output table.

See the table below for a list of the suggested input table columns and descriptions for translating device identifiers.

Suggested Column Name

Example

Description

Device identifier

1f4d256c-1f08-41f6-a108-bbe511de9497

Can be one of the following identifiers:

  • Cookie

  • MAID

  • CTV ID

See the table below for a list of the suggested input table columns and descriptions for translating individual RampIDs into their associated household RampIDs.

Suggested Column Name

Example

Description

RampID

XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd

  • The RampID for translation to a Household RampID.

  • Must be a maintained RampID (to have an associated with a Household RampID).

Input Table Columns for CID Resolution

See the table below for a list of the suggested input table columns and descriptions for CID resolution.

Note

You can include columns with attribute data, but these columns will not be returned in the output table.

Suggested Column Name

Example

Description

CID

b916clarib la1;blNj10gtQjQ3QUEwMTNEMTcaktboEc0g9022cxoiaklr20185

The CID for translation to a RampID

Perform the Identity Resolution Operation

Once you’ve prepared the metadata table and the input table, you’re ready to perform the identity resolution operation.

You perform an identity resolution operation by running the identifier resolution procedure shown below, which includes checking that the output has succeeded. You can then view the output table to check the results.

The output tables vary somewhat, depending on the type of identfiers being resolved.

To perform the identity resolution operation:

  1. On your worksheet, switch to the native app database and LR_APP_SCHEMA under it.

  2. Locate the lr_resolution_and_trancoding procedure shown below and click on that code block.

    call lr_resolution_and_transcoding
    (
        $customer_input_table_name,
        $customer_meta_table_name,
        $output_table_name,
        $customer_logging_table_name,
        $customer_metrics_table_name
    );
  3. Click Run.

    The resolution operation runs to completion.

  4. // Check for output
    /*
    Check for output
    */
    -
    
    call check_for_output(
    $output_table_name
    );
    /*
    END SECTION
    */
  5. Click Run.

  6. Once Snowflake returns a status message of success, you can view the output table in the native app database under lr_app_schema. See the sections below for more information.

    Note

    If an error status message is returned, run the Check for output procedure again.

The results end up in the output table in the same database, with the fields shown in the appropriate section below.

View the Output Table

The identity resolution results end up in the output table in the same database you specified previously.

Once you've confirmed that the output table has been generated, see the appropriate section below for information on the output table format for the type of identity resolution operation that was run.

View the PII Resolution Output Table

The PII resolution process passes the input table through a privacy filter which removes the PII and reswizzles the table (in addition to other operations). Because of this, any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

Identity resolution of PII provides supplemental match metadata for additional insight into customer data that can provide powerful signals for making decisions based on RampIDs.

For PII resolution, the output table includes the fields shown in the table below.

Column

Sample

Description

Ramp_ID

XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd

Returns the resolved RampID in your domain encoding.

Attribute 1

Male

Any attribute columns passed through the service are returned.

Match Metadata: Rank

1

Provides insight on the “best” RampID for any given input (the lower the number, the “better” the RampID).

If no maintained RampID is found, this value will be "null".

Match Metadata: Filter_Name

name_phone

Returns the filter name where the match occurred, which will be one of the following options:

  • name_address_zip

  • name_email

  • name_phone

  • partial_name_email

  • partial_name_phone

  • strict_name

  • email

  • phone

  • last_name_address

If no maintained RampID is found, this value will be "null".

View the Email-Only Resolution Output Table

The email-only resolution process operates similarly to device identifier and CID resolution, with a key difference in the data output: instead of returning the original identifier mapped to its associated RampID, data running through email-only resolution will pass through a privacy filter which removes the PII and reswizzles the table. Because of this, any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

For email-only resolution, the results end up in the output table in the same database, with the following fields (as shown below):

  • RAMPID (resolved email data)

  • ATTRIBUTES (based on other data passed through the service).

Column

Sample

Description

Ramp_ID

XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd

Returns the resolved RampID in your domain encoding.

Attribute 1

Male

Any attribute columns passed through the service are returned.

View the Device Identifier Resolution Output Table

For device identifier resolution, the results end up in the output table in the same database, with the following fields (as shown below):

  • DEVICE_ID (original input)

  • RAMPID (converted values).

Device_ID

93abc799-a0a5-40b5-80dd-d2ab61d4d072

Ramp_ID

XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd

View the CID Resolution Output Table

For CID resolution, the results end up in the output table in the same database, with the following fields (as shown below):

  • CID_ID (original input)

  • RAMPID (converted values).

Column

Sample

Description

CID_ID

93abc799-a0a5-40b5-80dd-d2ab61d4d072

Original identifier passed in.

Ramp_ID

XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd

Returns the resolved RampID in your domain encoding.

Privacy Filter

To minimize the risk of re-identification (the ability to tie PII directly to a RampID), the service includes the following processes when resolving PII identifiers (PII resolution or email-only resolution):

  • Column Values: The process evaluates the combination of all the column values on a per row basis for unique values. If a particular combination of column values occurs 3 or fewer times, the rows containing those column values will not be matchable and will not be returned in the output table.

  • >5% of the table unmatchable: If, based on column value uniqueness, >5% of the file rows are unmatchable, the job will fail.

  • Number of Unique RampIDs: If fewer than 100 unique RampIDs would be returned, the job will fail.

  • Reswizzle full table: Upon completion, the full table will be reswizzled to return the rows RampID | attribute 1 | attribute 2 | attribute n in a different order than what was submitted in the input table.