Skip to main content

Perform RampID Identity Resolution in Databricks

LiveRamp's RampID Identity Resolution capability in Databricks allows for the translation of various identifiers to RampIDs. This allows you to resolve personally-identifiable information (PII) to a persistent pseudonymous person-based identifier. 

Note

  • For more information about RampIDs, see "RampID Methodology".

  • This workflow is in limited availability and is by invitation only.

Once you've translated your data to RampIDs, you can then share that data to your LiveRamp account for activation.

The following identifiers can be resolved:

  • Names

  • Postal addresses

  • Email addresses

  • Phone numbers

For PII touchpoints, you can choose to receive from 1 to 10 RampIDs (if available).

These capabilities are available within Databricks through a Delta Share interface, which creates a share of customer data to our account for processing.

Performing identity resolution with LiveRamp Identity in Databricks requires the creation of two tables: A metadata table and an input table.

Overall Steps

Perform the following steps to perform RampID identity resolution in Databricks:

See the sections below for information on performing these tasks.

Create the Input Table

An input table needs to be prepared for each identity resolution operation.

When creating tables, keep the following guidelines in mind:

  • The column names for the input table can be whatever you want to use, as long as the names match the values specified in the metadata table and match Databricks column name syntax without special characters (for more information, see Databricks's documentation). However, for any attribute columns that will be returned in the output table, the column name can only contain letters, numbers, and underscores if the output table will be used for activation in LiveRamp Connect.

  • Do not use any column names that are the same as the column names returned in the output table for the identity resolution operation you're going to run.

  • Every column name must be unique in a table.

See the section below for suggested input table columns and descriptions.

An output table is created when you run the operation. For an example, see the "View the Output Table" section below.

Input Table Columns

The standard PII resolution process passes the data through a privacy filter which removes the PII and reswizzles the table. Because of this, any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

These column names cannot be used in the input table for PII resolution:

  • RampID

  • __lr_rank

  • __lr_filter_ name

See the table below for a list of the suggested input table columns and descriptions for PII resolution.

Suggested Column Name

Example

Description

first_name

John

You can include separate first name and last name columns or you can combine first name and last name in one column (such as “name”).

last_name

Doe

You can include separate first name and last name columns or you can combine first name and last name in one column (such as “name”).

address_1

123 Main St

You can include separate address 1 and address 2 columns or you can combine all street address information in one column (such as “address”).

address_2

Apt 1

  • Include values in this column if you have additional street address info for a given row.

  • You can include separate address 1 and address 2 columns or you can combine all street address information in one column (such as “address”).

city

Smalltown

When matching on address, city is optional.

state

CA

  • When matching on address, state is optional.

  • If including State, must be a two-character, capitalized abbreviation ("CA", not "California" or "Ca").

zip

12345

  • Required when matching on addresses.

  • Can be in 5-digit format or 9-digit format (ZIP+4).

email

john@email.com

  • Plaintext emails only.

  • Only one plaintext email per input row is permitted. Other emails must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates.

  • All emails must meet these requirements:

    • Have characters before and after the "@" sign

    • Contain a period character (".")

    • Have characters after the period character

  • Examples of valid emails include:

    • a@a.com

    • A@A.COM

    • email@account.com

    • EMAIL@ACCOUNT.COM

    • email@sub.domain.com

    • EMAIL@SUB.DOMAIN.COM

phone

555-123-4567

  • Plaintext phone numbers only.

  • Only one phone number per input row is permitted. Other phone numbers must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates.

  • All phone numbers must meet these requirements:

    • Can be more than 10 characters if leading numbers over 10 characters are “0” or “1”

    • If no leading numbers are used, must be 10 characters long

    • Can contain hyphens ("-"), parentheses ("(" or ")"), plus signs ("+"), and periods (".")

  • Examples of valid phone numbers include:

    • 8668533267

    • 866.853.3267

    • (866) 853-3267

    • 8668533267

    • +1 (866) 853-3267

    • +18668533267

    • 18668533267

    • 1111111118668533267

    • 08668533267

  • Examples of invalid phone numbers include:

    • 987654321 (fewer than 10 characters)

    • 98765432109 (more than 10 characters)

    • 1234567890 (after removing the leading "1", less than 10 characters remain)

    • 0987654321 (after removing the leading "0", less than 10 characters remain)

attribute_1

For PII resolution, you can include columns with attribute data. These columns will be returned in the output table (for more information, see the "View the Output Table" section below).

Create the Metadata Table

A metadata table can be reused for multiple operations.

To create the metadata table for PII resolution, update the following variables in the sample SQL shown below and then run the SQL:

  • <client_id>: Enter either an existing client ID or a new one provided in implementation.

  • <client_secret>: Enter the password/secret for the client ID.

  • <up to 4 name column names>: Enter the names of the columns in the input table to be used for the “name” element. Each input table column name should be enclosed in double-quotes. Enter a maximum of 4 name columns. If entering multiple column names, separate the column names with commas.

  • <up to 7 address column names>: Enter the names of the columns in the input table to be used for the "address" element. Each input table column name should be enclosed in double-quotes. Enter a maximum of 7 address columns. If entering multiple column names, separate the column names with commas.

  • <city column>: Enter the name of the column to be used for the "city" element.

  • <state column>: Enter the name of the column to be used for the "state" element.

  • <zipcode column>: Enter the name of the column to be used for the "zipcode" element.

  • <phone column>: Enter the name of the column to be used for the "phone" element.

  • <email column>: Enter the name of the column to be used for the "email" element.

Note

You can specify the number of RampIDs to be returned for each set of PII, to a maximum of 10 RampIDs per PII set. By default, 1 RampID will be returned per PII set. To change this, replace the “1” in the “as limit” line at the bottom of the SQL sample shown below with your desired number.

The metadata table configures how LiveRamp processes your input. Create exactly one row in the metadata table for the job (it does not accept multiple job configurations).

CREATE TABLE <catalog>.<schema>.pii_meta_table (
    client_id       STRING,
    client_secret   STRING,
    execution_mode  STRING,
    execution_type  STRING,
    target_columns  STRING,
    limit           INT
);

INSERT INTO <catalog>.<schema>.pii_meta_table VALUES (
    '<your_client_id>' as client_id,
    '<your_client_secret>' as client_secret,
    'resolution' as execution_mode,
    'PII' as execution_type,
    '{"name":["FIRSTNAME","LASTNAME"],"streetAddress":["ADDRESSLINE","ADDRESSLINE2"],
      "city":"CITY","state":"STATE","zipCode":"ZIPCODE","phone":"PHONE","email":"EMAIL"}' as target_columns,
    1 as limit
);

Set the Target Columns for the Metadata Table

The target_columns field maps your table's column names to LiveRamp's expected PII fields. Provide only the fields present in your table.

'{
  "name": ["<first_name_col>", "<last_name_col>"],   // up to 4 columns
  "streetAddress": ["<address_line1_col>", "<address_line2_col>"],  // up to 10 columns
  "city": "<city_col>",
  "state": "<state_col>",
  "zipCode": "<zip_col>",
  "phone": "<phone_col>",
  "email": "<email_col>"
}'

For example, if your columns are FIRST, LAST, ADDR, ADDR2, CITY, ST, ZIP, PHONE_NUM, EMAIL_ADDR, run the following SQL:

'{
  "name": ["FIRST", "LAST"],
  "streetAddress": ["ADDR", "ADDR2"],
  "city": "CITY",
  "state": "ST",
  "zipCode": "ZIP",
  "phone": "PHONE_NUM",
  "email": "EMAIL_ADDR"
}'

Share Tables with LiveRamp via Delta Sharing

Add your input table and metadata table to a Delta Share and grant access to LiveRamp's Databricks account.

Create a Share

To create a share, update the following SQL with your actual values for share_name, catalog_name, schema_name, pii_input_table, and pii_meta_table and then run the SQL:

DECLARE OR REPLACE VARIABLE share_name STRING DEFAULT 'testing_share';
DECLARE OR REPLACE VARIABLE catalog_name STRING DEFAULT 'testing_catalog';
DECLARE OR REPLACE VARIABLE schema_name STRING DEFAULT 'input_schema';
DECLARE OR REPLACE VARIABLE pii_input_table STRING;
DECLARE OR REPLACE VARIABLE pii_meta_table STRING;


SET VAR pii_input_table = catalog_name || '.' || schema_name || '.pii_input_table_synthetic';
SET VAR pii_meta_table = catalog_name || '.' || schema_name || '.pii_meta_table_synthetic';


EXECUTE IMMEDIATE 'CREATE SHARE ' || share_name;


EXECUTE IMMEDIATE 'ALTER SHARE ' || share_name || ' ADD TABLE ' || pii_input_table;
EXECUTE IMMEDIATE 'ALTER SHARE ' || share_name || ' ADD TABLE ' || pii_meta_table;

Create a Recipient

To create a recipient, update and run the following SQL, depending on your situation:

Note

We recommend that you use "liveramp" as the recipient name.

  • If using the Databricks-to-Databricks protocol, provide LiveRamp's sharing identifier. No token management is required with this method.

    CREATE RECIPIENT <recipient_name>
      USING ID 'aws:us-east-2:02706dcc-9257-402b-ad70-490d47891319';
  • If using open sharing (bearer token), generate credentials and send the activation link to your LiveRamp contact.

    CREATE RECIPIENT <recipient_name>;

Grant Access

To grant access, update and run the following SQL:

GRANT SELECT ON SHARE <share_name> TO RECIPIENT <recipient_name>;

Notify LiveRamp

Once you’ve performed the steps above, send your LiveRamp contact the following information:

  • The share name

  • The fully qualified table names for the input table and metadata table

  • The name you'd like to use for the output table

  • Your delta share identifier so LiveRamp can confirm that the share came from you.

View the Output Table

The identity resolution results will be shared back to the specified recipient as an output table.

Each output table is retained for a maximum of 14 days or until you run another operation (which overwrites any existing output table).

To use the output table in your Databricks account, you’ll need to copy it before you run another operation or within 14 days (whichever comes first).

The PII resolution process passes the input table through a privacy filter which removes the PII and reswizzles the table (in addition to other operations). Because of this, any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

Identity resolution of PII provides supplemental match metadata for additional insight into customer data that can provide powerful signals for making decisions based on RampIDs.

For PII resolution, the output table includes the fields shown in the table below.

Column

Example

Description

RampID

XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd

Returns the resolved RampID in your domain encoding.

attribute_1

Male

Any attribute columns passed through the service are returned.

__lr_rank

1

Provides insight on the match cascade level associated with the identifiers.

If no maintained RampID is found, this value will be "null".

__lr_filter_name

name_phone

Returns the filter name where the match occurred, which will be one of the following options:

  • name_address_zip

  • name_email

  • name_phone

  • partial_name_email

  • partial_name_phone

  • strict_name (name + zip)

  • email

  • phone

  • last_name_address

If no maintained RampID is found, this value will be "null".

Privacy Filter

To minimize the risk of re-identification (the ability to tie PII directly to a RampID), the service includes the following processes when resolving PII identifiers (PII resolution or email-only resolution):

  • Column Values: The process evaluates each column value on a per-row basis for unique values. If any attribute occurs 3 or fewer times, the rows containing those column values will not be matchable and will not be returned in the output table.

  • >5% of the table unmatchable: If based on column value uniqueness, >5% of the file rows are unmatchable, the job will fail.

  • Number of Unique RampIDs: If fewer than 100 unique RampIDs would be returned, the job will fail.

  • Reswizzle full table: Upon completion, the full table will be reswizzled to return the rows RampID | attribute_1 | attribute_2 | attribute_n in a different order than what was submitted in the input table.