Perform RampID Identity Resolution in Databricks

LiveRamp's RampID Identity Resolution capability in Databricks allows for the translation of various identifiers to RampIDs. This allows you to resolve personally-identifiable information (PII) to a persistent pseudonymous person-based identifier.

Note

For more information about RampIDs, see "RampID Methodology".
This workflow is in limited availability and is by invitation only.

Once you've translated your data to RampIDs, you can then share that data to your LiveRamp account for activation.

The following identifiers can be resolved:

Names
Postal addresses
Email addresses
Phone numbers

For PII touchpoints, you can choose to receive from 1 to 10 RampIDs (if available).

These capabilities are available within Databricks through a Delta Share interface, which creates a share of customer data to our account for processing.

Performing identity resolution with LiveRamp Identity in Databricks requires the creation of two tables: A metadata table and an input table.

Overall Steps

Perform the following steps to perform RampID identity resolution in Databricks:

Create the input table(s).
Create the metadata table.
Note
A metadata table can be reused for multiple operations if the target columns remain the same.
Add both tables to a Delta Share and grant LiveRamp access.
LiveRamp runs resolution and writes the results to an output table that is shared back with you.
View the output table.

See the sections below for information on performing these tasks.

Create the Input Table

An input table needs to be prepared for each identity resolution operation.

When creating tables, keep the following guidelines in mind:

The column names for the input table can be whatever you want to use, as long as the names match the values specified in the metadata table and match Databricks column name syntax without special characters (for more information, see Databricks's documentation). However, for any attribute columns that will be returned in the output table, the column name can only contain letters, numbers, and underscores if the output table will be used for activation in LiveRamp Connect.
Do not use any column names that are the same as the column names returned in the output table for the identity resolution operation you're going to run.
Every column name must be unique in a table.

See the section below for suggested input table columns and descriptions.

An output table is created when you run the operation. For an example, see the "View the Output Table" section below.

Input Table Columns

The standard PII resolution process passes the data through a privacy filter which removes the PII and reswizzles the table. Because of this, any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

These column names cannot be used in the input table for PII resolution:

RampID
__lr_rank
__lr_filter_ name

See the table below for a list of the suggested input table columns and descriptions for PII resolution.

Suggested Column Name	Example	Description
`first_name`	John	You can include separate first name and last name columns or you can combine first name and last name in one column (such as “name”).
`last_name`	Doe	You can include separate first name and last name columns or you can combine first name and last name in one column (such as “name”).
`address_1`	123 Main St	You can include separate address 1 and address 2 columns or you can combine all street address information in one column (such as “address”).
`address_2`	Apt 1	Include values in this column if you have additional street address info for a given row. You can include separate address 1 and address 2 columns or you can combine all street address information in one column (such as “address”).
`city`	Smalltown	When matching on address, `city` is optional.
`state`	CA	When matching on address, `state` is optional. If including State, must be a two-character, capitalized abbreviation ("CA", not "California" or "Ca").
`zip`	12345	Required when matching on addresses. Can be in 5-digit format or 9-digit format (ZIP+4).
`email`	john@email.com	Plaintext emails only. Only one plaintext email per input row is permitted. Other emails must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates. All emails must meet these requirements: Have characters before and after the "@" sign Contain a period character (".") Have characters after the period character Examples of valid emails include: a@a.com A@A.COM email@account.com EMAIL@ACCOUNT.COM email@sub.domain.com EMAIL@SUB.DOMAIN.COM
`phone`	555-123-4567	Plaintext phone numbers only. Only one phone number per input row is permitted. Other phone numbers must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates. All phone numbers must meet these requirements: Can be more than 10 characters if leading numbers over 10 characters are “0” or “1” If no leading numbers are used, must be 10 characters long Can contain hyphens ("-"), parentheses ("(" or ")"), plus signs ("+"), and periods (".") Examples of valid phone numbers include: 8668533267 866.853.3267 (866) 853-3267 8668533267 +1 (866) 853-3267 +18668533267 18668533267 1111111118668533267 08668533267 Examples of invalid phone numbers include: 987654321 (fewer than 10 characters) 98765432109 (more than 10 characters) 1234567890 (after removing the leading "1", less than 10 characters remain) 0987654321 (after removing the leading "0", less than 10 characters remain)
`attribute_1`		For PII resolution, you can include columns with attribute data. These columns will be returned in the output table (for more information, see the "View the Output Table" section below).

Create the Metadata Table

A metadata table can be reused for multiple operations.

To create the metadata table for PII resolution, update the following variables in the sample SQL shown below and then run the SQL:

<client_id>: Enter either an existing client ID or a new one provided in implementation.
<client_secret>: Enter the password/secret for the client ID.
<up to 4 name column names>: Enter the names of the columns in the input table to be used for the “name” element. Each input table column name should be enclosed in double-quotes. Enter a maximum of 4 name columns. If entering multiple column names, separate the column names with commas.
<up to 7 address column names>: Enter the names of the columns in the input table to be used for the "address" element. Each input table column name should be enclosed in double-quotes. Enter a maximum of 7 address columns. If entering multiple column names, separate the column names with commas.
<city column>: Enter the name of the column to be used for the "city" element.
<state column>: Enter the name of the column to be used for the "state" element.
<zipcode column>: Enter the name of the column to be used for the "zipcode" element.
<phone column>: Enter the name of the column to be used for the "phone" element.
<email column>: Enter the name of the column to be used for the "email" element.

Note

You can specify the number of RampIDs to be returned for each set of PII, to a maximum of 10 RampIDs per PII set. By default, 1 RampID will be returned per PII set. To change this, replace the “1” in the “as limit” line at the bottom of the SQL sample shown below with your desired number.

The metadata table configures how LiveRamp processes your input. Create exactly one row in the metadata table for the job (it does not accept multiple job configurations).

CREATE TABLE <catalog>.<schema>.pii_meta_table (
    client_id       STRING,
    client_secret   STRING,
    execution_mode  STRING,
    execution_type  STRING,
    target_columns  STRING,
    limit           INT
);

INSERT INTO <catalog>.<schema>.pii_meta_table VALUES (
    '<your_client_id>' as client_id,
    '<your_client_secret>' as client_secret,
    'resolution' as execution_mode,
    'PII' as execution_type,
    '{"name":["FIRSTNAME","LASTNAME"],"streetAddress":["ADDRESSLINE","ADDRESSLINE2"],
      "city":"CITY","state":"STATE","zipCode":"ZIPCODE","phone":"PHONE","email":"EMAIL"}' as target_columns,
    1 as limit
);

Set the Target Columns for the Metadata Table

The target_columns field maps your table's column names to LiveRamp's expected PII fields. Provide only the fields present in your table.

'{
  "name": ["<first_name_col>", "<last_name_col>"],   // up to 4 columns
  "streetAddress": ["<address_line1_col>", "<address_line2_col>"],  // up to 10 columns
  "city": "<city_col>",
  "state": "<state_col>",
  "zipCode": "<zip_col>",
  "phone": "<phone_col>",
  "email": "<email_col>"
}'

For example, if your columns are FIRST, LAST, ADDR, ADDR2, CITY, ST, ZIP, PHONE_NUM, EMAIL_ADDR, run the following SQL:

'{
  "name": ["FIRST", "LAST"],
  "streetAddress": ["ADDR", "ADDR2"],
  "city": "CITY",
  "state": "ST",
  "zipCode": "ZIP",
  "phone": "PHONE_NUM",
  "email": "EMAIL_ADDR"
}'

View the Output Table

The identity resolution results will be shared back to the specified recipient as an output table.

Each output table is retained for a maximum of 14 days or until you run another operation (which overwrites any existing output table).

To use the output table in your Databricks account, you’ll need to copy it before you run another operation or within 14 days (whichever comes first).

The PII resolution process passes the input table through a privacy filter which removes the PII and reswizzles the table (in addition to other operations). Because of this, any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

Identity resolution of PII provides supplemental match metadata for additional insight into customer data that can provide powerful signals for making decisions based on RampIDs.

For PII resolution, the output table includes the fields shown in the table below.

Column	Example	Description
`RampID`	XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd	Returns the resolved RampID in your domain encoding.
`attribute_1`	Male	Any attribute columns passed through the service are returned.
`__lr_rank`	1	Provides insight on the match cascade level associated with the identifiers. If no maintained RampID is found, this value will be "null".
`__lr_filter_name`	name_phone	Returns the filter name where the match occurred, which will be one of the following options: `name_address_zip` `name_email` `name_phone` `partial_name_email` `partial_name_phone` `strict_name` (name + zip) `email` `phone` `last_name_address` If no maintained RampID is found, this value will be "null".

Privacy Filter

To minimize the risk of re-identification (the ability to tie PII directly to a RampID), the service includes the following processes when resolving PII identifiers (PII resolution or email-only resolution):

Column Values: The process evaluates each column value on a per-row basis for unique values. If any attribute occurs 3 or fewer times, the rows containing those column values will not be matchable and will not be returned in the output table.
>5% of the table unmatchable: If based on column value uniqueness, >5% of the file rows are unmatchable, the job will fail.
Number of Unique RampIDs: If fewer than 100 unique RampIDs would be returned, the job will fail.
Reswizzle full table: Upon completion, the full table will be reswizzled to return the rows RampID | attribute_1 | attribute_2 | attribute_n in a different order than what was submitted in the input table.

In this section:

Perform RampID Identity Resolution in Databricks

Note

Overall Steps

Note

Create the Input Table

Input Table Columns

Create the Metadata Table

Note

Set the Target Columns for the Metadata Table

Share Tables with LiveRamp via Delta Sharing

Create a Share

Create a Recipient

Note

Grant Access

Notify LiveRamp

View the Output Table

Privacy Filter

Search results