Perform RampID Identity Resolution in Databricks
LiveRamp's RampID Identity Resolution capability in Databricks allows for the translation of various identifiers to RampIDs. This allows you to resolve personally-identifiable information (PII) to a persistent pseudonymous person-based identifier.
Note
For more information about RampIDs, see "RampID Methodology".
This workflow is in limited availability and is by invitation only.
Once you've translated your data to RampIDs, you can then share that data to your LiveRamp account for activation.
The following identifiers can be resolved:
Names
Postal addresses
Email addresses
Phone numbers
For PII touchpoints, you can choose to receive from 1 to 10 RampIDs (if available).
These capabilities are available within Databricks through a Delta Share interface, which creates a share of customer data to our account for processing.
Performing identity resolution with LiveRamp Identity in Databricks requires the creation of two tables: A metadata table and an input table.
Overall Steps
Perform the following steps to perform RampID identity resolution in Databricks:
Note
A metadata table can be reused for multiple operations if the target columns remain the same.
LiveRamp runs resolution and writes the results to an output table that is shared back with you.
See the sections below for information on performing these tasks.
Create the Input Table
An input table needs to be prepared for each identity resolution operation.
When creating tables, keep the following guidelines in mind:
The column names for the input table can be whatever you want to use, as long as the names match the values specified in the metadata table and match Databricks column name syntax without special characters (for more information, see Databricks's documentation). However, for any attribute columns that will be returned in the output table, the column name can only contain letters, numbers, and underscores if the output table will be used for activation in LiveRamp Connect.
Do not use any column names that are the same as the column names returned in the output table for the identity resolution operation you're going to run.
Every column name must be unique in a table.
See the section below for suggested input table columns and descriptions.
An output table is created when you run the operation. For an example, see the "View the Output Table" section below.
Input Table Columns
The standard PII resolution process passes the data through a privacy filter which removes the PII and reswizzles the table. Because of this, any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.
These column names cannot be used in the input table for PII resolution:
RampID
__lr_rank
__lr_filter_ name
See the table below for a list of the suggested input table columns and descriptions for PII resolution.
Suggested Column Name | Example | Description |
|---|---|---|
| John | You can include separate first name and last name columns or you can combine first name and last name in one column (such as “name”). |
| Doe | You can include separate first name and last name columns or you can combine first name and last name in one column (such as “name”). |
| 123 Main St | You can include separate address 1 and address 2 columns or you can combine all street address information in one column (such as “address”). |
| Apt 1 |
|
| Smalltown | When matching on address, |
| CA |
|
| 12345 |
|
| john@email.com |
|
| 555-123-4567 |
|
| For PII resolution, you can include columns with attribute data. These columns will be returned in the output table (for more information, see the "View the Output Table" section below). |
Create the Metadata Table
A metadata table can be reused for multiple operations.
To create the metadata table for PII resolution, update the following variables in the sample SQL shown below and then run the SQL:
<client_id>: Enter either an existing client ID or a new one provided in implementation.<client_secret>: Enter the password/secret for the client ID.<up to 4 name column names>: Enter the names of the columns in the input table to be used for the “name” element. Each input table column name should be enclosed in double-quotes. Enter a maximum of 4 name columns. If entering multiple column names, separate the column names with commas.<up to 7 address column names>: Enter the names of the columns in the input table to be used for the "address" element. Each input table column name should be enclosed in double-quotes. Enter a maximum of 7 address columns. If entering multiple column names, separate the column names with commas.<city column>: Enter the name of the column to be used for the "city" element.<state column>: Enter the name of the column to be used for the "state" element.<zipcode column>: Enter the name of the column to be used for the "zipcode" element.<phone column>: Enter the name of the column to be used for the "phone" element.<email column>: Enter the name of the column to be used for the "email" element.
Note
You can specify the number of RampIDs to be returned for each set of PII, to a maximum of 10 RampIDs per PII set. By default, 1 RampID will be returned per PII set. To change this, replace the “1” in the “as limit” line at the bottom of the SQL sample shown below with your desired number.
The metadata table configures how LiveRamp processes your input. Create exactly one row in the metadata table for the job (it does not accept multiple job configurations).
CREATE TABLE <catalog>.<schema>.pii_meta_table (
client_id STRING,
client_secret STRING,
execution_mode STRING,
execution_type STRING,
target_columns STRING,
limit INT
);
INSERT INTO <catalog>.<schema>.pii_meta_table VALUES (
'<your_client_id>' as client_id,
'<your_client_secret>' as client_secret,
'resolution' as execution_mode,
'PII' as execution_type,
'{"name":["FIRSTNAME","LASTNAME"],"streetAddress":["ADDRESSLINE","ADDRESSLINE2"],
"city":"CITY","state":"STATE","zipCode":"ZIPCODE","phone":"PHONE","email":"EMAIL"}' as target_columns,
1 as limit
);Set the Target Columns for the Metadata Table
The target_columns field maps your table's column names to LiveRamp's expected PII fields. Provide only the fields present in your table.
'{
"name": ["<first_name_col>", "<last_name_col>"], // up to 4 columns
"streetAddress": ["<address_line1_col>", "<address_line2_col>"], // up to 10 columns
"city": "<city_col>",
"state": "<state_col>",
"zipCode": "<zip_col>",
"phone": "<phone_col>",
"email": "<email_col>"
}'For example, if your columns are FIRST, LAST, ADDR, ADDR2, CITY, ST, ZIP, PHONE_NUM, EMAIL_ADDR, run the following SQL:
'{
"name": ["FIRST", "LAST"],
"streetAddress": ["ADDR", "ADDR2"],
"city": "CITY",
"state": "ST",
"zipCode": "ZIP",
"phone": "PHONE_NUM",
"email": "EMAIL_ADDR"
}'View the Output Table
The identity resolution results will be shared back to the specified recipient as an output table.
Each output table is retained for a maximum of 14 days or until you run another operation (which overwrites any existing output table).
To use the output table in your Databricks account, you’ll need to copy it before you run another operation or within 14 days (whichever comes first).
The PII resolution process passes the input table through a privacy filter which removes the PII and reswizzles the table (in addition to other operations). Because of this, any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.
Identity resolution of PII provides supplemental match metadata for additional insight into customer data that can provide powerful signals for making decisions based on RampIDs.
For PII resolution, the output table includes the fields shown in the table below.
Column | Example | Description |
|---|---|---|
| XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd | Returns the resolved RampID in your domain encoding. |
| Male | Any attribute columns passed through the service are returned. |
| 1 | Provides insight on the match cascade level associated with the identifiers. If no maintained RampID is found, this value will be "null". |
| name_phone | Returns the filter name where the match occurred, which will be one of the following options:
If no maintained RampID is found, this value will be "null". |
Privacy Filter
To minimize the risk of re-identification (the ability to tie PII directly to a RampID), the service includes the following processes when resolving PII identifiers (PII resolution or email-only resolution):
Column Values: The process evaluates each column value on a per-row basis for unique values. If any attribute occurs 3 or fewer times, the rows containing those column values will not be matchable and will not be returned in the output table.
>5% of the table unmatchable: If based on column value uniqueness, >5% of the file rows are unmatchable, the job will fail.
Number of Unique RampIDs: If fewer than 100 unique RampIDs would be returned, the job will fail.
Reswizzle full table: Upon completion, the full table will be reswizzled to return the rows
RampID | attribute_1 | attribute_2 | attribute_nin a different order than what was submitted in the input table.