
Getting Started with Identity Resolution in LiveRamp Clean Room

LiveRamp makes it safe and easy to connect data, and we've built our identity infrastructure capabilities into LiveRamp Clean Room to allow you to resolve and connect data directly where it lives. 

This capability ensures any node within LiveRamp Clean Room can collaborate with partners on RampIDs as needed, driving an enhanced match within clean room questions and enabling seamless activation of segments built within the clean room.

Note

  • Identity resolution in LiveRamp Clean Room is only available for Hybrid clean rooms.

  • You can also choose to do identity resolution using other methods that take place outside of LiveRamp Clean Room (such as using our Embedded Identity in Cloud Environments solution or resolving your data by uploading it into LiveRamp Connect). Talk to your LiveRamp representative to get more information on these options.

Overview

To execute data collaboration with enhanced matching on RampIDs, you first connect your universe dataset to LiveRamp Clean Room. A universe dataset is your entire set of data (PII, device data, etc.) that needs to be resolved and unified. This is typically your full customer dataset across CRM-based data, subscriber data, or transaction data. By using the full universe, LiveRamp is able to optimize the fit of our graph to your view of the customer, ensuring analytics use cases can be executed with minimal conflicts.

Your universe dataset will be connected at source. Data connections can be configured to any cloud-based storage location, including AWS, GCS, Azure Blob, Snowflake, Google BigQuery, and Databricks (for more information, see “Connect to Cloud-Based Data”).

The universe dataset is required to have a CID (custom identifier) for each row. This should represent how you define a customer within your own systems. 

Once you connect your universe dataset, LiveRamp uses the included identifiers for matching to create an additional linked dataset that maps the provided CIDs with their associated RampIDs. The CIDs in this additional dataset are MD5-hashed. This additional dataset is then used in clean room questions to allow joining datasets between partners on RampIDs.

The more identifier touchpoints you provide per CID, the higher the fidelity of the match and the greater the recognition rate with clean room partners will be. We recommend using plaintext PII (tied to CID) as the primary identifier whenever possible.

Once you’ve connected your universe dataset and the associated CID | RampID dataset has been created, you can create data connections for your other data types, such as attribute data, conversions data, or exposure data. For these datasets, you include a column of MD5-hashed CIDs and do not need to include any other identifiers.
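For example, producing a consistent MD5-hashed CID column can be done with any standard hashing library. The sketch below is an illustration in Python (the CID value is hypothetical, and the helper name is our own):

```python
import hashlib

def md5_cid(cid: str) -> str:
    """Return the MD5 hash of a CID as lowercase hexadecimal.

    Use the same encoding and casing for every dataset so that the
    hashed CIDs in your other datasets match the linked
    CID | RampID mapping dataset.
    """
    return hashlib.md5(cid.encode("utf-8")).hexdigest()

# Hypothetical example CID:
print(md5_cid("customer-000123"))
```

Whatever tooling you use, apply the identical hashing to the CID column of every non-universe dataset so the values join cleanly.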

The identity resolution process is refreshed monthly for your universe dataset, based on the date you configure (other datasets are always up to date because we access that data at source during question runs).

Overall Steps

Using identity resolution in LiveRamp Clean Room involves the following overall steps:

  1. You format your universe dataset.

  2. You create a data connection for your universe dataset.

  3. You perform field mapping for the universe dataset (which involves mapping the fields, adding metadata, and scheduling identity resolution).

  4. LiveRamp creates a linked dataset containing a mapping of MD5-hashed CIDs | RampIDs.

  5. You create additional data connections for other datasets (such as attribute data, conversions data, or exposure data), keyed off of MD5-hashed CIDs.

  6. You provision the required datasets to clean rooms.

  7. You and your partners create and run clean room questions that use the linked dataset and other datasets keyed off of MD5-hashed CIDs.

For more information on performing these steps, see the sections below.

Format a Universe Dataset

Before creating the data connection for your universe dataset, make sure it’s formatted correctly. 

The universe dataset should represent your full audience and should include all user identifiers (PII touchpoints and/or online identifiers) that will be used during identity resolution to resolve to RampIDs. 

LiveRamp uses this dataset to create a mapping between your CIDs and their associated RampIDs. This mapping lives in a linked dataset and allows you to use RampIDs as the join key between each partner's datasets in queries.

When formatting your universe dataset, multiple identifier types (including PII, hashed email, and MAIDs) can be included in the same dataset, and you can use any combination of these identifiers. See the table below for a list of the suggested columns for a universe dataset containing plaintext PII, hashed emails, and MAIDs.

For information on formatting and hashing identifiers, see “Formatting Identifiers”.

Note

  • When sending PII, provide as many PII touchpoints as possible so that LiveRamp’s identity resolution capabilities yield the best results.

  • You do not need to include columns for any identifiers that you’re not including in the dataset.

  • You do not need to include any attribute data columns (or any other non-identifier columns), since these will not be needed for identity resolution and will not be retained in the resulting CID | RampID mapping dataset. Removing attribute columns can help with faster processing times.

  • Your CRM dataset might also be able to function as a universe dataset.
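As an illustration, a minimal universe dataset containing plaintext PII might look like the sketch below, which writes two rows to a CSV file (all column values and the file name are hypothetical examples, not LiveRamp requirements):

```python
import csv

# Hypothetical universe dataset rows: one CID per row plus the PII
# touchpoints available for that customer. Columns you don't use
# can simply be omitted from the file.
rows = [
    {"cid": "cust-001", "first_name": "Jane", "last_name": "Doe",
     "address_1": "123 Main St", "city": "San Francisco", "state": "CA",
     "zip": "94105", "email": "jane.doe@example.com", "phone": "8668533267"},
    {"cid": "cust-002", "first_name": "John", "last_name": "Smith",
     "address_1": "456 Oak Ave", "city": "Austin", "state": "TX",
     "zip": "78701", "email": "john.smith@example.com", "phone": "(866) 853-3267"},
]

with open("universe_dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```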

Each suggested field is listed below with its recommended field name, field type, contents, whether values are required, and notes.

cid (string) — A unique user ID
Values required? Yes
  • LiveRamp uses the values in this field to resolve your data to RampIDs.
  • Plaintext CIDs are preferred. If you choose to hash the CIDs, make sure to use the same hashing type when sending CIDs in other data files or tables.

first_name (string) — Consumer’s first name
Values required? Yes (if Name and Postal is used as an identifier)
  • You can include separate first name and last name columns or you can combine first name and last name in one column (such as “name”).

last_name (string) — Consumer’s last name
Values required? Yes (if Name and Postal is used as an identifier)
  • You can include separate first name and last name columns or you can combine first name and last name in one column (such as “name”).

address_1 (string) — Consumer’s address
Values required? Yes (if Name and Postal is used as an identifier)
  • You can include separate address 1 and address 2 columns or you can combine all street address information in one column (such as “address”).

address_2 (string) — Consumer’s additional address information
Values required? No
  • Include values in this column if you have additional street address info for a given row.
  • You can include separate address 1 and address 2 columns or you can combine all street address information in one column (such as “address”).

city (string) — Consumer’s city
Values required? No
  • When matching on address, city is optional.

state (string) — Consumer’s state
Values required? No
  • When matching on address, state is optional.
  • Must be a two-character, capitalized abbreviation ("CA", not "California" or "Ca").

zip (string) — Consumer’s ZIP Code or postal code
Values required? Yes (if Name and Postal is used as an identifier)
  • Required when matching on addresses.
  • ZIP Codes can be in 5-digit format or 9-digit format (ZIP+4).

email (string) — Consumer’s best email address
Values required? Yes (if email is used as an identifier)
  • Plaintext emails only.
  • Only one plaintext email per input row is permitted. Other emails must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates.
  • All emails must meet these requirements:
    • Have characters before and after the "@" sign
    • Contain a period character (".")
    • Have characters after the period character
  • Examples of valid emails include:
    • a@a.com
    • A@A.COM
    • email@account.com
    • EMAIL@ACCOUNT.COM
    • email@sub.domain.com
    • EMAIL@SUB.DOMAIN.COM

sha1_email (string) — Consumer’s SHA-1-hashed email address
Values required? No
  • SHA-1-hashed emails only.
  • Email addresses should be lowercased and UTF-8 encoded prior to hashing.
  • After hashing, convert the resulting hash into lowercase hexadecimal representation.

sha256_email (string) — Consumer’s SHA-256-hashed email address
Values required? No
  • SHA-256-hashed emails only.
  • Email addresses should be lowercased and UTF-8 encoded prior to hashing.
  • After hashing, convert the resulting hash into lowercase hexadecimal representation.

md5_email (string) — Consumer’s MD5-hashed email address
Values required? No
  • MD5-hashed emails only.
  • Email addresses should be lowercased and UTF-8 encoded prior to hashing.
  • After hashing, convert the resulting hash into lowercase hexadecimal representation.

phone (string) — Consumer’s best phone number
Values required? Yes (if phone is used as an identifier)
  • Plaintext phone numbers only.
  • Only one phone number per input row is permitted. Other phone numbers must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates.
  • All phone numbers must meet these requirements:
    • Must contain exactly 10 digits after any leading "0" or "1" digits are removed
    • Can contain hyphens ("-"), parentheses ("(" or ")"), plus signs ("+"), and periods (".")
  • Examples of valid phone numbers include:
    • 8668533267
    • 866.853.3267
    • (866) 853-3267
    • +1 (866) 853-3267
    • +18668533267
    • 18668533267
    • 1111111118668533267
    • 08668533267
  • Examples of invalid phone numbers include:
    • 987654321 (fewer than 10 digits)
    • 98765432109 (11 digits with no leading "0" or "1")
    • 1234567890 (after removing the leading "1", only 9 digits remain)
    • 0987654321 (after removing the leading "0", only 9 digits remain)

maid (string) — Consumer's mobile device ID (MAID)
Values required? Yes (if MAIDs are used as identifiers)
  • Can be plaintext or SHA-1 hashed.
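The identifier requirements above can be checked and applied programmatically before you deliver the dataset. The sketch below is our own illustration of the stated email rules, the email hashing steps (lowercase, UTF-8 encode, hash, lowercase hex), and the phone number rules; it is not a LiveRamp library, and the function names are hypothetical:

```python
import hashlib

def is_valid_email(email: str) -> bool:
    """Check the stated rules: characters before and after "@",
    a "." somewhere, and characters after the final "."."""
    local, sep, domain = email.partition("@")
    if not (sep and local and domain):
        return False
    return "." in email and not email.endswith(".")

def hash_email(email: str, algorithm: str = "sha256") -> str:
    """Lowercase and UTF-8 encode before hashing ("sha1", "sha256",
    or "md5"); hexdigest() already yields lowercase hexadecimal."""
    return hashlib.new(algorithm, email.lower().encode("utf-8")).hexdigest()

def is_valid_phone(phone: str) -> bool:
    """Check the stated rules: after stripping allowed punctuation
    (the valid examples also include spaces), only digits remain,
    and exactly 10 digits remain once leading "0"/"1" digits are
    removed."""
    stripped = phone
    for ch in "-()+. ":
        stripped = stripped.replace(ch, "")
    if not stripped.isdigit():
        return False
    return len(stripped.lstrip("01")) == 10
```

For instance, `is_valid_phone("+1 (866) 853-3267")` passes, while `is_valid_phone("1234567890")` fails because only 9 digits remain after the leading "1" is removed.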

Create the Universe Dataset Data Connection

Once your universe dataset has been formatted, create the data connection to that dataset in LiveRamp Clean Room. Follow the instructions for your cloud provider in "Connect to Cloud-Based Data", making sure to use the appropriate article for a Hybrid clean room connection (rather than a cloud-native-pattern clean room).

When a connection is initially configured, it will show "Verifying Access" as the configuration status on the Data Connections page. Once the connection is confirmed and the status has changed to "Mapping Required" (usually within 4 hours), map the table's fields.

Perform Field Mapping

Once the connection is confirmed, you’ll perform mapping, which involves several individual steps:

  • Mapping the dataset fields

  • Adding metadata

  • Scheduling identity resolution

Once this process has been completed, the linked dataset containing the MD5-hashed CID | RampID mapping appears on the Data Connections page as a child element under the data connection you created in the previous step.

Map the Fields

During the field mapping step, you specify which columns to include in the identity resolution process:

LCR-Edit_Mapping-Map_Fields_step.png
  1. From the row for the newly created data connection, click the More Options menu (the three dots) and then click Edit Mapping.

  2. Slide the Include toggle to the right for your CID column and any identifier columns. 

    Note

    You do not need to include any attribute data columns (or any other non-identifier columns), since these will not be needed for identity resolution and will not be retained in the resulting CID | RampID mapping dataset. Removing attribute columns can help with faster processing times.

  3. Click Next to advance to the Add Metadata step.

Add Metadata

After you map the fields, you’ll add metadata for each field:

LCR-Edit_Mapping-Add_Metadata_step.png
  1. Slide the Enable RampID Resolution toggle to the right to enable the Identity Resolution process.

  2. For the column containing CIDs:

    1. Slide the User Identifier toggle to the right

    2. Select Customer First Party Identifier as the identifier type

  3. For columns containing identifiers:

    1. Slide the PII toggle to the right

    2. Slide the User Identifier toggle to the right

    3. Select the appropriate identifier type

  4. Click Next to advance to the Schedule Identity Resolution step.

Schedule Identity Resolution

Universe mappings are updated monthly and can be configured to run on specific dates as needed:

LCR-Edit_Mapping-ID_Resolution_step.png
  1. Enter the day of the month you’d like the dataset refresh to be performed.

  2. Enter the refresh start date or select it from the calendar.

  3. If needed, enter the refresh end date or select it from the calendar. 

    Note

    All dates use Coordinated Universal Time (UTC).

  4. Click Next to advance to the Review step.

  5. Once you’ve reviewed the information, click Save.

Once you’ve completed the steps above, the identity resolution job begins processing.

The data connection shows "Job Processing" as the configuration status, which indicates that the universe dataset is being processed into the hashed CID | RampID mapping. This status should only display for a few hours (no more than 10).

Once the configuration status changes to “Completed”, the linked dataset is displayed underneath your universe dataset data connection and the linked dataset is now ready to be provisioned to clean rooms.

LCR-Linked_Universe_Data_Connection.png

Note

Any job that shows a “Failed” status will include a message, displayed as a tooltip. Contact your LiveRamp account team or create a support case with the error message to troubleshoot the issue.

Create Data Connections for Other Datasets

If you haven't already done so, create data connections for your other datasets (such as CRM/attribute data, conversions data, or exposure data), keyed off of MD5-hashed CIDs. When these datasets are used in clean room questions, you'll be able to join them on MD5-hashed CID with your MD5-hashed CID | RampID dataset.

For more information on creating these data connections, see "Connect to Cloud-Based Data".

Provision Datasets to Clean Rooms

Once the above steps have been completed, you can provision the linked CID | RampID dataset to clean rooms. You can also then provision any additional datasets (keyed off of hashed CIDs).

Note

  • Do not provision the parent universe dataset (the dataset containing PII) to a clean room for a RampID workflow.

  • When creating a clean room with RampID as the join key, you will need to confirm that your organization meets certain requirements around the use of RampIDs by reviewing the linked document and confirming that you accept the terms.

    LCR-RampID_Attestation.png

When mapping the dataset, a toggle is available to allow for the use of RampID with the data connection. Turning it on kicks off the RampID-based configuration and workflow.

When you provision the linked CID | RampID dataset to the clean room, an acceptance box is displayed to confirm your agreement to use RampID as the join key.

LCR-Confirm_RampID_Join_Key.png

Clean rooms may only contain datasets containing hashed PII or datasets containing RampIDs, but not both. If you provision a dataset with RampIDs to a clean room, RampID will be selected as the join key and your confirmation will be logged.

Create and Run Questions

Once you’ve provisioned the necessary datasets to the clean room, you can use them in question runs.

When creating a question, use MD5-hashed CIDs as the join key between your MD5-hashed CID | RampID mapping dataset and your other datasets (keyed off of MD5-hashed CIDs). For a partner’s datasets, also use MD5-hashed CIDs as the join key between their MD5-hashed CID | RampID mapping dataset and their other datasets.

Then use RampIDs as the join key between your joined data and your partner’s joined data.
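Conceptually, this is a two-stage join. The sketch below illustrates the logic with in-memory lookups (all CIDs, RampIDs, and metric values are made up; in practice this logic is expressed in a clean room question, not in code you run yourself):

```python
import hashlib

def md5_hex(value: str) -> str:
    return hashlib.md5(value.encode("utf-8")).hexdigest()

# Stage 1: each party joins its own datasets on MD5-hashed CID.
# Your linked dataset maps MD5-hashed CID -> RampID (hypothetical values).
your_mapping = {md5_hex("cust-001"): "RampID_A", md5_hex("cust-002"): "RampID_B"}
your_conversions = {md5_hex("cust-001"): {"purchases": 3}}

# The partner's linked dataset and exposure data are keyed the same way.
partner_mapping = {md5_hex("p-9001"): "RampID_A", md5_hex("p-9002"): "RampID_C"}
partner_exposures = {md5_hex("p-9001"): {"impressions": 12}}

# Resolve each side's records to RampIDs via its own mapping dataset.
your_side = {ramp_id: your_conversions.get(cid_hash, {})
             for cid_hash, ramp_id in your_mapping.items()}
partner_side = {ramp_id: partner_exposures.get(cid_hash, {})
                for cid_hash, ramp_id in partner_mapping.items()}

# Stage 2: join the two sides on RampID.
matched = {ramp_id: {**your_side[ramp_id], **partner_side[ramp_id]}
           for ramp_id in your_side.keys() & partner_side.keys()}
print(matched)  # only RampID_A appears on both sides
```

The point of the sketch is the shape of the query: hashed CID is the join key within each party's data, and RampID is the join key across parties.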

For more information on creating and running questions, see “Question Builder”.