Getting Started with Identity Resolution in LiveRamp Clean Room (US Data)

LiveRamp makes it safe and easy to connect data, and we've built our identity infrastructure capabilities into LiveRamp Clean Room to allow you to resolve and connect data directly where it lives. 

This capability ensures data owners in LiveRamp Clean Room can collaborate with partners on LiveRamp Identity as needed, driving an enhanced match within clean room questions and enabling seamless activation of segments built within the clean room. These capabilities can power both marketing and advertising use cases.

Overview

Identity Resolution in LiveRamp Clean Room allows each customer to connect all of their CID-keyed datasets with their partners’ CID-keyed datasets in clean room questions, while LiveRamp uses RampIDs or Known IDs in the background to create the best possible match between the two sides.

To do this, each clean room question (query) uses a “mapping” dataset that maps a customer’s CIDs to a LiveRamp identifier (either Known IDs for marketing use cases or RampIDs for advertising use cases). The mapping datasets are then used to join each customer’s datasets with their partners’ datasets on the LiveRamp identifiers in clean room questions.

This allows each customer to keep their data organized around their own CIDs and join their CRM, conversion, exposure, or other datasets the same way they normally would. For clean room questions that use more than one customer’s data, the data is joined on the LiveRamp identifier to make those partner-to-partner connections more accurate and more interoperable. This gives customers a collaboration workflow that stays anonymous and privacy-conscious, while still making the combined data useful for overlap analysis, audience insights, measurement, and other advertising and marketing use cases.

To generate these mapping datasets, you connect your universe dataset to LiveRamp Clean Room. The universe dataset must include a CID (custom identifier) for each row, and the CID should represent how you define a customer within your own systems.

Note

  • A universe dataset is your entire set of data (PII, device data, etc.) that needs to be resolved and unified. This is typically your full customer dataset across CRM-based data, subscriber data, or transaction data. By using the full universe, LiveRamp is able to optimize the fit of our graph to your view of the customer, ensuring analytics use cases can be executed with minimal conflicts.

  • The more identifier touchpoints you provide per CID, the higher the fidelity of the match and the greater the recognition rate with clean room partners will be. We recommend using plaintext PII (tied to CID) as the primary identifier whenever possible.

Your universe dataset will be connected at source. Data connections can be configured to any cloud-based storage location, including AWS, GCS, Azure Blob, Snowflake, Google BigQuery, and Databricks (for more information, see the articles in the “Connect to Cloud-Based Data” section of our documentation site).

Once you connect your universe dataset, LiveRamp uses the included identifiers for matching to create two additional linked datasets that map the provided CIDs with their associated LiveRamp identifiers:

  • One dataset that maps your CIDs to Known IDs (for marketing use case questions)

  • One dataset that maps an MD5-hashed version of your CIDs to RampIDs (for advertising use case questions)

These mapping datasets are then used in clean room questions to allow joining datasets between partners on Known IDs or RampIDs.

You can then create data connections for your other datasets, such as attribute data, conversions data, or exposure data. For these datasets, you include a column of the appropriate type of CIDs (and do not need to include any other identifiers):

  • Marketing use cases: Additional datasets to be used in marketing use cases should be connected with the same type of CID that was included in your universe dataset so that those datasets can be joined with the CID | Known ID mapping dataset.

  • Advertising use cases: Additional datasets to be used in advertising use cases should be connected with an MD5-hashed version of the same type of CID that was included in your universe dataset so that they can be joined with the Hashed CID | RampID mapping dataset (the CIDs are MD5 hashed to maintain the pseudonymity of the RampIDs).
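
As a rough illustration of preparing a dataset for advertising use cases, the sketch below MD5-hashes the CID column of a few rows before the dataset is connected. The names `md5_hash_cid` and `hash_cid_column` and the sample rows are hypothetical. The sketch assumes the CID is hashed exactly as it appears in the universe dataset (UTF-8 encoded, with no trimming or case changes) and written as lowercase hexadecimal. That convention is an assumption, not a documented requirement, so confirm the expected format with LiveRamp, because the hashes must match the values in the Hashed CID | RampID mapping dataset.

```python
import hashlib


def md5_hash_cid(cid: str) -> str:
    """Return the lowercase hex MD5 digest of a CID.

    Assumption: the CID is hashed unmodified as UTF-8 bytes.
    """
    return hashlib.md5(cid.encode("utf-8")).hexdigest()


def hash_cid_column(rows, cid_field="cid"):
    """Yield a copy of each row with the CID replaced by its MD5 hash."""
    for row in rows:
        hashed = dict(row)  # copy so the input rows are left untouched
        hashed[cid_field] = md5_hash_cid(row[cid_field])
        yield hashed


# Hypothetical exposure rows keyed off of plaintext CIDs.
exposures = [
    {"cid": "CUST-0001", "campaign": "spring_sale"},
    {"cid": "CUST-0002", "campaign": "spring_sale"},
]
hashed_exposures = list(hash_cid_column(exposures))
```

Only the CID column changes; every other column is carried through as is.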

Identity resolution is rerun on your universe dataset monthly, on the day of the month you configure. Other datasets are always up to date because LiveRamp accesses that data at source during question runs.

Overall Steps

Using identity resolution in LiveRamp Clean Room involves the following overall steps:

  1. You format your universe dataset.

  2. You create a data connection for your universe dataset.

  3. You perform field mapping for the universe dataset (which involves mapping the fields, adding metadata, and scheduling identity resolution).

  4. LiveRamp creates two linked mapping datasets:

    • A mapping of CIDs | Known IDs

    • A mapping of MD5-hashed CIDs | RampIDs

  5. You create additional data connections for other datasets (such as attribute data, conversions data, or exposure data), keyed off of the appropriate type of CIDs (depending on your use case and the type of CIDs included in the universe dataset).

  6. You provision the required datasets to clean rooms.

  7. You and your partners create and run clean room questions that use the linked mapping datasets and other datasets keyed off of CIDs (for marketing use cases) or MD5-hashed CIDs (for advertising use cases).

For more information on performing these steps, see the sections below.

Format a Universe Dataset (US Data)

Before creating the data connection for your US data universe dataset, make sure it’s formatted correctly. 

Note

For information on formatting a non-US data universe dataset, see “Format a Universe Dataset (Non-US Data)”.

The universe dataset should represent your full audience and should include all user identifiers (PII touchpoints and/or online identifiers) that will be used during identity resolution to resolve to Known IDs and RampIDs. 

Note

Only PII touchpoints (such as name, postal, email, and phone) will be used to resolve to Known IDs. PII touchpoints and online identifiers (such as cookies, MAIDs, and IP addresses) will be used to resolve to RampIDs.

LiveRamp uses this dataset to create a mapping between your CIDs and their associated LiveRamp identifiers (Known IDs or RampIDs). These mappings live in two linked datasets and allow you to use Known IDs (for marketing use cases) or RampIDs (for advertising use cases) as the join key between each partner's datasets in queries.

When formatting your universe dataset, you can include multiple identifier types (including plaintext PII, hashed emails, and MAIDs) in the same dataset, in any combination. See the table below for a list of the suggested columns for a universe dataset containing plaintext PII, hashed emails, and MAIDs.

For information on formatting and hashing identifiers, see “Formatting Identifiers”.
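
The hashed email columns described in the table below ask for emails that are lowercased and UTF-8 encoded before hashing, with the result written as lowercase hexadecimal. A minimal sketch of that preparation follows. The function name `hash_email` is hypothetical, and trimming surrounding whitespace is an extra assumption that the table does not state.

```python
import hashlib

SUPPORTED_ALGORITHMS = {"sha1", "sha256", "md5"}


def hash_email(email: str, algorithm: str) -> str:
    """Lowercase, UTF-8 encode, and hash an email; return lowercase hex."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    # Assumption: surrounding whitespace is stripped before hashing.
    normalized = email.strip().lower()
    digest = hashlib.new(algorithm, normalized.encode("utf-8"))
    return digest.hexdigest()  # hexdigest() is already lowercase hex


# Example: compute all three hash types for one address.
hashes = {name: hash_email("EMAIL@ACCOUNT.COM", name)
          for name in sorted(SUPPORTED_ALGORITHMS)}
```

Because the address is lowercased first, `EMAIL@ACCOUNT.COM` and `email@account.com` produce the same hash.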

Note

  • When sending PII, provide as many PII touchpoints as possible so that LiveRamp’s identity resolution capabilities yield the best results.

  • You do not need to include columns for any identifiers that you’re not including in the dataset.

  • You do not need to include any attribute data columns (or any other non-identifier columns), since these will not be needed for identity resolution and will not be retained in the resulting CID | Known ID and Hashed CID | RampID mapping datasets. Removing attribute columns can help with faster processing times.

  • Your CRM dataset might also be able to function as a universe dataset.

  • Datasets that will be used in identity resolution must not contain BOM characters. For more information, see “Removing BOM Characters”.
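
One way to check a file for a byte order mark before connecting it is sketched below. It handles only the UTF-8 BOM (the bytes EF BB BF) at the start of a file, which is an assumption about the most common case. The function names are hypothetical, and the approach in “Removing BOM Characters” may differ.

```python
import codecs
import shutil


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def remove_bom_from_file(src_path: str, dst_path: str) -> bool:
    """Copy src to dst without a leading UTF-8 BOM.

    Returns True if a BOM was found and dropped. The file is streamed,
    so large universe files are not loaded into memory at once.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        head = src.read(len(codecs.BOM_UTF8))
        had_bom = head == codecs.BOM_UTF8
        if not had_bom:
            dst.write(head)  # keep the first bytes when they are real data
        shutil.copyfileobj(src, dst)
    return had_bom
```

The return value lets a calling script log which files needed cleaning.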

Field Contents | Recommended Field Name | Field Type | Values Required? | Description/Notes (listed beneath each row)

A unique user ID | cid | string | Yes

  • LiveRamp maps the values in this field to their associated LiveRamp identifiers (Known IDs and RampIDs).

  • Plaintext CIDs are preferred. If you choose to hash the CIDs, make sure to use the same hashing type when including CIDs in other datasets.

Consumer’s first name | first_name | string | Yes (if Name and Postal is used as an identifier)

  • You can include separate first name and last name columns or you can combine first name and last name in one column (such as “name”).

Consumer’s last name | last_name | string | Yes (if Name and Postal is used as an identifier)

  • You can include separate first name and last name columns or you can combine first name and last name in one column (such as “name”).

Consumer’s address | address_1 | string | Yes (if Name and Postal is used as an identifier)

  • You can include separate address 1 and address 2 columns or you can combine all street address information in one column (such as “address”).

Consumer’s additional address information | address_2 | string | No

  • Include values in this column if you have additional street address info for a given row.

  • You can include separate address 1 and address 2 columns or you can combine all street address information in one column (such as “address”).

Consumer’s city | city | string | No

  • When matching on address, city is optional.

Consumer’s state | state | string | No

  • When matching on address, state is optional.

  • Must be a two-character, capitalized abbreviation ("CA", not "California" or "Ca").

Consumer’s ZIP Code or postal code | zip | string | Yes (if Name and Postal is used as an identifier)

  • Required when matching on addresses.

  • ZIP Codes can be in 5-digit format or 9-digit format (ZIP+4).

Consumer’s best email address | email | string | Yes (if email is used as an identifier)

  • Plaintext emails only.

  • Only one plaintext email per input row is permitted. Other emails must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates.

  • All emails must meet these requirements:

    • Have characters before and after the "@" sign

    • Contain a period character (".")

    • Have characters after the period character

  • Examples of valid emails include:

    • a@a.com

    • A@A.COM

    • email@account.com

    • EMAIL@ACCOUNT.COM

    • email@sub.domain.com

    • EMAIL@SUB.DOMAIN.COM

Consumer’s SHA-1-hashed email address | sha1_email | string | No

  • SHA-1-hashed emails only.

  • Email addresses should be lowercased and UTF-8 encoded prior to hashing.

  • After hashing, convert the resulting hash into lowercase hexadecimal representation.

Consumer’s SHA-256-hashed email address | sha256_email | string | No

  • SHA-256-hashed emails only.

  • Email addresses should be lowercased and UTF-8 encoded prior to hashing.

  • After hashing, convert the resulting hash into lowercase hexadecimal representation.

Consumer’s MD5-hashed email address | md5_email | string | No

  • MD5-hashed emails only.

  • Email addresses should be lowercased and UTF-8 encoded prior to hashing.

  • After hashing, convert the resulting hash into lowercase hexadecimal representation.

Consumer’s best phone number | phone | string | Yes (if phone is used as an identifier)

  • Plaintext phone numbers only.

  • Only one phone number per input row is permitted. Other phone numbers must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates.

  • All phone numbers must meet these requirements:

    • Can be longer than 10 digits only if all of the leading digits before the final 10 are “0” or “1”

    • If no leading “0” or “1” digits are used, must be exactly 10 digits long

    • Can contain hyphens ("-"), parentheses ("(" or ")"), plus signs ("+"), and periods (".")

  • Examples of valid phone numbers include:

    • 8668533267

    • 866.853.3267

    • (866) 853-3267

    • +1 (866) 853-3267

    • +18668533267

    • 18668533267

    • 1111111118668533267

    • 08668533267

  • Examples of invalid phone numbers include:

    • 987654321 (fewer than 10 digits)

    • 98765432109 (more than 10 digits, and the extra leading digit is not “0” or “1”)

    • 1234567890 (after removing the leading "1", fewer than 10 digits remain)

    • 0987654321 (after removing the leading "0", fewer than 10 digits remain)

Consumer's mobile device ID (MAID) | maid | string | Yes (if MAIDs are used as identifiers)

  • Can be plaintext or SHA-1 hashed.

  • These are only used to generate RampIDs for the hashed CID | RampID dataset.
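
The email, phone, state, and ZIP rules in the table above can be pre-checked before a universe dataset is connected. The sketch below is one hedged reading of those rules, not LiveRamp's own validation logic, and all function names are hypothetical. It assumes the required period must appear after the "@" sign, that spaces are acceptable in phone numbers (as the valid examples suggest), and that ZIP+4 values may be written with or without a hyphen.

```python
import re


def is_valid_email(email: str) -> bool:
    """Check only the three listed rules; this is not full RFC validation."""
    local, at, domain = email.partition("@")
    if not (at and local and domain):
        return False  # needs characters before and after "@"
    # Assumption: the period must appear in the part after "@".
    _, dot, after_dot = domain.rpartition(".")
    return bool(dot and after_dot)  # a period followed by characters


def normalize_phone(phone: str):
    """Return the final 10 digits, or None if the value breaks the rules."""
    # Digits plus the listed punctuation; allowing spaces is an assumption.
    if not re.fullmatch(r"[0-9().+\- ]+", phone):
        return None
    digits = re.sub(r"[^0-9]", "", phone)
    core = digits.lstrip("01")  # leading "0" and "1" digits are ignored
    return core if len(core) == 10 else None


def is_valid_state(state: str) -> bool:
    """Two capital letters; does not check that the state exists."""
    return re.fullmatch(r"[A-Z]{2}", state) is not None


def is_valid_zip(zip_code: str) -> bool:
    """Five digits, optionally followed by four more (ZIP+4)."""
    return re.fullmatch(r"[0-9]{5}(-?[0-9]{4})?", zip_code) is not None
```

With this reading, `normalize_phone("+1 (866) 853-3267")` yields the bare 10-digit number, while `normalize_phone("1234567890")` yields `None` because only nine digits remain after the leading "1" is removed, matching the invalid examples in the table.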

Create the Universe Dataset Data Connection

Once your universe dataset has been formatted, create the data connection to that dataset in LiveRamp Clean Room. Follow the instructions for your cloud provider in "Connect to Cloud-Based Data", making sure to use the appropriate article for a Hybrid clean room connection (rather than a cloud native-pattern clean room).

When a connection is initially configured, it will show "Verifying Access" as the configuration status on the Data Connections page. Once the connection is confirmed and the status has changed to "Mapping Required" (usually within 4 hours), map the table's fields.

Perform Field Mapping

Once the connection is confirmed, you perform field mapping. This process involves several individual steps:

  • Mapping the dataset fields

  • Adding metadata

  • Scheduling identity resolution

Once this process has been completed, the linked datasets containing the CID | Known ID mapping and the MD5-hashed CID | RampID mapping appear on the Data Connections page as child elements under the data connection you created in the previous step.

Map the Fields

During the field mapping step, you specify which columns to include in the identity resolution process:

  1. From the row for the newly created data connection, click the More Options menu (the three dots) and then click Edit Mapping.

  2. Slide the Include toggle to the right for your CID column and any identifier columns. 

    Note

    You do not need to include any attribute data columns (or any other non-identifier columns), since these will not be needed for identity resolution and will not be retained in the resulting mapping datasets. Removing attribute columns can help with faster processing times.

  3. Click Next to advance to the Add Metadata step.

Add Metadata

After you map the fields, you’ll add metadata for each field:

  1. Slide the Enable Identity Resolution toggle to the right to enable the Identity Resolution process.

  2. For the column containing CIDs:

    1. Slide the User Identifier toggle to the right

    2. Select Customer First Party Identifier as the identifier type

  3. For columns containing identifiers:

    1. Slide the PII toggle to the right

    2. Slide the User Identifier toggle to the right

    3. Select the appropriate identifier type

  4. Click Next to advance to the Schedule Identity Resolution step.

Schedule Identity Resolution

Universe mappings are updated monthly and can be configured to run on specific dates as needed:

  1. Enter the day of the month you’d like the dataset refresh to be performed.

  2. Enter the refresh start date or select it from the calendar.

  3. If needed, enter the refresh end date or select it from the calendar. 

    Note

    All dates use Coordinated Universal Time (UTC).

  4. Click Next to advance to the Review step.

  5. Once you’ve reviewed the information, click Save.

Once you’ve completed the steps above, the identity resolution job begins processing.

The configuration status for the data connection changes to “Job Processing”, which indicates that the universe dataset is being processed into the CID | Known ID and hashed CID | RampID mappings. This status should display for only a few hours (no more than 10 hours).

Once the configuration status changes to “Completed”, the linked datasets are displayed underneath your universe dataset data connection and are ready to be provisioned to clean rooms.

Note

Any job that shows a “Failed” status will include a message, displayed as a tooltip. Contact your LiveRamp account team or create a support case with the error message to troubleshoot the issue.

Create Data Connections for Other Datasets

If you haven't already done so, create data connections for your other datasets (such as CRM/attribute data, conversions data, or exposure data), keyed off of CIDs in the format used in the universe dataset (for marketing workflows) or MD5-hashed CIDs (for advertising workflows). When these datasets are used in clean room questions, you'll be able to join them on the CIDs in the appropriate mapping dataset, depending on whether you’re using a marketing workflow or an advertising workflow.

For more information on creating these data connections, see "Connect to Cloud-Based Data".

Provision Datasets to Clean Rooms

Once the above steps have been completed, you can provision the linked mapping datasets to clean rooms. You can also then provision any additional datasets (keyed off of CIDs or hashed CIDs).

Note

  • Do not provision the parent universe dataset (the dataset containing PII) to a clean room for a RampID or Known ID workflow.

  • When creating a clean room with RampID as the join key, you will need to confirm that your organization meets certain requirements around the use of RampIDs by reviewing the linked document and confirming that you accept the terms.

When you provision the linked MD5-hashed CID | RampID dataset to the clean room, an acceptance box is displayed to confirm your agreement to use RampID as the join key.

Clean rooms may contain datasets containing hashed PII and datasets containing RampIDs, but you cannot use a PII dataset and a RampID dataset in the context of the same question.

Create and Run Questions

Once you’ve provisioned the necessary datasets to the clean room, you can use them in question runs.

When creating a question:

  • For marketing use cases, use the CIDs with the CID | Known ID mapping to join across your datasets. Then use Known IDs as the join key between your joined data and your partner’s joined data.

  • For advertising use cases, use the MD5-hashed CIDs with the hashed CID | RampID mapping to join across your datasets. Then use RampIDs as the join key between your joined data and your partner’s joined data.
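
The advertising join pattern described above can be mimicked locally with Python's built-in `sqlite3` module. This is only a sketch of the join shape. The table names, column names, and all values (including the placeholder hashed CIDs and RampIDs) are hypothetical, and the SQL that a real clean room question uses may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Advertiser side: mapping dataset plus a conversions dataset.
    CREATE TABLE adv_mapping (hashed_cid TEXT, rampid TEXT);
    CREATE TABLE adv_conversions (hashed_cid TEXT, order_value REAL);
    -- Partner side: mapping dataset plus an exposures dataset.
    CREATE TABLE pub_mapping (hashed_cid TEXT, rampid TEXT);
    CREATE TABLE pub_exposures (hashed_cid TEXT, campaign TEXT);

    INSERT INTO adv_mapping VALUES
        ('adv_hash_1', 'RAMPID_1'), ('adv_hash_2', 'RAMPID_2'),
        ('adv_hash_3', 'RAMPID_3');
    INSERT INTO adv_conversions VALUES
        ('adv_hash_1', 50.0), ('adv_hash_2', 30.0), ('adv_hash_3', 20.0);
    INSERT INTO pub_mapping VALUES
        ('pub_hash_1', 'RAMPID_1'), ('pub_hash_2', 'RAMPID_2'),
        ('pub_hash_9', 'RAMPID_9');
    INSERT INTO pub_exposures VALUES
        ('pub_hash_1', 'spring'), ('pub_hash_2', 'spring'),
        ('pub_hash_9', 'summer');
""")

# Each side joins its own data to its mapping on hashed CIDs; the two
# sides then meet on RampID, never on each other's CIDs.
OVERLAP_QUERY = """
    SELECT e.campaign,
           COUNT(DISTINCT am.rampid) AS converters,
           SUM(c.order_value)        AS revenue
    FROM adv_conversions AS c
    JOIN adv_mapping     AS am ON am.hashed_cid = c.hashed_cid
    JOIN pub_mapping     AS pm ON pm.rampid = am.rampid
    JOIN pub_exposures   AS e  ON e.hashed_cid = pm.hashed_cid
    GROUP BY e.campaign
    ORDER BY e.campaign
"""
rows = conn.execute(OVERLAP_QUERY).fetchall()
```

The marketing pattern has the same shape, with plaintext CIDs and the CID | Known ID mapping in place of hashed CIDs and RampIDs. If one LiveRamp identifier maps to several CIDs on either side, joins like this can multiply rows, so sums may need deduplication in practice.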

For more information on creating and running questions, see “Question Builder”.