Getting Started with Identity Resolution in LiveRamp Clean Room (Non-US Data)

LiveRamp makes it safe and easy to connect data, and we've built our identity infrastructure capabilities into LiveRamp Clean Room to allow you to resolve and connect data directly where it lives. 

This capability ensures data owners in LiveRamp Clean Room can collaborate with partners on RampIDs as needed, driving an enhanced match within clean room questions and enabling seamless activation of segments built within the clean room.

Overview

To execute data collaboration with enhanced matching on RampIDs for non-US data, you first connect your universe dataset to LiveRamp Clean Room. A universe dataset is your entire set of data (PII or hashed email addresses) that needs to be resolved and unified. This is typically your full customer dataset across CRM-based data, subscriber data, or transaction data.

However, for non-US datasets, you can connect the dataset that is most relevant to your use case (even if it is not your full universe) to optimize the match against LiveRamp RampIDs.

Note

For the purposes of this article, we’ll continue to use “universe dataset” to refer to the dataset that you configure for identity resolution in Clean Room, even if you do not end up connecting your entire universe for your Clean Room use case.

Your universe dataset will be connected at source. Data connections can be configured to any cloud-based storage location, including AWS, GCS, Azure Blob, Snowflake, Google BigQuery, and Databricks (for more information, see “Connect to Cloud-Based Data”).

The universe dataset is required to have a CID (custom identifier) for each row. This should represent how you define a customer within your own systems. 

Once you connect your universe dataset, LiveRamp uses the included identifiers for matching to create an additional linked dataset that maps the provided CIDs with their associated RampIDs. The CIDs in this additional dataset are MD5-hashed to maintain the pseudonymity of the RampIDs. This additional dataset is then used in clean room questions to allow joining datasets between partners on RampIDs.

Once you’ve connected your universe dataset and the associated CID | RampID dataset has been created, you can create data connections for your other data types, such as attribute data, conversions data, or exposure data. For these datasets, you include a column of MD5-hashed CIDs and do not need to include any other identifiers.
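
The sketch below is a minimal, hypothetical illustration (not a LiveRamp tool) of preparing such a dataset in Python: the dataset and column names are invented, pandas and hashlib are assumed to be available, and the only point shown is MD5-hashing a CID column to lowercase hexadecimal so it can later be joined to the MD5-hashed CID | RampID mapping dataset.

    import hashlib

    import pandas as pd

    def md5_hex(value: str) -> str:
        """Return the lowercase hexadecimal MD5 digest of a UTF-8-encoded string."""
        return hashlib.md5(value.encode("utf-8")).hexdigest()

    # Hypothetical attribute dataset keyed off the same CIDs used in the universe dataset.
    attributes = pd.DataFrame({
        "cid": ["cust-001", "cust-002"],
        "loyalty_tier": ["gold", "silver"],
    })

    # Replace plaintext CIDs with their MD5 hashes so this column can be joined
    # to the MD5-hashed CID | RampID mapping dataset in clean room questions.
    attributes["cid"] = attributes["cid"].astype(str).map(md5_hex)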

The identity resolution process runs monthly on your universe dataset, based on the date you configure (your other datasets are always up to date because we access that data at source during question runs).

Overall Steps

Using identity resolution in LiveRamp Clean Room involves the following overall steps:

  1. You format your universe dataset.

  2. You create a data connection for your universe dataset.

  3. You perform field mapping for the universe dataset (which involves mapping the fields, adding metadata, and scheduling identity resolution).

  4. LiveRamp creates a linked dataset containing a mapping of MD5-hashed CIDs | RampIDs.

  5. You create additional data connections for other datasets (such as attribute data, conversions data, or exposure data), keyed off of MD5-hashed CIDs.

  6. You provision the required datasets to clean rooms.

  7. You and your partners create and run clean room questions that use the linked dataset and other datasets keyed off of MD5-hashed CIDs.

For more information on performing these steps, see the sections below.

Format a Universe Dataset (Non-US Data)

Before creating the data connection for your non-US data universe dataset, make sure it’s formatted correctly. 

Note

For information on formatting a US data universe dataset, see “Format a Universe Dataset (US Data)”.

The universe dataset should represent your full audience and should include all user identifiers (PII touchpoints) that will be used during identity resolution to resolve to derived RampIDs. However, you can connect the dataset that is most relevant to your use case (even if it is not your full universe) to optimize the match against LiveRamp RampIDs.

LiveRamp uses this dataset to create a mapping between your CIDs and their associated RampIDs. This mapping lives in a linked dataset and allows you to use RampIDs as the join key between each partner's datasets in queries.

When formatting your universe dataset, multiple identifier types (including PII and hashed email) can be included in the same dataset. The examples below can be used for the specific situations listed, but you can create a dataset that uses any combination of these identifiers. See the list below for the suggested columns for a universe dataset containing plaintext PII and hashed emails.

Recommended best practice is to align on one identifier type to generate derived RampIDs and to advise partners to do the same. For best results, sending only plaintext emails is recommended: LiveRamp’s normalization and resolution process also covers the corresponding hashed emails, optimizing match results with partners.

For information on formatting and hashing identifiers, see “Formatting Identifiers”.

Note

  • You do not need to include columns for any identifiers that you’re not including in the dataset.

  • You do not need to include any attribute data columns (or any other non-identifier columns), since these will not be needed for identity resolution and will not be retained in the resulting CID | RampID mapping dataset.

A unique user ID
Recommended Field Name: cid; Field Type: string; Values Required? Yes

  • LiveRamp uses the values in this field to resolve your data to RampIDs.

  • Plaintext CIDs are preferred. If you choose to hash the CIDs, make sure to use the same hashing type when sending CIDs in other data files or tables.

Consumer’s first name
Recommended Field Name: first_name; Field Type: string; Values Required? Yes (if Name and Postal is used as an identifier)

  • You can include separate first name and last name columns or you can combine first name and last name in one column (such as “name”).

Consumer’s last name
Recommended Field Name: last_name; Field Type: string; Values Required? Yes (if Name and Postal is used as an identifier)

  • You can include separate first name and last name columns or you can combine first name and last name in one column (such as “name”).

Consumer’s post code
Recommended Field Name: zip; Field Type: string; Values Required? Yes (if Name and Postal is used as an identifier)

  • Do not include special characters or dashes.

Consumer’s best email address
Recommended Field Name: email; Field Type: string; Values Required? Yes (if email is used as an identifier)

  • Plaintext emails only.

  • Only one plaintext email per input row is permitted. Other emails must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates.

  • All emails must meet these requirements:

    • Have characters before and after the "@" sign

    • Contain a period character (".")

    • Have characters after the period character

  • Examples of valid emails include:

    • a@a.com

    • A@A.COM

    • email@account.com

    • EMAIL@ACCOUNT.COM

    • email@sub.domain.com

    • EMAIL@SUB.DOMAIN.COM

Consumer’s SHA-1-hashed email address
Recommended Field Name: sha1-email; Field Type: string; Values Required? No

  • SHA-1-hashed emails only.

  • Email addresses should be lowercased and UTF-8 encoded prior to hashing.

  • After hashing, convert the resulting hash into lowercase hexadecimal representation.

Consumer’s SHA-256-hashed email address
Recommended Field Name: sha256_email; Field Type: string; Values Required? No

  • SHA-256-hashed emails only.

  • Email addresses should be lowercased and UTF-8 encoded prior to hashing.

  • After hashing, convert the resulting hash into lowercase hexadecimal representation.

Consumer’s MD5-hashed email address
Recommended Field Name: md5_email; Field Type: string; Values Required? No

  • MD5-hashed emails only.

  • Email addresses should be lowercased and UTF-8 encoded prior to hashing.

  • After hashing, convert the resulting hash into lowercase hexadecimal representation.

Consumer’s best phone number
Recommended Field Name: phone; Field Type: string; Values Required? Yes (if phone is used as an identifier)

  • Plaintext phone numbers only.

  • Only one phone number per input row is permitted. Other phone numbers must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates.

  • Follow the ITU-T E.164 format for phone numbers shown below:

    • The structure should be “+ [Country Code][Area Code][Subscriber Number]”

    • Maximum length is 15 digits.

    • The number should not contain spaces, parentheses, or dashes. It should only include the plus sign, country code, area code, and subscriber number.

    • The plus sign is used as a prefix to indicate an international number and replaces the international call prefix.

    • Country code (CC) is the 1 to 3 digit code assigned to each country.

    • The National Destination Code (NDC or “area code”) identifies a specific area or region within the country.

    • The subscriber number (SN) is the individual's unique phone number.
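
As an illustration of the normalization and hashing rules above, the following Python sketch (standard library only; sample values and the default country code are hypothetical) lowercases an email before hashing, produces lowercase hexadecimal digests, applies a basic validity check, and reduces a phone number to the ITU-T E.164 form.

    import hashlib
    import re

    def is_plausible_email(email: str) -> bool:
        """Basic plausibility check loosely following the email rules above."""
        return bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s.]+", email.strip()))

    def hash_email(email: str) -> dict:
        """Lowercase and UTF-8 encode an email, then return lowercase hexadecimal digests."""
        normalized = email.strip().lower().encode("utf-8")
        return {
            "sha1_email": hashlib.sha1(normalized).hexdigest(),
            "sha256_email": hashlib.sha256(normalized).hexdigest(),
            "md5_email": hashlib.md5(normalized).hexdigest(),
        }

    def to_e164(raw_phone: str, country_code: str = "44") -> str:
        """Strip spaces, parentheses, and dashes, then prefix '+' and a country code (example default)."""
        digits = re.sub(r"\D", "", raw_phone)
        if raw_phone.strip().startswith("+"):
            return "+" + digits
        return "+" + country_code + digits.lstrip("0")

    print(is_plausible_email("email@sub.domain.com"))      # True
    print(hash_email("Email@Account.com")["sha256_email"])  # lowercase hex digest
    print(to_e164("(020) 7946 0958"))                        # +442079460958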

Create the Universe Dataset Data Connection

Once your universe dataset has been formatted, create the data connection to that dataset in LiveRamp Clean Room. Follow the instructions for your cloud provider in "Connect to Cloud-Based Data", making sure to use the appropriate article for a Hybrid clean room connection (rather than a cloud-native pattern clean room).

When a connection is initially configured, it will show "Verifying Access" as the configuration status on the Data Connections page. Once the connection is confirmed and the status has changed to "Mapping Required" (usually within 4 hours), map the table's fields.

Perform Field Mapping

Once the connection is confirmed, you’ll perform mapping. This process involves several individual steps:

  • Mapping the dataset fields

  • Adding metadata

  • Scheduling identity resolution

Once this process has been completed, the linked dataset containing the MD5-hashed CID | RampID mapping appears on the Data Connections page as a child element under the data connection you created in the previous step.

Map the Fields

During the field mapping step, you specify which columns to include in the identity resolution process:

[Image: Edit Mapping - Map Fields step]
  1. From the row for the newly created data connection, click the More Options menu (the three dots) and then click Edit Mapping.

  2. Slide the Include toggle to the right for your CID column and any identifier columns. 

    Note

    You do not need to include any attribute data columns (or any other non-identifier columns), since these will not be needed for identity resolution and will not be retained in the resulting CID | RampID mapping dataset. Removing attribute columns can help with faster processing times.

  3. Click Next to advance to the Add Metadata step.

Add Metadata

After you map the fields, you’ll add metadata for each field:

[Image: Edit Mapping - Add Metadata step]
  1. Slide the Enable RampID Resolution toggle to the right to enable the Identity Resolution process.

  2. For the column containing CIDs:

    1. Slide the User Identifier toggle to the right

    2. Select Customer First Party Identifier as the identifier type

  3. For columns containing identifiers:

    1. Slide the PII toggle to the right

    2. Slide the User Identifier toggle to the right

    3. Select the appropriate identifier type

  4. Click Next to advance to the Schedule Identity Resolution step.

Schedule Identity Resolution

Universe mappings are updated monthly and can be configured to run on specific dates as needed:

[Image: Edit Mapping - Schedule Identity Resolution step]
  1. Enter the day of the month you’d like the dataset refresh to be performed.

  2. Enter the refresh start date or select it from the calendar.

  3. If needed, enter the refresh end date or select it from the calendar. 

    Note

    All dates use Coordinated Universal Time (UTC).

  4. Click Next to advance to the Review step.

  5. Once you’ve reviewed the information, click Save.

Once you’ve completed the steps above, the identity resolution job begins processing.

The data connection’s configuration status shows “Job Processing”, which indicates that the universe dataset is being processed into the hashed CID | derived RampID mapping. This status should only display for a few hours (no more than 10).

Once the configuration status changes to “Completed”, the linked dataset is displayed underneath your universe dataset data connection and the linked dataset is now ready to be provisioned to clean rooms.

[Image: Linked universe data connection]

Note

Any job that shows a “Failed” status will include a message, displayed as a tooltip. Contact your LiveRamp account team or create a support case with the error message to troubleshoot the issue.

Create Data Connections for Other Datasets

If you haven't already done so, create data connections for your other datasets (such as CRM/attribute data, conversions data, or exposure data), keyed off of MD5-hashed CIDs. When these datasets are used in clean room questions, you'll be able to join them on MD5-hashed CID with your MD5-hashed CID | RampID dataset.

For more information on creating these data connections, see "Connect to Cloud-Based Data".

Provision Datasets to Clean Rooms

Once the above steps have been completed, you can provision the linked CID | RampID dataset to clean rooms. You can also then provision any additional datasets (keyed off of hashed CIDs).

Note

  • Do not provision the parent universe dataset (the dataset containing PII) to a clean room for a RampID workflow.

  • When creating a clean room with RampID as the join key, you will need to confirm that your organization meets certain requirements around the use of RampIDs by reviewing the linked document and confirming that you accept the terms.

    [Image: RampID attestation]

When mapping the dataset, a toggle is available to allow the use of RampID with the data connection. Turning it on kicks off the RampID-based configuration and workflow.

When you provision the linked CID | RampID dataset to the clean room, an acceptance box is displayed to confirm your agreement to use RampID as the join key.

[Image: Confirm RampID join key]

A clean room may contain both datasets containing hashed PII and datasets containing RampIDs, but you cannot use a PII dataset and a RampID dataset in the same question.

Create and Run Questions

Once you’ve provisioned the necessary datasets to the clean room, you can use them in question runs.

When creating a question, use MD5-hashed CIDs as the join key between your MD5-hashed CID | RampID mapping dataset and your other datasets (keyed off of MD5-hashed CIDs). For a partner’s datasets, also use MD5-hashed CIDs as the join key between their MD5-hashed CID | RampID mapping dataset and their other datasets.

Then use RampIDs as the join key between your joined data and your partner’s joined data.
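
The following pandas sketch is only an illustration of that join structure (it is not the clean room query interface); all dataset names, column names, and values (such as hashed_cid, rampid, and the sample rows) are hypothetical.

    import pandas as pd

    # Your side: the MD5-hashed CID | RampID mapping dataset plus an attribute dataset keyed off hashed CIDs.
    my_mapping = pd.DataFrame({"hashed_cid": ["a1", "b2"], "rampid": ["R100", "R200"]})
    my_attributes = pd.DataFrame({"hashed_cid": ["a1", "b2"], "segment": ["auto", "travel"]})

    # Partner side: their mapping dataset and an exposure dataset, keyed off their own hashed CIDs.
    partner_mapping = pd.DataFrame({"hashed_cid": ["c3", "d4"], "rampid": ["R100", "R300"]})
    partner_exposures = pd.DataFrame({"hashed_cid": ["c3", "d4"], "impressions": [3, 7]})

    # Within each party, join datasets on the MD5-hashed CID.
    my_joined = my_attributes.merge(my_mapping, on="hashed_cid")
    partner_joined = partner_exposures.merge(partner_mapping, on="hashed_cid")

    # Across parties, join the two joined datasets on RampID.
    overlap = my_joined.merge(partner_joined, on="rampid", suffixes=("_mine", "_partner"))
    print(overlap)  # rows where your RampIDs and the partner's RampIDs overlap (here, R100)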

For more information on creating and running questions, see “Question Builder”.