
Format Your Data

See the sections below for information on formatting your data, including the recommended schema and format for each dataset type you might need to connect to LiveRamp Clean Room. 

Note

If you’re utilizing LiveRamp Embedded Identity in your cloud environment, the formatting guidelines are somewhat different. For more information, see “Embedded Identity in Cloud Environments” in our Identity docs site.

While not all fields might be required for your current workflows and collaborations, we recommend including as many of these additional fields as possible now. This makes it easier to expand your collaborations in the future without you (or your IT team) having to perform additional work on your datasets.

The recommended field names shown aren’t mandatory, but using the recommended names can reduce the number of transformations you and your collaborators might need to do during dataset provisioning or query writing.

All data files or tables in a data connection job must have the same schema in order to successfully process. If you have multiple types of schemas, create a new data connection for each one. The order of the columns should match the schema.
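
For example, here is a minimal sketch (not LiveRamp tooling; the folder name is hypothetical) that checks whether all CSV files intended for one data connection share the same header row in the same order:

    import csv
    from pathlib import Path

    # Hypothetical folder holding the files for one data connection
    headers = {}
    for path in sorted(Path("data_connection_files").glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as f:
            headers[path.name] = next(csv.reader(f))   # first row = header

    if len({tuple(h) for h in headers.values()}) > 1:
        print("Schema mismatch - create a separate data connection per schema:")
        for name, header in headers.items():
            print(f"  {name}: {header}")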

General Formatting Guidelines

When formatting your data before creating data connections in LiveRamp Clean Room, keep the following guidelines in mind:

  • CSV and Parquet files supported: For file-based data connections, LiveRamp Clean Room supports CSV and Parquet files, as well as multi-part files.

  • File size:

  • Use UTF-8 character encoding: To ensure the greatest match rate between collaborator datasets, we recommend using UTF-8 character encoding (including before hashing identifiers).

  • Identifiers: Each row in a file or table in a data connection job must contain identifiers. LiveRamp Clean Room supports multiple identifier types, including the identifiers listed in “Formatting Identifiers”. If your file or table does not include an identifier field, add a new field and insert a random ID for every record. This pseudo field must be mapped as the user identifier during the data setup.

  • Hashing identifiers: It is best practice to select a hashing algorithm that matches the NIST standard and results in a final hash value that uses lowercase characters. For more information on hashing identifiers, see "Hashing Identifiers". (A minimal hashing sketch also appears after this list.)

  • File headers: All files must have a header in the first row. The name of each header column in a dataset file constitutes an attribute, which is a key for each record in the dataset. For more information on formatting headers, see the "Guidelines for Standard Dataset Headers, File Names, and Folder Names" section below.

  • Delimiters: If you’re using CSV files, delimiters must be specified.

  • Remove special characters from CSV files: When using CSV files, remove any special characters from your data files. For example, if your file contains integers formatted with commas (such as "123,456"), remove the comma formatting ("123456").

  • Date formatting: The folder where the data files or tables are dropped includes a date macro: {yyyy-MM-dd}. When the files are dropped, replace the macro with the appropriate date string, without the curly brackets. LiveRamp Clean Room does not accept datetime strings.
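
For illustration, the macro can be replaced with the current UTC date as shown in this sketch (the folder path is hypothetical, not a LiveRamp-defined location):

    from datetime import datetime, timezone

    folder_template = "your-bucket/clean-room-drop/{yyyy-MM-dd}/"   # hypothetical path
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    print(folder_template.replace("{yyyy-MM-dd}", today))           # e.g. your-bucket/clean-room-drop/2025-09-18/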
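
The following sketch shows the approach described in the "Identifiers" and "Hashing identifiers" bullets above: lowercase and UTF-8 encode an email before SHA-256 hashing (producing a lowercase hex digest), and add a random pseudo-ID for records that lack an identifier. This is not LiveRamp tooling, and the file and column names are assumptions based on the recommended schema:

    import csv
    import hashlib
    import uuid

    def hash_email(email: str) -> str:
        # Lowercase and UTF-8 encode before hashing, per the guidelines above.
        return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()

    with open("input.csv", newline="", encoding="utf-8") as src, \
         open("output.csv", "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        fields = list(reader.fieldnames)
        if "cid" not in fields:
            fields.append("cid")                     # pseudo identifier column
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        for row in reader:
            if row.get("email_1"):
                row["email_1"] = hash_email(row["email_1"])
            if not row.get("cid"):
                row["cid"] = uuid.uuid4().hex        # random ID for records with no identifier
            writer.writerow(row)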

Guidelines for Standard Dataset Headers, File Names, and Folder Names

Make sure that file and table names, folder names, and headers for standard dataset types (such as CRM, transaction, or ad log) meet the following guidelines (a simple validation sketch appears after the examples below):

  • Start with a letter (not with a digit or a special character, such as an underscore)

  • End with either a letter or a digit

  • Not contain special characters (hyphens (-) and underscores (_) are allowed)

  • Not contain whitespace, such as spaces or control characters (such as form feed \f, carriage return \r, line feed/new line \n, or tab \t)

  • Not contain duplicate header names within the same data connection

  • Not be longer than 255 characters

Here are a few examples of valid names (header examples shown):

  • Header1

  • header_name

  • HEADER-NAME

  • HEADER___NAME__HERE-1

Here are a few examples of invalid names (header examples shown):

  • 123header: Starts with a digit

  • header123_: Ends with a non-alphanumeric character

  • header@name: Contains a special character

  • header name: Contains a space

  • header\nname: Contains non-space whitespace

  • \uFEFFHeaderName: Contains a BOM character
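
The naming rules above can be approximated with a regular expression. The following is a rough sketch, not an official LiveRamp validator; for simplicity, it checks one header row at a time and treats the duplicate check as case-sensitive:

    import re

    # One letter, then letters/digits/underscores/hyphens, ending in a letter or digit;
    # 255 characters maximum. A BOM or whitespace character fails the character classes.
    NAME_RE = re.compile(r"^[A-Za-z](?:[A-Za-z0-9_-]{0,253}[A-Za-z0-9])?$")

    def check_headers(headers):
        """Return a list of problems found in a header row."""
        problems = [h for h in headers if not NAME_RE.fullmatch(h)]
        if len(set(headers)) != len(headers):
            problems.append("duplicate header names")
        return problems

    print(check_headers(["Header1", "header_name", "HEADER-NAME"]))    # []
    print(check_headers(["123header", "header name", "header name"]))  # invalid names plus a duplicate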

Managing BOM Characters

BOM (Byte Order Mark) characters can occur in the header of a CSV file due to the file initially being saved or written out with encoding that includes the BOM, often UTF-8 with BOM.

When a file is saved with UTF-8 encoding, some systems prepend a BOM to the contents to indicate that it's a UTF-8 file. The BOM can cause problems when reading the file because it might be interpreted as part of the actual data (like the header), rather than as metadata about the byte order and encoding.

If a BOM character is detected at the start of your file, an error will appear in the data connection within LiveRamp Clean Room. If this occurs, perform the following steps on your data and then upload the file again (this can be done within the same data connection):

  • Unix-Based OS (Mac OS, Linux)

    • Confirm the existence of the BOM character by performing the following shell command within a terminal session: cat /path/to/your/input.csv | head

    • The characters <U+FEFF> (or something similar for a different UTF encoding) will appear at the beginning of the file, which indicates the presence of a BOM character. If the output opens in a pager, press q to exit.

    • Alternatively, for a more precise diagnosis, run the following shell command. Depending on the file's encoding, a different set of BOM characters will appear. Refer to the "BOM Character Sequences" section below for more information.

      head -n 1 /path/to/your/input.csv | od -t x1 | head -n 1 | awk '{print $2,$3,$4}'
    • To remove BOM characters within the terminal session, the following shell command can be performed:

      sed '1s/^\xEF\xBB\xBF//' /path/to/your/input.csv > /path/to/your/output.csv
  • Windows OS

    • Confirm the existence of the BOM character by running one of the following PowerShell commands within a Windows PowerShell session:

      Get-Content inputFile.csv -Encoding Byte -TotalCount 3 # for UTF-8 and UTF-16
      Get-Content inputFile.csv -Encoding Byte -TotalCount 4 # for UTF-32
    • Depending on the file's encoding, a different set of BOM characters will appear. Refer to the table in the "BOM Character Sequences" section below.

      • If the file is UTF-8 encoded, either remove the BOM character with the following PowerShell commands or re-export the file as UTF-8 without a BOM.

        $inputFile = 'path\to\your\inputFile.csv'
        $outputFile = 'path\to\your\outputFile.csv'
        $encoding = New-Object System.Text.UTF8Encoding $false
        [System.IO.File]::WriteAllLines($outputFile, (Get-Content $inputFile), $encoding)
    • Re-upload the fixed file to the same data connection location. You may need to adjust the date folder based on the current UTC date.
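
As a platform-independent alternative to the commands above, Python's "utf-8-sig" codec strips a leading UTF-8 BOM on read, and writing back with plain "utf-8" produces a file without one. This is a minimal sketch with hypothetical paths:

    # Read with utf-8-sig (removes a UTF-8 BOM if present), write back without a BOM.
    with open("path/to/your/input.csv", "r", encoding="utf-8-sig", newline="") as src, \
         open("path/to/your/output.csv", "w", encoding="utf-8", newline="") as dst:
        dst.write(src.read())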

BOM Character Sequences

Encoding                    Windows Sequence (decimal bytes)    UNIX Sequence (hex bytes)

UTF-8                       239 187 191                         ef bb bf
UTF-16 (Little Endian)      255 254                             ff fe
UTF-16 (Big Endian)         254 255                             fe ff
UTF-32 (Little Endian)      255 254 0 0                         ff fe 00 00
UTF-32 (Big Endian)         0 0 254 255                         00 00 fe ff

Formatting Datasets

See the sections below for information on the recommended schema and format for each dataset type you might need to connect to LiveRamp Clean Room. 


Format a Universe Dataset

The universe dataset should represent your full audience and should include all user identifiers (PII touchpoints or online identifiers) that will be used during identity resolution to resolve to RampIDs. 

LiveRamp uses this dataset to create a mapping between your CIDs and their associated RampIDs. This mapping lives in a linked dataset and allows you to use RampIDs as the join key between the various datasets in queries.

For information on formatting and hashing identifiers, see “Formatting Identifiers”.

Note

  • When sending PII, it’s important that as many PII touchpoints as possible are provided for LiveRamp’s identity resolution capabilities to yield the best results.

  • Your CRM dataset might also be able to function as a universe dataset.

Field Contents

Recommended Field Name

Field Type

Values Required?

Description/Notes

A unique user ID

cid

string

Yes

  • LiveRamp uses the values in this field to resolve your data to RampIDs.

  • Plaintext CIDs are preferred. If you choose to hash the CIDs, make sure to use the same hashing type when sending CIDs in other data files or tables.

Consumer’s first name

first_name

string

Yes (if Name and Postal is used as an identifier)

Consumer’s last name

last_name

string

Yes (if Name and Postal is used as an identifier)

Consumer’s address

address_1

string

Yes (if Name and Postal is used as an identifier)

Consumer’s additional address information

address_2

string

No

  • Include values in this column if you have additional street address info for a given row.

Consumer’s city

city

string

Yes (if Name and Postal is used as an identifier)

Consumer’s state

state

string

Yes (if Name and Postal is used as an identifier)

  • Must be a two-character, capitalized abbreviation ("CA", not "California" or "Ca").

Consumer’s ZIP Code or postal code

zip

string

Yes (if Name and Postal is used as an identifier)

  • ZIP codes can be in 5-digit format or 9-digit format (ZIP+4).

Consumer’s best email address

email_1 

string

Yes (if email is used as an identifier)

  • Can be plaintext or one of our allowed hash types (SHA-256, MD5, or SHA-1).

  • If you have multiple emails for a consumer, send your best one in the “email_1” column.

Consumer’s additional email address

email_2

string

No

  • Can be plaintext or one of our allowed hash types (SHA-256, MD5, or SHA-1).

Consumer’s additional email address

email_3

string

No

  • Can be plaintext or one of our allowed hash types (SHA-256, MD5, or SHA-1).

Consumer’s additional email address

email_4

string

No

  • Can be plaintext or one of our allowed hash types (SHA-256, MD5, or SHA-1).

Consumer’s best phone number

phone_1

string

Yes (if phone is used as an identifier)

  • Do not include any hyphens or parentheses.

  • Can be plaintext or SHA-1 hashed.

  • If you have multiple phone numbers for a consumer, send your best one in the “phone_1” column.

Consumer’s additional phone number

phone_2

string

No

  • Do not include any hyphens or parentheses.

  • Can be plaintext or SHA-1 hashed.

Consumer's mobile device ID (MAID)

maid

string

Yes (if MAIDs are used as identifiers)

  • Can be plaintext or SHA-1 hashed.
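
For illustration only, a universe dataset file using a subset of the recommended fields might start like this (the values are hypothetical, and the email is shown in plaintext rather than hashed):

    cid,first_name,last_name,address_1,city,state,zip,email_1,phone_1
    10001,Jane,Doe,123 Main St,Springfield,IL,62701,jane.doe@example.com,5551234567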

Format a CRM Dataset

Your CRM dataset should contain a CID for each consumer, as well as attribute (segment) data fields.

Field Contents

Recommended Field Name

Field Type

Values Required?

Description/Notes

A unique user ID

cid

string

Yes

  • LiveRamp uses the values in this field to resolve your data to RampIDs.

  • If you sent plaintext CIDs in your universe dataset, send MD5-hashed CIDs.

  • If you hashed the CIDs in your universe dataset, use the same hashing type.

Consumer attribute category

<User Attribute 1>

varies

No

  • An attribute category, such as “Age”.

  • Use the appropriate field type for the data.

Consumer attribute category

<User Attribute 2>

varies

No

  • An attribute category, such as “Age”.

  • Use the appropriate field type for the data.

Consumer attribute category

<User Attribute 3>

varies

No

  • An attribute category, such as “Age”.

  • Use the appropriate field type for the data.

Consumer attribute category

<User Attribute 4>

varies

No

  • An attribute category, such as “Age”.

  • Use the appropriate field type for the data.
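
For illustration only, a CRM dataset file might look like the following, where "age" and "loyalty_tier" are hypothetical stand-ins for the <User Attribute N> placeholders:

    cid,age,loyalty_tier
    10001,34,gold
    10002,52,silver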

Format a Conversions Dataset

The conversions dataset should include information on the desired conversions (such as transactions or downloads).

Note

If you do not have values for any fields in this schema, we recommend that you still include those fields as placeholders.

Field Contents

Recommended Field Name

Field Type

Values Required?

Description/Notes

A unique user ID

cid

string

Yes

  • LiveRamp uses the values in this field to resolve your data to RampIDs.

  • If you sent plaintext CIDs in your universe dataset, send MD5-hashed CIDs.

  • If you hashed the CIDs in your universe dataset, use the same hashing type.

Whether the consumer is a loyalty customer

is_card_holder

integer

No

  • Use “1” to indicate that the user is a loyalty customer and “0” if they are not.

  • Values are required for this field for Retail Intelligence.

Country where the conversion occurred

country_id

string

No

  • Enter the 3-character ISO 3166 country code for the country where the conversion occurred (for example, “USA”).

Unique identifier for the transaction

transaction_id

string

Yes

UTC timestamp of the conversion event

conversion_timestamp

timestamp

No

  • Values are required for this field for Retail Intelligence.

Unique identifier for SKU or product

product_id

string

Yes

(Foreign Key for Product Table)

Barcode number

barcode

string

No

  • Can be EAN, GTIN, UPC, PiD or other barcode types

  • Use the same barcode type for all values

  • Values are required for this field for Retail Intelligence.

Unique identifier for the sales channel

store_id

string

No

  • Unique identifier for sales channel (Foreign key to location)

  • Values are required for this field for Retail Intelligence and Cross-Media Intelligence

Unique identifier for a line within a receipt

trans_line_number

string

No

Units of a particular SKU/product ID in transaction

trans_line_quantity

int

No

  • Values are required for this field for Retail Intelligence, Cross-Media Intelligence, and Meta TEE

Value of a particular SKU/product ID in transaction

trans_line_value

numeric

No

  • What the customer actually paid

  • Values are required for this field for Retail Intelligence.

Line-level discounts

trans_line_disc_value

numeric

No

Whether this line represents a promotion product

trans_line_promo

integer

No

  • Use “1” to indicate that the line represents a promotion product and use “0” if it does not.

  • Values are required for this field for Retail Intelligence.

Total number of items in an order/basket

trans_total_quantity

integer

No

Total value of order/basket

trans_total_value

numeric

No

Total discounts

trans_total_disc_value

numeric

No

Whether this line represents returned product

return_flag

integer

No

  • Use “1” to indicate that the line represents a returned product and use “0” if it does not.

ISO currency code

currency

string

No

  • Values are required for this field for Retail Intelligence.

Exchange rate to USD

currency_rate

numeric

No

  • Values are required for this field for Retail Intelligence.

Units × unit price of a particular SKU/product ID in the transaction

gross_amt

double

Yes

Timestamp of the conversion event

conversion_timestamp

timestamp

Yes

Name of the SKU or product

product_name

string

No

  • Values required for Cross-Media Intelligence and Meta TEE.

Where the transaction took place

sales_channel

string

No

  • Such as in-app, web, or in-store.

  • Values required for Cross-Media Intelligence and Meta TEE.

Product categorization

division

string

No

  • More details should be provided in the product dimensions dataset.

  • Values required for Cross-Media Intelligence.

Brand name of the product

brand_name

string

No

  • Values required for Cross-Media Intelligence and Meta TEE.
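
For illustration only, a minimal conversions file containing just the fields marked above as always required might look like this (values are hypothetical, with the timestamp in the same format as the exposures example below):

    cid,transaction_id,product_id,conversion_timestamp,gross_amt
    10001,txn-0001,sku-789,2025-09-18T02:33:03Z,19.99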

Format an Exposures Dataset

An exposures dataset should include data on impressions served or other ad events, along with metadata on the impressions or events.

Note

If you do not have values for any fields in this schema, we recommend that you still include those fields as placeholders.

Field Contents

Recommended Field Name

Field Type

Values Required?

Description/Notes

A unique user ID

cid

string

Yes

  • LiveRamp uses the values in this field to resolve your data to RampIDs.

  • If you sent plaintext CIDs in your universe dataset, send MD5-hashed CIDs.

  • If you hashed the CIDs in your universe dataset, use the same hashing type.

UTC timestamp of the event

event_timestamp

timestamp

Yes

  • For example, “2025-09-18T02:33:03Z”.

Unique ID for the creative served

creative_id

string

No

  • Values are required for this field for Meta TEE.

Name of the creative

creative_name

string

No

Unique ID for the placement, ad group, ad set, or line item

placement_id

string

No

  • Values are required for this field for Meta TEE.

Name of the placement, ad group, or line item

placement_name

string

No

  • This is used as the most granular reporting metric, such as a group of impressions that is unique to a property or creative.

  • Values are required for this field for Meta TEE.

Unique ID for the campaign

campaign_id

string

No

  • Values are required for this field for Meta TEE.

Name of the campaign

campaign_name

string

No

  • Values are required for this field for Meta TEE.

Unique ID of the publisher or site where the exposure occurred

property_id

string

No

Location on a publisher site, the channel, or media type where the exposure occurred

property_name

string

No

Segment ID that the user belonged to at the time of exposure

segment_id

string

No

Unique ID for the advertiser account

account_id

string

No

  • Values are required for this field for Meta TEE.

Name of the advertiser account

account_name

string

No

  • Values are required for this field for Meta TEE.

Unique ID for the brand

advertiser_id

string

No

Name of the brand

advertiser_name

string

No

  • Values required for Quick-Start Insights, Cross-Media Intelligence, and Meta TEE.

Type of placement or line item

placement_type

string

No

Type of event

event_type

string

No

  • Possible values are “Impressions”, “Click”, or “View”.

  • Values required for Quick-Start Insights.

Cost of the event to the advertiser

cost

double

No

  • Values required for Quick-Start Insights.

Unique identifier for the event

event_id 

string

No

  • This field is useful when the CID-to-RampID mapping results in a one-to-many relationship.

  • Values required for Quick-Start Insights and Cross-Media Intelligence
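
For illustration only, an exposures file using a subset of the recommended fields might look like this (values are hypothetical):

    cid,event_timestamp,campaign_id,placement_id,creative_id,event_type
    10001,2025-09-18T02:33:03Z,cmp-42,plc-7,cr-9,Impressions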

Format a Publisher Audience Dataset

For publishers, a publisher audience dataset includes data on user segments in your platform.

Field Contents

Recommended Field Name

Field Type

Values Required?

Description/Notes

A unique user ID

cid

string

Yes

  • LiveRamp uses the values in this field to resolve your data to RampIDs.

  • If you sent plaintext CIDs in your universe dataset, send MD5-hashed CIDs.

  • If you hashed the CIDs in your universe dataset, use the same hashing type.

Unique identifier of audience segment used for targeting this impression

segment_id

string

The name of the segment

segment_name

string

The name of the category

segment_category

string
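
For illustration only, a publisher audience file might look like this (values are hypothetical):

    cid,segment_id,segment_name,segment_category
    10001,seg-123,Auto Intenders,Automotive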

Format a Product Dimensions Dataset

For Clean Room users with products, a product dimensions dataset includes metadata on your products.

Field Contents

Recommended Field Name

Field Type

Values Required?

Description/Notes

Product ID

product_id

string

Yes

A 1-to-N relationship between product_id and barcode is accepted (for example, product_id 123 linked to barcode ABC).

Barcode number

barcode

string

Yes

  • Can be EAN, GTIN, UPC, PiD or other barcode types

  • Use the same barcode type for all values

The product name

product_name

string

Yes

The product description

product_desc

string

No

country_id

string

No

  • The 3-character ISO 3166 country code where the conversion occurred (for example, “USA”).

  • Values required for Retailer Insights

The name of the manufacturer

supplier_name

string

Yes

Should be the name of the company manufacturing the brand/item, not a potential third-party supplier.

supplier_key

integer

No

The name of the brand

brand_name

string

Yes

  • For example, “Always” or “Nutella”.

  • Not the supplier name.

brand_key

integer

No

A unique identifier for the product brand (i.e., brand_id).

The name of the first-level hierarchy category this product belongs to

hierarchy_level1_name

string

Yes

Such as the business group

The name of the second-level hierarchy category this product belongs to

hierarchy_level2_name

string

Yes

Such as the sector

The name of the third-level hierarchy category this product belongs to

hierarchy_level3_name

string

Yes

Such as the department

The name of the fourth-level hierarchy category this product belongs to

hierarchy_level4_name

string

Yes

Such as the class

The name of the fifth-level hierarchy category this product belongs to

hierarchy_level5_name

string

Yes

Such as the category (product category)

The name of the sixth-level hierarchy category this product belongs to

hierarchy_level6_name

string

Yes

Such as the subcategory (product sub category)

Need unit of product

need_unit

string

No

Whether this is a discontinued product

discontinued_flag

integer

No

  • Impact if NA.

  • Use “1” to indicate that the line represents a discontinued product and use “0” if it does not.

  • If this field is included, values are required.

Whether this is a white-label product

is_white_label

integer

No

  • Indicates the retailer's own-brand item (needed if white-label products are to be analyzed or obfuscated).

  • Use “1” to indicate that the line represents a white label product and use “0” if it does not.

  • If this field is included, values are required.

Whether this is a promotion product

is_promo

integer

No

  • Indicates a promotional pack (needed to distinguish a promo EAN from a shelf EAN).

  • Use “1” to indicate that the line represents a promotion product and use “0” if it does not.

  • If this field is included, values are required.

Format a Store Dimensions Dataset

For Clean Room users with stores, a store dimensions dataset includes metadata on your stores.

Field Contents

Recommended Field Name

Field Type

Values Required?

Description/Notes

Main identifier for stores

store_main_id

string

No

  • Values required for Retailer Insights

Unique identifier for sales channel

store_id

string

No

  • Such as the location ID

  • Values required for Retailer Insights

Store name

store

string

No

  • Values required for Retailer Insights

Country of store

country_id

string

No

  • Values required for Retailer Insights

Store region

store_region

string

No

State (by default) or official region name for the store

store_state

string

No

(or store DMA)

City of store

store_city

string

No

Store's postal code

postal_code

string

No

Whether the store is closed

discontinued_flag

integer

No

  • Use “1” to indicate that the store is closed and “0” if it is not.

Channel type

channel_type

string

No

  • Such as “In-store” or “e-commerce” (i.e. store_type)

  • Values required for Retailer Insights

Channel name

channel_name

string

No

  • For example: hypermarket, supermarket, convenience store, franchise, or concession

  • Values required for Retailer Insights

Delivery type

delivery_type

string

No

  • Retailer trading company

  • Values required for Retailer Insights

Latitude of store location

latitude

decimal

No

  • Use ”0” for online stores

  • Values required for Retailer Insights

Longitude of store location

longitude

decimal

No

  • Use ”0” for online stores

  • Values required for Retailer Insights