Format Your Data
See the sections below for information on formatting your data, including the recommended schema and format for each dataset type you might need to connect to LiveRamp Clean Room.
Note
If you’re utilizing LiveRamp Embedded Identity in your cloud environment, the formatting guidelines are somewhat different. For more information, see “Embedded Identity in Cloud Environments” in our Identity docs site.
While not all fields might be required for your current workflows and collaborations, we recommend including as many of these fields as possible now. This makes it easier to expand your collaborations in the future without you (or your IT team) having to perform additional work on your datasets.
The recommended field names shown aren’t mandatory, but using the recommended names can reduce the number of transformations you and your collaborators might need to do during dataset provisioning or query writing.
All data files or tables in a data connection job must have the same schema to process successfully. If you have multiple schema types, create a new data connection for each one. The order of the columns should match the schema.
Recommended Dataset Types
When preparing data for collaboration, consider the following recommended dataset types:
Universe dataset: This represents your full audience and likely includes all user identifiers (PII touchpoints or online identifiers) that will be used to resolve your data to RampIDs.
Note
For brands, you might consider your CRM dataset to be your “universe” of consumers.
CRM dataset: Includes attribute (segment) data on consumers.
Conversions dataset: Includes data on conversions, such as transactions.
Impressions dataset: Includes data on impressions served or other ad events, along with metadata on the impressions or events.
Publisher audience dataset: For publishers, this includes data on user segments in your platform.
Product dimensions dataset: For Clean Room users with products, this includes metadata on your products.
Store dimensions dataset: For Clean Room users with stores, this includes metadata on your stores.
With the exception of datasets for product and store dimensions, all datasets should include a field for a CID (a unique customer identifier you leverage for deduplicating individuals across your datasets):
For PII-based universe datasets, we recommend that plaintext CIDs (not hashed) be sent (if you must hash the CIDs, we ask that you consistently use the same hashing for CIDs in other datasets for interoperability).
For all other datasets (including universe datasets based on online identifiers), if you sent plaintext CIDs in your universe dataset, send MD5-hashed CIDs (see the hashing sketch below). If you hashed the CIDs in your universe dataset, use the same hashing type.
Note
If you use a combination of LiveRamp (SFTP) and Embedded Identity (Cloud Hosting), or if you send different dataset types, you must ensure that any hashed data is hashed using MD5 hashing in all files.
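For illustration, here is a minimal Python sketch of MD5-hashing a CID. The function name, the sample value, and the trim/lowercase normalization are assumptions; the requirement from the guidelines above is simply that the same hashing be applied consistently in every file:

```python
import hashlib

def md5_cid(cid: str) -> str:
    # Normalize the same way everywhere (here: trim, lowercase, UTF-8)
    # so an identical CID produces an identical lowercase hex digest
    # across all datasets.
    normalized = cid.strip().lower().encode("utf-8")
    return hashlib.md5(normalized).hexdigest()

print(md5_cid("customer-12345"))
```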
General Formatting Guidelines
When formatting your data before creating data connections in LiveRamp Clean Room, keep the following guidelines in mind:
CSV and Parquet files supported: For file-based data connections, LiveRamp Clean Room supports CSV and Parquet files, as well as multi-part files.
File size:
Parquet files and CSV files for cloud-based data connections should not be larger than 500 MB.
Files for CSV uploads should not be larger than 50 MB.
Use UTF-8 character encoding: To ensure the greatest match rate between collaborator datasets, we recommend using UTF-8 character encoding (including before hashing identifiers).
Identifiers: Each row in a file or table in a data connection job must contain identifiers. LiveRamp Clean Room supports multiple identifier types, including the identifiers listed in “Formatting Identifiers”. If your file or table does not include an identifier field, add a new field and insert a random ID for every record (see the sketch after this list). This placeholder field must be mapped as the user identifier during data setup.
Hashing identifiers: It is best practice to select a hashing algorithm which matches the NIST standard and results in a final hash value that uses lowercase characters. For more information on hashing identifiers, see "Hashing Identifiers".
File headers: All files must have a header in the first row. The name of each header column in a dataset file constitutes an attribute, which is a key for each record in the dataset. For more information on formatting headers, see the “Guidelines for Standard Dataset Headers, File Names, and Folder Names” section below.
Delimiters: If you’re using CSV files, delimiters must be specified.
Remove special characters from CSV files: When using CSV files, remove any special characters from your data files. For example, if your file contains integers formatted with commas (such as “123,456”), remove the comma formatting (“123456”).
Date formatting: The folder where the data files or tables are dropped includes a date macro, {yyyy-MM-dd}. When the files are dropped, replace the macro with the appropriate date string, without the curly brackets (for example, “2025-07-01”). LiveRamp Clean Room does not accept datetime strings.
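To make a few of these guidelines concrete, here is a minimal Python sketch (the function names are hypothetical) covering the random placeholder ID, comma removal, and the folder date string:

```python
import uuid
from datetime import datetime, timezone

def placeholder_id() -> str:
    # Random ID for records with no identifier; map this field as the
    # user identifier during data setup.
    return str(uuid.uuid4())

def strip_comma_formatting(value: str) -> str:
    # "123,456" -> "123456"
    return value.replace(",", "")

# Date string in the form expected in place of the {yyyy-MM-dd} macro:
# no curly brackets and no time component, e.g. "2025-07-01".
folder_date = datetime.now(timezone.utc).strftime("%Y-%m-%d")
```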
Guidelines for Standard Dataset Headers, File Names, and Folder Names
File and table names, folder names, and headers for files and tables for standard dataset types (such as CRM, transaction, or ad log) must meet the following guidelines:
Start with a letter (not a digit or a special character, such as an underscore)
End with either a letter or a digit
Not contain special characters (hyphens (-) and underscores (_) are allowed)
Not contain whitespace, such as spaces and control characters (such as form feed (\f), carriage return (\r), line feed/new line (\n), or tab (\t))
Not contain duplicate header names within the same data connection
Not be longer than 255 characters
Here are a few examples of valid names (header examples shown):
Header1
header_name
HEADER-NAME
HEADER___NAME__HERE-1
Here are a few examples of invalid names (header examples shown):
123header: Starts with a non-alphabetic character
header123_: Ends with a non-alphanumeric character
header@name: Contains a special character
header name: Contains a space
header\nname: Contains non-space whitespace
\uFEFFHeaderName: Contains a BOM character
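As a rough self-check, here is a minimal Python sketch (the pattern and function name are illustrative, not an official validator) that tests a single name against the rules above; duplicate names still need to be checked across the whole header row:

```python
import re

# Starts with a letter; may contain letters, digits, hyphens, and
# underscores; ends with a letter or digit; at most 255 characters.
HEADER_PATTERN = re.compile(r"^[A-Za-z](?:[A-Za-z0-9_-]{0,253}[A-Za-z0-9])?$")

def is_valid_header(name: str) -> bool:
    return bool(HEADER_PATTERN.fullmatch(name))

assert is_valid_header("header_name")
assert not is_valid_header("123header")         # starts with a digit
assert not is_valid_header("header name")       # contains a space
assert not is_valid_header("\ufeffHeaderName")  # leading BOM character
```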
Managing BOM Characters
BOM (Byte Order Mark) characters can occur in the header of a CSV file due to the file initially being saved or written out with encoding that includes the BOM, often UTF-8 with BOM.
When a file is saved with UTF-8 encoding, some systems prepend the contents with BOM to indicate that it's a UTF-8 file. BOM can cause problems when reading the file because it might be interpreted as part of the actual data (like the header), rather than metadata about the byte order and encoding.
If a BOM character is detected at the start of your file, an error will appear in the data connection within LiveRamp Clean Room. If this occurs, remove the BOM character from your data as described below before uploading the file again (you can re-upload within the same data connection):
Unix-Based OS (Mac OS, Linux)
Confirm the existence of the BOM character by running the following shell command in a terminal session:
cat /path/to/your/input.csv | head
If a BOM character is present, the characters <U+FEFF> (or a similar sequence for other UTF encodings) will appear at the beginning of the output.
Alternatively, for a more precise diagnosis, run the following shell command. The BOM byte sequence that appears depends on the file's encoding; refer to the “BOM Character Sequences” section below for more information.
head -n 1 /path/to/your/input.csv | od -t x1 | head -n 1 | awk '{print $2,$3,$4}'
To remove a UTF-8 BOM character within the terminal session, run the following shell command:
sed '1s/^\xEF\xBB\xBF//' /path/to/your/input.csv > /path/to/your/output.csv
Windows OS
Confirm the existence of the BOM character by running one of the following PowerShell commands in a Windows PowerShell session:
Get-Content inputFile.csv -Encoding Byte -TotalCount 3 # for UTF-8 and UTF-16
Get-Content inputFile.csv -Encoding Byte -TotalCount 4 # for UTF-32
The BOM byte sequence that appears depends on the file's encoding. Refer to the table in the “BOM Character Sequences” section below.
If the file is UTF-8 encoded, either remove the BOM character with the following PowerShell commands or re-export the file as UTF-8 without BOM:
$inputFile = 'path\to\your\inputFile.csv'
$outputFile = 'path\to\your\outputFile.csv'
$encoding = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllLines($outputFile, (Get-Content $inputFile), $encoding)
Re-upload the fixed file to the same data connection location. You may need to adjust the date folder based on the current UTC date.
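As a cross-platform alternative to the shell and PowerShell commands above, here is a minimal Python sketch (the file paths are placeholders) that detects and strips a UTF-8 BOM:

```python
import codecs

def strip_utf8_bom(in_path: str, out_path: str) -> bool:
    """Return True if a UTF-8 BOM was found and removed."""
    with open(in_path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):  # b"\xef\xbb\xbf"
        with open(out_path, "wb") as f:
            f.write(data[len(codecs.BOM_UTF8):])
        return True
    return False

strip_utf8_bom("input.csv", "output.csv")
```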
BOM Character Sequences
Encoding | Windows Sequence (decimal bytes) | UNIX Sequence (hex bytes) |
---|---|---|
UTF-8 | 239 187 191 | ef bb bf |
UTF-16 (Little Endian) | 255 254 | ff fe |
UTF-16 (Big Endian) | 254 255 | fe ff |
UTF-32 (Little Endian) | 255 254 0 0 | ff fe 00 00 |
UTF-32 (Big Endian) | 0 0 254 255 | 00 00 fe ff |
Formatting Datasets
See the sections below for information on the recommended schema and format for each dataset type you might need to connect to LiveRamp Clean Room.
Format a Universe Dataset
The universe dataset should represent your full audience and should include all user identifiers (PII touchpoints or online identifiers) that will be used during identity resolution to resolve to RampIDs.
LiveRamp uses this dataset to create a mapping between your CIDs and their associated RampIDs. This mapping lives in a linked dataset and allows you to use RampIDs as the join key between the various datasets in queries.
For information on formatting and hashing identifiers, see “Formatting Identifiers”.
Note
When sending PII, it’s important that as many PII touchpoints as possible are provided for LiveRamp’s identity resolution capabilities to yield the best results.
Your CRM dataset might also be able to function as a universe dataset.
Field Contents | Recommended Field Name | Field Type | Values Required? | Description/Notes |
---|---|---|---|---|
A unique user ID | cid | string | Yes | |
Consumer’s first name | first_name | string | Yes (if Name and Postal is used as an identifier) | |
Consumer’s last name | last_name | string | Yes (if Name and Postal is used as an identifier) | |
Consumer’s address | address_1 | string | Yes (if Name and Postal is used as an identifier) | |
Consumer’s additional address information | address_2 | string | No | |
Consumer’s city | city | string | Yes (if Name and Postal is used as an identifier) | |
Consumer’s state | state | string | Yes (if Name and Postal is used as an identifier) | |
Consumer’s ZIP Code or postal code | zip | string | Yes (if Name and Postal is used as an identifier) | |
Consumer’s best email address | email_1 | string | Yes (if email is used as an identifier) | |
Consumer’s additional email address | email_2 | string | No | |
Consumer’s additional email address | email_3 | string | No | |
Consumer’s additional email address | email_4 | string | No | |
Consumer’s best phone number | phone_1 | string | Yes (if phone is used as an identifier) | |
Consumer’s additional phone number | phone_2 | string | No | |
Consumer's mobile device ID (MAID) | maid | string | Yes (if MAIDs are used as identifiers) | |
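For example, a PII-based universe file using these recommended field names might begin like this (the values are fabricated purely for illustration):

```
cid,first_name,last_name,address_1,address_2,city,state,zip,email_1,email_2,email_3,email_4,phone_1,phone_2,maid
C-0001,Jane,Doe,123 Main St,,Springfield,IL,62704,jane.doe@example.com,,,,5555550123,,
```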
Format a CRM Dataset
Your CRM dataset should contain a CID for each consumer, as well as attribute (segment) data fields.
Field Contents | Recommended Field Name | Field Type | Values Required? | Description/Notes |
---|---|---|---|---|
A unique user ID | cid | string | Yes | |
Consumer attribute category | <User Attribute 1> | varies | No | |
Consumer attribute category | <User Attribute 2> | varies | No | |
Consumer attribute category | <User Attribute 3> | varies | No | |
Consumer attribute category | <User Attribute 4> | varies | No | |
Format a Conversions Dataset
The conversions dataset should include information on the desired conversions (such as transactions or downloads).
Note
If you do not have values for any fields in this schema, we recommend that you still include those fields as placeholders.
Field Contents | Recommended Field Name | Field Type | Values Required? | Description/Notes |
---|---|---|---|---|
A unique user ID | cid | string | Yes | |
Whether the consumer is a loyalty customer | is_card_holder | integer | No | |
Country where the conversion occurred | country_id | string | No | |
Unique identifier for the transaction | transaction_id | string | Yes | |
UTC timestamp of the conversion event | conversion_timestamp | timestamp | Yes | |
Unique identifier for SKU or product | product_id | string | Yes | Foreign key for the product dimensions table |
Barcode number | barcode | string | No | |
Unique identifier for the sales channel | store_id | string | No | |
Unique identifier for a line within a receipt | trans_line_number | string | No | |
Units of a particular SKU/product ID in transaction | trans_line_quantity | integer | No | |
Value of a particular SKU/product ID in transaction | trans_line_value | numeric | No | |
Line-level discounts | trans_line_disc_value | numeric | No | |
Whether this line represents a promotion product | trans_line_promo | integer | No | |
Total number of items in an order/basket | trans_total_quantity | integer | No | |
Total value of order/basket | trans_total_value | numeric | No | |
Total discounts | trans_total_disc_value | numeric | No | |
Whether this line represents returned product | return_flag | integer | No | |
ISO currency code | currency | string | No | |
Conversion rate to USD | currency_rate | numeric | No | |
Units × unit price of a particular SKU/product ID in transaction | gross_amt | double | Yes | |
Name of the SKU or product | product_name | string | No | |
Where the transaction took place | sales_channel | string | No | |
Product categorization | division | string | No | |
Brand name of the product | brand_name | string | No | |
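Because gross_amt is defined as units multiplied by unit price, here is a quick hypothetical check in Python (unit_price is an assumed source value, not a field in the schema above):

```python
# Hypothetical line item: 3 units at 4.99 each.
trans_line_quantity = 3
unit_price = 4.99  # assumed source value; not part of the schema
gross_amt = trans_line_quantity * unit_price
assert round(gross_amt, 2) == 14.97
```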
Format an Exposures Dataset
An exposures dataset should include data on impressions served or other ad events, along with metadata on those impressions or events.
Note
If you do not have values for any fields in this schema, we recommend that you still include those fields as placeholders.
Field Contents | Recommended Field Name | Field Type | Values Required? | Description/Notes |
---|---|---|---|---|
A unique user ID | cid | string | Yes | |
UTC timestamp of the event | event_timestamp | timestamp | Yes | |
Unique ID for the creative served | creative_id | string | No | |
Name of the creative | creative_name | string | No | |
Unique ID for the placement, ad group, ad set, or line item | placement_id | string | No | |
Name of the placement, ad group, or line item | placement_name | string | No | |
Unique ID for the campaign | campaign_id | string | No | |
Name of the campaign | campaign_name | string | No | |
Unique ID of the publisher or site where the exposure occurred | property_id | string | No | |
Location on a publisher site, the channel, or media type where the exposure occurred | property_name | string | No | |
Segment ID that the user belonged to at the time of exposure | segment_id | string | No | |
Unique ID for the advertiser account | account_id | string | No | |
Name of the advertiser account | account_name | string | No | |
Unique ID for the brand | advertiser_id | string | No | |
Name of the brand | advertiser_name | string | No | |
Type of placement or line item | placement_type | string | No | |
Type of event | event_type | string | No | |
Cost of the event to the advertiser | cost | double | No | |
Unique identifier for the event | event_id | string | No | |
Format a Publisher Audience Dataset
For publishers, a publisher audience dataset includes data on user segments in your platform.
Field Contents | Recommended Field Name | Field Type | Values Required? | Description/Notes |
---|---|---|---|---|
A unique user ID | cid | string | Yes | |
Unique identifier of audience segment used for targeting this impression | segment_id | string | | |
The name of the segment | segment_name | string | | |
The name of the category | segment_category | string | | |
Format a Product Dimensions Dataset
For Clean Room users with products, a product dimensions dataset includes metadata on your products.
Field Contents | Recommended Field Name | Field Type | Values Required? | Description/Notes |
---|---|---|---|---|
Product ID | product_id | string | Yes | A 1-to-N relationship between product_id and barcode is accepted (for example, product_id 123 linked to barcode ABC). |
Barcode number | barcode | string | Yes | |
The product name | product_name | string | Yes | |
The product description | product_desc | string | No | |
Country ID | country_id | string | No | |
The name of the manufacturer | supplier_name | string | Yes | Should be the name of the company manufacturing the brand/item, not a potential third-party supplier. |
Supplier key | supplier_key | integer | No | |
The name of the brand | brand_name | string | Yes | |
Brand key | brand_key | integer | No | i.e., brand_id (a unique identifier for the product brand) |
The name of the first-level hierarchy category this product belongs to | hierarchy_level1_name | string | Yes | Such as the business group |
The name of the second-level hierarchy category this product belongs to | hierarchy_level2_name | string | Yes | Such as the sector |
The name of the third-level hierarchy category this product belongs to | hierarchy_level3_name | string | Yes | Such as the department |
The name of the fourth-level hierarchy category this product belongs to | hierarchy_level4_name | string | Yes | Such as the class |
The name of the fifth-level hierarchy category this product belongs to | hierarchy_level5_name | string | Yes | Such as the category (product category) |
The name of the sixth-level hierarchy category this product belongs to | hierarchy_level6_name | string | Yes | Such as the subcategory (product subcategory) |
Need unit of product | need_unit | string | No | |
Whether this is a discontinued product | discontinued_flag | integer | No | |
Whether this is a white-label product | is_white_label | integer | No | |
Whether this is a promotion product | is_promo | integer | No | |
Format a Store Dimensions Dataset
For Clean Room users with stores, a store dimensions dataset includes metadata on your stores.
Field Contents | Recommended Field Name | Field Type | Values Required? | Description/Notes |
---|---|---|---|---|
Main identifier for stores | store_main_id | string | No | |
Unique identifier for sales channel | store_id | string | No | |
Store name | store | string | No | |
Country of store | country_id | string | No | |
Store region | store_region | string | No | |
Store state | store_state | string | No | By default, use the state or the official region name for the store (or store DMA). |
City of store | store_city | string | No | |
Store's postal code | postal_code | string | No | |
Whether the store is closed | discontinued_flag | integer | No | |
Channel type | channel_type | string | No | |
Channel name | channel_name | string | No | |
Delivery type | delivery_type | string | No | |
Latitude of store location | latitude | decimal | No | |
Longitude of store location | longitude | decimal | No | |