Data Connection Requirements and Best Practices
Whether you host your own file storage location for data connection jobs or use a location hosted by LiveRamp Clean Room, get up and running faster by keeping these guidelines in mind when configuring and managing your data connections.
Generating File Storage Credentials
If LiveRamp Clean Room hosts your file drop location, you can generate credentials within the LiveRamp Clean Room UI by selecting Clean Room → Data Source Locations from the LiveRamp Clean Room navigation menu.
Supported File Types and Sizes
For file-based data connections, LiveRamp Clean Room supports CSV and Parquet files, as well as multi-part files.
Note
When creating a CSV file, we recommend using Google Sheets rather than Excel. When Excel converts a file to .csv, it adds "ghost characters" (a BOM, or byte order mark), which will cause an error during the data connection process.
Parquet files and CSV files for cloud-based data connections should not be larger than 500 MB.
Files for CSV uploads should not be larger than 50 MB.
For datasets that will be used in Clean Compute questions, do not include more than 500 columns.
Use UTF-8 Character Encoding
To ensure the greatest match rate between collaborator datasets, we recommend using UTF-8 character encoding (including before hashing identifiers).
Date Formatting
For the folder where the data files are dropped, we recommend using Hive-style date formatting in the folder path: "date=yyyy-MM-dd".
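For example, here is a minimal Python sketch (the bucket and prefix shown are hypothetical) that builds a Hive-style date folder from the current UTC date:

```python
from datetime import datetime, timezone

# Hypothetical bucket and prefix, for illustration only.
base_path = "s3://your-bucket/clean-room-uploads"

# Hive-style date partition folder: date=yyyy-MM-dd (UTC)
today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
print(f"{base_path}/date={today}")  # e.g., s3://your-bucket/clean-room-uploads/date=2026-01-01
```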
Identifiers
Each row in a file or table in a data connection job must contain identifiers. LiveRamp Clean Room supports multiple identifier types, including the identifiers listed below:
PII (personally-identifiable information, such as name, address, phone, and email)
Mobile device IDs (MAID)
Custom IDs (CIDs)
For information on identifier formatting, see "Formatting Identifiers".
Most datasets should include a field for a CID (a unique customer identifier you leverage for deduping individuals across your datasets):
For PII-based universe datasets, we recommend that plaintext CIDs (not hashed) be sent (if you must hash the CIDs, we ask that you use MD5 hashing for interoperability).
For all other datasets (including universe datasets based on online identifiers), CIDs can be plaintext or hashed with any of our allowed hashing types (SHA-256, MD5, or SHA-1), but should be formatted consistently across datasets.
Note
Hashing Identifiers: It is best practice to select a hashing algorithm that matches the NIST standard and results in a final hash value that uses lowercase characters. For more information on hashing identifiers, see "Hashing Identifiers".
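As an illustration, here is a minimal Python sketch of SHA-256 hashing that produces lowercase hexadecimal output. The email normalization shown is only an example; follow "Formatting Identifiers" and "Hashing Identifiers" for the exact rules.

```python
import hashlib

def sha256_lowercase(value: str) -> str:
    """Hash a UTF-8 encoded identifier with SHA-256; hexdigest() is already lowercase."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# The normalization below (trim and lowercase) is illustrative only;
# follow "Formatting Identifiers" and "Hashing Identifiers" for the exact rules.
email = " Jane.Doe@Example.com "
print(sha256_lowercase(email.strip().lower()))
```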
If your file or table does not include an identifier field, add a new field and insert a random ID for every record. This pseudo field must be mapped as the user identifier during the data setup.
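For example, here is a minimal Python sketch (file and column names are hypothetical) that adds a random pseudo identifier to every record using pandas:

```python
import uuid
import pandas as pd

# Hypothetical file and column names, for illustration only.
df = pd.read_csv("events_without_ids.csv")

# Add a pseudo identifier field containing a random ID for every record;
# map this field as the user identifier during data setup.
df["pseudo_user_id"] = [str(uuid.uuid4()) for _ in range(len(df))]

df.to_csv("events_with_ids.csv", index=False)
```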
Keep Schemas Consistent
All data files in a data connection job must have the same schema in order to be processed successfully. If you have multiple types of schemas, create a new data connection for each one. The order of the columns should match the schema.
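As a quick sanity check before uploading, here is a minimal Python sketch (file names are hypothetical) that verifies every file shares the same header row in the same column order:

```python
import csv

# Hypothetical file names; all files in one data connection job must share
# the same schema (same headers, in the same column order).
files = ["part-0001.csv", "part-0002.csv", "part-0003.csv"]

headers = []
for path in files:
    with open(path, newline="", encoding="utf-8") as f:
        headers.append(next(csv.reader(f)))

reference = headers[0]
for path, header in zip(files, headers):
    if header != reference:
        print(f"Schema mismatch in {path}: {header} != {reference}")
```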
Guidelines for Dataset Headers, File and Table Names, and Folder Names
All files and tables must have a header in the first row. The name of each header column in a dataset file or table constitutes an attribute, which serves as a key for each record in the dataset. Because different file sources impose different constraints, following the LiveRamp Clean Room guidelines below ensures a dataset can be properly utilized. Guidelines are provided for two types of datasets: standard and offline conversions. This section also covers BOM characters and how to handle them.
Guidelines for Standard Dataset Headers, File and Table Names, and Folder Names
For standard dataset types (such as CRM, transaction, or ad log), file and table names, folder names, and headers must meet these guidelines:
Start with a letter (not a special character, such as an underscore, or a digit)
End with either a letter or a digit
Not contain special characters (hyphens (-) and underscores (_) are allowed)
Not contain whitespace, such as spaces and control characters (such as form feed \f, carriage return \r, line feed/new line \n, or tab \t)
Not contain duplicate header names within the same data connection
Not be longer than 255 characters
Here are a few examples of valid names (header examples shown):
Header1
header_name
HEADER-NAME
HEADER___NAME__HERE-1
Here are a few examples of invalid names (header examples shown):
123header: Starts with a non-alphabetic character
header123_: Ends with a non-alphanumeric character
header@name: Contains special characters
header name: Contains spaces
header\nname: Contains non-space whitespace
\uFEFFHeaderName: Contains a BOM character
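As an illustration, the following Python sketch checks names against one interpretation of these rules; the regex is not an official validator:

```python
import re

# A sketch of the standard header/file/table/folder name rules described above.
# The regex below is one interpretation of those rules, not an official validator.
NAME_PATTERN = re.compile(r"^[A-Za-z](?:[A-Za-z0-9_-]*[A-Za-z0-9])?$")

def is_valid_standard_name(name: str) -> bool:
    """Check the start/end/character rules plus the 255-character limit."""
    return len(name) <= 255 and bool(NAME_PATTERN.fullmatch(name))

def has_duplicate_headers(headers: list) -> bool:
    """Duplicate header names within the same data connection are not allowed."""
    return len(headers) != len(set(headers))

for name in ["Header1", "header_name", "HEADER-NAME",
             "123header", "header123_", "header@name",
             "header name", "\ufeffHeaderName"]:
    print(repr(name), is_valid_standard_name(name))

print(has_duplicate_headers(["customer_id", "date", "customer_id"]))  # True
```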
Guidelines for Offline Conversion Dataset Headers
Offline conversion datasets enable you to pipe transactional conversion event data for activation purposes. Headers for offline conversion dataset files must meet these guidelines:
Start with a letter
End with a letter
Not contain digits
Not contain special characters
Not contain whitespace, such as spaces and control characters (form feed \f, carriage return \r, line feed/new line \n, or tab \t)
Not include consecutive occurrences of word separators, such as a combination of spaces, underscores (_), or hyphens (-)
Not contain duplicate header names within the same data connection
In addition to the header guidelines for offline conversions, the following attribute names are required for a successful offline conversion data file:
| Attribute Name | Data Type | SHA256 Hash | Nullable |
|---|---|---|---|
| email | String | Required | No |
| first name | String | Required | Yes |
| last name | String | Required | Yes |
| city | String | Required | Yes |
| state | String | Required | Yes |
| zip | String | Required | Yes |
| country | String | Required | Yes |
| phone number | String | Required | Yes |
| conversion name | String | Do not hash | No |
| conversion time | Timestamp | Do not hash | No |
| conversion value | Double (No $) | Do not hash | No |
| conversion currency | String (ISO 4217) | Do not hash | No |
The following optional attributes for offline conversions are useful to include if there is additional information you want to attach to each conversion event:
| Attribute Name | Data Type | SHA256 Hash | Nullable |
|---|---|---|---|
| conversion detail | String | Do not hash | Yes |
| client user agent | String | Do not hash | Yes |
| action source | String | Do not hash | Yes |
| order id | String | Optional | Yes |
| event id | String | Optional | Yes |
Removing BOM Characters
When a file is saved with UTF-8 encoding, some systems prepend the contents with a BOM to indicate that it's a UTF-8 file. A BOM can cause problems when reading the file because it might be interpreted as part of the actual data (such as the header), rather than as metadata about the byte order and encoding.
If a BOM character is detected in your file, an error will appear in the data connection within LiveRamp Clean Room. If this occurs, remove the BOM characters by performing the steps listed below, depending on your OS:
Unix-Based OS (Mac OS, Linux)
Confirm the presence of BOM characters by running the following shell command within a terminal session:
cat /path/to/your/input.csv | head
The characters <U+FEFF> (or something similar for a different UTF encoding) will appear at the start of the file, which indicates the presence of BOM characters. Refer to the "BOM Characters That Might Appear" section below for more information on the characters that might appear, depending on the encoding type.
Note
The current shell command can be exited by pressing q in the terminal window.
Alternatively, for a more precise diagnosis, run the following shell command. Depending on the encoding of the file, a different set of BOM characters will appear:
head -n 1 myfile.txt | od -t x1 | head -n 1 | awk '{print $2,$3,$4}'
To remove BOM characters within the terminal session, run the following shell command:
sed '1s/^\xEF\xBB\xBF//' /path/to/your/input.csv > /path/to/your/output.csv
If you’ve already created a data connection for this file, re-upload the fixed file to the same data connection location. You may need to adjust the date folder based on the current UTC date.
Windows OS
Confirm the presence of BOM characters by running one of the following PowerShell commands within a Windows PowerShell session:
Get-Content inputFile.csv -Encoding Byte -TotalCount 3  # for UTF-8 and UTF-16
Get-Content inputFile.csv -Encoding Byte -TotalCount 4  # for UTF-32
Depending on the encoding of the file, a different set of BOM characters will appear. Refer to the "BOM Characters That Might Appear" section below for more information on the characters that might appear, depending on the encoding type.
Either remove the BOM characters with the following PowerShell commands if the file is UTF-8 encoded, or re-export the file as UTF-8 without a BOM:
$inputFile = 'path\to\your\inputFile.csv'
$outputFile = 'path\to\your\outputFile.csv'
$encoding = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllLines($outputFile, (Get-Content $inputFile), $encoding)
If you’ve already created a data connection for this file, re-upload the fixed file to the same data connection location. You may need to adjust the date folder based on the current UTC date.
BOM Characters That Might Appear
Use the table below to determine which BOM characters might appear when you confirm the presence of BOM characters, depending on the encoding type and OS.
| Encoding Type | BOM Characters in Windows OS (decimal) | BOM Characters in Unix-Based OS (hex) |
|---|---|---|
| UTF-8 | 239 187 191 | ef bb bf |
| UTF-16 (Little Endian) | 255 254 | ff fe |
| UTF-16 (Big Endian) | 254 255 | fe ff |
| UTF-32 (Little Endian) | 255 254 0 0 | ff fe 00 00 |
| UTF-32 (Big Endian) | 0 0 254 255 | 00 00 fe ff |
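As a cross-platform alternative to the shell and PowerShell commands above, the following Python sketch (the file name is hypothetical) checks the first bytes of a file against these BOM sequences:

```python
# Sketch: check the first bytes of a file (hypothetical name "input.csv")
# against the BOM sequences listed in the table above.
BOMS = [
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xff\xfe\x00\x00", "UTF-32 (Little Endian)"),  # check before UTF-16 LE
    (b"\x00\x00\xfe\xff", "UTF-32 (Big Endian)"),
    (b"\xff\xfe", "UTF-16 (Little Endian)"),
    (b"\xfe\xff", "UTF-16 (Big Endian)"),
]

with open("input.csv", "rb") as f:
    head = f.read(4)

for bom, encoding in BOMS:
    if head.startswith(bom):
        print(f"BOM detected: {encoding}")
        break
else:
    print("No BOM detected")
```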
Formatting String Fields That Contain Commas and Double Quotes
When a CSV file has a string field that contains both double quotes (") and commas (,), the inner double quotes must be escaped correctly so the system treats the entire value as a single field. If the inner double quotes are not escaped as expected, the commas inside the field can be misinterpreted as delimiters (our system uses a comma as the default delimiter unless configured otherwise), causing the value to be split across multiple fields.
In our system, all inner double quotes must be escaped with a backslash (\). The entire field value must still be enclosed with an outer, non-escaped pair of double quotes.
For example, suppose you have the three fields, customer_id, date, and description, and that one row has the following values for those fields:
| customer_id | date | description |
|---|---|---|
| 2435898 | 2026-01-01 | Lorem ipsum dolor sit amet, "consectetur adipiscing", elit |
To represent this correctly in a CSV row for our system, use double quotes around the entire field value for the description field and escape the inner double quotes in that field with backslashes, as shown in the example below:
2435898,2026-01-01,"Lorem ipsum dolor sit amet, \"consectetur adipiscing\", elit"
If you do not escape the inner quotes, the system can treat the commas inside the description as delimiters, splitting the third field into three separate fields:
Lorem ipsum dolor sit amet
"consectetur adipiscing"
elit
Note
While some CSV dialects use double double-quotes, our system does not use that format. Because the system expects backslash as the escape character, using double double-quotes (as shown below) can again cause the description to be split into multiple fields instead of being parsed as a single string.
2435898,2026-01-01,"Lorem ipsum dolor sit amet, ""consectetur adipiscing"", elit"
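If you generate CSV files programmatically, the following Python sketch shows one way to produce the backslash-escaped format using the standard csv module (the file name is hypothetical):

```python
import csv

# Sketch: write a row whose description field contains both commas and double
# quotes, using backslash-escaped inner quotes (file name is hypothetical).
row = ["2435898", "2026-01-01",
       'Lorem ipsum dolor sit amet, "consectetur adipiscing", elit']

with open("example.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(
        f,
        delimiter=",",
        quotechar='"',
        doublequote=False,   # do not double inner quotes
        escapechar="\\",     # escape inner quotes with a backslash instead
        quoting=csv.QUOTE_MINIMAL,
    )
    writer.writerow(["customer_id", "date", "description"])
    writer.writerow(row)
```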
Delimiters
If you are using the CSV file type, delimiters must be specified.
Integer Nuances
When using the CSV file type, remove any special characters from your data files. For example, if your file contains integers formatted with commas (such as 123,456), remove the comma formatting (123456).
Using Date Macros in the File Path
When using a date macro in the file path, follow the guidelines listed below:
Note
We recommend that you not rely on these parameters to do date filtering, but rather include a date or timestamp field and use partitioning to specify the date range. Date parameters are a legacy feature we are planning to deprecate at some point.
If a date macro is used and no sample file path is provided, you must ensure there is data available within a date bucket in the last 7 days. Otherwise, the Mapping step will fail.
You should also note that date macros in the file path are used only if the START_DATE and END_DATE date parameters are turned on in the question. This is not the same as using a date parameter as a normal runtime parameter.
If the START_DATE and END_DATE date parameters are used, we will look back between the specified dates for data in that bucket range.
If you use the date macro and do not use date parameters (START_DATE or END_DATE) or runtime parameters that are a date range (which requires referencing a date in the SQL) in questions, we will choose the most recent data in the last 365 days. To avoid pulling in more data than you intended, we recommend using the date parameters in questions.
Field Mapping
Once files are processed, use the field mapper to map each field as PII or non-PII.