Data Connection Requirements and Best Practices
Whether you host your own file storage location for data connection jobs or use a location hosted by LiveRamp Clean Room, get up and running faster by keeping these guidelines in mind when configuring and managing your data connections.
Generating File Storage Credentials
If LiveRamp Clean Room hosts your file drop location, you can generate credentials within the LiveRamp Clean Room UI by selecting Clean Room → Data Source Locations from the LiveRamp Clean Room navigation menu.
Supported File Types and Sizes
For file-based data connections, LiveRamp Clean Room supports CSV and Parquet files, as well as multi-part files.
Note
When creating a CSV file, we recommend using Google Sheets rather than Excel. When Excel converts a file to .csv, it adds "ghost characters" (a BOM, or byte order mark), which will cause an error during the data connection process.
Parquet files and CSV files for cloud-based data connections should not be larger than 500 MB.
Files for CSV uploads should not be larger than 50 MB.
For datasets that will be used in Clean Compute questions, do not include more than 500 columns.
Use UTF-8 Character Encoding
To ensure the greatest match rate between collaborator datasets, we recommend using UTF-8 character encoding (including before hashing identifiers).
Date Formatting
For the folder where the data files are dropped, we recommend using Hive-style date formatting in the folder path: "date=yyyy-MM-dd".
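For example, here is a minimal Python sketch (the bucket and prefix shown are hypothetical) that builds a Hive-style date folder from the current UTC date:

```python
from datetime import datetime, timezone

# Hypothetical bucket and prefix, for illustration only.
base_path = "s3://your-bucket/clean-room-uploads"

# Hive-style date partition folder: date=yyyy-MM-dd (UTC)
today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
print(f"{base_path}/date={today}")  # e.g., s3://your-bucket/clean-room-uploads/date=2026-01-01
```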
Identifiers
Each row in a file or table in a data connection job must contain identifiers. LiveRamp Clean Room supports multiple identifier types, including the identifiers listed below:
PII (personally-identifiable information, such as name, address, phone, and email)
Mobile device IDs (MAID)
Custom IDs (CIDs)
For information on identifier formatting, see "Formatting Identifiers".
Most datasets should include a field for a CID (a unique customer identifier you leverage for deduping individuals across your datasets):
For PII-based universe datasets, we recommend that plaintext CIDs (not hashed) be sent (if you must hash the CIDs, we ask that you use MD5 hashing for interoperability).
For all other datasets (including universe datasets based on online identifiers), CIDs can be plaintext or hashed with any of our allowed hashing types (SHA-256, MD5, or SHA-1), but should be formatted consistently across datasets.
Note
Hashing Identifiers: It is best practice to select a hashing algorithm that matches the NIST standard and results in a final hash value that uses lowercase characters. For more information on hashing identifiers, see "Hashing Identifiers".
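As an illustration, here is a minimal Python sketch of SHA-256 hashing that produces lowercase hexadecimal output. The email normalization shown is only an example; follow "Formatting Identifiers" and "Hashing Identifiers" for the exact rules.

```python
import hashlib

def sha256_lowercase(value: str) -> str:
    """Hash a UTF-8 encoded identifier with SHA-256; hexdigest() is already lowercase."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# The normalization below (trim and lowercase) is illustrative only;
# follow "Formatting Identifiers" and "Hashing Identifiers" for the exact rules.
email = " Jane.Doe@Example.com "
print(sha256_lowercase(email.strip().lower()))
```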
If your file or table does not include an identifier field, add a new field and insert a random ID for every record. This pseudo field must be mapped as the user identifier during the data setup.
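For example, here is a minimal Python sketch (file and column names are hypothetical) that adds a random pseudo identifier to every record using pandas:

```python
import uuid
import pandas as pd

# Hypothetical file and column names, for illustration only.
df = pd.read_csv("events_without_ids.csv")

# Add a pseudo identifier field containing a random ID for every record;
# map this field as the user identifier during data setup.
df["pseudo_user_id"] = [str(uuid.uuid4()) for _ in range(len(df))]

df.to_csv("events_with_ids.csv", index=False)
```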
Keep Schemas Consistent
All data files in a data connection job must have the same schema in order to be processed successfully. If you have multiple types of schemas, create a new data connection for each one. The order of the columns should match the schema.
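As a quick sanity check before uploading, here is a minimal Python sketch (file names are hypothetical) that verifies every file shares the same header row in the same column order:

```python
import csv

# Hypothetical file names; all files in one data connection job must share
# the same schema (same headers, in the same column order).
files = ["part-0001.csv", "part-0002.csv", "part-0003.csv"]

headers = []
for path in files:
    with open(path, newline="", encoding="utf-8") as f:
        headers.append(next(csv.reader(f)))

reference = headers[0]
for path, header in zip(files, headers):
    if header != reference:
        print(f"Schema mismatch in {path}: {header} != {reference}")
```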
Guidelines for Dataset Headers, File and Table Names, and Folder Names
All files and tables must have a header in the first row. The name of each header column in a dataset file or table constitutes an attribute, which serves as a key for each record in the dataset. Because different file sources impose different constraints, following the LiveRamp Clean Room guidelines below ensures a dataset can be properly utilized. Guidelines are provided for two types of datasets: standard and offline conversions. This section also covers BOM characters and how to handle them.
Guidelines for Standard Dataset Headers, File and Table Names, and Folder Names
For standard dataset types (such as CRM, transaction, or ad log), file and table names, folder names, and headers must meet these guidelines:
Start with a letter (not a special character, such as an underscore, or a digit)
End with either a letter or a digit
Not contain special characters (hyphens (-) and underscores (_) are allowed)
Not contain whitespace, such as spaces and control characters (such as form feed \f, carriage return \r, line feed/new line \n, or tab \t)
Not contain duplicate header names within the same data connection
Not be longer than 255 characters
Here are a few examples of valid names (header examples shown):
Header1
header_name
HEADER-NAME
HEADER___NAME__HERE-1
Here are a few examples of invalid names (header examples shown):
123header: Starts with a non-alphabetic character
header123_: Ends with a non-alphanumeric character
header@name: Contains special characters
header name: Contains spaces
header\nname: Contains non-space whitespace
\uFEFFHeaderName: Contains a BOM character
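As an illustration, the following Python sketch checks names against one interpretation of these rules; the regex is not an official validator:

```python
import re

# A sketch of the standard header/file/table/folder name rules described above.
# The regex below is one interpretation of those rules, not an official validator.
NAME_PATTERN = re.compile(r"^[A-Za-z](?:[A-Za-z0-9_-]*[A-Za-z0-9])?$")

def is_valid_standard_name(name: str) -> bool:
    """Check the start/end/character rules plus the 255-character limit."""
    return len(name) <= 255 and bool(NAME_PATTERN.fullmatch(name))

def has_duplicate_headers(headers: list) -> bool:
    """Duplicate header names within the same data connection are not allowed."""
    return len(headers) != len(set(headers))

for name in ["Header1", "header_name", "HEADER-NAME",
             "123header", "header123_", "header@name",
             "header name", "\ufeffHeaderName"]:
    print(repr(name), is_valid_standard_name(name))

print(has_duplicate_headers(["customer_id", "date", "customer_id"]))  # True
```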
Guidelines for Offline Conversion Dataset Headers
Offline conversion datasets enable you to pipe transactional conversion event data for activation purposes. Headers for offline conversion dataset files must meet these guidelines:
Start with a letter
End with a letter
Not contain digits
Not contain special characters
Not contain whitespace, such as spaces and control characters (form feed \f, carriage return \r, line feed/new line \n, or tab \t)
Not include consecutive occurrences of word separators, such as a combination of spaces, underscores (_), or hyphens (-)
Not contain duplicate header names within the same data connection
In addition to the header guidelines for offline conversions, the following attribute names are required for a successful offline conversion data file:
| Attribute Name | Data Type | SHA256 Hash | Nullable |
|---|---|---|---|
| email | String | Required | No |
| first name | String | Required | Yes |
| last name | String | Required | Yes |
| city | String | Required | Yes |
| state | String | Required | Yes |
| zip | String | Required | Yes |
| country | String | Required | Yes |
| phone number | String | Required | Yes |
| conversion name | String | Do not hash | No |
| conversion time | Timestamp | Do not hash | No |
| conversion value | Double (No $) | Do not hash | No |
| conversion currency | String (ISO 4217) | Do not hash | No |
The following optional attributes for offline conversions are useful to include if there is additional information you want to attach to each conversion event:
| Attribute Name | Data Type | SHA256 Hash | Nullable |
|---|---|---|---|
| conversion detail | String | Do not hash | Yes |
| client user agent | String | Do not hash | Yes |
| action source | String | Do not hash | Yes |
| order id | String | Optional | Yes |
| event id | String | Optional | Yes |
Removing BOM Characters
When a file is saved with UTF-8 encoding, some systems prepend the contents with a BOM to indicate that it's a UTF-8 file. A BOM can cause problems when reading the file because it might be interpreted as part of the actual data (such as the header), rather than as metadata about the byte order and encoding.
If a BOM character is detected in your file, an error will appear in the data connection within LiveRamp Clean Room. If this occurs, remove the BOM characters by performing the steps listed below, depending on your OS:
Unix-Based OS (Mac OS, Linux)
Confirm the presence of BOM characters by running the following shell command within a terminal session:
cat /path/to/your/input.csv | head
The characters <U+FEFF> (or something similar for a different UTF encoding) will appear at the start of the file, which indicates the presence of BOM characters. Refer to the "BOM Characters That Might Appear" section below for more information on the characters that might appear, depending on the encoding type.
Note
The current shell command can be exited by pressing q in the terminal window.
Alternatively, for a more precise diagnosis, run the following shell command. Depending on the encoding of the file, a different set of BOM characters will appear:
head -n 1 myfile.txt | od -t x1 | head -n 1 | awk '{print $2,$3,$4}'
To remove BOM characters within the terminal session, run the following shell command:
sed '1s/^\xEF\xBB\xBF//' /path/to/your/input.csv > /path/to/your/output.csv
If you’ve already created a data connection for this file, re-upload the fixed file to the same data connection location. You may need to adjust the date folder based on the current UTC date.
Windows OS
Confirm the presence of BOM characters by running one of the following PowerShell commands within a Windows PowerShell session:
Get-Content inputFile.csv -Encoding Byte -TotalCount 3  # for UTF-8 and UTF-16
Get-Content inputFile.csv -Encoding Byte -TotalCount 4  # for UTF-32
Depending on the encoding of the file, a different set of BOM characters will appear. Refer to the "BOM Characters That Might Appear" section below for more information on the characters that might appear, depending on the encoding type.
Either remove the BOM characters with the following PowerShell commands if the file is UTF-8 encoded, or re-export the file as UTF-8 without a BOM:
$inputFile = 'path\to\your\inputFile.csv'
$outputFile = 'path\to\your\outputFile.csv'
$encoding = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllLines($outputFile, (Get-Content $inputFile), $encoding)
If you’ve already created a data connection for this file, re-upload the fixed file to the same data connection location. You may need to adjust the date folder based on the current UTC date.
BOM Characters That Might Appear
Use the table below to determine which BOM characters might appear when you confirm the presence of BOM characters, depending on the encoding type and OS.
| Encoding Type | BOM Characters in Windows OS (decimal) | BOM Characters in Unix-Based OS (hex) |
|---|---|---|
| UTF-8 | 239 187 191 | ef bb bf |
| UTF-16 (Little Endian) | 255 254 | ff fe |
| UTF-16 (Big Endian) | 254 255 | fe ff |
| UTF-32 (Little Endian) | 255 254 0 0 | ff fe 00 00 |
| UTF-32 (Big Endian) | 0 0 254 255 | 00 00 fe ff |
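As a cross-platform alternative to the shell and PowerShell commands above, the following Python sketch (the file name is hypothetical) checks the first bytes of a file against these BOM sequences:

```python
# Sketch: check the first bytes of a file (hypothetical name "input.csv")
# against the BOM sequences listed in the table above.
BOMS = [
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xff\xfe\x00\x00", "UTF-32 (Little Endian)"),  # check before UTF-16 LE
    (b"\x00\x00\xfe\xff", "UTF-32 (Big Endian)"),
    (b"\xff\xfe", "UTF-16 (Little Endian)"),
    (b"\xfe\xff", "UTF-16 (Big Endian)"),
]

with open("input.csv", "rb") as f:
    head = f.read(4)

for bom, encoding in BOMS:
    if head.startswith(bom):
        print(f"BOM detected: {encoding}")
        break
else:
    print("No BOM detected")
```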
Formatting String Fields That Contain Commas and Double Quotes
When a CSV file has a string field that contains both double quotes (") and commas (,), the inner double quotes must be escaped correctly so the system treats the entire value as a single field. If the inner double quotes are not escaped as expected, the commas inside the field can be misinterpreted as delimiters (our system uses a comma as the default delimiter unless configured otherwise), causing the value to be split across multiple fields.
In our system, all inner double quotes must be escaped with a backslash (\). The entire field value must still be enclosed with an outer, non-escaped pair of double quotes.
For example, suppose you have the three fields, customer_id, date, and description, and that one row has the following values for those fields:
| customer_id | date | description |
|---|---|---|
| 2435898 | 2026-01-01 | Lorem ipsum dolor sit amet, "consectetur adipiscing", elit |
To represent this correctly in a CSV row for our system, use double quotes around the entire field value for the description field and escape the inner double quotes in that field with backslashes, as shown in the example below:
2435898,2026-01-01,"Lorem ipsum dolor sit amet, \"consectetur adipiscing\", elit"
If you do not escape the inner quotes, the system can treat the commas inside the description as delimiters, splitting the third field into three separate fields:
Lorem ipsum dolor sit amet
"consectetur adipiscing"
elit
Note
While some CSV dialects use double double-quotes, our system does not use that format. Because the system expects backslash as the escape character, using double double-quotes (as shown below) can again cause the description to be split into multiple fields instead of being parsed as a single string.
2435898,2026-01-01,"Lorem ipsum dolor sit amet, ""consectetur adipiscing"", elit"
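If you generate CSV files programmatically, the following Python sketch shows one way to produce the backslash-escaped format using the standard csv module (the file name is hypothetical):

```python
import csv

# Sketch: write a row whose description field contains both commas and double
# quotes, using backslash-escaped inner quotes (file name is hypothetical).
row = ["2435898", "2026-01-01",
       'Lorem ipsum dolor sit amet, "consectetur adipiscing", elit']

with open("example.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(
        f,
        delimiter=",",
        quotechar='"',
        doublequote=False,   # do not double inner quotes
        escapechar="\\",     # escape inner quotes with a backslash instead
        quoting=csv.QUOTE_MINIMAL,
    )
    writer.writerow(["customer_id", "date", "description"])
    writer.writerow(row)
```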
Delimiters
If you are using the CSV file type, delimiters must be specified.
Integer Nuances
When using the CSV file type, remove any special characters from your data files. For example, if your file contains integers formatted with commas (such as 123,456), remove the comma formatting (123456).
Using Date Macros in the File Path
When using a date macro in the file path, follow the guidelines listed below:
Note
We recommend that you not rely on these parameters to do date filtering, but rather include a date or timestamp field and use partitioning to specify the date range. Date parameters are a legacy feature we are planning to deprecate at some point.
If a date macro is used and no sample file path is provided, you must ensure there is data available within a date bucket in the last 7 days. Otherwise, the Mapping step will fail.
You should also note that date macros in the file path are used only if the START_DATE and END_DATE date parameters are turned on in the question. This is not the same as using a date parameter as a normal runtime parameter.
If the START_DATE and END_DATE date parameters are used, we will look back between the specified dates for data in that bucket range.
If you use the date macro and do not use date parameters (START_DATE or END_DATE) or runtime parameters that are a date range (which requires referencing a date in the SQL) in questions, we will choose the most recent data in the last 365 days. To avoid pulling in more data than you intended, we recommend using the date parameters in questions.
Field Mapping
Once files are processed, use the field mapper to map each field as PII or non-PII.