Data Connection Requirements and Best Practices
Whether you host your own file storage location for data connection jobs or use a location hosted by LiveRamp Clean Room, get up and running faster by keeping these guidelines in mind when configuring and managing your data connections.
Generating File Storage Credentials
If LiveRamp Clean Room hosts your file drop location, you can generate credentials within the LiveRamp Clean Room UI by navigating to → from the LiveRamp Clean Room navigation pane.
Supported File Types and Sizes
For file-based data connections, LiveRamp Clean Room supports CSV and Parquet files, as well as multi-part files.
Note
When creating a CSV file, we recommend that you use Google Sheets to create the file rather than Excel. When Excel converts the file to .csv, it adds "ghost characters" or a BOM (byte order mark), which will cause an error during the data connection process.
Parquet files and CSV files for cloud-based data connections should not be larger than 500 MB.
Files for CSV uploads should not be larger than 50 MB.
Use UTF-8 Character Encoding
To ensure the greatest match rate between collaborator datasets, we recommend using UTF-8 character encoding (including before hashing identifiers).
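For example, here is a minimal Python sketch of re-encoding a file to UTF-8. This assumes you already know the source encoding (Windows-1252 is used here as a placeholder) and that the file names are hypothetical:

# Assumption: the source file is Windows-1252 encoded; adjust the encoding as needed
with open("input.csv", "r", encoding="cp1252") as src:
    content = src.read()

# Write the same content back out as UTF-8
with open("output.csv", "w", encoding="utf-8") as dst:
    dst.write(content)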
Date Formatting
The folder where the data files are dropped includes a date macro, {yyyy-MM-dd}. When you drop files, replace the macro with the appropriate date string, without the curly brackets. LiveRamp Clean Room does not accept date-time strings.
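For example, here is a minimal Python sketch of resolving the macro when dropping a file (the base path and bucket name are hypothetical):

from datetime import datetime, timezone

# Hypothetical configured drop path containing the date macro
configured_path = "s3://my-bucket/clean-room/crm/{yyyy-MM-dd}/"

# Resolve the macro to today's UTC date string, without the curly brackets
today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
drop_path = configured_path.replace("{yyyy-MM-dd}", today)

print(drop_path)  # e.g. s3://my-bucket/clean-room/crm/2024-01-15/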
Identifiers
Each row in a file or table in a data connection job must contain identifiers. LiveRamp Clean Room supports multiple identifier types, including the identifiers listed below:
PII (personally-identifiable information, such as name, address, phone, and email)
Mobile device IDs (MAID)
Custom IDs (CIDs)
For information on identifier formatting, see "Formatting Identifiers".
Most datasets should include a field for a CID (a unique customer identifier you leverage for deduping individuals across your datasets):
For PII-based universe datasets, we recommend that plaintext CIDs (not hashed) be sent (if you must hash the CIDs, we ask that you use MD5 hashing for interoperability).
For all other datasets (including universe datasets based on online identifiers), CIDs can be plaintext or hashed with any of our allowed hashing types (SHA-256, MD5, or SHA-1) but should be formatted consistently across datasets.
Note
Hashing Identifiers: It is best practice to select a hashing algorithm which matches the NIST standard and results in a final hash value that uses lowercase characters. For more information on hashing identifiers, see "Hashing Identifiers".
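For example, here is a minimal Python sketch of hashing an email identifier with SHA-256. The normalization steps shown (trimming whitespace and lowercasing) are common practice but are assumptions; use whatever normalization your collaborators have agreed on:

import hashlib

def hash_identifier(value: str) -> str:
    # Assumption: normalize by trimming whitespace and lowercasing before hashing
    normalized = value.strip().lower()
    # UTF-8 encode, hash with SHA-256, and return a lowercase hex digest
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

print(hash_identifier("Jane.Doe@example.com"))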
If your file or table does not include an identifier field, add a new field and insert a random ID for every record. This pseudo field must be mapped as the user identifier during the data setup.
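For example, here is a minimal Python sketch that appends a hypothetical "user_id" column containing a random ID to every record of a CSV file:

import csv
import uuid

with open("input.csv", newline="", encoding="utf-8") as src, \
        open("output.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader)
    writer.writerow(header + ["user_id"])  # hypothetical pseudo-identifier column
    for row in reader:
        writer.writerow(row + [uuid.uuid4().hex])  # random ID for each record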
Keep Schemas Consistent
All data files in a data connection job must have the same schema in order to successfully process. If you have multiple types of schemas, create a new data connection for each one. The order of the columns should match the schema.
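Here is a minimal Python sketch of confirming that every file intended for one data connection has the same header row (the folder name is hypothetical):

import csv
import glob

headers = {}
for path in glob.glob("data_connection_files/*.csv"):  # hypothetical local folder
    with open(path, newline="", encoding="utf-8") as f:
        headers[path] = tuple(next(csv.reader(f)))  # first row = header

if len(set(headers.values())) > 1:
    print("Schemas differ; create a separate data connection for each schema:")
    for path, header in headers.items():
        print(path, header)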
Guidelines for Dataset Headers, File and Table Names, and Folder Names
All files and tables must have a header in the first row. The name of each header column in a dataset file or table constitutes an attribute, which is a key for each record in the dataset. Because different file sources impose different constraints, follow LiveRamp Clean Room's guidelines to ensure a dataset can be properly utilized. Below are guidelines for two types of datasets: standard and offline conversions. This section also covers BOM characters and how to handle them.
Guidelines for Standard Dataset Headers, File and Table Names, and Folder Names
File and table names, folder names, and headers for standard dataset types (such as CRM, transaction, or ad log) must meet the following guidelines (a validation sketch follows the examples below):
Start with a letter (and not start with a special character, such as an underscore, or digit)
End with either a letter or a digit
Not contain special characters (hyphens (-) and underscores (_) are allowed)
Not contain whitespace, such as spaces and control characters (such as form feed \f, carriage return \r, line feed/new line \n, or tab \t)
Not contain duplicate header names within the same data connection
Not be longer than 255 characters
Here are a few examples of valid names (header examples shown):
Header1
header_name
HEADER-NAME
HEADER___NAME__HERE-1
Here are a few examples of invalid names (header examples shown):
123header: Starts with a non-alphabetic character
header123_: Ends with a non-alphanumeric character
header@name: Contains special characters
header name: Contains spaces
header\nname: Contains non-space whitespace
\uFEFFHeaderName: Contains a BOM character
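Here is a minimal Python sketch of checking header names against these guidelines. The regular expression below is an interpretation of the rules listed above, not an official LiveRamp validation:

import re

# Starts with a letter, ends with a letter or digit, and contains only letters,
# digits, hyphens, and underscores in between (255 characters maximum).
HEADER_PATTERN = re.compile(r"^[A-Za-z]$|^[A-Za-z][A-Za-z0-9_-]{0,253}[A-Za-z0-9]$")

def validate_headers(headers):
    problems = []
    seen = set()
    for name in headers:
        if name in seen:
            problems.append(f"duplicate header: {name}")
        seen.add(name)
        if not HEADER_PATTERN.match(name):
            problems.append(f"invalid header: {name!r}")
    return problems

# "123header" and "header name" fail; the first two names pass
print(validate_headers(["Header1", "header_name", "123header", "header name"]))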
Guidelines for Offline Conversion Dataset Headers
Offline conversion datasets enable you to pipe transactional conversion event data for activation purposes. Headers for offline conversion datasets must meet the following guidelines:
Start with a letter
End with a letter
Not contain digits
Not contain special characters
Not contain non-space whitespace, such as control characters (form feed \f, carriage return \r, line feed/new line \n, or tab \t)
Not contain consecutive occurrences of word separators, such as a combination of spaces, underscores (_), or hyphens (-)
Not contain duplicate header names within the same data connection
In addition to the header guidelines for offline conversions, the following attribute names are required for a successful offline conversion data file:
Attribute Name | Data Type | SHA256 Hash | Nullable |
---|---|---|---|
email | String | Required | No |
first name | String | Required | Yes |
last name | String | Required | Yes |
city | String | Required | Yes |
state | String | Required | Yes |
zip | String | Required | Yes |
country | String | Required | Yes |
phone number | String | Required | Yes |
conversion name | String | Do not hash | No |
conversion time | Timestamp formatted as: | Do not hash | No |
conversion value | Double (no $) | Do not hash | No |
conversion currency | String (ISO 4217) | Do not hash | No |
The following optional attributes for offline conversions are useful to include if there is additional information you want to attach to each conversion event:
Attribute Name | Data Type | SHA256 Hash | Nullable |
---|---|---|---|
conversion detail | String | Do not hash | Yes |
client user agent | String | Do not hash | Yes |
action source | String | Do not hash | Yes |
order id | String | Optional | Yes |
event id | String | Optional | Yes |
Managing BOM Characters
BOM (Byte Order Mark) characters can occur in the header of a CSV file due to the file initially being saved or written out with encoding that includes the BOM, often UTF-8 with BOM.
When a file is saved with UTF-8 encoding, some systems prepend the contents with BOM to indicate that it's a UTF-8 file. BOM can cause problems when reading the file because it might be interpreted as part of the actual data (like the header), rather than metadata about the byte order and encoding.
If a BOM character is detected at the start of your file, an error appears in the data connection within LiveRamp Clean Room. If this occurs, perform the following steps on your data and then upload the file again to the same data connection:
Unix-Based OS (Mac OS, Linux)
Confirm the existence of the BOM character by performing the following shell command within a terminal session:
cat /path/to/your/input.csv | head
If a BOM character is present, the characters <U+FEFF> (or something similar for other UTF encodings) appear at the beginning of the file. If you view the file in a pager such as less instead, press q to exit. Alternatively, for a more precise diagnosis, run the following shell command. The BOM byte sequence that appears depends on the file's encoding; refer to the "BOM Character Sequences" section below for more information.
head -n 1 myfile.txt | od -t x1 | head -n 1 | awk '{print $2,$3,$4}'
To remove a UTF-8 BOM within the terminal session, run the following shell command (note that the \x escape sequences require GNU sed):
sed '1s/^\xEF\xBB\xBF//' /path/to/your/input.csv > /path/to/your/output.csv
Windows OS
Confirm the existence of the BOM character by running one of the following PowerShell commands within a Windows PowerShell session:
Get-Content inputFile.csv -Encoding Byte -TotalCount 3 # for UTF-8 and UTF-16
Get-Content inputFile.csv -Encoding Byte -TotalCount 4 # for UTF-32
The BOM byte sequence that appears depends on the file's encoding; refer to the table in the "BOM Character Sequences" section below.
If the file is UTF-8 encoded, either remove the BOM character with the following PowerShell commands or re-export the file as UTF-8 without BOM:
$inputFile = 'path\to\your\inputFile.csv'
$outputFile = 'path\to\your\outputFile.csv'
$encoding = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllLines($outputFile, (Get-Content $inputFile), $encoding)
Re-upload the fixed file to the same data connection location. You may need to adjust the date folder based on the current UTC date.
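As a cross-platform alternative to the OS-specific commands above, here is a minimal Python sketch that strips a leading BOM from a UTF-8 file (the "utf-8-sig" codec reads the file with or without a BOM and drops it; the file names are hypothetical):

# Read with 'utf-8-sig', which consumes a leading UTF-8 BOM if one is present
with open("input.csv", "r", encoding="utf-8-sig") as src:
    content = src.read()

# Write the content back out as plain UTF-8 without a BOM
with open("output.csv", "w", encoding="utf-8") as dst:
    dst.write(content)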
BOM Character Sequences
Encoding | Windows Sequence (decimal) | UNIX Sequence (hex) |
---|---|---|
UTF-8 | 239 187 191 | ef bb bf |
UTF-16 (Little Endian) | 255 254 | ff fe |
UTF-16 (Big Endian) | 254 255 | fe ff |
UTF-32 (Little Endian) | 255 254 0 0 | ff fe 00 00 |
UTF-32 (Big Endian) | 0 0 254 255 | 00 00 fe ff |
Delimiters
If you are using the CSV file type, delimiters must be specified.
Integer Nuances
When using the CSV file type, remove any special characters from your data files. For example, if your file contains integers formatted with commas (such as "123,456"), remove the comma formatting ("123456").
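Here is a minimal Python sketch that strips comma formatting from a hypothetical "amount" column before upload (the column and file names are placeholders):

import csv

with open("input.csv", newline="", encoding="utf-8") as src, \
        open("output.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # "amount" is a hypothetical integer column formatted like "123,456"
        row["amount"] = row["amount"].replace(",", "")
        writer.writerow(row)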
Using Date Macros in the File Path
When using a date macro in the file path, follow the guidelines listed below:
Note
We recommend that you not rely on these parameters to do date filtering, but rather include a date or timestamp field and use partitioning to specify the date range. Date parameters are a legacy feature we are planning to deprecate at some point.
If a date macro is used and no sample file path is provided, you must ensure there is data available within a date bucket in the last 7 days. Otherwise the Mapping step will fail.
Note that date macros in the file path are also used if the START_DATE and END_DATE date parameters are turned on in the question. This is not the same as using a date parameter as a normal runtime parameter.
If the START_DATE and END_DATE date parameters are used, we look back between the specified dates for data in that bucket range.
If you use the date macro and do not use date parameters (START_DATE or END_DATE) or runtime parameters that are a date range (which requires referencing a date in the SQL) in questions, we choose the most recent data in the last 365 days. To avoid pulling in more data than you intended, we recommend using the date parameters in questions.
Field Mapping
Once files are processed, use the field mapper to map each field as PII or non-PII.