Data Connection Requirements and Best Practices
Whether you host your own file storage location for data connection jobs or use a location hosted by LiveRamp Clean Room, get up and running faster by keeping these guidelines in mind when configuring and managing your data connections.
Generating File Storage Credentials
If LiveRamp Clean Room hosts your file drop location, you can generate credentials within the LiveRamp Clean Room UI by navigating to → from the LiveRamp Clean Room navigation pane.
Supported File Types and Sizes
For file-based data connections, LiveRamp Clean Room supports CSV and Parquet files, as well as multi-part files.
Note
When creating a CSV file, we recommend that you use Google Sheets to create the file rather than Excel. When Excel converts the file to .csv, it adds "ghost characters" or a BOM (byte order mark), which will cause an error during the data connection process.
Parquet files and CSV files for cloud-based data connections should not be larger than 500 MB.
Files for CSV uploads should not be larger than 50 MB.
Use UTF-8 Character Encoding
To ensure the greatest match rate between collaborator datasets, we recommend using UTF-8 character encoding (including before hashing identifiers).
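For example, here is a minimal Python sketch of re-encoding a file to UTF-8. This assumes you already know the source encoding (Windows-1252 is used here as a placeholder) and that the file names are hypothetical:

# Assumption: the source file is Windows-1252 encoded; adjust the encoding as needed
with open("input.csv", "r", encoding="cp1252") as src:
    content = src.read()

# Write the same content back out as UTF-8
with open("output.csv", "w", encoding="utf-8") as dst:
    dst.write(content)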
Date Formatting
The folder where the data files are dropped includes a date macro, {yyyy-MM-dd}. When you drop files, replace the macro with the appropriate date string, without the curly brackets. LiveRamp Clean Room does not accept date-time strings.
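For example, here is a minimal Python sketch of resolving the macro when dropping a file (the base path and bucket name are hypothetical):

from datetime import datetime, timezone

# Hypothetical configured drop path containing the date macro
configured_path = "s3://my-bucket/clean-room/crm/{yyyy-MM-dd}/"

# Resolve the macro to today's UTC date string, without the curly brackets
today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
drop_path = configured_path.replace("{yyyy-MM-dd}", today)

print(drop_path)  # e.g. s3://my-bucket/clean-room/crm/2024-01-15/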
Identifiers
Each row in a file or table in a data connection job must contain identifiers. LiveRamp Clean Room supports multiple identifier types, including the identifiers listed below:
PII (personally-identifiable information, such as name, address, phone, and email)
Mobile device IDs (MAID)
Custom IDs (CIDs)
For information on identifier formatting, see "Formatting Identifiers".
Most datasets should include a field for a CID (a unique customer identifier you leverage for deduping individuals across your datasets):
For PII-based universe datasets, we recommend that plaintext CIDs (not hashed) be sent (if you must hash the CIDs, we ask that you use MD5 hashing for interoperability).
For all other datasets (including universe datasets based on online identifiers), CIDs can be plaintext or hashed with any of our allowed hashing types (SHA-256, MD5, or SHA-1) but should be formatted consistently across datasets.
Note
Hashing Identifiers: It is best practice to select a hashing algorithm which matches the NIST standard and results in a final hash value that uses lowercase characters. For more information on hashing identifiers, see "Hashing Identifiers".
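For example, here is a minimal Python sketch of hashing an email identifier with SHA-256. The normalization steps shown (trimming whitespace and lowercasing) are common practice but are assumptions; use whatever normalization your collaborators have agreed on:

import hashlib

def hash_identifier(value: str) -> str:
    # Assumption: normalize by trimming whitespace and lowercasing before hashing
    normalized = value.strip().lower()
    # UTF-8 encode, hash with SHA-256, and return a lowercase hex digest
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

print(hash_identifier("Jane.Doe@example.com"))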
If your file or table does not include an identifier field, add a new field and insert a random ID for every record. This pseudo field must be mapped as the user identifier during the data setup.
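For example, here is a minimal Python sketch that appends a hypothetical "user_id" column containing a random ID to every record of a CSV file:

import csv
import uuid

with open("input.csv", newline="", encoding="utf-8") as src, \
        open("output.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader)
    writer.writerow(header + ["user_id"])  # hypothetical pseudo-identifier column
    for row in reader:
        writer.writerow(row + [uuid.uuid4().hex])  # random ID for each record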
Keep Schemas Consistent
All data files in a data connection job must have the same schema in order to successfully process. If you have multiple types of schemas, create a new data connection for each one. The order of the columns should match the schema.
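Here is a minimal Python sketch of confirming that every file intended for one data connection has the same header row (the folder name is hypothetical):

import csv
import glob

headers = {}
for path in glob.glob("data_connection_files/*.csv"):  # hypothetical local folder
    with open(path, newline="", encoding="utf-8") as f:
        headers[path] = tuple(next(csv.reader(f)))  # first row = header

if len(set(headers.values())) > 1:
    print("Schemas differ; create a separate data connection for each schema:")
    for path, header in headers.items():
        print(path, header)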
Guidelines for Dataset Headers, File and Table Names, and Folder Names
All files and tables must have a header in the first row. The name of each header column in a dataset file or table constitutes an attribute, which is a key for each record in the dataset. Because different file sources impose different constraints, follow LiveRamp Clean Room's guidelines to ensure a dataset can be properly utilized. Below are guidelines for two types of datasets: standard and offline conversions. This section also covers BOM characters and how to handle them.
Guidelines for Standard Dataset Headers, File and Table Names, and Folder Names
File and table names, folder names, and headers for standard dataset types (such as CRM, transaction, or ad log) must meet the following guidelines (a validation sketch follows the examples below):
Start with a letter (and not start with a special character, such as an underscore, or digit)
End with either a letter or a digit
Not contain special characters (hyphens (-) and underscores (_) are allowed)
Not contain whitespace, such as spaces and control characters (such as form feed \f, carriage return \r, line feed/new line \n, or tab \t)
Not contain duplicate header names within the same data connection
Not be longer than 255 characters
Here are a few examples of valid names (header examples shown):
Header1
header_name
HEADER-NAME
HEADER___NAME__HERE-1
Here are a few examples of invalid names (header examples shown):
123header: Starts with a non-alphabetic character
header123_: Ends with a non-alphanumeric character
header@name: Contains special characters
header name: Contains spaces
header\nname: Contains non-space whitespace
\uFEFFHeaderName: Contains a BOM character
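Here is a minimal Python sketch of checking header names against these guidelines. The regular expression below is an interpretation of the rules listed above, not an official LiveRamp validation:

import re

# Starts with a letter, ends with a letter or digit, and contains only letters,
# digits, hyphens, and underscores in between (255 characters maximum).
HEADER_PATTERN = re.compile(r"^[A-Za-z]$|^[A-Za-z][A-Za-z0-9_-]{0,253}[A-Za-z0-9]$")

def validate_headers(headers):
    problems = []
    seen = set()
    for name in headers:
        if name in seen:
            problems.append(f"duplicate header: {name}")
        seen.add(name)
        if not HEADER_PATTERN.match(name):
            problems.append(f"invalid header: {name!r}")
    return problems

# "123header" and "header name" fail; the first two names pass
print(validate_headers(["Header1", "header_name", "123header", "header name"]))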
Guidelines for Offline Conversion Dataset Headers
Offline conversion datasets enable you to pipe transactional conversion event data for activation purposes. Headers for offline conversion datasets must meet the following guidelines:
Start with a letter
End with a letter
Not contain digits
Not contain special characters
Not contain non-space whitespace, such as control characters (form feed \f, carriage return \r, line feed/new line \n, or tab \t)
Not contain consecutive occurrences of word separators, such as a combination of spaces, underscores (_), or hyphens (-)
Not contain duplicate header names within the same data connection
In addition to the header guidelines for offline conversions, the following attribute names are required for a successful offline conversion data file:
Attribute Name | Data Type | SHA256 Hash | Nullable |
---|---|---|---|
email | String | Required | No |
first name | String | Required | Yes |
last name | String | Required | Yes |
city | String | Required | Yes |
state | String | Required | Yes |
zip | String | Required | Yes |
country | String | Required | Yes |
phone number | String | Required | Yes |
conversion name | String | Do not hash | No |
conversion time | Timestamp formatted as: | Do not hash | No |
conversion value | Double (no $) | Do not hash | No |
conversion currency | String (ISO 4217) | Do not hash | No |
The following optional attributes for offline conversions are useful to include if there is additional information you want to attach to each conversion event:
Attribute Name | Data Type | SHA256 Hash | Nullable |
---|---|---|---|
conversion detail | String | Do not hash | Yes |
client user agent | String | Do not hash | Yes |
action source | String | Do not hash | Yes |
order id | String | Optional | Yes |
event id | String | Optional | Yes |
Managing BOM Characters
BOM (Byte Order Mark) characters can occur in the header of a CSV file due to the file initially being saved or written out with encoding that includes the BOM, often UTF-8 with BOM.
When a file is saved with UTF-8 encoding, some systems prepend the contents with BOM to indicate that it's a UTF-8 file. BOM can cause problems when reading the file because it might be interpreted as part of the actual data (like the header), rather than metadata about the byte order and encoding.
If a BOM character is detected at the start of your file, an error appears in the data connection within LiveRamp Clean Room. If this occurs, perform the following steps on your data and then upload the file again to the same data connection:
Unix-Based OS (Mac OS, Linux)
Confirm the existence of the BOM character by performing the following shell command within a terminal session:
cat /path/to/your/input.csv | head
If a BOM character is present, the characters <U+FEFF> (or something similar for other UTF encodings) appear at the beginning of the file. If you view the file in a pager such as less instead, press q to exit. Alternatively, for a more precise diagnosis, run the following shell command. The BOM byte sequence that appears depends on the file's encoding; refer to the "BOM Character Sequences" section below for more information.
head -n 1 myfile.txt | od -t x1 | head -n 1 | awk '{print $2,$3,$4}'
To remove a UTF-8 BOM within the terminal session, run the following shell command (note that the \x escape sequences require GNU sed):
sed '1s/^\xEF\xBB\xBF//' /path/to/your/input.csv > /path/to/your/output.csv
Windows OS
Confirm the existence of the BOM character by running one of the following PowerShell commands within a Windows PowerShell session:
Get-Content inputFile.csv -Encoding Byte -TotalCount 3 # for UTF-8 and UTF-16
Get-Content inputFile.csv -Encoding Byte -TotalCount 4 # for UTF-32
The BOM byte sequence that appears depends on the file's encoding; refer to the table in the "BOM Character Sequences" section below.
If the file is UTF-8 encoded, either remove the BOM character with the following PowerShell commands or re-export the file as UTF-8 without BOM:
$inputFile = 'path\to\your\inputFile.csv'
$outputFile = 'path\to\your\outputFile.csv'
$encoding = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllLines($outputFile, (Get-Content $inputFile), $encoding)
Re-upload the fixed file to the same data connection location. You may need to adjust the date folder based on the current UTC date.
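As a cross-platform alternative to the OS-specific commands above, here is a minimal Python sketch that strips a leading BOM from a UTF-8 file (the "utf-8-sig" codec reads the file with or without a BOM and drops it; the file names are hypothetical):

# Read with 'utf-8-sig', which consumes a leading UTF-8 BOM if one is present
with open("input.csv", "r", encoding="utf-8-sig") as src:
    content = src.read()

# Write the content back out as plain UTF-8 without a BOM
with open("output.csv", "w", encoding="utf-8") as dst:
    dst.write(content)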
BOM Character Sequences
Encoding | Windows Sequence (decimal) | UNIX Sequence (hex) |
---|---|---|
UTF-8 | 239 187 191 | ef bb bf |
UTF-16 (Little Endian) | 255 254 | ff fe |
UTF-16 (Big Endian) | 254 255 | fe ff |
UTF-32 (Little Endian) | 255 254 0 0 | ff fe 00 00 |
UTF-32 (Big Endian) | 0 0 254 255 | 00 00 fe ff |
Delimiters
If you are using the CSV file type, delimiters must be specified.
Integer Nuances
When using the CSV file type, remove any special characters from your data files. For example, if your file contains integers formatted with commas (such as "123,456"), remove the comma formatting ("123456").
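Here is a minimal Python sketch that strips comma formatting from a hypothetical "amount" column before upload (the column and file names are placeholders):

import csv

with open("input.csv", newline="", encoding="utf-8") as src, \
        open("output.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # "amount" is a hypothetical integer column formatted like "123,456"
        row["amount"] = row["amount"].replace(",", "")
        writer.writerow(row)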
Using Date Macros in the File Path
When using a date macro in the file path, follow the guidelines listed below:
Note
We recommend that you not rely on these parameters to do date filtering, but rather include a date or timestamp field and use partitioning to specify the date range. Date parameters are a legacy feature we are planning to deprecate at some point.
If a date macro is used and no sample file path is provided, you must ensure there is data available within a date bucket in the last 7 days. Otherwise the Mapping step will fail.
Note that date macros in the file path are also used if the START_DATE and END_DATE date parameters are turned on in the question. This is not the same as using a date parameter as a normal runtime parameter.
If the START_DATE and END_DATE date parameters are used, we look back between the specified dates for data in that bucket range.
If you use the date macro and do not use date parameters (START_DATE or END_DATE) or runtime parameters that are a date range (which requires referencing a date in the SQL) in questions, we choose the most recent data in the last 365 days. To avoid pulling in more data than you intended, we recommend using the date parameters in questions.
Field Mapping
Once files are processed, use the field mapper to map each field as PII or non-PII.