Connect to Cloud-Based Data
Data connections link your LiveRamp Clean Room organization to your data at your cloud provider so that it can be accessed in a clean room. This allows the data to be used in list and analytical questions within clean rooms.
Data connections can be configured to any cloud-based storage or data warehouse location, including AWS, GCS, Azure Blob, Snowflake, Google BigQuery, and Databricks. Connections are specific to both the cloud provider and the clean room type (Hybrid, Confidential Computing, or native pattern), so the exact configuration depends on the source storage location, data types, and structures. For example, Snowflake data connections must be configured differently for use in Hybrid clean rooms versus Snowflake clean rooms, depending on whether the data lives in different clouds or cloud regions.
Note
Your Clean Room representative will work with you to determine the type(s) of data connections you’ll need for your situation.
Once you've determined the type of data connections you'll need (based on your data source and preferred configuration type), select the appropriate article to see specific configuration steps.
Each data connection results in a single dataset within LiveRamp Clean Room. All data files in a data connection job must have the same schema in order to successfully process. For more information on standard schemas, see "Format Your Clean Room Data".
To enable distinct tables or sets of files as datasets, you will need a data connection for each table or set of files.
Note
To view an interactive walkthrough demo of the process of connecting to your cloud-based data by creating a data connection, click here.
Data Connection Prerequisites
Before creating a new data connection, have the desired data prepared and present in your cloud location. This can help speed up the connection to the data. For more information, see "Format Your Clean Room Data".
To utilize partitioning for cloud storage data connections, you need to organize your data into folders that reflect the partition columns. LiveRamp encourages users to use Hive-style partitioning, typically by date (such as s3://bucket/path/date=YYYY-MM-DD/). For more information, see "Partition a Dataset in LiveRamp Clean Rooms".
For information on utilizing partitioning for cloud warehouse data connections, see "Partition a Dataset in LiveRamp Clean Rooms".
When creating a data connection, you will either need to utilize existing credentials that you’ve created previously for that cloud provider or you’ll need to add a new credential during the process. For more information, see "Add Credentials".
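To illustrate the Hive-style layout described above, here is a minimal sketch of a daily export job writing one date=YYYY-MM-DD folder per day (the exports/events path and file names are hypothetical; adapt them to your own bucket layout):

```python
# Sketch: write daily CSV extracts into Hive-style partition folders.
# The root path and file names below are hypothetical examples.
from datetime import date, timedelta
from pathlib import Path

def partition_path(root: str, day: date) -> Path:
    # Hive-style partitioning encodes the column name and value in the
    # folder name, e.g. root/date=2024-01-15/
    return Path(root) / f"date={day.isoformat()}"

root = "exports/events"
start = date(2024, 1, 15)
for offset in range(3):
    folder = partition_path(root, start + timedelta(days=offset))
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "part-0000.csv").write_text("user_id,event\n", encoding="utf-8")

print(sorted(p.as_posix() for p in Path(root).glob("date=*")))
# → ['exports/events/date=2024-01-15', 'exports/events/date=2024-01-16', 'exports/events/date=2024-01-17']
```

The same `date=value` folder convention applies whether the files land in S3, GCS, or Azure; only the URI prefix changes.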
Supported Clean Room Types
To determine which clean room types support datasets from each data connection type, see the table below.
Note
For more information on clean room types, see "Configure Clean Rooms".
| Data Connection Type | Hybrid | Confidential Computing (HCC) | Amazon Web Services | Snowflake | Databricks | Google Cloud BigQuery |
| --- | --- | --- | --- | --- | --- | --- |
| | Yes | Yes (as partner) | | | | |
| | | Yes | | | | |
| | Yes | Yes (as partner) | Yes | | | |
| | Yes | Yes (as partner) | Yes | | | |
| | Yes | Yes (as partner) | Yes | | | |
| | Yes | Yes (as partner) | Yes | | | |
| | Yes | Yes (as partner) | | | | |
| | Yes | | | | | |
| | Yes | Yes (as partner) | | | | |
| | Yes | | | | | |
| | Yes | Yes (as partner) | | | | |
| | Yes | Yes (as partner) | | | | |
| | Yes | Yes (as partner) | | | | |
Next Steps After Connecting Your Data
After you’ve created the data connection and Clean Room has validated it by connecting to the data in your cloud account, you need to map the fields before the data connection is ready to use. This is where you specify which fields are queryable across clean rooms, which fields contain identifiers to be used in matching, and any columns by which you wish to partition the dataset for questions.
After fields have been mapped, you’re ready to provision the resulting dataset to your desired clean rooms. Within each clean room, you’ll be able to set dataset analysis rules, exclude or include columns, filter for specific values, and set permission levels.
Once this has been done, you're ready to use these datasets in your clean room questions.
Data Connection FAQs
See the FAQs below for common data connection questions.
How should my files or tables be formatted for a successful data connection?
To avoid schema and formatting errors:
Use supported file types and sizes:
For file-based connections, use CSV or Parquet, staying within documented size limits for each file type and connection type.
Make sure to use a consistent schema:
All files/tables in the connection must share the same columns in the same order, with compatible data types.
If you have multiple schemas, create one data connection per schema.
Make sure to use clean, compliant headers:
Include a single header row.
Use unique header names with no duplicates.
Avoid disallowed characters in headers, and keep names within character length limits.
Include identifiers:
Each row should contain at least one supported identifier type if you plan to use the data in identity-aware workflows.
Use the correct encoding:
Save CSV files as UTF‑8 without a BOM (byte order mark) to avoid “ghost character” and parsing errors.
For more information, see “Format Your Clean Room Data”.
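As a pre-flight check for the header rules above, the sketch below flags duplicate, disallowed, or overlong header names. The allowed character set and the 100-character limit are placeholder assumptions for illustration; see "Format Your Clean Room Data" for the actual rules.

```python
# Sketch: pre-flight header checks before uploading CSVs.
# ALLOWED and MAX_LEN are assumed placeholder rules, not the documented limits.
import csv
import io
import re

ALLOWED = re.compile(r"^[A-Za-z0-9_]+$")  # assumed allowed character set
MAX_LEN = 100                             # assumed maximum header length

def header_problems(csv_text: str) -> list[str]:
    header = next(csv.reader(io.StringIO(csv_text)))
    problems = []
    seen = set()
    for name in header:
        if name in seen:
            problems.append(f"duplicate header: {name}")
        seen.add(name)
        if not ALLOWED.match(name):
            problems.append(f"disallowed characters: {name!r}")
        if len(name) > MAX_LEN:
            problems.append(f"too long: {name}")
    return problems

print(header_problems("user_id,email,user_id,first name\n1,a@b.com,2,Ann\n"))
# → ['duplicate header: user_id', "disallowed characters: 'first name'"]
```

Running a check like this over every file before upload catches schema errors earlier than the platform's validation step.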
Can one data connection include multiple tables or different schemas?
No:
Each data connection results in a single dataset within the platform.
All files/tables in that connection must share the same schema to process successfully.
To use distinct tables or file sets as separate datasets, you must do one of the following:
Create a separate data connection for each table or file with a different schema.
(Where available) create a derived dataset from a view of multiple data connections, which is still represented as a separate dataset downstream.
Why does partitioning matter?
Partitioning optimizes the dataset before you even reach the query stage: query performance improves because data processing during question runs occurs only on the relevant filtered data. For more information, see “Data Connection Partitioning”.
What are the best practices for partitioning?
Data partitioning (dividing a large dataset into smaller, more manageable subsets) is recommended for optimizing query performance and leading to faster processing times. By indicating partition columns for your data connections, data processing during question runs occurs only on the relevant filtered data, which reduces query cost and time to execute. Best practices include:
Partition at the source: When configuring your data connection to LiveRamp Clean Room, define partition columns.
Consider the collaboration context: Make sure that the partition columns make sense for the types of questions that a dataset is likely to be used for. For example:
If you anticipate questions that analyze data over time, partition the dataset by a date field (e.g., event_date or impression_date). This allows queries that filter by date ranges to scan only relevant partitions, reducing processing time and costs.
If the main use case is to analyze data by different brands or products, then partitioning by a brand or product_id column makes sense. This strategy ensures that queries filtering by brand will only access the necessary subset of the data.
Verify column data types: Partitioning supports date, string, integer, and timestamp field types. Complex types (such as arrays, maps, or structs) are not allowed.
Cloud-specific formatting: For cloud storage sources like S3, GCS, and Azure, structure your buckets and file paths in a partitioning format based on the partition column. For BigQuery and Snowflake, make sure columns are indicated as partition keys in your source tables.
For more information, see "Data Connection Partitioning".
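The pruning benefit described above can be sketched with a minimal example over Hive-style folder names (the folder values are hypothetical): a date-range filter only needs to touch the matching partitions.

```python
# Sketch: partition pruning over Hive-style folders. A query filtered to a
# date range only scans the folders whose date= value falls in that range.
from datetime import date

folders = [
    "date=2024-01-14",
    "date=2024-01-15",
    "date=2024-01-16",
    "date=2024-01-17",
]

def prune(folders, start: date, end: date):
    kept = []
    for name in folders:
        # Parse the partition value out of the folder name.
        value = date.fromisoformat(name.split("=", 1)[1])
        if start <= value <= end:
            kept.append(name)
    return kept

# A question filtered to Jan 15-16 scans 2 of 4 partitions instead of all data.
print(prune(folders, date(2024, 1, 15), date(2024, 1, 16)))
# → ['date=2024-01-15', 'date=2024-01-16']
```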
How should I map fields if my data contains a RampID column?
If your data contains a column with RampIDs, do not enable the PII toggle for that column. Mark the RampID column as a User Identifier and select "RampID" as the identifier type. If the data contains a RampID column, no other columns can be enabled as PII.
Why do I need to map a user identifier during the field mapping process, and what if I don’t have one?
Every data connection must have at least one user identifier mapped:
Many clean room workflows rely on identifying individuals or households to perform the following actions:
Link your data to other datasets.
Perform identity resolution and deconfliction.
Enable audience and measurement use cases.
At least one field must be mapped as a User Identifier in the connection’s field mapping for identity-aware use cases to work correctly.
If you have no obvious identifier column, consider one of the following options:
Add a customer or account ID that is stable and unique per entity to your dataset.
Generate a “synthetic” or surrogate key upstream, if that is acceptable for your use case.
Note
Be aware that using a synthetic ID that never appears anywhere else will allow you to analyze that dataset internally but will not create overlap with partner data.
If you’re unsure which column to choose, align with your internal data owners and your LiveRamp contact before finalizing the mapping.
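If you go the surrogate-key route, one common upstream approach (shown here as a sketch; the attribute values are hypothetical) is a deterministic hash of stable attributes, so the same entity receives the same key on every export:

```python
# Sketch: generating a stable surrogate key upstream when no natural user
# identifier exists. A deterministic hash of stable attributes produces the
# same key on every export run; the attribute values are hypothetical.
import hashlib

def surrogate_key(*attributes: str) -> str:
    # Join with a unit separator so ("ab", "c") and ("a", "bc") cannot
    # collide, then hash and truncate to a compact fixed-length key.
    joined = "\x1f".join(attributes)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]

key = surrogate_key("account-1042", "us-east")
print(len(key))  # → 16
assert key == surrogate_key("account-1042", "us-east")  # deterministic
```

As the note above says, a key like this supports internal analysis of the dataset but will not produce overlap with partner data.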
What should I try if my new data connection won’t complete setup (for example, it stays in Draft, Verifying Access, Configuration Failed, or Missing Valid Schema)?
Start with these quick checks:
Verify access and permissions:
Confirm the credential you selected is valid, not expired, and has read access to the target location (project/database, schema, table, bucket, or folder).
If your environment uses network rules or IP allowlisting, make sure the required LiveRamp IP ranges are authorized before retrying the connection.
Verify path or object details:
Verify the project/database, dataset/schema, table/view, or storage path is correct and still exists.
Ensure the location contains non‑empty data that matches what you intend to connect (for example, not a placeholder or empty folder).
Verify schema and headers:
Confirm that all files/tables share the same schema, with the same columns in the same order and compatible data types.
Make sure headers are present, unique, and do not contain disallowed characters (such as duplicate names, certain special characters, or overly long column names).
Verify identifiers:
Ensure the data includes at least one column that can be mapped as a user identifier during field mapping; missing identifiers can block runs or yield empty results later.
After fixing anything uncovered above, retry the data connection to re-run validation.
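The schema check above can be automated with a small script. This sketch (the file names are hypothetical) compares each CSV's header row against the first file's and reports any that differ:

```python
# Sketch: verify that every CSV in a connection shares the same header row
# (same columns, same order) before retrying validation.
import csv
from pathlib import Path

def first_row(path: str) -> list[str]:
    # utf-8-sig tolerates (and strips) a leading BOM if one is present.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f))

def schema_mismatches(paths: list[str]) -> dict[str, list[str]]:
    # Return files whose header differs from the first file's header.
    expected = first_row(paths[0])
    return {p: first_row(p) for p in paths[1:] if first_row(p) != expected}

# Demo with two hypothetical files whose headers disagree:
Path("part1.csv").write_text("user_id,event\n1,click\n", encoding="utf-8")
Path("part2.csv").write_text("user_id,event_name\n2,view\n", encoding="utf-8")
print(schema_mismatches(["part1.csv", "part2.csv"]))
# → {'part2.csv': ['user_id', 'event_name']}
```

Splitting the mismatched files into their own data connection (one connection per schema) resolves this class of failure.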
What should I do if the tooltip error message for my data connection seems to indicate that the credentials might be an issue?
When a tooltip or error message explicitly calls out “authentication”, “invalid credentials”, or a “token/session expiry”, it means the platform could not sign in to your cloud source with the saved credential (even if the credential object still exists in the cloud provider UI).
To troubleshoot the issue, perform these steps:
Check the credential in your cloud console:
For the credential used by the data connection (for example, service account key, SAS token, keypair, or password), confirm the following in your cloud provider:
The key/token/user still exists and has not been revoked or disabled.
Any explicit expiry time on the key/token has not passed.
If you see that the key/token is expired or rotated, generate a new one.
Test the same credential directly against the source:
Use your cloud provider’s native tools (CLI, SQL client, storage browser) with the exact same key/token/user that is configured in the LiveRamp credential.
Try a simple read action (for example, list objects in the bucket, select from the table/view).
If this direct test fails with auth-related errors (for example, invalid token, expired token, authentication failed), the credential is no longer valid and must be regenerated.
Rotate or regenerate the credential, then update it in Clean Room:
Create a new key/token/secret in your cloud environment following your internal security practices.
In Clean Room, perform one of the following actions:
Edit the existing credential to paste the new secret.
Create a new credential and point the data connection to it.
Make sure the new credential has at least read/list access to the configured project/database, schema, table, bucket, or folder.
Retry the data connection and re-read the tooltip if needed:
After updating the credential, retry the affected connection.
If authentication is now working, the status should move from “Configuration Failed” to “Mapping Required” or “Completed”.
If it fails again, re-check the tooltip to see whether it still points to authentication/credentials, or whether it now points to a different issue (for example, missing table, invalid path, or schema problems) and troubleshoot accordingly.
What should I do if the status for my data connection is “Mapping Required”?
If the status for your data connection is “Mapping Required”, that means that Clean Room can reach your data and has read the schema, but you still need to complete the Map Fields step.
From the row for the data connection on the Data Connections page, click the More Options menu (the three dots) and then select Edit Mapping. In the mapping flow, choose which columns to include, set data types and PII/identifier fields, and then save. After mapping is complete, the status should change to “Completed” (or "Ready" for CSV Upload connections). For more information, see the appropriate article for your cloud provider in “Connect to Cloud-Based Data”.
What should I do if the status for my data connection is “Configuration Failed”?
If the status for your data connection is “Configuration Failed”, that means that Clean Room was unable to connect to your data source and read the dataset during the “Verifying Access” stage. You can hover over the status icon to see a tooltip with more detail on what went wrong (for example, a permissions issue, wrong project or database, missing table or view, or an invalid path).
Fix the underlying issue in your cloud environment (such as credentials, roles, allowlisting, region, or object name) and then retry the data connection. If the issue is fixed, the status should change to “Mapping Required”.
What should I do if the status for my data connection is “Failed”?
If the status for your data connection is “Failed”, that means that an identity resolution job for that data connection did not complete successfully. You can hover over the status icon to see a tooltip with the error message and additional context (for example, problems with identifier fields, schema changes, or configuration).
Review that message, fix the underlying issue in your environment (such as identifier columns, data formats, or permissions), and then retry the data connection. If the job fails again with the same error, contact your LiveRamp representative or create a support case.
What should I do if the status for my data connection is “Missing Valid Schema”?
If the status for your data connection is “Missing Valid Schema”, that means that Clean Room can reach the data location but cannot find a usable table/view or file schema there.
For cloud storage data connections (such as AWS S3, GCS, Azure, Databricks, and Iceberg Catalog), confirm that the expected data files or data schema reference file are present, have valid headers, and match the configured schema.
For cloud warehouse data connections (such as BigQuery or Snowflake), confirm that the project/database, dataset/schema, and table/view values you entered are correct and still point to an existing object with the expected columns.
After you fix the underlying issue, retry the data connection so that Clean Room can re-read the schema.
What should I do if the status for my data connection is “Waiting for File”?
If the status for your data connection is “Waiting for File”, that means that a CSV Upload / Local Upload data connection does not yet have a valid CSV file at the upload location.
From the row for the data connection on the Data Connections page, edit the data connection, use the button to upload the CSV file, and then save the data connection. After Clean Room has uploaded and validated the file, the status should change from “Waiting for File” to “Mapping Required”, and you can then complete the Map Fields step.
Why is validation for my CSV file data connection failing even though my CSV file looks fine?
Even CSVs that look fine can fail validation (for example, with a status of “Configuration Failed” or “Missing Valid Schema”) due to small issues:
Hidden BOM / “ghost characters”:
Some tools (especially Excel) add a BOM at the start of the file, which can break header detection.
Re-save as UTF‑8 without BOM or use a script to strip the BOM before re-uploading.
Inconsistent headers across files:
Slight differences (extra spaces, different capitalization, extra columns in some files) can cause schema mismatch.
Ensure every file in the connection uses exactly the same header row and column order.
Delimiter and quoting issues:
If data contains commas, quotes, or line breaks inside fields, but the parser isn’t configured to handle them, you can see errors like “more columns than headers”.
Normalize your CSVs to a consistent delimiter/quote strategy, or consider using a columnar format such as Parquet when possible.
Trailing or malformed rows:
Extra footer lines, partial rows, or corrupted lines at the end of files can also cause validation to fail.
Check for and remove any non-data lines.
After any fix, upload the corrected file(s) and retry the data connection so the schema can be re-read.
For more information, see “Format Your Clean Room Data”.
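For the BOM issue above, a small script can detect and strip the UTF-8 byte order mark before re-uploading (a sketch; the file name is hypothetical):

```python
# Sketch: detect and strip a UTF-8 BOM that tools like Excel may prepend,
# which can break header detection during validation.
from pathlib import Path

BOM = b"\xef\xbb\xbf"  # the UTF-8 byte order mark

def strip_bom(path: str) -> bool:
    # Returns True if a BOM was found and removed, False otherwise.
    data = Path(path).read_bytes()
    if data.startswith(BOM):
        Path(path).write_bytes(data[len(BOM):])
        return True
    return False

# Demo with a hypothetical file that carries a BOM:
Path("upload.csv").write_bytes(BOM + b"user_id,email\n1,a@b.com\n")
print(strip_bom("upload.csv"))              # → True
print(Path("upload.csv").read_bytes()[:7])  # → b'user_id'
```

Equivalently, re-saving the file from your editor with the "UTF-8" (not "UTF-8 with BOM") encoding option achieves the same result.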