Clean Room FAQs
See the sections below for answers to common LiveRamp Clean Room questions.
Data Connection FAQs
See the FAQs below for common data connection questions.
Why does partitioning matter?
Partitioning optimizes the dataset before you even get to the query stage: because data processing during question runs occurs only on the relevant filtered data, queries scan less data and run faster. For more information, see “Data Connection Partitioning”.
What are the best practices for partitioning?
Data partitioning (dividing a large dataset into smaller, more manageable subsets) is recommended for optimizing query performance and achieving faster processing times. When you indicate partition columns for your data connections, data processing during question runs occurs only on the relevant filtered data, which reduces query cost and execution time. Best practices include:
Partition at the source: When configuring your data connection to LiveRamp Clean Room, define partition columns.
Consider the collaboration context: Make sure that the partition columns make sense for the types of questions that a dataset is likely to be used for. For example:
If you anticipate questions that analyze data over time, partition the dataset by a date field (e.g., event_date or impression_date). This allows queries that filter by date ranges to scan only relevant partitions, reducing processing time and costs.
If the main use case is to analyze data by different brands or products, then partitioning by a brand or product_id column makes sense. This strategy ensures that queries filtering by brand will only access the necessary subset of the data.
Verify column data types: Partitioning supports date, string, integer, and timestamp field types. Complex types (such as arrays, maps, or structs) are not allowed.
Cloud-specific formatting: For cloud storage sources like S3, GCS, and Azure, structure your buckets and file paths in a partitioning format based on the partition column. For BigQuery and Snowflake, make sure columns are indicated as partition keys in your source tables. (See the examples after this list.)
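For illustration, the sketches below show both approaches; the bucket, project, dataset, table, and column names are hypothetical, so adapt them to your own data.

-- Hypothetical S3 layout using Hive-style partitioning on an event_date column:
--   s3://my-bucket/impressions/event_date=2024-03-01/part-00000.parquet
--   s3://my-bucket/impressions/event_date=2024-03-02/part-00000.parquet

-- Hypothetical BigQuery DDL that declares the partition column at the source:
CREATE TABLE `my_project.my_dataset.impressions` (
  impression_id STRING,
  brand STRING,
  user_id STRING,
  event_date DATE
)
PARTITION BY event_date;

With a layout like this in place, any question that filters on event_date scans only the matching partitions instead of the full table.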
For more information, see “Data Connection Partitioning”.
Clean Room Setup FAQs
See the FAQs below for common setup questions.
Where do queries run in a hybrid or confidential computing clean room?
Queries run within LiveRamp's secure Clean Room environment. For hybrid and hybrid confidential computing (HCC) clean rooms, the underlying engine that executes your SQL queries is Apache Spark SQL in a distributed, multi-tenant environment.
How do I choose where to execute to improve performance?
"Where" primarily refers to the type of clean room that your organization selects based on your collaboration goals and the location of your data and your partners' data. Different clean room types use different execution engines, which impact how queries are processed. Optimizing for performance partly depends on your clean room type (such as Snowflake, Google BigQuery, or Hybrid).
Snowflake: Optimize your queries for Snowflake's native SQL engine. For more information, see "Query Data in Snowflake" in Snowflake's documentation.
BigQuery: Queries use the GoogleSQL dialect and the compute typically runs within the clean room owner's BigQuery project. For more information, see "Optimize Queries" in the BigQuery documentation.
Hybrid and Hybrid Confidential Computing (HCC): Queries are executed using Apache Spark SQL within LiveRamp's secure data plane environment. For more information, see "Performance Tuning" in the Apache Spark documentation.
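As a minimal illustration of a Spark-friendly query (the table and column names are hypothetical), filtering on a partition column lets Spark scan only the relevant partitions, and grouping on a small set of columns keeps shuffle and memory pressure low:

-- Assumes the impressions table is partitioned by impression_date
SELECT impression_date, brand, COUNT(*) AS impressions
FROM impressions
WHERE impression_date BETWEEN '2024-03-01' AND '2024-03-31'
GROUP BY impression_date, brand;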
Google Cloud BigQuery Clean Room FAQs
See the FAQs below for common Google Cloud BigQuery clean room questions.
BigQuery Clean Room Setup FAQs
Prior to orchestrating BigQuery clean rooms in LiveRamp Clean Room, it is important to configure the necessary permissions in Google Cloud Platform (GCP) and LiveRamp Clean Room, as well as enable certain APIs for your project. For more information, see “Configuring BigQuery Permissions for BigQuery Clean Rooms”.
Can multiple Google service accounts be used in an organization to bring data?
Yes, multiple Google service accounts can be used in an organization to bring data.
BigQuery Clean Room Permissions FAQs
LiveRamp uses the dataset permissions listed in “Configuring BigQuery Permissions for BigQuery Clean Rooms” to create a dataset in the owner/partner project. This dataset holds the authorized view that is shared as a private exchange.
The BigQuery Metadata Viewer role is used to render the data connections screen in the LiveRamp Clean Room UI: it allows LiveRamp to fetch table metadata and display it.
We create an authorized view from the owner/partner table, and it lives in a shared dataset separate from the owner dataset. LiveRamp orchestrates the creation of this shared dataset. Note that this shared dataset is different from the source dataset and is created only to be part of the private exchange in Analytics Hub.
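As a hedged sketch of this pattern (the project, dataset, and table names are illustrative, not the actual names LiveRamp generates), the view is created in the shared dataset and reads from the source dataset:

CREATE VIEW `client_project.lr_shared_dataset.audience_view` AS
SELECT user_id, segment
FROM `client_project.source_dataset.audience`;

-- The view is then authorized on the source dataset (via the BigQuery console or API)
-- so it can read the source tables without exposing the source dataset itself.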
You can create custom roles with the minimum set of permissions listed in “Configuring BigQuery Permissions for BigQuery Clean Rooms” and assign them to the project.
The BigQuery Metadata Viewer permission is expected at the table level, so LiveRamp does not have access to any tables that have not been granted this role; those tables do not appear on the data connections screen.
LiveRamp creates the authorized view in the clean room Owner’s BigQuery project. To facilitate this process, LiveRamp first creates a dataset and then creates the authorized view in it, which is accessible to LiveRamp. The permissions needed to do this are bigquery.datasets.create, bigquery.datasets.get, and bigquery.datasets.update. LiveRamp can only create, update, or get the dataset it creates for the authorized view. The LiveRamp service account does not have access to list any other datasets in the Owner’s BigQuery project.
If some columns are masked, the Owner service account would need to grant the BigQuery Data Viewer role at the table level. If there is no masking, the BigQuery Metadata Viewer role (listed in “Configuring BigQuery Permissions for BigQuery Clean Rooms”) is sufficient.
When the owner/partner configures the data connection in a clean room, LiveRamp creates a dataset, an authorized view, a private exchange, and a listing under the exchange, and then adds the LiveRamp service account as an Analytics Hub subscriber to the listing.
The authorized view, private exchange, and listing are created in the client project. At the time of clean room question execution, the LiveRamp service account subscribes to the private listing, creates a BigQuery job, and, after the job is complete, unsubscribes from the listing. This is what is known as “ephemeral access”.
BigQuery Clean Room Data FAQs
Partitioning your BigQuery tables is highly recommended for best performance.
BigQuery Clean Room Compute FAQs
We support compute in the clean room Owner’s project today through a BigQuery clean room parameter called “Billing Project ID”. This parameter is set on the clean room configuration screen and determines where compute happens and where the BigQuery jobs are created.
BigQuery Clean Room Billing FAQs
The Billing Project ID of the clean room determines which project gets billed for the job execution.
Any BigQuery project with the appropriate billing setup can be used as the Billing Project ID.
Permissions FAQs
See the FAQs below for common permissions questions.
How are permissions managed in LiveRamp Clean Room?
Clean room owners control how internal users and partners engage with clean rooms, questions, and outputs, and they can manage permissions at both the clean room level and the question level.
Clean room-level permissions define what users can do within the clean room itself.
Question-level permissions control interactions with specific questions within a clean room.
An organization's administrators manage partner-level and role-level permissions:
Partner level: Clean room owners specify which clean room permissions are available to their partners to assign.
Role level: Clean room partners can create custom user roles based on the available clean room permissions.
For more information, see "Managing Clean Room Permissions" and "Question Management".
Question and Query FAQs
See the FAQs below for common questions about clean room questions and queries.
What can I do to improve the performance of my question runs?
Perform data validation in your own environment before connecting your data source to LiveRamp Clean Room. Key areas to check include expected row counts and fill rates for key fields.
The fields listed in LiveRamp's sample schemas can be used to prepare your data for your clean room collaboration use cases. Pay close attention to column names, data types, any required hashing for PII, and so on. Having the proper file formatting in place will make the remainder of your setup much more seamless, and validating your data with QA queries in your own environment (such as the sketch after this list) can reduce costs and save time once you've connected your data to LiveRamp.
Enable partitioning on date columns and key string columns for the datasets assigned to the question so that only the relevant data is scanned during execution.
Optimize your queries based on the cloud environment of your clean room type.
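As an example of the kind of QA query described above (the table and field names are hypothetical, and the SQL is written to run in most dialects), the following checks the row count and the fill rate of a hashed PII field:

SELECT
  COUNT(*) AS row_count,
  SUM(CASE WHEN email_sha256 IS NOT NULL THEN 1 ELSE 0 END) AS email_filled,
  AVG(CASE WHEN email_sha256 IS NOT NULL THEN 1.0 ELSE 0.0 END) AS email_fill_rate
FROM crm_audience;

If the row count or fill rate differs from what you expect, fix the data at the source before creating the data connection.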
What SQL engine should I optimize for?
The SQL engine you should optimize for depends on the specific type of clean room you are using, because different clean room types may support different SQL dialects:
Hybrid and Hybrid Confidential Computing (HCC) clean rooms use Apache Spark SQL because of its support for distributed processing of large datasets.
Google BigQuery clean rooms use GoogleSQL.
Snowflake clean rooms support standard SQL.
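To illustrate how the dialects differ, here is the same 30-day reach query sketched in two dialects (the project, table, and column names are hypothetical); note the different date-arithmetic syntax:

-- GoogleSQL (BigQuery clean rooms):
SELECT brand, COUNT(DISTINCT user_id) AS users
FROM `my_project.my_dataset.events`
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY brand;

-- Spark SQL (hybrid and HCC clean rooms):
SELECT brand, COUNT(DISTINCT user_id) AS users
FROM events
WHERE event_date >= date_sub(current_date(), 30)
GROUP BY brand;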
What is LiveRamp's default CPU allotment for a question run?
Question Builder's "Advanced Question Settings" includes "Default" and "Large" options for the "Processing capacity needed to run the question". The "Default" processing capacity is appropriate for most queries and datasets". "Large" should only be used if an optimized query takes longer than 8 hours to run because it increases compute costs.
Why do queries need optimizing if they already work in my cloud data warehouse?
LiveRamp's Hybrid and Hybrid Confidential Compute (HCC) clean rooms use Apache Spark SQL as the underlying engine for executing queries, not the native query engines used by cloud data warehouses like Google BigQuery (which uses GoogleSQL) or Snowflake (which supports standard SQL). While your SQL query might work in your native cloud data warehouse, the LiveRamp Clean Room execution environment is different.
Spark is designed for the distributed processing of large datasets. Optimizing queries for Spark involves considering how data is partitioned and processed across nodes. Inefficient queries in this distributed environment can lead to performance bottlenecks, high memory consumption, and potential failures.
What could be causing memory issues with clean rooms, such as functions failing after 3 hours?
Failures after a long duration, such as 3 hours, are often linked to memory issues that arise during question runs due to how the query is structured. They are rarely caused by functions alone; other aspects of the query, such as cross joins on large datasets or multiple CTEs reused throughout the query, are more likely culprits. Failures can also occur for a variety of other reasons, such as errors within custom Python code, insufficient processing capacity for the warehouse size, a lack of partitioning, transient interruptions, or timeouts.
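To illustrate the cross-join point, here is a minimal sketch (the table and column names are hypothetical):

-- Risky in a distributed engine: a join with no join condition produces a cross join,
-- which materializes every row combination and can exhaust executor memory.
SELECT COUNT(*)
FROM exposures e
CROSS JOIN conversions c;

-- Better: an equi-join on a shared key lets Spark shuffle and match rows efficiently.
SELECT COUNT(*)
FROM exposures e
JOIN conversions c
  ON e.user_id = c.user_id;

Similarly, if a CTE is referenced several times in one query, Spark may recompute it for each reference, so consolidating repeated CTE logic can also reduce memory pressure.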
If troubleshooting steps, such as query optimization and adjusting warehouse size, do not resolve the issue, contact your LiveRamp representative to explore additional options.
Why are my runs queuing for so long?
Long queuing times for question runs can be related to the performance and resource availability within the Clean Room execution environment. If many complex or large queries are submitted simultaneously, there might be insufficient compute resources immediately available, leading to a queue. Problems with the data itself (such as missing data or problematic formatting) can also delay the start of a question run if the system has trouble accessing the necessary data from its source connections.
LiveRamp Clean Room AI FAQs
How LiveRamp Clean Room Uses AI
LiveRamp Clean Room provides several opportunities to interact with LiveRamp Clean Room AI, Clean Room's generative AI helper bot. The goal of LiveRamp Clean Room AI is to help our users reduce time-to-value in performing tasks within the Clean Room Console: it predicts which analyses would be beneficial to your organization, suggests how questions, user lists, and alerts can be built, and helps you describe questions to your partners for approval.
LiveRamp recognizes that AI is a broad discipline, and it is often hard to know what data is used to generate AI-powered responses. In the spirit of transparency, we'd like to explain how LiveRamp Clean Room AI works.
Clean Room leverages models available via the OpenAI Platform as the foundation models for providing responses to Clean Room Console users. For more information, see "AI Disclosure".
The LiveRamp Clean Room integration uses OpenAI's API to provide responses based on the data OpenAI's models were trained on. To fine-tune the model, Clean Room leverages only the schemas (names, fields, and data types) of the data connections already provided within your organization, which makes responses and suggestions more relevant. It's important to note:
OpenAI has no access to messages you write in LiveRamp Clean Room AI or data you have connected to Clean Room Console.
Clean Room does not use your organization's underlying data to inform the models.
Your messages to LiveRamp Clean Room AI are not used to fine-tune the model.
Are my chat histories stored?
Yes, Clean Room securely stores chat histories as a reference for our Customer Success team in cases where they may need to provide additional support or make improvement requests.
While the accuracy of outputs from AI tools is improving at a rapid rate, generative AI is still a nascent field. This means we cannot guarantee that all responses will be accurate, so we always provide users with the opportunity to correct an AI-generated response before using it.
While we hope LiveRamp Clean Room AI is a useful tool for most users, it is completely optional. If you’d prefer not to use it, we've provided the option to hide LiveRamp Clean Room AI's helper bot.
How to Interact with LiveRamp Clean Room AI
You can interact with LiveRamp Clean Room AI in the same manner you would interact with a chatbot on any other website. We recommend asking the helper bot questions and prompting it with messages describing what you hope to achieve. For example:
I want a SQL query that will allow me to understand the overlap between my audience and my partner's audience for CRM and exposure log data.
Give me a query that will tell me the ROAS for my March campaign across all destinations where I've activated data from Clean Room.
Set up an alert that will email me anytime the average basket size across my SKUs at retail locations in Boston dips below $10.00.
If you have any concerns with LiveRamp Clean Room AI's responses or are having trouble using the tools effectively, contact your LiveRamp representative.
AI Disclosure
OpenAI ChatGPT API Integration and Privacy
LiveRamp uses OpenAI, L.L.C.'s ("OpenAI") ChatGPT API to help you create alerts, queries, and natural language code descriptions. OpenAI's ChatGPT API uses large language models, and the content it creates may not be fully accurate or reliable. We want you to know:
LiveRamp’s agreement with OpenAI prevents OpenAI from using your data to train its models. OpenAI describes its data usage policies on its website.
In addition to the industry-standard technical, administrative, and physical controls that LiveRamp uses to protect your data, OpenAI's requests and responses are encrypted using transport layer security (TLS), and the OpenAI API is SOC 2 Type 2 compliant.