Question and Query FAQs
See the FAQs below for answers to common questions about clean room questions and queries.
What can I do to improve the performance of my question runs?
Perform data validation in your own environment before connecting your data source to LiveRamp Clean Room. Key areas to check include whether you're seeing the expected row counts and fill rates for key fields.
The fields listed in LiveRamp's sample schemas can be used to prepare your data for your clean room collaboration use cases. Pay close attention to column names, data types, any required hashing for PII, and so on. Having the proper file formatting in place will make the remainder of your setup much smoother. Validating your data with QA queries in your own environment can reduce costs and save time once you've connected your data to LiveRamp.
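As a minimal sketch of such a QA query (the table and column names here are hypothetical, so substitute your own), you might check the row count and the fill rate of a key field before connecting the data:

```sql
-- Hypothetical source table: my_crm_data, with a hashed email field
-- (email_sha256) required by the clean room schema.
SELECT
  COUNT(*)                             AS row_count,       -- expected total rows
  COUNT(email_sha256) * 1.0 / COUNT(*) AS email_fill_rate, -- share of rows with the key field populated
  COUNT(DISTINCT email_sha256)         AS distinct_emails  -- rough duplicate check
FROM my_crm_data;
```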
Enable partitioning on date columns and key string columns for the datasets assigned to the question, and filter on those partition columns in your queries so that only the relevant data is scanned during execution.
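For example, if a dataset is partitioned on a date column, filtering directly on that column lets the engine prune partitions instead of scanning the full table. This is a sketch with hypothetical table and column names:

```sql
-- exposure_log is assumed to be partitioned on event_date; filtering on the
-- partition column means only January's partitions are read.
SELECT user_id, COUNT(*) AS events
FROM exposure_log
WHERE event_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31'
GROUP BY user_id;
```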
Optimize your queries based on the cloud environment of your clean room type.
What SQL engine should I optimize for?
The SQL engine you should optimize for depends on the specific type of clean room you are using, because different clean room types may support different SQL dialects:
Hybrid and Hybrid Confidential Computing (HCC) clean rooms use Apache Spark SQL because of its support for distributed processing of large datasets.
Google BigQuery clean rooms use GoogleSQL.
Snowflake clean rooms support standard SQL.
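As an illustration of how the dialects differ, the same monthly date truncation is written differently in each engine (the orders table and order_date column are hypothetical):

```sql
-- Spark SQL (Hybrid and HCC clean rooms)
SELECT date_trunc('MONTH', order_date) AS order_month FROM orders;

-- GoogleSQL (Google BigQuery clean rooms): the date part comes second and is unquoted
SELECT DATE_TRUNC(order_date, MONTH) AS order_month FROM orders;

-- Snowflake
SELECT DATE_TRUNC('MONTH', order_date) AS order_month FROM orders;
```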
What is LiveRamp's default CPU allotment for a question run?
Question Builder's "Advanced Question Settings" includes "Default" and "Large" options for the "Processing capacity needed to run the question" setting. The "Default" processing capacity is appropriate for most queries and datasets. Because "Large" increases compute costs, use it only if an optimized query takes longer than 8 hours to run.
Why do queries need optimizing when they already work in my cloud data warehouse?
LiveRamp's Hybrid and Hybrid Confidential Computing (HCC) clean rooms use Apache Spark SQL as the underlying engine for executing queries, not the native query engines used by cloud data warehouses like Google BigQuery (which uses GoogleSQL) or Snowflake (which supports standard SQL). While your SQL query might work in your native cloud data warehouse, the LiveRamp Clean Room execution environment is different.
Spark is designed for the distributed processing of large datasets. Optimizing queries for Spark involves considering how data is partitioned and processed across nodes. Inefficient queries in this distributed environment can lead to performance bottlenecks, high memory consumption, and potential failures.
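One common Spark-oriented habit is filtering each input as early as possible so that less data is shuffled between nodes for a join. Spark's optimizer can often push filters down on its own, but writing them explicitly makes the intent clear and guards against cases where pushdown does not apply. The table and column names below are hypothetical:

```sql
-- Filter each side before the join so fewer rows are shuffled across nodes.
WITH recent_exposures AS (
  SELECT user_id
  FROM exposures
  WHERE event_date >= DATE '2024-01-01'
),
recent_conversions AS (
  SELECT user_id, conversion_value
  FROM conversions
  WHERE event_date >= DATE '2024-01-01'
)
SELECT a.user_id, b.conversion_value
FROM recent_exposures a
JOIN recent_conversions b
  ON a.user_id = b.user_id;
```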
What could be leading to memory issues with clean rooms, such as functions failing after 3 hours?
Failures after a long duration, such as 3 hours, are often linked to memory issues that arise during question runs because of how the query is structured. These failures are rarely caused by functions alone; other aspects of the query, such as cross joins on large datasets or multiple CTEs reused throughout the query, are more common culprits. Failures can also occur for a variety of other reasons, such as errors in custom Python code, insufficient processing capacity for the warehouse size, a lack of partitioning, transient interruptions, or timeouts.
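For instance, a cross join followed by an equality filter forces the engine to consider the full Cartesian product before filtering, while an equivalent equi-join matches rows on the key directly. Optimizers can sometimes make this rewrite automatically, but stating the join key explicitly is safer. Table and column names here are hypothetical:

```sql
-- Memory-heavy pattern: every row combination is considered, then filtered.
SELECT a.user_id, b.transaction_id
FROM exposures a
CROSS JOIN transactions b
WHERE a.user_id = b.user_id;

-- Equivalent equi-join: rows are matched on the join key directly,
-- avoiding the Cartesian product.
SELECT a.user_id, b.transaction_id
FROM exposures a
JOIN transactions b
  ON a.user_id = b.user_id;
```

Similarly, because Spark can recompute a CTE each time it is referenced, restructuring a query so that expensive CTEs are referenced only once can reduce memory pressure.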
If troubleshooting steps, such as query optimization and adjusting warehouse size, do not resolve the issue, contact your LiveRamp representative to explore additional options.
Why are my runs queuing for so long?
Long queuing times for question runs can be related to performance and resource availability within the Clean Room execution environment. If many complex or large queries are submitted simultaneously, there might not be enough compute resources immediately available, leading to a queue. Issues with the data itself (such as missing data or problematic formatting) could also delay the start of a question run if the system has trouble accessing the necessary data from its source connections.