Guidelines for Using LiveRamp SQL Editor
This topic provides an overview of what LiveRamp SQL Editor supports and technical considerations regarding its use. It assumes an audience with SQL experience that needs to know about LiveRamp use cases and any differences in comparison to other SQL tools.
SQL Editor supports a syntax that conforms to ANSI SQL and is similar to Apache Spark SQL. It supports some Apache Spark SQL functions and is extended with a few LiveRamp functions.
You can write any valid query you want and SQL Editor will verify access and usage restrictions defined by the owner of the data assets. If a query does not meet data permissions, query thresholds, and other restrictions, an alert will indicate why the query cannot return results. If restrictions are partially met and are defined in a way that permits partial query execution, the permitted results are returned.
You can also use SQL Editor to create user-defined functions (UDFs), which enable you to share the results of your analyses with your partners without sharing record-level data.
Working with Joins
You can use a join clause to combine rows based on a related column from two or more tables to create a combined row that is used in the query.
Partner data is never joined into a single table. However, a single view can include "virtually joined" data from multiple partners by applying appropriate data permissions. That way, query results include the data from two or more partners in one view as long each data owner's rules are met. For more information, see "Create a Data Permission".
All first-party data can be queried at row-level granularity, while partner data can be queried and joined at row-level granularity. Results will be displayed in either row-level granularity or aggregated format, depending on the permissions set by the data owner.
If your query contains a
join
of 2 or more base tables withrampid
columns, make sure to userampid_matched(rampid)
on both sides of thejoin
. Base tables with arampid
column will usually contain a number of unmatched rows. When joining without the extrarampid_matched
filter, both sides of the joins with unmatched rows will cause an "exploding join". For example, if each side has 100,000 unmatched rows, the resulting join for just these rows will be an extra 100,000 x 100,000 = 10,000,000,000 rows.SQL Editor supports transcoded joins for more than two tables, which can be useful when working with RampIDs. For example:
SELECT COUNT(*) FROM table1 t1 INNER JOIN table2 t1 ON (t1.rampid = t2.rampid) INNER JOIN table3 t3 ON (t1.rampid = t3.rampid)
Applying Query Thresholds to Your Data Assets
You can prevent the exposure of row-level data by using the qt
function to apply a minimum aggregation to the results (a "query threshold").
For example, you could include qt(brand, rampid, 75)
in your UDF creation statement to apply a minimum aggregation of 75 results. If the UDF's query returns only 45 unique records, the results are not displayed.
You can also apply a query threshold to a table or a view when you create a data permission.
For more information about LiveRamp's qt
function, see "LiveRamp Functions". Some functions that are otherwise supported do not work with the qt
function.
Querying Data Assets that Contain RampIDs
LiveRamp the Data Collaboration Platform provides SQL functions to return information about RampIDs:
You can use
rampid_matched
to filter out rows that lack a valid RampID. This can be helpful when working with large files that include unmatched RampID rows.You can use the rampid_type SQL function to determine the type of RampIDs used in a data asset. For information, see "LiveRamp Data Collaboration Platform Functions" and "RampID Types".
You cannot group by a transformed RampID (or other sensitive value). For example, the following query is not allowed:
select length(rampid), count(*) from pub2 group by 1
Marking Columns as Sensitive
When columns in your data are marked as "sensitive", their values will never be displayed in query results. Most SQL functions cannot be run on sensitive columns, but counts and joins are allowed. You can identify sensitive columns in the SQL Editor's All Assets pane by displaying an asset's schema and looking for an asterisk next to a column in the schema.
By default, RampID columns are marked as sensitive. When you bring your organization's data into LiveRamp, you can request that specific columns also be marked as sensitive, such as your organization's custom IDs.
Once columns are marked as sensitive, they cannot be used in a join condition with a non-sensitive column. Both columns need to be sensitive or not sensitive in order to be joined. Many SQL functions cannot be applied to a sensitive column and you cannot use sensitive columns in filter expressions. You may also see privacy alerts in your results, such as "sensitive columns are not displayed" or "output completely suppressed due to sensitive columns".
Sampling Table Rows
To sample table rows, use the rand SQL function instead of the tablesample
clause.
Supported Set Operators
SQL Editor supports the following set operators:
except
intersect
minus
union
union all
union distinct
For example:
SELECT COUNT(rampid) FROM ( SELECT rampid FROM table1 INTERSECT SELECT rampid FROM table2 EXCEPT SELECT rampid FROM table3 )
Note
intersect all
andexcept all
are not supported.The
intersect
set operator takes precedence overunion
andexcept
.The qt function is not supported when used with the
intersect
set operator .
For information, see "Set Operators" in the Apache Spark documentation.