Connect to Data in an Iceberg Catalog Table

If you have data tables in Apache Iceberg format, you can connect to that data by configuring a connection to the data from LiveRamp Clean Room.

Note

This connection type currently uses AWS Glue Catalog as the mechanism for connecting to the Iceberg tables. Other catalog types will be available in the future.
For information on how LiveRamp Clean Room interprets the data types from Glue Catalog, see “Glue Catalog”.

An Iceberg Catalog data connection can be used in the following clean room types:

Hybrid
Confidential Computing
Note
For Confidential Computing clean rooms, clean room partners should use these instructions for this data connection type. Clean room owners of Confidential Computing clean rooms need to use the CSV Catalog data connection type (contact your LiveRamp account team for more information).

Note

For more information on clean room types, see "Configure Clean Rooms".
To view an interactive walkthrough demo of the process of connecting to your cloud-based data by creating a data connection, click here.

After you’ve created the data connection and Clean Room has validated the connection by connecting to the data in your cloud account, you will then need to map the fields before the data connection is ready to use. This is where you specify which fields can be queryable across any clean rooms, which fields contain identifiers to be used in matching, and any columns by which you wish to partition the dataset for questions.

Note

To utilize partitioning for cloud storage data connections, you need to organize your data into folders that reflect the partition columns. LiveRamp encourages users to use Hive-style partitioning, typically by date (such as s3://bucket/path/date=YYYY-MM-DD/). For more information, see "Partition a Dataset in LiveRamp Clean Rooms".

After fields have been mapped, you’re ready to provision the resulting dataset to your desired clean rooms. Within each clean room, you’ll be able to set dataset analysis rules, exclude or include columns, filter for specific values, and set permission levels.

To configure an Iceberg Catalog data connection, see the instructions below.

Overall Steps

After making sure all prerequisites are in place, perform the following overall steps to configure an Iceberg Catalog data connection in LiveRamp Clean Room:

For information on performing these steps, see the sections below.

Prerequisites

The Iceberg table must be cataloged in the AWS Glue Catalog.

The following information is needed to configure your Iceberg Catalog data connection in LiveRamp Clean Room:

Add the Credentials

To add credentials:

From the navigation menu, select Clean Room → Credentials to open the Credentials page.
Click Create Credential.
Enter a descriptive name for the credential.
For the Credentials Type, select "AWS IAM User Credentials".
Enter the following parameters associated with your AWS configuration:
- AWS Access Key ID
- AWS Secret Access Key
- AWS User ARN
- AWS Region
Click Save Credential.

Create the Data Connection

To create the data connection:

Note

if your cloud security limits access to only approved IP addresses, talk to your LiveRamp representative before creating the data connection to coordinate any necessary allowlisting of LiveRamp IP addresses.
When you create the data connection, the dataset type is set to Generic by default.

From the navigation menu, select Clean Room → Data Connections to open the Data Connections page.
From the Data Connections page, click Create Data Connection.
From the New Data Connection screen, select "Iceberg Catalog".
Select the credentials created in the previous procedure from the list.
Note
You can also create credentials here by clicking New Credentials and following the instructions in the "Add the Credentials" section above.
Complete the following fields in the Set up Data Connection section:
- Name: Enter a name for the data connection (this will be the name for the dataset that you'll provision to clean rooms).
- Category: Enter a category of your choice.
- Catalog Type: Select GLUE.
- Database Name: Enter the name of the database that contains your data.
- Table Name: Enter the name of the Apache Iceberg table.
- Catalog Name: Enter the name of the AWS account that contains the Iceberg table.
- Catalog ID: Enter the ID of the AWS account that contains the Iceberg table.
Review the data connection details and click Save Data Connection.
Note
All configured data connections can be seen on the Data Connections page.

When a connection is initially configured, it will show "Verifying Access" as the configuration status. Once the connection is confirmed and the status has changed to "Mapping Required", map the table's fields.

LCR-Configure_Iceberg_Catalog_Data_Connection-Mapping_Required_status.png

You will receive file processing notifications via email.

Map the Fields

Once the above steps have been performed in Google Cloud Platform, perform the overall steps in the sections below in LiveRamp Clean Room.

Note

Before mapping the fields, we recommend confirming any expectations your partners might have for field types for any specific fields that will be used in questions.

From the row for the newly created data connection, click the More Options menu (the three dots) and then click Edit Mapping.
The Map Fields screen opens, and the file column names auto-populate.
For any columns that you do not want to be queryable, slide the Include toggle to the left.
Click Next.
The Add Metadata screen opens.
For any column that contains PII data, slide the PII toggle to the right.
Note
If your data contains a column with RampIDs, do not slide the PII toggle for that column. Mark the RampID column as a User Identifier and select "RampID" as the identifier type. If the data contains a RampID column, no other columns can be enabled as PII.
Select the data type (field type) for each column (for more information on supported field types, see "Field Types for Data Connections").
For columns that you want to partition, slide the Allow Partitions toggle to the right.
If a column contains PII, slide the User Identifiers toggle to the right and then select the user identifier that defines the PII data.
Note
When you select "Raw Email" as the user identifier for an email column, those email addresses will be automatically SHA256 hashed. The resulting hashed emails are then available for query within your clean rooms.
Click Save.

Your data connection configuration is now complete and the status changes to "Completed".

You can now provision the resulting dataset to your desired Hybrid or Confidential Computing clean room.

In this section:

Connect to Data in an Iceberg Catalog Table

Note

Note

Note

Note

Overall Steps

Prerequisites

Add the Credentials

Create the Data Connection

Note

Note

Note

Map the Fields

Note

Note

Note

Search results