Perform Identity Resolution in BigQuery

Abstract

LiveRamp's identity resolution capabilities in BigQuery allow you to resolve hashed email addresses to RampIDs, LiveRamp’s persistent pseudonymous identifier for persons and households. Identity resolution allows you to have a more holistic view of your data at an individual or household level.

Note

For more information about RampIDs, see "RampID Methodology".

This service leverages LiveRamp’s Identity Graph, connecting fragmented consumer touchpoints to a person or household-based view.

These capabilities are available within BigQuery through a LiveRamp solution, which creates a share to your account, opening up a view to query the reference data set from within your own BigQuery environment. See "LiveRamp Embedded Identity in BigQuery" for more information.

Overall Steps

Once you’ve enabled LiveRamp Embedded Identity in BigQuery, performing an identity resolution involves the following tasks:

Note

For instructions on enabling LiveRamp Embedded Identity in BigQuery, see “Enabling LiveRamp Embedded Identity in BigQuery”.

  1. You prepare the input and metadata tables to be used for identity resolution.

  2. You share the tables and datasets with LiveRamp.

  3. You notify LiveRamp to initiate identity resolution.

  4. LiveRamp processes the input and writes the output to the designated output dataset. Once this process has been completed, LiveRamp will email you to confirm completion.

See the sections below for information on performing these tasks.

Authentication

The LiveRamp Identity Service in BigQuery relies on the same authentication service as LiveRamp's AbiliTec and RampID APIs (Identity APIs). If you have credentials to those APIs, you can use your previously assigned credentials.

Note

Coordinate with LiveRamp to enable your credentials for these APIs for use with embedded identity.

Authenticating with LiveRamp's GCP service requires a call on behalf of the customer to LiveRamp's core services.

Client credentials are used to obtain an access token by passing the client ID and client secret values.

Prepare the Tables for Identity Resolution

Identity resolution with the LiveRamp solution requires the preparation and deployment of two tables:

  • A metadata table, indicated in the code examples as <metadata_table_name>.

    Note

    As long as the column names in the input table stay the same, the original metadata table can be reused for multiple operations. You only need to create a new metadata table if you change the column names in the input table.

  • An input table, indicated in the code examples as <input_table_name>.

    Note

    An input table needs to be prepared for each identity resolution operation.

You can create these tables inside BigQuery or import them into your dataset using BigQuery's standard methods. Replace the <...> placeholders with your own values, and make sure that the input table name and its column names are referenced correctly in the metadata table.

When creating tables, keep the following guidelines in mind (in addition to the guidelines listed in the sections below):

  • The column names for the input table can be whatever you want to use, as long as the names match the values specified in the metadata table (a quick way to verify this is shown after this list).

  • Do not use any column names that are the same as the column names returned in the output table for the identity resolution operation you're going to run.

  • Every column name must be unique in a table.

  • Avoid including columns in the input table beyond those required for the identity resolution operation, as extra columns slow down processing.

  • The identity resolution operation can process records containing blank fields.
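
As a quick check that your input table's column names line up with what the metadata table references, you can list them from BigQuery's INFORMATION_SCHEMA.COLUMNS view. This is a minimal sketch using the same <...> placeholders as the code examples later in this article:

-- List the input table's columns in order so you can compare them against
-- the column names referenced in the metadata table
SELECT column_name
FROM <dataset>.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = '<input_table_name>'
ORDER BY ordinal_position;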

Table Naming Guidelines

When naming tables, follow these guidelines:

  • Table names must use ASCII characters and must not contain spaces or special characters such as !@#$%.

  • You can use underscores “_” within the name, but not as the initial character.

  • Consider using the following elements in your table names: type of data or description, a date or timestamp, and an identity designation. For example, the table name Identity_TwoButtonSuitsCampaign_impressions_2022-06-01 contains all three element types.

Metadata Table Columns and Descriptions

The metadata table passes the required credentials, specifies the type of operation, and specifies the column names in the input table to reference for the original RampIDs, the domain to translate to, and the identifier type.

As long as the column names in the input table stay the same, the original metadata table can be reused for multiple operations. You only need to create a new metadata table if you change the column names in the input table.

Metadata column names must match those shown below. The column names are not case sensitive and should not be enclosed in single or double quotation marks.

  • CLIENT_ID: Enter either an existing CLIENT_ID or a new one provided during implementation.

  • CLIENT_SECRET: Enter the password/secret for the CLIENT_ID.

  • EXECUTION_MODE: Enter Resolution.

  • EXECUTION_TYPE: Enter Email.

  • TARGET_COLUMN: Enter the name of the column containing the hashed email addresses to be resolved.

Note

You can specify the number of RampIDs to be returned for each email, up to a maximum of 10 RampIDs per email. By default, 1 RampID is returned per email. To change this, replace the “1” supplied for the LIMIT column (the final value in the INSERT statement shown below) with your desired number.

Here is an example of BigQuery SQL for creating a metadata table:

-- Example SQL query to create a metadata table for email resolution

CREATE OR REPLACE TABLE <dataset>.<metadata_table_name> (
  CLIENT_ID STRING,
  CLIENT_SECRET STRING,
  EXECUTION_MODE STRING,
  EXECUTION_TYPE STRING,
  TARGET_COLUMN STRING,
  `LIMIT` INT64  -- LIMIT is a reserved keyword, so it is quoted with backticks
);

For each run, update the metadata table if you need to configure the run differently.

-- Example SQL query to insert a row into the metadata table for email resolution

INSERT INTO <dataset>.<metadata_table_name> VALUES ('<client_id>', '<client_secret>', 'resolution', 'email', 'hashed_email', 1);
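
Alternatively, you can create and populate the metadata table in a single statement. The following is a minimal sketch using the same placeholder values, with the LIMIT value set to 2 to request up to 2 RampIDs per email:

-- Create and populate the metadata table in one statement
CREATE OR REPLACE TABLE <dataset>.<metadata_table_name> AS
SELECT
  '<client_id>' AS CLIENT_ID,
  '<client_secret>' AS CLIENT_SECRET,
  'resolution' AS EXECUTION_MODE,
  'email' AS EXECUTION_TYPE,
  'hashed_email' AS TARGET_COLUMN,
  2 AS `LIMIT`;  -- up to 2 RampIDs returned per email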

Input Table Columns and Descriptions

An input table needs to be prepared for each identity resolution operation.

The column names for the input table can be whatever you want to use, as long as the names match the values specified in the metadata table.

The suggested input table columns and descriptions for email-only resolution are listed below.

  • hashed_email (example: 8c9775a5999b5f0088008c0b26d7fe8549d5c80b0047784996a26946abac0cef)

    • SHA-256 hashed emails only.

    • Email addresses should be lowercased and UTF-8 encoded prior to hashing.

    • After hashing, convert the resulting hash into its lowercase hexadecimal representation (one way to do this in BigQuery is sketched after this list).

  • attribute_1 (example: Male)

    For email address resolution, you can include columns with attribute data. These columns will be returned in the output table (for more information, see the "Privacy Filter" section below).
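
If your source data contains plaintext emails, the following sketch shows one way to produce hashes in this format using BigQuery's built-in functions (the source table my_dataset.raw_emails and its email column are hypothetical):

-- Normalize the email, hash with SHA-256, and convert the digest to lowercase hex
SELECT
  TO_HEX(SHA256(LOWER(TRIM(email)))) AS hashed_email
FROM my_dataset.raw_emails;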

Here are some examples of BigQuery SQL for creating this table:

-- Example SQL query to create an input table for email resolution

CREATE OR REPLACE TABLE <dataset>.<input_table_name> (
  HASHED_EMAIL STRING,
  ATTRIBUTE_1 STRING,
  ATTRIBUTE_2 STRING
);

-- Or set up a view against an existing table

CREATE OR REPLACE VIEW <dataset>.<view_name> AS
SELECT
  d.hashed_email AS HASHED_EMAIL,
  d.gender AS ATTRIBUTE_1,
  d.name AS ATTRIBUTE_2
FROM <dataset>.my_bq_table d;

-- Example SQL query to insert a row into the input table for email resolution
INSERT INTO <dataset>.<input_table_name> VALUES ('8c9775a5999b5f0088008c0b26d7fe8549d5c80b0047784996a26946abac0cef', 'Male', 'Smith');

The output table is created after you share the tables and notify LiveRamp to initiate identity resolution. For an example of the output table, see the "Notify LiveRamp to Initiate Identity Resolution" section below.

Share Tables and Datasets with LiveRamp

In order for LiveRamp to process the data, the two tables and the output dataset must be shared with LiveRamp’s GCP principal account (LiveRamp will share this service account ID during initial implementation).

The following script shows an example of granting the correct permissions on the two tables created above and on the output dataset.

GRANT `roles/bigquery.dataViewer`
ON TABLE <dataset_name>.<input_table_name>
TO "<LiveRamp principal account>";

GRANT `roles/bigquery.dataViewer`
ON TABLE <dataset_name>.<metadata_table_name>
TO "<LiveRamp principal account>";

-- GRANT write access on the output dataset
GRANT `roles/bigquery.dataEditor`
ON SCHEMA <output_dataset_name>
TO "<LiveRamp principal account>";

Optional: Share and Query Data

Alternatively, you can use a parameterized, repeatable script that grants the required permissions and then returns the fully qualified names to share with LiveRamp. See the code below for an example.

BEGIN
  -------------------------------------------------------------------------------
  -- UPDATE VARIABLES WITH CORRECT FULLY QUALIFIED TABLE & DATASET IDENTIFIERS --
  -------------------------------------------------------------------------------
  -- input_table is the table that contains the hashed emails to resolve
  DECLARE input_table STRING DEFAULT 'liveramp_consumer_tests.transcoding_input';
  -- input_metadata is the table that holds the arguments for your run, including client_id / client_secret
  DECLARE input_metadata STRING DEFAULT 'liveramp_consumer_tests.metadata_input';
  -- output_dataset is an empty dataset you created for LiveRamp to write outputs to
  DECLARE output_dataset STRING DEFAULT 'liveramp_consumer_tests_output';
  -- project_name is the project in which your input tables and output_dataset live
  DECLARE project_name STRING DEFAULT 'unrestricted-coding';
  -------------------------------------------------------------------------------
  -- END VARIABLES - DO NOT EDIT BELOW THIS LINE
  -------------------------------------------------------------------------------
  DECLARE liveramp_iam_principal STRING DEFAULT 'user:new.user@liveramp.com';

  -- GRANT Read of Input Tables
  EXECUTE IMMEDIATE "GRANT `roles/bigquery.dataViewer` ON TABLE " || input_table || " TO '" || liveramp_iam_principal || "';";
  EXECUTE IMMEDIATE "GRANT `roles/bigquery.dataViewer` ON TABLE " || input_metadata || " TO '" || liveramp_iam_principal || "';";

  -- GRANT Write for Output Tables
  EXECUTE IMMEDIATE "GRANT `roles/bigquery.dataEditor` ON SCHEMA " || output_dataset || " TO '" || liveramp_iam_principal || "';";

  with fqdns AS
    (
      SELECT STRUCT("input_table" as key, project_name || "." || input_table as fqdn) as item
      UNION ALL
      SELECT STRUCT("input_metadata" as key, project_name || "." || input_metadata as fqdn) as item
      UNION ALL
      SELECT STRUCT("output_dataset" as key, project_name || "." || output_dataset as fqdn) as item
    )
  SELECT f.item.*
  FROM fqdns f;

END

Notify LiveRamp to Initiate Identity Resolution

After the input and metadata tables have been shared with LiveRamp, send an email to LiveRampIdentitySupport@liveramp.com asking to process the input dataset and include the following information:

  • Subject: <company_name> - Kickoff BigQuery Resolution

  • The fully qualified names for the following objects:

    • Input table

    • Metadata table

    • Output dataset

    Note

    To be fully qualified, each name must include the following information:

    • Project ID (for example, “liveramp_test_project”)

    • Dataset ID (for example, “test_input_dataset”)

    • Asset ID (for example, “input_table_name”)

  • The name you want LiveRamp to use for the output table.

See below for an example:

To whom it may concern:

Our input dataset is ready to process. Please use the following information to run the Embedded Identity Job:

- input: liveramp_test_project.test_input_dataset.input_table_name

- metadata: liveramp_test_project.test_input_dataset.metadata_table_name

- output: liveramp_test_project.test_output_dataset_name

I would like the output table to be named "output_company_<datetime>".

Thanks,

<contact name>

LiveRamp will process the input and write the output to the designated output dataset. Once this process has been completed, LiveRamp will email you to confirm completion.

If there are any issues with the input data table, the metadata table, or the output table location, LiveRamp will reach out to the individual who kicked off the run to discuss.

The average turnaround time for this is 3 business days or less.

The results are written to an output table in the output dataset you designated. The output table columns are described below, and a sample query follows the column descriptions.

  • RampID (example: XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd)

    Returns the resolved RampID in your domain encoding.

  • attribute_1 (example: Male)

    Any attribute columns passed through the service are returned.
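
Once LiveRamp confirms completion, you can query the output table like any other BigQuery table. The sketch below assumes you asked for the output table to be named output_company_20220601 in your output dataset:

-- Count the distinct RampIDs returned for each attribute value
SELECT
  attribute_1,
  COUNT(DISTINCT RampID) AS unique_rampids
FROM <output_dataset_name>.output_company_20220601
GROUP BY attribute_1
ORDER BY unique_rampids DESC;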

Privacy Filter

To minimize the risk of re-identification (the ability to tie PII directly to a RampID), the service includes the following processes when resolving PII identifiers (PII resolution or email-only resolution):

  • Column values: The process evaluates the combination of all column values for each row. If a particular combination of column values occurs 3 or fewer times, the rows containing those column values will not be matchable and will not be returned in the output table.

  • More than 5% of the table unmatchable: If, based on column value uniqueness, more than 5% of the input table rows are unmatchable, the job will fail (a pre-check sketch follows this list).

  • Number of Unique RampIDs: If fewer than 100 unique RampIDs would be returned, the job will fail.

  • Reswizzle full table: Upon completion, the full table is reswizzled so that the rows (RampID | attribute_1 | attribute_2 | ... | attribute_n) are returned in a different order than they were submitted in the input table.
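
Before kicking off a run, you can estimate whether the first two checks are likely to be triggered. The sketch below assumes the input table from the examples above, with two attribute columns; it is only an approximation of LiveRamp's internal checks:

-- Approximate the share of rows whose combination of attribute values occurs
-- 3 or fewer times (these rows would not be returned); compare the result
-- against the 5% threshold described above
WITH combos AS (
  SELECT ATTRIBUTE_1, ATTRIBUTE_2, COUNT(*) AS combo_count
  FROM <dataset>.<input_table_name>
  GROUP BY ATTRIBUTE_1, ATTRIBUTE_2
)
SELECT
  SAFE_DIVIDE(SUM(IF(combo_count <= 3, combo_count, 0)), SUM(combo_count)) AS fraction_unmatchable
FROM combos;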