Perform Identity Resolution in BigQuery

Abstract

LiveRamp's identity resolution capabilities in BigQuery allows you to resolve various identifiers to RampIDs, LiveRamp’s persistent pseudonymous identifier for persons and households. This allows you to resolve personally-identifiable information (PII) or device identifiers to a persistent pseudonymous identifier for persons and households. You can also input an individual-based RampID and get back any household-based RampID that might be associated with that individual. Identity resolution allows you to have a more holistic view of your data at an individual or household level.

Note

For more information about RampIDs, see "RampID Methodology".

LiveRamp’s identity capabilities are available within BigQuery through the BigQuery Entity Resolution Framework. LiveRamp writes the results to a table in your account, enabling you to query the dataset from within your own BigQuery environment. See "LiveRamp Embedded Identity in BigQuery" for more information.

The following identifiers can be resolved:

Names
Postal addresses
Email addresses
Phone numbers
Cookies
MAIDs (mobile device IDs)
CTV IDs (Connected TV Device IDs)
CIDs (custom identifiers)
Person-based, maintained RampIDs (for resolution to household RampIDs)

Based on the type of identifier you’re resolving, you might receive one RampID per identifier or multiple RampIDs per identifier:

For PII touchpoints (in PII or email resolution), you can choose to receive from 1 to 10 RampIDs (if available).
For PII touchpoints, you can also choose a deconfliction configuration that will return RampIDs that are ranked most relevant based on how active the linkages are in the digital ecosystem and other factors. This can be particularly useful for large datasets where overconnected RampIDs can introduce noise into analytics. For more information, see the “Deconfliction Options” section below.
For cookie and mobile device ID resolution, typically one RampID is returned (but shared touchpoints with more than one RampID can exist).
For CTV identifiers, it is common to receive multiple individual RampIDs per identifier.
For existing CID syncs, the number of RampIDs will vary.
When resolving individual RampIDs to household RampIDs, only one household RampID is returned.

Deconfliction Options

Deconfliction refers to an optional identity resolution configuration that optimizes the fit of the LiveRamp graph to your definition of a customer, which is particularly beneficial for universe datasets.

Deconfliction attempts to reduce the number of “conflicting” RampIDs for each individual record in a universe dataset. A RampID is considered conflicting if the same RampID is linked to the record for another user (such as a family member) in your system. Deconfliction attempts to include only the RampIDs that are most relevant to an individual record and reduce the number of “shared” RampIDs.

To utilize deconfliction, you’ll need to provide a CID (custom ID) for each record, and for the configuration to drive the highest value, the full universe is required to deconflict across the table. The output you receive will include the best fidelity RampIDs for each CID.

There are two deconfliction options you can choose from:

Standard: This configuration returns the RampIDs that are determined to be most relevant and removes CIDs that are determined to be duplicates (based on linking to the same RampID). This is the default option and covers most advertiser use cases.
Maximized first-party fidelity: This configuration returns the RampIDs that are determined to be most relevant but preserves additional CIDs even if there are RampIDs that indicate LiveRamp could consolidate them. This option is ideal for publishers and other data owners.

For more information on deconfliction, see “Using Deconfliction on a Universe Dataset”.

Overall Steps

Once you’ve enabled LiveRamp Embedded Identity in BigQuery, performing an identity resolution involves the following tasks:

Note

For instructions on enabling Live Ramp Embedded Identity in BigQuery, see “Enabling LiveRamp Embedded Identity in BigQuery”.

You prepare an input table and a metadata table for the appropriate identity resolution operation in your BigQuery account.
You share read-only access to those tables with LiveRamp’s BigQuery account and you share write-only access to a target dataset in your account where the output tables will be written.
LiveRamp shares a stored procedure with your BigQuery account.
You call the shared stored procedure to initiate the identity resolution operation, referencing your tables.
LiveRamp uses your input and metadata tables to perform the operation and writes the output to a table in your account.

See the sections below for information on performing these tasks.

Authentication

The LiveRamp Identity Service in BigQuery relies on the same authentication service as LiveRamp's AbiliTec and RampID APIs (Identity APIs). If you have credentials to those APIs, you can use your previously assigned credentials.

Note

Coordinate with LiveRamp to enable these API’s credentials for use with embedded identity.

Authenticating with LiveRamp's GCP service requires a call on behalf of the customer to LiveRamp's core services.

Client credentials are used to obtain an access token by passing the client ID and client secret values.

Create the Input Table for Identity Resolution

An input table needs to be prepared for each identity resolution operation. You can create these tables inside BigQuery or import the tables into your database using BigQuery's standard methods. The <...> variables may be substituted with your own values. Be sure to reference the names correctly in the metadata table, which has as its default name <resolution_meta_table>, and make sure that the column names also match up correctly.

When creating input tables, keep the following guidelines in mind:

The column names for the input table can be whatever you want to use, as long as the names match the values specified in the metadata table.
Do not use any column names that are the same as the column names returned in the output table for the identity resolution operation you're going to run.
Every column name must be unique in a table.
Try not to use additional columns in the input tables required for the identity resolution operation as having extra columns slows down processing.
The identity resolution operation can process records containing blank fields.
Table names must use ASCII characters and not contain either spaces or special characters such as !@#$%.
Table names can use underscores "_" within the name, but not as the initial character.

See the sections below for suggested input table columns and descriptions for each resolution type.

The output table is created after you share the input and metadata tables and run the shared “bq_lr_invoke” stored procedure. For output table examples, see the sections in "View the Output Table" below.

Input Table Columns for PII Resolution (Without Deconfliction)

The standard PII resolution process (without deconfliction configured) passes the data through a privacy filter which removes the PII and reswizzles the table. Because of this, any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

You can include up to 4 name element columns and 7 address element columns in your PII resolution input table.

These column names cannot be used in the input table for PII resolution:

RampID
__lr_rank
__lr_filter_ name

See the table below for a list of the suggested input table columns and descriptions for PII resolution.

Suggested Column Name	Example	Description
`first_name`	John	You can include separate First Name and Last Name columns or you can combine first name and last name in one column (such as "Name").
`last_name`	Doe	You can include separate First Name and Last Name columns or you can combine first name and last name in one column (such as "Name").
`address_1`	123 Main St
`address_2`	Apt 1	You can include separate Address 1 and Address 2 columns or you can combine all street address information in one column (such as "Address").
`city`	Smalltown	When matching on address, `city` is optional.
`state`	CA	When matching on address, `state` is optional. If including State, must be a two-character, capitalized abbreviation ("CA", not "California" or "Ca").
`zip`	12345	Required when matching on addresses. Can be in 5-digit format or 9-digit format (ZIP+4).
`email`	john@email.com	Plaintext emails only. Only one plaintext email per input row is permitted. Other emails must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates. All emails must meet these requirements: Have characters before and after the "@" sign Contain a period character (".") Have characters after the period character Examples of valid emails include: a@a.com A@A.COM email@account.com EMAIL@ACCOUNT.COM email@sub.domain.com EMAIL@SUB.DOMAIN.COM
`phone`	555-123-4567	Plaintext phone numbers only. Only one phone number per input row is permitted. Other phone numbers must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates. All phone numbers must meet these requirements: Can be more than 10 characters if leading numbers over 10 characters are “0” or “1” If no leading numbers are used, must be 10 characters long Can contain hyphens ("-"), parentheses ("(" or ")"), plus signs ("+"), and periods (".") Examples of valid phone numbers include: 8668533267 866.853.3267 (866) 853-3267 8668533267 +1 (866) 853-3267 +18668533267 18668533267 1111111118668533267 08668533267 Examples of invalid phone numbers include: 987654321 (fewer than 10 characters) 98765432109 (more than 10 characters) 1234567890 (after removing the leading "1", less than 10 characters remain) 0987654321 (after removing the leading "0", less than 10 characters remain)
`attribute_1`		For PII resolution, you can include columns with attribute data. These columns will be returned in the output table (for more information, see the "View the PII Resolution Output Table (Without Deconfliction)" section below). If you specify that an attribute column should be hashed, it will appear in the output table with a prefix of "hashed_". The input table must not include a column with the same name as the name of the hashed column in the output table.

Here are some examples of BigQuery SQL for creating this table:

// Example SQL query to create an input table for PII resolution

CREATE OR REPLACE TABLE  <dataset>.<input_table_name> (
first_name text,
middle_name text,
last_name text,
address_1 text,
address_2 text,
city text,
state text,
zip text,
email text,
phone text,
attribute_1 text
);

// Or setup against an existing table

CREATE OR REPLACE VIEW <dataset>.<view_name> as 
SELECT 
d.first_name as first_name,
d.middle_name as middle_name,
d.last_name as last_name,
d.address_1 as address_1,
d.address_2 as address_2,
d.city as city,
d.state as state,
d.zip as zip,
d.email as email,
d.phone as phone,
d.attribute_1 as attribute_1
FROM my_bq_table d;


// Example SQL query to insert a row into the input table for PII resolution

INSERT INTO <dataset>.<view_name> values (
	‘John’,
	‘Farmer’,
	‘Smith’,
	‘123 Test Dr,
	‘Apt 321’,
	‘San Francisco’,
	‘CA’,
	‘94112’,
	‘test@gmail.com’,
	‘123-123-1234’,
	‘attribute_1_value’

);

Input Table Columns for Hashed Email-Only Resolution

The email-only resolution process operates similarly to PII resolution. Any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

Note

To perform identity resolution across additional PII touchpoints, see the “Input Table Columns for PII Resolution (Without Deconfliction)” section above.

See the table below for a list of the suggested input table columns and descriptions for email-only resolution.

Suggested Column Name	Example	Description
`hashed_email`	8c9775a5999b5f0088008c0b26d7fe8549d5c80b0047784996a26946abac0cef	SHA-256, MD5, and SHA-1 hashed emails accepted. Email addresses should be lowercased and UTF-8 encoded prior to hashing. After hashing, convert the resulting hash into lowercase hexadecimal representation.
`attribute_1`	Male	For email address resolution, you can include columns with attribute data. These columns will be returned in the output table (for more information, see the "Privacy Filter" section below).

Here are some examples of BigQuery SQL for creating this table:

// Example SQL query to create an input table for email resolution

CREATE OR REPLACE TABLE  <dataset>.<input_table_name> (
HASHED_EMAIL STRING, 
ATTRIBUTE_1 STRING, 
ATTRIBUTE_12 STRING
);

// Or setup against an existing table

CREATE OR REPLACE VIEW <dataset>.<view_name> as 
SELECT 
d.hashed_email as HASHED_EMAIL, 
d.gender as ATTRIBUTE_1, 
d.name as ATTRIBUTE_2 
FROM my_bq_table d;

// Example SQL query to insert a row into the input table for email resolution
INSERT INTO <dataset>.<input_table> values ('some_hashed_email', 'attribute_1_value', 'attribute_2_value');

Input Table Columns for Device ID Resolution

The device ID resolution operation can be used for the following purposes:

To translate device identifiers (MAIDs, and CTV IDs) into individual RampIDs
To translate individual RampIDs into their associated household RampIDs

See the tables below for a list of the suggested input table columns and descriptions for these device ID resolution options.

Note

Each device ID resolution input table should contain only one identifier column (either a device identifier or a maintained RampID).
You can include columns with attribute data, but these columns will not be returned in the output table.

See the table below for the suggested input table column and description for translating device identifiers.

Suggested Column Name	Example	Description
`device_identifier`	1f4d256c-1f08-41f6-a108-bbe511de9497	Can be one of the following identifiers: MAID CTV ID COOKIE

Here are some examples of BigQuery SQL for creating this table:

// Example SQL query to create an input table for device resolution

CREATE OR REPLACE TABLE  <dataset>.<input_table_name> (
DEVICE_IDENTIFIER STRING
);

// Or setup against an existing table

CREATE OR REPLACE VIEW <dataset>.<view_name> as 
SELECT 
d.device_identifier as DEVICE_IDENTIFIER
FROM my_bq_table d;

// Example SQL query to insert a row into the input table for device resolution

INSERT INTO <dataset>.<input_table> values ('some_device_id');

See the table below for the suggested input table column and description for translating individual RampIDs into their associated household RampIDs.

Suggested Column Name	Example	Description
`RampID`	XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd	The RampID for translation to a Household RampID. Must be a maintained RampID (to have an associated with a Household RampID).

Here are some examples of BigQuery SQL for creating this table:

// Example SQL query to create an input table for translating individual RampIDs into household RampIDs

CREATE OR REPLACE TABLE  <dataset>.<input_table_name> (
RAMPID STRING;

// Or setup against an existing table

CREATE OR REPLACE VIEW <dataset>.<view_name> as 
SELECT 
d.rampid as RAMPID
FROM my_bq_table d;

// Example SQL query to insert a row into the input table for translating individual RampIDs into household RampIDs

INSERT INTO <dataset>.<input_table> values ('some_ramp_id');

Input Table Columns for CID Matching

See the table below for the suggested input table column and description for CID matching. This process enables the retrieval of an existing CID to RampID mapping, hosted by LiveRamp.

Note

You can include columns with attribute data, but these columns will not be returned in the output table.

Suggested Column Name	Example	Description
`cid`	b916clarib la1;blNj10gtQjQ3QUEwMTNEMTcaktboEc0g9022cxoiaklr20185	The CID for translation to a RampID

Here are some examples of BigQuery SQL for creating this table:

// Example SQL query to create an input table for CID resolution

CREATE OR REPLACE TABLE  <dataset>.<input_table_name> (
CID STRING
);

// Or setup against an existing table

CREATE OR REPLACE VIEW <dataset>.<view_name> as 
SELECT 
d.cid as CID
FROM my_bq_table d;

// Example SQL query to insert a row into the input table for CID resolution

INSERT INTO <dataset>.<input_table> values ('some_cid');

Input Table Columns for Universe Dataset Resolution with Deconfliction

This process allows for resolution of a universe dataset with a deconfliction configuration, minimizing conflicts across the entire dataset. The output from this process is a deconflicted hashed CID to RampID mapping.

When resolving a universe dataset, you can use one input table or multiple input tables. Each input table can include one identifier type or multiple identifier types (including PII, hashed email, or MAIDs). The examples below can be used for the specific situations listed but you can create an input table that uses any combination of these identifiers.

Note

You can include columns with attribute data, but these columns will not be returned in the output table.

The requirements for each identifier type (listed in the relevant sections above) apply here. For example, any PII-based records must include zip in order to match on address and hashed emails must be lowercased and UTF-8 encoded prior to hashing.

.See the table below for a list of the suggested input table columns for a job that will contain both hashed emails and MAIDs with deconfliction.

Suggested Column Name	Example	Description
`cid`	g221lariab la8;blNj10gtQjQ3QUEwMTNEMTcaktboEc0g9022cxoiaklr91054	A custom identifier that represents an individual in the dataset. Data format is UTF-8 compliant alphanumeric string of up to 256 characters
`hashed_email`	8c9775a5999b5f0088008c0b26d7fe8549d5c80b0047784996a26946abac0cef	SHA-256, MD5, and SHA-1 hashed emails accepted. Email addresses should be lowercased and UTF-8 encoded prior to hashing. After hashing, convert the resulting hash into lowercase hexadecimal representation.
`device_identifier`	1f4d256c-1f08-41f6-a108-bbe511de9497	MAIDs are the only supported device identifier at this time

See the table below for a list of the suggested input table columns for a job that will contain plaintext PII.

Suggested Column Name	Example	Description
`cid`	`cid`	A custom identifier that represents an individual in the dataset. Data format is UTF-8 compliant alphanumeric string of up to 256 characters.
`first_name`	John	You can include separate First Name and Last Name columns or you can combine first name and last name in one column (such as "Name").
`last_name`	Doe	You can include separate First Name and Last Name columns or you can combine first name and last name in one column (such as "Name").
`address_1`	123 Main St
`address_2`	Apt 1	You can include separate Address 1 and Address 2 columns or you can combine all street address information in one column (such as "Address").
`city`	Smalltown	When matching on address, City is optional.
`state`	CA	When matching on address, State is optional. If including State, must be a two-character, capitalized abbreviation ("CA", not "California" or "Ca").
`zip`	12345	Required when matching on addresses. Can be in 5-digit format or 9-digit format (ZIP+4).
`email`	john@email.com	Plaintext emails only. Only one email per input row is permitted. Other emails must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates. All emails must meet these requirements: Have characters before and after the “@” sign Contain a period character (“.”) Have characters after the period character Examples of valid emails include: a@a.com A@A.COM email@account.com EMAIL@ACCOUNT.COM email@sub.domain.com EMAIL@SUB.DOMAIN.COM
`phone`	555-123-4567	Plaintext phone numbers only. Only one phone number per input row is permitted. Other phone numbers must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates. All phone numbers must meet these requirements: Can be more than 10 characters if leading numbers over 10 characters are “0” or “1” If no leading numbers are used, must be 10 characters long Can contain hyphens ("-"), parentheses ("(" or ")"), plus signs ("+"), and periods (".") Examples of valid phone numbers include: 8668533267 866.853.3267 (866) 853-3267 8668533267 +1 (866) 853-3267 +18668533267 18668533267 1111111118668533267 08668533267 Examples of invalid phone numbers include: 987654321 (fewer than 10 characters) 98765432109 (more than 10 characters) 1234567890 (after removing the leading "1", less than 10 characters remain) 0987654321 (after removing the leading "0", less than 10 characters remain)

Here are some examples of BigQuery SQL for creating this table:

// Example SQL query to create an input table for PII resolution

CREATE OR REPLACE TABLE  <dataset>.<input_table_name> (
cid text,
first_name text,
middle_name text,
last_name text,
address_1 text,
address_2 text,
city text,
state text,
zip text,
email text,
phone text,
attribute_1 text
);

// Or setup against an existing table

CREATE OR REPLACE VIEW <dataset>.<view_name> as 
SELECT 
d.cid as cid,
d.first_name as first_name,
d.middle_name as middle_name,
d.last_name as last_name,
d.address_1 as address_1,
d.address_2 as address_2,
d.city as city,
d.state as state,
d.zip as zip,
d.email as email,
d.phone as phone,
d.attribute_1 as attribute_1
FROM my_bq_table d;


// Example SQL query to insert a row into the input table for PII resolution

INSERT INTO <dataset>.<view_name> values (
	‘g221lariab la8;blNj10gtQjQ3QUEwMTNEMTcaktboEc0g9022cxoiaklr91054’
        ‘John’,
	‘Farmer’,
	‘Smith’,
	‘123 Test Dr,
	‘Apt 321’,
	‘San Francisco’,
	‘CA’,
	‘94112’,
	‘test@gmail.com’,
	‘123-123-1234’,
	‘attribute_1_value’

);

Create the Metadata Table

The metadata table passes the required credentials, specifies the type of operation, and specifies the column names in the input table to reference for the original RampIDs, the domain to translate to, and the identifier type.

A metadata table can be reused for multiple operations, but a separate metadata table must be prepared for each different identity resolution operation you want to perform. For example, if you’re going to perform identity resolution on both MAIDs and hashed emails, you’ll need a different metadata table for each operation.

When creating metadata tables, keep the following guidelines in mind:

Metadata column names must match those shown in the tables below. Metadata table column names are not case-sensitive, and should not be enclosed in single or double quotation marks.
Table names must use ASCII characters and not contain either spaces or special characters such as !@#$%.
Table names can use underscores "_" within the name, but not as the initial character.

See the sections below for instructions on creating the metadata table.

Create the Metadata Table for PII Resolution (Without Deconfliction)

To create a metadata table for PII resolution without deconfliction, include the following columns:

<client_id>: Enter either an existing client ID or a new one provided in implementation.
<client_secret>: Enter the password/secret for the client ID.
<execution_mode>: Enter resolution for the execution mode.
<execution_type>: Enter pii for the execution type.
<up to 4 name column names>: Enter the names of the columns in the input table to be used for the “name” element. Each input table column name should be enclosed in double-quotes. Enter a maximum of 4 name columns. If entering multiple column names, separate the column names with commas.
<up to 7 address column names>: Enter the names of the columns in the input table to be used for the "address" element. Each input table column name should be enclosed in double-quotes. Enter a maximum of 7 address columns. If entering multiple column names, separate the column names with commas.
<city column>: Enter the name of the column to be used for the "city" element.
<state column>: Enter the name of the column to be used for the "state" element.
<zipcode column>: Enter the name of the column to be used for the "zipcode" element.
<phone column>: Enter the name of the column to be used for the "phone" element.
<email column>: Enter the name of the column to be used for the "email" element.
<up to 10 hashed attribute column names>: Enter the names of the attribute columns in the input table that should be passed through to the output in MD5-hashed format. Each input table column name should be enclosed in double quotes. Enter a maximum of 10 hashed attribute columns. If entering multiple column names, separate the column names with commas.

Note

You can specify the number of RampIDs to be returned for each set of PII, to a maximum of 10 RampIDs per PII set. By default, 1 RampID will be returned per PII set. To change this, replace the “1” in the “as limit_val” line at the bottom of the SQL sample shown below with your desired number.

Here are some examples of BigQuery SQL for creating this table:

-- FOR PII Update the parameters here for the metadata table
-- Not all PII types need to be present, remove unused entries from the target_columns JSON
create or replace table identifier($customer_meta_table_name) as
select
    '<client_id>' as client_id,
    '<client_secret>' as client_secret,
    'resolution' as execution_mode,
    'pii' as execution_type,
    '{"name":["<up to 4 name column names>"],"streetAddress":["<up to 7 address column names>"],"city":"<city column>","state":"<state column>","zipCode":"zipcode column","phone":"<phone column>","email":"<email column>","hashedAttributes":["<up to 10 hashed attribute column names>"]}'
 as target_columns,
    1 as limit_val;

Example of PII Resolution Metadata Table SQL

The populated SQL with the suggested input column names might look like the example shown below:

-- FOR PII Update the parameters here for the metadata table
-- Not all PII types need to be present, remove unused entries from the target_columns JSON
create or replace table identifier($customer_meta_table_name) as
select
    'liveramp_client' as client_id,
    '84159be2-ab93-4bf8-24c9-2g123ef08815' as client_secret,
    'resolution' as execution_mode,
    'pii' as execution_type,
    '{"name":["FIRSTNAME","LASTNAME"],"streetAddress":["ADDRESSLINE","ADDRESSLINE2"],"city":"CITY","state":"STATE","zipCode":"ZIPCODE","phone":"PHONE","email":"EMAIL"}' as target_columns,
    1 as limit_val;

Create the Metadata Table for Hashed Email Resolution

To create a metadata table for email resolution, include the following columns:

<client_id>: Enter either an existing client ID or a new one provided in implementation.
<client_secret>: Enter the password/secret for the client ID.
<execution_mode>: Enter resolution for the execution mode.
<execution_type>: Enter email for the execution type.
<column to be resolved>: Enter the name of the column containing the email addresses to be resolved.

Note

You can specify the number of RampIDs to be returned for each email, to a maximum of 10 RampIDs per email. By default, 1 RampID will be returned per email. To change this, replace the "1" in the "as limit_val" line at the bottom of the SQL sample shown below with your desired number.

Here are some examples of BigQuery SQL for creating this table:

-- email
-- FOR EMAIL Update the parameters here for the metadata table
create or replace table identifier($customer_meta_table_name) as
select
    '<client_id>' as client_id,
    '<client_secret>' as client_secret,
    'resolution' as execution_mode,
    'email' as execution_type,
    '<column to be resolved>' as target_column,
    1 as limit_val;

Create the Metadata Table for Device ID Resolution

To create a metadata table for device ID resolution, include the following columns:

<client_id>: Enter either an existing client ID or a new one provided in implementation.
<client_secret>: Enter the password/secret for the client ID.
<execution_mode>: Enter resolution for the execution mode.
<execution_type>: Enter the execution type:
- Enter MAID to resolve mobile device IDs
- Enter CTV to resolve CTV IDs
- Enter HHLink to resolve individual RampIDs into household RampIDs
<column to be resolved>: Enter the name of the column containing the device identifiers (MAIDs or CTV IDs) or RampIDs to be resolved.

Here are some examples of BigQuery SQL for creating this table:

--FOR DEVICE OR CID RESOLUTION Update the parameters here for the metadata table
create or replace table identifier($customer_meta_table_name) as
select
    '<client_id>' as client_id,
    '<client_secret>' as client_secret,
    'resolution' as execution_mode,
    '<execution_type>' as execution_type,
    '<column to be resolved>' as target_column;

Create the Metadata Table for CID Matching

To create a metadata table for CID resolution, include the following columns:

<client_id>: Enter either an existing client ID or a new one provided in implementation.
<client_secret>: Enter the password/secret for the client ID.
<execution_mode>: Enter resolution for the execution mode.
<execution_type>: Enter the CID name provided to you by LiveRamp and specific to the mapping created.
<column to be resolved>: Enter the name of the column containing the CIDs to be mapped.

Here are some examples of BigQuery SQL for creating this table:

--FOR DEVICE OR CID RESOLUTION Update the parameters here for the metadata table
create or replace table identifier($customer_meta_table_name) as
select
    '<client_id>' as client_id,
    '<client_secret>' as client_secret,
    'resolution' as execution_mode,
    '<execution_type>' as execution_type,
    '<column to be resolved>' as target_column;

Create the Metadata Table for Universe Dataset Resolution with Deconfliction

To create the metadata table for universe dataset resolution with deconfliction, including the ability to use one or multiple input tables:

Update the following variables in the sample SQL from the Execution worksheet shown below and then run the SQL:
- <client_id>: Enter either an existing client ID or a new one provided in implementation.
- <client_secret>: Enter the password/secret for the client ID.
- inputs: Enter the details regarding the input tables and target columns. The sample SQL shows two input tables, one containing PII and the other containing hashed emails and MAIDs as specified by the targetColumns. Additional tables can be added as separate entries in the inputs array. Each table can have a mix of any of the supported column types, although all must contain a cid column, and the name of the cid column must be the same in all tables.
- <deconflictionConfig>: Enter one of the two deconfliction options listed below:
  - standard: This configuration returns the RampIDs that are determined to be most relevant and removes CIDs that are determined to be duplicates (based on linking to the same RampID). This is the default option and covers most advertiser use cases.
  - maximized_1P_fidelity: This configuration returns the RampIDs that are determined to be most relevant but preserves additional CIDs even if there are RampIDs that indicate LiveRamp could consolidate them. This option is ideal for publishers and other data owners.
  Note
  For more information on deconfliction, see “Using Deconfliction on a Universe Dataset”.

Note

When deconfliction is configured, the limit parameter is not needed. All RampIDs will be used to deconflict across the entire dataset returning an optimized CID to RampID table for the output.

Here are some examples of BigQuery SQL for creating this table:

CREATE OR REPLACE TABLE identifier($customer_meta_table_name) as
SELECT '{' ||
        '"client_id": ": "<client_id>", ' ||
        '"client_secret": "<client_secret>", ' ||
        '"execution_mode": "resolution", ' ||
        '"execution_type": "cid_mapping", ' ||
        '"inputs": [' ||
            '{"input_table": "' || "<project_id>.<input_dataset>.<input_table_maid>" || '" , "target_columns": {"maid": "device_id", "cid": "cid_col"}}, ' ||
'{"input_table": "' || "<project_id>.<input_dataset>.<input_table_pii>" || '" , "target_columns": {"name": "['<up to 4 name column names>']", "streetAddress": "['<up to 7 address column names>']", "city": "<city column>", "state": "<state column>", "zipCode": "<zipcode column>", "phone": "<phone column>", "email": "<email column>", "cid": "cid_col"}}, ' ||
        '], ' ||
        '"config": {"deconflictionConfig": "<deconfliction configuration>"}, ' ||
        '"output_table": "' || "<project_id>.<output_dataset>.<output_table_name>" || '"  ' ||
    '}'

Example of PII Resolution (with Deconfliction) Metadata Table SQL

The populated SQL with the suggested input column names might look like the example shown below:

-- FOR PII Update the parameters here for the metadata table
-- Not all PII types need to be present, remove unused entries from the target_columns JSON
create or replace table identifier($customer_meta_table_name) as
select
    'liveramp_client' as client_id,
    '84159be2-ab93-4bf8-24c9-2g123ef08815' as client_secret,
    'resolution' as execution_mode,
    'cid_mapping' as execution_type,

'{“cid”:“cid”,"name":["FIRSTNAME","LASTNAME"],"streetAddress":["ADDRESSLINE","ADDRESSLINE2"],"city":"CITY","state":"STATE","zipCode":"ZIPCODE","phone":"PHONE","email":"EMAIL"}' as target_columns,
    {“deconflictionConfig”:”standard”}’ as config;

Share Tables and Datasets with LiveRamp

In order for LiveRamp to be able to process the data, the tables and dataset must be shared with LiveRamp’s GCP Principal account (LiveRamp will share this service account ID during initial implementation).

The following script shows an example and the correct permissions to share the two tables created above and the output dataset.

GRANT `roles/bigquery.dataViewer`
ON TABLE <dataset_name>.<input_table_name>
TO "serviceaccount:embedded-execution-account@eng-id-embedded-prod-producer.iam.gserviceaccount.com";

GRANT `roles/bigquery.dataViewer`
ON TABLE <dataset_name>.<metadata_table_name>
TO "serviceaccount:embedded-execution-account@eng-id-embedded-prod-producer.iam.gserviceaccount.com";

-- GRANT Write for Output Tables
GRANT `roles/bigquery.dataEditor`
ON SCHEMA <output_dataset_name>
TO "serviceaccount:embedded-execution-account@eng-id-embedded-prod-producer.iam.gserviceaccount.com";

Optional Share and Query Data

You can use a parameterized and repeatable share script that selects the appropriate data to share with LiveRamp afterwards. See the code below for an example.

BEGIN
  -------------------------------------------------------------------------------
  -- UPDATE VARIABLES WITH CORRECT FULLY QUALIFIED TABLE & DATASET IDENTIFIERS --
  -------------------------------------------------------------------------------
  -- input_table is table that has ramp_ids to transcode from, target_domain, and type
  DECLARE input_table STRING DEFAULT '<dataset_name>.<input_table_name>';
  -- input_metadata table has arguments for your run including client_id / client_secret
  DECLARE input_metadata STRING DEFAULT '<dataset_name>.<metadata_table_name>';
  -- output_dataset is an empty dataset you created to share to LiveRamp to write outputs to.
  DECLARE output_dataset STRING DEFAULT '<output_dataset_name>';
  -- project_name is the project in which your input tables and output_dataset lives.
  DECLARE project_name STRING DEFAULT '<project_name>';
  -------------------------------------------------------------------------------
  -- END VARIABLES DO NOT EDIT BELOW THIS
  -------------------------------------------------------------------------------
  DECLARE liveramp_iam_principle STRING DEFAULT 'serviceaccount:embedded-execution-account@eng-id-embedded-prod-producer.iam.gserviceaccount.com';

  -- GRANT Read of Input Tables
  EXECUTE IMMEDIATE "GRANT `roles/bigquery.dataViewer` ON TABLE " || input_table || " TO '" || liveramp_iam_principle || "';";
  EXECUTE IMMEDIATE "GRANT `roles/bigquery.dataViewer` ON TABLE " || input_metadata || " TO '" || liveramp_iam_principle || "';";

  -- GRANT Write for Output Tables
  EXECUTE IMMEDIATE "GRANT `roles/bigquery.dataEditor` ON SCHEMA " || output_dataset || " TO '" || liveramp_iam_principle || "';";

  with fqdns AS
    (
      SELECT STRUCT("input_table" as key, project_name || "." || input_table as fqdn) as item
      UNION ALL
      SELECT STRUCT("input_metadata" as key, project_name || "." || input_metadata as fqdn) as item
      UNION ALL
      SELECT STRUCT("output_dataset" as key, project_name || "." || output_dataset as fqdn) as item
    )
  SELECT f.item.*
  FROM fqdns f;

END

Call the Shared Stored Procedure to Initiate Identity Resolution

LiveRamp will share a stored procedure to initiate the identity resolution operation to a service account principal you provide, The stored procedure will invoke a remote connection from BigQuery to a cloud function hosted in the LiveRamp environment which will run your job on your behalf. LiveRamp will also create a job_history_table in the shared output dataset which will allow you to track the status of your jobs via a UUID returned by the invocation procedure.

Here is an example of BigQuery SQL for calling the invocation procedure:

-- This example assumes you've created your input and meta tables ahead of time based on your workflow type

CALL `eng-id-embedded-prod-producer.invoke_ds.bq_lr_invoke`(JSON '{"input_table":"<project_name>.<dataset_name>.<input_table_name>", "meta_table":"<project_name>.<dataset_name>.<meta_table_name>", "output_table":"<project_name>.<dataset_name>.<output_table_name>"}');

LiveRamp will process the input and write the output to the output dataset designated.

For information on viewing the output table, see the "View the Output Table" section below.

View the Output Table

The identity resolution results end up in the output table in the dataset specified during the procedure call.

Once you've confirmed that the output table has been generated, see the appropriate section below for information on the output table format for the type of identity resolution operation that was run.

View the PII Resolution Output Table (Without Deconfliction)

The PII resolution process (without deconfliction) passes the input table through a privacy filter which removes the PII and reswizzles the table (in addition to other operations). Because of this, any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

Identity resolution of PII provides supplemental match metadata for additional insight into customer data that can provide powerful signals for making decisions based on RampIDs.

For PII resolution, the output table includes the fields shown in the table below.

Column	Example	Description
`RampID`	XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd	Returns the resolved RampID in your domain encoding.
`attribute_1`	Male	Any attribute columns passed through the service are returned.
`hashed_cid`	63889cfb9d3cbe05d1bd2be5cc9953fd	Any hashed attribute columns passed through the service are returned with their values MD5 hashed.
`__lr_rank`	1	Provides insight on the match cascade level associated with the identifiers. If no maintained RampID is found, this value will be "null".
`__lr_filter_name`	name_phone	Returns the filter name where the match occurred, which will be one of the following options: `name_address_zip` `name_email` `name_phone` `partial_name_email` `partial_name_phone` `strict_name` (name + zip) `email` `phone` `last_name_address` If no maintained RampID is found, this value will be "null".

View the Email-Only Resolution Output Table

For email-only resolution, the results end up in the output table in the same database, with the following fields (as shown below):

Column	Example	Description
`RampID`	XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd	Returns the resolved RampID in your domain encoding.
`attribute_1`	Male	Any attribute columns passed through the service are returned.

View the Device Identifier Resolution Output Table

For device identifier resolution, the output table includes the fields shown in the table below:

Note

If multiple RampIDs are associated with a device identifier, multiple lines will be created in the output file.

Column	Example	Description
`device_id`	93abc799-a0a5-40b5-80dd-d2ab61d4d072	The original identifier included in the input file.
`RampID`	XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd	For input files containing device identifiers, the RampID associated with the device identifier.

For resolving individual RampIDs to household RampIDs, the output table includes the fields shown in the table below:

Column	Example	Description
`RampID`	XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd	The original identifier included in the input file.
`hhlink`	hYT999YlUKuB5m-Fg2SnYyJixm908klluTJRs_yOYdlJO3DEw	For input files containing individual RampIDs, the household RampIDs associated with those individual RampIDs.

View the CID Matching Output Table

For CID matching against an existing CID sync with LiveRamp, the output table includes the fields shown in the table below:

Column	Sample	Description
`cid_id`	93abc799-a0a5-40b5-80dd-d2ab61d4d072	Original identifier passed in.
`RampID`	XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd	Returns the resolved RampID in your domain encoding.

View the Universe Dataset Resolution Output Table

For universe dataset resolution with deconfliction, the output table includes the fields shown in the table below.

Column	Example	Description
`hashed_cid`	93abc799-a0a5-40b5-80dd-d2ab61d4d072	The CID passed into the process will be returned as an MD5-hashed CID.
`RampID`	XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd	Returns the resolved and deconflicted RampID in your domain encoding.

Privacy Filter

To minimize the risk of re-identification (the ability to tie PII directly to a RampID), the service includes the following processes when resolving PII identifiers (PII resolution or email-only resolution):

Note

When the deconfliction configuration is used, attributes are not preserved so the privacy filter is not applied.

Column Values: The process evaluates each column value on a per-row basis for unique values. If any attribute occurs 3 or fewer times, the rows containing those column values will not be matchable and will not be returned in the output table.
>5% of the table unmatchable: If based on column value uniqueness, >5% of the file rows are unmatchable, the job will fail.
Number of Unique RampIDs: If fewer than 100 unique RampIDs would be returned, the job will fail.
Reswizzle full table: Upon completion, the full table will be reswizzled to return the rows RampID | attribute_1 | attribute_2 | attribute_n in a different order than what was submitted in the input table.

In this section:

Perform Identity Resolution in BigQuery

Note

Deconfliction Options

Overall Steps

Note

Authentication

Note

Create the Input Table for Identity Resolution

Input Table Columns for PII Resolution (Without Deconfliction)

Input Table Columns for Hashed Email-Only Resolution

Note

Input Table Columns for Device ID Resolution

Note

Input Table Columns for CID Matching

Note

Input Table Columns for Universe Dataset Resolution with Deconfliction

Note

Create the Metadata Table

Create the Metadata Table for PII Resolution (Without Deconfliction)

Note

Example of PII Resolution Metadata Table SQL

Create the Metadata Table for Hashed Email Resolution

Note

Create the Metadata Table for Device ID Resolution

Create the Metadata Table for CID Matching

Create the Metadata Table for Universe Dataset Resolution with Deconfliction

Note

Note

Example of PII Resolution (with Deconfliction) Metadata Table SQL

Share Tables and Datasets with LiveRamp

Optional Share and Query Data

Call the Shared Stored Procedure to Initiate Identity Resolution

View the Output Table

View the PII Resolution Output Table (Without Deconfliction)

View the Email-Only Resolution Output Table

View the Device Identifier Resolution Output Table

Note

View the CID Matching Output Table

View the Universe Dataset Resolution Output Table

Privacy Filter

Note

Search results