Perform Identity Resolution Using AWS Entity Resolution

LiveRamp’s Identity Resolution using AWS Entity Resolution allows you to resolve personally identifiable information (PII) to RampIDs, LiveRamp’s persistent pseudonymous identifiers for persons and households. Identity resolution gives you a more holistic view of your data at an individual or household level.

You can access LiveRamp Identity Resolution using AWS Entity Resolution within the AWS Marketplace, meaning identity resolution can be performed within AWS. For more information on LiveRamp Identity using AWS Entity Resolution, see “LiveRamp Identity Using AWS Entity Resolution”.

Note

  • This article contains information on performing identity resolution with LiveRamp’s Identity offerings using AWS Entity Resolution. If you plan to perform identity resolution through ADX standalone, see “Perform Identity Resolution Through ADX”.

  • For more information about RampIDs, see "RampID Methodology".

This service leverages LiveRamp’s Identity Graph, connecting fragmented consumer touchpoints to a person or household-based view.

The following identifiers can be resolved:

  • Names

  • Postal addresses

  • Email addresses

  • Phone numbers

Overall Steps

To execute an identity resolution operation in AWS Entity Resolution, you perform the following overall steps:

  1. You prepare input data tables with the data to resolve.

  2. If not already done, you upload your input data tables to Amazon S3 buckets.

  3. You create AWS Glue tables from the input data tables in your S3 buckets.

  4. You create a schema mapping in AWS Entity Resolution to define the input data you want to resolve, as well as any data columns you want to pass through.

  5. You create and run a matching workflow in AWS Entity Resolution.

  6. You view the output.

See the sections below for more information on performing these tasks.

Format the Input Data Table

See the sections below for information on formatting the input data table. Once your tables have been formatted, they must be uploaded to Amazon S3 buckets (see the instructions from AWS here).

Input Table Formatting Guidelines

Input data tables for identity resolution should be formatted as CSV files. When creating input data tables, follow these additional guidelines:

  • Include a header row in the first line of every table. Tables cannot be processed without headers.

  • You can name your columns however you want, but every column name must be unique in a table.

  • Column names must contain only letters, numbers, and underscores, and must start with a letter.

  • Do not use spaces in column names. Use underscores.

  • The first column(s) in the input table must be the column(s) that contain the identifiers to be resolved.

  • When performing identity resolution on multiple tables in one job, make sure the identifier column headers are the same in every table and that they match the value given for the “target_column” parameter in the call to initiate identity resolution.

  • The identity resolution operation can process records containing blank fields.
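The column-naming guidelines above can be checked before upload. The following is a minimal sketch, not a LiveRamp or AWS tool: the function name check_header is hypothetical, and it assumes that uniqueness is compared case-sensitively, which the guidelines do not specify.

```python
import re

# Letters, digits, and underscores only, starting with a letter (per the guidelines).
_COLUMN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_header(header):
    """Return a list of human-readable problems found in a CSV header row."""
    problems = []
    seen = set()
    for name in header:
        if not _COLUMN_NAME.fullmatch(name):
            problems.append(
                f"{name!r}: must start with a letter and contain only "
                "letters, digits, and underscores"
            )
        # Assumption: duplicates are detected with an exact, case-sensitive match.
        if name in seen:
            problems.append(f"{name!r}: duplicate column name")
        seen.add(name)
    return problems
```

An empty list means the header passed these checks. The sketch does not verify that the identifier columns come first, because that depends on which columns you intend to resolve.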

Format to Use for a PII Resolution Table

The PII resolution process passes the data through a privacy filter, which removes the PII and reswizzles the table (returns the rows in a different order than they were submitted). Because of this, any attributes you need to keep associated with the identifier must be included in the input table. For more information, see the "Privacy Filter" section below.

These column names cannot be used in the input file for PII resolution:

  • RampID

  • __lr_rank

  • __lr_filter_name
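A check for these reserved names could look like the sketch below. The helper name find_reserved_columns is hypothetical, and the case-insensitive comparison is a cautious assumption; the article does not say whether the restriction is case-sensitive.

```python
# Column names reserved for the output of PII resolution.
RESERVED_COLUMNS = {"RampID", "__lr_rank", "__lr_filter_name"}


def find_reserved_columns(header):
    """Return the header names that collide with the reserved output columns."""
    # Assumption: compare case-insensitively to stay on the safe side.
    reserved_lower = {name.lower() for name in RESERVED_COLUMNS}
    return [name for name in header if name.lower() in reserved_lower]
```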

See the table below for a list of the suggested input file columns and descriptions for PII resolution.

Suggested Column Name

Example

Notes

first_name

John

You can include separate First Name and Last Name columns or you can combine first name and last name in one column (such as “Name”).

last_name

Doe

You can include separate First Name and Last Name columns or you can combine first name and last name in one column (such as “Name”).

address_1

123 Main St

address_2

Apt 1

You can include separate Address 1 and Address 2 columns or you can combine all street address information in one column (such as “Address”).

city

Smalltown

When matching on address, City is optional.

state

CA

  • When matching on address, State is optional.

  • If including State, must be a two-character, capitalized abbreviation ("CA", not "California" or "Ca").

zip

12345

  • Required when matching on addresses.

  • Can be in 5-digit format or 9-digit format (ZIP+4).

email

john@email.com

  • Plaintext emails only.

  • Only one email per input row is permitted. Other emails must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates.

  • All emails must meet these requirements:

    • Have characters before and after the “@” sign

    • Contain a period character (“.”)

    • Have characters after the period character

phone

555-123-4567

  • Plain text phone numbers only.

  • Only one phone number per input row is permitted. Other phone numbers must be dropped or included in an additional row. If you include an additional row, repeat the values for the name fields for the best match rates.

  • All phone numbers must meet these requirements:

    • Can be longer than 10 characters only if every leading digit beyond the last 10 is “0” or “1”

    • If no leading digits are used, must be exactly 10 characters long

    • Can contain hyphens (“-”), parentheses (“(” or “)”), plus signs (“+”), and periods (“.”)

  • Examples of valid phone numbers include:

    • 8668533267

    • 866.853.3267

    • (866) 853-3267

    • +1 (866) 853-3267

    • +18668533267

    • 18668533267

    • 1111111118668533267

    • 08668533267

  • Examples of invalid phone numbers include:

    • 987654321 (fewer than 10 characters)

    • 98765432109 (more than 10 characters, and the leading digit is not “0” or “1”)

    • 1234567890 (after removing the leading “1”, fewer than 10 characters remain)

    • 0987654321 (after removing the leading “0”, fewer than 10 characters remain)

attribute_1

Gender

For PII resolution, you can include columns with attribute data. These columns will be returned in the output file (for more information, see the "Output File for PII Resolution" section below).
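The email, phone, state, and ZIP rules in the table above can be approximated with a few small validators. This is a sketch of one reading of those rules, not LiveRamp's own validation logic. All function names are hypothetical. Stated assumptions: spaces in phone numbers are ignored (the valid examples contain them), leading "0" and "1" digits are stripped before the 10-digit check (this reproduces every valid and invalid example listed), the email check follows the three listed requirements literally, and a ZIP+4 value may be written with or without a hyphen.

```python
import re

# Punctuation the table allows in phone numbers, plus whitespace (assumption).
_PHONE_PUNCTUATION = re.compile(r"[-()+.\s]")


def is_valid_email(email: str) -> bool:
    """Literal reading of the three listed email requirements."""
    local, at_sign, domain = email.partition("@")
    if not (at_sign and local and domain):
        return False  # needs characters before and after "@"
    # Must contain a period and have characters after the last period.
    return "." in email and not email.endswith(".")


def is_valid_phone(phone: str) -> bool:
    """Exactly 10 digits must remain after dropping leading 0s and 1s."""
    digits = _PHONE_PUNCTUATION.sub("", phone)
    if not re.fullmatch(r"[0-9]+", digits):
        return False
    return len(digits.lstrip("01")) == 10


def is_valid_state(state: str) -> bool:
    """Two-character, capitalized abbreviation such as "CA"."""
    return re.fullmatch(r"[A-Z]{2}", state) is not None


def is_valid_zip(zip_code: str) -> bool:
    """5-digit ZIP or 9-digit ZIP+4 (optional hyphen is an assumption)."""
    return re.fullmatch(r"[0-9]{5}(-?[0-9]{4})?", zip_code) is not None
```

Rows that fail a check can be corrected or dropped before the table is uploaded. A stricter email check (for example, requiring the period to appear in the domain) may be appropriate for your data.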

Format to Use for an Email-Only Resolution Table

The email-only resolution process operates similarly to PII resolution. Any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

See the table below for a list of the suggested input table columns and descriptions for email-only resolution.

Suggested Column Name

Example

Description

hashed_email

8c9775a5999b5f0088008c0b26d7fe8549d5c80b0047784996a26946abac0cef

  • SHA-256, MD5, and SHA-1 hashed emails accepted.

  • Email addresses should be lowercased and UTF-8 encoded prior to hashing.

  • After hashing, convert the resulting hash into lowercase hexadecimal representation.

attribute_1

Male

For email address resolution, you can include columns with attribute data. These columns will be returned in the output table (for more information, see the "Privacy Filter" section below).
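The hashing requirements in the table above (lowercase, UTF-8 encode, hash, lowercase hexadecimal output) can be sketched with Python's standard hashlib module. The function name hash_email is hypothetical. The sketch does not trim surrounding whitespace, because the requirements above do not mention it; add that step if your data needs it.

```python
import hashlib

# Hash types the table lists as accepted.
_ACCEPTED_ALGORITHMS = {"sha256", "md5", "sha1"}


def hash_email(email: str, algorithm: str = "sha256") -> str:
    """Lowercase and UTF-8 encode an email, then return its lowercase hex digest."""
    if algorithm not in _ACCEPTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    normalized = email.lower().encode("utf-8")
    # hexdigest() already returns lowercase hexadecimal characters.
    return hashlib.new(algorithm, normalized).hexdigest()
```

Because the email is lowercased first, "John@Email.com" and "john@email.com" produce the same hash.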

Create AWS Glue Tables

AWS Entity Resolution reads from AWS Glue as the input. After you’ve created your input data tables and saved them to your Amazon S3 buckets, you need to create AWS Glue tables from those input data tables. For more information, see the instructions from AWS here.

Create the Schema Mapping

Before you can run a matching workflow to perform identity resolution, you must create a schema mapping for AWS Entity Resolution to understand what input fields you want to use. You can bring your own data schema, or blueprint, from an existing AWS Glue data input, or build your custom schema using an interactive user interface or JSON editor.

Note

By default, the schema mapping is set to normalize the data inputs (such as removing special characters and extra spaces, and formatting text to lowercase) before matching. Because only hashed emails are used for input data, you should turn off normalization.

There are three ways to create a schema mapping in AWS Entity Resolution:

  • Import existing schema information

  • Manually define the input

  • Use a JSON editor to create, paste, or import a schema mapping.

For information on creating a schema mapping, see the instructions from AWS here and follow the additional guidelines listed below.

Schema Mapping Guidelines

When creating the schema mapping, make sure to follow these guidelines:

  • You do not need to specify a Unique ID for LiveRamp identity resolution operations.

  • Set the input type to “LiveRamp ID” and set the match key to the appropriate PII touchpoint(s), such as “Name + Address + Email” or “Email”.

Create and Run the Matching Workflow

After you’ve created your input data table in AWS Glue and created a schema mapping for that table, you can create and run the matching workflow to run the identity resolution operation. For information, see the instructions from AWS here.

On the Metrics tab, under Job history, you can view the following:

  • The Status of the ID mapping workflow job: In progress, Completed, Failed

  • The total records processed.

  • The duration of the job.

  • The Job ID.

After the matching workflow job completes (status is “Completed”), you can go to the Data output tab and then select your Amazon S3 location to view the results.

View Identity Output

The output file(s) from the identity resolution process will be compressed and then written to the specified S3 bucket.

The file naming convention for the output file will be "<JOB_ID>_0_0_0.csv.gz".

The Job ID will be a unique ID plus your AWS region name.

Example: 17697C67E98D4702BEB4ED7B3B0FA_AWS_US_EAST_1_0_0_0.csv.gz
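Once an output file has been downloaded from the S3 bucket, it can be read with Python's standard gzip and csv modules. The sketch below assumes the file is UTF-8 encoded and starts with a header row; neither is stated in this article. Both function names are hypothetical.

```python
import csv
import gzip


def output_file_name(job_id: str) -> str:
    """Build the expected output file name from a Job ID."""
    return f"{job_id}_0_0_0.csv.gz"


def read_output_rows(path: str) -> list[dict[str, str]]:
    """Read a locally downloaded, gzip-compressed output CSV into dictionaries."""
    # Assumption: UTF-8 text with a header row on the first line.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))
```

For example, output_file_name("17697C67E98D4702BEB4ED7B3B0FA_AWS_US_EAST_1") returns the sample file name shown above.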

Output File for PII Resolution

The standard PII resolution process passes the input table through a privacy filter which removes the PII and reswizzles the table (in addition to other operations). Because of this, any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

Identity resolution of PII provides supplemental match metadata for additional insight into customer data that can provide powerful signals for making decisions based on RampIDs.

For PII resolution, the output table includes the fields shown in the table below.

Column

Sample

Description

RampID

XYT999wXyWPB1SgpMUKlpzA013UaLEz2lg0wFAr1PWK7FMhsd

Returns the resolved RampID in your domain encoding.

attribute_1

Male

Any attribute columns passed through the service are returned.

__lr_rank

1

Provides insight on the match cascade level associated with the identifiers.

If no maintained RampID is found, this value will be "null".

__lr_filter_name

name_phone

Returns the filter name where the match occurred, which will be one of the following options:

  • name_address_zip

  • name_email

  • name_phone

  • partial_name_email

  • partial_name_phone

  • strict_name (name + zip)

  • email

  • phone

  • last_name_address

If no maintained RampID is found, this value will be "null".
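To see how the matches are distributed across filters, the __lr_filter_name column can be tallied. This is a minimal sketch that takes rows as dictionaries keyed by column name. It assumes an unmatched row carries either the literal text "null" or an empty value; how "null" is serialized in the CSV is not specified here. The function name summarize_filters is hypothetical.

```python
from collections import Counter


def summarize_filters(rows):
    """Count output rows per __lr_filter_name value."""
    counts = Counter()
    for row in rows:
        # Assumption: an empty or missing value is treated the same as "null".
        filter_name = row.get("__lr_filter_name") or "null"
        counts[filter_name] += 1
    return counts
```

A large share of "null" rows suggests that many input records could not be resolved to a maintained RampID.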

Output File for Email Address Resolution

The email-only resolution process operates similarly to PII resolution. Any attributes you need to keep associated with the identifier need to be included in the input table. For more information, see the "Privacy Filter" section below.

For email-only resolution, the results end up in the output table in the same database, with the following fields (as shown below):

  • RampID (resolved email data)

  • attributes (based on other data passed through the service).

For email resolution, the output table includes the fields shown in the table below.

Column | Example | Description

RampID | XYT999RkQ3MEY1RUYtNUIyMi00QjJGLUFDNjgtQjQ3QUEwMTNEMTA1CgMjVBMkNEMTktRD | The RampID associated with the email address.

attribute_1 | Male | The original attribute columns included in the input file.

Privacy Filter

To minimize the risk of re-identification (the ability to tie PII directly to a RampID), the service includes the following processes when resolving PII identifiers (PII resolution or email-only resolution):

  • Column Values: The process evaluates the combination of all the column values on a per row basis for unique values. If a particular combination of column values occurs 3 or fewer times, the rows containing those column values will not be matchable and will not be returned in the output table.

  • More than 5% of the table unmatchable: If, based on column value uniqueness, more than 5% of the table's rows are unmatchable, the job will fail.

  • Number of Unique RampIDs: If fewer than 100 unique RampIDs would be returned, the job will fail.

  • Reswizzle full table: Upon completion, the full table will be reswizzled to return the rows RampID | attribute_1 | attribute_2 | attribute_n in a different order than what was submitted in the input table.
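The column-value thresholds above can be estimated before a job is submitted. The following is a rough pre-check, not the service's actual filter. It assumes the caller chooses which columns form the "combination of column values" (for example, the attribute columns), since the exact column set the filter evaluates is not spelled out here. The function name privacy_precheck and its parameters are hypothetical. The unique-RampID threshold cannot be checked in advance, because RampIDs are only known after resolution.

```python
from collections import Counter


def privacy_precheck(rows, columns, max_rare_count=3, max_unmatchable=0.05):
    """Estimate how many rows have a column-value combination that is too rare."""
    # Count how often each combination of values occurs across the table.
    combos = Counter(tuple(row[c] for c in columns) for row in rows)
    total = sum(combos.values())
    # Combinations occurring 3 or fewer times make their rows unmatchable.
    rare_rows = sum(n for n in combos.values() if n <= max_rare_count)
    fraction = rare_rows / total if total else 0.0
    return {
        "total_rows": total,
        "unmatchable_rows": rare_rows,
        "unmatchable_fraction": fraction,
        "likely_to_fail": fraction > max_unmatchable,
    }
```

If likely_to_fail is true, consider coarsening the attribute values (for example, bucketing ages into ranges) so that each combination occurs more often.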

Edit a Matching Workflow

To edit a matching workflow, follow the instructions from AWS here.

Delete a Matching Workflow

To delete a matching workflow, follow the instructions from AWS here.