Perform Known Identity Resolution Through ADX
LiveRamp’s Known Identity Resolution in the Amazon Data Exchange (ADX) allows you to resolve personally identifiable information (PII) to LiveRamp Known IDs. Identity resolution gives you a more holistic view of your data at an individual or household level.
Resolution of data in the known identity space allows for any known data (PII) to sit next to the identifier for consolidation (the Known ID). This opens up flexibility for managing customer profiles, consolidating data for enterprise use cases, and unlocking a consistent identity framework that integrates with marketing and CDP workflows as well as LiveRamp’s network.
Note
For more information about Known IDs, see "".
You can also input an individual-based RampID and get back any household-based RampID that might be associated with that individual.
You can access LiveRamp Identity Resolution within the AWS Marketplace, meaning identity resolution can be performed within AWS. For more information on LiveRamp Identity in ADX, see “LiveRamp Identity in the ADX Marketplace”.
This service leverages LiveRamp’s Known Identity Graph, connecting fragmented consumer touchpoints to a person or household-based view.
The following identifiers can be resolved:
Names
Postal addresses
Email addresses
Phone numbers
Hashed email addresses
Based on the limit you set in the metadata table, you can either receive one Known ID per row (LiveRamp’s recommendation for the “best” ID) or up to ten Known IDs per row.
Overall Steps
Before you can perform identity resolution, you must perform the steps to enable LiveRamp Identity in the ADX Marketplace. For information on performing these steps, see “LiveRamp Identity in the ADX Marketplace”.
After you’ve enabled LiveRamp Identity in ADX, complete the following steps to perform identity resolution:
Note
To avoid errors, you might want to verify that your setup has been performed correctly before performing the operation. For more information, see the "Checklist to Verify Your Setup for LiveRamp Identity in ADX" section below.
Format the appropriate input data file(s) and load them into your AWS S3 input location.
Initiate identity resolution by calling the LiveRamp Workflows API endpoint.
Initiate output file delivery by calling the LiveRamp Polling API endpoint.
After you initiate file delivery, LiveRamp delivers the resolved output file(s) to the specified S3 output location and associated usage metrics are reported to AWS for billing.
See the sections below for more information on performing these steps.
Checklist to Verify Your Setup for LiveRamp Identity in ADX
To avoid errors, use the checklists in the sections below to verify that all the necessary native app setup steps have been successfully performed before executing an operation.
AWS Region Alignment
Region in Contract: Confirm that the AWS region you provided to LiveRamp during contract execution is consistent with your actual AWS services.
AWS CLI Region Check: Run `aws configure get region` to verify the AWS region for the IAM user or profile you're using.
Note
The ADX offer will be made and accepted in US-East-2, and API calls will be made from US-East-2.
The region you provide for your contract is where your buckets need to be. As long as your buckets are in US-East-1, your job (compute at LiveRamp's end for your job) will run in US-East-1 and will not incur cross-region data transfer costs. For bucket regions other than US-East-1 and US-West-2, you must set `cross_region` to "true" in your incoming request, and the job will run in US-East-1.
IAM User and Permissioning
IAM User for ADX: Confirm that there is an IAM user configured specifically for ADX operations.
ADX Permissioning: Confirm that the IAM user has the required permissions for starting and polling jobs in ADX.
S3 Bucket Permissioning: Confirm that the IAM user has been granted read and write permissions for both the input and output S3 buckets.
S3 Bucket Setup
Input Bucket Configuration: Confirm that there is an S3 bucket exclusively dedicated for input files for LiveRamp processing.
Output Bucket Configuration: Confirm that there is a separate S3 bucket dedicated for output files from LiveRamp processing.
Bucket Policy Verification: Confirm that the bucket policies for the input and output buckets are aligned with LiveRamp's required permissions.
Bucket Accessibility Test: Execute `aws s3 ls s3://<input-bucket-name>` and `aws s3 ls s3://<output-bucket-name>` to verify IAM user access to the buckets.
Format the Input Data File
An input data file needs to be prepared for each identity resolution operation. For more information, see the sections below.
Input File Formatting Guidelines
Identity resolution input data files should be formatted as CSV files. When creating input data files, follow these additional guidelines:
Include a header row in the first line of every file. Files cannot be processed without headers.
Do not include both PII and hashed email addresses in the same input file. A separate input file per job type must be created.
All identifiers to be resolved should be included in the input data file.
The column names for the input file can be whatever you want to use, as long as the names match the values specified in the metadata table.
Column names must be alphanumeric (other than underscores) and start with a letter.
Do not use spaces in column names. Use underscores.
The first column(s) in the input file must be the column(s) that contain the identifiers to be resolved.
When performing identity resolution on multiple files in one job, make sure the identifier column headers are the same in every file and that they match the value given for the “target_column” parameter in the call to initiate identity resolution.
Try not to include additional columns. Having extra columns slows down processing.
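The naming rules above can be checked locally before a file is uploaded. The following is a minimal Python sketch, not a LiveRamp tool: the function name `find_header_problems` and the `check_reserved` option are hypothetical, and the reserved names are the ones this article lists for PII resolution.

```python
import csv
import io
import re

# Naming rule from the guidelines: start with a letter, then letters, digits, or underscores.
COLUMN_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Column names this article lists as unavailable in PII resolution input files.
RESERVED_PII_COLUMNS = {"knownId", "__lr_rank", "__lr_filter_name"}


def find_header_problems(csv_text, check_reserved=False):
    """Return a list of human-readable problems found in the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if not header:
        return ["file has no header row"]
    problems = []
    for name in header:
        # fullmatch rejects spaces, punctuation, and names that start with a digit or underscore.
        if not COLUMN_NAME_PATTERN.fullmatch(name):
            problems.append(f"invalid column name: {name!r}")
        if check_reserved and name in RESERVED_PII_COLUMNS:
            problems.append(f"reserved column name: {name!r}")
    return problems
```

An empty list means the header passed these checks. The sketch does not confirm that the names match the values given in the call to initiate identity resolution.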
File Format for PII Resolution
Managing data in the known identity space does not require the same privacy protections as the pseudonymous RampID identity space and, as noted in prior sections, all attribute data included in the input table will pass through to the output table.
These column names cannot be used in the input file for PII resolution:
knownId
__lr_rank
__lr_filter_name
See the table below for a list of the suggested input file columns and descriptions for PII resolution.
Suggested Column Name | Example | Notes |
|---|---|---|
| John | You can include separate First Name and Last Name columns or you can combine first name and last name in one column (such as "Name"). |
| Doe | You can include separate First Name and Last Name columns or you can combine first name and last name in one column (such as "Name"). |
| 123 Main St | |
| Apt 1 | You can include separate Address 1 and Address 2 columns or you can combine all street address information in one column (such as "Address"). |
| Smalltown | When matching on address, |
| CA | |
| 12345 | |
| john@email.com | |
| 555-123-4567 | |
| Gender | |
File Format for Email-Only Resolution
The standard email-only resolution process operates similarly to PII resolution.
Note
To perform identity resolution across additional PII touchpoints, see the “File Format for PII Resolution” section above.
See the table below for a list of the suggested input table columns and descriptions for email-only resolution.
Suggested Column Name | Example | Description |
|---|---|---|
| 8c9775a5999b5f0088008c0b26d7fe8549d5c80b0047784996a26946abac0cef | |
| Male | |
Initiate Identity Resolution
Once your data files have been prepared and placed into your S3 bucket, initiate the identity resolution process. This is done by making a call with the AWS CLI to the LiveRamp Workflows ADX API that follows the format of the examples shown below.
Note
For information on the parameters to include in the call, see the “API Parameters” section below.
Only include the `match_limit` parameter for PII or email resolution where you want to specify the maximum number of Known ID results returned per input identifier (default is “1” and the maximum is "10").
Use the `input_columns` parameter to specify which columns are the target column(s) and any attribute columns you wish to pass through to the output table.
Only include the `target_columns` parameter for PII resolution. Use this parameter to specify the target PII columns to use for identity resolution. When using the `target_columns` parameter, do not include the `target_column` parameter.
For email resolution, use the `target_column` parameter instead of `target_columns`.
For information on troubleshooting errors that might occur when performing calls, see "Troubleshoot Calls in ADX".
Once you've received a successful response, make a poll job request to initiate the delivery of the output file to the output S3 bucket (for more information, see the "Initiate Output File Delivery" section below).
For known resolution, multiple identifiers and metadata options are available and can be configured in the metadata table. If no variables are configured, the default option will be returned, which includes identifiers and metadata as follows:
Known ID: The person-based identifier, including both maintained and derived identifier types.
Metadata:
`__lr_filter_name` returns the filter name where the match occurred, which will be one of the following options:
name_address_zip
name_email
name_phone
partial_name_email
partial_name_phone
strict_name (name + zip)
email
phone
last_name_address
`__lr_rank` provides insight on the match cascade level associated with the identifiers. If no maintained Known ID is found, this value will be "null".
Optional identifiers that can be configured to be returned in the output:
Household ID: The household-based identifier associated with maintained entities. This represents the household grouping of individuals that live and move together.
Place ID: A place identifier associated with an address entity in our graph.
ConsumerLink: Another person-based identifier. For this deployment, only maintained ConsumerLinks can be returned.
Beyond configuring the identifier type, match metadata can also be configured. See the sections below for metadata identity configurability options.
See the sections below for instructions on creating the metadata table.
Best Contact Metadata
This set of optional metadata provides flags that indicate whether a particular postal address, phone number, or email address is the primary one for that individual. Best Contact flags are Boolean values (TRUE or FALSE) that indicate whether the address, email, or phone that was sent matches the one that the Known Identity Graph has determined is the best one for that user.
Best contact flag use cases include:
Determine which record to designate as the primary one when making consolidation decisions.
Target campaigns to gather better contact data for existing customers.
Coordinate multiple touchpoint campaigns.
If a record is used that includes all of the best touchpoints, we will return "true" for all three flags. These flags will be returned in a _lr_metadata column.
The "clickVerifyDate" flag is a signal that is used in designating the best email address for a user that could have other uses in building data assets or scoring email address data. Because this signal relies on specific deterministic signals from a small subset of LiveRamp match data contributors, it appears on a subset of "Best Email" addresses and should not be used as the only filter for understanding active email addresses.
These flags are only available for Known ID person-based identifiers.
Best contact flags include the information shown in the table below:
Flag Name | Description | Example Value |
|---|---|---|
isBestPostalTouchpoint | This flag will return "TRUE" if the address used for making a match is the best one for that consumer in the graph. | TRUE |
isBestPhoneTouchpoint | This flag will return "TRUE" if the phone number used for making a match is the best one for that consumer in the graph. | TRUE |
isBestEmailTouchpoint | This flag will return "TRUE" if the email address used for making a match is the best one for that consumer in the graph. | TRUE |
clickVerifyDate | This flag indicates that a source in LiveRamp’s match network has verified a click on a link within a user’s email address. | TRUE |
AWS CLI Calls to Initiate Identity Resolution
See below for the format of an AWS CLI call to initiate identity resolution:
aws dataexchange send-api-asset \
--data-set-id <data-set-id> \
--revision-id <revision-id> \
--asset-id <asset-id> \
--method POST \
--region us-east-2 \
--path "/adx/job/start" \
--body '{
"input_s3": "<Input S3 bucket>",
"file_format": "csv",
"file_pattern": "<Regex pattern for input files>[.]csv",
"workflow_type": "resolution",
"workflow_sub_type": "<Resolution sub type>",
"target_column": "<Identifier column header>",
"client_id": "<Client ID>",
"client_secret": "<Client secret>",
"input_columns": {<"Column name": "Column type">},
"cross_region": "true"
}'
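Because the `--body` value must be valid JSON, it can help to assemble it programmatically rather than by hand. The following is a minimal Python sketch under stated assumptions: the helper name `build_start_body` is hypothetical, the field names are taken from the call format above, and passing `match_limit` as an integer is an assumption this article does not confirm.

```python
import json


def build_start_body(input_s3, file_pattern, workflow_sub_type, input_columns,
                     client_id, client_secret, target_column=None,
                     target_columns=None, match_limit=None, cross_region="true"):
    """Assemble the JSON string passed to --body for the /adx/job/start call."""
    # Per the notes above: target_columns for PII resolution, target_column
    # for email resolution, and never both in the same call.
    if (target_column is None) == (target_columns is None):
        raise ValueError("provide exactly one of target_column or target_columns")
    body = {
        "input_s3": input_s3,
        "file_format": "csv",
        "file_pattern": file_pattern,
        "workflow_type": "resolution",
        "workflow_sub_type": workflow_sub_type,
        "client_id": client_id,
        "client_secret": client_secret,
        "input_columns": input_columns,
        "cross_region": cross_region,
    }
    if target_columns is not None:
        body["target_columns"] = target_columns
    else:
        body["target_column"] = target_column
    if match_limit is not None:
        # Assumption: match_limit is sent as an integer from 1 to 10.
        if not 1 <= match_limit <= 10:
            raise ValueError("match_limit must be between 1 and 10")
        body["match_limit"] = match_limit
    # json.dumps never emits trailing commas or curly quotes.
    return json.dumps(body, indent=2)
```

The returned string can be pasted into the `--body` argument or written to a file. The sketch only checks the shape of the request, not whether LiveRamp accepts the values.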
See below for examples of what a populated AWS CLI call to initiate identity resolution might look like.
PII Resolution Call Example
aws dataexchange send-api-asset \
--data-set-id <data-set-id> \
--revision-id <revision-id> \
--asset-id <asset-id> \
--method POST \
--region us-east-2 \
--path "/adx/job/start" \
--body '{
"input_s3": "s3://my-input-bucket-name",
"file_format": "csv",
"file_pattern": "pii_input[.]csv",
"workflow_type": "resolution",
"workflow_sub_type" : "PII",
"target_columns": {
"name": ["FIRSTNAME", "LASTNAME"],
"address": ["ADDRESSLINE", "ADDRESSLINE2"],
"city": "CITY",
"state": "STATE",
"zip": "ZIPCODE",
"email": "EMAIL",
"phone": "PHONE"
},
"client_id": "my-client-id",
"client_secret": "my-client-secret",
"input_columns": {
"FIRSTNAME": "text",
"LASTNAME": "text",
"ADDRESSLINE": "text",
"ADDRESSLINE2": "text",
"CITY": "text",
"STATE": "text",
"ZIPCODE": "text",
"EMAIL": "text",
"PHONE": "text",
"CID": "text",
"LIKES_DOGS": "text"
},
"cross_region": "true"
}'
'limit': 1,
'outputIdentifiers':[
"knownId",
"householdLink",
"placeId"
],
'outputMetadata':[
"matchLevel",
"rank",
"isBestEmailTouchpoint"
],
'outputTable': $output_table_name
"targetColumns": {
"name": ["<up to 4 name column names>"],
"streetAddress": ["<up to 7 address column names>"],
"city": "<city column>",
"state": "<state column>",
"zipCode": "<zipcode column>",
"phone": "<phone column>",
"email": "<email column>",
}
}::variant as config;
Email Resolution Call Example
aws dataexchange send-api-asset \
--data-set-id <data-set-id> \
--revision-id <revision-id> \
--asset-id <asset-id> \
--method POST \
--region us-east-2 \
--path "/adx/job/start" \
--body '{
"input_s3": "s3://my-input-bucket-name",
"file_format": "csv",
"file_pattern": "resolution_input_2.*[.]csv",
"workflow_type": "resolution",
"workflow_sub_type": "EMAIL",
"target_column": "hashed_email",
"client_id": "my-client-id",
"client_secret": "my-client-secret",
"input_columns": {
"hashed_email": "text",
"gender": "text"
},
"cross_region": "true"
}'
Example Responses for Calls to Initiate Resolution
The following is an example of a response for a successful job submission for a call to initiate identity resolution:
{
"ResponseHeaders": {
"Content-Type": "application/json",
"Content-Length": "97",
...
},
"Body": "{\"Job ID\": \"E660EC80F3BF4473A120D3CAC890CADC_AWS_US_EAST_1\", \"Status\": \"ADX Start job submitted\"}"
}
Use the Job ID in the poll job request to initiate the delivery of the output file (for more information, see the "Initiate Output File Delivery" section below).
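In the example response, "Body" is a JSON-encoded string rather than a nested object, so reading the Job ID takes a second decoding step. The following is a minimal Python sketch, assuming the CLI output has already been parsed into a dictionary. The helper name `extract_job_id` is hypothetical.

```python
import json


def extract_job_id(response):
    """Return the Job ID from a parsed send-api-asset response."""
    # "Body" holds a JSON string, so decode it before reading "Job ID".
    body = json.loads(response["Body"])
    return body["Job ID"]


example_response = {
    "ResponseHeaders": {"Content-Type": "application/json"},
    "Body": "{\"Job ID\": \"E660EC80F3BF4473A120D3CAC890CADC_AWS_US_EAST_1\", "
            "\"Status\": \"ADX Start job submitted\"}",
}
print(extract_job_id(example_response))
# prints E660EC80F3BF4473A120D3CAC890CADC_AWS_US_EAST_1
```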
Note
For information on troubleshooting errors that might occur when performing calls, see "Troubleshoot Calls in ADX".
Initiate Output File Delivery
Once you’ve initiated the identity resolution process, you must make a poll job request to initiate the delivery of the output file to the output S3 bucket after processing is complete. One of the parameters you'll need to make that call is the Job ID that was included in the response to the call to initiate identity resolution.
Note
For information on the parameters to include in the call, see the “API Parameters” section below.
It is recommended that polling be done programmatically at recurring intervals until the processing is complete and the output file has been delivered.
For information on troubleshooting errors that might occur when performing calls, see "Troubleshoot Calls in ADX".
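One way to poll programmatically is a simple loop that repeats the request at a fixed interval until the delivered status appears. The following is a minimal Python sketch under stated assumptions: `poll_until_delivered` and `poll_once` are hypothetical names, the interval and attempt limit are arbitrary, and the status text is taken from the example responses in this article. It does not handle error statuses.

```python
import time

# Status text this article shows once the output has been delivered.
DELIVERED_STATUS = "Output results uploaded to AWS S3 bucket"


def poll_until_delivered(poll_once, interval_seconds=60, max_attempts=30,
                         sleep=time.sleep):
    """Call poll_once() repeatedly until the delivered status appears.

    poll_once is a caller-supplied function that sends one /adx/job/poll
    request and returns the "Status" string from the response body.
    Returns the number of polls that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        status = poll_once()
        if DELIVERED_STATUS in status:
            return attempt
        # Wait before the next poll, but not after the final attempt.
        if attempt < max_attempts:
            sleep(interval_seconds)
    raise TimeoutError(f"output not delivered after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting. In practice, `poll_once` would wrap the AWS CLI call shown in the next section.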
AWS CLI Calls to Initiate Delivery
See below for the format of an AWS CLI call used to initiate output file delivery:
aws dataexchange send-api-asset \
--data-set-id <data-set-id> \
--revision-id <revision-id> \
--asset-id <asset-id> \
--method POST \
--region us-east-2 \
--path "/adx/job/poll" \
--body '{
"job_id": "<Job ID>",
"output_s3": "<Output S3 bucket>",
"file_format": "csv",
"client_id": "<Client ID>",
"client_secret": "<Client secret>",
"cross_region": "true"
}'
See below for an example of what a populated AWS CLI call used to initiate output file delivery might look like:
aws dataexchange send-api-asset \
--data-set-id <data-set-id> \
--revision-id <revision-id> \
--asset-id <asset-id> \
--method POST \
--region us-east-2 \
--path "/adx/job/poll" \
--body '{
"job_id": "JOB_ID_123",
"output_s3": "s3://my-output-bucket",
"file_format": "csv",
"client_id": "my-client-id",
"client_secret": "my-client-secret",
"cross_region": "true"
}'
Example Responses for Calls to Initiate Delivery
The following is an example of a response when processing is complete:
{
"ResponseHeaders": {
"Content-Type": "application/json",
"Content-Length": "158",
...
},
"Body": "{\"Job ID\": \"E660EC80F3BF4473A120D3CAC890CADC_AWS_US_EAST_1\", \"Status\": \"ADX Poll job started for delivering output results. Re-poll later for updated status\"}"
}
In addition to the response received when processing is complete, you might get one of the following responses in the status parameter:
'Upload to AWS S3 in progress. Re-poll later or wait for the delivery notification'
'Output results uploaded to AWS S3 bucket'
Note
For information on troubleshooting errors that might occur when performing calls, see "Troubleshoot Calls in ADX".
API Parameters
See the tables below for a list of the API header parameters and request parameters.
Header Parameters
Header Parameter | Data Type | Description |
|---|---|---|
data-set-id | string | Your AWS-provided Data set ID. |
revision-id | string | Your AWS-provided Revision ID. |
asset-id | string | Your AWS-provided Asset ID. |
For information on finding the AWS-provided parameters, see this AWS article.
Request Parameters
Request Parameter | Description |
|---|---|
client_id | Either an existing LiveRamp client ID (if you already have Identity API credentials) or a new one provided by LiveRamp. Client IDs for known data use cases are separate from pseudonymous data use cases. |
client_secret | The password/secret for the LiveRamp client ID: either an existing one (if you already have Identity API credentials) or a new one provided by LiveRamp. |
workflow_type | “resolution” for all identity resolution processes |
workflow_sub_type | The type of identifiers being resolved. The call examples in this article use "PII" (for PII resolution) and "EMAIL" (for email-only resolution). Note: Each identifier type has to be separated into its own input data file, and only one option can be chosen for each operation. |
input_s3 | S3 directory for input files. |
output_s3 | S3 directory for output files. |
file_format | Specifies the format for input files. The accepted file format is "CSV". |
file_pattern | Regex pattern for input files. For example, the pattern ‘input_2.*[.]csv’ would result in the processing of the following files: input_20.csv input_221.csv input_225.csv |
target_column | The column header name for the input field which contains the IDs to be resolved. Ex: “ADDRESS” |
input_columns | The target column and any attribute columns you want to pass through into the output table. For example: "input_columns": {"hashed_email": "text", "gender": "text", "last_car": "text"}. Note: For PII resolution, these column names cannot be used in the input table: knownId, __lr_rank, __lr_filter_name. |
target_columns | A subset of input_columns used in PII resolution jobs. These are the PII elements that will be resolved to create the output Known IDs. For example: "target_columns": {"name": ["name"], "streetAddress": ["address"], "zipCode": "zip", "phone": "phone", "email": "email"}. Note: Do not include the names of attribute columns you want to pass through in this parameter. Use the input_columns parameter for those columns. |
cross_region | “true” or “false”. If “true”, then workloads are processed in the default region (us-east-1) if the target region is unavailable. If “false”, then workloads are not processed in the default region if the target region is unavailable and a status message to enable cross region is returned to the caller. |
output_identifiers | If output identifiers are not configured, the default will provide a person-based Known ID. Other configurable output identifiers include Household ID, Place ID, and ConsumerLink (maintained only). |
output_metadata | If output_metadata is not configured, the default will provide both of the following: __lr_filter_name and __lr_rank. |
match_limit | Enter an integer between 1 and 10 to specify the maximum number of Known ID results returned per input identifier (to return only the “best match”, returning 1 Known ID is sufficient). The default is “1”. |
job_id | For polling requests, enter the Job ID returned in the response for the call to initiate identity resolution. The Job ID consists of a unique ID plus your AWS region name. |
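The file_pattern example in the table above can be checked locally before a job is submitted. The following is a minimal Python sketch that tests candidate file names against the pattern. It assumes the pattern must match the whole file name, which this article does not state explicitly, and the two non-matching file names are made up for contrast.

```python
import re

pattern = re.compile(r"input_2.*[.]csv")
candidates = ["input_20.csv", "input_221.csv", "input_225.csv",
              "input_30.csv", "input_20.txt"]

# fullmatch requires the entire file name to match the pattern.
matched = [name for name in candidates if pattern.fullmatch(name)]
print(matched)
# prints ['input_20.csv', 'input_221.csv', 'input_225.csv']
```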
View Identity Resolution Output
The output file(s) from the identity resolution process will be compressed and then written to the specified S3 bucket provided in the poll job request.
The file naming convention for the output file will be "<JOB_ID>_0_0_0.csv.gz"
The Job ID will be a unique ID plus your AWS region name.
Ex: 17697C67E98D4702BEB4ED7B3B0FA_AWS_US_EAST_1_0_0_0.csv.gz
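The naming convention above makes it possible to predict the output file name from the Job ID and to read the compressed file once it has been downloaded. The following is a minimal Python sketch. The function names are hypothetical, and it assumes the output is UTF-8 encoded and starts with a header row, neither of which this article confirms.

```python
import csv
import gzip
import io


def output_file_name(job_id):
    """Build the output file name from the convention described above."""
    return f"{job_id}_0_0_0.csv.gz"


def read_output_rows(gz_bytes):
    """Decompress a downloaded output file and return its rows as dicts."""
    # Assumption: the file is UTF-8 text with a header row.
    text = gzip.decompress(gz_bytes).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

For example, `output_file_name("17697C67E98D4702BEB4ED7B3B0FA_AWS_US_EAST_1")` returns the file name shown in the example above.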
View the PII Resolution Output File
For known resolution, the identifiers and metadata returned are configurable, as outlined in the "Initiate Identity Resolution" section above. Depending on the values included, the output file will vary based on the following:
Identifiers included. The default configuration is the person-based Known ID, but other identifier types can be output, including Household ID, Place ID, and maintained ConsumerLinks.
Note
Both maintained and derived person-based Known IDs can be returned. Household IDs, Place IDs, and ConsumerLinks are only available for maintained entities.
Metadata included. Supplemental match metadata is included for additional insight into the linkage. This includes information on the match cascade level and filter where the match occurred. Additional metadata bundles are available for maintained Known IDs.
For PII resolution, the default output file includes the columns shown in the table below.
Column | Sample | Description |
|---|---|---|
| T32100US00ySyxMl2h0ypHBXEymO2-1wl1vYkw | Returns the resolved person-based Known ID in your domain encoding. |
| Male | Any attribute columns passed through the service are returned. |
| 1 | Provides insight on the match cascade level associated with the identifiers. If no maintained Known ID is found, this value will be "null". |
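| name_phone | Returns the filter name where the match occurred, which will be one of the following options: name_address_zip, name_email, name_phone, partial_name_email, partial_name_phone, strict_name (name + zip), email, phone, last_name_address. If no maintained Known ID is found, this value will be "null". |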
For PII resolution, additional configurability options (such as including additional identifiers and including additional metadata) could produce an output file that includes the following:
Column | Example | Description |
|---|---|---|
| T32100US00ySyxMl2h0ypHBXEymO2-1wl1vYkw | Returns the resolved person-based Known ID in your domain encoding. |
| T32100US031Kdcb5EcDgnly95h9ZMwKbl-TPYv | Returns the resolved household-based ID in your domain encoding. |
| T32100US02Das0oElIhVTaSQvxnnRauu3s2RYI | Returns the resolved place-based ID in your domain encoding. |
| Male | Any attribute columns passed through the service are returned. |
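| {"rank": "1", "matchLevel": "name_address_zip", "isbestemailtouchpoint": "true"} | Provides output for the metadata configured. In this example, the match rank is "1", the match occurred on the name_address_zip filter, and the email address used for the match is the best one for that consumer in the graph. |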
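The metadata example in the table above looks like a JSON object, so one way to work with it downstream is to decode each cell. The following is a minimal Python sketch. The helper name `parse_metadata` is hypothetical, and it assumes the delivered file uses straight double quotes and string values exactly as in the example, which this article does not confirm.

```python
import json


def parse_metadata(cell):
    """Decode one metadata cell into a dict, or return {} when it is empty."""
    if not cell or cell == "null":
        return {}
    return json.loads(cell)


meta = parse_metadata(
    '{"rank": "1", "matchLevel": "name_address_zip", "isbestemailtouchpoint": "true"}'
)
# Values are strings in the example, so compare against "true" rather than True.
is_best_email = meta.get("isbestemailtouchpoint") == "true"
```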
View the Email-Only Resolution Output File
The email-only resolution process operates similarly to PII resolution. For known resolution, the identifiers and metadata returned are configurable, as outlined in the "Initiate Identity Resolution" section above. Depending on the values included, the output file will vary based on:
Identifiers included. The default configuration is the person-based Known ID, but other identifier types can be output, including Household ID, Place ID, and maintained ConsumerLinks.
Note
Both maintained and derived person-based Known IDs can be returned. Household IDs, Place IDs, and ConsumerLinks are only available for maintained entities.
Metadata included. Supplemental match metadata is included for additional insight into the linkage. This includes information on the match cascade level and filter where the match occurred. Additional metadata bundles are available for maintained Known IDs.
For email-only resolution, the output file includes the columns shown in the table below.
Column | Example | Description |
|---|---|---|
| T32100US00ySyxMl2h0ypHBXEymO2-1wl1vYkw | |
| Male | The original attribute columns included in the input file. |