Perform Identity Resolution Through ADX
LiveRamp’s Identity Resolution in AWS Data Exchange (ADX) allows you to resolve device identifiers or email addresses to RampIDs, LiveRamp’s persistent pseudonymous identifiers for persons and households. Identity resolution gives you a more holistic view of your data at an individual or household level. A common use case is resolving device-based exposure logs from DSPs to RampIDs, which drives more accurate insights and analytics.
You can also input an individual-based RampID and get back any household-based RampID that might be associated with that individual.
You can access LiveRamp Identity Resolution within the AWS Marketplace, meaning identity resolution can be performed within AWS. For more information on LiveRamp Identity in ADX, see “LiveRamp Identity in the ADX Marketplace”.
This service leverages LiveRamp’s Identity Graph, connecting fragmented consumer touchpoints to a person-based or household-based view.
The following identifiers can be resolved:
Cookies
MAIDs (mobile device IDs)
CTV IDs (Connected TV Device IDs)
CIDs (custom identifiers)
Email addresses (SHA-256 hashed)
Person-based, maintained RampIDs (for resolution to household RampIDs)
Based on the type of identifier you’re resolving, you might receive one RampID per identifier or multiple RampIDs per identifier. Typically for cookie and mobile device ID resolution, one RampID is returned, given that the devices are not normally shared. Also, when resolving individual RampIDs to household RampIDs, only one household RampID is returned. However, for CTV identifiers it is common to receive multiple individual RampIDs per identifier.
When resolving hashed email addresses, you can choose to receive from 1 to 15 associated RampIDs, if available.
Overall Steps
Before performing identity resolution, you must perform the steps to enable LiveRamp Identity in the ADX Marketplace. For information on performing these steps, see “LiveRamp Identity in the ADX Marketplace”.
After you’ve enabled LiveRamp Identity in ADX, perform identity resolution by completing the following steps:
Format the input data file and load it into your AWS S3 input location.
Initiate identity resolution by calling the LiveRamp Workflows API endpoint.
Initiate output file delivery by calling the LiveRamp Polling API endpoint.
After you initiate file delivery, LiveRamp delivers the resolved output file(s) to the specified S3 output location and associated usage metrics are reported to AWS for billing.
See the sections below for more information on performing these steps.
Format the Input Data File
See the sections below for information on formatting the input data file.
Input File Formatting Guidelines
Identity resolution input data files should be formatted as CSV files. When creating input data files, follow these additional guidelines:
Include a header row in the first line of every file. Files cannot be processed without headers.
Include only one of the following allowed identifier types per file:
Cookies
Mobile device IDs (MAIDs)
CTV IDs
CIDs (custom identifiers)
SHA-256 hashed email addresses
Individual maintained RampIDs
Note
If the input file contains individual RampIDs, those will be resolved to household RampIDs.
You can name your columns however you want, but every column name must be unique within a file.
The first column in the input file must be the column that contains the identifiers to be resolved.
When performing identity resolution on multiple files in one job, make sure the identifier column headers are the same in every file and that they match the value given for the “target_column” parameter in the call to initiate identity resolution.
Try not to include additional columns. Having extra columns slows down processing.
Note
For device or CID resolution, additional columns (such as attribute data columns) can be included in the input file, but only the input identifiers and RampIDs will be returned in the output file. For email address resolution, any additional columns will be returned in the output file, but the email addresses will be removed and the row order randomized.
Formatting device identifiers:
Cookies: Do not modify (for example, by changing casing) cookie values.
Mobile device IDs:
Mobile device IDs should be lowercased and hyphenated. For example: 1f4d256c-1f08-41f6-a108-bbe511de9497
Plaintext AAID and IDFA can be included together. LiveRamp can match off of both IDs at the same time as long as they are in plaintext.
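The mobile device ID guideline above can be applied before a file is written. The following is a minimal sketch, assuming the IDs are 32-hex-digit values that arrive with inconsistent casing or without hyphens; the function name `normalize_maid` is hypothetical and not part of any LiveRamp tooling.

```python
import re
import uuid

# Assumption: a MAID is 32 hexadecimal digits once hyphens are removed.
_HEX32 = re.compile(r"[0-9a-fA-F]{32}")


def normalize_maid(raw: str) -> str:
    """Return the ID lowercased and hyphenated in the 8-4-4-4-12 layout."""
    compact = raw.strip().replace("-", "")
    if not _HEX32.fullmatch(compact):
        raise ValueError(f"not a 32-hex-digit device ID: {raw!r}")
    # str(uuid.UUID(...)) always renders lowercase with hyphens.
    return str(uuid.UUID(compact))


print(normalize_maid("1F4D256C1F0841F6A108BBE511DE9497"))
# prints 1f4d256c-1f08-41f6-a108-bbe511de9497
```

Cookie values should not be passed through a function like this, since they must not be modified.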
File Format Example
See the table below for an example of how to format an input data file.
Column | Example | Description |
---|---|---|
Device identifier, CID, hashed email, or RampID | 1f4d256c-1f08-41f6-a108-bbe511de9497 | Can be one of the following identifiers: cookie, MAID, CTV ID, CID, or SHA-256 hashed email (for resolution to RampID), or maintained RampID (for resolution to household RampID). |
Attribute 1 | Male | For email address resolution, you can include columns with attribute data. These columns will be returned in the output file. Any attribute columns included in an input file used for device or CID resolution will not be returned in the output file. |
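As an illustration of the layout above, the sketch below writes an input file with a header row and the identifier column first. The helper name `write_input_csv` and the column names `DEVICE_ID` and `ATTRIBUTE_1` are hypothetical; use whatever headers match your "target_column" value.

```python
import csv
import io


def write_input_csv(stream, id_header, identifiers, attributes=None):
    """Write a header row, then one row per identifier.

    The identifier column is always written first. `attributes` maps a
    column name to a list of values aligned with `identifiers`.
    """
    attributes = attributes or {}
    headers = [id_header, *attributes]
    if len(set(headers)) != len(headers):
        raise ValueError("column names must be unique")
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(headers)
    for i, identifier in enumerate(identifiers):
        writer.writerow([identifier, *(col[i] for col in attributes.values())])


buf = io.StringIO()
write_input_csv(buf, "DEVICE_ID", ["1f4d256c-1f08-41f6-a108-bbe511de9497"])
print(buf.getvalue())
```

To produce a real file, pass a handle opened with `open(path, "w", newline="")` instead of the in-memory buffer.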
Formatting Guidelines for Email Address Hashing
Follow these best practices for hashing email addresses:
Email addresses should be lowercased prior to hashing.
Hash with SHA-256 over the UTF-8 encoded address, and output the digest as a lowercase hex-encoded string.
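A hashing step along these lines could look like the sketch below. It assumes addresses are trimmed of surrounding whitespace and lowercased before hashing; the trimming and the function name `hash_email` are assumptions, so confirm the normalization against the guidelines above before hashing production data.

```python
import hashlib


def hash_email(email: str) -> str:
    """Return the lowercase hex SHA-256 digest of a normalized address."""
    # Assumption: strip surrounding whitespace and lowercase before hashing.
    normalized = email.strip().lower()
    # hexdigest() always returns lowercase hexadecimal characters.
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


digest = hash_email("  Jane.Doe@Example.com ")
print(len(digest))  # prints 64
```

Because the address is normalized first, differently cased copies of the same address produce the same hash.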
Initiate Identity Resolution
Once your data files have been prepared and placed into your S3 bucket, initiate the identity resolution process. This is done by making a call to the LiveRamp Workflows ADX API that follows the format of the example curl command shown below.
Note
Only include the "match_limit" parameter for email address resolution, where you want to specify the maximum number of RampID results returned per input identifier (default is “1”).
HTTPS curl Call Examples
See below for the format of an https curl call:
curl --location --request POST 'https://<data-exchange-url>/adx/job/start' \
  --header 'Content-Type: application/json' \
  --header 'x-amzn-dataexchange-data-set-id: <data-set-id>' \
  --header 'x-amzn-dataexchange-revision-id: <revision-id>' \
  --header 'x-amzn-dataexchange-asset-id: <asset-id>' \
  --header 'x-amzn-dataexchange-http-method: POST' \
  --data-raw '{
    "httpMethod": "POST",
    "input_s3": "<Input S3 bucket>",
    "file_format": "csv",
    "file_pattern": "<Regex pattern for input files>",
    "workflow_type": "<Resolution type>",
    "workflow_sub_type": "<Resolution sub type>",
    "target_column": "<Identifier column header>",
    "client_id": "<Client ID>",
    "client_secret": "<Client Secret>",
    "cross_region": "true",
    "match_limit": "<# of RampIDs returned>"
  }'
See below for an example of what a populated https curl call might look like:
curl --location --request POST 'https://<data-exchange-url>/adx/job/start' \
  --header 'Content-Type: application/json' \
  --header 'x-amzn-dataexchange-data-set-id: <data-set-id>' \
  --header 'x-amzn-dataexchange-revision-id: <revision-id>' \
  --header 'x-amzn-dataexchange-asset-id: <asset-id>' \
  --header 'x-amzn-dataexchange-http-method: POST' \
  --data-raw '{
    "httpMethod": "POST",
    "input_s3": "s3://my-input-bucket-name",
    "file_format": "csv",
    "file_pattern": "resolution_input_2.*[.]csv",
    "workflow_type": "resolution",
    "workflow_sub_type": "CTV",
    "target_column": "DEVICE_ID",
    "client_id": "my-client-id",
    "client_secret": "my-client-secret",
    "cross_region": "true"
  }'
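If you build the request body in code rather than by hand, a small helper can enforce the "match_limit" rule from the note above. This is a sketch only: the function name `build_start_job_body` is hypothetical, and the sub type value "email" is taken from the populated AWS CLI example in this article rather than from a published list of allowed values.

```python
import json


def build_start_job_body(input_s3, file_pattern, sub_type, target_column,
                         client_id, client_secret, cross_region=True,
                         match_limit=None):
    """Return the JSON body for the /adx/job/start call as a dict."""
    body = {
        "httpMethod": "POST",
        "input_s3": input_s3,
        "file_format": "csv",
        "file_pattern": file_pattern,
        "workflow_type": "resolution",
        "workflow_sub_type": sub_type,
        "target_column": target_column,
        "client_id": client_id,
        "client_secret": client_secret,
        "cross_region": "true" if cross_region else "false",
    }
    if match_limit is not None:
        # match_limit is only meaningful for email address resolution.
        if sub_type != "email":
            raise ValueError("match_limit applies to email resolution only")
        if not 1 <= match_limit <= 15:
            raise ValueError("match_limit must be between 1 and 15")
        body["match_limit"] = str(match_limit)
    return body


body = build_start_job_body(
    "s3://my-input-bucket-name", "resolution_input_2.*[.]csv", "email",
    "HASHED_EMAIL", "my-client-id", "my-client-secret", match_limit=3)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and passed as the `--data-raw` or `--body` value.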
AWS CLI Call Examples
See below for the format of an AWS CLI call:
aws dataexchange send-api-asset \
  --data-set-id <data-set-id> \
  --revision-id <revision-id> \
  --asset-id <asset-id> \
  --request-headers 'x-api-key=XXXX-XXXX-XXXX-XXX-<client_id>' \
  --method POST \
  --path "/adx/job/start" \
  --body "{\"input_s3\": \"<Input S3 bucket>\", \"file_format\": \"csv\", \"file_pattern\": \"<Regex pattern for input files>\", \"workflow_type\": \"<Resolution type>\", \"workflow_sub_type\": \"<Resolution sub type>\", \"target_column\": \"<Identifier column header>\", \"client_id\": \"<Client ID>\", \"client_secret\": \"<Client Secret>\", \"cross_region\": \"true\"}"
See below for an example of what a populated AWS CLI call might look like:
aws dataexchange send-api-asset \
  --data-set-id <data-set-id> \
  --revision-id <revision-id> \
  --asset-id <asset-id> \
  --method POST \
  --request-headers 'x-api-key=XXXX-XXXX-XXXX-XXX-<client_id>' \
  --path "/adx/job/start" \
  --body "{\"input_s3\": \"s3://my-input-bucket-name\", \"file_format\": \"csv\", \"file_pattern\": \"resolution_input_2.*[.]csv\", \"workflow_type\": \"resolution\", \"workflow_sub_type\": \"email\", \"target_column\": \"device_id\", \"client_id\": \"my-client-id\", \"client_secret\": \"my-client-secret\", \"cross_region\": \"true\", \"match_limit\": \"1\"}"
Example Responses
The following is an example of a response for a successful job submission:
{ "Job ID": "9863C6588358503285051D4F0BC83_AWS_US_EAST_1", "Status": "ADX Start job submitted" }
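A caller will usually want to pull the Job ID out of this response for the later polling request. The sketch below assumes the response has already been parsed into a dictionary with the "Job ID" and "Status" keys shown above, and it treats any status other than the success message as a failure; the function name `parse_start_response` is hypothetical.

```python
START_OK = "ADX Start job submitted"


def parse_start_response(response: dict) -> str:
    """Return the Job ID, or raise if the job was not submitted."""
    status = response.get("Status", "")
    if status != START_OK:
        raise RuntimeError(f"start job failed: {status or 'no status returned'}")
    return response["Job ID"]


job_id = parse_start_response({
    "Job ID": "9863C6588358503285051D4F0BC83_AWS_US_EAST_1",
    "Status": "ADX Start job submitted",
})
print(job_id)  # prints 9863C6588358503285051D4F0BC83_AWS_US_EAST_1
```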
In addition to the response received for a successful job submission, you might get one of the following responses in the status parameter:
"ADX Start Job Lambda function failed to locate the S3 bucket region"
"ADX Start Job Lambda function failed to process the request for Job ID"
"ADX Start Job Lambda function failed to extract AWS Canonical ID from S3 bucket for Job ID"
"ADX API received an error response while authenticating for Job ID "
"ADX API failed to fetch an auth token for Job ID "
Initiate Output File Delivery
Once you’ve initiated the identity resolution process, you must make a poll job request to initiate the delivery of the output file to the output S3 bucket after processing is complete. This is done by making a call that follows the format of the example curl command shown below:
Note
It is recommended that polling be done programmatically at recurring intervals until the processing is complete and the output file has been delivered.
HTTPS curl Call Examples
See below for the format of an https curl call:
curl --location --request POST 'https://<data-exchange-url>/adx/job/poll' \
  --header 'Content-Type: application/json' \
  --header 'x-amzn-dataexchange-data-set-id: <data-set-id>' \
  --header 'x-amzn-dataexchange-revision-id: <revision-id>' \
  --header 'x-amzn-dataexchange-asset-id: <asset-id>' \
  --header 'x-amzn-dataexchange-http-method: POST' \
  --data-raw '{
    "httpMethod": "POST",
    "job_id": "<Job ID>",
    "output_s3": "<Output S3 bucket>",
    "file_format": "csv",
    "client_id": "<Client ID>",
    "client_secret": "<Client Secret>"
  }'
See below for an example of what a populated https curl call might look like:
curl --location --request POST 'https://<data-exchange-url>/adx/job/poll' \
  --header 'Content-Type: application/json' \
  --header 'x-amzn-dataexchange-data-set-id: <data-set-id>' \
  --header 'x-amzn-dataexchange-revision-id: <revision-id>' \
  --header 'x-amzn-dataexchange-asset-id: <asset-id>' \
  --header 'x-amzn-dataexchange-http-method: POST' \
  --data-raw '{
    "httpMethod": "POST",
    "job_id": "JOB_ID_123",
    "output_s3": "s3://<my-output-bucket-name>",
    "aws_key_id": "<AWS Key ID>",
    "aws_secret_key": "<AWS Secret Key>",
    "file_format": "csv",
    "client_id": "<my-client-id>",
    "client_secret": "<my-client-secret>"
  }'
AWS CLI Call Examples
See below for the format of an AWS CLI call:
aws dataexchange send-api-asset \
  --data-set-id <data-set-id> \
  --revision-id <revision-id> \
  --asset-id <asset-id> \
  --method POST \
  --request-headers 'x-api-key=XXXX-XXXX-XXXX-XXX-<client_id>' \
  --path "/adx/job/poll" \
  --body "{\"job_id\": \"<Job ID>\", \"output_s3\": \"<Output S3 bucket>\", \"file_format\": \"csv\", \"client_id\": \"<Client ID>\", \"client_secret\": \"<Client Secret>\", \"cross_region\": \"true\"}"
See below for an example of what a populated AWS CLI call might look like:
aws dataexchange send-api-asset \
  --data-set-id <data-set-id> \
  --revision-id <revision-id> \
  --asset-id <asset-id> \
  --method POST \
  --request-headers 'x-api-key=XXXX-XXXX-XXXX-XXX-<client_id>' \
  --path "/adx/job/poll" \
  --body "{\"job_id\": \"JOB_ID_123\", \"output_s3\": \"s3://<my-output-bucket-name>\", \"file_format\": \"csv\", \"client_id\": \"<my-client-id>\", \"client_secret\": \"<my-client-secret>\", \"cross_region\": \"true\"}"
Example Responses
The following is an example of a response when processing is complete:
{ "Job ID": "9863C6588358503285051D4F0BC83_AWS_US_EAST_1", "Status": "Output results uploaded to AWS S3 bucket" }
In addition to the response received when processing is complete, you might get one of the following responses in the status parameter:
'DONE': 'ADX Poll job started for delivering output results. Re-poll later for updated status'
'DELIVERING': 'Upload to AWS S3 in progress. Re-poll later or wait for the delivery notification'
'ERROR': 'Cannot poll job because of error. Please contact support'
'ALERT': 'Cannot poll job because of delay. Please contact support'
'INVALID': 'Cannot poll job because of invalid job id. Please validate input'
'EXCEPTION': 'Cannot poll job because of an exception. Please contact support'
'UNKNOWN': 'Cannot poll job because the start job workflow was not executed. Please contact support'
'DEFAULT': 'Cannot poll job because of an unknown error. Please contact support'
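Programmatic polling at recurring intervals can be sketched as a simple loop over these statuses. The example below is an illustration under several assumptions: `poll_once` is a hypothetical callable that sends one poll request and returns the parsed response, the "Status" field carries the message text shown above, messages containing "Re-poll later" mean "try again", and any other message stops the loop. The interval and attempt limit are arbitrary placeholders.

```python
import time

DELIVERED = "Output results uploaded to AWS S3 bucket"


def poll_until_delivered(poll_once, interval_seconds=60.0, max_attempts=30,
                         sleep=time.sleep):
    """Call poll_once() until delivery completes; return the attempt count."""
    for attempt in range(1, max_attempts + 1):
        status = poll_once().get("Status", "")
        if status == DELIVERED:
            return attempt
        if "Re-poll later" not in status:
            # Error-style statuses ask you to validate input or contact support.
            raise RuntimeError(f"polling stopped: {status}")
        if attempt < max_attempts:
            sleep(interval_seconds)
    raise TimeoutError(f"output not delivered after {max_attempts} polls")


# Simulated responses stand in for real API calls in this demonstration.
fake = iter([
    {"Status": "Upload to AWS S3 in progress. Re-poll later or wait for the delivery notification"},
    {"Status": DELIVERED},
])
print(poll_until_delivered(lambda: next(fake), sleep=lambda s: None))  # prints 2
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.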
API Parameters
See the tables below for a list of the API header parameters and request parameters.
Authorization Parameters
Authorization Parameter | Data Type | Description |
---|---|---|
AccessKey | string | IAM Access Key of the subscribed AWS account. |
SecretKey | string | IAM Secret Key of the subscribed AWS account. |
AWS Region | string | AWS region where the product was subscribed. |
Service Name | string | "dataexchange" |
Session Token | string | Session token of the subscribed AWS account |
Header Parameters
Header Parameter | Data Type | Description |
---|---|---|
data-set-id | string | Your AWS-provided Data set ID. |
revision-id | string | Your AWS-provided Revision ID. |
asset-id | string | Your AWS-provided Asset ID. |
aws-authorization | string |
For information on finding the AWS-provided parameters, see this AWS article.
Request Parameters
Request Parameter | Description |
---|---|
client_id | Either an existing LiveRamp client ID (if you already have Identity API credentials) or a new one provided by LiveRamp |
client_secret | Password / secret for the LiveRamp client_id: either an existing one (if you already have Identity API credentials) or a new one provided by LiveRamp |
workflow_type | “resolution” for all identity resolution processes |
workflow_sub_type | The type of identifiers being resolved, such as “CTV” or “email” (the values used in the example calls above). |
input_s3 | S3 directory for input files. |
output_s3 | S3 directory for output files. |
file_format | Specifies the format for input files. Accepted file format is CSV. |
file_pattern | Regex pattern for input files. For example, the pattern ‘input_2.*[.]csv’ would result in the processing of the following files: input_20.csv input_221.csv input_225.csv |
target_column | The column header name for the input field which contains the IDs to be resolved. Ex: “DEVICE_ID” |
cross_region | “true” or “false”. If “true”, then workloads are processed in the default region (us-east-1) if the target region is unavailable. If “false”, then workloads are not processed in the default region if the target region is unavailable and a status message to enable cross region is returned to the caller. |
match_limit | For email resolution only, specify an integer between 1 and 15 to specify the maximum number of RampID results returned per input identifier (to return only the “best match”, returning 1 RampID is sufficient). The default is “1”. |
job_id | For polling requests, enter the Job ID returned in the response for the call to initiate identity resolution. The Job ID consists of a unique ID plus your AWS region name. |
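Before submitting a job, it can help to check locally which file names a "file_pattern" value would select. The sketch below uses Python's `re` module and assumes the pattern must match the whole file name; how the service actually applies the pattern (full match or partial match, and which regex dialect) is not stated here, so treat the result as an approximation. The function name `matching_files` is hypothetical.

```python
import re


def matching_files(pattern: str, filenames):
    """Return the file names that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [name for name in filenames if compiled.fullmatch(name)]


files = ["input_20.csv", "input_221.csv", "input_225.csv",
         "input_10.csv", "input_20.csv.bak"]
print(matching_files(r"input_2.*[.]csv", files))
# prints ['input_20.csv', 'input_221.csv', 'input_225.csv']
```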
Identity Resolution Output
The output file(s) from the identity resolution process will be compressed and then written to the specified S3 bucket provided in the poll job request.
The file naming convention for the output file will be "<JOB_ID>_0_0_0.csv.gz"
The Job ID will be a unique ID plus your AWS region name.
Ex: 17697C67E98D4702BEB4ED7B3B0FA_AWS_US_EAST_1_0_0_0.csv.gz
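Given that naming convention, the expected output file name can be derived from the Job ID and the compressed file read directly. This is a minimal sketch: the helper names are hypothetical, the file is assumed to have been downloaded locally already, and the rows are returned as-is because the presence of a header row in the output is not confirmed here.

```python
import csv
import gzip


def output_file_name(job_id: str) -> str:
    """Build the output file name from the Job ID."""
    return f"{job_id}_0_0_0.csv.gz"


def read_output_rows(path: str):
    """Read every row of a gzip-compressed CSV file into a list of lists."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


print(output_file_name("17697C67E98D4702BEB4ED7B3B0FA_AWS_US_EAST_1"))
# prints 17697C67E98D4702BEB4ED7B3B0FA_AWS_US_EAST_1_0_0_0.csv.gz
```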
Output File for Device Resolution
The output file for device resolution will follow the format shown in the table below.
Column | Example | Description |
---|---|---|
Device identifier OR RampID | 1f4d256c-1f08-41f6-a108-bbe511de9497 | The original identifier included in the input file. |
RampID | XYT999RkQ3MEY1RUYtNUIyMi00QjJGLUFDNjgtQjQ3QUEwMTNEMTA1CgMjVBMkNEMTktRD | For input files containing device identifiers, the RampID associated with the device identifier. For input files containing individual RampIDs, the household RampIDs associated with those individual RampIDs. Note: If multiple RampIDs are associated with a device identifier, multiple lines will be created in the output file. |
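Because one identifier can appear on several output lines, downstream code often needs the rows grouped per identifier. The sketch below assumes each row is an (identifier, RampID) pair with any header row already removed; the function name and the placeholder values are hypothetical.

```python
from collections import defaultdict


def group_rampids(rows):
    """Map each input identifier to the list of RampIDs returned for it."""
    grouped = defaultdict(list)
    for identifier, rampid in rows:
        grouped[identifier].append(rampid)
    return dict(grouped)


rows = [
    ("ctv-device-1", "RampID-A"),
    ("ctv-device-1", "RampID-B"),
    ("ctv-device-2", "RampID-C"),
]
print(group_rampids(rows))
# prints {'ctv-device-1': ['RampID-A', 'RampID-B'], 'ctv-device-2': ['RampID-C']}
```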
Output File for CID Resolution
The output file for CID resolution will follow the format shown in the table below.
Column | Example | Description |
---|---|---|
CID_ID | 93abc799-a0a5-40b5-80dd-d2ab61d4d072 | The original identifier included in the input file. |
RampID | XYT999RkQ3MEY1RUYtNUIyMi00QjJGLUFDNjgtQjQ3QUEwMTNEMTA1CgMjVBMkNEMTktRD | The resolved RampID in your domain encoding. |
Output File for Email Address Resolution
The output file for email address resolution will follow the format shown in the table below.
Column | Example | Description |
---|---|---|
RampID | XYT999RkQ3MEY1RUYtNUIyMi00QjJGLUFDNjgtQjQ3QUEwMTNEMTA1CgMjVBMkNEMTktRD | The RampID associated with the email address. Note: If multiple RampIDs are associated with an email address, multiple lines will be created in the output file. |
Attribute 1 | Male | The original attribute columns included in the input file. |
Privacy Filters
To minimize the risk of re-identification (the ability to tie an email address directly to a RampID), the service includes the following processes:
Column Values: The process evaluates the combination of all the column values on a per row basis for unique values. If a particular combination of column values occurs 3 or fewer times, the rows containing those column values will not be matchable and will not be returned in the output table.
More than 5% of the table unmatchable: If, based on column value uniqueness, more than 5% of the file rows are unmatchable, the job will fail.
Number of Unique RampIDs: If fewer than 100 unique RampIDs would be returned, the job will fail.
Reswizzle full file: Upon completion, the full file will be reswizzled (reshuffled) so that the rows (RampID | attribute 1 | attribute 2 | attribute n) are returned in a different order than they were submitted in the input file.
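The first two filters can be approximated locally before a file is submitted. The sketch below is not LiveRamp's implementation: it assumes the uniqueness check applies to the combination of attribute column values in each row (excluding the hashed email column), and the function name `privacy_precheck` is hypothetical. The unique RampID count cannot be checked in advance, since it depends on the resolution results.

```python
from collections import Counter

MIN_OCCURRENCES = 4     # combinations seen 3 or fewer times are unmatchable
MAX_UNMATCHABLE = 0.05  # the job fails when more than 5% of rows are unmatchable


def privacy_precheck(attribute_rows):
    """Return (unmatchable_rows, unmatchable_fraction, job_would_fail)."""
    rows = [tuple(row) for row in attribute_rows]
    counts = Counter(rows)
    unmatchable = sum(1 for row in rows if counts[row] < MIN_OCCURRENCES)
    fraction = unmatchable / len(rows) if rows else 0.0
    return unmatchable, fraction, fraction > MAX_UNMATCHABLE


sample = [("Male", "CA")] * 97 + [("Female", "NY")] * 3
print(privacy_precheck(sample))  # prints (3, 0.03, False)
```

In this sample, the three ("Female", "NY") rows would be dropped from the output, but the job would still run because only 3% of the rows are unmatchable.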