Overview of the File Ingestion Process for Onboarding Workflow Files

The overall file ingestion process for Onboarding workflow files consists of two stages:

  • File transfer, where the file is transferred to your LiveRamp customer GCP bucket

  • Ingestion processing, where the records in the file are matched to their associated RampIDs and the audience fields are used to create or update segments
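As a rough mental model (this is illustrative, not LiveRamp code), the happy-path status flow described in this article can be sketched as:

```python
# The happy-path ingestion statuses named in this article, in order.
# Purely illustrative; pause, failure, and other terminal statuses
# ("No Distributions", "Overwritten", etc.) are described later in
# this article.
LIFECYCLE = ["Transferring", "Queued for Processing", "Processing", "Ingested"]

def stage_of(status: str) -> str:
    """Map a status to one of the two stages listed above."""
    return "File transfer" if status == "Transferring" else "Ingestion processing"
```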

You can view a file’s progress through ingestion processing on the Files page. To view specific status details, hover over a file’s ingestion status or click that file’s row to open the details panel. For more information, see “View File Ingestion Info for Onboarding Audience Files”.

The File Transfer Stage

The file transfer stage involves transferring the file to your LiveRamp customer GCP (Google Cloud Platform) bucket. The beginning of this stage differs slightly, depending on the method you use to get your data into LiveRamp.

During this stage, the file will show an ingestion status of “Transferring”.

If you upload to a LiveRamp resource (such as LiveRamp’s SFTP or via the Connect UI), the file is automatically associated with its audience or audiences once transferring begins. Within about 20 minutes, the file appears associated with its audience(s) on the Files page in Connect, although file stats don’t appear until the ingestion process is complete.

For customers who get their data into LiveRamp via a resource that they own (such as their own SFTP server or an S3 or GCP bucket), LiveRamp automatically scans that resource every 10 minutes to detect new files. If the scan detects one or more new files, the transfer process begins, but the files are not associated with their audiences until the transfer process is complete. Within about 5 minutes, the files appear on the Files page in Connect under the “Unassociated files” section (for more information, see “Unassociated Files”).
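The scan-and-detect cycle described above can be sketched as follows. This is a hypothetical illustration (LiveRamp’s actual scanner is internal), with a local directory standing in for the customer-owned SFTP server or bucket:

```python
# Hypothetical sketch of the every-10-minutes scan described above.
# A local directory stands in for the customer-owned SFTP/S3/GCP resource.
from pathlib import Path

def detect_new_files(resource_dir: str, seen: set) -> list:
    """Return files present in the resource that earlier scans haven't seen,
    and remember them so the next scan skips them."""
    current = {p.name for p in Path(resource_dir).iterdir() if p.is_file()}
    new_files = sorted(current - seen)
    seen.update(new_files)
    return new_files
```

In production this would run on a schedule (roughly every 10 minutes, per the description above); each batch of newly detected files would then enter the transfer process and appear under “Unassociated files” until transfer completes.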

Once a file has completed transferring, the ingestion processing stage begins.

The Ingestion Processing Stage

Once the file has completed transferring and has been associated with an audience (if necessary), the ingestion processing stage begins.

The first part of this stage involves queueing the file for processing. The amount of time the file spends in the queue varies depending on your configuration. While the file is queued, it will show an ingestion status of “Queued for Processing”.

Once processing begins, the file will show an ingestion status of “Processing”. During the ingestion processing stage, the following steps are performed:

  1. Decryption/Decompression: The file is decrypted and decompressed (if necessary).

    If the incorrect file decryption subkey was used, processing will fail. The Connect UI and the Ingestion Request endpoint of the Data Pipeline Visibility API will both provide information on the failed status, along with the reason for the failure. You can use this information to attempt to correct the decryption and upload the file again.

  2. File inspection: The file's format and data are checked and parsed to determine file stats (such as the number of rows) and metadata. The data in the segment data columns is turned into key/value pairs.

    If ingestion automation has been configured for the audience, the file format is checked to make sure the various elements (such as the audience key, the file delimiter, and the identifier fields) are consistent with what was configured. For more information, see "The Ingestion Automation Process for File Uploads".

    During this step, processing might pause or fail because of certain issues with the file, such as file formatting that doesn't match what's expected. The Connect UI will provide information on the paused or failed status, along with the reason for the pause or failure. You can use this information to attempt to correct the issue and upload the file again. For a list of the various pause and failure reasons, see "Troubleshoot File Ingestion Issues".

    Note

    Files uploaded for a particular audience are put into a queue for that audience, so a delay in the processing of a file containing issues will delay all subsequent files that have been uploaded to that audience. The delayed files will show a status of either “Queued for Processing” or “Processing Paused”.

  3. Product limits check: The file is checked against product limits (for things like meeting the minimum and maximum number of rows and meeting the maximum file size).

    Note

    Files that do not meet product limits will display on the Files page with a yellow caution icon next to the file name but will continue to process. When you hover over the icon or open the details panel, a message is displayed with information about the specific issue. To minimize delays and ensure maximum performance, keep within product limits. For more information, see “Files That Exceed Product Limits”.

  4. Import creation: An import is created for each audience the file is associated with.

    Note

    Depending on your account configuration, files that were picked up by LiveRamp at the same time might be grouped into the same import job. This affects how stats are displayed in the Connect UI and whether you can delete an individual file once it's been ingested. For more information, see “Considerations for Grouped Files”.

  5. Matching: The identifier data is matched to any maintained and/or derived RampIDs associated with those identifiers (depending on the match precision level for that audience), and the resulting RampIDs are encoded for your domain. At this point, stats on the number of unique records are displayed.

  6. Anonymization: The data is pseudonymized by removing any PII.

  7. Audience data update: Depending on the update method for the particular audience (incremental, segment refresh, or full refresh), new fields and segments are created and/or any previously existing fields and segments are updated with the new data.
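To make the steps above concrete, here is a hedged sketch of steps 2, 3, 5, and 6. All names, the row limit, and the toy “match table” are hypothetical illustrations; the actual parsing and RampID matching are internal to LiveRamp:

```python
# Illustrative sketch of ingestion-processing steps 2, 3, 5, and 6 above.
# The column layout, MAX_ROWS limit, and match_table are hypothetical.
import csv
import io

MAX_ROWS = 200_000_000  # example product limit, not an actual LiveRamp value

def inspect_file(raw: str, delimiter: str = ",") -> dict:
    """Steps 2-3: parse the file, build key/value pairs from the segment
    data columns, and compute stats plus a product-limits flag."""
    reader = csv.DictReader(io.StringIO(raw), delimiter=delimiter)
    rows = list(reader)
    key_field = reader.fieldnames[0]        # assume the first column is the audience key
    segment_fields = reader.fieldnames[1:]  # remaining columns hold segment data
    records = [
        (row[key_field], {f: row[f] for f in segment_fields if row[f]})
        for row in rows
    ]
    return {
        "records": records,
        "stats": {"row_count": len(rows), "within_limits": len(rows) <= MAX_ROWS},
    }

def match_and_pseudonymize(records, match_table):
    """Steps 5-6: replace each identifier with its RampID-like token and
    drop the raw identifier, so no PII remains in the output."""
    return [
        (match_table[identifier], segments)
        for identifier, segments in records
        if identifier in match_table  # unmatched records are dropped in this toy version
    ]
```

For example, a two-row file whose first column is an email key would yield two key/value records from `inspect_file`, and `match_and_pseudonymize` would emit only the rows found in the match table, keyed by token instead of email.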

After the file has successfully completed processing, the file will show one of the following ingestion statuses:

  • No Distributions: If none of the fields or segments created or refreshed from the file are currently being distributed to a destination (other than in derived segments), the ingestion status will be “No Distributions”.

  • Ingested: If fields or segments created or refreshed from the file are currently being distributed to at least one destination, the ingestion status will be “Ingested”.

  • Not Distributed: If none of the fields or segments created or refreshed from the file have been distributed to a destination in the last 30 days (other than in derived segments), the ingestion status will be “Not Distributed”.

  • Overwritten: If the file has been uploaded to an audience that uses the “full refresh” update method, in which a new file completely replaces all of the previously onboarded data for that entire audience, only the data from the most recent file is used for that audience. All other files will show an ingestion status of “Overwritten”.

Note

For the purposes of determining whether a file’s fields or segments are being distributed, we do not include any derived segments that have been created from that file’s fields or segments.
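Paraphrasing the status rules above as a sketch (the real logic is internal to LiveRamp; the 30-day window and the exclusion of derived segments come from the descriptions in this article):

```python
def ingestion_status(overwritten: bool,
                     currently_distributed: bool,
                     days_since_last_distribution) -> str:
    """Derive a file's post-processing status per the rules above.
    Both distribution inputs should already exclude derived segments,
    per the note above. days_since_last_distribution is None if the
    file's fields/segments have never been distributed."""
    if overwritten:
        return "Overwritten"
    if currently_distributed:
        return "Ingested"
    if days_since_last_distribution is None:
        return "No Distributions"
    if days_since_last_distribution > 30:
        return "Not Distributed"
    # Distributed within 30 days but not currently: the article's
    # "currently being distributed" rule suggests "No Distributions".
    return "No Distributions"
```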

For distributions through destination accounts that use the "Active Refresh" option or the "Backfill on New File Upload" option, any new segment data for active segments will be sent to the destination (for more information, see "How LiveRamp Refreshes Distributed Data").

You can also refresh the data by resending all active segments (for more information, see "Resend Active Segments").