Skip Previously-Processed Input Files

You can configure Local Encoder to automatically skip input files that have already been processed. When this feature is enabled, Local Encoder checks at startup whether a corresponding output file exists for each input file. If the output already exists, the input file is skipped to avoid duplicate processing. Only files without existing output are processed.

Note

This feature is optional and disabled by default. If you do not enable it, Local Encoder will process all input files, even those for which output files already exist.

You might want to enable this feature for the following reasons:

  • Efficiency: Prevents redundant processing of the same data

  • Cost Savings: Reduces compute time and storage operations

  • Safe Restart: You can restart the application without reprocessing completed files

  • Crash Recovery: After a crash or restart, the app resumes from where it left off

When Local Encoder starts, it performs a one-time scan of the output location to build a list of previously-processed files. This scan happens only once at application startup and is not refreshed while the app runs.

When the skip feature is enabled, Local Encoder derives each input file’s relative path under the configured input root and checks for a corresponding file at the same relative path under the configured output root (including any required output prefixes such as accountId). If the corresponding output exists, the input file is skipped.
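The path check described above can be sketched as follows. This is a minimal illustration, not Local Encoder's actual implementation: the function names, the plain POSIX-style paths, and the assumption that the output file keeps the same filename as the input file are all hypothetical.

```python
from pathlib import PurePosixPath


def expected_output_path(input_file, input_root, output_root, output_prefix=""):
    """Map an input file to the output path that would mark it as processed.

    Hypothetical helper: assumes the output keeps the input's relative path
    and filename, placed under an optional prefix such as an account ID.
    """
    # Relative path of the input file under the input root, e.g. "sub/a.csv".
    # Raises ValueError if the file is not under the input root.
    relative = PurePosixPath(input_file).relative_to(PurePosixPath(input_root))
    # Same relative path under the output root (an empty prefix is ignored).
    return PurePosixPath(output_root) / output_prefix / relative


def should_skip(input_file, input_root, output_root, existing_outputs, output_prefix=""):
    """Return True if the corresponding output is in the set of known outputs."""
    target = expected_output_path(input_file, input_root, output_root, output_prefix)
    return str(target) in existing_outputs


existing = {"/data/out/acct123/sub/a.csv"}
print(should_skip("/data/in/sub/a.csv", "/data/in", "/data/out", existing, "acct123"))
print(should_skip("/data/in/sub/b.csv", "/data/in", "/data/out", existing, "acct123"))
```

In this sketch the first file is skipped because its output path is already in the set, while the second file would be processed.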

Note

Files with a .csv, .psv, or .tsv extension are supported. Files with a .meta extension, hidden files (starting with “.”), and empty files (0 bytes) are not supported.
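The support rules in this note can be expressed as a small filter. The sketch below is illustrative only: the function name is hypothetical, and the exact (case-sensitive) extension match is an assumption rather than documented behavior.

```python
from pathlib import PurePosixPath

# Extensions listed as supported in the note above.
SUPPORTED_EXTENSIONS = {".csv", ".psv", ".tsv"}


def is_eligible(filename, size_bytes):
    """Hypothetical check mirroring the supported/unsupported rules."""
    name = PurePosixPath(filename).name
    if name.startswith("."):
        return False  # hidden file
    if PurePosixPath(name).suffix not in SUPPORTED_EXTENSIONS:
        return False  # unsupported extension, including .meta
    return size_bytes > 0  # empty (0-byte) files are not supported


for candidate, size in [("in/a.csv", 120), ("in/a.meta", 40), ("in/.tmp.csv", 9), ("in/b.tsv", 0)]:
    print(candidate, is_eligible(candidate, size))
```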

Configure the Skip Previously-Processed Input Files Feature

To enable the feature, use the method below that matches your implementation type:

  • Enable via config (YAML): skip_processed_files: true

  • Enable via environment variable: LR_VAULT_SKIP_PROCESSED_FILES=true

Sample Log Messages

[Image: sample log messages (I-Skip_Previously_Processed_Files-sample_logs.png)]

FAQs

When is the output location scanned?

The output location is scanned only once at application startup. The snapshot of existing files is cached and used for the lifetime of the process. If new output files appear while the app is running (such as from another process), they won't be detected until the app restarts.
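The startup-snapshot behavior can be illustrated with a short sketch. It assumes a local directory as the output location and uses a hypothetical class name. It is not how Local Encoder is implemented, only a way to see why files created after startup are not detected.

```python
import os


class OutputSnapshot:
    """Hypothetical one-time listing of files under an output directory."""

    def __init__(self, output_root):
        # The scan runs once, here, and is never refreshed afterwards.
        self._existing = set()
        for dirpath, _dirnames, filenames in os.walk(output_root):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                relative = os.path.relpath(full_path, output_root)
                # Store relative paths with forward slashes for portability.
                self._existing.add(relative.replace(os.sep, "/"))

    def contains(self, relative_path):
        """Check the cached snapshot only, not the live filesystem."""
        return relative_path in self._existing
```

With this design, an output file written by another process after the snapshot is built makes `contains` return False until a new snapshot is created, which corresponds to restarting the application.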

Does the skip feature work with encrypted output files?

Yes. When LR_VAULT_PUBLIC_KEY_ENCRYPTION=true, the skip logic accounts for the encrypted prefix in output filenames.

How do I reprocess a file that has already been processed?

If you have enabled the skip feature but want to reprocess a file, delete the corresponding output file(s) from the output location and restart the application. The file will then be detected as "not processed" and will be reprocessed.

Does this feature work with subfolders in input?

Yes. When the skip feature is enabled, Local Encoder preserves the input file’s relative path under the input root and checks for a corresponding file at the same relative path under the configured output root. For example:

Input path: s3://vault-app-appliances-dev-input/vaultinput/subfolder

Output path: s3lr://com-liveramp-chp-vaultapp-output-dev/accountid/vaultinput/subfolder

Is horizontal scaling (multiple instances) supported with the skip feature?

The skip feature works in horizontal scaling scenarios when properly configured with LR_VAULT_MULTI_INSTANCE=true. Each instance processes files from its assigned folder, and the skip logic applies independently per instance.

What file types are checked for skip?

All files matching the LR_VAULT_FILENAME_PATTERN (default: csv, psv, tsv) are checked. Hidden files (starting with “.”) and .meta files are excluded.

Is the skip feature enabled by default? How do I enable it?

No, the skip feature is optional and is disabled by default. Enable it via the appropriate method listed below (depending on your implementation type):

  • Enable via config (YAML): skip_processed_files: true

  • Enable via environment variable: LR_VAULT_SKIP_PROCESSED_FILES=true