Managing Input Bucket File Discovery (Polling)

The Local Encoder Vault App polls your input file location (an AWS S3 or GCS bucket) to discover new files for processing. Each poll generates AWS S3 or GCS ListBucket API calls, which carry associated costs.

If your buckets are updated infrequently, you can use the following optional configuration parameters to increase the polling interval and reduce those calls. You can also add an initial delay to let the app "warm up" before the first poll.

Configuration Parameter: LR_DEFAULT_POLLER_PERIOD

  Description: The interval in milliseconds to check for new files in the input bucket.

  Default Value: The default value varies depending on whether your LR_VAULT_MODE parameter is configured for default mode (for long-running file processing) or task mode (used to enable single file processing). For more information, see the "Optional Configuration Parameters" section.

    • Default mode: 1000 ms

    • Task mode: 10000 ms

  Example: LR_DEFAULT_POLLER_PERIOD: "3600000"

Configuration Parameter: LR_DEFAULT_POLLER_INITIAL_DELAY

  Description: The initial delay in milliseconds before the first poll starts. For example, you could specify 30000 to delay the first poll by 30 seconds to ensure the application is fully initialized before file discovery begins.

  Default Value: 0 ms

  Example: LR_DEFAULT_POLLER_INITIAL_DELAY: "30000"
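Because both parameters take raw millisecond values, it is easy to drop or add a zero. The following Python sketch derives the values from readable durations and prints them in the same key and quoted-value form as the examples above. The helper name to_millis is hypothetical and is not part of the app.

```python
from datetime import timedelta


def to_millis(duration: timedelta) -> str:
    """Return a duration as a whole-millisecond string, ready to quote in config."""
    # Dividing one timedelta by another with // yields an exact integer.
    return str(duration // timedelta(milliseconds=1))


# Hourly polling with a 30-second initial delay (the example values above).
settings = {
    "LR_DEFAULT_POLLER_PERIOD": to_millis(timedelta(hours=1)),
    "LR_DEFAULT_POLLER_INITIAL_DELAY": to_millis(timedelta(seconds=30)),
}

for name, value in settings.items():
    print(f'{name}: "{value}"')
```

Running the sketch prints LR_DEFAULT_POLLER_PERIOD: "3600000" and LR_DEFAULT_POLLER_INITIAL_DELAY: "30000". How you supply these values to the app depends on your deployment.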

Operational Considerations

When configuring these parameters, consider the following:

  • Cost vs. freshness: Higher intervals reduce API costs but increase time-to-detect for new files. For example, consider the following polling intervals:

    • Frequent updates (near-real-time): LR_DEFAULT_POLLER_PERIOD = 1000–5000 ms (1–5 seconds)

    • Periodic updates (every few minutes): LR_DEFAULT_POLLER_PERIOD = 30000–300000 ms (30 seconds–5 minutes)

    • Infrequent updates (hourly or daily): LR_DEFAULT_POLLER_PERIOD = 3600000–43200000 ms (1–12 hours)

    Caution

    Increasing the polling interval reduces API calls but will delay the discovery and processing of new files. Choose values that meet your freshness SLAs.

  • Mode awareness: The default value for LR_DEFAULT_POLLER_PERIOD differs by deployment mode (1000 ms in default mode, 10000 ms in task mode). Set the parameter explicitly if you need consistent behavior across modes.

  • Backlog effects: If files arrive in large batches, a longer interval can delay detection of the entire batch; ensure downstream throughput matches your detection cadence.

  • Cloud provider quotas: A lower polling frequency can help you stay within AWS S3 or GCS request quotas and avoid throttling.

  • Observability: Correlate poll intervals with observed discovery latency in metrics to choose a balanced value.
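To put rough numbers on the cost side of these considerations, the following Python sketch estimates how many list requests one app instance issues per day at a given interval. It is a back-of-the-envelope model, not a description of the app's internals: the function name and the calls_per_poll parameter are hypothetical, and the model assumes polls run continuously at exactly the configured period.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def list_calls_per_day(period_ms: int, calls_per_poll: int = 1) -> float:
    """Estimate list requests per day for one app instance polling one bucket.

    calls_per_poll is an assumption: a single poll may need several paginated
    list requests when the bucket holds many objects.
    """
    if period_ms <= 0:
        raise ValueError("period_ms must be positive")
    return MS_PER_DAY / period_ms * calls_per_poll


# Compare the two documented defaults with a 5-minute and an hourly interval.
for period_ms in (1_000, 10_000, 300_000, 3_600_000):
    calls = list_calls_per_day(period_ms)
    print(f"{period_ms:>9} ms -> {calls:>8.0f} list calls/day")
```

Under these assumptions, moving from 1000 ms to 3600000 ms drops the estimate from 86,400 to 24 list calls per day. Multiply by your provider's current per-request price to approximate the cost difference.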

Polling FAQs

Q. Does a higher poll interval miss files?

A. No. Files are not missed; they are simply discovered later. The trade-off is between detection latency and API costs.
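The latency side of this trade-off can be illustrated with a small model. The following Python sketch assumes that polls fire exactly on schedule and that each poll lists every file present in the bucket at that moment. Both are simplifying assumptions, and the function name is hypothetical.

```python
def discovery_time_ms(arrival_ms: int, period_ms: int, initial_delay_ms: int = 0) -> int:
    """Return the time of the first poll at or after a file's arrival.

    Polls are modeled at initial_delay_ms, initial_delay_ms + period_ms, and so on.
    """
    if arrival_ms <= initial_delay_ms:
        return initial_delay_ms
    elapsed = arrival_ms - initial_delay_ms
    polls_needed = (elapsed + period_ms - 1) // period_ms  # integer ceiling
    return initial_delay_ms + polls_needed * period_ms


period_ms = 60_000
for arrival_ms in (5_000, 61_000, 119_999, 120_000):
    found_ms = discovery_time_ms(arrival_ms, period_ms)
    print(f"arrived {arrival_ms:>7} ms, discovered {found_ms:>7} ms, "
          f"latency {found_ms - arrival_ms:>6} ms")
```

In this model every file is discovered, and once polling has started the detection latency never exceeds one polling period.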

Q. Should I change the initial delay?

A. Only if your application or environment requires warm-up time (e.g., credentials loading, cache priming). Otherwise, keep it at 0 ms.

Q. How do I confirm the effective polling interval?

A. Check startup logs for the configured values or observe the timing between successive list calls in your logs.
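If you collect the timestamps of successive list calls from your logs, a short script can compute the interval actually in effect. The following Python sketch assumes you have already extracted the timestamps as ISO 8601 strings. The log format, the sample values, and the function name are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def effective_period_ms(list_call_times: list[str]) -> float:
    """Return the median gap in milliseconds between successive list calls."""
    times = sorted(datetime.fromisoformat(t) for t in list_call_times)
    if len(times) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [(later - earlier) / timedelta(milliseconds=1)
            for earlier, later in zip(times, times[1:])]
    return median(gaps)


# Hypothetical timestamps copied from successive list-call log lines.
sample = [
    "2024-01-01T00:00:00.000",
    "2024-01-01T00:00:10.004",
    "2024-01-01T00:00:20.001",
    "2024-01-01T00:00:30.003",
]
print(f"effective polling interval: about {effective_period_ms(sample):.0f} ms")
```

The median is used rather than the mean so that a single delayed poll does not skew the result. A value close to your configured LR_DEFAULT_POLLER_PERIOD suggests the setting has taken effect.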