Normattiva Pipeline

This page documents the current operational behavior of the Normattiva download and ingest flow.

Entry Points

Operational Truth (Current)

  • M (richiestaExport="M", multivigente) is reliable for automation and regular synchronization.
  • Bulk historical windows for V/O are best-effort and frequently return empty archives in async export flows.
  • Per-act codiceRedazionale exports for V/O can produce meaningful results and are currently the practical strategy for targeted historical backfill.

What We Are Doing Now

  • Run continuous/regular syncs with multivigente (M) as the default strategy.
  • Persist inventory and ingest state in manifest DBs to support incremental operation and reproducible backfill intent.
  • Treat broad V/O async windows as opportunistic (non-blocking) inputs, not core guarantees.
  • Use targeted per-act backfill (keyed by codiceRedazionale) where explicit act identities are known.
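
The manifest idea above can be illustrated with a minimal sketch. It is not the project's actual manifest schema: the table name, columns, and helper functions (`open_manifest`, `record_seen`, `mark_ingested`, `pending`) are hypothetical, and SQLite is assumed only for illustration. It shows how persisted inventory and ingest state let a later run pick up only what is still pending.

```python
import sqlite3


def open_manifest(path=":memory:"):
    """Open (or create) a hypothetical manifest DB with one inventory table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS inventory (
            act_id      TEXT NOT NULL,
            export_mode TEXT NOT NULL,
            status      TEXT NOT NULL DEFAULT 'pending',
            PRIMARY KEY (act_id, export_mode)
        )
        """
    )
    return conn


def record_seen(conn, act_id, export_mode="M"):
    """Add an act to the inventory; re-recording never resets its status."""
    conn.execute(
        "INSERT OR IGNORE INTO inventory (act_id, export_mode) VALUES (?, ?)",
        (act_id, export_mode),
    )
    conn.commit()


def mark_ingested(conn, act_id, export_mode="M"):
    """Flag an act as ingested so later runs skip it."""
    conn.execute(
        "UPDATE inventory SET status = 'ingested' "
        "WHERE act_id = ? AND export_mode = ?",
        (act_id, export_mode),
    )
    conn.commit()


def pending(conn, export_mode="M"):
    """Return the act IDs that still need ingesting for the given mode."""
    rows = conn.execute(
        "SELECT act_id FROM inventory "
        "WHERE export_mode = ? AND status = 'pending' ORDER BY act_id",
        (export_mode,),
    )
    return [row[0] for row in rows]
```

Because `record_seen` uses `INSERT OR IGNORE`, a repeated sync can re-report the same acts without undoing earlier ingest progress, which is the property incremental operation depends on.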

What Is Intentionally Not Guaranteed

  • Full bulk historical coverage from async V/O windows.
  • Exhaustive automatic discovery of all codiceRedazionale values.
  • Stable API behavior for historical bulk exports; only M has been observed to behave reliably.

Practical Contributor Guidance

  • If implementing automation, optimize for M first.
  • If historical V/O coverage is required, design per-act workflows with explicit IDs and retry/reporting controls.
  • Keep docs and CLI help explicit about the distinction between reliable and best-effort modes.
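
A per-act workflow with retry and reporting controls might look like the sketch below. Everything here is an assumption made for illustration: `backfill_acts`, `BackfillReport`, and the `fetch(act_id, mode)` callable are hypothetical names, and the real Normattiva client call is deliberately left abstract. An empty payload is reported separately from a hard failure, since empty archives are an expected outcome for V/O exports.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class BackfillReport:
    """Outcome of one backfill run, grouped by result type."""
    succeeded: List[str] = field(default_factory=list)
    empty: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)


def backfill_acts(
    act_ids: Iterable[str],
    fetch: Callable[[str, str], bytes],
    mode: str = "V",
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> BackfillReport:
    """Fetch each act individually, retrying with exponential backoff."""
    report = BackfillReport()
    for act_id in act_ids:
        for attempt in range(1, max_attempts + 1):
            try:
                payload = fetch(act_id, mode)
            except Exception as exc:  # hypothetical client may raise anything
                if attempt == max_attempts:
                    # Out of attempts: record the error and move on.
                    report.failed[act_id] = repr(exc)
                else:
                    sleep(base_delay * 2 ** (attempt - 1))
                continue
            # An empty payload is a result, not an error: report it as such.
            (report.succeeded if payload else report.empty).append(act_id)
            break
    return report
```

One failing act never blocks the rest of the batch, and the returned report gives an operator an explicit list of IDs to re-run or investigate.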

Refactor TODO (Proposed)

  • Introduce downloader strategy selection in backend/src/law_graph/pipelines/normattiva/download/runner.py so ItalyDownloader is not hard-wired and future sources/profiles can be configured without structural duplication.
  • Narrow the ingest persistence surface by moving parse/finalize row-shape assembly behind runner-facing intent methods on IngestManifestDbStore, reducing direct row-contract construction in ingest orchestration.
  • Externalize Normattiva client defaults (base URL, origin, referer, retry tuning) into environment/config wiring so deployments can override network behavior without code edits; keep sane defaults in code.
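
As a rough sketch of the first and third proposals, the example below pairs a small strategy registry with environment-driven client settings. None of it reflects the existing code: the environment variable names, `ClientSettings`, `register_downloader`, `build_downloader`, and `ItalyDownloaderSketch` are hypothetical, and the default base URL is a placeholder rather than the real endpoint. Only the base URL and retry tuning are shown; origin and referer would follow the same pattern.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Mapping


@dataclass(frozen=True)
class ClientSettings:
    """Client defaults live in code; the environment may override them."""
    base_url: str = "https://example.invalid"  # placeholder, not the real URL
    max_retries: int = 3
    backoff_seconds: float = 1.0

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ClientSettings":
        defaults = cls()
        return cls(
            base_url=env.get("NORMATTIVA_BASE_URL", defaults.base_url),
            max_retries=int(env.get("NORMATTIVA_MAX_RETRIES", defaults.max_retries)),
            backoff_seconds=float(
                env.get("NORMATTIVA_BACKOFF_SECONDS", defaults.backoff_seconds)
            ),
        )


# Registry mapping a strategy name to a factory that builds a downloader.
DOWNLOADERS: Dict[str, Callable[[ClientSettings], object]] = {}


def register_downloader(name: str):
    """Class decorator that registers a downloader under a strategy name."""
    def decorator(factory):
        DOWNLOADERS[name] = factory
        return factory
    return decorator


@register_downloader("italy")
class ItalyDownloaderSketch:
    """Stand-in for ItalyDownloader; holds settings, performs no I/O."""
    def __init__(self, settings: ClientSettings):
        self.settings = settings


def build_downloader(name: str, settings: ClientSettings):
    """Resolve a strategy by name instead of hard-wiring one class."""
    try:
        factory = DOWNLOADERS[name]
    except KeyError:
        known = sorted(DOWNLOADERS)
        raise ValueError(f"unknown downloader strategy {name!r}; known: {known}") from None
    return factory(settings)
```

With this shape the runner would call something like `build_downloader("italy", ClientSettings.from_env())`, so adding a source means registering a class rather than editing the runner.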