Architecture Map

This page defines ownership boundaries so contributors can decide where code belongs before making changes.

Architecture and Coding Philosophy

  • Prefer clarity over cleverness: explicit ownership beats implicit behavior.
  • Keep one stable type per variable across a flow; fail fast on mismatches.
  • Optimize for small, composable functions and deterministic side effects.
  • Make incremental correctness and resumability the default over one-shot speed.
  • Keep storage contracts explicit and centralized; avoid duplicated schema literals across modules.
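The last two points above can be made concrete with a small sketch. It assumes a hypothetical contract module: `TableContract`, `INGEST_RUNS`, `run_row`, and the table and column names are all illustrative and are not taken from the real schema. The idea is that SQL text is derived from one definition instead of being retyped in each module, and that row builders reject a wrong type at the boundary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TableContract:
    """Hypothetical single source of truth for one table's name and columns."""

    name: str
    columns: tuple[str, ...]

    def insert_sql(self) -> str:
        # Derive the statement from the contract so no module repeats the literals.
        cols = ", ".join(self.columns)
        placeholders = ", ".join("?" for _ in self.columns)
        return f"INSERT INTO {self.name} ({cols}) VALUES ({placeholders})"


# Illustrative contract; the real table and column names may differ.
INGEST_RUNS = TableContract(name="ingest_runs", columns=("run_id", "status"))


def run_row(run_id: str, status: str) -> tuple[str, str]:
    """Build a row for INGEST_RUNS, failing fast on a type mismatch."""
    for label, value in (("run_id", run_id), ("status", status)):
        if not isinstance(value, str):
            raise TypeError(f"{label} must be str, got {type(value).__name__}")
    return (run_id, status)
```

A caller would pass `INGEST_RUNS.insert_sql()` and `run_row(...)` to its database driver; a mismatched value raises before any write is attempted.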

Module Ownership Map

| Module | Core classes/functions | Owns | Must not own | Called by |
| --- | --- | --- | --- | --- |
| backend/src/law_graph/pipelines/normattiva/download/italy.py | ItalyDownloader, ItalyDownloaderConfig | Normattiva async download orchestration, export windows, archive processing strategy wiring, persistence through datastore + download manifest store | SQL schema primitives, ingest-state orchestration, act canonicalization logic | law_graph.cli (download ita), Normattiva pipeline origin wiring |
| backend/src/law_graph/pipelines/normattiva/ingest/__init__.py | run_ingest, IngestConfig, IngestResult | Public ingest entrypoint and stable ingest API surface | Orchestration internals, worker/runtime implementation details, row-shape persistence assembly | law_graph.cli (ingest ita), tests and pipeline wiring |
| backend/src/law_graph/pipelines/normattiva/ingest/coordinator.py | run_ingest orchestration internals | Ingest scheduling/concurrency, variant aggregation flow, integration with IngestManifestDbStore intent APIs, delta-build coordination | CLI plumbing, schema migration logic, downloader/network API calls | Ingest public package entrypoint |
| backend/src/law_graph/pipelines/normattiva/shared/{format_adapters,format_resolution,law_id,normattiva_keys,urns}.py | format adapters, extension/spec resolution, law-id + key + URN helpers | Normattiva format-specific parsing and identifier/URN extraction logic with clear ownership boundaries | Ingest orchestration policy, downloader window scheduling, DB contract persistence | Downloader/backfill/shared snapshot + planning modules |
| backend/src/law_graph/storage/repositories/ingest_repository.py | IngestRepository and table primitives (upsert_*_raw, replace_*_raw) | SQL/table-oriented ingest manifest DB primitives, schema validation/setup at repository boundary | Run policy, batching strategy, dedupe policy, orchestration intent APIs | IngestManifestDbStore |
| backend/src/law_graph/storage/ingest_manifest_db.py | IngestManifestDbStore, SourceOwnerAliasKey | Application-facing ingest persistence boundary: lock scope, ID resolution, run lifecycle coordination, dedupe + unit-of-work policy | Downloader orchestration, parser logic, raw SQL spread across runners | Ingest runner and backfill orchestration |
| backend/src/law_graph/delta/builder.py | ActBuilder, build_variant_event, build_variant_event_from_projection, build_canonical_records | Canonical record generation from snapshots/variant events: node intervals, content blobs, metadata deltas, missing interval hints | DB writes, runner scheduling, network calls | Ingest worker + aggregation path |
| backend/src/law_graph/delta/resolver.py | resolve_state, ResolvedActView | Snapshot reconstruction at a date by replaying deltas and interval selection | Persistence concerns, ingest orchestration, downloader behavior | Analysis tools, tests, downstream reconstruction consumers |
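To illustrate the resolver row, here is a minimal sketch of interval selection at a date. It is not the real `resolve_state` signature or the `ResolvedActView` type: `NodeInterval`, `resolve_at`, and the half-open `[start, end)` convention are assumptions made for the example. It is a pure function with no persistence or network access, in line with the "Must not own" column.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NodeInterval:
    """Hypothetical record: a node's text, valid on the half-open range [start, end)."""

    node_id: str
    start: date
    end: date | None  # None means the interval is still open
    text: str


def resolve_at(intervals: list[NodeInterval], on: date) -> dict[str, str]:
    """Return node_id -> text for every interval that covers `on`."""
    view: dict[str, str] = {}
    # Walk intervals in chronological order so a later interval for the
    # same node overrides an earlier one if both happen to cover `on`.
    for interval in sorted(intervals, key=lambda item: item.start):
        covers = interval.start <= on and (interval.end is None or on < interval.end)
        if covers:
            view[interval.node_id] = interval.text
    return view
```

Given one node with an original interval ending on 2022-01-01 and an amended interval starting that day, resolving at 2021-01-01 selects the original text and resolving at 2023-01-01 selects the amended text.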

Incremental-First Principle

  • The default architecture target is incremental correctness and resumability, not full rebuild throughput.
  • Download and ingest state must preserve checkpoint continuity so interrupted runs can resume deterministically.
  • New features should integrate with manifest/state ledgers first; one-shot or ad-hoc flows are secondary and must not bypass canonical state tracking.
  • Ingest runners must not emit per-variant phase="parsing" progress DB checkpoints from hot accumulation loops. Keep progress writes at coarse boundaries only (plan start, optional bounded throttle, finalize).
  • Ingest worker hot paths must keep the canonical boundary at DeltaProjection -> VariantEvent; do not rebuild full-schema Act trees during steady-state per-variant processing.
  • ActSnapshot and snapshot/canonical reconstruction remain valid for non-ingest analysis, debugging, and explicit multi-snapshot workflows.
  • Full-schema ETL reconstruction must stay explicit and reconstruction-only. Use EtlBuilder.reconstruct_act_tree(...) for those workflows instead of generic public extraction helpers.
  • For non-gating acceptance evidence on heavy variants, use backend/scripts/analysis/profile_direct_variant_event.py to capture worker wall time, peak RSS, and schema-model creation counts for the direct path.
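The resumability and coarse-progress rules above can be sketched as a loop over an in-memory ledger. Everything here is hypothetical: `InMemoryLedger`, `run_resumable`, the phase labels, and the `throttle_every` parameter stand in for the real manifest stores and are not their API. The sketch only shows the shape: skip work the ledger already records, and write progress at plan start, at a bounded throttle, and at finalize, never once per variant.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InMemoryLedger:
    """Hypothetical stand-in for a manifest/state ledger."""

    done: set[str] = field(default_factory=set)
    progress_writes: list[tuple[str, int]] = field(default_factory=list)

    def write_progress(self, phase: str, processed: int) -> None:
        # In a real store this would be a DB write, hence the need to keep it coarse.
        self.progress_writes.append((phase, processed))


def run_resumable(
    variants: list[str], ledger: InMemoryLedger, throttle_every: int = 1000
) -> int:
    """Process variants not yet recorded in the ledger; return how many ran."""
    pending = [v for v in variants if v not in ledger.done]
    ledger.write_progress("plan", 0)  # coarse boundary: plan start
    processed = 0
    for variant in pending:
        # Hot loop: per-variant work would go here. No progress row is written
        # per variant; completion is only marked in memory in this sketch.
        ledger.done.add(variant)
        processed += 1
        if processed % throttle_every == 0:
            ledger.write_progress("throttle", processed)  # optional bounded throttle
    ledger.write_progress("finalize", processed)  # coarse boundary: finalize
    return processed
```

Running the function a second time against the same ledger processes nothing, which is the property an interrupted run relies on when it resumes.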

Sequence: Download -> Ingest -> Backfill

sequenceDiagram
    participant CLI as law_graph.cli
    participant DL as ItalyDownloader
    participant DDB as download_manifest.db
    participant FS as exports/normattiva/ita
    participant IR as run_ingest
    participant IDB as ingest_manifest.db
    participant OUT as exports/processed/ita
    participant BF as backfill ita

    CLI->>DL: download ita
    DL->>FS: write AKN/JSON/NIR artifacts
    DL->>DDB: record files/events/checkpoints

    CLI->>IR: ingest ita
    IR->>FS: read downloaded artifacts
    IR->>IDB: persist ingest state + canonical rows
    IR->>OUT: emit processed/base+delta artifacts

    CLI->>BF: backfill ita
    BF->>IDB: read missing interval hints
    BF->>DL: request targeted exports
    DL->>FS: write backfilled artifacts
    DL->>DDB: record additional download inventory/events