Architecture Map
This page defines ownership boundaries so contributors can decide where code belongs before making changes.
Architecture and Coding Philosophy
- Prefer clarity over cleverness: explicit ownership beats implicit behavior.
- Keep one stable type per variable across a flow; fail fast on mismatches.
- Optimize for small, composable functions and deterministic side effects.
- Make incremental correctness and resumability the default over one-shot speed.
- Keep storage contracts explicit and centralized; avoid duplicated schema literals across modules.
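
A minimal sketch of the "one stable type, fail fast" and "centralized contract" rules above. The names `ACT_VERSION_COLUMNS`, `ActVersionRow`, and `parse_act_version_row` are hypothetical and do not come from the codebase; they only illustrate the pattern of validating once at a boundary and passing a typed value afterwards.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical central contract: column names live in one place so that
# other modules import them instead of repeating string literals.
ACT_VERSION_COLUMNS = ("law_id", "valid_from", "valid_to")


@dataclass(frozen=True)
class ActVersionRow:
    """Typed row used after the boundary; the type never changes downstream."""

    law_id: str
    valid_from: date
    valid_to: Optional[date]  # None means the interval is still open


def parse_act_version_row(raw: dict) -> ActVersionRow:
    """Convert an untyped mapping into a typed row, failing fast on mismatches."""
    missing = [name for name in ACT_VERSION_COLUMNS if name not in raw]
    if missing:
        raise KeyError(f"missing columns: {missing}")

    law_id = raw["law_id"]
    if not isinstance(law_id, str):
        raise TypeError(f"law_id must be str, got {type(law_id).__name__}")

    valid_from = raw["valid_from"]
    if not isinstance(valid_from, date):
        raise TypeError(f"valid_from must be date, got {type(valid_from).__name__}")

    valid_to = raw["valid_to"]
    if valid_to is not None and not isinstance(valid_to, date):
        raise TypeError(f"valid_to must be date or None, got {type(valid_to).__name__}")

    return ActVersionRow(law_id=law_id, valid_from=valid_from, valid_to=valid_to)
```

Callers that receive an `ActVersionRow` do not need to re-check types, which keeps the remaining functions small and composable.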
Module Ownership Map
| Module | Core classes/functions | Owns | Must not own | Called by |
|---|---|---|---|---|
| `backend/src/law_graph/pipelines/normattiva/download/italy.py` | `ItalyDownloader`, `ItalyDownloaderConfig` | Normattiva async download orchestration, export windows, archive processing strategy wiring, persistence through datastore + download manifest store | SQL schema primitives, ingest-state orchestration, act canonicalization logic | `law_graph.cli` (`download ita`), Normattiva pipeline origin wiring |
| `backend/src/law_graph/pipelines/normattiva/ingest/__init__.py` | `run_ingest`, `IngestConfig`, `IngestResult` | Public ingest entrypoint and stable ingest API surface | Orchestration internals, worker/runtime implementation details, row-shape persistence assembly | `law_graph.cli` (`ingest ita`), tests and pipeline wiring |
| `backend/src/law_graph/pipelines/normattiva/ingest/coordinator.py` | `run_ingest` orchestration internals | Ingest scheduling/concurrency, variant aggregation flow, integration with `IngestManifestDbStore` intent APIs, delta-build coordination | CLI plumbing, schema migration logic, downloader/network API calls | Ingest public package entrypoint |
| `backend/src/law_graph/pipelines/normattiva/shared/{format_adapters,format_resolution,law_id,normattiva_keys,urns}.py` | format adapters, extension/spec resolution, law-id + key + URN helpers | Normattiva format-specific parsing and identifier/URN extraction logic with clear ownership boundaries | Ingest orchestration policy, downloader window scheduling, DB contract persistence | Downloader/backfill/shared snapshot + planning modules |
| `backend/src/law_graph/storage/repositories/ingest_repository.py` | `IngestRepository` and table primitives (`upsert_*_raw`, `replace_*_raw`) | SQL/table-oriented ingest manifest DB primitives, schema validation/setup at repository boundary | Run policy, batching strategy, dedupe policy, orchestration intent APIs | `IngestManifestDbStore` |
| `backend/src/law_graph/storage/ingest_manifest_db.py` | `IngestManifestDbStore`, `SourceOwnerAliasKey` | Application-facing ingest persistence boundary: lock scope, ID resolution, run lifecycle coordination, dedupe + unit-of-work policy | Downloader orchestration, parser logic, raw SQL spread across runners | Ingest runner and backfill orchestration |
| `backend/src/law_graph/delta/builder.py` | `ActBuilder`, `build_variant_event`, `build_variant_event_from_projection`, `build_canonical_records` | Canonical record generation from snapshots/variant events: node intervals, content blobs, metadata deltas, missing interval hints | DB writes, runner scheduling, network calls | Ingest worker + aggregation path |
| `backend/src/law_graph/delta/resolver.py` | `resolve_state`, `ResolvedActView` | Snapshot reconstruction at a date by replaying deltas and interval selection | Persistence concerns, ingest orchestration, downloader behavior | Analysis tools, tests, downstream reconstruction consumers |
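
To make the resolver's "interval selection" responsibility concrete, here is a small sketch of picking the node content that is valid on a given date. It is not the actual `resolve_state` implementation: `NodeInterval` and `select_state_at` are hypothetical names, and the half-open interval convention (`valid_from` inclusive, `valid_to` exclusive) is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class NodeInterval:
    """Hypothetical record: one node's content over a validity interval."""

    node_id: str
    valid_from: date            # inclusive (assumed convention)
    valid_to: Optional[date]    # exclusive; None means still in force
    content: str


def select_state_at(intervals: Iterable[NodeInterval], on: date) -> Dict[str, str]:
    """Return node_id -> content for every interval that covers `on`.

    Pure function: no DB access and no orchestration, in line with the
    "must not own" column for the resolver.
    """
    state: Dict[str, str] = {}
    for interval in intervals:
        started = interval.valid_from <= on
        not_ended = interval.valid_to is None or on < interval.valid_to
        if started and not_ended:
            if interval.node_id in state:
                # Fail fast instead of silently picking one of two versions.
                raise ValueError(
                    f"overlapping intervals for node {interval.node_id!r} on {on}"
                )
            state[interval.node_id] = interval.content
    return state
```

Because the function takes plain records and returns a plain mapping, analysis tools and tests can call it without any storage or runner setup.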
Incremental-First Principle
- The default architecture target is incremental correctness and resumability, not full rebuild throughput.
- Download and ingest state must preserve checkpoint continuity so interrupted runs can resume deterministically.
- New features should integrate with manifest/state ledgers first; one-shot or ad-hoc flows are secondary and must not bypass canonical state tracking.
- Ingest runners must not emit per-variant `phase="parsing"` progress DB checkpoints from hot accumulation loops. Keep progress writes at coarse boundaries only (plan start, optional bounded throttle, finalize).
- Ingest worker hot paths must keep the canonical boundary at `DeltaProjection -> VariantEvent`; do not rebuild full schema `Act` trees in steady-state per-variant processing.
- `ActSnapshot` and snapshot/canonical reconstruction remain valid for non-ingest analysis, debugging, and explicit multi-snapshot workflows.
- Full-schema ETL reconstruction must stay explicit and reconstruction-only. Use `EtlBuilder.reconstruct_act_tree(...)` for those workflows instead of generic public extraction helpers.
- For non-gating acceptance evidence on heavy variants, use `backend/scripts/analysis/profile_direct_variant_event.py` to capture worker wall time, peak RSS, and schema-model creation counts for the direct path.
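
The coarse-boundary rule for progress writes (plan start, optional bounded throttle, finalize) could look roughly like the sketch below. `ThrottledProgress`, `accumulate`, the phase labels, and the five-second default are all hypothetical; the real runner and store APIs may differ.

```python
import time
from typing import Callable, Iterable, Optional

# Signature of the (hypothetical) sink that performs the actual DB write.
ProgressWriter = Callable[[str, int, int], None]


class ThrottledProgress:
    """Emit progress only at coarse boundaries, never once per variant."""

    def __init__(
        self,
        write: ProgressWriter,
        min_interval_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._write = write
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_write: Optional[float] = None

    def start(self, total: int) -> None:
        self._emit("plan_start", 0, total)

    def maybe_update(self, done: int, total: int) -> None:
        # Bounded throttle: skip the write unless enough time has passed.
        last = self._last_write
        if last is None or self._clock() - last >= self._min_interval_s:
            self._emit("progress", done, total)

    def finalize(self, done: int, total: int) -> None:
        self._emit("finalize", done, total)

    def _emit(self, phase: str, done: int, total: int) -> None:
        self._write(phase, done, total)
        self._last_write = self._clock()


def accumulate(variants: Iterable[object], progress: ThrottledProgress) -> int:
    """Hot loop: per-variant work stays in memory; writes are throttled."""
    items = list(variants)
    progress.start(len(items))
    done = 0
    for _variant in items:
        done += 1  # placeholder for the real per-variant accumulation
        progress.maybe_update(done, len(items))
    progress.finalize(done, len(items))
    return done
```

Injecting the clock keeps the throttle deterministic in tests, and the hot loop never touches the database directly.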
Sequence: Download -> Ingest -> Backfill
sequenceDiagram
participant CLI as law_graph.cli
participant DL as ItalyDownloader
participant DDB as download_manifest.db
participant FS as exports/normattiva/ita
participant IR as run_ingest
participant IDB as ingest_manifest.db
participant OUT as exports/processed/ita
participant BF as backfill ita
CLI->>DL: download ita
DL->>FS: write AKN/JSON/NIR artifacts
DL->>DDB: record files/events/checkpoints
CLI->>IR: ingest ita
IR->>FS: read downloaded artifacts
IR->>IDB: persist ingest state + canonical rows
IR->>OUT: emit processed/base+delta artifacts
CLI->>BF: backfill ita
BF->>IDB: read missing interval hints
BF->>DL: request targeted exports
DL->>FS: write backfilled artifacts
DL->>DDB: record additional download inventory/events
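
As an illustration of how recording state in a manifest lets an interrupted step in this sequence resume deterministically, the sketch below tracks processed artifacts in a SQLite ledger. The `processed_files` table, `open_ledger`, and `process_pending` are hypothetical and are not the schema or API of `download_manifest.db` or `ingest_manifest.db`; the per-file commit granularity is also an assumption.

```python
import sqlite3
from typing import Callable, Iterable, List


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a minimal, hypothetical ledger of processed artifacts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_files (path TEXT PRIMARY KEY)"
    )
    conn.commit()
    return conn


def process_pending(
    conn: sqlite3.Connection,
    paths: Iterable[str],
    handle: Callable[[str], None],
) -> List[str]:
    """Handle every path not yet in the ledger, in a deterministic order.

    A path is recorded only after `handle` succeeds, so a crash leaves the
    ledger pointing at the exact place where the next run should resume.
    """
    handled: List[str] = []
    for path in sorted(paths):  # stable order regardless of input order
        seen = conn.execute(
            "SELECT 1 FROM processed_files WHERE path = ?", (path,)
        ).fetchone()
        if seen:
            continue
        handle(path)
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO processed_files (path) VALUES (?)", (path,)
            )
        handled.append(path)
    return handled
```

Running `process_pending` again with the same inputs after a failure picks up only the artifacts that were not recorded, which is the behavior the incremental-first principle asks for.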