Project Structure

Icepack is organized as a layered Python application. This page maps every top-level module in the icepack/ package and explains how the layers fit together.

Module map

Module	Responsibility
`api.py`	FastAPI app — all HTTP endpoints + lifespan.
`cli.py`	Standalone `icepack` CLI commands; bypasses the API queue.
`worker.py`	KEDA-invoked maintenance worker (one job per pod).
`jobs.py`	`JobStore` — Postgres-backed queue + state (DL-197 fence).
`locks.py`	`TableLock` — per-table ownership-checked lock.
`table_cache.py`	`TableCache` + `TableCacheSyncWorker` (atomic-swap refresh).
`history.py`	`HistoryStore` — schema management + persistent reads.
`backend.py`	`select_job_store` / `select_table_cache` factory.
`metrics.py`	OTel/Prometheus gauges (queue depth, workers, etc.).
`config.py`	Pydantic `CompactionConfig` — env-driven configuration.
`discovery.py`	`PolarisDiscovery` — PyIceberg catalog table listing (Polaris REST or Glue fallback).
`catalog.py`	PyIceberg catalog factories (Polaris + Glue).
`status.py`	Maps collected metadata metrics into the raw table status contract.
`health.py`	Derives policy-independent health status and issues from table status.
`recommendation.py`	Derives policy-aware maintenance recommendations from status, table metadata, and history.
`inspector.py`	Metadata inspector abstraction — PyIceberg chart/config default or configured iceberg-go helper.
`maintenance.py`	`MaintenanceRunner` — executes one action via Spark.
`spark.py`	`SparkQueryEngine` — Thrift/Kyuubi wrapper.
`service.py`	`CompactionService` — request-scoped service composition.
`orchestrator.py`	Auto-submit maintenance based on API recommendations.
`health_sync.py`	Shared health precomputation worker using the configured metadata inspector.
`health_sync_job.py`	CronJob entrypoint for one health-sync cycle.

Layered architecture

Icepack follows a strict layered architecture. Each layer only calls into the layer directly below it, keeping concerns separated and individual components testable in isolation.

Entry points

There are three ways to interact with Icepack:

CLI (cli.py) — Click commands for ad-hoc operations. The CLI talks directly to the service layer; it does not go through the API.
REST API (api.py) — FastAPI endpoints for programmatic access and the orchestrator. All job submission and health queries flow through here.
Web UI — An Alpine.js single-page application served by the API. The UI is a pure API consumer and introduces no additional backend logic.

Service layer

The CLI uses CompactionService (service.py) directly. The API uses the same lower-level discovery/inspector/runner components where needed, but job submission itself is a Postgres queue operation. CompactionService composes:

Discovery (discovery.py) — lists tables from the configured PyIceberg catalog (Polaris REST when configured, Glue fallback otherwise).
Metadata inspector (inspector.py) — loads Iceberg table metadata through the configured backend: PyIceberg by chart/config default, or the bundled iceberg-go helper in named deployed environment values.
MaintenanceRunner (maintenance.py) — executes a single maintenance action (compaction, snapshot expiry, etc.) by issuing Spark SQL.

Status, health, and recommendation are now separate contracts. status.py builds raw observed state, health.py derives issues from that state, and recommendation.py combines status with policy/history to produce action intent.

Query engine

At the bottom of the stack sits the QueryEngine protocol (spark.py). This is a minimal, backend-agnostic interface that the service layer uses to run SQL. The concrete implementation, SparkQueryEngine, wraps a PyHive/Thrift connection to Spark Thrift Server (or Kyuubi).

Data flow summary

CLI (Click)              REST API / Web UI
     |                         |
CompactionService        Postgres queue/cache + service components
     |                         |
Discovery / Inspector / Status / Health / Recommendation / MaintenanceRunner
(PyIceberg catalog)       (metadata inspector + Spark SQL)
     |                         |
QueryEngine Protocol -> SparkQueryEngine (PyHive / Thrift)

State management — job queue, table locks, table inventory cache, cached status snapshots, cached health assessments, and run history — is handled entirely through Postgres via the jobs.py, locks.py, table_cache.py, and history.py modules.