
System Overview

Icepack is a control plane for Apache Iceberg table maintenance. It discovers tables through a PyIceberg catalog, inspects Iceberg metadata, derives table health, and runs maintenance operations (rewrite, compact, expire, cleanup) through Spark via Kyuubi. For metadata inspection, the base chart and app config default to PyIceberg, while the named environments' values files (values-{env}.yaml) switch to the bundled iceberg-go helper. All state lives in Postgres: there is no Redis, no in-memory queue, and no sidecar cache.

Architecture diagram

[Figure: Icepack architecture]

High-level topology

```mermaid
flowchart TD
    UI["Web UI<br/>/ui Alpine.js SPA"] -->|HTTPS| API["FastAPI API<br/>icepack-api, N=2<br/>/readyz /healthz"]
    API -->|SQL| PG["Postgres<br/>jobs, job_queue, table_locks,<br/>job_actions, table_cache_entries,<br/>status + health snapshots"]
    API -->|"TableCacheSyncWorker<br/>table list"| Polaris["Polaris REST<br/>Catalog"]
    API -->|"live status/health<br/>metadata"| Glue["AWS Glue + S3<br/>metadata inspector"]

    PG -->|"polled every 30s<br/>by KEDA postgresql"| KEDA["KEDA ScaledJob<br/>max replicas 5 (default) / 3 (dev, stage, preprod)"]
    KEDA -->|spawns| Worker["Worker Pod<br/>one job per pod"]
    Worker -->|"load table metadata"| Glue
    Worker -->|"Spark SQL<br/>actions"| Spark["Spark / Kyuubi<br/>Thrift JDBC"]
    Spark --> Iceberg["Apache Iceberg<br/>tables on S3"]

    Polaris -.->|"table inventory"| Iceberg
    Glue -.->|metadata files| Iceberg

    Worker -->|"job<br/>history"| PG

    Orchestrator["Orchestrator CronJob<br/>default: every 2h<br/>dev/stage/preprod: hourly at :30"] -->|calls API| API
    HealthSync["Health Sync CronJob<br/>every 15m"] -->|"discover + analyze"| Glue
    HealthSync -->|"writes status + health<br/>snapshots"| PG
    HealthSync -->|"optional OTLP"| Mimir["Mimir / Prometheus"]
```

The standalone CLI (icepack ...) bypasses the API queue and runs directly against the configured PyIceberg catalog and Spark / Kyuubi endpoint.

orchestrator.schedule controls how often the orchestrator CronJob runs and checks tables; orchestrator.cadenceHours separately throttles how often any single table may be maintained. Current Helm values run the default chart every two hours (0 */2 * * *) and the dev/stage/preprod environments hourly at :30 (30 * * * *); all of them keep the per-table cadence at 24 hours.
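To make the split between the two settings concrete, here is a minimal sketch of a per-table cadence gate. It assumes the orchestrator knows when each table was last maintained; is_table_due and its parameters are hypothetical names, not Icepack's actual code. The CronJob schedule only decides how often a check like this runs.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical constant mirroring orchestrator.cadenceHours in the current Helm values.
CADENCE_HOURS = 24


def is_table_due(last_maintained_at: Optional[datetime], now: datetime,
                 cadence_hours: int = CADENCE_HOURS) -> bool:
    """Return True if a table may be maintained again on this CronJob run."""
    if last_maintained_at is None:
        # Never maintained: eligible on the first run that sees it.
        return True
    return now - last_maintained_at >= timedelta(hours=cadence_hours)


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 0, 30, tzinfo=timezone.utc)
    # Maintained 23h ago: this run skips the table ...
    print(is_table_due(now - timedelta(hours=23), now))  # False
    # ... and a run 24h after the last maintenance picks it up.
    print(is_table_due(now - timedelta(hours=24), now))  # True
```

With a 24-hour cadence, a table that becomes eligible between runs is picked up by the next run, so (ignoring any other gating, such as the recommendation endpoint) it waits at most one schedule interval: up to two hours on the default chart, up to one hour in dev/stage/preprod.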

Health sync runs exclusively as a CronJob (health-sync-cronjob.yaml, every 15 minutes). The API process does not start an in-process health-sync worker.

Component inventory

Application (Python, icepack/)

| Module | Responsibility |
|---|---|
| api.py | FastAPI app: all HTTP endpoints + lifespan |
| cli.py | Standalone icepack CLI commands; bypasses the API queue |
| worker.py | KEDA-invoked maintenance worker (one job per pod) |
| jobs.py | JobStore: Postgres-backed queue + state (DL-197 fence) |
| locks.py | TableLock: per-table, ownership-checked lock |
| table_cache.py | TableCache + TableCacheSyncWorker (atomic-swap refresh) |
| history.py | HistoryStore: schema management + persistent reads |
| backend.py | select_job_store / select_table_cache factory |
| metrics.py | OTel/Prometheus gauges (queue depth, workers, etc.) |
| config.py | Pydantic CompactionConfig, env-driven |
| discovery.py | PolarisDiscovery: PyIceberg catalog table listing (Polaris REST or Glue fallback) |
| catalog.py | PyIceberg catalog factories (Polaris + Glue) |
| status.py | Maps collected metadata metrics into the raw table status contract |
| health.py | Derives policy-independent health status and issues from table status |
| recommendation.py | Derives policy-aware maintenance action intent from status, table metadata, and history |
| inspector.py | Metadata inspector abstraction: PyIceberg (chart/config default) or the configured iceberg-go helper |
| maintenance.py | MaintenanceRunner: executes one action via Spark |
| action_impact.py | Derives nullable per-action impact summaries from procedure output or before/after metadata metrics |
| spark_sql_overrides.py | Resolves allowlisted icepack.spark.sql.* table properties into session-scoped Spark SQL SET statements |
| spark.py | SparkQueryEngine: Thrift/Kyuubi wrapper |
| service.py | CompactionService: request-scoped service composition |
| orchestrator.py | Auto-submits maintenance based on API recommendations |
| health_sync.py | Shared health precomputation worker using the configured metadata inspector |
| health_sync_job.py | CronJob entrypoint for one health-sync cycle |
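A per-table, ownership-checked lock like the one locks.py provides can be sketched as follows. This is a minimal illustration under stated assumptions, not the locks.py implementation: sqlite3 stands in for Postgres so the example runs anywhere, and the table_locks columns are hypothetical. In Postgres the acquire step would more typically be INSERT ... ON CONFLICT DO NOTHING; the essential property is that release deletes the row only when the caller's owner id still matches.

```python
import sqlite3

# Sketch only: sqlite3 stands in for Postgres, and the column names are
# hypothetical; the real table_locks schema is not documented here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS table_locks (
    table_name TEXT PRIMARY KEY,  -- one row (one holder) per table
    owner      TEXT NOT NULL      -- id of the job or worker holding the lock
)
"""


class TableLock:
    """Per-table lock that can only be released by the owner that took it."""

    def __init__(self, conn: sqlite3.Connection, table_name: str, owner: str) -> None:
        self.conn = conn
        self.table_name = table_name
        self.owner = owner

    def acquire(self) -> bool:
        """Return True if this owner now holds the lock."""
        try:
            with self.conn:  # commit on success, roll back on error
                self.conn.execute(
                    "INSERT INTO table_locks (table_name, owner) VALUES (?, ?)",
                    (self.table_name, self.owner),
                )
            return True
        except sqlite3.IntegrityError:
            # Primary-key conflict: another owner already holds this table's lock.
            return False

    def release(self) -> bool:
        """Delete the lock row only if this owner still holds it."""
        with self.conn:
            cur = self.conn.execute(
                "DELETE FROM table_locks WHERE table_name = ? AND owner = ?",
                (self.table_name, self.owner),
            )
        return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    a = TableLock(conn, "analytics.orders", "job-a")
    b = TableLock(conn, "analytics.orders", "job-b")
    print(a.acquire(), b.acquire())  # True False
    print(b.release(), a.release())  # False True
```

The ownership check matters when a lock outlives its expected holder: a late or confused worker cannot release a lock that another job has since taken.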

Infrastructure (Helm chart, charts/icepack/)

| Template | Resource | Notes |
|---|---|---|
| api-deployment.yaml | API pods (N=2) | Startup/Liveness/Readiness probes with 5s timeout |
| api-service.yaml | LoadBalancer -> API | NLB with ACM-terminated TLS |
| worker-scaledjob.yaml | KEDA ScaledJob | postgresql trigger, 30s polling, max 5 replicas by default / 3 in dev, stage, preprod |
| keda-postgres-auth.yaml | TriggerAuthentication | References the postgres secret |
| postgres-secret.yaml | Secret | host/user/password (materialized from values) |
| postgres-deployment.yaml + postgres-pvc.yaml + postgres-nlb.yaml | Internal Postgres | Used when postgres.internal.enabled=true |
| orchestrator-cronjob.yaml | CronJob | Default every 2h; dev/stage/preprod hourly at :30; calls API to submit jobs |
| health-sync-cronjob.yaml | CronJob (every 15m) | Writes status snapshots and health assessments; concurrency 2 in dev/stage/preprod |
| irsa.yaml | ServiceAccount + IRSA | IAM role for S3 / Glue access |
| _helpers.tpl | Helpers | icepack.postgresDatabase, icepack.postgresSslMode |
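The worker ScaledJob's replica cap interacts with queue depth roughly as below. This sketch approximates KEDA's documented default ScaledJob scaling strategy (scale to the queue length divided by the target, capped at maxReplicaCount, minus Jobs already running). The target of one queued job per pod is our assumption, suggested by "one job per pod"; the chart's actual trigger query and target value are not listed here.

```python
import math


def jobs_to_create(queue_length: int, running_jobs: int,
                   max_replica_count: int, target_per_job: int = 1) -> int:
    """Approximate how many new worker Jobs one KEDA polling cycle would start.

    Mirrors KEDA's default ScaledJob strategy:
      max_scale = min(maxReplicaCount, ceil(queue_length / target))
      new_jobs  = max_scale - running_jobs
    target_per_job=1 is an assumption, not the chart's documented value.
    """
    if queue_length <= 0:
        return 0
    max_scale = min(max_replica_count, math.ceil(queue_length / target_per_job))
    return max(0, max_scale - running_jobs)


if __name__ == "__main__":
    # Default chart (max 5): a burst of 12 queued jobs starts 5 pods at once.
    print(jobs_to_create(queue_length=12, running_jobs=0, max_replica_count=5))  # 5
    # dev/stage/preprod (max 3) with 3 pods already running: nothing new this cycle.
    print(jobs_to_create(queue_length=12, running_jobs=3, max_replica_count=3))  # 0
```

Because running Jobs are subtracted, a long maintenance action keeps occupying one of the replica slots until its pod finishes, and the backlog drains at most maxReplicaCount jobs at a time.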

Provisioning (Terraform, terraform/icepack-api/)

Each environment has its own Terraform root under terraform/icepack-api/env/{dev,stage,preprod}/main.tf. Each root provisions what the chart alone cannot: the AWS IAM role (IRSA), the Secrets Manager entry for the Postgres password, and the helm_release resource that applies charts/icepack with the corresponding values-{env}.yaml. A change to the chart version or to the values is what triggers Terraform to re-apply the release.

Authentication and secrets

Two secrets are managed in AWS Secrets Manager and injected into pods through Helm set_sensitive blocks; neither is hardcoded in values files:

| Secret | Secrets Manager ID | Consumer |
|---|---|---|
| Polaris service principal | {env}/polaris/icepack-principal (JSON: client_id, client_secret) | API table-cache discovery; worker pods may receive the env vars, but maintenance currently loads metadata through Glue/S3 |
| Internal Postgres password | generated by Terraform (random_password) | API, workers, health-sync, and KEDA trigger authentication |

Polaris OAuth2 flow

Icepack uses PyIceberg’s REST catalog with OAuth2 client credentials — credential = "{client_id}:{client_secret}" and scope = PRINCIPAL_ROLE:ALL. PyIceberg handles the token exchange and caching. The PolarisConfig validator rejects half-configured deploys: if uri is set, both credential fields must also be set.
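The validator rule can be illustrated with a plain dataclass standing in for the Pydantic PolarisConfig. The field names are hypothetical; only the rule itself (a set uri requires both credential halves) and the client_id:client_secret credential format come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolarisSettings:
    """Hypothetical stand-in for the Pydantic PolarisConfig (field names assumed)."""

    uri: Optional[str] = None
    client_id: Optional[str] = None
    client_secret: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject half-configured deploys: a uri without both credential halves.
        if self.uri and not (self.client_id and self.client_secret):
            raise ValueError("Polaris uri is set but client_id/client_secret are incomplete")

    @property
    def credential(self) -> str:
        """PyIceberg REST 'credential' value for the OAuth2 client-credentials flow."""
        return f"{self.client_id}:{self.client_secret}"


if __name__ == "__main__":
    ok = PolarisSettings("https://polaris.example.com/api/catalog", "icepack", "s3cr3t")
    print(ok.credential)  # icepack:s3cr3t
    try:
        PolarisSettings(uri="https://polaris.example.com/api/catalog", client_id="icepack")
    except ValueError as exc:
        print("rejected:", exc)
```

An empty configuration (no uri at all) stays valid, which is what allows the Glue fallback.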

Injection path: Secrets Manager -> Terraform data.aws_secretsmanager_secret_version -> helm_release.set_sensitive -> ICEPACK_POLARIS_* env vars on pods -> Pydantic PolarisConfig. When polaris.uri is empty, create_iceberg_catalog falls back to Glue.
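The Polaris-or-Glue fallback can be sketched as a property selector feeding PyIceberg's load_catalog. catalog_properties and create_catalog are hypothetical names, not create_iceberg_catalog itself, and a real Polaris deployment may need additional catalog properties (for example a warehouse name) that are not shown.

```python
from typing import Optional


def catalog_properties(polaris_uri: Optional[str], credential: Optional[str] = None) -> dict:
    """Hypothetical selector: Polaris REST when a uri is set, otherwise AWS Glue."""
    if polaris_uri:
        return {
            "type": "rest",
            "uri": polaris_uri,
            "credential": credential,  # "client_id:client_secret"
            "scope": "PRINCIPAL_ROLE:ALL",
        }
    # No Polaris uri: fall back to Glue, which authenticates through IRSA.
    return {"type": "glue"}


def create_catalog(name: str, props: dict):
    """Build a PyIceberg catalog; imported lazily so the selector runs without PyIceberg."""
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **props)
```

For example, create_catalog("icepack", catalog_properties(uri, credential)) returns a PyIceberg Catalog. The REST branch typically contacts Polaris while the catalog is constructed (token exchange and config fetch), so it needs network access to the Polaris endpoint.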

IRSA for Glue/S3

AWS access (Glue, S3) uses IRSA: the service account assumes aws_iam_role.icepack through the cluster's OIDC provider, so pods carry no static AWS credentials.
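On EKS, the pod identity webhook injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into pods whose service account is annotated with an IAM role, and AWS SDKs such as boto3 resolve credentials from them through the web-identity provider. Because boto3 checks environment-variable keys earlier in its credential chain, a stray AWS_ACCESS_KEY_ID would silently take precedence over IRSA. The check below is a hypothetical sketch, not a documented Icepack function; it flags both conditions.

```python
import os
from typing import List, Mapping, Optional

# Variables the EKS pod identity webhook injects for IRSA.
IRSA_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")
# Static keys that boto3 would pick up before web identity.
STATIC_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def irsa_problems(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return problems with the pod's AWS credential setup; empty means IRSA-only."""
    env = os.environ if env is None else env
    problems = [f"missing {name}" for name in IRSA_VARS if not env.get(name)]
    problems += [f"static credential {name} is set" for name in STATIC_VARS if env.get(name)]
    return problems


if __name__ == "__main__":
    for problem in irsa_problems():
        print("IRSA check:", problem)
```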

Current catalog/metadata split

  • TableCacheSyncWorker calls create_iceberg_catalog; in dev this uses Polaris REST because Terraform sets polaris.uri and credentials on API pods.
  • /tables/{db}/{table}/status and /tables/{db}/{table}/health inspect Iceberg metadata through Glue/S3 using the configured metadata inspector. The base chart and app config default to ICEPACK_METADATA_INSPECTOR=pyiceberg; the named environments' values files set ICEPACK_METADATA_INSPECTOR=iceberg-go, which shells out to the bundled helper for metadata reads.
  • /tables/{db}/{table}/maintenance/recommendation consumes status plus table policy/history to decide action intent. The orchestrator calls this endpoint instead of parsing /health.
  • health-sync-cronjob.yaml currently passes Glue settings but not Polaris settings, so the CronJob discovers and analyzes through Glue. The shared HealthSyncWorker.run_once can use Polaris for discovery if those env vars are supplied.
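Selecting the inspector from ICEPACK_METADATA_INSPECTOR can be sketched as a small factory. The two accepted values and the pyiceberg default come from the text above; the enum, the function name, and the strict rejection of unknown values are assumptions. The iceberg-go branch would shell out to the bundled helper, whose command-line interface is not documented here, so the sketch stops at choosing the backend.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class InspectorKind(str, Enum):
    PYICEBERG = "pyiceberg"    # in-process PyIceberg reads (chart/config default)
    ICEBERG_GO = "iceberg-go"  # shells out to the bundled icepack-iceberg-inspector helper


def select_inspector(env: Optional[Mapping[str, str]] = None) -> InspectorKind:
    """Resolve ICEPACK_METADATA_INSPECTOR, defaulting to PyIceberg when unset or empty."""
    env = os.environ if env is None else env
    raw = env.get("ICEPACK_METADATA_INSPECTOR") or InspectorKind.PYICEBERG.value
    try:
        return InspectorKind(raw)
    except ValueError:
        # Assumption: unknown values fail fast instead of silently falling back.
        raise ValueError(f"unsupported ICEPACK_METADATA_INSPECTOR={raw!r}") from None
```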

For Iceberg v3 tables with schema features not yet supported by PyIceberg, Icepack can use the bundled icepack-iceberg-inspector helper. The helper loads Iceberg metadata through Glue/S3 and uses iceberg-go to stream the manifest entries of the current snapshot. Active table health metrics (live data files, live delete files, live data-file size, small-file count, and manifest count) describe the current snapshot only. Snapshot count and oldest snapshot age are retention/history metrics computed across all retained snapshots.
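The split between current-snapshot and retention metrics can be sketched over simplified manifest entries. This is an illustration, not the helper's code: each entry is reduced to content type, status, and size (mirroring Iceberg's ADDED/EXISTING/DELETED entry status), "live" means not DELETED, and small_file_threshold_bytes is a hypothetical parameter because the text states neither the threshold nor whether delete files count toward it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ManifestEntry:
    content: str          # "data", "position_deletes", or "equality_deletes"
    status: str           # "added", "existing", or "deleted" (Iceberg entry status)
    file_size_bytes: int


def current_snapshot_metrics(entries: List[ManifestEntry], manifest_count: int,
                             small_file_threshold_bytes: int) -> Dict[str, int]:
    """Active-health metrics from the current snapshot's manifest entries."""
    live = [e for e in entries if e.status != "deleted"]  # DELETED entries are not live
    data = [e for e in live if e.content == "data"]
    deletes = [e for e in live if e.content != "data"]
    return {
        "live_data_files": len(data),
        "live_delete_files": len(deletes),
        "live_data_size_bytes": sum(e.file_size_bytes for e in data),
        # Assumption: only data files count toward the small-file metric.
        "small_file_count": sum(1 for e in data if e.file_size_bytes < small_file_threshold_bytes),
        "manifest_count": manifest_count,
    }


def retention_metrics(snapshot_timestamps_ms: List[int], now_ms: int) -> Dict[str, Optional[int]]:
    """History metrics across all retained snapshots, not just the current one."""
    if not snapshot_timestamps_ms:
        return {"snapshot_count": 0, "oldest_snapshot_age_ms": None}
    return {
        "snapshot_count": len(snapshot_timestamps_ms),
        "oldest_snapshot_age_ms": now_ms - min(snapshot_timestamps_ms),
    }
```

Keeping the two functions separate reflects the distinction drawn above: expiring old snapshots changes the retention metrics but leaves the current-snapshot health metrics untouched.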