Key Concepts

Maintenance actions

Icepack supports five maintenance actions, always executed in this order:

expire_snapshots — Removes snapshots older than the retention threshold, freeing the metadata layer from tracking stale table states.
remove_orphan_files — Deletes data files on storage that are no longer referenced by any active snapshot.
rewrite_data_files — Compacts active small data files into fewer, optimally sized files. This is the most impactful action for query performance.
rewrite_position_delete_files — Merges remaining position-delete files back into their corresponding data files, eliminating the read-time overhead of applying deletes.
rewrite_manifests — Consolidates manifest files to reduce planning time for queries that scan large tables.

The ordering matters: snapshot expiration first dereferences stale files, orphan cleanup then removes those physical files, compaction rewrites only the active file set, position-delete rewriting handles the delete files that remain after compaction, and manifest rewriting runs last against the final file layout.
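The fixed ordering can be illustrated with a minimal sketch. The action names come from the list above; the `order_actions` helper is hypothetical and is not part of Icepack's API. It only shows how a subset of requested actions could be arranged in the canonical order.

```python
# Canonical execution order, taken from the action list above.
ACTION_ORDER = [
    "expire_snapshots",
    "remove_orphan_files",
    "rewrite_data_files",
    "rewrite_position_delete_files",
    "rewrite_manifests",
]


def order_actions(requested):
    """Return the requested actions arranged in the canonical order.

    Hypothetical helper for illustration only.
    """
    wanted = set(requested)
    unknown = wanted - set(ACTION_ORDER)
    if unknown:
        raise ValueError(f"unknown maintenance actions: {sorted(unknown)}")
    # Walk the canonical list so the result order never depends on input order.
    return [action for action in ACTION_ORDER if action in wanted]
```

For example, requesting `["rewrite_manifests", "expire_snapshots"]` would yield snapshot expiration first and manifest rewriting last.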

Health analysis

Icepack evaluates table health by inspecting Iceberg metadata for four key metrics:

Small file count — Number of data files below the target file size.
Snapshot count — Total snapshots retained by the table.
Manifest count — Number of manifest files in the current metadata.
Position delete files — Count of outstanding position-delete files.

Each metric is compared against a configurable health threshold. The /health endpoint returns health_status plus issue details, while /maintenance/recommendation decides whether Icepack should run actions under the configured maintenance policy. Status and health data are available in two flavors:

Live — Fetched through the configured metadata inspector. Accurate but takes a few seconds per table.
Cached — Served from Postgres. Returns in roughly 1 ms, refreshed periodically by the health-sync process.
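The threshold comparison described in this section can be sketched as follows. This is an assumption-laden illustration, not Icepack's implementation: the threshold values, the `TableMetrics` and `evaluate_health` names, and the "healthy"/"unhealthy" status strings are all hypothetical. Only the four metrics and the `health_status` field name come from the text above.

```python
from dataclasses import dataclass


@dataclass
class TableMetrics:
    """The four metrics inspected during health analysis."""
    small_file_count: int
    snapshot_count: int
    manifest_count: int
    position_delete_files: int


# Hypothetical threshold values; the real ones are configurable.
DEFAULT_THRESHOLDS = {
    "small_file_count": 100,
    "snapshot_count": 50,
    "manifest_count": 200,
    "position_delete_files": 10,
}


def evaluate_health(metrics, thresholds=DEFAULT_THRESHOLDS):
    """Compare each metric against its threshold and collect issues."""
    issues = [
        f"{name}={getattr(metrics, name)} exceeds threshold {limit}"
        for name, limit in thresholds.items()
        if getattr(metrics, name) > limit
    ]
    # Status strings are assumed for illustration.
    status = "unhealthy" if issues else "healthy"
    return {"health_status": status, "issues": issues}
```

A table with 500 small files and otherwise low counts would report a single issue for `small_file_count` under these assumed thresholds.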

Opt-out model

Icepack maintains tables by default. Any Iceberg table in a database that the platform team has added to the orchestrator allowlist is eligible for automated maintenance — no per-table action is required to enroll.

A table is maintained unless one of these opts it out:

Explicit opt-out. A data engineer sets the table property to false:
```
ALTER TABLE lakehouse_dev.my_database.my_table
  SET TBLPROPERTIES ('icepack.maintenance_enabled' = 'false');
```
The icepack.maintenance_enabled property is three-state: true (always maintained), false (never maintained), and unset (maintained in opt-out mode).
Hard exclude. Setting compaction_skip = 'true' removes the table from all Icepack maintenance regardless of mode — use it for tables undergoing migration or manual intervention.

The platform team still controls which databases are in scope via the databases allowlist in the Helm values. A table is maintained only if its database is allowlisted and it has not opted out.
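The eligibility rules in this section can be summarized in a short sketch, assuming opt-out mode. The `is_maintained` function and its parameters are hypothetical. The property names `icepack.maintenance_enabled` and `compaction_skip` come from the text above, and treating any value other than the literal string 'false' as "not opted out" is an assumption.

```python
def is_maintained(database, table_properties, allowlist):
    """Decide whether a table is eligible for maintenance in opt-out mode.

    Hypothetical helper: `table_properties` is a dict of table property
    names to string values, `allowlist` is a set of database names.
    """
    # The platform team's database allowlist is checked first.
    if database not in allowlist:
        return False
    # Hard exclude removes the table from all maintenance.
    if table_properties.get("compaction_skip") == "true":
        return False
    # Three-state property: only an explicit 'false' opts the table out;
    # 'true' and unset are both maintained in opt-out mode.
    return table_properties.get("icepack.maintenance_enabled") != "false"
```

Checking the hard exclude before the three-state property reflects the statement that `compaction_skip` applies regardless of mode, so it wins even when `icepack.maintenance_enabled` is 'true'.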

Job lifecycle

All maintenance operations in Icepack are asynchronous. A job moves through these states:

Pending — The job has been accepted and queued for execution.
Running — Spark is actively executing the maintenance actions.
Completed — All requested actions finished successfully.
Failed — One or more actions encountered an error. Partial results may exist.
Cancelled — The job was cancelled before completion.

The typical flow: submit a maintenance request via POST and receive a 202 Accepted response with a job ID. Then poll GET /jobs/{id} to track progress. The orchestrator CronJob follows this same lifecycle automatically for all eligible tables.
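The submit-then-poll flow can be sketched as a small polling loop. This is a minimal illustration under stated assumptions: `wait_for_job` is a hypothetical helper, `fetch_status` is a caller-supplied function that wraps GET /jobs/{id} and returns the job state, and the lowercase state strings are assumed rather than taken from the API's response format.

```python
import time

# Terminal states from the lifecycle above; lowercase spelling is assumed.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(fetch_status, job_id, interval_s=5.0, timeout_s=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll a job until it reaches a terminal state or the timeout passes.

    `fetch_status(job_id)` is supplied by the caller and is expected to
    return the current state as a string, e.g. by calling GET /jobs/{id}.
    """
    deadline = clock() + timeout_s
    while True:
        state = fetch_status(job_id)
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {state!r} after {timeout_s}s")
        # Pending or running: wait before the next poll.
        sleep(interval_s)
```

Injecting `fetch_status`, `sleep`, and `clock` keeps the loop independent of any particular HTTP client and makes it straightforward to exercise without a running service.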