
Submitting Maintenance

While the orchestrator CronJob handles maintenance automatically, you can also submit, monitor, and cancel jobs manually through the REST API. This is useful for one-off maintenance, testing new tables, or running a dry-run preview before enabling automation.

Valid actions

Icepack supports five maintenance actions, listed here in recommended execution order:

| Action | What it does |
| --- | --- |
| expire_snapshots | Removes snapshots older than the retention threshold, dereferencing stale files before physical cleanup. |
| remove_orphan_files | Deletes files on S3 that are no longer referenced by any active snapshot before compaction starts. |
| rewrite_data_files | Compacts active small data files and applies pending row-level deletes into new, optimally sized files. |
| rewrite_position_delete_files | Compacts remaining position-delete files that were not absorbed by the data-file rewrite. |
| rewrite_manifests | Consolidates manifest files after compaction produces the final file layout. |

You can submit any combination of these actions in a single request. They execute in the order listed above regardless of the order in the request body.
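The ordering rule can be mirrored on the client, for example to check a request before sending it and to predict the order in which the actions will run. The following is a minimal sketch: `normalize_actions` is a hypothetical helper, not part of Icepack, and the server's own validation remains authoritative. Collapsing duplicate entries is a choice made here; how the API treats duplicates is not documented.

```python
# Execution order as listed in the actions table above.
CANONICAL_ORDER = [
    "expire_snapshots",
    "remove_orphan_files",
    "rewrite_data_files",
    "rewrite_position_delete_files",
    "rewrite_manifests",
]


def normalize_actions(requested):
    """Validate action names and return them in canonical execution order.

    Hypothetical client-side helper; the API decides what is actually accepted.
    """
    if not requested:
        raise ValueError("actions list must not be empty")
    unknown = [a for a in requested if a not in CANONICAL_ORDER]
    if unknown:
        raise ValueError(f"unknown actions: {unknown}")
    # Assumption: duplicates are collapsed on the client side.
    wanted = set(requested)
    return [a for a in CANONICAL_ORDER if a in wanted]
```

Calling `normalize_actions(["rewrite_manifests", "expire_snapshots"])` returns the two names with `expire_snapshots` first, matching the order in which the job would execute them.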

To ask Icepack which actions it would run for a table under current policy, call:

curl -s https://<icepack-host>/tables/offer_service/offers/maintenance/recommendation | jq .

The recommendation endpoint is advisory. It returns policy, cadence/history, skip reasons, and action-specific evidence. Submitting maintenance still requires an explicit POST /tables/{database}/{table}/maintenance request.

Submit a job

POST /tables/{database}/{table}/maintenance

Send a JSON body with the list of actions to perform:

curl -s -X POST https://<icepack-host>/tables/offer_service/offers/maintenance \
  -H "Content-Type: application/json" \
  -d '{"actions": ["expire_snapshots", "remove_orphan_files", "rewrite_data_files"]}' \
  -D -

A successful submission returns 202 Accepted with two important headers:

  • Location: /jobs/{job_id} — The URL to poll for job status.
  • Retry-After: 30 — Suggested polling interval in seconds.

The response body contains the full job object with status: "pending":

{
  "job_id": "a1b2c3d4e5f6...",
  "database": "offer_service",
  "table_name": "offers",
  "actions": ["expire_snapshots", "remove_orphan_files", "rewrite_data_files"],
  "dry_run": false,
  "status": "pending",
  "submitted_at": "2026-04-25T14:30:00+00:00",
  "started_at": null,
  "completed_at": null,
  "results": null,
  "error": null
}
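The same submission can be scripted. Below is a minimal sketch using only the Python standard library; the function names and the `base_url` parameter are illustrative assumptions, and the request shape is taken from the curl example above. Building the request is kept separate from sending it so the request can be inspected without touching the network.

```python
import json
import urllib.request


def build_submit_request(base_url, database, table, actions, dry_run=False):
    """Build (but do not send) the POST request for a maintenance job."""
    payload = {"actions": list(actions)}
    if dry_run:
        # Only sent when requested, mirroring the curl examples.
        payload["dry_run"] = True
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/tables/{database}/{table}/maintenance",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(base_url, database, table, actions, dry_run=False):
    """Send the request and return (job, poll_path, retry_after_seconds)."""
    req = build_submit_request(base_url, database, table, actions, dry_run)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(req) as resp:
        job = json.load(resp)
        poll_path = resp.headers.get("Location")
        # Assumption: fall back to 30 seconds if the header is absent.
        retry_after = int(resp.headers.get("Retry-After", "30"))
    return job, poll_path, retry_after
```

A caller would typically keep `poll_path` and `retry_after` from the returned tuple and use them to drive status polling.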

Error responses on submit

| Status | Meaning |
| --- | --- |
| 400 | Invalid or empty actions list. |
| 404 | Table not found in the table cache. |
| 409 Conflict | Another maintenance job is already running or pending for this table. Each table allows only one active job at a time. Wait for the existing job to finish or cancel it first. |
| 422 | Invalid database or table name format. |
| 503 | Drain mode is enabled. The API is intentionally rejecting new jobs during a backend migration or maintenance window. |
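A client usually wants to turn these statuses into a decision: fix the request, wait, or try again later. The sketch below shows one possible mapping. The category names are illustrative and not Icepack terminology, and the choice of which statuses are worth retrying is an assumption drawn from the meanings listed above.

```python
def classify_submit_error(status_code):
    """Map a submit error status to a coarse client-side action."""
    if status_code in (400, 422):
        return "fix_request"      # bad actions list or bad database/table name
    if status_code == 404:
        return "check_table"      # table is not in the table cache
    if status_code == 409:
        return "wait_or_cancel"   # another job is pending or running
    if status_code == 503:
        return "retry_later"      # drain mode; new jobs are being rejected
    return "unexpected"
```

Only the 409 and 503 cases are plausibly resolved by waiting; the others need a change to the request or the table registration before resubmitting.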

Poll for status

Use the Location header from the submit response to poll for updates:

curl -s https://<icepack-host>/jobs/a1b2c3d4e5f6... | jq .

The status field progresses through these states:

| Status | Meaning |
| --- | --- |
| pending | Job is queued, waiting for a worker to pick it up. |
| running | A worker is actively executing the maintenance actions. |
| completed | All actions finished successfully. |
| failed | One or more actions encountered an error. |
| cancelled | The job was cancelled before completion. |

While the job is pending or running, the response includes a Retry-After: 30 header. Respect this interval to avoid unnecessary load on the API.
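A polling loop that honours the suggested interval might look like the following sketch. `fetch_job` is a hypothetical caller-supplied function that performs the GET on the job URL and returns the parsed job together with the `Retry-After` value in seconds (or `None`); the `max_polls` cap and the 30-second fallback are assumptions added here so the loop cannot run forever.

```python
import time

# States after which the job no longer changes.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(fetch_job, sleep=time.sleep, max_polls=120):
    """Poll until the job reaches a terminal state and return the job object.

    fetch_job: callable returning (job_dict, retry_after_seconds_or_None).
    sleep: injectable so the loop can be exercised without real waiting.
    """
    for _ in range(max_polls):
        job, retry_after = fetch_job()
        if job["status"] in TERMINAL_STATES:
            return job
        # Assumption: fall back to 30 seconds if the header is missing.
        sleep(retry_after if retry_after is not None else 30)
    raise TimeoutError(f"job still not terminal after {max_polls} polls")
```

Passing the sleep function in as a parameter keeps the loop easy to test and lets a caller substitute its own scheduler.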

Read results

Once a job reaches a terminal state (completed, failed, or cancelled), the results array contains one entry per action:

{
  "job_id": "a1b2c3d4e5f6...",
  "status": "completed",
  "results": [
    {
      "action": "expire_snapshots",
      "success": true,
      "message": "",
      "impact": "138 snapshots expired",
      "error": null,
      "elapsed_seconds": 3.2
    },
    {
      "action": "rewrite_data_files",
      "success": true,
      "message": "",
      "impact": "847 -> 52 data files",
      "error": null,
      "elapsed_seconds": 124.8
    }
  ],
  "error": null
}

Each result includes:

  • action — The action that was executed.
  • success — Whether the action completed without error.
  • message — A human-readable summary of what happened.
  • impact — Concise count-oriented summary of what the action changed. This is null when the engine or metadata inspection cannot provide a reliable count.
  • error — null on success; contains the error message on failure.
  • elapsed_seconds — Wall-clock time the action took to execute.

If the job as a whole failed, the top-level error field contains the root cause. Individual action results may still be present for actions that ran before the failure.
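To turn a finished job into a short report, a client can walk the `results` array and the top-level `error` field. The sketch below is one way to do that. `summarize` is a hypothetical helper, and the output format is arbitrary. It allows for `results` being null and for `impact` being null, both of which the job objects shown in this guide permit.

```python
def summarize(job):
    """Return (ok, lines) for a job that has reached a terminal state."""
    results = job.get("results") or []
    lines = []
    for r in results:
        if r["success"]:
            # impact may be null when no reliable count is available.
            detail = r["impact"] or "no count available"
            mark = "ok"
        else:
            detail = r["error"]
            mark = "FAILED"
        lines.append(f"{r['action']}: {mark} ({detail}, {r['elapsed_seconds']}s)")
    if job.get("error"):
        lines.append(f"job error: {job['error']}")
    ok = job["status"] == "completed" and all(r["success"] for r in results)
    return ok, lines
```

For the completed job shown above, this yields `ok` as true and one line per action, such as `expire_snapshots: ok (138 snapshots expired, 3.2s)`.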

Cancel a job

POST /jobs/{job_id}/cancel

Cancel a pending or running job:

curl -s -X POST https://<icepack-host>/jobs/a1b2c3d4e5f6.../cancel | jq .

A successful cancellation returns 200 with the updated job object showing status: "cancelled".

| Status | Meaning |
| --- | --- |
| 200 | Job was cancelled. |
| 404 | Job not found. |
| 409 Conflict | Job is already in a terminal state (completed, failed, or cancelled). Terminal jobs cannot be cancelled. |

For pending jobs, cancellation is immediate — the job is removed from the queue and the table lock is released. For running jobs, the cancel request revokes the worker’s fence token; the worker detects this and stops at the next checkpoint.
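A scripted cancel can treat the 409 case as "nothing left to cancel" rather than as a failure. The following is a minimal sketch with the standard library; the function names and `base_url` are illustrative assumptions, and returning `None` on 409 is a design choice made here, not Icepack behaviour.

```python
import json
import urllib.error
import urllib.request


def is_cancellable(status):
    """Only pending or running jobs can be cancelled."""
    return status in ("pending", "running")


def build_cancel_request(base_url, job_id):
    """Build (but do not send) the POST request that cancels a job."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/jobs/{job_id}/cancel",
        method="POST",
    )


def cancel(base_url, job_id):
    """Return the updated job, or None if the job was already terminal."""
    try:
        with urllib.request.urlopen(build_cancel_request(base_url, job_id)) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as exc:
        if exc.code == 409:
            return None  # already completed, failed, or cancelled
        raise  # 404 and anything unexpected propagate to the caller
```

Because a running job stops only at its next checkpoint, a caller that needs to know when the worker has actually stopped may still want to keep polling the job after `cancel` returns.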

Dry run

To preview what Icepack would do without actually executing any maintenance, submit with "dry_run": true:

curl -s -X POST https://<icepack-host>/tables/offer_service/offers/maintenance \
  -H "Content-Type: application/json" \
  -d '{"actions": ["expire_snapshots", "remove_orphan_files", "rewrite_data_files"], "dry_run": true}' | jq .

Dry-run jobs complete immediately (no worker needed) and return a results array with a preview of each action:

{
  "job_id": "f7e8d9c0b1a2...",
  "status": "completed",
  "dry_run": true,
  "results": [
    {
      "action": "expire_snapshots",
      "success": true,
      "message": "dry run",
      "impact": null,
      "error": null,
      "elapsed_seconds": 0.0
    },
    {
      "action": "rewrite_data_files",
      "success": true,
      "message": "dry run",
      "impact": null,
      "error": null,
      "elapsed_seconds": 0.0
    }
  ]
}

Dry runs do not acquire a table lock, so they never conflict with real jobs. Use them to verify that your request body is valid and the table is visible in the cache before committing to a real submission.
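The "dry run first, then commit" workflow can be expressed as a small wrapper. In this sketch, `submit` is a hypothetical caller-supplied function that takes `(actions, dry_run)` and returns the parsed job object, for example a thin wrapper around the POST shown above. Treating any unsuccessful preview entry as a reason to stop is an assumption about how a cautious client might behave.

```python
def preflight_then_submit(submit, actions):
    """Run a dry run and submit the real job only if the preview is clean.

    submit: callable (actions, dry_run) -> job dict.
    """
    preview = submit(actions, True)
    clean = preview.get("status") == "completed" and all(
        r["success"] for r in preview.get("results") or []
    )
    if not clean:
        raise RuntimeError(f"dry run reported problems: {preview.get('error')}")
    # The dry run took no table lock, so the real submission can still
    # be rejected with 409 if another job is active.
    return submit(actions, False)
```

A rejected dry run (for example a 404 or 422 raised by `submit`) stops the workflow before any real job is queued.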