ADR-0002: Post-Handover Sovereignty Cutover
| Status | Accepted — 2026-05-04 |
| Authors | hatiyildiz, Claude (Opus 4.7) |
| Date | 2026-05-04 |
| Supersedes | — |
| Superseded by | — |
| Related | ADR-0001, #790, #791, #792, #793, #794 |
1. Status
Accepted 2026-05-04. This ADR extends ADR-0001; it does not contradict it. Section 4 of ADR-0001 (component layout) and section 11 of ARCHITECTURE.md (Catalyst-on-Catalyst) are now read in conjunction with this document. The sovereignty cutover described here is the canonical path by which a franchised Sovereign sheds its mothership tether after handover.
2. Context
2.1 The eight tethers
A franchised Sovereign emerging from Phase-1 provisioning is operationally tethered to the OpenOva mothership in eight places. The tether map (audited 2026-05-04 against infra/hetzner/cloudinit-control-plane.tftpl, clusters/_template/bootstrap-kit/*.yaml, products/catalyst/bootstrap/api/internal/..., and the published Catalyst chart values):
| # | Tether | Where | Phase |
|---|---|---|---|
| 1 | Flux GitRepository.url = github.com/openova-io/openova | infra/hetzner/cloudinit-control-plane.tftpl:734 | P0 |
| 2 | containerd registries.yaml rewrites every upstream registry → https://harbor.openova.io (mothership Harbor) | cloudinit-control-plane.tftpl:680-694 | P0 |
| 3 | 38 OCI HelmRepository urls = oci://ghcr.io/openova-io | every clusters/_template/bootstrap-kit/*.yaml | P0 |
| 4 | catalyst-api hardcodes https://github.com/openova-io/openova as env fallback | provisioner.go, marketplace_settings.go | P0 |
| 5 | flux-system/ghcr-pull Secret seeded for private GHCR pulls | cloud-init | P0 |
| 6 | Crossplane provider packages from xpkg.upbound.io | provider package URLs | P1 |
| 7 | Catalyst-authored images = ghcr.io/openova-io/openova/* | products/catalyst/chart/values.yaml + clusters/_template/... | P0 |
| 8 | OS package mirrors during cloud-init (apt, get.k3s.io) | one-time during cloud-init | P2 |
A Sovereign that retains any of tethers 1–7 after handover is not a franchise — it is a managed replica of OpenOva. That defeats the franchise model, exposes the customer to OpenOva's availability profile, and breaks the political contract of "your data, your cluster, your control plane."
2.2 The rate-limit reality
Founder direction 2026-05-04: "the sovereign must be a true sovereign with no dependencies to openova" — and the corollary: "I am fine with sovereign cloud instances to temporarily use during the initial provision the openova harbor to avoid any unwanted rate limiting if there is such risk. The independence from openova must be post-handover process."
Docker Hub anonymous-pull limits sit at 100 pulls per 6 hours per IP (200 with auth). The bootstrap-kit alone references ~15 docker.io images via bp-cnpg, bp-keycloak, bp-cilium, bp-falco, and friends. Concurrent provisions, DoD-loop rebuilds, or back-to-back tear-down/re-provision cycles will reliably exhaust this budget and produce 429-shaped ImagePullBackOff failures during Phase 0 + Phase 1.
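As a rough illustration of the budget arithmetic, the sketch below assumes that every node pulls every docker.io image exactly once and that all nodes egress through the same IP. Both are simplifying assumptions, and `provisions_before_limit` is a hypothetical helper, not part of any Catalyst tooling.

```python
def provisions_before_limit(pull_limit: int, images: int, nodes: int = 1) -> int:
    """Whole provisions that fit inside one rate-limit window.

    Assumes each node pulls each image once and that every pull is
    counted against the same source IP.
    """
    pulls_per_provision = images * nodes
    return pull_limit // pulls_per_provision


# Figures from the text: 100 anonymous pulls per 6 hours, ~15 docker.io images.
single_node = provisions_before_limit(100, 15)           # 6
three_nodes = provisions_before_limit(100, 15, nodes=3)  # 2
```

Under these assumptions a three-node Sovereign exhausts the anonymous budget after two provisions in a window, which is why back-to-back tear-down/re-provision cycles are the risky case.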
The mothership Harbor at harbor.openova.io already runs proxy-cache projects for ghcr / docker / k8s / gcr / quay / xpkg / ecr. Routing the cold-start image pulls through it absorbs the rate-limit risk for the ~10–30 minute provisioning window. Cold-start tether is acceptable; permanent tether is not.
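To make the cold-start routing concrete, here is a minimal sketch that renders a k3s `registries.yaml` pointing a few upstream registries at a Harbor proxy-cache. The upstream host names, the project names in `PROXY_PROJECTS`, and the `rewrite` rule are illustrative assumptions; the actual template lives in cloudinit-control-plane.tftpl and may differ.

```python
# Upstream registry host -> Harbor proxy-cache project.
# Both the hosts and the project names are illustrative assumptions.
PROXY_PROJECTS = {
    "docker.io": "proxy-docker",
    "ghcr.io": "proxy-ghcr",
    "quay.io": "proxy-quay",
}


def render_registries_yaml(harbor_url: str) -> str:
    """Render a k3s registries.yaml that mirrors each upstream via Harbor."""
    lines = ["mirrors:"]
    for upstream, project in PROXY_PROJECTS.items():
        lines += [
            f"  {upstream}:",
            "    endpoint:",
            f'      - "{harbor_url}"',
            "    rewrite:",
            # Prefix every image path with the proxy-cache project name.
            f'      "^(.*)$": "{project}/$1"',
        ]
    return "\n".join(lines) + "\n"


cold_start = render_registries_yaml("https://harbor.openova.io")
```

Calling the same function with the Sovereign's own Harbor URL would produce the equivalent post-cutover file, which is the only thing that changes between the two states.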
2.3 Why a Phase-1.5 cutover is wrong
An earlier draft of #790 proposed cutting over to local Gitea + local Harbor during provisioning (between bootstrap-kit slot 06 and slot 16). This was rejected for two reasons:
- Rate-limit exposure during the cutover itself. Harbor proxy-cache warmup pulls upstream images. If that warmup runs while the rest of the cluster is still rolling out (60+ HelmReleases in-flight), the contention on docker.io anon limits becomes acute and produces flaky provisioning.
- Chicken-and-egg. Some of the components that perform the cutover (Gitea, Harbor, the cutover Job itself) are themselves blueprints pulled through registries that are about to be swapped. Performing this swap mid-roll means the in-flight HelmReleases see a registry change underneath them.
Cutover after Phase-1 is stable, after handover is acknowledged, on operator demand — that's when the cluster is quiet enough to swap registries cleanly.
2.4 Why "manual operator runbook" is wrong
Pivoting eight infrastructure tethers without a progress UI is a footgun. An operator who runs kubectl patch gitrepository, then forgets to patch the 38 HelmRepositories, then reboots the cluster, has bricked their franchise with no rollback path. A first-class blueprint with sequential Jobs, status ConfigMap, SSE event stream, and console UI is the only way to make the cutover safe to run in customer environments.
3. Decision
3.1 Introduce bp-self-sovereign-cutover
A new platform Blueprint published as oci://ghcr.io/openova-io/bp-self-sovereign-cutover:<semver>. It is added to the bootstrap-kit at slot 06a and reconciles dormant during Phase 1 — the chart installs JobTemplate ConfigMaps + RBAC + status ConfigMap, but does not create the eight Jobs until the operator triggers cutover.
This matches Inviolable Principle #3 (Helm-via-Flux is the only K8s manifest packaging unit) and Inviolable Principle #1 (event-driven, never polling) — see ADR-0001 §2.
3.2 Trigger model — operator-driven, post-handover
Cutover is initiated by:
- Operator clicks "Achieve True Sovereignty" on the admin console after handover lands (delivered in #793), OR
- catalyst-api auto-fires after the first successful operator login on a freshly handed-over Sovereign, after a configurable grace period (default off; operator-explicit by default, and the field can be flipped per customer)
The button sends POST /api/v1/sovereign/cutover/start (delivered in #792). catalyst-api translates that into a sequence of K8s Job creations from the JobTemplate ConfigMaps the chart installed. Progress streams to the UI via the existing SSE endpoint pattern (consistent with ADR-0001 §6).
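For readers unfamiliar with SSE, the sketch below parses a Server-Sent Events body into discrete progress events. The wire format (`event:` and `data:` lines, blank-line separators) is standard; the event name `step` and the JSON fields in `sample` are assumptions made for illustration, since the actual payload shape is defined in #792.

```python
import json


def parse_sse(stream_text: str) -> list[dict]:
    """Split an SSE body into events with 'event' and decoded 'data' fields."""
    events = []
    for frame in stream_text.split("\n\n"):
        event_name, data_lines = "message", []
        for line in frame.splitlines():
            if line.startswith("event:"):
                event_name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:  # frames without data (e.g. trailing blanks) are skipped
            payload = json.loads("\n".join(data_lines))
            events.append({"event": event_name, "data": payload})
    return events


# Hypothetical payloads; the field names are not taken from the real API.
sample = (
    'event: step\ndata: {"step": "01", "name": "gitea-mirror", "state": "succeeded"}\n\n'
    'event: step\ndata: {"step": "02", "name": "harbor-projects", "state": "running"}\n\n'
)
events = parse_sse(sample)
```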
3.3 The eight cutover steps (delivered in #791)
| Step | Job | Action |
|---|---|---|
| 01 | gitea-mirror | `git clone --mirror github.com/openova-io/openova`, then push to local `gitea/openova/openova` |
| 02 | harbor-projects | Harbor v2 API: create proxy-ghcr, proxy-docker, proxy-k8s, proxy-gcr, proxy-quay, proxy-xpkg, proxy-ecr projects on the local Harbor |
| 03 | harbor-prewarm | Pull-through-cache every image referenced by `clusters/_template/bootstrap-kit/*.yaml` so the local Harbor has bytes before traffic flips |
| 04 | registry-pivot | DaemonSet rewrites `/etc/rancher/k3s/registries.yaml` on every node (mothership Harbor → local Harbor), triggers containerd config reload, sentinel pod confirms a pull through the new path succeeds |
| 05 | flux-gitrepository-patch | Patch flux-system GitRepository.url: `github.com/openova-io/openova` → `http://gitea-http.gitea:3000/openova/openova` |
| 06 | helmrepo-patches | Patch all 38 OCI HelmRepositories: `oci://ghcr.io/openova-io/*` → `oci://harbor.<sov>/openova-io/*` |
| 07 | catalyst-api-env-patch | Patch catalyst-api Deployment env: `CATALYST_GITOPS_REPO_URL` → local Gitea URL (no upstream fallback after this point) |
| 08 | egress-block-test | NetworkPolicy deny-egress to github.com, ghcr.io, harbor.openova.io for 10 min; all reconciles must remain green; this is the DoD proof of independence |
Each step writes its result into the chart's status ConfigMap. Step 08 holding green for 10 min is the only condition under which cutoverComplete=true is set.
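The sequencing rule can be sketched as a small state machine. This is a simplified model, assuming the sequence stops at the first failed step; the `status` dictionary stands in for the status ConfigMap, and its keys other than `cutoverComplete` are hypothetical.

```python
from typing import Callable

STEPS = [
    "gitea-mirror", "harbor-projects", "harbor-prewarm", "registry-pivot",
    "flux-gitrepository-patch", "helmrepo-patches",
    "catalyst-api-env-patch", "egress-block-test",
]


def run_cutover(run_step: Callable[[str], bool]) -> dict:
    """Run steps in order, stop at the first failure, return the status map."""
    status = {"steps": {}, "cutoverComplete": False}
    for name in STEPS:
        ok = run_step(name)
        status["steps"][name] = "succeeded" if ok else "failed"
        if not ok:
            return status  # later steps are never started
    # Only reached when every step, including egress-block-test, succeeded.
    status["cutoverComplete"] = True
    return status


all_green = run_cutover(lambda name: True)
stalled = run_cutover(lambda name: name != "registry-pivot")
```

In the `stalled` case the first three steps are recorded as succeeded, `registry-pivot` as failed, and `cutoverComplete` stays false, which corresponds to the hybrid state a failed run leaves behind.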
3.4 Documentation
- ADR-0002 (this document): the architectural decision.
- ARCHITECTURE.md §11: Phase-2 cutover added to the Catalyst-on-Catalyst section as the canonical post-handover sequence.
- INVIOLABLE-PRINCIPLES.md Principle #11: independence post-handover is non-negotiable.
- SOVEREIGN-PROVISIONING.md: to be updated by a follow-up ticket to wire Phase-2 into the provisioning runbook.
4. Consequences
4.1 Pre-cutover
A freshly handed-over Sovereign behaves exactly as today:
- Image pulls go through harbor.openova.io (mothership): rate-limit safe
- Flux reconciles from github.com/openova-io/openova: read-only public clone
- HelmRepositories pull from oci://ghcr.io/openova-io: public artefacts
- catalyst-api CATALYST_GITOPS_REPO_URL falls back to the upstream repo
This soft-tethered window is the provisioning safety mode. Customers who never run the cutover keep working, but they are not yet sovereign.
4.2 Post-cutover
After step 08 passes:
- Image pulls go through the customer's local Harbor; mothership Harbor is unreachable and the cluster does not care
- Flux reconciles from the customer's local Gitea
- HelmRepositories pull from the customer's local Harbor
- catalyst-api reads the customer's Gitea URL with no upstream fallback
- The customer can black-hole github.com, ghcr.io, harbor.openova.io at their firewall and the Sovereign continues operating
This is the only state in which OpenOva can truthfully describe a Sovereign as franchised rather than managed.
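One way to reason about the post-cutover state is as a check that no configured URL still resolves to a mothership-side host. The sketch below is illustrative only: `remaining_tethers` and the `configured` list are hypothetical, and the real proof of independence is the egress-block test, not a static scan.

```python
from urllib.parse import urlparse

# Hosts the customer should be able to black-hole after cutover.
MOTHERSHIP_HOSTS = {"github.com", "ghcr.io", "harbor.openova.io"}


def host_of(url: str) -> str:
    """Extract the host, tolerating scheme-less values like 'github.com/org/repo'."""
    if "//" not in url:
        url = "//" + url
    return urlparse(url).hostname or ""


def remaining_tethers(urls: list[str]) -> list[str]:
    """Return the URLs that still point at a mothership-side host."""
    return [u for u in urls if host_of(u) in MOTHERSHIP_HOSTS]


# Example of a hybrid state: Git source pivoted, Helm and API fallback not yet.
configured = [
    "http://gitea-http.gitea:3000/openova/openova",
    "oci://ghcr.io/openova-io",
    "github.com/openova-io/openova",
]
leftovers = remaining_tethers(configured)
```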
4.3 Operator experience
The "Achieve True Sovereignty" button is a one-click action with a progress card showing eight steps, current step name, percentage complete, error (if any), and the deny-egress holding-time countdown. The operator has full visibility; no opaque scripts, no kubectl pasting.
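The values on the progress card can be derived from very little state. The following sketch assumes the card is computed from a count of completed steps and the seconds elapsed in the deny-egress hold; the field names are hypothetical and the real UI contract is defined in #793.

```python
TOTAL_STEPS = 8
HOLD_SECONDS = 10 * 60  # deny-egress hold from step 08


def progress_card(completed_steps: int, current_step: str, hold_elapsed: int = 0) -> dict:
    """Summarise cutover progress for display (field names are illustrative)."""
    return {
        "currentStep": current_step,
        # Integer percentage, rounded down so 100 only appears when all steps are done.
        "percent": 100 * completed_steps // TOTAL_STEPS,
        "holdRemainingSeconds": max(HOLD_SECONDS - hold_elapsed, 0),
    }


card = progress_card(7, "egress-block-test", hold_elapsed=240)
```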
4.4 Reversibility
The cutover is not designed to be reversed. Once the Sovereign is independent, "going back" means re-tethering to the mothership — which has no business reason to exist. If a customer ever needs to recover (e.g., local Gitea data loss), the recovery path is restore-from-backup, not roll-back-the-cutover.
4.5 Failure handling
If a cutover step fails:
- The step records its failure in the status ConfigMap
- The SSE stream surfaces the error to the UI
- The cluster is left in a hybrid state (some tethers swapped, others still pointing at mothership) — this is functionally equivalent to pre-cutover for the un-swapped tethers, so the Sovereign remains operational
- The operator can re-run cutover; each step is idempotent (e.g., git clone --mirror becomes git remote update; harbor-projects checks before creating; flux-gitrepository-patch is a no-op if the URL is already correct)
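The idempotency property for the flux-gitrepository-patch step can be sketched as a compare-then-write on the resource spec. This is a minimal model operating on a plain dictionary rather than the Kubernetes API; `patch_gitrepository_url` is a hypothetical helper.

```python
LOCAL_GITEA_URL = "http://gitea-http.gitea:3000/openova/openova"


def patch_gitrepository_url(gitrepo: dict, target_url: str) -> bool:
    """Set spec.url to target_url; return True only if a change was made."""
    spec = gitrepo.setdefault("spec", {})
    if spec.get("url") == target_url:
        return False  # already pivoted: re-running is a no-op
    spec["url"] = target_url
    return True


repo = {"kind": "GitRepository", "spec": {"url": "https://github.com/openova-io/openova"}}
first_run = patch_gitrepository_url(repo, LOCAL_GITEA_URL)   # change applied
second_run = patch_gitrepository_url(repo, LOCAL_GITEA_URL)  # nothing to do
```

Returning whether a change was made lets a re-run report "already correct" instead of failing or rewriting the resource.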
4.6 Audit trail
Every cutover step publishes a CloudEvents-shaped envelope on NATS JetStream (catalyst.cutover.* subjects), consistent with ADR-0001 §6. The operator action that triggered the cutover, the operator identity (Keycloak token), and the eight step results land in the audit log stream.
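A CloudEvents-shaped envelope for one step result might look like the sketch below. The attributes `specversion`, `id`, `source`, and `type` are the ones CloudEvents 1.0 requires; the `source` value, the choice to mirror the NATS subject in `type`, and the `data` fields are assumptions, and publishing to JetStream is deliberately left out.

```python
import json
import uuid
from datetime import datetime, timezone


def cutover_event(step: str, state: str, operator: str) -> tuple[str, str]:
    """Return (subject, JSON payload) for one cutover step result."""
    subject = f"catalyst.cutover.{step}"
    envelope = {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": "catalyst-api",  # assumed source identifier
        "type": subject,           # assumed: type mirrors the subject
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": {"step": step, "state": state, "operator": operator},
    }
    return subject, json.dumps(envelope)


subject, payload = cutover_event("egress-block-test", "succeeded", "operator@example.com")
```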
5. Alternatives Considered
5.1 Phase-1.5 cutover during provisioning
Rejected. Cutting over mid-provision exposes the cluster to docker.io rate-limit failures during Harbor warmup (the warmup itself pulls upstream). Concurrent provisions or fast tear-down/re-provision cycles compound the risk. The mothership Harbor proxy is a known-good rate-limit absorber for the provisioning window; pulling that out before the window closes is gratuitous. (Original Agent B in #790 was cancelled for exactly this reason.)
5.2 Sovereign-built-in mirror (chicken-and-egg)
Rejected. A pure "Sovereign serves its own mirror from day one" design requires Gitea + Harbor to come up before the bootstrap-kit pulls images for Gitea + Harbor. Solving the bootstrap of the bootstrap is a research project; the cold-start tether to mothership Harbor neatly avoids the loop and lets us ship today.
5.3 Manual operator-driven cutover (runbook, no chart)
Rejected. Eight steps with cross-step dependencies + idempotency requirements + RBAC + status reporting + UI integration is exactly the surface a Helm chart + Job pattern is designed to package. A 30-page runbook with kubectl patch snippets is a footgun in a customer environment, and it cannot deliver the live progress card that operators expect.
5.4 Crossplane Composition for the cutover
Rejected. Per Inviolable Principle #3 (ADR-0001 §2.3), Crossplane stays in its lane: cloud-provider APIs. The cutover is K8s-to-K8s composition (creating Jobs, patching CRs, applying NetworkPolicies). That is Flux + a thin chart, not Crossplane.
5.5 Auto-fire on first login by default
Rejected as default; available as opt-in. Auto-firing the cutover the moment the operator logs in is operator-hostile — the operator may want to inspect the freshly handed-over Sovereign before pivoting eight infrastructure tethers. Default is operator-explicit (button click). Auto-fire is a per-customer field for installations that want zero-touch sovereignty.
6. Implementation pointers
| Concern | Where |
|---|---|
| Chart source | platform/self-sovereign-cutover/chart/ (delivered by #791) |
| Bootstrap-kit slot | clusters/_template/bootstrap-kit/06a-bp-self-sovereign-cutover.yaml |
| API handlers | products/catalyst/bootstrap/api/internal/handler/cutover.go (delivered by #792) |
| UI surface | products/catalyst/bootstrap/ui/ admin console card + button (delivered by #793) |
| Audit stream | NATS JetStream subject catalyst.cutover.* |
| Status surface | ConfigMap self-sovereign-cutover-status in flux-system namespace |
Part of OpenOva. Read in conjunction with ADR-0001, ARCHITECTURE.md §11, and INVIOLABLE-PRINCIPLES.md Principle #11.