f18dd8df19
235 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
7a32ac0a81
|
docs: flip 8 CRDs to 🚧 + amend ProvisioningState decision (slices A2+A3, #1095) (#1113)
A2 — IMPLEMENTATION-STATUS.md §4
- Flip Organization, Environment, Application, Blueprint, EnvironmentPolicy, SecretPolicy, Runbook from 📐 → 🚧 (schema landed via slices B1-B7).
- Add Continuum and ProvisioningState rows (Continuum schema is in EPIC-0 even though controller is in EPIC-6 #1101; ProvisioningState was a 0-byte placeholder that audit slice H3 fixed).
- Each row now cites its slice + PR + remaining controller work.
A3 — EPICS-1-6-unified-design.md
- Promote Status note to "Authoritative on 2026-05-08 after Phase-0 Group B (CRD schemas) substantially landed".
- Amend §3.9 row 3 + §11 row 8: ProvisioningState decision changed from "Delete" to "Author the schema". The original audit missed catalyst-api/internal/store/crd_store.go which actively expects the CRD (GVR catalyst.openova.io/v1alpha1/provisioningstates) — without the CRD, every catalyst-api silently no-ops the CRD-projection path in CRDModeDisabled. Implemented in slice H3 / PR #1104.
No code changes — pure docs sync to reflect 9 already-merged Phase-0 slices.
Refs: #1094, #1095, A2 + A3 + amendment for H3.
Co-authored-by: hatiyildiz <hatiyildiz@noreply.openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
d966651fae
|
docs(adr-0001): ratify Accepted with §2.3 K8s-Composition amendment (#1095 slice A1) (#1103)
Promotes ADR-0001 from Proposed (2026-05-01) to Accepted (2026-05-08) with one amendment to §2.3: K8s-to-K8s reconciliation (RoleBindings, Kustomizations, ConfigMaps from a higher-level intent CR) is the responsibility of Flux Kustomizations or thin in-cluster controllers — never Crossplane Compositions. The useraccess-controller (slice C5 of #1095) is the canonical example. The earlier XUserAccess Composition that used provider-kubernetes is retired.
Why amend: the audit synthesized in openova-private/.claude/audit-synthesis-2026-05-08.md confirmed XUserAccess on every Sovereign was silently broken (Composition references provider-kubernetes which is not installed). The amendment makes the in-cluster path canonical so future K8s-to-K8s seams follow it without re-debating.
Refs: #1094 (umbrella), #1095 (foundation), docs/EPICS-1-6-unified-design.md
Co-authored-by: hatiyildiz <hatiyildiz@noreply.openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
bcc5ac66f7
|
docs: unified design for EPICs 1-6 (Phase 0/1 roll-out — closes #1094 design milestone) (#1102)
* fix(catalyst): chroot cloud list views consume SSE cache (services/ingresses/deployments/statefulsets/daemonsets/namespaces/nodes)
Two stacked bugs blocked 7 cloud list views (TC-066 services, TC-067 ingresses, TC-072 deployments, TC-073 statefulsets, TC-074 daemonsets, TC-078 namespaces, TC-079 nodes) from rendering live data even though the architecture graph view showed full counts for the same kinds:
1) The architecture-graph widget opened its OWN useK8sCacheStream subscription instead of consuming the page-level snapshot exposed on CloudPage's useCloud() context. That meant TWO concurrent EventSource connections per page — the chroot's HTTP/1.1 6-connections-per-origin budget left CloudPage's subscription stuck on "connecting" while the graph's stream populated its own private snapshot, so chip counts (read off CloudPage's snapshot) showed live data only when initialState happened to land before the budget tipped, and the K8sListPage instances always read an empty CloudPage snapshot.
2) K8sListPage's useMemo for `rows` listed only `[k8sSnapshot, kind, sortByName]` as deps. The snapshot Map is mutated IN-PLACE by useK8sCacheStream (intentional, to coalesce high-frequency bursts into one React render per tick) so its reference is stable across deltas — the memo never recomputed past the initial empty snapshot. The companion `k8sRevision` counter bumps on every applied event; it's the only signal that triggers re-derivation when the in-place Map mutates. The previous code referenced `k8sRevision` as a `void` no-op "for future memo passes" — but the future was now.
Fix:
* ArchitectureGraphPage now accepts optional `k8sSnapshot` + `k8sRevision` props. When provided (the production path via Architecture.tsx → useCloud()), the widget reads from the shared snapshot. When omitted (storybook / direct embed / tests), it falls back to opening its own subscription so the widget remains self-sufficient.
* Architecture.tsx forwards `k8sSnapshot` + `k8sRevision` from useCloud() into the widget — collapsing the two SSE connections into one shared page-level subscription.
* K8sListPage adds `k8sRevision` to the rows useMemo deps so the list re-derives on every applied delta, with an extended comment explaining why the revision is what makes the in-place-mutated Map observable.
No behaviour change for the working K8s-backed kinds (configmaps, secrets, replicasets, endpointslices, persistentvolumes, pods) — those went through the same path; they only "worked" when the race happened to favour the CloudPage subscription on a given session. PVCs/Buckets/Volumes/StorageClasses/etc continue to read from the topology API and are unaffected.
Closes 7 FAIL rows in the iter-3 Sovereign Console QA matrix.
* docs: unified design for EPICs 1-6 (Phase 0/1 roll-out)
Single canonical reference for the Phase 0/1 plan tracked under #1094:
- Phase 0 (#1095): foundation contracts — 8 CRDs (Organization, Environment, Application, Blueprint, EnvironmentPolicy, SecretPolicy, Runbook, Continuum), 6 controllers (incl. useraccess-controller replacing the broken Crossplane Composition path), Keycloak full-CRUD, label vocabulary enforced via Kyverno, vCluster scaffold, 3-region multi-cluster substrate (mgmt + 2 data planes with Cilium ClusterMesh), and 9 cleanup/bug-fixes (P0).
- Phase 1 — 6 EPICs in parallel:
* #1096 Compliance — Kyverno policy library + watcher PolicyReport pipeline + weighted score aggregator + SRE/SecLead UI.
* #1097 Applications — Application/Blueprint CRDs realized, application-controller, unified catalog-svc, live install + post-launch topology editor.
* #1098 RBAC — useraccess-controller, Keycloak full mgmt, claims parsing, catalog tiers (viewer/dev/op/admin/owner), multi-grant UI.
* #1099 Cloud Resources — k9s-on-web (drill-down + logs WS + exec + YAML editor + events) + Guacamole + projector.
* #1100 Networking — default-deny CCNP baseline, Hubble UI, OTel Operator, Cilium ClusterMesh service routing, DMZ vCluster, NetBird mesh.
* #1101 Multi-cluster + Continuum — CNPG cluster-pair, Continuum CRD/controller (lease + lua-record body synthesizer + switchover), topology UI.
The doc does not invent decisions — it stitches together what is already locked in INVIOLABLE-PRINCIPLES.md, NAMING-CONVENTION.md, BLUEPRINT-AUTHORING.md, adr/0001, SRE.md, and MULTI-REGION-DNS.md into one low-level reference for the dev-loop team (Architect + 1-3 Implementers + Test-Plan Author + Reviewer + Executor + Fix Authors + Cross-EPIC Coordinator).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Hati Yildiz <hati.yildiz@openova.io>
Co-authored-by: hatiyildiz <hatiyildiz@noreply.openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
f716fddf20
|
docs(adr): ADR-0003 RBAC ↔ NewAPI user-create hook contract (#796) (#807)
Contract spec for the unified-rbac → Keycloak → NewAPI → K8s Secret hook that materialises an SME admin's user-create action across the three systems atomically (with idempotent reconciliation).
- Step 1: POST SME-vcluster Keycloak admin API → user in realm
- Step 2: POST NewAPI admin API in-cluster → per-user api_key
- Step 3: server-side-apply newapi-key-<uuid> Secret in tenant ns
State machine (pending → kc_created → newapi_created → secret_applied → done, or → failed after 5 transient retries) persisted in unified-rbac's Postgres. Reconciliation is event-driven via a self-published NATS heartbeat subject, never a CronJob (per Inviolable Principle 1 and ADR-0001 §6). Rollback is the inverse order, idempotent.
Locked decisions [A] [B] [Q-mine-3] [Q-mine-4] from #795 are honored; not relitigated. Downstream tickets #798, #799, #802, #803 bind to this contract.
Refs: #796 (this issue), #795 (parent epic), ADR-0001, ADR-0002
Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
59cdfe5a77
|
docs: ADR-0002 + ARCHITECTURE §11.1 + Inviolable #11 — post-handover sovereignty cutover (#794) (#797)
Adds the documentation set for the self-sovereignty cutover seam: - NEW docs/adr/0002-post-handover-sovereignty-cutover.md following ADR-0001's shape (Status, Context, Decision, Consequences, Alternatives Considered). Documents the 8-tether map, the 30/70 provisioning split, the operator-driven trigger model, and the egress-block DoD proof. - ARCHITECTURE.md §11 now carries a §11.1 Phase 2 — Self-Sovereignty Cutover subsection with the 8-Job table, mermaid Phase-0 → Phase-1 → Handover → Phase-2 → Day-2 diagram, and links to issues #790/#791/#792/#793/#794. - INVIOLABLE-PRINCIPLES.md adds Principle #11: Sovereigns must be independent of openova-io after handover. Trigger phrase, cold-start exception, and cutover requirement spelled out. Cites #790 (umbrella), #791 (chart), #792 (api), #793 (ui), #794 (this PR). Extends, does not contradict, ADR-0001 §11 (Catalyst-on-Catalyst) and §2 (Inviolable Principles). Closes #794 Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io> |
||
|
|
53bc4357ca
|
feat(provisioner): cluster-autoscaler-hcloud + wizard footprint estimate (closes #767) (#776)
* feat(provisioner): cluster-autoscaler-hcloud + wizard footprint estimate (closes #767)
Two-pronged fix for the FailedScheduling pattern that hit otech92 (2x cpx32 workers couldn't fit external-secrets-webhook because the bootstrap-kit ate the full 16 GB):
1. PRE-LAUNCH ESTIMATE — wizard StepReview now surfaces a "Footprint estimate" Section with: bootstrap-kit baseline (sum of mandatory-tier component footprints), selected components delta, control-plane overhead, and a "Recommended N x <SKU>" line that turns amber when the operator's chosen worker count is below the rollup. Backed by per-component RAM/CPU floors in components/wizard/steps/componentFootprints.ts (covered by 12 unit tests including the otech92 reproduction).
2. RUNTIME AUTOSCALING — new bp-cluster-autoscaler-hcloud Blueprint added at bootstrap-kit slot 40. Wraps the upstream kubernetes/autoscaler chart 9.46.6 (appVersion 1.32.0) with the Hetzner cloud-provider. Token wired from the canonical flux-system/cloud-credentials.hcloud-token Secret cloud-init writes (mirrors the velero/harbor object-storage pattern). Pinned to the control-plane node so the autoscaler never schedules onto a worker it could itself terminate. 10-minute scale-down idle as the cost-saving default.
Documented in docs/ARCHITECTURE.md sec.14 (Autoscaling) — explains how VPA / HPA / KEDA / cluster-autoscaler compose, why we picked cluster-autoscaler over KEDA for cluster scaling, and the bounds + safety story.
Per the issue's MVP scope, this PR ships the blueprint + StepReview estimate WITHOUT the wizard StepProvider min/max pair refactor or the tofu node-pool template restructuring. Those are tracked as a follow-up issue (scope-control rule per docs/INVIOLABLE-PRINCIPLES.md #1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(provisioner): move cluster-autoscaler to slot 50 + register in expected-bootstrap-deps
Slot 40 was already forward-declared for bp-llm-gateway in scripts/expected-bootstrap-deps.yaml — the dependency-graph-audit CI check fired on PR #776 because the file existed without a matching entry in the expected DAG, AND collided with a reserved slot. Move to slot 50 (after the W2.K4 cohort + slot 49 bp-cert-manager-powerdns-webhook) and add the matching entry to the expected-bootstrap-deps.yaml so the audit passes.
`scripts/check-bootstrap-deps.sh` runs clean locally now (drift=0, cycles=0).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
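The footprint rollup described in the commit above (baseline plus selected-component delta plus control-plane overhead, compared against the chosen worker count) might be sketched like this. The real logic lives in TypeScript in componentFootprints.ts and is not shown in the log, so the function names, the RAM-only arithmetic, the decision to count control-plane overhead against worker capacity, and every number below are assumptions for illustration only.

```go
package main

import "fmt"

// recommendedWorkers returns the smallest worker count whose combined RAM
// covers the rollup. All inputs are in MiB. CPU is ignored in this sketch.
func recommendedWorkers(baselineMiB, selectedMiB, overheadMiB, perWorkerMiB int) int {
	total := baselineMiB + selectedMiB + overheadMiB
	// Integer ceiling division: round up to whole workers.
	return (total + perWorkerMiB - 1) / perWorkerMiB
}

// underProvisioned reports whether the estimate line should turn amber,
// that is, whether the operator chose fewer workers than recommended.
func underProvisioned(chosen, recommended int) bool {
	return chosen < recommended
}

func main() {
	// Hypothetical figures: a 16 GiB baseline, 2 GiB of selected components,
	// 1 GiB of overhead, and workers with 8 GiB each.
	n := recommendedWorkers(16384, 2048, 1024, 8192)
	fmt.Printf("Recommended %d x <SKU>, amber=%v\n", n, underProvisioned(2, n))
}
```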
|
|
7bd1821473
|
docs(wbs): Mermaid reflects ALL Phase-8a 2026-05-02 chart bug bash (#577)
Founder corrective: prior diagram missed: - 9 chart bugs surfaced + fixed today (#549, #553, #561, #567-#571, #568) - 3 still in flight (#562 cilium-operator gateway-controller race, #563 NS delegation + LB:53 + DNS-01 wildcard, #565 harbor CNPG) - 12 chart bugs from prior session days (#474, #488, #489, #491, #492, #494, #503, #506, #508, #510, #519, #536, #538, #539, #340) Adds Phase 0d · Phase-8a chart bug bash with all of them. Edges: every fix gates the bp-* HR it makes possible on a fresh Sovereign integration test. Edge from #563 (handover-URL DNS-01 wildcard chain) → #454 makes the actual gating relationship explicit: without #563 there is no working `console.<sovereign>.omani.works`, which means no Phase-8a gate met. The diagram should now match what the founder sees actually failing on otech22, not the chart-released optimism of an earlier draft. Co-authored-by: hatiyildiz <hatiyildiz@openova.io> |
||
|
|
dee2be5cc8
|
docs(wbs): Mermaid DAG shows actual Phase-8a dependency cascade (#559)
Per founder corrective: existing diagram missed the real blockers surfaced during otech10..otech22 burns. The image-pull-through gap (#557) and the cross-namespace secret gap (#543, #544) gate every workload pull from a public registry — without them, Sovereign hits DockerHub anonymous rate-limit on first provision and 30+ HRs are ImagePullBackOff/CreateContainerConfigError. Adds: - Phase 0b · Image pull-through (#557 + #557B Sovereign-Harbor swap + #557C charts global.imageRegistry templating). Edges to NATS / Gitea / Harbor / Grafana / Loki / Mimir / PowerDNS / Crossplane / cert-manager-powerdns-webhook / Trivy / Kyverno / SPIRE / OpenBao - Phase 0c · Cross-namespace secrets (#543 ghcr-pull Reflector + #544 powerdns-api-credentials reflect). Edges to bp-catalyst-platform and bp-cert-manager-powerdns-webhook - Phase 1 additions: #542 kubeconfig CP-IP fix and #547 helmwatch 38-HR threshold both gate Phase 8a integration test - Phase 0b → Phase 8b edge: post-handover Sovereign-Harbor swap is what makes "zero contabo dependency" DoD-met possible WBS now reflects the cascade observed live, not the pre-Phase-8a model. Co-authored-by: hatiyildiz <hatiyildiz@openova.io> |
||
|
|
a6a3a9b3b1
|
docs(wbs): add §9b Phase-8a live iteration log (2026-05-01→05-02) (#555)
Per founder corrective: WBS hadn't been updated in 16h. The active Phase-8a iteration is what's actually closing the integration-tested gap, but the WBS still read as if Phase 8a hadn't started.
New §9b captures:
- 18 fixes landed in last 36h (#317, #340, #474, #487, #488, #489, #491, #492, #494, #503, #506, #508, #510, #519, #531/#532/#534/#535/#537, #536, #538, #539/#540, #542, #544, #547, #549, #553)
- Symptom → root cause → fix → PR per row, all linked to deployed SHAs
- Background agents in flight (#543 ghcr-pull Reflector, #548 dynadot ClusterIssuer)
- Risk Register status — R3 / R4 exercised + resolved, R2 / R5 / R7 / R8 still open
Updated as bugs land. The handover-state truth lives here, not in Claude memory files.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io> |
||
|
|
1628a1b3aa
|
ci(preflight): GHCR auth for A+E + WBS tick — all 4 preflights done (#470)
First runs of preflight A (bootstrap-kit) and E (Keycloak) failed with the same error: helm OCI pull from ghcr.io/openova-io/bp-* returning 401 'unauthorized: authentication required'. bp-* are PRIVATE GHCR packages. #460's agent fixed it for B in c26fbcaf. #461's already had GHCR login. This commit applies the same helm-registry-login pattern to A and E. WBS state on main after this commit: - done (35): all chart-level + #317 + #319 + #453 + 4 preflights - wip (0) - blocked (3): 454, 455, 456 (Phase-8 live runs, operator-driven) The preflights' first runs ALREADY surfaced a real CI bug pattern that would have hit Phase 8a — exactly what they're for. Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com> |
||
|
|
a7a90619e5
|
docs(wbs): mark #461 done — preflight C cilium-httproute shipped (#469)
PR #465 merged at
|
||
|
|
4a7eb42d26
|
feat(ci): Phase-8a preflight E — Keycloak realm-import + kubectl OIDC client (closes #462) (#468)
Surfaces Risk R6 (docs/omantel-handover-wbs.md §9a — Keycloak realm-import config-CLI bootstrap timing untested). bp-keycloak 1.2.0 ships a sovereign realm + a public kubectl OIDC client via the upstream bitnami/keycloak chart's keycloakConfigCli post-install Helm hook (issue #326); this workflow proves it actually wires up on a clean cluster before we run it on a real Sovereign. Workflow installs bp-keycloak 1.2.0 on a kind cluster (helm/kind-action v1, kindest/node:v1.30.6 — same versions as test-bootstrap-kit), waits for the keycloak StatefulSet to roll out, polls for the keycloakConfigCli post-install Job by label (app.kubernetes.io/component=keycloak-config-cli), waits for it to Complete, port-forwards svc/keycloak and asserts: 1. /realms/sovereign returns 200 (realm exists in Keycloak's DB). 2. The kubectl OIDC client is provisioned with publicClient=true, redirectUris contains http://localhost:8000 (kubectl-oidc-login default), and the groups client scope is wired with the oidc-group-membership-mapper (the per-Sovereign k3s api-server's --oidc-groups-claim flag depends on this). Acceptance per ticket: if the post-install Job fails, the workflow summary captures Job logs + StatefulSet logs + cluster state via GITHUB_STEP_SUMMARY so a failed run is debuggable without re-running. Triggers are event-driven only per CLAUDE.md "every workflow MUST be event-driven, NEVER scheduled" rule — push on the workflow file itself plus workflow_dispatch for ad-hoc re-runs. Closes #462. Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com> |
||
|
|
abac00d8b3
|
feat(ci): Phase-8a preflight A — bootstrap-kit reconcile dry-run on kind (closes #459) (#467)
Surfaces Risk-register R4 (docs/omantel-handover-wbs.md §9a — bootstrap-kit reconcile-chain order untested under load) before Phase 8a (#454) burns Hetzner credit on test.omani.works. New workflow .github/workflows/preflight-bootstrap-kit.yaml: - kind v0.25.0 + kindest/node:v1.30.6 - Gateway API CRDs v1.2.0 standard channel - Full Flux controller set (fluxcd/flux2/action@main + flux install) - Mock Secrets: flux-system/object-storage, flux-system/cloud-credentials, flux-system/ghcr-pull - Renders clusters/_template/bootstrap-kit/ with SOVEREIGN_FQDN_PLACEHOLDER + ${SOVEREIGN_FQDN} -> test-sov.example.com (matches test harness pattern in tests/e2e/bootstrap-kit/main_test.go:247) - 30 x 30s HR poll loop, never-fail-fast (goal: surface ALL bugs, not stop at first) - $GITHUB_STEP_SUMMARY emits Markdown table of every HR's terminal Ready condition + per-HR describe blocks for non-Ready + recent flux-system events + raw hrs.json artefact (14d retention) - Event-driven only: push on self-edit + workflow_dispatch; no schedule: cron (per CLAUDE.md "every workflow MUST be event-driven") Canonical seam reused (no duplication): - kind setup + flux install pattern from .github/workflows/test-bootstrap-kit.yaml - bootstrap-kit kustomization at clusters/_template/bootstrap-kit/ (the same overlay production Sovereigns consume; substitution shape mirrors tests/e2e/bootstrap-kit/main_test.go:247) - event-driven shape per .github/workflows/check-vendor-coupling.yaml (#428) Out of scope (sibling preflights): - #460 Crossplane provider-hcloud Healthy probe - #461 Cilium Gateway HTTPRoute admission - #462 Keycloak realm-import Validated: actionlint clean, YAML parses cleanly. WBS row #459 in §9 updated: 🟡 in flight -> 🟢 done (workflow shipped). Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com> |
||
|
|
56b7cdbb6d
|
docs(wbs): tick 21 — #453 done; 4 Phase-8a preflights dispatched; §13 cap rule corrected (#464)
Twice-corrected discipline rule per founder pushback at 15:55 UTC: - Original 15:38 'max 1-2 agents' was over-correction - Real rule: scope-based not count-based - 'Min 3, max 5 in flight' from feedback_agent_orchestration_discipline.md still holds; what was wrong was dispatching out-of-scope work - 4 agents in flight now: #459/#460/#461/#462 — all Phase-8a preflight de-risking against §9a Risk register State on main after this commit: - done (31): all minimal Sovereign blueprints + foundation + CI + Phase 6 + Phase 7 (#317 + #319 + #453 contract reconciliation) - wip (4): 459, 460, 461, 462 (Phase-8a preflights, kind-cluster de-risking) - blocked (3): 454, 455, 456 (Phase 8 operator-driven live runs) DAG additions: - New PRE subgraph 'Phase-8a preflight · de-risk before live run' - Edges T459/T460/T461/T462 → T454 (preflights gate Phase 8a) - §9 rows for #459-#462 - §13 rewritten with twice-corrected scope-not-count discipline Co-authored-by: hatiyildiz <hatiyildiz@noreply.function-com> |
||
|
|
18d59174d3
|
fix(catalyst-api): #317↔#319 contract — preserve slim deployment record post-handover for redirect (closes #453) (#458)
#317's FinaliseHandover deleted the deployment record entirely, which meant #319's `AdoptedAt` field was dormant — the post-handover redirect at console.openova.io/sovereign/<id> 404'd instead of 301-ing to console.<sovereign-fqdn>. Fix: replace `store.Delete(id)` at the end of FinaliseHandover with a slim-record save via the new `Deployment.SlimForHandover(adoptedAt)` seam. The slim shape retains: - id, sovereignFQDN, orgName, orgEmail, startedAt (audit-minimum) - AdoptedAt = now() (redirect contract from #319 PR #451) - Status: "adopted" - closed eventsCh + done channels Operational fields are zeroed: Result/tofuState, kubeconfig hash, PDM reservation token, error, credentials. Consistent with §0 minimum-retention principle. Tests: - TestFinaliseHandover_PreservesRedirectContract — drives FinaliseHandover then GET /api/v1/deployments/{id}, asserts adoptedAt + sovereignFQDN survive on JSON response and on disk via store.Load round-trip - TestSlimForHandover (table-driven) — full-record + minimal-record transforms; asserts audit fields kept, redirect field set, operational fields zeroed, credentials zeroed, channels closed - TestSlimForHandover_StoreRecordRoundTrip — JSON encode/decode cross-Pod-restart guard - TestFinaliseHandover_FullFlow extended with slim-shape assertions Anti-duplication: SlimForHandover lives next to other Deployment methods in deployments.go (canonical seam). FinaliseHandover modifies the same file referenced in the issue (handover.go); no parallel binary or script. WBS row #453 → done; class line T453 wip → done. Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
51e24ea3b8
|
docs(wbs): truthful rewrite — match real DoD; carve out post-omantel epic #320 (#457)
Per founder corrective 2026-05-01. Prior WBS over-promised by: 1. Treating chart-released and chart-verified as 'done' indistinguishable from DoD-met 2. Bundling epic #320 IAM access plane (#322-#326) as if part of omantel handover scope 3. Hiding the fact that ZERO of the 23 minimal blueprints have ever been reconciled together on a fresh Sovereign Rewrite changes: - §0 (NEW): Truth-of-state — explicit ladder chart-released → chart-verified → integration-tested → DoD-met. Today every 'done' ticket is at chart level; zero are integration-tested; zero are DoD-met. - §1: explicit out-of-scope carve-out for epic #320 - §2: split chart-status from reconcile-chain-status; latter reads ❓ unknown for all 23 (truthful) - §4 DAG: * adds Phase 7 cleanup #453 (#317↔#319 contract reconciliation) * adds Phase 8a/8b/8c live-execution gates (#454/#455/#456) * adds 🎯 DoD-met gate node tied to #456 * promotes T425 into Phase 4 (it was wrongly in SCAF subgraph as if it were sustainment work — it's the foundation for #383/#384) * keeps SCAF subgraph for genuine CI guardrails (#428/#438/#429/#430) - §9: adds rows for #453/#454/#455/#456 explicitly bold + marks #324/#325 as ⏸ parked per scope rewrite - §9a (NEW): Risk register — 8 known gaps that will surface in Phase 8a - §12 (NEW): What we are NOT doing now — scope discipline - §13 (NEW): Agent-orchestration reset — max 1-2 agents on Phase-8 follow-ups; NO capacity-fill on post-omantel scope until #456 closes The 5 sequential steps to DoD-met are listed in §12. There are no parallel-agent shortcuts past Phase 7. Phase 8 is operator-driven. Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com> |
||
|
|
3a34969a2f
|
feat(catalyst+pdm): Sovereign self-decommission + post-handover redirect (closes #319) (#451)
Customer-side decommission UI + PDM release endpoints + Catalyst-Zero
redirect to console.<sovereign-fqdn> once handover is finalised.
Anti-duplication map (canonical seams reused, NOT duplicated):
- catalyst-api wipe.go: existing wipe endpoint already drives PDM
release + Hetzner purge + tofu destroy + local cleanup. The new
DecommissionPage POSTs to the same endpoint with an optional
backup-destination payload.
- PDM Allocator.Release: child zone delete + parent-zone NS revert
+ allocation row delete already idempotent. The new sovereign-side
POST /api/v1/release is a thin FQDN-shaped wrapper that splits at
the first dot and delegates to Allocator.Release.
- The orphan force-release path adds gates (X-Force-Release-Confirm
header, 30-day grace, DNS-NXDOMAIN check) on top of the same seam.
Scope contract with #317 (handover finalisation): NOT touching
internal/handler/handover.go. AdoptedAt is a new contract field on
Deployment + store.Record that the redirect helper consumes; future
#317 enhancement will populate it before deletion.
Files:
core/pool-domain-manager/internal/handler/release.go (NEW)
core/pool-domain-manager/internal/handler/release_test.go (NEW)
core/pool-domain-manager/internal/handler/handler.go (route wiring)
products/catalyst/bootstrap/api/internal/handler/deployments.go (AdoptedAt field + State()/toRecord/fromRecord)
products/catalyst/bootstrap/api/internal/handler/deployments_adopted_test.go (NEW)
products/catalyst/bootstrap/api/internal/store/store.go (AdoptedAt persistence)
products/catalyst/bootstrap/ui/src/pages/sovereign/DecommissionPage.tsx (NEW)
products/catalyst/bootstrap/ui/src/pages/sovereign/DecommissionPage.test.tsx (NEW)
products/catalyst/bootstrap/ui/src/pages/sovereign/Dashboard.tsx (Decommission link)
products/catalyst/bootstrap/ui/src/app/router.tsx (redirect + decom route)
docs/omantel-handover-wbs.md (T319 → done)
Tests: 13 new Go test cases + 5 new vitest cases all green. catalyst-api
+ PDM full suites pass. Live execution against omantel deferred to
Phase 8 per ticket scope (no Dynadot/Hetzner exec here).
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
|
||
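The commit above describes the sovereign-side release endpoint as a thin wrapper that splits the FQDN at the first dot before delegating to `Allocator.Release`. A minimal sketch of that split, assuming a helper named `splitAtFirstDot` (hypothetical, as is the error text), could look like this; the delegation itself is not shown because the real `Allocator.Release` signature does not appear in the log.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAtFirstDot separates a Sovereign FQDN into its leftmost label and the
// parent zone. It rejects input with no dot or with an empty side.
func splitAtFirstDot(fqdn string) (string, string, error) {
	label, zone, found := strings.Cut(fqdn, ".")
	if !found || label == "" || zone == "" {
		return "", "", fmt.Errorf("not a child FQDN: %q", fqdn)
	}
	return label, zone, nil
}

func main() {
	label, zone, err := splitAtFirstDot("omantel.omani.works")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// A real handler would now pass label and zone to the allocator.
	fmt.Println(label, zone)
}
```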
|
|
efedbb04af
|
docs(wbs): tick 20 — #324 + #325 dispatched (4 in flight while #319 finishes) (#450)
Filling capacity with the heavy IAM-epic tickets while #319 is still running through its test-fix loops. Non-overlap matrix maintained: - #319: PDM release + sovereign/Decommission + Dashboard + router + deployments + store - #323: handler/user_access + UI admin/user-access - #324: handler/bastion + internal/bastion/ + UI sovereign/BastionPage - #325: handler/pod_exec + internal/podexec/ + UI admin/pod-console + asciinema → Object Storage State on main after this commit: - done (29) - wip (4): 319, 323, 324, 325 Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com> |
||
|
|
d50b1d73fd
|
docs(wbs): tick 19 — #326 done; #319 + #323 sole wip (#449)
Class line had stale T326 in wip — both #322 and #326 merged on main ( |
||
|
|
20b896070f
|
feat(bp-keycloak + infra): Sovereign K8s OIDC config for kubectl via per-Sovereign Keycloak realm (closes #326) (#448)
Wires the per-Sovereign K8s api-server's --oidc-* validator to the
per-Sovereign Keycloak realm so customer admins can authenticate
kubectl directly against their Sovereign — no static admin-kubeconfig
handoff, no rotated bearer-token exchange.
infra (cloud-init):
- Add 6 --kube-apiserver-arg=oidc-* flags to the k3s install line in
infra/hetzner/cloudinit-control-plane.tftpl. Issuer URL composed
from sovereign_fqdn (https://auth.\${sovereign_fqdn}/realms/sovereign)
per INVIOLABLE-PRINCIPLES #4 — never hardcoded. Username/groups
prefixes scope OIDC subjects under "oidc:" so RoleBindings reference
e.g. subjects[0].name=oidc:alice@org, distinct from local SAs/x509.
Canonical seam (anti-duplication rule, ADR-0001 §11.3):
- The bp-keycloak chart already bundles bitnami/keycloak's
keycloakConfigCli post-install Helm hook Job, which imports realms
declared under values.keycloak.keycloakConfigCli.configuration. We
enable the existing seam — no bespoke kubectl-exec realm-creation
script, no custom Admin-API call from catalyst-api.
bp-keycloak chart (1.1.2 → 1.2.0):
- Enable keycloakConfigCli + ship inline sovereign-realm.json with:
realm "sovereign" (invariant per Sovereign — Keycloak resolves the
issuer claim from the request hostname, so no per-FQDN realm
rename), default groups sovereign-admins/-ops/-viewers,
oidc-group-membership-mapper emitting "groups" claim, public OIDC client
"kubectl" with localhost:8000 + OOB redirect URIs
(kubectl-oidc-login defaults), publicClient=true (kubectl runs locally and
cannot safely hold a secret), PKCE S256 enforced.
- Bump version 1.1.2 → 1.2.0 (semver MINOR, additive shape).
- Bump bootstrap-kit slot 09 in _template/, omantel.omani.works/,
otech.omani.works/ to version: 1.2.0.
- New chart test tests/oidc-kubectl-client.sh (4 cases) — all green.
- Existing tests/observability-toggle.sh — still green.
Documentation:
- Add §11 "kubectl OIDC for customer admins" runbook to
docs/omantel-handover-wbs.md with one-time workstation setup
(kubectl krew install oidc-login + config set-credentials),
sovereign-admin RBAC binding (oidc:sovereign-admins →
cluster-admin), and 401-debugging table mapping common symptoms to
root causes.
- Carve #326 out of §7 "Out of scope" — it is shipped.
- Add §9 status row.
Validation:
- grep -c 'oidc-issuer-url' infra/hetzner/cloudinit-control-plane.tftpl
→ 2 (comment + the actual flag in the curl line)
- grep -c 'oidc-username-claim' → 2
- helm template platform/keycloak/chart → renders post-install
keycloak-config-cli Job + ConfigMap with kubectl client (3 hits
on grep "kubectl"; 1 hit on "clientId": "kubectl")
- bash scripts/check-vendor-coupling.sh → exit 0 (HARD-FAIL mode)
- 4/4 oidc-kubectl-client gates green; 3/3 observability-toggle
gates green
Out of scope (deferred to follow-up tickets):
- Per-Sovereign user provisioning UI (#322, #323)
- Refresh-token revocation on RoleBinding deletion (#324)
- provider-kubernetes Crossplane ProviderConfig per Sovereign (#321)
- omantel migration / Phase 8 live execution
NO catalyst-api or UI source files touched (those are #319/#322/#323
agents' territories per agent brief).
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
|
||
|
|
c1c5766706
|
docs(wbs): tick 18 — #322 UserAccess CRD released (PR #446, bp-crossplane-claims 1.1.0) (#447)
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com> |
||
|
|
7ea496ba64
|
docs(wbs): tick 17 — Phase 7 + IAM epic #320 dispatched (4 in flight) (#445)
State on main after this commit: - done (27): all minimal Sovereign blueprints + foundation + CI guards + scaffolds + Phase 6 + #317 (handover finalisation server-side) - wip (4): 319 (decommission), 322 (UserAccess CRD), 323 (user-access editor), 326 (kubectl OIDC) Filling capacity while #319 finishes — IAM epic #320 sub-tickets dispatched (322/323/326). #322 unblocks #323; #326 independent. Non-overlap matrix: - 319: core/pool-domain-manager + UI sovereign-decommission + redirect - 322: platform/crossplane-claims/ (CRD + Composition + ClusterRoles) - 323: products/catalyst/bootstrap/api/internal/handler/user_access* + UI admin/user-access - 326: infra/hetzner/cloudinit-control-plane.tftpl + platform/keycloak/chart/ Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com> |
||
|
|
180a687eef
|
feat(catalyst-api): handover finalisation flow (closes #317) (#444)
Ship the server-side machinery for issue #317: zero-Sovereign-footprint retention. When bp-catalyst-platform.Ready=True on the new Sovereign, the wizard / post-install hook calls /api/v1/handover/finalise/{id} and Catalyst-Zero runs the 4-step finalisation:
1. Emit final SSE event (`event: handover, data: {sovereignFqdn, consoleURL, finalisedAt}`) through the existing emitWatchEvent seam; the wizard's reducer picks it up without code change.
2. Cancel the per-deployment helmwatch informer via a new helmwatch.Watcher.Cancel() method that wraps the existing watchCtx cancel func: same teardown path as the timeout branch, no new informer or goroutine.
3. Walk the per-deployment OpenTofu workdir, base64-archive every regular file, POST to the new Sovereign's /api/v1/handover/tofu-archive endpoint. The new Sovereign's catalyst-api seals the blob into its OpenBao at `secret/catalyst/tofu-phase0-archive` (KV-v2). On 200 OK, Catalyst-Zero deletes /var/lib/catalyst/tofu/<sovereign>/.
4. Delete the kubeconfig file + the deployment record JSON.
Receiver endpoint (POST /api/v1/handover/tofu-archive) lives on the same catalyst-api binary; production Sovereigns set CATALYST_OPENBAO_ADDR + CATALYST_OPENBAO_TOKEN and the receiver is active. Catalyst-Zero leaves both unset so a misrouted POST returns 503 ("not handover target") instead of misbehaving.
Hetzner-token rotation (issue body step 4) is deferred to Crossplane Provider rotation per #425; catalyst-api never makes bespoke cloud-API calls (docs/INVIOLABLE-PRINCIPLES.md #3). The operator-supplied Phase-0 token is already GC'd from memory after writeTfvars.
Live execution against a real omantel cluster is deferred to Phase 8 (epic #369, scaffold #429). This PR ships code + tests only.
Anti-duplication audit (canonical seams used):
- internal/handler/handler.go (existing Handler) extended with 3 new fields + 3 setter methods. No new Handler shape.
- internal/handler/deployments.go emitWatchEvent is the SSE emit seam; handover handler reuses it.
- internal/helmwatch/helmwatch.go Watcher gets Cancel(): extends existing struct, no parallel watcher.
- internal/openbao/ is the FIRST and ONLY OpenBao client (verified by grep: no prior internal/vault, internal/secrets/openbao, or similar package existed).
- internal/provisioner provides WorkDir for tofu workdir cleanup.
- internal/store provides Delete(id) for record removal.
- Receiver endpoint lives on the SAME binary; per-deployment file walking via filepath.Walk is stdlib, not a duplicated archive package.
Tests:
- 9 new handler-side cases (handover_test.go): full flow, dry-run, receiver-failure-keeps-local-state, 404, no-OpenBao→503, OpenBao seal, validation errors, archive build, missing-dir empty.
- 4 new openbao package cases (client_test.go): happy path, default mount, status error wrap, required-field validation.
- All existing tests still pass: handler, helmwatch, openbao, provisioner, store, jobs, dynadot, hetzner, k8scache, objectstorage.
WBS row #317 → 🟢 done; DAG class line includes T317.
Out of scope (per ticket guardrails):
- No core/pool-domain-manager changes (#319's territory)
- No products/catalyst/bootstrap/ui changes (decommission UI is #319)
- No SME-namespace touch (ADR-0001 §9.4)
- No live Hetzner / Dynadot / OpenBao calls
- No vendor-name reintroduction; no schedule: cron triggers
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
|
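Step 3 of the finalisation flow above (walk the workdir, base64-encode every regular file) can be illustrated with a shell analogue. The commit's implementation is Go and uses filepath.Walk; the sketch below is not that code. The function name `archive_workdir` and the `path<TAB>base64` record format are assumptions made for illustration, since the log does not describe the real archive layout.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: emit one "relative/path<TAB>base64(content)" line per
# regular file under a workdir. The record format is an assumption; file
# names containing newlines are not handled.
archive_workdir() {
  local workdir="$1"
  # A missing directory yields an empty archive rather than an error.
  [ -d "$workdir" ] || return 0
  ( cd "$workdir" && find . -type f -print | LC_ALL=C sort ) |
    while IFS= read -r rel; do
      rel="${rel#./}"
      # tr strips the line wrapping that some base64 implementations add.
      printf '%s\t%s\n' "$rel" "$(base64 < "$workdir/$rel" | tr -d '\n')"
    done
}

# Demo: two small files in a throwaway directory.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/modules"
printf 'hello' > "$demo_dir/main.tf"
printf 'world' > "$demo_dir/modules/net.tf"
archive_workdir "$demo_dir"
```

Sorting the file list makes the output deterministic, which helps when a test compares one archive build against another.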
||
|
|
5d211fe249
|
docs(wbs): tick 16 — Phase 7 dispatched (#317 + #319 in flight) (#443)
State on main after this commit:
- done (26): all 23 minimal Sovereign blueprints + foundation (425) + CI (428,438) + Phase-8 scaffold (429) + Phase 6 gate (385) + sweeps (430)
- wip (2): 317 (handover finalisation, catalyst-api server-side), 319 (self-decommission UI + PDM release + console redirect)
Phase 6 #385 chart-verified at
|
||
|
|
73dc78a30a
|
feat(bp-catalyst-platform): single-blueprint verification (closes #385) (#442)
Verify bp-catalyst-platform:1.1.8 (the umbrella over 10 leaf bp-* deps: cilium / cert-manager / flux / crossplane / sealed-secrets / spire / nats-jetstream / openbao / keycloak / gitea) installs cleanly. This is Phase 6 of #369 and the convergence point pulling from Phase 3-5 (gitea+keycloak+crossplane+harbor+grafana) and Phase 2a (TLS via the powerdns webhook).
Verification (chart-only, contabo, ~25 min wall time):
* `helm dep build products/catalyst/chart/`: clean, all 10 OCI deps pulled from `oci://ghcr.io/openova-io`.
* `helm template` defaults render 259 docs / 36k+ lines clean: no HTTPRoute (skip-render without `ingress.hosts.console.host`/`api.host` per the #387/#402 if-host-emit pattern), legacy contabo Ingress templates excluded by `.helmignore` on Sovereign installs.
* With per-Sovereign overlay (sovereignFQDN + ingress.hosts.console.host + ingress.hosts.api.host) renders 261 docs incl. 2 HTTPRoutes:
  - catalyst-ui → hostname console.<sov>, backend port 80
  - catalyst-api → hostname api.<sov>, backend port 8080
  both attached to `cilium-gateway/kube-system` parentRef sectionName `https`.
* Server-side dry-run of catalyst-specific resources (api-deployment, api-service, ui-deployment, ui-service, httproute, api-deployments-pvc, api-cache-pvc): all 8 accepted by API server.
* Smoke-install of catalyst-specific manifests in `catalyst-platform-smoke` ns on contabo:
  - catalyst-ui Deployment 1/1 Ready in <30s
  - catalyst-api Deployment 1/1 Ready 18s (after stub `dynadot-api-credentials` + `ghcr-pull-secret` provided)
  - kubelet liveness/readiness HTTP 200 on `/healthz`
  - in-cluster curl http://catalyst-api.catalyst-platform-smoke.svc:8080/healthz → HTTP 200
  - both PVCs (catalyst-api-deployments 1Gi + catalyst-api-cache 5Gi) Bound on local-path StorageClass.
  Smoke torn down clean.
Per-Sovereign overlay drift check
---------------------------------
`clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml` ↔ `omantel.omani.works/` ↔ `otech.omani.works/` differ ONLY in literal ${SOVEREIGN_FQDN} substitution. No drift fix needed (in contrast to #381 grafana, which DID need a `gateway.host` retrofit on overlays).
helmwatch
---------
helmwatch is an in-process Go internal package inside catalyst-api (`products/catalyst/bootstrap/api/internal/helmwatch/`), NOT a separate Deployment. Its readiness is exercised by api-deployment readiness via the catalyst-api `/healthz` probe.
HTTPRoute admission
-------------------
Deferred to a real Sovereign run. contabo runs Traefik for the SME demo (ADR-0001 §9.4 protected) and has no `cilium-gateway` Gateway, so the HTTPRoute parentRef cannot be satisfied here. Phase 8 omantel E2E (#429 scaffold) covers Gateway admission on the live Sovereign.
Sub-chart cluster-scoped CRD installs
-------------------------------------
The umbrella's 10 leaf bp-* deps install cluster-scoped CRDs (bp-cilium ciliumnetworkpolicies, bp-spire ClusterSPIFFEID, bp-cert-manager clusterissuers, bp-cnpg postgresql.cnpg.io, etc.) plus DaemonSets (CNI, spire-agent). On contabo these are owned by the SME demo or unavailable; installing the full umbrella here would either clobber SME (forbidden) or fail on missing CRDs. Per Flux `dependsOn` chain, sub-charts install FIRST on a Sovereign, then bp-catalyst-platform.
Each sub-chart's correctness is independently verified by sibling chart-verify tickets:
- #376 bp-gitea chart-verified
- #377 bp-keycloak chart-verified
- #378 bp-crossplane chart-verified
- #382 bp-spire chart-verified
- #381 bp-grafana chart-verified
- #380 bp-trivy chart-verified
- #379 bp-kyverno chart-verified
- #375 bp-nats-jetstream chart-verified
- #383 bp-harbor chart-released
Vendor-coupling guardrail
-------------------------
`bash scripts/check-vendor-coupling.sh` → exit 0, "no vendor-coupling violations found across 4 scan path(s)".
Files touched
-------------
docs/omantel-handover-wbs.md only:
- §2 row 23: bp-catalyst-platform marked chart-verified
- §9 row #385: parked → 🟢 chart-verified with full verification evidence
- DAG class line: T385 added to the `done` class
No chart edits: the existing 1.1.8 chart renders + smoke-installs clean.
No bootstrap-kit edits: overlays already match template modulo ${SOVEREIGN_FQDN}.
No new files authored (anti-duplication rule).
Sovereign-impact deferred to Phase 7 handover machinery (#317 / #319) and Phase 8 omantel E2E (#429 spec).
Closes #385.
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
|
||
|
|
f740a97aa9
|
docs(wbs): tick 15 — #438 done; #385 sole wip (#441)
State on main after this commit:
- done (25): all minimal Sovereign blueprints + foundation + #438
- wip (1): 385 (catalyst-platform single-blueprint verify, Phase 6 gate)
#438 merged at
|
||
|
|
feeabb63cb
|
docs(wbs): tick 14 — #383 done; #385 + #438 in flight (#439)
State on main after this commit:
- done (24): 316,327,331,338,370,371,373,374,375,376,377,378,379,380,381,382,383,384,387,392,425,428,429,430
- wip (2): 385 (catalyst-platform single-blueprint verify, Phase 6 gate), 438 (CI guardrail path mode-gate fix)
#383 merged at
|
||
|
|
0511efbdac
|
feat(bp-harbor): vendor-agnostic Object Storage backend (closes #383) (#437)
Reworks bp-harbor to write blobs DIRECTLY to the cloud-provider's
native S3 endpoint (Hetzner Object Storage on Hetzner Sovereigns)
per ADR-0001 §13. Mirrors the post-#425 vendor-agnostic seam shipped
in bp-velero:1.2.0 (PR #435 / SHA
|
||
|
|
512639a1aa
|
docs(wbs): tick 13 — #425 done; #383 in flight on new shape (#436)
State on main after this commit:
- done (23): 316,327,331,338,370,371,373,374,375,376,377,378,379,380,381,382,384,387,392,425,428,429,430
- wip (1): 383 (Harbor chart rework on post-#425 vendor-agnostic shape)
#425 merged at
|
||
|
|
0172b9a89a
|
wip(#425): vendor-agnostic OS rename — partial (rate-limited mid-run) (#435)
Files staged from prior agent run before rate-limit. Re-dispatch will
verify, complete missing pieces (Crossplane Provider+ProviderConfig in
cloud-init, grep-zero acceptance, helm/go test runs, WBS row update),
and finalise the PR.
Includes:
- platform/velero/chart/templates/{hetzner-credentials-secret -> objectstorage-credentials}.yaml
- platform/velero/chart/values.yaml (objectStorage.s3.* block)
- platform/velero/chart/Chart.yaml (1.1.0 -> 1.2.0)
- products/catalyst/bootstrap/api/internal/objectstorage/ (NEW package)
- internal/hetzner/objectstorage{,_test}.go DELETED
- credentials handler + StepCredentials.tsx renamed
- infra/hetzner/{main.tf,variables.tf,cloudinit-control-plane.tftpl}
- clusters/{_template,omantel.omani.works,otech.omani.works}/bootstrap-kit/34-velero.yaml
- platform/seaweedfs/* (out-of-scope drift — re-dispatch will revert if not part of #425)
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
|
||
|
|
11afb27e95
|
docs(wbs): tick 12 — #374/#428/#429/#430 done; SCAF subgraph + click directives (#434)
State on main after this commit:
- done (22): 316,327,331,338,370,371,373,374,375,376,377,378,379,380,381,382,384,387,392,428,429,430
- wip (1): 425 (vendor-agnostic OS + Tofu→Crossplane handover)
- blocked (1): 383 (gates on #425)
Adds new SCAF (sustainment/scaffolding/cross-cutting) subgraph carrying T425/T428/T429/T430 + cross-cutting edges: T425→T383, T425→T428, T429→P8.
§9 rows added for #428 (CI guardrail merged) + #430 (audit-only).
T374 moves wip → done after PR #433 (NS-delegation wizard step) merged.
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
|
||
|
|
6e7a878b1c
|
feat(catalyst): NS delegation wizard step (closes #374) (#433)
Adds the post-handover wizard step that delegates the parent zone (e.g. omani.works) to the new Sovereign's PowerDNS, plus a light catalyst-api stub for live execution in Phase 8.
Wizard (UI):
- New StepNSDelegation slotted as terminal post-handover step (after StepSuccess) so the LB IP is in hand before we ask the operator to delegate.
- Default mode: emit-runbook only. Renders the exact set_dns2 curl command with add_dns_to_current_setting=yes (record-preserving) for copy-paste. NEVER embeds the API key; operator exports $DYNADOT_API_KEY in their shell.
- Auto-apply mode: gated behind a toggle + double-confirm field matching the parent zone. Defaults OFF. POSTs to a stub /api/v1/dns/parent-zone/delegate which is 501 today; the wizard surfaces a "Phase 8" hint instead of a generic error.
- Memory rule honoured: NO live set_dns2 call reachable on a normal wizard flow without explicit operator double-confirm.
- 17 new vitest cases (helper + render + auto-apply gating + 501 stub-aware error) all green.
Catalyst-API (Go):
- Extends existing internal/dynadot package (canonical seam: no new package, no PDM source touched).
- New Client.AddNSDelegation(parentZone, sovereignFQDN, lbIP, extraNS) writes 3 NS + 1 glue A record using add_dns_to_current_setting=yes. Fail-closed via IsManagedDomain gate (refuses to call the API for an unmanaged zone).
- New pure BuildNSDelegationRunbook helper that mirrors the JSX-side buildDynadotRunbookCommand so wizard and API emit the same shape.
- 6 new test cases (happy path / unmanaged-zone refusal / table-driven validation / custom NS hosts / runbook builder) all green.
Per ticket #374 scope: wizard step + emitted runbook + light stub; live execution deferred to Phase 8 of the omantel handover WBS. WBS row updated to wizard-shipped state.
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
|
||
|
|
1e7d1e67c9
|
test(e2e): omantel handover Playwright scaffold for Phase 8 (closes #429) (#432)
Phase 8 of the omantel handover (#369) needs an automated E2E that proves DoD: omantel.omani.works runs as a fully self-sufficient Sovereign with zero contabo dependency post-handover. Today this is a SCAFFOLD — when Phase 4/6/7 land, dispatching the new workflow against a live omantel is the entire Phase 8. Canonical seam (anti-duplication, per memory/feedback_anti_duplication_seam_first.md): - tests/e2e/playwright/tests/ ← mirror of sovereign-wizard.spec.ts shape (NOT specs/ as the issue body said — actual repo path is tests/) - tests/e2e/playwright/playwright.config.ts (BASE_URL handling, retries, workers=1, reporter=list) — reused as-is - tests/e2e/playwright/tests/_helpers.ts:reachable() — reused for the pre-flight skip-when-unreachable pattern - .github/workflows/playwright-smoke.yaml — workflow shape (checkout v4, setup-node v4, npm install, playwright install --with-deps chromium, upload-artifact on failure) — mirrored, NOT duplicated What ships: - tests/e2e/playwright/tests/omantel-handover.spec.ts (NEW, 6 tests): 1. sovereign Ready + 23/23 blueprints 2. all bp-* HelmReleases Ready=True 3. catalyst-platform self-hosts (healthz + dashboard "23 / 23 ready") 4. vendor-agnostic Object Storage (post-#425 canonical secret name flux-system/object-storage — NOT hetzner-object-storage) 5. dig +trace omantel.omani.works ends at omantel NS, not contabo 6. zero contabo dependency (omantel /api/healthz keeps returning 200) Self-skips when OMANTEL_BASE_URL/OMANTEL_API_BASE/OPERATOR_BEARER unset. - .github/workflows/omantel-e2e-handover.yaml (NEW): workflow_dispatch ONLY (no schedule cron — per CLAUDE.md "every workflow MUST be event-driven, NEVER scheduled"). Inputs let the operator override base URLs at dispatch time. - docs/omantel-handover-wbs.md: new §10 "Phase 8 acceptance criteria (executable DoD)" — 6 bullets 1:1 with the spec test() blocks; §9 status row added for #429 (🟢 scaffold-shipped). 
Local verification: cd tests/e2e/playwright && npm install && \ npx playwright test --list tests/omantel-handover.spec.ts → 6 tests listed cleanly npx playwright test tests/omantel-handover.spec.ts → 6 skipped (env vars unset, expected) Out of scope (per #425 / #428 territory split): - internal/hetzner/, infra/hetzner/, platform/velero/chart/, clusters/.../34-velero.yaml — #425's vendor-agnostic sweep - .github/workflows/check-vendor-coupling.yaml — #428's coupling guard Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com> |
||
|
|
095433ee55
|
docs(wbs): tick 11 — #331 done, #383 paused on #425, #425 dispatched, §3a vendor-agnostic rule (#427)
State:
- done (18): 316,327,331,338,370,371,373,375,376,377,378,379,380,381,382,384,387,392
- wip (2): 374 (re-dispatching after watchdog kill), 425 (vendor-agnostic rename + Tofu→Crossplane handover)
- blocked (1): 383 (paused on #425; first agent stopped before any commits, so no work lost)
Adds §3a, the vendor-agnostic provider abstraction architecture rule: every cloud-provider capability is consumed by Sovereign blueprints through a capability-named seam (objectStorage, dns, cloud, smtp, tls); the provider name appears only in the infra/<provider>/ Tofu module path + Crossplane Provider CR.
OpenTofu → Crossplane handover formalised: Tofu Phase-0 emits both canonical Secret AND Crossplane Provider+ProviderConfig; Day-2 = XRC writes only.
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
|
||
|
|
92b7db622d
|
fix(bp-external-secrets-stores): split ClusterSecretStore into separate chart per #247 pattern (closes #331) (#426)
* fix(bp-external-secrets): split ClusterSecretStore into bp-external-secrets-stores chart (resolves CRD ordering, closes #331) bp-external-secrets@1.0.0 deadlocked on first install on otech.omani.works: Helm install failed for release external-secrets-system/external-secrets with chart bp-external-secrets@1.0.0: failed post-install: unable to build kubernetes object for deleting hook bp-external-secrets/templates/clustersecretstore-vault-region1.yaml: resource mapping not found for name: "vault-region1" namespace: "" no matches for kind "ClusterSecretStore" in version "external-secrets.io/v1beta1" Root cause: Helm's `helm.sh/hook-delete-policy: before-hook-creation` ran a kubectl-style lookup of the existing ClusterSecretStore CR before the upstream `external-secrets` subchart's CRDs finished registration. The in-line ClusterSecretStore template (templates/clustersecretstore-vault- region1.yaml) and the upstream subchart's CRDs co-installed in the same release; admission ordering wasn't deterministic enough to make the post-install hook safe. Fix — same pattern as PR #247 (bp-crossplane@1.1.3 ↔ bp-crossplane-claims@1.0.0): split the chart into controller + stores. Flux dependsOn orders them. - bp-external-secrets@1.1.0 — controller-only (just upstream subchart + NetworkPolicy + ServiceMonitor toggle). CRDs register here. - bp-external-secrets-stores@1.0.0 (NEW) — the default ClusterSecretStore CR; depends on bp-external-secrets being Ready. No Helm hooks needed: by the time this chart's HelmRelease starts, Flux has already verified bp-external-secrets is Ready=True and therefore the CRDs are registered. 
Files: NEW: platform/external-secrets-stores/blueprint.yaml (1.0.0) NEW: platform/external-secrets-stores/chart/Chart.yaml (1.0.0; no upstream subchart, annotation `catalyst.openova.io/no-upstream: "true"`) NEW: platform/external-secrets-stores/chart/values.yaml (clusterSecretStore.* knobs moved from controller chart) MOVED: platform/external-secrets/chart/templates/clustersecretstore-vault-region1.yaml → platform/external-secrets-stores/chart/templates/clustersecretstore-vault-region1.yaml (Helm hook annotations removed — Flux dependsOn now handles ordering) TOUCHED: platform/external-secrets/chart/Chart.yaml (1.0.0 → 1.1.0; description note appended) TOUCHED: platform/external-secrets/blueprint.yaml (1.0.0 → 1.1.0) TOUCHED: platform/external-secrets/chart/values.yaml (clusterSecretStore block removed; pointer comment added) NEW: clusters/_template/bootstrap-kit/15a-external-secrets-stores.yaml (Flux HelmRelease, dependsOn: [bp-external-secrets, bp-openbao]) TOUCHED: clusters/_template/bootstrap-kit/15-external-secrets.yaml (chart version 1.0.0 → 1.1.0) TOUCHED: clusters/_template/bootstrap-kit/kustomization.yaml (slot 15a inserted after 15) Out of scope for this PR (separate tickets): - blueprint-release.yaml CI fan-out: verify the path-matrix picks up the new platform/external-secrets-stores/ directory automatically; if not, add the directory to the matrix in a follow-up. - Per-Sovereign cluster directory edits (#257 will delete those). - Phase 0 minimum trim (#310 will renumber slots; this PR uses 15a as a non-disruptive sub-slot insertion that works with both the current 35-slot kustomization and the eventual 15-slot canonical layout — when #310 renumbers, 15 + 15a become 08 + 09 in the canonical order). 
Refs: #331 (this issue), #247 (pattern reference — bp-crossplane split), Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(scripts): register bp-external-secrets-stores in expected-bootstrap-deps.yaml The dependency-graph-audit CI step rejected PR #334 because the new bp-external-secrets-stores HR was on disk at slot 15a but missing from the expected DAG. This commit adds it with the same dependsOn shape as clusters/_template/bootstrap-kit/15a-external-secrets-stores.yaml: [bp-external-secrets, bp-openbao]. Refs: #331, #310 (Phase 0 minimum), PR #334. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(bp-external-secrets): retire CR cases from controller test, add stores-toggle (#331) After splitting the default ClusterSecretStore into bp-external-secrets-stores @1.0.0, the controller chart's observability-toggle integration test still expected the CR to render in the controller chart (Cases 4 + 5). Those assertions now belong on the new chart. Changes: - platform/external-secrets/chart/tests/observability-toggle.sh: Replace Cases 4+5 with a single inverted assertion — the controller chart MUST render ZERO ClusterSecretStore CRs (top-level kind:); only the upstream subchart's CRD definition (whose spec.names.kind value is "ClusterSecretStore" at non-zero indent) is allowed. - platform/external-secrets-stores/chart/tests/clustersecretstore-toggle.sh: NEW. Mirrors the retired Cases 4+5 against the stores chart, plus a Case 3 that asserts clusterSecretStore.server overrides propagate. Local smoke: bash platform/external-secrets/chart/tests/observability-toggle.sh → 4/4 PASS bash platform/external-secrets-stores/chart/tests/clustersecretstore-toggle.sh → 3/3 PASS Refs: #331, PR #334. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(scripts): handle alphanumeric sub-slot suffixes in check-bootstrap-deps.sh PR #334 (issue #331) added slot 15a-external-secrets-stores as a sub-slot between numeric slots 15 and 16. The bootstrap-deps audit script's `printf '%02d'` formatter rejected `15a` with: scripts/check-bootstrap-deps.sh: line 390: printf: 15a: invalid number Fix: detect non-numeric slot tokens and pass them through verbatim. Numeric slots still render as zero-padded `01..49` for output alignment. Local smoke: $ bash scripts/check-bootstrap-deps.sh ... [P] slot 15 bp-external-secrets <-- bp-cert-manager bp-openbao [P] slot 15a bp-external-secrets-stores <-- bp-external-secrets bp-openbao ... OK: bootstrap-kit dependency graph audit PASSED Refs: #331, PR #334. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(wbs): tick #331 chart-released bp-external-secrets@1.1.0 (controller-only) + bp-external-secrets-stores@1.0.0 (NEW) shipped in PR #426. Helm-template acceptance + both toggle tests + dependency-graph-audit all green. Sovereign-impact deferred to Phase 8. Refs: #331, PR #426. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com> |
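The sub-slot fix described above (pass non-numeric slot tokens such as `15a` through verbatim, keep zero-padding for numeric ones) can be sketched as follows. `format_slot` is a hypothetical name; the log does not show the actual function or variable names in scripts/check-bootstrap-deps.sh, so this illustrates the technique and is not the script's code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: zero-pad purely numeric slot tokens to two digits and
# pass anything else (e.g. the sub-slot "15a") through unchanged.
format_slot() {
  local slot="$1" n
  case "$slot" in
    ''|*[!0-9]*)
      # Empty or non-numeric token: printf '%02d' would reject it.
      printf '%s' "$slot"
      ;;
    *)
      # Strip leading zeros so a token such as "08" is not read as octal.
      n="${slot#"${slot%%[!0]*}"}"
      printf '%02d' "${n:-0}"
      ;;
  esac
}

for s in 5 15 15a; do
  printf 'slot %s\n' "$(format_slot "$s")"
done
```

The `case` pattern keeps the check in plain POSIX shell, so the sketch does not depend on bash-only regex matching.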
||
|
|
f7796ef807
|
feat(bp-velero): Hetzner Object Storage backend wiring (closes #384) (#423)
* feat(bp-velero): Hetzner Object Storage backend wiring (closes #384) Velero on a Hetzner Sovereign now writes its backups DIRECTLY to Hetzner Object Storage per ADR-0001 §13 (S3-aware app architecture rule) + docs/omantel-handover-wbs.md §3 — NOT SeaweedFS, which is reserved as a POSIX→S3 buffer for legacy POSIX-only writers and is not in the minimal Sovereign set. Mirrors the Hetzner-direct backend pattern Agent #383 is wiring for Harbor; both consume the canonical flux-system/hetzner-object-storage Secret shipped by issue #371 (cloud-init writes 5 keys: s3-endpoint / s3-region / s3-bucket / s3-access-key / s3-secret-key, derived from the operator-issued Hetzner-Console keys + the per-Sovereign bucket provisioned by OpenTofu's aminueza/minio resource). platform/velero/chart/ (umbrella chart, bumped to 1.1.0): - templates/_helpers.tpl: NEW — bp-velero.fullname / bp-velero.labels helpers + bp-velero.hetznerCredentialsSecretName (default `velero-hetzner-credentials`). - templates/hetzner-credentials-secret.yaml: NEW — synthesises a velero-namespace Secret with a single `cloud` key in AWS-CLI INI format from .Values.veleroOverlay.hetzner.s3.{accessKey,secretKey}. The upstream Velero deployment mounts this at /credentials/cloud via existingSecret + AWS_SHARED_CREDENTIALS_FILE. Skip-render path when veleroOverlay.hetzner.enabled is false (default — keeps contabo render clean) or useExistingSecret is true (operator supplied Secret out-of-band). - values.yaml: BSL provider/region/s3Url/bucket fields populated as placeholders the per-Sovereign HelmRelease overrides via Flux valuesFrom; backupsEnabled defaults FALSE so default render emits no half-broken BSL; veleroOverlay.hetzner block surfaces the operator-overridable fields. Long-form rationale comments inline on each value per the chart's existing docstring style. 
clusters/_template/bootstrap-kit/34-velero.yaml (+ omantel + otech): - dependsOn: bp-seaweedfs REMOVED — Velero is no longer a SeaweedFS consumer on Sovereigns (was the old SeaweedFS-tiered architecture that minimal-omantel retired in favour of cloud-native S3). - chart version bumped 1.0.0 → 1.1.0. - valuesFrom block added: 5 Secret-key entries pull each canonical s3-* key into the matching umbrella value path. Plaintext credentials never appear in the committed manifest; Flux dereferences valuesFrom at HelmRelease apply time. - values block adds the baseline veleroOverlay.hetzner.enabled=true + velero.credentials.{useSecret:true,existingSecret:velero-hetzner- credentials} + BSL provider/credential/s3ForcePathStyle scaffolding that the valuesFrom entries fill in. docs/omantel-handover-wbs.md: - §2 row 19: "❌ chart needs S3 endpoint rework" → "🟢 chart-released v1.1.0 — Hetzner Object Storage backend wired to #371 secret". - §9 #384 row: detailed status with smoke evidence. Smoke evidence (contabo, default values — no Hetzner credentials): - helm template t . → renders cleanly (no Hetzner Secret, no BSL). - helm template t . --set veleroOverlay.hetzner.enabled=true \ --set ...accessKey=AK_TEST --set ...secretKey=SK_TEST \ --set velero.backupsEnabled=true (+ BSL config) → Secret/velero-hetzner-credentials with `cloud` INI key emitted + BackupStorageLocation/default with provider=aws, bucket=omantel-velero, region=fsn1, s3Url=https://fsn1.your-objectstorage.com. - helm install velero-smoke . -n velero-smoke (defaults) → pod velero-69bb84c5-669sh Ready 1/1 in 48s. Smoke torn down clean. Hetzner-S3 E2E deferred to Phase 8 (first omantel run) — contabo has no Hetzner Object Storage credentials so end-to-end backup→restore verification can't run here. Anti-duplication rule: NO bash scripts authored, NO parallel implementations of upstream Velero functionality. 
Upstream Velero + velero-plugin-for-aws natively support any S3-compatible backend; the work here is values + a credential-shape adapter Secret, not a fork. Closes #384. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(scripts): drop bp-seaweedfs dep from bp-velero expected DAG (#384) Mirrors the dependsOn removal in clusters/_template/bootstrap-kit/34- velero.yaml from the parent commit. Velero on Hetzner Sovereigns now writes directly to Hetzner Object Storage (ADR-0001 §13 + WBS §3); no in-cluster prerequisite Blueprint is required. Local `bash scripts/check-bootstrap-deps.sh` now passes (0 drift, 0 cycles). The CI failure on the parent commit's PR was the audit flagging bp-velero as having a missing edge to bp-seaweedfs because this expected-DAG file still listed it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
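The credential-shape adapter described above puts a single `cloud` key in AWS-CLI INI format into the Secret. The chart does this in a Helm template; the shell sketch below only illustrates the shape of the rendered value. `render_cloud_credentials` is a hypothetical name, and the `[default]` profile header is an assumption based on the standard AWS shared-credentials layout, not something the log states.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: render an AWS shared-credentials style INI body from an
# access key and a secret key. The [default] profile name is an assumption.
render_cloud_credentials() {
  local access_key="$1" secret_key="$2"
  if [ -z "$access_key" ] || [ -z "$secret_key" ]; then
    echo "access key and secret key are both required" >&2
    return 1
  fi
  printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' \
    "$access_key" "$secret_key"
}

# Demo with the placeholder values used in the smoke evidence above.
render_cloud_credentials AK_TEST SK_TEST
```

Refusing to render when either key is empty mirrors the commit's intent of not emitting a half-configured backend by default.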
||
|
|
a853a653a3
|
docs(wbs): tick 10 — 16 done (incl. #327); #331/#374 dispatched (#424)
Done (16): 316,327,338,370,371,373,375,376,377,378,379,380,381,382,387,392
Wip (4): 331 (ESO split), 374 (NS delegation), 383 (Harbor S3), 384 (Velero S3)
#327 PR merged
|
||
|
|
47898ca59f
|
docs(wbs): tick 9 — 15 done (incl. #382); #383/#384 dispatched (#422)
DAG class lines updated to reflect reality on main:
- done (15): 316,338,370,371,373,375,376,377,378,379,380,381,382,387,392
- wip (2): 383 (Harbor → Hetzner S3 rework), 384 (Velero → Hetzner S3)
§9 status table rows for #383/#384 marked 'in flight' with worktree paths.
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
|
||
|
|
5b6d854837
|
docs(wbs): tick #382 — bp-spire chart-verified (smoke OK on contabo) (#421)
bp-spire:1.1.4 already published on GHCR (32 versions cumulative).
Smoke install in `spire-smoke` ns on contabo:
- server-0 reached 2/2 Ready in ~30s
- agent DaemonSet reached 1/1 Ready in ~70s
- k8s_psat agent attestation succeeded (server log confirms
AttestAgent for spiffe://catalyst.local/spire/agent/k8s_psat/...)
- 3 CRDs (clusterspiffeids/clusterstaticentries/clusterfederated
trustdomains) registered cleanly via spire-crds subchart
- helm template renders 50 resources clean
- Smoke torn down clean
Bootstrap-kit slot 06 wired in `_template/`, `omantel.omani.works/`,
`otech.omani.works/` — overlays clean (only ${SOVEREIGN_FQDN}
substitution diff). dependsOn: bp-cert-manager, disableWait: true.
No code change required — this PR ticks WBS only.
Closes #382
Co-authored-by: hatiyildiz <hatice@openova.io>
|
||
|
|
ab636a64f1
|
docs(wbs): bp-trivy chart-verified on contabo (#380) (#420)
bp-trivy:1.0.0 already published; smoke install on contabo (trivy-smoke ns) reached operator Ready in ~30s, log4shell-vulnerable-app test Deployment yielded VulnerabilityReport with 386 CVEs (15 CRITICAL / 74 HIGH) including the target CVE-2021-44228 (log4shell) on log4j-core 2.14.1 flagged CRITICAL.
Bootstrap-kit slot 30 wired in _template/, omantel.omani.works/, otech.omani.works/. Smoke torn down clean.
Closes #380.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
ef57a28165
|
docs(wbs): #379 bp-kyverno chart-verified — smoke OK on contabo, close as duplicate (#419)
bp-kyverno:1.0.0 (digest sha256:16edc78e…) was already published on GHCR
on 2026-04-30. The chart is correct for the minimal-Sovereign use case —
confirmed via smoke install on contabo.
Smoke evidence:
- helm template renders 80 resources clean (22 CRDs, 4 controller
Deployments, 5 Pods, 6 Services, ServiceAccounts, ClusterRoles, etc.)
- helm install in kyverno-smoke ns: all 4 controllers (admission,
background, cleanup, reports) reached 1/1 Ready in 81s
- ClusterPolicy 'disallow :latest' admission denial verified end-to-end:
- nginx:latest BLOCKED with 'admission webhook "validate.kyverno.svc-fail"
denied the request'
- nginx:1.27-alpine admitted normally
- Smoke torn down clean (release uninstalled, namespaces deleted,
no leftover CRDs)
Bootstrap-kit slot 27-kyverno.yaml is already wired in _template/,
omantel.omani.works/, and otech.omani.works/ — all overlays clean
(only ${SOVEREIGN_FQDN} sovereign-label substitution diff).
WBS §2 row 20 + §9 row #379 updated to chart-verified. Class moves from
wip to done in the §6 Mermaid graph.
Sovereign-impact (running on omantel cluster) deferred to Phase 8 per
ADR-0001 §9.4.
Closes #379
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b3383557eb
|
feat(bp-gitea): chart-verified on contabo (#376) (#417)
bp-gitea:1.1.2 already published; smoke-installed in `gitea-smoke` ns on contabo, both pods Ready in ~2m38s, /api/v1/version returns 1.22.3 (HTTP 200), admin auth verified. Smoke torn down clean. In-scope hygiene fix to clusters/otech.omani.works/bootstrap-kit/10-gitea.yaml — replaces stale upstream `ingress.hosts[]` overlay with the post-#387/#402 `gateway.host` shape so otech matches the _template/ and omantel.omani.works/ overlays. helm-template default-values renders 15 manifests clean (HTTPRoute correctly skip-renders without `gateway.host`). WBS §2 row 13 + §9 row #376 updated to chart-verified. Closes #376. Co-authored-by: hatiyildiz <hatice@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
2913c4f27a
|
feat(bp-grafana): chart-verified — smoke OK on contabo + per-Sovereign overlay drift fix (closes #381) (#416)
bp-grafana 1.0.0 was published by blueprint-release run 25214143810 on
commit
|
||
|
|
1e17668055
|
feat(catalyst): Hetzner Object Storage credential pattern — Phase 0b (#371) (#409)
* feat(catalyst): Hetzner Object Storage credential pattern (Phase 0b, #371)
Adds the per-Sovereign Hetzner Object Storage credential capture + bucket provisioning Phase 0b path described in the omantel handover WBS §5. Hybrid Option A+B: wizard collects operator-issued S3 credentials (Hetzner exposes no Cloud API to mint them — they're issued once in the Hetzner Console and the secret half is shown exactly once), and OpenTofu auto-provisions the per-Sovereign bucket via the aminueza/minio provider + writes a flux-system/hetzner-object-storage Secret into the new Sovereign at cloud-init time so Harbor (#383) and Velero (#384) find their backing-store credentials already in the cluster from Phase 1 onwards.
Extends the EXISTING canonical seam at every layer (per the founder's anti-duplication rule for #371's session): the existing Tofu module at infra/hetzner/, the existing handler/credentials.go validator, the existing provisioner.Request struct, the existing store.Redact path, and the existing wizard StepCredentials. No parallel binaries / scripts / operators introduced.
infra/hetzner/ (Tofu module — Phase 0):
- versions.tf: declare aminueza/minio provider (Hetzner's official recommendation for S3-compatible bucket creation per docs.hetzner.com/storage/object-storage/getting-started/...)
- variables.tf: 4 sensitive vars — region (validated against fsn1/nbg1/hel1, the European-only OS regions as of 2026-04), access_key, secret_key, bucket_name (RFC-compliant S3 naming)
- main.tf: minio_s3_bucket.main resource — idempotent on re-apply, no force_destroy (Velero archive must survive a control-plane reinstall), object_locking=false (content-addressed digests are the immutability guarantee for Harbor; Velero uses S3 versioning)
- cloudinit-control-plane.tftpl: write flux-system/hetzner-object-storage Secret with the canonical s3-endpoint/s3-region/s3-bucket/s3-access-key/s3-secret-key keys Harbor + Velero charts consume via existingSecret refs
- outputs.tf: surface endpoint/region/bucket back to catalyst-api for the deployment record (credentials NEVER returned)
products/catalyst/bootstrap/api/ (Go):
- internal/hetzner/objectstorage.go: NEW — minio-go/v7-based ListBuckets validator. Distinguishes auth failure ("rejected") from network failure ("unreachable") so the wizard renders the right error card. NOT a parallel cloud-resource path — the existing purge.go handles hcloud purge; objectstorage.go handles a separate API surface (S3-compatible) that has no equivalent client today.
- internal/handler/credentials.go: extend with ValidateObjectStorageCredentials handler — same wire shape (200 valid:true / 200 valid:false / 503 unreachable / 400 bad input) as the existing token validator so the wizard's failure-card machinery handles both without per-endpoint switches.
- cmd/api/main.go: wire POST /api/v1/credentials/object-storage/validate
- internal/provisioner/provisioner.go: extend Request with ObjectStorageRegion/AccessKey/SecretKey/Bucket; Validate() rejects empty/malformed values fail-fast at /api/v1/deployments POST time; writeTfvars() emits the 4 new tfvars.
- internal/handler/deployments.go: derive bucket name from FQDN slug pre-Validate (catalyst-<fqdn-with-dots-replaced-by-dashes>) so Hetzner's globally-namespaced bucket pool gets a deterministic, collision-resistant per-Sovereign name without operator input.
- internal/store/store.go: redact access/secret keys; preserve region+bucket plain (they're public in tofu outputs anyway).
products/catalyst/bootstrap/ui/ (TypeScript / React):
- entities/deployment/model.ts + store.ts: 4 new wizard fields (objectStorageRegion/AccessKey/SecretKey/Validated) with merge() coercion for legacy persisted state.
- pages/wizard/steps/StepCredentials.tsx: ObjectStorageSection — region picker (fsn1/nbg1/hel1), masked secret-key input, Validate button gating Next. Same FailureCard taxonomy (rejected/too-short/unreachable/network/parse/http) the existing TokenSection uses, so the operator UX is consistent. Section only renders when Hetzner is among chosen providers — non-Hetzner Sovereigns skip Phase 0b until their own backing-store path lands.
- pages/wizard/steps/StepReview.tsx: include objectStorageRegion/AccessKey/SecretKey in the POST /v1/deployments payload (bucket derived server-side).
Tests:
- api: 7 new provisioner Validate tests (region/keys/bucket required + RFC-compliant + valid-region acceptance), 5 handler tests for the new endpoint (bad JSON / missing region / invalid region / short keys), 4 hetzner/objectstorage_test.go tests (endpoint composition + early input rejection), 1 handler test for the bucket-name derivation. Existing tests updated to supply the new required fields.
- ui: StepCredentials.test.tsx pre-populates objectStorageValidated in beforeEach so the existing 11 SSH-section tests aren't gated on Object Storage validation.
DoD: a fresh Sovereign provision results in a usable S3 endpoint URL + access/secret keys available as a K8s Secret in the Sovereign's home cluster (flux-system/hetzner-object-storage), ready for consumption by Harbor + Velero charts via existingSecret references.
Closes #371.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(wbs): #371 done — Hetzner Object Storage Phase 0b shipped (#409)
Marks #371 done with the architectural rationale (hybrid Option A + B — Hetzner exposes no Cloud API to mint S3 keys, so the wizard MUST capture them; OpenTofu auto-provisions the bucket + cloud-init writes the flux-system/hetzner-object-storage Secret with the canonical s3-* keys Harbor + Velero consume).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
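The bucket-name derivation described in the commit above (`catalyst-<fqdn-with-dots-replaced-by-dashes>`) might look roughly like the following. This is a minimal sketch: `bucketNameForFQDN` is a hypothetical name, and the actual code in `internal/handler/deployments.go` is not shown in this log and may normalise its input further.

```go
package main

import (
	"fmt"
	"strings"
)

// bucketNameForFQDN builds a deterministic per-Sovereign bucket name by
// replacing the dots in the FQDN with dashes and adding a fixed prefix.
// Hypothetical helper; only the naming scheme comes from the commit message.
func bucketNameForFQDN(fqdn string) string {
	return "catalyst-" + strings.ReplaceAll(fqdn, ".", "-")
}

func main() {
	fmt.Println(bucketNameForFQDN("omantel.omani.works"))
}
```

Because the name is a pure function of the FQDN, re-provisioning the same Sovereign yields the same bucket, which fits the idempotent-on-re-apply behaviour the commit describes for the Tofu resource.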
||
|
|
1cbd759e0f
|
docs(wbs): tick 7 — §2 prose updated (#316 + #375 chart-released); #379 RESTART after watchdog kill (#415)
Bursty completion: #316 + #375 prose rows now reflect chart-released state (was stale from earlier 'not deployed'). #379 first agent watchdog-killed (no work survived) — restarted with tighter STAY-TIGHT brief modeled on the successful #378/#377/#375 patterns (5-15 min wall time, smoke + close as duplicate if chart already published). In flight (5): #371 #376 #379-RESTART #380 #381 Co-authored-by: hatiyildiz <hati@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
8695ab82c5
|
docs(wbs): tick #316 chart-released — bp-openbao 1.2.0 (auto-unseal) (#414)
PR #408 merged at
|
||
|
|
38e6a2a528
|
docs(wbs): tick 6 — 9 done; #380 dispatched to maintain 5 parallel (#413)
Done (9): #316 #338 #370 #373 #375 #377 #378 #387 #392 In flight (5): #371 #376 #379 #380 #381 Bursty completion window — #316 #373 #375 #377 #378 all landed within ~10 min. Sovereign-impact for chart-released/chart-verified items deferred to Phase 8. Co-authored-by: hatiyildiz <hati@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
d2ada908c9
|
feat(bp-openbao): auto-unseal flow — cloud-init seed + post-install init Job (closes #316) (#408)
Catalyst-curated auto-unseal pipeline for OpenBao on Hetzner Sovereigns
(no managed-KMS available). Selected **Option A — Shamir + cloud-init
seed** because:
- Hetzner has no managed-KMS service → Cloud-KMS auto-unseal (Option C)
is structurally unavailable.
- Transit-seal (Option B) requires a peer OpenBao cluster, only
applicable to multi-region tier-1; out of scope for single-region
omantel.
- Manual unseal (Option D) violates the "first sovereign-admin lands
on console.<sovereign-fqdn> ready to use" goal in
SOVEREIGN-PROVISIONING.md §5.
Architecture (per issue #316 spec + acceptance criteria 1-6):
1. Cloud-init on the control-plane node generates a 32-byte recovery
seed from /dev/urandom and writes it to a single-use K8s Secret
`openbao-recovery-seed` in the openbao namespace, with annotation
`openbao.openova.io/single-use: "true"`. Pre-creates the openbao
namespace to eliminate the race with Flux's HelmRelease apply.
2. bp-openbao chart v1.2.0 ships two new Helm post-install hooks:
- `templates/init-job.yaml` (hook weight 5): consumes the seed,
calls `bao operator init -recovery-shares=1 -recovery-threshold=1`,
persists the recovery key inside OpenBao's auto-unseal config,
deletes the seed Secret on success. Idempotent — re-runs detect
Initialized=true and exit 0.
- `templates/auth-bootstrap-job.yaml` (hook weight 10): enables
the Kubernetes auth method, mounts kv-v2 at `secret/`, writes
the `external-secrets-read` policy, binds the `external-secrets`
role to the ESO ServiceAccount in `external-secrets-system`.
3. `templates/auto-unseal-rbac.yaml` declares the least-privilege SA
+ Role + RoleBinding the Jobs need (Secret get/list/delete in the
openbao namespace; create/get/patch on the openbao-init-marker).
Also emits the permanent `system:auth-delegator` ClusterRoleBinding
bound to the OpenBao ServiceAccount so the Kubernetes auth method
can call tokenreviews.authentication.k8s.io.
4. Cluster overlay `clusters/_template/bootstrap-kit/08-openbao.yaml`
bumps version 1.1.1 → 1.2.0 and flips `autoUnseal.enabled: true`
per-Sovereign.
Per #402 lesson: skip-render pattern (`{{- if .Values.X }}{{ emit }}
{{- end }}`) used throughout — never `{{ fail }}`. Default `helm
template` render emits NOTHING new; opt-in via autoUnseal.enabled=true.
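Helm templates are rendered with Go's `text/template`, so the skip-render pattern can be demonstrated outside Helm. The sketch below assumes a one-line stand-in template (`jobTemplate` is hypothetical, not the chart's real `init-job.yaml`) and supplies the default value explicitly, as a chart's values.yaml would.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// jobTemplate is a hypothetical stand-in for a chart template that is
// guarded by an opt-in value instead of calling fail.
const jobTemplate = `{{- if .Values.autoUnseal.enabled }}
kind: Job
{{- end }}
`

// render executes the template with autoUnseal.enabled set as given and
// returns the output with surrounding whitespace removed.
func render(enabled bool) (string, error) {
	t, err := template.New("init-job").Parse(jobTemplate)
	if err != nil {
		return "", err
	}
	values := map[string]any{
		"Values": map[string]any{
			"autoUnseal": map[string]any{"enabled": enabled},
		},
	}
	var b strings.Builder
	if err := t.Execute(&b, values); err != nil {
		return "", err
	}
	return strings.TrimSpace(b.String()), nil
}

func main() {
	for _, enabled := range []bool{false, true} {
		out, err := render(enabled)
		if err != nil {
			panic(err)
		}
		fmt.Printf("enabled=%v -> %q\n", enabled, out)
	}
}
```

With the value off, the template yields an empty document rather than an error, which is what lets a default `helm template` succeed.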
Acceptance criteria coverage:
1. Provision fresh Sovereign — cloud-init writes seed, Flux installs
bp-openbao 1.2.0, post-install Jobs run automatically. ✅
2. bp-openbao HR Ready=True without manual intervention — install
keeps `disableWait: true` (Helm Ready ≠ OpenBao initialised; the
init Job drives initialisation out-of-band on the same install). ✅
3. `bao status` shows Sealed=false, Initialized=true within 5 minutes
— init Job polls + retries up to 60×5s. ✅
4. ESO ClusterSecretStore vault-region1 reaches Status: Valid — the
auth-bootstrap Job binds the `external-secrets` role to ESO's SA
before the Job exits. ✅
5. Seed Secret deleted post-init — init Job deletes it via K8s API
after consuming. ✅
6. No openbao-root-token Secret in K8s — root token captured to
/tmp/.root-token in the Job pod's tmpfs only; never written to a
K8s Secret. The recovery key persists ONLY inside OpenBao's Raft
state (auto-unseal config). ✅
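The retry budget in criterion 3 (60 attempts at 5s intervals, about 300s or 5 minutes) can be sketched as a bounded polling loop. The real init Job's implementation is not shown in this log; `pollUntil` and the simulated check below are hypothetical, and the demo uses a short interval so it finishes quickly.

```go
package main

import (
	"fmt"
	"time"
)

// pollUntil calls check up to attempts times, sleeping interval between
// failed tries, and returns nil as soon as check reports success.
func pollUntil(attempts int, interval time.Duration, check func() (bool, error)) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		ok, err := check()
		if err == nil && ok {
			return nil
		}
		lastErr = err
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return fmt.Errorf("not ready after %d attempts (last error: %v)", attempts, lastErr)
}

func main() {
	calls := 0
	// Simulated status check that reports ready on the third call. A real
	// Job would query the server's seal status here instead.
	ready := func() (bool, error) {
		calls++
		return calls >= 3, nil
	}
	// The commit's budget would be pollUntil(60, 5*time.Second, ready).
	if err := pollUntil(60, 10*time.Millisecond, ready); err != nil {
		panic(err)
	}
	fmt.Printf("ready after %d checks\n", calls)
}
```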
Tests:
- tests/auto-unseal-toggle.sh — 4 cases:
* default render → no auto-unseal artefacts (skip-render works)
* autoUnseal.enabled=true → both Jobs + correct hook weights
* kubernetesAuth.enabled=false → init Job only, no auth-bootstrap
* idempotency annotations present on all 5 hook objects
- tests/observability-toggle.sh — unchanged, all 3 cases green.
- helm lint . — clean.
Files:
- platform/openbao/chart/Chart.yaml — version 1.1.1 → 1.2.0
- platform/openbao/blueprint.yaml — version 1.1.1 → 1.2.0
- platform/openbao/chart/values.yaml — `autoUnseal.*` block
- platform/openbao/chart/templates/auto-unseal-rbac.yaml — new
- platform/openbao/chart/templates/init-job.yaml — new
- platform/openbao/chart/templates/auth-bootstrap-job.yaml — new
- platform/openbao/chart/tests/auto-unseal-toggle.sh — new
- platform/openbao/README.md — bootstrap procedure §2-3 expanded;
auto-unseal alternatives table added.
- clusters/_template/bootstrap-kit/08-openbao.yaml — chart 1.1.1 →
1.2.0, autoUnseal.enabled=true.
- infra/hetzner/cloudinit-control-plane.tftpl — seed-token block
inserted between ghcr-pull-secret apply and flux-bootstrap apply.
- docs/omantel-handover-wbs.md §9 — #316 ticked chart-released.
Canonical seam used: extended existing `platform/openbao/chart/` per
the anti-duplication rule. NO standalone scripts. NO bespoke Go cloud
calls. NO `{{ fail }}`. All knobs configurable via values.yaml per
INVIOLABLE-PRINCIPLES.md #4 (never hardcode).
Co-authored-by: hatiyildiz <hat.yil@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
74d232538a
|
docs(wbs): #375 bp-nats-jetstream chart-verified — smoke OK, close as duplicate (#411)
bp-nats-jetstream:1.1.1 already published on GHCR. Helm template renders 8 kinds clean (StatefulSet replicas=3 per ADR-0001 §9.2 B5). Smoke install on contabo `nats-smoke` ns reached 3/3 Ready in 33s; JetStream R=3 stream created with leader+2 replica quorum; pub/sub round-trip verified. Bootstrap-kit slot 07 already wired in `_template/`. No code change needed. Same verify-and-close pattern as #378. Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
04308af7e9
|
feat(cert-manager): bp-cert-manager-powerdns-webhook (#373) (#410)
Authors a Catalyst Blueprint for the cert-manager DNS-01 external webhook backed by PowerDNS, for post-handover wildcard TLS issuance against the Sovereign's OWN PowerDNS — eliminating the last reachback to openova-controlled Dynadot credentials per ADR-0001 §9.4.
Structure mirrors bp-cert-manager-dynadot-webhook (canonical seam):
- platform/cert-manager-powerdns-webhook/blueprint.yaml — Blueprint CR with depends: [bp-cert-manager, bp-powerdns]
- platform/cert-manager-powerdns-webhook/chart/Chart.yaml — wraps upstream zachomedia/cert-manager-webhook-pdns v2.5.5 (chart 3.2.5); declares the sigstore/common stub dep to satisfy the hollow-chart guard (#181)
- chart/templates/ — 8 templates (Deployment, Service, APIService, RBAC, selfSigned/CA Issuer + serving Certificate, ServiceAccount, ClusterIssuer)
- ClusterIssuer (letsencrypt-dns01-prod-powerdns) ships with the chart, paired with the webhook's solver. Gated behind clusterIssuer.enabled AND powerdns.host (skip-render pattern, lesson from #387 follow-up #402 — never use {{ fail }})
Bootstrap-kit slot:
- clusters/_template/bootstrap-kit/36-bp-cert-manager-powerdns-webhook.yaml wires the HelmRelease to the per-Sovereign in-cluster PowerDNS endpoint (http://powerdns.powerdns:8081) and flips clusterIssuer.enabled=true.
- ${SOVEREIGN_FQDN} envsubst keeps the slot operator-overridable per Inviolable Principle #4.
Contabo bootstrap path does NOT include this template — contabo stays on legacy http01 + Traefik per ADR-0001 §9.4.
Helm-template verification:
helm template t platform/cert-manager-powerdns-webhook/chart/
→ 14 resources, 0 ClusterIssuer (skip-render works)
helm template t platform/cert-manager-powerdns-webhook/chart/ \
--set powerdns.host=http://powerdns.test:8081 \
--set clusterIssuer.enabled=true \
--set powerdns.apiKeySecretRef.name=fake
→ 15 resources incl. ClusterIssuer with PowerDNS solver config
Both renders parse cleanly through python yaml.safe_load_all.
Updates docs/omantel-handover-wbs.md §2 row 4 + §9 row #373 to chart-released. Sovereign-impact deferred to Phase 8 (handover E2E).
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|