openova

Author	SHA1	Message	Date
e3mrah	8cde771c0f	fix(bp-openbao): unseal on idempotent path + persist keys (Closes #539) (#540) PR #528 added unseal logic but only on the FRESH-init branch. When a previous Job pod completed `bao operator init` but exited before the unseal block (or when openbao-0 simply restarts under shamir seal), the next reconcile takes the "already initialized" branch and exits without ever running `bao operator unseal`. Symptom on otech21: init-job logs end with `auto-unseal init complete`, but `bao status` reports Initialized=true Sealed=true forever, the bp-openbao HR stays Unknown/Running for the full 15m install timeout, and bp-external-secrets/bp-external-secrets-stores block on the dep. Fix has two parts: 1. Persist `unseal_keys_b64` on fresh init to a new K8s Secret `openbao-unseal-keys` (BEFORE applying the keys, so an unseal crash mid-step is recoverable on next retry). 2. Add a Step 2a "idempotent-path unseal" branch: when bao reports Initialized=true Sealed=true, fetch the persisted keys Secret and apply unseal exactly the same way Step 3a does on fresh init. Verify Sealed=false and exit; otherwise FATAL with the manual-recovery pointer. RBAC: extend the openbao-auto-unseal Role to allow create/get/patch/update on openbao-unseal-keys (alongside openbao-init-marker). Chart bump 1.2.3 → 1.2.4. HR ref in clusters/_template/bootstrap-kit/08-openbao.yaml updated to match so cloud-init-templated Sovereigns pick up the new chart. Co-authored-by: e3mrah <emrah.baysal@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:44:46 +04:00
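The branch selection described in commit 8cde771c0f can be sketched as a small decision function. This is a minimal Python illustration, not the chart's actual shell script: the function name `next_action`, the returned labels, and the `keys_secret_present` flag are hypothetical, and the sketch assumes the `bao status -format=json` output exposes `initialized` and `sealed` booleans.

```python
import json


def next_action(status_json: str, keys_secret_present: bool) -> str:
    """Pick the branch an init Job could take from `bao status -format=json` output.

    Labels are illustrative only:
      fresh-init        -> run init, persist keys, then unseal
      idempotent-unseal -> already initialized but sealed; unseal from the persisted Secret
      fatal             -> sealed and no persisted keys; manual recovery needed
      already-unsealed  -> nothing to do
    """
    status = json.loads(status_json)
    if not status.get("initialized", False):
        return "fresh-init"
    if status.get("sealed", True):
        return "idempotent-unseal" if keys_secret_present else "fatal"
    return "already-unsealed"
```

The case the commit fixes is the second one: before the change, an initialized-but-sealed instance fell through to a successful exit instead of being unsealed.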
e3mrah	d90abb1e85	fix(bp-openbao): unseal vault after init in chart Job (Closes #527) (#528) The init Job ran `bao operator init -key-shares=1 -key-threshold=1` which leaves the cluster Initialized=true but Sealed=true. Without an explicit `bao operator unseal <key>` call the StatefulSet pod stays sealed forever, the bp-openbao HelmRelease never reports Ready=True, and every dependent blueprint (bp-external-secrets, bp-external-secrets-stores) blocks on this dep. This was the 5th and final latent bug in the chart's auto-unseal flow (after PRs #518 #520 #523 #524 #525). On otech17 (6b17518f12d529ea, 2026-05-02) the init Job completed cleanly but `bao status` reported Sealed=true forever. Fix: parse `unseal_threshold` and `unseal_keys_b64` from the init JSON, call `bao operator unseal <key>` $threshold times (1 with the current key-shares=1 / key-threshold=1 config), then assert `bao status -format=json | grep '"sealed":false'` before the Job exits success. Bumps chart 1.2.2 -> 1.2.3 and HR ref in clusters/_template/bootstrap-kit/08-openbao.yaml. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 09:24:57 +04:00
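The parsing step named in commit d90abb1e85 (read `unseal_threshold` and `unseal_keys_b64` from the init JSON, then unseal that many times) might look like the following in Python. This is a sketch under assumptions: the chart's Job does this in shell, the helper names `keys_to_apply` and `unseal_commands` are hypothetical, and the sample key in the usage note is a placeholder, not real key material.

```python
import json
from typing import List


def keys_to_apply(init_json: str) -> List[str]:
    """Return the first `unseal_threshold` keys from the init output."""
    init = json.loads(init_json)
    threshold = int(init["unseal_threshold"])
    keys = init["unseal_keys_b64"]
    if threshold < 1 or len(keys) < threshold:
        raise ValueError("init output does not contain enough unseal keys")
    return keys[:threshold]


def unseal_commands(init_json: str) -> List[List[str]]:
    """Build one `bao operator unseal <key>` argv per required key."""
    return [["bao", "operator", "unseal", key] for key in keys_to_apply(init_json)]
```

With `-key-shares=1 -key-threshold=1`, an input such as `{"unseal_keys_b64": ["a2V5"], "unseal_threshold": 1}` yields a single command.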
e3mrah	ba5a1929f1	fix(bp-openbao): use shamir-compatible init flags + bump 1.2.1→1.2.2 (refs #517) (#525) The chart's init Job called `bao operator init -recovery-shares=1 -recovery-threshold=1` which only works with auto-unseal seal types (gcpckms/awskms/transit). The upstream openbao chart's default config uses `seal "shamir"` (no auto-unseal stanza in values.standalone.config / values.ha.config), so the OpenBao API returns 400: "parameters recovery_shares,recovery_threshold not applicable to seal type shamir". Switch to -key-shares=1 -key-threshold=1, which are the correct shamir-seal init flags. Operators wiring auto-unseal seals later will need to flip back via a chart-values toggle. Bumps chart 1.2.1→1.2.2 + matches HR ref so Sovereigns pull the new artifact on next reconcile. Refs #517 Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 09:14:05 +04:00
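The flag mismatch fixed in commit ba5a1929f1 comes down to choosing init flags by seal type. A hedged Python sketch of that choice follows; the function name `init_flags` and the idea of passing the seal type as a string are assumptions for illustration (the commit only says a chart-values toggle would be needed later), and the set of auto-unseal seals is limited to the three the commit message names.

```python
from typing import List

# Seal types the commit message lists as auto-unseal capable.
AUTO_UNSEAL_SEALS = {"gcpckms", "awskms", "transit"}


def init_flags(seal_type: str, shares: int = 1, threshold: int = 1) -> List[str]:
    """Return the `bao operator init` share/threshold flags for a seal type."""
    if seal_type == "shamir":
        # Shamir seals split the unseal key itself.
        return [f"-key-shares={shares}", f"-key-threshold={threshold}"]
    if seal_type in AUTO_UNSEAL_SEALS:
        # Auto-unseal seals use recovery keys instead of unseal keys.
        return [f"-recovery-shares={shares}", f"-recovery-threshold={threshold}"]
    raise ValueError(f"unknown seal type: {seal_type}")
```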
e3mrah	6e3d3d281e	fix(bp-openbao): bump chart 1.2.0→1.2.1 + HR ref for busybox-wget fix (refs #517) (#524) Bumps platform/openbao/chart/Chart.yaml version to 1.2.1 carrying the busybox-compatible wget flag fix (PR #523). Also bumps the HR's chart.spec.version in clusters/_template/bootstrap-kit/08-openbao.yaml so Sovereigns pull the new bytes once blueprint-release publishes ghcr.io/openova-io/bp-openbao:1.2.1. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 09:09:06 +04:00
e3mrah	5c0618d920	fix(bp-openbao): use busybox-compatible wget flag in init Job (refs #517) (#523) The chart's init Job runs inside the openbao image (quay.io/openbao/openbao:2.1.0) which uses busybox wget. The script's wget calls used `--ca-certificate=$CACERT` which busybox wget does not support, causing wget to print its usage page and fail with "seed Secret has no key recovery-seed" (a false negative: the parsing pipeline saw the usage text instead of JSON). Replace with `--no-check-certificate`. The Secret still requires the Bearer token for auth; the lack of CA verification only affects TLS handshake validation against an in-cluster API server reached via the well-known kubernetes.default.svc DNS name (out-of-band attack surface is negligible inside the pod network). The `--method=DELETE` line for cleaning up the seed Secret remains. busybox wget doesn't support method override either, but that line is wrapped in `|| true` so the seed deletion failure doesn't block the init Job from succeeding. Seed is single-use anyway and harmless post-init (the recovery key is the OUTPUT of bao operator init, not this seed). Refs #517 Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 09:07:52 +04:00
e3mrah	d2ada908c9	feat(bp-openbao): auto-unseal flow — cloud-init seed + post-install init Job (closes #316 ) (#408 ) Catalyst-curated auto-unseal pipeline for OpenBao on Hetzner Sovereigns (no managed-KMS available). Selected Option A — Shamir + cloud-init seed because: - Hetzner has no managed-KMS service → Cloud-KMS auto-unseal (Option C) is structurally unavailable. - Transit-seal (Option B) requires a peer OpenBao cluster, only applicable to multi-region tier-1; out of scope for single-region omantel. - Manual unseal (Option D) violates the "first sovereign-admin lands on console.<sovereign-fqdn> ready to use" goal in SOVEREIGN-PROVISIONING.md §5. Architecture (per issue #316 spec + acceptance criteria 1-6): 1. Cloud-init on the control-plane node generates a 32-byte recovery seed from /dev/urandom and writes it to a single-use K8s Secret `openbao-recovery-seed` in the openbao namespace, with annotation `openbao.openova.io/single-use: "true"`. Pre-creates the openbao namespace to eliminate the race with Flux's HelmRelease apply. 2. bp-openbao chart v1.2.0 ships two new Helm post-install hooks: - `templates/init-job.yaml` (hook weight 5): consumes the seed, calls `bao operator init -recovery-shares=1 -recovery-threshold=1`, persists the recovery key inside OpenBao's auto-unseal config, deletes the seed Secret on success. Idempotent — re-runs detect Initialized=true and exit 0. - `templates/auth-bootstrap-job.yaml` (hook weight 10): enables the Kubernetes auth method, mounts kv-v2 at `secret/`, writes the `external-secrets-read` policy, binds the `external-secrets` role to the ESO ServiceAccount in `external-secrets-system`. 3. `templates/auto-unseal-rbac.yaml` declares the least-privilege SA + Role + RoleBinding the Jobs need (Secret get/list/delete in the openbao namespace; create/get/patch on the openbao-init-marker). 
Also emits the permanent `system:auth-delegator` ClusterRoleBinding bound to the OpenBao ServiceAccount so the Kubernetes auth method can call tokenreviews.authentication.k8s.io. 4. Cluster overlay `clusters/_template/bootstrap-kit/08-openbao.yaml` bumps version 1.1.1 → 1.2.0 and flips `autoUnseal.enabled: true` per-Sovereign. Per #402 lesson: skip-render pattern (`{{- if .Values.X }}{{ emit }} {{- end }}`) used throughout — never `{{ fail }}`. Default `helm template` render emits NOTHING new; opt-in via autoUnseal.enabled=true. Acceptance criteria coverage: 1. Provision fresh Sovereign — cloud-init writes seed, Flux installs bp-openbao 1.2.0, post-install Jobs run automatically. ✅ 2. bp-openbao HR Ready=True without manual intervention — install keeps `disableWait: true` (Helm Ready ≠ OpenBao initialised; the init Job drives initialisation out-of-band on the same install). ✅ 3. `bao status` shows Sealed=false, Initialized=true within 5 minutes — init Job polls + retries up to 60×5s. ✅ 4. ESO ClusterSecretStore vault-region1 reaches Status: Valid — the auth-bootstrap Job binds the `external-secrets` role to ESO's SA before the Job exits. ✅ 5. Seed Secret deleted post-init — init Job deletes it via K8s API after consuming. ✅ 6. No openbao-root-token Secret in K8s — root token captured to /tmp/.root-token in the Job pod's tmpfs only; never written to a K8s Secret. The recovery key persists ONLY inside OpenBao's Raft state (auto-unseal config). ✅ Tests: - tests/auto-unseal-toggle.sh — 4 cases: * default render → no auto-unseal artefacts (skip-render works) * autoUnseal.enabled=true → both Jobs + correct hook weights * kubernetesAuth.enabled=false → init Job only, no auth-bootstrap * idempotency annotations present on all 5 hook objects - tests/observability-toggle.sh — unchanged, all 3 cases green. - helm lint . — clean. 
Files: - platform/openbao/chart/Chart.yaml — version 1.1.1 → 1.2.0 - platform/openbao/blueprint.yaml — version 1.1.1 → 1.2.0 - platform/openbao/chart/values.yaml — `autoUnseal.*` block - platform/openbao/chart/templates/auto-unseal-rbac.yaml — new - platform/openbao/chart/templates/init-job.yaml — new - platform/openbao/chart/templates/auth-bootstrap-job.yaml — new - platform/openbao/chart/tests/auto-unseal-toggle.sh — new - platform/openbao/README.md — bootstrap procedure §2-3 expanded; auto-unseal alternatives table added. - clusters/_template/bootstrap-kit/08-openbao.yaml — chart 1.1.1 → 1.2.0, autoUnseal.enabled=true. - infra/hetzner/cloudinit-control-plane.tftpl — seed-token block inserted between ghcr-pull-secret apply and flux-bootstrap apply. - docs/omantel-handover-wbs.md §9 — #316 ticked chart-released. Canonical seam used: extended existing `platform/openbao/chart/` per the anti-duplication rule. NO standalone scripts. NO bespoke Go cloud calls. NO `{{ fail }}`. All knobs configurable via values.yaml per INVIOLABLE-PRINCIPLES.md #4 (never hardcode). Co-authored-by: hatiyildiz <hat.yil@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:45:44 +04:00
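Acceptance criterion 3 of commit d2ada908c9 says the init Job "polls + retries up to 60×5s". A bounded poll loop of that shape could be sketched in Python as below. The helper name `wait_until` and the injectable `sleep` parameter are hypothetical and exist only to make the sketch testable; the real Job is a shell script and may structure its loop differently.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    attempts: int = 60,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay_s` between tries.

    Returns True as soon as `check` succeeds, False once the budget is spent.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            # No sleep after the final failed attempt.
            sleep(delay_s)
    return False
```

With the defaults this gives a worst-case wait of just under five minutes (59 sleeps of 5 s), which lines up with the "within 5 minutes" target in the same criterion.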
e3mrah	a1bd550208	fix(charts): HTTPRoute templates skip-render on missing host (was failing default-values render) (#402 ) Blueprint-release for #401 failed because HTTPRoute templates use {{- fail }} when gateway.host is not set, which trips the chart default-values render gate in CI. Switched 6 templates from 'fail loud' to 'skip render': if .Values.gateway.host → emit HTTPRoute else → emit nothing The Gateway API admission already rejects HTTPRoute with empty hostnames, so the loud-fail wasn't buying anything an operator wouldn't see at apply time. Default-values render now produces zero HTTPRoute resources, which is the correct shape for the upstream chart consumers that don't set the Sovereign-only gateway block. Files: keycloak, gitea, openbao, grafana, harbor, catalyst-platform. Verified: helm template t products/catalyst/chart/ → 0 HTTPRoutes (clean) helm template t products/catalyst/chart/ --set ingress.gateway.enabled=true --set ingress.hosts.console.host=console.test --set ingress.hosts.api.host=api.test → 2 HTTPRoutes Closes the blueprint-release failure on commit `abf01b6f`. Co-authored-by: hatiyildiz <hati@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:23:58 +04:00
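The "skip render" behaviour adopted in commit a1bd550208 (emit an HTTPRoute only when a host is set, otherwise emit nothing) is implemented in Helm templates. The Python function below only mimics that control flow for illustration; `render_httproute`, its parameters, and the abbreviated manifest it returns are all hypothetical and omit fields a real HTTPRoute would carry.

```python
from typing import Any, Dict


def render_httproute(values: Dict[str, Any], name: str = "example", port: int = 8200) -> str:
    """Return an abbreviated HTTPRoute manifest, or "" when no host is configured."""
    host = (values.get("gateway") or {}).get("host")
    if not host:
        # Skip render: a default-values render produces no resource at all,
        # instead of aborting the whole chart the way a fail call would.
        return ""
    return "\n".join([
        "apiVersion: gateway.networking.k8s.io/v1",
        "kind: HTTPRoute",
        "metadata:",
        f"  name: {name}",
        "spec:",
        "  hostnames:",
        f"    - {host}",
        "  rules:",
        "    - backendRefs:",
        f"        - name: {name}",
        f"          port: {port}",
    ])
```

Rendering with empty values returns an empty string, which corresponds to the "0 HTTPRoutes (clean)" result the commit reports for the default render.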
e3mrah	abf01b6f21	feat(platform): Gateway API migration audit (#387) (#401) Migrates every minimal-Sovereign-set blueprint chart from networking.k8s.io/v1.Ingress to gateway.networking.k8s.io/v1.HTTPRoute, replacing the legacy Traefik-on-Sovereigns assumption with the canonical Cilium + Envoy + Gateway API path per ADR-0001 §9.4 and the WBS §2 correction note (#388). The single per-Sovereign Gateway is added as additional documents in the existing bootstrap-kit slot clusters/_template/bootstrap-kit/01-cilium.yaml (NOT a new top-level slot), since Cilium owns the GatewayClass. It includes: - Certificate `sovereign-wildcard-tls` requesting `*.${SOVEREIGN_FQDN}` from `letsencrypt-dns01-prod` (cert-manager + #373 webhook) - Gateway `cilium-gateway` in `kube-system` with HTTPS (443, TLS terminate) + HTTP (80) listeners, allowedRoutes.namespaces.from=All Per-blueprint HTTPRoute templates (canonical seam: each wrapper chart's existing `templates/` directory):
| Blueprint | Host pattern | Backend port |
|---|---|---|
| bp-keycloak | auth.<sov> | 80 |
| bp-gitea | git.<sov> | 3000 |
| bp-openbao | bao.<sov> | 8200 |
| bp-grafana | grafana.<sov> | 80 |
| bp-harbor | registry.<sov> | 80 |
| bp-powerdns | pdns.<sov>/api (dual-mode) | 8081 |
| bp-catalyst-platform | console.<sov>, api.<sov> | 80, 8080 |
bp-powerdns supports both Ingress (contabo legacy) and HTTPRoute (Sovereign) simultaneously: the per-Sovereign overlay sets `api.gateway.enabled=true` while leaving `api.enabled=true`. The Ingress object is harmless on Cilium clusters with no Traefik. This preserves contabo's existing pdns.openova.io flow per ADR-0001 §9.4. bp-harbor flips `expose.type` from `ingress` to `clusterIP` in platform/harbor/chart/values.yaml so the upstream chart no longer emits its own Ingress; the HTTPRoute is the sole HTTP exposure. 
TLS terminates at the Gateway (wildcard cert) rather than per-host Certificates inside the chart. bp-catalyst-platform's `templates/httproute.yaml` is NOT excluded by .helmignore (unlike templates/ingress.yaml + templates/ingress-console-tls.yaml, which remain contabo-only legacy demo infra). The contabo path keeps serving console.openova.io/sovereign via Traefik unchanged. Bootstrap-kit slot updates (per-Sovereign hostname interpolation): - 08-openbao.yaml → gateway.host: bao.${SOVEREIGN_FQDN} - 09-keycloak.yaml → gateway.host: auth.${SOVEREIGN_FQDN} - 10-gitea.yaml → gateway.host: gitea.${SOVEREIGN_FQDN} - 11-powerdns.yaml → api.host: pdns.${SOVEREIGN_FQDN}, api.gateway.enabled: true - 19-harbor.yaml → gateway.host: registry.${SOVEREIGN_FQDN} - 25-grafana.yaml → gateway.host: grafana.${SOVEREIGN_FQDN} Server-side dry-run validation against the live Cilium Gateway API CRDs on contabo: every HTTPRoute and the per-Sovereign Gateway + Certificate apply cleanly via `kubectl apply --dry-run=server`. Contabo unaffected: clusters/contabo-mkt/ not modified. The legacy SME ingresses (console-nova, marketplace, admin, axon, talentmesh, stalwart, ...) continue to serve via Traefik as before. powerdns on contabo remains on the Ingress path (api.gateway.enabled defaults to false at the chart level). Closes #387. Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:19:30 +04:00
e3mrah	1f5c76def1	fix(platform): sync blueprint.yaml versions with Chart.yaml (#199 ) * feat(ui): Playwright cosmetic + step-flow regression guards 15 regression guards in products/catalyst/bootstrap/ui/e2e/cosmetic- guards.spec.ts that fail HARD when each user-flagged defect class returns: 1. card height drift from canonical 108px 2. reserved right padding eating description width 3. logo tile drift from per-brand LOGO_SURFACE 4. invisible glyph (white-on-white) via luminance proxy 5. wizard step order Org/Topology/Provider/Credentials/Components/ Domain/Review 6. legacy "Choose Your Stack" / "Always Included" tab labels 7. Domain step reachable before Components 8. CPX32 not the recommended Hetzner SKU 9. per-region SKU dropdown shows wrong provider catalog 10. provision page is .html (static) not SPA route 11. legacy bubble/edge DAG SVG markup on provision page 12. admin sidebar drift from canonical core/console (w-56 + 7 labels) 13. AppDetail uses tablist instead of sectioned layout 14. job rows navigate to /job/<id> instead of expand-in-place 15. Phase 0 banners (Hetzner infra / Cluster bootstrap) on AdminPage Each test prints a failure message naming the canonical reference, the source-of-truth file, and the data-testid PR needed (if any) so the implementing agent has a precise target. No .skip() — per INVIOLABLE-PRINCIPLES #2, missing components fail loud. CI: .github/workflows/cosmetic-guards.yaml runs the suite on every PR that touches products/catalyst/bootstrap/ui/ or core/console/. Docs: docs/UI-REGRESSION-GUARDS.md maps each test to the user's original complaint, the canonical reference, and the green/red semantics (5 tests intentionally RED on main today — they stay red until the companion-agent's UI work lands). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(platform): sync blueprint.yaml versions with Chart.yaml so manifest-validation passes --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 22:07:55 +04:00
hatiyildiz	1ddd569789	fix(bp-): observability toggles default false — break circular CRD dependency Extends the v1.1.1 hardening that started with cilium / cert-manager / crossplane to the remaining 8 bootstrap-kit + per-Sovereign Blueprints. Every observability toggle in every Catalyst-curated Blueprint now ships `false`/`null` by default; the operator opts in via a per-cluster values overlay at clusters/<sovereign>/bootstrap-kit/ once bp-kube-prometheus-stack reconciles. Live failure mode that prompted this (omantel.omani.works 2026-04-29): bp-cilium @ 1.1.0 defaulted hubble.relay/ui + prometheus.serviceMonitor to true. The upstream Cilium 1.16.5 chart renders a monitoring.coreos.com/v1 ServiceMonitor whose CRD ships with kube-prometheus-stack — a tier-2 Application Blueprint that depends on the bootstrap-kit (cilium first). Helm install fails on a fresh Sovereign with "no matches for kind ServiceMonitor in version monitoring.coreos.com/v1 — ensure CRDs are installed first" and every downstream HelmRelease reports `dep is not ready`. The earlier trustCRDsExist=true mitigation only suppresses Helm's render-time gate; the apiserver still rejects the resource at install-time. Per-Blueprint changes: - bp-cilium: hubble.relay.enabled, hubble.ui.enabled → false; hubble.metrics.enabled → null (this is the exact value that disables the upstream metrics ServiceMonitor template branch — verified by reading cilium 1.16.5's _hubble.tpl); hubble.metrics.serviceMonitor .enabled → false. tests/observability-toggle.sh extended with Case 4 (default render produces no hubble-relay / hubble-ui Deployments). - bp-flux: flux2.prometheus.podMonitor.create → false. - bp-sealed-secrets: sealed-secrets.metrics.serviceMonitor.enabled → false (explicit lock; upstream already defaults false). - bp-spire: spire.global.spire.recommendations.enabled + recommendations.prometheus → false. - bp-nats-jetstream: nats.promExporter.enabled + promExporter.podMonitor.enabled → false. 
- bp-openbao: openbao.injector.metrics.enabled + openbao.serviceMonitor.enabled → false. - bp-keycloak: keycloak.metrics.enabled + metrics.serviceMonitor.enabled + metrics.prometheusRule.enabled → false. - bp-gitea: gitea.gitea.metrics.* and gitea.postgresql.metrics.* serviceMonitor + prometheusRule → false. - bp-powerdns: powerdns.serviceMonitor.enabled + powerdns.metrics.enabled → false (forward-compatibility guard; current upstream pschichtel/powerdns 0.10.0 has no ServiceMonitor template, but a future upstream bump cannot silently regress). Each chart ships a tests/observability-toggle.sh that asserts the rule in three cases (default off / explicit on opt-in / explicit off) — runs under blueprint-release.yaml's chart-test gate (added `bdeb0f54` + the existing wiring) before helm push. A regression that re-introduces a hardcoded enabled: true in any chart fails CI before the OCI artifact is published. Versioning: - All 11 leaf charts bumped 1.1.0 → 1.1.1. - products/catalyst/chart (bp-catalyst-platform umbrella) deps updated to 1.1.1 across the board. - clusters/_template/bootstrap-kit/03-flux through 10-gitea bumped to 1.1.1; clusters/omantel.omani.works/bootstrap-kit/* mirror. docs/BLUEPRINT-AUTHORING.md §11.2 table extended to enumerate every toggle disabled across all 11 Blueprints. References docs/INVIOLABLE-PRINCIPLES.md #4. GATES (all green): - helm dep build resolves cleanly post-change for every chart whose upstream is published (umbrella waits on per-leaf publish). - helm lint clean on all 11 leaves. - helm template . default render produces zero monitoring.coreos.com references on every leaf (verified locally). - tests/observability-toggle.sh PASS on all 11 leaves. Live verification: with v1.1.1 published the omantel.omani.works HelmRelease can roll forward without a manual values patch — Flux picks up the new chart digest automatically (semver: 1.x in OCIRepository). Refs: issue #182.	2026-04-29 19:23:52 +02:00
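The gate described in commit 1ddd569789 ("default render produces zero monitoring.coreos.com references") amounts to a text scan over `helm template` output. The chart tests are shell scripts (tests/observability-toggle.sh); the Python sketch below shows the same check under assumptions, with hypothetical helper names and with the rendered manifest passed in as a string rather than produced by running Helm.

```python
from typing import List


def monitoring_refs(rendered: str) -> List[str]:
    """Return every rendered line that references the Prometheus Operator API group."""
    return [line.strip() for line in rendered.splitlines() if "monitoring.coreos.com" in line]


def assert_default_render_clean(rendered: str) -> None:
    """Raise if a default-values render would need CRDs that are not installed yet."""
    refs = monitoring_refs(rendered)
    if refs:
        raise AssertionError(f"default render references monitoring CRDs: {refs}")
```

A default render containing only core resources passes, while a render that includes a ServiceMonitor is reported line by line, which is the regression the commit wants CI to catch before the OCI artifact is published.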
hatiyildiz	43aff20254	feat(bp-): convert all 11 bootstrap-kit charts to umbrella charts depending on upstream Each platform/<name>/chart/Chart.yaml now declares the canonical upstream chart as a dependencies: entry. helm dependency build pulls the upstream payload into the OCI artifact at publish time, so Flux helm install of bp-<name>:1.1.0 actually installs the upstream Helm release alongside the Catalyst-curated overlays (NetworkPolicy, ServiceMonitor, ClusterIssuer, ExternalSecret) under templates/. Pinned upstream chart versions per platform/<name>/blueprint.yaml: - cilium 1.16.5 https://helm.cilium.io - cert-manager v1.16.2 https://charts.jetstack.io - flux 2.4.0 https://fluxcd-community.github.io/helm-charts - crossplane 1.17.x https://charts.crossplane.io/stable - sealed-secrets 2.16.x https://bitnami-labs.github.io/sealed-secrets - spire ... https://spiffe.github.io/helm-charts-hardened - nats-jetstream ... https://nats-io.github.io/k8s/helm/charts - openbao ... https://openbao.github.io/openbao-helm - keycloak ... https://charts.bitnami.com/bitnami - gitea ... https://dl.gitea.com/charts - catalyst-platform umbrella over the 10 leaf bp- charts via helm dependency values.yaml in each chart adopts the umbrella convention: catalystBlueprint metadata block (provenance + version) at top level, upstream subchart values namespaced under the dependency name. cert-manager specifically: clusterissuer-letsencrypt-dns01.yaml gets the helm.sh/hook: post-install,post-upgrade annotation so it applies AFTER cert-manager controllers are running and CRDs registered (the previous hollow-chart shape ran the ClusterIssuer at install time when CRDs didn't exist yet, which was the omantel cluster's exact failure mode). Wrapper chart version bumped 1.0.0 → 1.1.0 across the board (umbrella conversion is a meaningful structural revision). Cluster manifests in clusters/_template/bootstrap-kit/ AND clusters/omantel.omani.works/ bootstrap-kit/ updated to reference 1.1.0. 
The blueprint-release.yaml workflow's helm package step needs an explicit helm dependency build before push so the upstream subchart bytes ship inside the OCI artifact. That CI change is a follow-up commit on this same branch (separate file scope).	2026-04-29 17:21:36 +02:00
hatiyildiz	62d9c7d936	fix(charts): drop dependencies block — wrappers carry values overlay only The first 2 blueprint-release CI runs failed on `helm package` with containerd permission errors because the wrapper Chart.yaml's `dependencies:` block triggered helm to pull the upstream charts via OCI/containerd at package time, which the GitHub Actions runner blocks. Architectural fix: each Catalyst Blueprint wrapper carries the values overlay + metadata only. The bootstrap installer reads the upstream chart reference from the wrapper's values.yaml `catalystBlueprint.upstream.{chart,version,repo}` metadata block, points `helm install` at the upstream chart's repo, and overlays our values. This keeps: - blueprint-release CI lightweight (no upstream pulls during package; helm package now works without containerd) - the "bp-<name> wrapper does NOT drift from upstream" property (we ship the overlay, not a fork) - the single Blueprint contract from BLUEPRINT-AUTHORING §1 (a wrapper is still a Catalyst-curated Helm chart published as bp-<name>:<semver>) Changes: - 11 platform/<name>/chart/Chart.yaml: removed dependencies block. Each is now a plain Helm chart with no remote pulls during package. - 11 platform/<name>/chart/values.yaml: prepended catalystBlueprint.upstream.{chart,version,repo} metadata block at the top. Bootstrap installer parses it to know which upstream chart to install with these values. - products/catalyst/bootstrap/api/internal/bootstrap/bootstrap.go: installCilium now does `helm repo add cilium https://helm.cilium.io --force-update` then `helm install cilium cilium/cilium --version 1.16.5 --values -` (the cilium/cilium upstream chart, with our overlay values piped from values.yaml). Same pattern needs propagating to the other 10 install functions in a follow-up. After this commit, blueprint-release CI should green-build all 11 wrappers (helm package now works without containerd access since there's nothing to pull). 
The bootstrap installer's actual `helm install` calls in production reach upstream chart repos via the runtime k3s cluster's pod network, which has full network access.	2026-04-28 12:57:29 +02:00
hatiyildiz	441ebaebb8	fix(charts): pin upstream chart versions/names to ones that exist in their repos The first Blueprint Release CI run (commit `8c0f766`) failed because four chart wrappers referenced upstream chart versions/names that don't exist in their published repositories: - platform/flux/chart: name was "flux", repo was OCI; actual is name "flux2" in plain helm repo at https://fluxcd-community.github.io/helm-charts. Pinned to 2.13.0. - platform/openbao/chart: version 2.1.0 was the binary appVersion, not the chart version. Pinned to 0.16.0 chart (which packages openbao 2.1.0 internally). - platform/keycloak/chart (Bitnami): chart version 25.0.6 was the appVersion of upstream; Bitnami's chart is at 24.7.1 packaging Keycloak 26.0.x. Pinned to 24.7.1. - platform/nats-jetstream/chart: name was "nats-jetstream"; the upstream chart is named "nats" (it always was — JetStream is a feature of NATS, not a separate chart). Renamed. Cilium, cert-manager, crossplane, sealed-secrets, spire wrappers were unaffected; their version pins matched upstream availability. Containerd permission-denied errors from `helm package` on cilium/cert-manager/crossplane/gitea/sealed-secrets are a separate CI plumbing issue (helm tries to pull OCI base images during package build via containerd, but the GitHub Actions runner blocks containerd socket access). Tracked as a follow-up: switch to `helm package --skip-refresh` or use a runner with containerd permissions. After this commit lands, the next blueprint-release CI run should green-build at minimum the 4 fixed charts. Successful builds publish bp-{flux,openbao,keycloak,nats-jetstream}:1.0.0 OCI artifacts to ghcr.io/openova-io/.	2026-04-28 12:55:21 +02:00
hatiyildiz	8c0f76640c	feat(charts): G2 wrapper Helm charts for 11 bootstrap-kit components + blueprint-release CI Per docs/PROVISIONING-PLAN.md and tickets [F] chart. Adds Catalyst-curated wrapper Helm charts at platform/<name>/chart/ for every component the bootstrap-kit installer (introduced in commit `07b4bcf`) needs. Each chart is the canonical bp-<name> source per BLUEPRINT-AUTHORING.md §1's source-location rule. 11 charts created with Chart.yaml + values.yaml + blueprint.yaml each: Network + GitOps: - platform/cilium/chart — wraps cilium 1.16.5; kubeProxyReplacement, WireGuard mTLS, Hubble, Gateway API - platform/flux/chart — wraps flux 2.4.0 - platform/crossplane/chart — wraps crossplane 1.18.0 + provider-hcloud manifest Security: - platform/cert-manager/chart — wraps cert-manager 1.16.2 with CRDs+ServiceMonitor - platform/sealed-secrets/chart — wraps sealed-secrets 2.16.1 (transient bootstrap-only) - platform/spire/chart — wraps spiffe/spire 1.10.4 (5-min SVID rotation) Catalyst control-plane services: - platform/nats-jetstream/chart — wraps nats 2.10.22 (3-node cluster, JetStream + KV) - platform/openbao/chart — wraps openbao 2.1.0 (3-node Raft, region-local per SECURITY §5) - platform/keycloak/chart — wraps keycloak 25.0.6 (Bitnami flavor, edge proxy mode) - platform/gitea/chart — wraps gitea 10.5.0 (CNPG Postgres backend, no chart-bundled valkey/redis since Catalyst control plane uses JetStream) New platform/ folders (added per AUDIT-PROCEDURE component-count anchor — was 53, now 55): - platform/spire/README.md — workload identity Catalyst control plane component - platform/nats-jetstream/README.md — control-plane event spine - platform/sealed-secrets/README.md — transient bootstrap-only Each blueprint.yaml declares: - catalyst.openova.io/v1alpha1 Blueprint kind (canonical CRD per BLUEPRINT-AUTHORING §3) - visibility: unlisted (mandatory infra, auto-installed by bootstrap kit, not a marketplace card) - manifests.chart: ./chart pointer - depends: [] 
(foundational components have no Blueprint dependencies; control-plane services depend on each other implicitly via bootstrap order, not via Blueprint depends) .github/workflows/blueprint-release.yaml: - New CI workflow per BLUEPRINT-AUTHORING §11 (path-matrix per Blueprint folder) - Triggers on push to main touching platform//chart/* or products//chart/* - detect job: emits matrix of changed Blueprint folders via git diff - build job (per chart): helm dependency build → helm package → helm push to GHCR → cosign keyless sign (GitHub OIDC) → Syft SBOM attestation - Output: ghcr.io/openova-io/bp-<name>:<semver> with SLSA-3-style supply-chain provenance Closes [F] tickets: 11 G2 charts (cilium, cert-manager, flux, crossplane, sealed-secrets, spire, nats-jetstream, openbao, keycloak, gitea, plus the umbrella products/catalyst/chart already exists from Pass 105). blueprint.yaml CRDs added across 11 entries. CI fan-out workflow live. After this commit lands, the bootstrap-kit installer in commit `07b4bcf` has real OCI artifacts to install. The first push to main will trigger 10 build matrix jobs (cilium was created in a separate commit earlier in this session) which produce 10 cosigned bp-<name>:<semver> artifacts on GHCR. Component-count anchor update follows: 53 → 55 (added spire + nats-jetstream + sealed-secrets — but sealed-secrets was already conceptually counted under "supporting services"). Per AUDIT-PROCEDURE the count needs updating in CLAUDE.md, BUSINESS-STRATEGY, TECHNOLOGY-FORECAST L11. Tracked as separate ticket [K] docs.	2026-04-28 12:51:06 +02:00
hatiyildiz	3993f5fc31	docs(pass-31): openbao + librechat DNS-placeholder carry-over fixes platform/openbao/README.md ingress hosts (line 108) had `bao.<domain>` while the same file's ClusterSecretStore example (line 127) used the canonical `bao.<location-code>.<sovereign-domain>` form. Pass 7's active-active fix addressed the body but missed the ingress placeholder. Aligned with the canonical form. platform/librechat/README.md OAuth callback (line 154) had `chat.ai-hub.<domain>/oauth/openid/callback` — same Application-endpoint shape Pass 25 fixed in llm-gateway. Pass 22 marked the file clean and Pass 29 fixed the Keycloak issuer line but didn't re-sweep. Per NAMING §5.2 Application endpoints are `{app}.{environment}.{sovereign-domain}`. Fixed. docs/GLOSSARY.md verified clean — single-source-of-truth has held across the loop (Pass 6/7/14/20/22/26/27 all consistent with current GLOSSARY). Validation log Pass 31 entry includes meta-note: third file (librechat) that needed re-opening after a "clean" mark — banner scans miss YAML-block drift. Future passes should default to a full placeholder-shape grep on every file touched.	2026-04-27 22:34:10 +02:00
hatiyildiz	42aeb629bb	docs(pass-7): rewrite OpenBao + ESO READMEs to match agreed multi-region semantics Pass 7: line-by-line read of platform/openbao/README.md and platform/external-secrets/README.md found a major architectural drift: both files described an OLD active-active bidirectional sync model that contradicts docs/SECURITY.md §5 (the canonical reference). The active-active design was rejected during the architecture session because it would have been a stretched cluster: a single region's network blip would block writes everywhere. The agreed model is: - Independent Raft cluster per region (intra-region quorum only). - Single-primary writes; replicas accept reads only. - Async Performance Replication primary → replicas (lag <1s typical). - Explicit DR promotion (sovereign-admin or failover-controller). Fixes: platform/openbao/README.md: - Overview: removed "active-active deployments" / "either region can update secrets". Replaced with "independent Raft cluster per region", "asynchronous Performance Replication". - Architecture diagram: replaced bidirectional-push diagram with the primary→replicas async perf replication topology that matches SECURITY.md §5. - ClusterSecretStores: simplified from "two stores (local+remote)" to "one local store"; reads always pull locally. - Renamed "PushSecret (Bidirectional)" → "Writes go to the primary region" with a single-target PushSecret pointing at bao-primary. - Added DR promotion section pointing at SECURITY.md §5.2. - Status banner: notes that the canonical multi-region reference is SECURITY.md. platform/external-secrets/README.md: - Header line: repositioned as per-host-cluster infrastructure with pointer to PLATFORM-TECH-STACK §3.3. - Removed broken link to non-existent ../openbao/docs/ADR-OPENBAO.md (replaced with link to ../openbao/README.md). - "Multi-region sync | Push to both OpenBao instances simultaneously" → "Multi-region reads | Async perf replication". - "PushSecret to Multiple OpenBao Instances" example was writing to two ClusterSecretStores in parallel; replaced with single-target primary write. - "Multi-region sync via single PushSecret" in Consequences → "Cross-region availability via Performance Replication". - Mermaid sequence diagram: "Bootstrap Wizard" actor → "Catalyst Bootstrap (Phase 0)"; "Terraform" → "OpenTofu"; ESO connection description "via K8s auth" → "via SPIFFE SVID (workload identity)". These were the most consequential drift fixes found in any pass: two READMEs were documenting an architecture explicitly rejected by the agreed model. Refs #37	2026-04-27 21:34:09 +02:00
hatiyildiz	119a1e53a0	docs(components): terminology pass across platform and product READMEs Bring per-component READMEs in line with the canonical glossary (docs/GLOSSARY.md). Substantive architectural content unchanged; this is a terminology + reference correctness pass. Placeholder rename: <tenant> → <org> in YAML / IaC examples across - platform/cnpg/README.md (Cluster + Pooler + ScheduledBackup) - platform/debezium/README.md (PostgreSQL connector + topic patterns) - platform/external-secrets/README.md (ExternalSecret / SecretStore) - platform/grafana/README.md (Instrumentation namespace) - platform/k8gb/README.md (Gslb + namespace + kubectl examples) - platform/keda/README.md (ScaledObject + Kafka triggers + Prometheus) - platform/opentofu/README.md (server resource example) - platform/velero/README.md (BackupStorageLocation buckets) - platform/vpa/README.md (VerticalPodAutoscaler examples) - platform/flux/README.md (kustomization name + tenants/ → organizations/) "Catalyst IDP" → "Catalyst console": - platform/crossplane/README.md (integration section retitled and rewritten; Crossplane is platform plumbing, not user-facing) - platform/gitea/README.md (architecture diagram + integration table) - platform/kyverno/README.md (rollout tracking surface) - products/fingate/README.md (TPP onboarding portal) "Bootstrap wizard" → "Catalyst bootstrap": - platform/openbao/README.md (bootstrap procedure rewritten; independent Raft per region clarified; cross-references docs/SECURITY.md §5) - platform/opentofu/README.md (Quick Start) Kyverno labels & prose: - openova.io/tenant → openova.io/organization (label rename for consistency; deployed clusters will add new label as a co-label during migration window) - "tenant labels" / "tenant namespace" prose updated to "Organization labels" / "Organization-labeled namespace" - Priority class names (tenant-high, tenant-default, tenant-batch) retained as deployed artifact names; rename pending in a separate migration ticket No banned-term hits remain in component READMEs (verified by grep in docs/GLOSSARY.md banned-terms table). Refs #37	2026-04-27 20:06:51 +02:00
talent-mesh	10245dff98	feat: ecosystem expansion to 55 components with license compliance - Replace BSL-licensed components with open-source alternatives: Terraform→OpenTofu (MPL 2.0), Vault→OpenBao (MPL 2.0), Redpanda→Strimzi/Kafka (Apache 2.0), n8n→Airflow (Apache 2.0) - Add 14 new platform components: activemq, camel, clickhouse, dapr, debezium, falco, flink, iceberg, opensearch, rabbitmq, superset, temporal, trino, vitess - Rename meta-platforms/ to products/ with new product names: Cortex (AI Hub), Fingate (Open Banking), Titan (Data Lakehouse), Fuse (Microservices Integration) - Update all documentation, READMEs, and cross-references Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 18:15:11 +00:00

18 Commits