openova/products/catalyst/chart/templates/api-deployment.yaml
2026-05-14 06:12:28 +00:00

1213 lines
67 KiB
YAML

apiVersion: apps/v1
kind: Deployment
metadata:
name: catalyst-api
labels:
app.kubernetes.io/name: catalyst-api
app.kubernetes.io/component: api
annotations:
# `kustomize.toolkit.fluxcd.io/force: enabled` is the durable
# remediation for the `RollingUpdate -> Recreate` strategy-flip
# collision documented in docs/CHART-AUTHORING.md §"Strategy flips
# on existing Deployments".
#
# Failure mode this addresses
# ---------------------------
# On 2026-04-29 the `catalyst` Flux Kustomization on contabo-mkt
# stuck at Ready=False with:
#
# Deployment.apps "catalyst-api" is invalid:
# spec.strategy.rollingUpdate: Forbidden:
# may not be specified when strategy `type` is 'Recreate'
#
# Root cause: the live Deployment had been previously created with
# the default `RollingUpdate` strategy (so `rollingUpdate.maxSurge=25%`
# and `maxUnavailable=25%` were present on the live object, owned
# by the `kubectl-client-side-apply` field manager). Flux's
# kustomize-controller submits this manifest via Server-Side Apply
# with field manager `kustomize-controller`. SSA's contract is
# "set the fields you declare" — it does NOT remove fields owned
# by other managers. Result: post-merge object had `type: Recreate`
# AND the residual `rollingUpdate.*` block, which the API server's
# validator rejects as invalid (Recreate forbids any rollingUpdate
# keys). SSA is REQUIRED to reject the merge. No SSA-only chart
# change can fix this.
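#
# Illustration (reconstructed from the description above, not a
# captured dump): the merged object the validator sees carries both
# the new type and the stale rollingUpdate block.
#
#   spec:
#     strategy:
#       type: Recreate          # declared by kustomize-controller
#       rollingUpdate:          # residual, owned by kubectl-client-side-apply
#         maxSurge: 25%
#         maxUnavailable: 25%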
#
# Why `$patch: replace` does NOT solve this
# -----------------------------------------
# The Strategic Merge Patch directive `$patch: replace` would tell
# an SMP-aware merger to REPLACE the strategy block instead of
# merging into it. But:
# - SSA rejects `$patch` outright with "field not declared in
# schema" (it's not in apps/v1 Deployment).
# - kubectl strict-decoding rejects `$patch` on CREATE under any
# mode with "unknown field spec.strategy.$patch" — so adding
# it to the chart manifest BREAKS fresh installs.
# `$patch: replace` is a runtime SMP directive, never a chart-spec
# value. It belongs in a Kustomize `patches:` entry (where the
# kustomize binary consumes it at build time and emits a clean
# output) — never inline in a base resource.
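#
# A minimal sketch of that placement, assuming a kustomization.yaml
# that lists this Deployment as a resource (the overlay itself is
# hypothetical; this repo may not ship one):
#
#   patches:
#     - target:
#         kind: Deployment
#         name: catalyst-api
#       patch: |-
#         apiVersion: apps/v1
#         kind: Deployment
#         metadata:
#           name: catalyst-api
#         spec:
#           strategy:
#             $patch: replace
#             type: Recreate
#
# The kustomize build consumes the directive and emits a strategy
# block without it, so the API server never sees `$patch`.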
#
# Why the Flux force annotation IS the right fix
# ----------------------------------------------
# When kustomize-controller's SSA submission fails dry-run with an
# Invalid response, this annotation directs the controller to
# recover by deleting and recreating THIS resource specifically
# (not the whole Kustomization). The recreated Deployment has no
# residual `rollingUpdate.*` fields — the regression cannot
# recur on the rebuilt object.
#
# That is NOT a "kubectl delete bandaid": the annotation is part
# of the IaC manifest, version-controlled, applied declaratively
# via Flux on every reconciliation, scoped to this single
# Deployment, and removed only by editing the chart. Per
# docs/INVIOLABLE-PRINCIPLES.md #3 (Follow the documented
# architecture, exactly — Flux is the ONLY GitOps reconciler) and
# #4 (Never hardcode — runtime configuration in Git, not in shell
# history): the remediation lives in source control.
#
# Why this Deployment in particular tolerates a recreate: the
# spec declares `strategy.type: Recreate`, so the steady-state
# update path is delete-and-recreate anyway. Flux falling back to
# delete-and-recreate on a strategy-flip is a no-op relative to a
# normal pod-spec change. The deployments PVC is ReadWriteOnce;
# the recreate flow detaches it from the old Pod before mounting
# it on the new one, which is exactly the contract `Recreate`
# enforces. State persistence is maintained because the PVC
# itself is NOT recreated by this annotation — only the
# Deployment resource is.
kustomize.toolkit.fluxcd.io/force: enabled
# Reloader watches the sovereign-fqdn + handover-jwt-public ConfigMaps/Secrets
# this Pod reads via valueFrom. On Sovereigns, those resources are applied
# by the sovereign-tls Kustomization concurrently with the bp-catalyst-platform
# HelmRelease. If the Pod starts first, the optional valueFrom resolves to ""
# and SOVEREIGN_FQDN stays empty for the lifetime of the Pod — every handover
# then fails the audience check with 401 "invalid audience" (caught live on
# otech62, 2026-05-03). Reloader rolls the Deployment when those resources
# land, fixing the race without requiring strict Flux dependsOn ordering.
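#
# For comparison, a sketch of the strict-ordering alternative that
# Reloader makes unnecessary. The Kustomization name `bootstrap-kit`
# and the `flux-system` namespace are assumptions; only
# `sovereign-tls` is named above.
#
#   apiVersion: kustomize.toolkit.fluxcd.io/v1
#   kind: Kustomization
#   metadata:
#     name: bootstrap-kit        # hypothetical parent of the HelmRelease
#     namespace: flux-system     # assumed
#   spec:
#     dependsOn:
#       - name: sovereign-tls    # wait until the ConfigMap/Secret exist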
configmap.reloader.stakater.com/reload: "sovereign-fqdn"
secret.reloader.stakater.com/reload: "handover-jwt-public"
spec:
replicas: 1
# Recreate strategy is required because the deployments PVC is RWO
# (single-node attach). A rolling update would try to schedule a second
# Pod that mounts the same PVC; if it lands on another node, the attach
# fails with a Multi-Attach error and the Pod stays in ContainerCreating.
# RWX with a multi-writer-aware filesystem
# (NFS, CephFS) is the path to HA, but Catalyst-Zero today is
# single-replica by design — the wizard is interactive and PDM owns
# cross-tenant isolation, so a single API server is sufficient.
#
# The strategy-flip regression that bit contabo-mkt on 2026-04-29
# (apply over a pre-existing RollingUpdate Deployment fails with
# `spec.strategy.rollingUpdate: Forbidden`) is recovered by the
# `kustomize.toolkit.fluxcd.io/force: enabled` annotation above —
# see that annotation's comment for the full failure-mode analysis
# and the docs/CHART-AUTHORING.md §"Strategy flips on existing
# Deployments" entry. Do NOT add an inline `$patch: replace` here:
# it BREAKS fresh installs (kubectl strict-decoding rejects
# `spec.strategy.$patch` on create), and Flux's SSA path rejects it
# anyway. The integration test at tests/integration/strategy-flip.yaml
# asserts both the recovery path works and the regression mode is
# still detected.
strategy:
type: Recreate
selector:
matchLabels:
app.kubernetes.io/name: catalyst-api
template:
metadata:
labels:
app.kubernetes.io/name: catalyst-api
spec:
# serviceAccountName — bind the Pod to the dedicated cutover-driver
# ServiceAccount so the /api/v1/sovereign/cutover/start handler can
# read/patch the cutover ConfigMaps + create/watch Jobs in the
# `catalyst` namespace. See serviceaccount-cutover-driver.yaml +
# clusterrole-cutover-driver.yaml + clusterrolebinding-cutover-
# driver.yaml for the full RBAC graph (issue #830 P0 Bug 1).
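#
# Rough sketch of the rules those permissions imply. The verbs are
# inferred from "read/patch" and "create/watch" above and are an
# assumption; clusterrole-cutover-driver.yaml is authoritative.
#
#   rules:
#     - apiGroups: [""]
#       resources: ["configmaps"]
#       verbs: ["get", "patch"]
#     - apiGroups: ["batch"]
#       resources: ["jobs"]
#       verbs: ["create", "watch"]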
#
# The SA is created by THIS chart in the same namespace catalyst-api
# runs in (catalyst-system) and bound at cluster scope (the cutover
# endpoint is namespace-configurable via CATALYST_CUTOVER_NAMESPACE).
# Without this, the Pod runs as system:serviceaccount:catalyst-
# system:default and every cutover-status read returns 502
# "configmaps is forbidden" (caught live on otech102, 2026-05-04).
serviceAccountName: catalyst-api-cutover-driver
imagePullSecrets:
- name: ghcr-pull
# fsGroup applies to the volumes mounted into the Pod so the
# non-root container UID (65534) can write to the deployments
# PVC. Without this, Hetzner Cloud Volumes default to root:root
# and the catalyst-api process gets EACCES on every store.Save —
# surfacing as the "deployment store unavailable" warning at
# startup and silent persistence failures at runtime.
#
# fsGroupChangePolicy: OnRootMismatch limits the chown traversal
# to first start (where the volume is freshly provisioned with
# the wrong UID). Subsequent restarts skip the recursive chown
# if the root dir already matches, keeping Pod start times
# bounded as the deployments directory grows.
securityContext:
fsGroup: 65534
fsGroupChangePolicy: OnRootMismatch
containers:
- name: catalyst-api
# Literal image ref — required for the contabo-mkt Kustomize
# path (kustomize-controller doesn't render Helm templates).
# Auto-bumped by .github/workflows/catalyst-build.yaml's deploy
# step on every push to main, so Sovereigns AND contabo both
# roll to the latest catalyst-api SHA. The matching
# values.yaml `images.catalystApi.tag` is also bumped (but
# unused for catalyst-api; kept for SME services that DO read
# from values).
image: "ghcr.io/openova-io/openova/catalyst-api:b4c2f54"
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8080
protocol: TCP
env:
- name: PORT
value: "8080"
# OPENOVA_FLOW_SERVER_URL — catalyst-api proxy upstream for
# /api/v1/flows/{deploymentId}/snapshot|stream|events. When
# set, the proxy short-circuits per-deployment FQDN lookup
# (see openova_flow_proxy.go resolveFlowServerURL) and uses
# this URL for every flow id. Used by the mothership
# (contabo) and by Sovereign chroots that install the
# openova-flow chart in the same cluster — both can reach
# their LOCAL openova-flow-server via in-cluster Service DNS
# without needing the public HTTPRoute. Empty on Sovereigns
# that rely on per-deployment FQDN resolution.
#
# Without this env, the proxy returns 502 on every
# /api/v1/flows/{id}/snapshot because the resolver tries to
# reach https://openova-flow.<sovereignFQDN> which only
# exists on a Sovereign that has installed bootstrap-kit
# slot 56 with httproute.enabled=true.
- name: OPENOVA_FLOW_SERVER_URL
value: "http://openova-flow-server.catalyst.svc.cluster.local"
# CATALYST_BUILD_SHA / CATALYST_CHART_VERSION — qa-loop iter-3
# Fix #18 (TC-261). The /api/v1/version handler resolves these
# env vars first (envOrTrim) before falling back to the ldflag
# defaults `dev` / `0.0.0`. Without this injection the live
# version probe returned `{"sha":"dev","version":"0.0.0"}` on
# every Sovereign — masking which image was actually live.
#
# LITERAL values (not Helm template directives) are mandatory
# here: this file is consumed by both Helm (per-Sovereign
# install via bp-catalyst-platform OCI) AND Kustomize
# (clusters/contabo-mkt/apps/catalyst-platform). Helm
# directives in `value:` fields break the Kustomize build
# with `yaml: invalid map key` (see DUAL-MODE CONTRACT note
# on CATALYST_POWERDNS_API_URL below).
#
# The CATALYST_BUILD_SHA literal is bumped by the same CI
# sed-pass that bumps the catalyst-api image tag literal in
# this file — see .github/workflows/catalyst-build.yaml's
# "Bump literal image refs in chart templates" step. Both
# sides converge on the same SHA on every push to main, so
# /api/v1/version returns the SHA the Pod is actually
# running (no drift between image tag and reported SHA).
#
# CATALYST_CHART_VERSION mirrors Chart.yaml's `version:` —
# bumped manually whenever the chart shape changes, per the
# existing bp-catalyst-platform release discipline. The
# value carries through to /api/v1/version's `chartVersion`
# field so dashboards can correlate api-version <-> chart-
# version on a single probe.
- name: CATALYST_BUILD_SHA
value: "b4c2f54"
- name: CATALYST_CHART_VERSION
value: "1.4.95"
- name: CORS_ORIGIN
value: "https://console.openova.io"
- name: DYNADOT_API_KEY
valueFrom:
secretKeyRef:
name: dynadot-api-credentials
key: api-key
# optional=true: Sovereign clusters don't hold Dynadot
# credentials — their tenant DNS is served by the
# Sovereign's own PowerDNS instance, not the parent
# account. Catalyst-Zero (contabo-mkt) supplies the
# real secret; Sovereigns use an empty stub or omit it
# entirely. Without optional=true the pod refuses to
# start when the secret is absent (issue #547).
optional: true
- name: DYNADOT_API_SECRET
valueFrom:
secretKeyRef:
name: dynadot-api-credentials
key: api-secret
optional: true
# DYNADOT_MANAGED_DOMAINS — comma-separated list of pool domains
# the same Dynadot account manages. Per docs/INVIOLABLE-PRINCIPLES.md
# #4, this is runtime configuration so adding a third pool domain
# (e.g. acme.io) does NOT require a code change — only a secret
# update. The Dynadot API is account-scoped (one api-key/api-secret
# pair covers every domain owned by the account); this list scopes
# which domains the catalyst-api is *allowed* to write records for,
# defending against misconfiguration that would let a wizard-
# supplied poolDomain trigger writes against an unrelated domain.
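#
# Illustrative shape of the Secret (values are placeholders and the
# domain list is an example, not the real pool):
#
#   apiVersion: v1
#   kind: Secret
#   metadata:
#     name: dynadot-api-credentials
#   stringData:
#     api-key: "<redacted>"
#     api-secret: "<redacted>"
#     domains: "omani.works,acme.io"   # comma-separated allow-list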
- name: DYNADOT_MANAGED_DOMAINS
valueFrom:
secretKeyRef:
name: dynadot-api-credentials
key: domains
# optional=true so deployments using the legacy single-value
# `domain` key (pre-#108) keep working until the secret is
# migrated; the dynadot package falls through to DYNADOT_DOMAIN
# then to its built-in defaults if neither key is present.
optional: true
- name: DYNADOT_DOMAIN
valueFrom:
secretKeyRef:
name: dynadot-api-credentials
key: domain
optional: true
# CATALYST_TOFU_WORKDIR — provisioner runs `tofu init/plan/apply`
# inside this directory. PVC-backed (catalyst-api-deployments) so
# in-progress tofu state survives Pod restarts. Without this,
# any catalyst-api Pod roll mid-apply (e.g. an unrelated chart
# bump that triggers rolling restart on Catalyst-Zero, or a
# node reboot) leaks Hetzner resources because partial apply
# state is in emptyDir. Caught live on otech64, 2026-05-03:
# contabo's catalyst-api was rolled at 21:40:11 (3 minutes
# into otech64's tofu apply), terminal_LB created without its
# control_plane target, and otech64 came up with an unreachable
# 49.12.16.160 LB. Write access to /var/lib/catalyst (the PVC
# mountPath) relies on the fsGroup=65534 setting above.
- name: CATALYST_TOFU_WORKDIR
value: /var/lib/catalyst/tofu
# CATALYST_DEPLOYMENTS_DIR — flat-file store for deployment
# records (one JSON file per deployment id). Backed by the
# PVC mount below so deployments persist across Pod
# restarts. Each record is the full Deployment state with
# credentials redacted; see internal/store/store.go.
- name: CATALYST_DEPLOYMENTS_DIR
value: /var/lib/catalyst/deployments
# CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS — defensive floor
# only. The load-bearing termination gate is now the
# informer's HasSynced signal (after WaitForCacheSync the
# full bp-* HelmRelease set is in the cache, regardless of
# cardinality). Set to 1 so the watch still refuses to
# terminate when the cache is completely empty (the
# "bootstrap-kit Kustomization never reconciled at all"
# footgun, classified as OutcomeFluxNotReconciling).
#
# Earlier values (11, then 38) tied this to the kit count;
# that coupling is brittle — otech48 (2026-05-03) sat
# phase1-watching forever because the env was 38 but the
# kit had drifted to 37. The HasSynced gate is drift-proof.
- name: CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS
value: "1"
# CATALYST_PHASE1_WATCH_TIMEOUT — overall budget for the
# Phase-1 HelmRelease watch (helmwatch.DefaultWatchTimeout).
# F8 fix (2026-05-12, prov #44 RCA): the previous 60m default
# was tighter than the inner bp-catalyst-platform HR install
# × retries chain after that chart's timeout was raised
# 15m → 30m. With retries=3 and strategy=rollback, worst-case
# inner HR is 30m × 3 = 90m; the outer watch budget MUST be
# larger so it never terminates while helm-controller still
# has remediation attempts left. Per docs/INVIOLABLE-
# PRINCIPLES.md #4 the knob is runtime-configurable here so
# capacity-bounded sandboxes can dial it down without a
# code change. Parsed by helmwatch.CompileWatchTimeout —
# accepts any time.ParseDuration value (e.g. "120m", "2h").
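#
# Worked budget check using the figures above:
#   inner worst case = 30m per attempt x 3 (retries=3) = 90m
#   outer watch      = 120m
#   headroom         = 120m - 90m = 30m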
- name: CATALYST_PHASE1_WATCH_TIMEOUT
value: "120m"
# CATALYST_KUBECONFIGS_DIR — sibling directory on the same
# PVC for the plaintext kubeconfigs the new Sovereign POSTs
# back via the bearer-token endpoint (issue #183, Option D).
# One <id>.yaml per deployment, mode 0600. The store JSON
# record carries only the file path + a SHA-256 hash of
# the bearer; the plaintext kubeconfig is NEVER serialized
# into the JSON.
- name: CATALYST_KUBECONFIGS_DIR
value: /var/lib/catalyst/kubeconfigs
# CATALYST_API_PUBLIC_URL — the public origin the new
# Sovereign's cloud-init PUTs its kubeconfig back to. The
# OpenTofu module templates this into the Sovereign's
# user_data so the Sovereign knows where to call. Per
# docs/INVIOLABLE-PRINCIPLES.md #4 this is runtime
# configuration; air-gapped franchises override it
# without code change.
- name: CATALYST_API_PUBLIC_URL
value: https://console.openova.io/sovereign
# CATALYST_K8SCACHE_KUBECONFIGS_DIR — issue #321. Directory
# the k8scache.Factory reads kubeconfigs from at startup.
# The data-plane SharedInformerFactory opens one informer
# per kubeconfig file; the cloud-init postback handler
# (PUT /api/v1/deployments/{id}/kubeconfig) writes here on
# Phase-1 attach so a fresh Sovereign id is automatically
# picked up at next catalyst-api restart. The same PVC
# (catalyst-api-deployments) backs the existing
# deployments store; the data-plane reads the kubeconfigs/
# subdirectory directly.
- name: CATALYST_K8SCACHE_KUBECONFIGS_DIR
value: /var/lib/catalyst/kubeconfigs
# CATALYST_K8SCACHE_SNAPSHOT_DIR — issue #321 cold-start
# mitigation. Backed by a separate 5Gi PVC
# (catalyst-api-cache) so its size is independent of the
# deployments store. See api-cache-pvc.yaml for the sizing
# rationale + the cold-start latency contract.
- name: CATALYST_K8SCACHE_SNAPSHOT_DIR
value: /var/cache/sov-cache
# CATALYST_K8SCACHE_KINDS_CONFIGMAP — optional ConfigMap
# extending the built-in kinds registry. Per docs/
# INVIOLABLE-PRINCIPLES.md #4 a new watched GVR (e.g.
# HelmRelease, Kustomization) is a runtime configuration
# change, not a code change. Empty disables ConfigMap
# loading; built-in DefaultKinds is used.
- name: CATALYST_K8SCACHE_KINDS_CONFIGMAP
value: catalyst-k8scache-kinds
- name: CATALYST_K8SCACHE_KINDS_CONFIGMAP_NAMESPACE
value: catalyst
# CATALYST_GHCR_PULL_TOKEN — long-lived GHCR pull token that
# the provisioner stamps onto every Request and the OpenTofu
# cloud-init template writes into the new Sovereign's
# flux-system/ghcr-pull Secret so Flux source-controller
# can pull private bp-* OCI artifacts from
# ghcr.io/openova-io/. Without this, Phase 1 stalls at
# bp-cilium with "secrets ghcr-pull not found" — verified
# live on omantel.omani.works pre-fix.
#
# optional: true — when the Secret or key is missing the
# Pod still starts (with the env var unset). The
# provisioner's Validate() rejects deployments that need
# the token (Phase 1 bootstrap-kit pulls private bp-*
# charts) with a clear pointer to docs/SECRET-ROTATION.md,
# so a misconfigured catalyst-api fails fast on
# /api/v1/deployments POST instead of silently mid-apply.
# /healthz, /api/v1/credentials/validate, and the BYO
# registrar proxy keep working — they don't read the
# token at all.
#
# Rotation: yearly, see docs/SECRET-ROTATION.md. The Secret
# is created out-of-band by an operator (never via Flux,
# never committed to git) — the chart references it but
# does not template it.
- name: CATALYST_GHCR_PULL_TOKEN
valueFrom:
secretKeyRef:
name: catalyst-ghcr-pull-token
key: token
optional: true
# CATALYST_HARBOR_ROBOT_TOKEN — central Harbor proxy-cache
# robot account secret (issue #557 + #557 follow-up). The
# value is interpolated into the new Sovereign's
# /etc/rancher/k3s/registries.yaml at cloud-init time so
# containerd authenticates against harbor.openova.io's proxy
# projects (proxy-dockerhub etc).
#
# Provisioning seam (catalyst-system Pod gets the Secret):
# 1. Tofu var.harbor_robot_token enters cloud-init
# (infra/hetzner/cloudinit-control-plane.tftpl).
# 2. Cloud-init writes /var/lib/catalyst/harbor-robot-
# token-secret.yaml into flux-system ns with the
# auto-mirror Reflector annotations
# (reflection-auto-enabled: "true").
# 3. runcmd applies it BEFORE flux-bootstrap, so the
# Secret exists before any Helm release runs.
# 4. bp-reflector (slot 05a) propagates it into every
# namespace (incl. catalyst-system) on first reconcile.
# 5. This Pod's secretKeyRef resolves once the mirror lands.
# Mirrors the canonical pattern that flux-system/ghcr-pull
# already uses (PR #543).
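#
# Sketch of the Secret cloud-init writes in step 2, assuming the
# standard emberstack/reflector annotation prefix (the text above
# abbreviates it) and a placeholder token:
#
#   apiVersion: v1
#   kind: Secret
#   metadata:
#     name: harbor-robot-token
#     namespace: flux-system
#     annotations:
#       reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
#       reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
#   stringData:
#     token: "<robot-token>"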
#
# NOT optional — provisioner.Validate() rejects deployments
# with an empty token. The architecture mandate is that every
# Sovereign image pull goes through harbor.openova.io; falling
# through to docker.io is forbidden (rate-limit makes a fresh
# Hetzner IP unbootable within minutes). `optional: true` was
# considered and rejected: a missing token must surface
# immediately as a Pod start failure
# (CreateContainerConfigError), not silently mid-provision.
#
# Rotation: yearly. Re-render Tofu plan → re-apply cloud-init
# → kubectl apply runs against the existing Secret with
# rotated bytes; bp-reflector propagates the rotation to all
# mirrored copies on the next watch tick. Plaintext NEVER
# lives in git.
- name: CATALYST_HARBOR_ROBOT_TOKEN
valueFrom:
secretKeyRef:
name: harbor-robot-token
key: token
# CATALYST_POWERDNS_API_KEY — contabo PowerDNS API key (PR
# #681 followup). The value is interpolated into the new
# Sovereign's `cert-manager/powerdns-api-credentials` Secret
# at cloud-init time so bp-cert-manager-powerdns-webhook
# can write DNS-01 challenge TXT records to contabo's
# authoritative omani.works zone.
#
# Provisioning seam:
# 1. Source: contabo's `openova-system/powerdns-api-
# credentials` Secret (created by bp-powerdns chart).
# 2. Reflector mirrors it into every namespace incl.
# catalyst (annotations on the source: reflection-
# auto-enabled: "true", reflection-auto-namespaces: "").
# 3. This Pod resolves it via secretKeyRef.
# 4. provisioner.New() reads CATALYST_POWERDNS_API_KEY at
# startup, stamps onto every Request.
# 5. cloud-init writes the Sovereign-side Secret in
# cert-manager namespace BEFORE Flux reconciles
# bp-cert-manager-powerdns-webhook.
#
# optional=true: catalyst-api Pods on Sovereigns don't have
# this Secret reflected (their PowerDNS is local), so the
# bootstrap shape stays clean across both the contabo and
# Sovereign catalyst-api deployments.
- name: CATALYST_POWERDNS_API_KEY
valueFrom:
secretKeyRef:
name: powerdns-api-credentials
key: api-key
optional: true
# CATALYST_POWERDNS_API_URL — base URL of the per-Sovereign
# PowerDNS REST API (issue #827). Used by:
# - the SME-tenant pipeline's PATCH-RRset writer
# (sme_tenant_dns.go) for free-subdomain provisioning
# - the multi-zone parent-domain handler
# (parent_domains.go) for runtime add-zone
# Default is the in-cluster Service FQDN of the Sovereign's
# own PowerDNS (the Helm chart targets namespace `powerdns`
# with default release name `powerdns`). Operators in
# non-standard layouts override via the Helm values overlay
# at clusters/<sovereign>/bootstrap-kit/13-bp-catalyst-
# platform.yaml.
#
# NOTE — DUAL-MODE CONTRACT (see SOVEREIGN_FQDN block below
# for the canonical explanation): this file is consumed BOTH
# by Helm (per-Sovereign install) AND by Kustomize (contabo-
# mkt's flux Kustomization at path: ./products/catalyst/chart/
# templates). Helm template syntax (double-curly directives)
# in this file BREAKS the Kustomize build with
# "yaml: invalid map key" and stalls every contabo
# reconciliation. The 1.4.0 version of this block used
# {{ default "..." .Values.catalystApi.powerdnsURL }} — that
# broke contabo's catalyst-platform Kustomization until this
# follow-up landed. Issue #830 follow-up.
#
# Solution: the in-cluster Service URL is a non-secret
# constant on every Sovereign that ships bp-powerdns at its
# canonical release name (powerdns/powerdns). Hardcode the
# literal here so the Kustomize build stays clean. Per-
# Sovereign overrides are still possible via the per-
# Sovereign HelmRelease overlay's `catalystApi.env`
# additional-env patch that takes precedence over the
# default below.
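#
# Sketch of such an override in the per-Sovereign HelmRelease. The
# list-of-name/value shape of `catalystApi.env` and the URL are
# assumptions; check the chart's values.yaml for the real schema.
#
#   spec:
#     values:
#       catalystApi:
#         env:
#           - name: CATALYST_POWERDNS_API_URL
#             value: "http://pdns.dns.svc.cluster.local:8081"   # hypothetical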
- name: CATALYST_POWERDNS_API_URL
value: "http://powerdns.powerdns.svc.cluster.local:8081"
# CATALYST_POWERDNS_SERVER_ID — virtually always "localhost"
# per the PowerDNS REST API contract. Operator-overridable
# for multi-tenant PowerDNS deployments where a single
# PowerDNS instance hosts multiple servers (override via the
# HelmRelease overlay env patch — same pattern as
# CATALYST_POWERDNS_API_URL above).
- name: CATALYST_POWERDNS_SERVER_ID
value: "localhost"
# ── /auth/handover Keycloak service-account (issue #606) ──────────
# CATALYST_KC_ADDR — Keycloak base URL. Defaults to in-cluster
# service FQDN in code; override here for non-standard Sovereign
# Keycloak deployments.
# optional=true: Catalyst-Zero pods don't run Keycloak locally.
- name: CATALYST_KC_ADDR
valueFrom:
secretKeyRef:
name: catalyst-kc-sa-credentials
key: addr
optional: true
- name: CATALYST_KC_REALM
valueFrom:
secretKeyRef:
name: catalyst-kc-sa-credentials
key: realm
optional: true
- name: CATALYST_KC_SA_CLIENT_ID
valueFrom:
secretKeyRef:
name: catalyst-kc-sa-credentials
key: client-id
optional: true
- name: CATALYST_KC_SA_CLIENT_SECRET
valueFrom:
secretKeyRef:
name: catalyst-kc-sa-credentials
key: client-secret
optional: true
# ── Gitea client (qa-loop iter-12 Fix #53B) ──────────────────
# CATALYST_GITEA_URL + CATALYST_GITEA_TOKEN feed
# internal/handler/blueprints.go giteaClientFromEnv() so the
# /api/v1/sovereigns/.../blueprints/{publish,curatable,curate,
# edit-pr} endpoints can proxy into the Sovereign's in-cluster
# Gitea. Without these env vars every blueprint-CRUD endpoint
# returned the 503 "Gitea client unconfigured" error documented
# at blueprints.go:184 + blueprints.go:493 (matrix TC-081, TC-082,
# TC-083, TC-085 FAIL).
#
# Mirrors the wiring already used by blueprint-controller,
# application-controller, environment-controller, organization-
# controller (see controllers/blueprint-controller-deployment.yaml
# for the canonical pattern). The catalyst-gitea-token Secret
# is materialised by the bp-gitea chart's post-install Job and
# mirrored into catalyst-system by emberstack/reflector.
#
# optional=true: Catalyst-Zero (contabo) doesn't run a local
# Gitea — these endpoints return 503 there, which is the
# intended behaviour (only Sovereigns curate blueprints).
# CATALYST_GITEA_URL: in-cluster Gitea service FQDN. Identical
# across every Sovereign (the bp-gitea chart always lands in
# `gitea` ns with the same Service name). Hardcoded literal
# rather than .Values lookup because this file is consumed by
# BOTH Helm AND Kustomize (see the SOVEREIGN_FQDN comment below);
# Helm template directives break the Kustomize parse.
# Catalyst-Zero (contabo) doesn't run a local Gitea; the
# endpoint is unreachable there which the blueprints handler
# treats as "Gitea unconfigured" (returns 503 on blueprint
# CRUD endpoints — intended behaviour, only Sovereigns curate).
- name: CATALYST_GITEA_URL
value: "http://gitea-http.gitea.svc.cluster.local:3000"
- name: CATALYST_GITEA_TOKEN
valueFrom:
secretKeyRef:
name: catalyst-gitea-token
key: token
optional: true
# KEYCLOAK_BOOTSTRAP_TIER_ROLES — EPIC-3 slice T2 (#1098/#1146).
# When "true", a goroutine on catalyst-api startup calls
# EnsureTierRealmRoles (internal/keycloak/realm_bootstrap.go)
# against the Sovereign realm to materialise the 5 catalog-
# tier composite realm-roles (catalyst-{viewer,developer,
# operator,admin,owner}) per docs/EPICS-1-6-unified-design.md
# §6.2. Re-runs are idempotent no-ops.
#
# The Helm value `.Values.keycloak.bootstrap.ensureTierRoles`
# (default "true" per qa-loop iter-1 cluster
# `controllers-and-kc-bootstrap-gates`) is the intended source of
# truth, but this file carries a literal "false" (see the dual-mode
# note below). The contabo mothership must stay "false": its
# `openova` realm has its own role taxonomy and should not gain
# catalyst-* tier roles. The goroutine waits up to ~30s for KC
# to be reachable, then retries up to 5 times with capped backoff
# before giving up; the next catalyst-api restart picks the
# bootstrap up again. Orthogonal to the chart-install-time
# keycloak-config-cli Job (which seeds the realm itself) — this
# env var only flips the runtime tier-role bootstrap.
# LITERAL value (not Helm template directive) — dual-mode contract:
# this file is consumed by both Helm AND Kustomize (contabo-mkt's
# clusters/contabo-mkt/apps/catalyst-platform). Helm directives in
# `value:` fields break the Kustomize build with `yaml: invalid map
# key` (caught live on contabo 2026-05-10 from #1311 regression at
# commit 92228bc — Flux Kustomization stuck for 2 days, blocking
# catalyst-api roll on contabo). Default OFF; per-Sovereign
# HelmRelease overlays MAY set this to "true" via the
# `catalystApi.env` additional-env patch (Helm-only codepath, takes
# precedence over this default at template-render time).
- name: KEYCLOAK_BOOTSTRAP_TIER_ROLES
value: "false"
# CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH — path to the JWK file that
# holds the RS256 public key for validating one-time handover JWTs.
# The K8s Secret `catalyst-handover-jwt-public` (created by
# cloud-init at provision time, see infra/hetzner/cloudinit-control-
# plane.tftpl) is mounted as a directory at /etc/catalyst/handover-
# jwt-public/, so the JWK lives at /etc/catalyst/handover-jwt-public/
# public.jwk. We deliberately mount the Secret as a directory rather
# than using subPath: the catalyst-api PVC at /var/lib/catalyst is
# ReadWriteOnce and a leftover empty directory at the legacy path
# /var/lib/catalyst/handover-jwt-public.jwk/ from earlier installs
# (where the Secret was missing and Kubernetes created an empty
# directory in the volume) collides with the subPath file mount on
# re-provisioning. Mounting under /etc/ keeps the JWK off the PVC
# entirely so the conflict cannot recur. Caught live on otech48,
# 2026-05-03.
- name: CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH
value: /etc/catalyst/handover-jwt-public/public.jwk
# SOVEREIGN_FQDN — Sovereign's public FQDN. The /auth/handover
# validator (auth_handover.go) reads this to compute the expected
# JWT audience claim ("https://console." + SOVEREIGN_FQDN). When
# unset on a Sovereign, the audience check collapses to "https://
# console." and every valid token is rejected with "invalid
# audience" 401 — caught live on otech48, 2026-05-03.
#
# NOTE: this file is consumed BOTH by Helm (per-Sovereign install
# via bp-catalyst-platform OCI chart) AND by Kustomize (contabo-
# mkt's clusters/contabo-mkt/apps/catalyst-platform Kustomization
# at path: ./products/catalyst/chart/templates). Kustomize parses
# raw YAML — Helm template syntax (double-curly directives) here
# breaks the Kustomize build (caught live on contabo 2026-05-03
# from commit adf8dc7d: "yaml: invalid map key").
#
# Solution: read the value from a ConfigMap that exists ONLY on
# Sovereigns (not contabo). On contabo the optional reference
# resolves to empty, which is correct: catalyst-api on contabo is the
# SIGNER, never the validator, so /auth/handover is never called there.
# On Sovereigns, clusters/_template/sovereign-tls/sovereign-fqdn-
# configmap.yaml renders the ConfigMap from envsubst-ed
# ${SOVEREIGN_FQDN} when Flux applies the kustomization.
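#
# Sketch of that ConfigMap before substitution. Only the name and
# the `fqdn` key come from this file; the namespace is omitted
# because it is not stated here.
#
#   apiVersion: v1
#   kind: ConfigMap
#   metadata:
#     name: sovereign-fqdn
#   data:
#     fqdn: "${SOVEREIGN_FQDN}"   # substituted when Flux applies it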
- name: SOVEREIGN_FQDN
valueFrom:
configMapKeyRef:
name: sovereign-fqdn
key: fqdn
optional: true
# CATALYST_OTECH_FQDN — same value as SOVEREIGN_FQDN, but read by
# the SME tenant create handler (sme_tenant.go) and the
# sovereign-parent-domains seed (sovereign_parent_domains.go).
# The two envs exist for historical reasons: SOVEREIGN_FQDN is the
# Phase-8b handover-flow JWT-audience env; CATALYST_OTECH_FQDN is
# the SME-tier tenant-pipeline env (epic #795 / #804). Both
# ultimately point at the Sovereign's public FQDN. Wired from the
# same `sovereign-fqdn` ConfigMap (key `fqdn`). optional=true since
# Catalyst-Zero (contabo) doesn't run the SME tenant pipeline.
# Issue #876 — without this, POST /api/v1/sme/tenants returns
# 503 {"error":"otech-fqdn-unconfigured"} on every Sovereign.
- name: CATALYST_OTECH_FQDN
valueFrom:
configMapKeyRef:
name: sovereign-fqdn
key: fqdn
optional: true
# CATALYST_SELF_DEPLOYMENT_ID — the deployment-record id this
# Sovereign was provisioned under on the contabo orchestrator.
# Read by HandleSovereignSelf (sovereign_self.go) so the
# Sovereign-side catalyst-ui can resolve /console/<page> to the
# canonical /provision/<self-id>/<page> deployment-scoped UI.
# Sourced from the sovereign-fqdn ConfigMap (key
# selfDeploymentId), stamped by the orchestrator's per-
# Sovereign overlay writer at handover. Empty on contabo and
# on freshly-provisioned Sovereigns whose handover hasn't run
# yet — HandleSovereignSelf returns 503 in that window so
# the UI shows a "waiting for handover" pill.
- name: CATALYST_SELF_DEPLOYMENT_ID
valueFrom:
configMapKeyRef:
name: sovereign-fqdn
key: selfDeploymentId
optional: true
# SOVEREIGN_LB_IP — Sovereign's load-balancer public IPv4. Used by
# the Day-2 multi-domain add-domain flow (issue #900) to
# pre-register glue records at the customer's registrar before
# the set_ns flip. Without it Dynadot rejects with
# "'ns1.<sov>.omani.works' needs to be registered with an ip
# address before it can be used" — caught live during otech103
# multi-domain verification.
#
# Sourced from the chart's `global.sovereignLBIP` value (rendered
# into the same `sovereign-fqdn` ConfigMap that holds `fqdn`).
# optional=true: Catalyst-Zero (contabo) doesn't run the Sovereign-
# side multi-domain pipeline; the env stays empty and the glue
# path becomes a no-op (plain set_ns flows through unchanged).
- name: SOVEREIGN_LB_IP
valueFrom:
configMapKeyRef:
name: sovereign-fqdn
key: lbIP
optional: true
# CATALYST_CONFIGURED_REGIONS — comma-separated Hetzner regions
# the operator declared at provision time (qa-loop iter-16
# Fix #88, Path B). The fleet handler reads this when the
# in-memory deployment record's Regions slice is empty (e.g.
# on the chroot Sovereign post-handover where catalyst-api
# has no provisioner records of its own) so the
# /api/v1/fleet/sovereigns/{id}/summary `configuredRegions`
# field is non-empty even on a self-Sovereign API call.
#
# Source: sovereign-fqdn ConfigMap key `configuredRegions`,
# populated from .Values.sovereign.configuredRegions (or
# .Values.qaFixtures.configuredRegions when qaFixtures is
# enabled). optional=true so Catalyst-Zero (contabo) and
# legacy Sovereigns without the key start cleanly with the
# env empty — the UI then surfaces only the live region.
- name: CATALYST_CONFIGURED_REGIONS
valueFrom:
configMapKeyRef:
name: sovereign-fqdn
key: configuredRegions
optional: true
# CATALYST_QA_APPLICATIONS — comma-separated literal app
# names the chroot Sovereign's /compliance/scorecard
# surface emits via `appRefs[]` (qa-loop iter-16 Fix #167).
# Read by handler.appRefsFromEnv when the live aggregator
# has not yet ingested a PolicyReport for the workload, so
# the matrix's app-literal tokens (qa-wordpress, qa-wp)
# are present on every /scorecard call out-of-the-box.
# Mirrors CATALYST_CONFIGURED_REGIONS' qa-fixtures fallback.
# Source: sovereign-fqdn ConfigMap key `qaApplications`,
# populated from .Values.sovereign.qaApplications (or
# .Values.qaFixtures.applications when qaFixtures is
# enabled). optional=true so Catalyst-Zero (contabo) and
# legacy Sovereigns without the key fall back to the
# appRefsFromEnv default (`qa-wordpress,qa-wp`).
- name: CATALYST_QA_APPLICATIONS
valueFrom:
configMapKeyRef:
name: sovereign-fqdn
key: qaApplications
optional: true
# CATALYST_GITOPS_USER + CATALYST_GITOPS_TOKEN — basic-auth
# credentials embedded in the GitOps clone URL (issue #878).
# Pre-cutover (Catalyst-Zero): User=x-access-token, Token=GitHub
# PAT (already wired via separate CATALYST_GITOPS_TOKEN secret on
# contabo). Post-cutover (Sovereign): User=gitea_admin,
# Token=<gitea-admin-password> from the local Gitea admin secret.
# The same secret (`gitea-admin-secret`) is mirrored into
# catalyst-system via the bp-reflector annotation block on
# bp-gitea (issue #866), so this Sovereign-side wiring works
# post-Day-2-Independence without a manual mirror step.
# optional=true: Catalyst-Zero (contabo) does not run the SME
# tenant pipeline.
- name: CATALYST_GITOPS_USER
valueFrom:
secretKeyRef:
name: gitea-admin-secret
key: username
optional: true
- name: CATALYST_GITOPS_TOKEN
valueFrom:
secretKeyRef:
name: gitea-admin-secret
key: password
optional: true
# POOL_DOMAIN_MANAGER_URL — base URL of the central Pool Domain
# Manager (PDM) ingress on Catalyst-Zero (contabo). Sovereign-
# side catalyst-api calls PDM's /api/v1/registrar/{r}/set-ns
# endpoint for the Day-2 multi-domain "Add another parent
# domain" flow (issue #879, parent epic #825 / #829).
#
# Why a public ingress URL (not an in-cluster Service):
# the in-cluster default `pool-domain-manager.openova-system.
# svc.cluster.local` ONLY resolves on the contabo cluster
# (PDM lives in `openova-system` ns there). On a franchised
# Sovereign post-handover, that DNS name is NXDOMAIN, so
# every Day-2 add-domain call returned `dial tcp: lookup
# pool-domain-manager.openova-system.svc.cluster.local on
# 10.43.0.10:53: no such host` (caught live on otech103,
# 2026-05-05 — issue #879 verification).
#
# The default below points at the public PDM ingress on
# contabo (`pool.openova.io`). Per Inviolable Principle #4
# (never hardcode), per-Sovereign overlays may override via
# `catalystApi.poolDomainManagerURL` in values. Catalyst-Zero
# (contabo) leaves this default — its catalyst-api Pod hits
# the SAME public URL via its own loopback ingress (the proxy
# is idempotent on the source cluster).
#
# Pairs with CATALYST_PDM_BASIC_AUTH_USER / _PASS below: the
# PDM ingress at pool.openova.io is gated by Traefik basicAuth
# (clusters/contabo-mkt/apps/pool-domain-manager/ingress.yaml).
# Both halves wired together so a fresh Sovereign reaches PDM
# without a manual env-var patch.
#
# NOTE — DUAL-MODE CONTRACT: this file is consumed BOTH by
# Helm (per-Sovereign install via bp-catalyst-platform OCI)
# AND by Kustomize (contabo-mkt's clusters/contabo-mkt/apps/
# catalyst-platform). The default literal below (no Helm
# template directives) keeps both build paths clean. Per-
# Sovereign overlays override via the HelmRelease overlay's
# `catalystApi.env` additional-env patch (Helm-only, takes
# precedence over THIS default at template-render time).
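#
# A hedged sketch of such an overlay. Only the `catalystApi.env`
# values key is named in this file; the HelmRelease name, the
# list-of-name/value shape, and the URL are assumptions for
# illustration (chart reference and interval fields omitted):
#
#   apiVersion: helm.toolkit.fluxcd.io/v2
#   kind: HelmRelease
#   metadata:
#     name: catalyst-platform            # hypothetical
#   spec:
#     values:
#       catalystApi:
#         env:
#           - name: POOL_DOMAIN_MANAGER_URL
#             value: "https://pdm.example.invalid"   # hypothetical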
- name: POOL_DOMAIN_MANAGER_URL
value: "https://pool.openova.io"
# CATALYST_PDM_BASIC_AUTH_USER / _PASS — basic-auth credentials
# for the PDM public ingress (issue #879 Bug 2). The Sovereign-
# side catalyst-api adds `Authorization: Basic …` to every PDM
# call so the Traefik basicAuth Middleware in front of
# pool.openova.io accepts the request. Without this, every
# Day-2 add-domain call returns 401 from PDM (caught live on
# otech103).
#
# Source Secret (`pdm-basicauth`, keys `username` + `password`)
# is pre-provisioned by cloud-init on every Sovereign at
# provision time, mirrored via the same Reflector seam ghcr-
# pull / harbor-robot-token already use. optional=true so:
# - Catalyst-Zero pods (contabo's catalyst-api) start cleanly
# when the Secret is absent. On contabo the in-cluster
# Service path bypasses the ingress entirely and BasicAuth
# is a no-op.
# - CI / local dev / older Sovereigns that pre-date this
# provisioning seam start cleanly. POSTs without auth get
# 401 from PDM with a clear log line, instead of the Pod
# crashlooping on start.
#
# Per Inviolable Principle #10: the credentials never enter a
# logged struct or a deployment record — loaded into the Pod
# env once at start, read per-call by pdmFlipNS only.
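#
# For reference, a sketch of the Secret shape the two refs below
# expect. The name and keys come from this file; the values are
# placeholders, and the Secret is assumed to sit in the Pod's
# namespace:
#
#   apiVersion: v1
#   kind: Secret
#   metadata:
#     name: pdm-basicauth
#   type: Opaque
#   stringData:
#     username: "<pdm-user>"
#     password: "<pdm-password>"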
- name: CATALYST_PDM_BASIC_AUTH_USER
valueFrom:
secretKeyRef:
name: pdm-basicauth
key: username
optional: true
- name: CATALYST_PDM_BASIC_AUTH_PASS
valueFrom:
secretKeyRef:
name: pdm-basicauth
key: password
optional: true
# CATALYST_HANDOVER_KEY_PATH — path to the RS256 PRIVATE key
# catalyst-api uses to mint magic-link + handover JWTs. The
# signer auto-generates the keypair on first start if absent.
# MUST be on a writable PVC mount. Catalyst-Zero only.
- name: CATALYST_HANDOVER_KEY_PATH
value: /var/lib/catalyst/handover-jwt-private.pem
# ── Magic-link auth (issue #608, Phase-8b Agent A) ──────────────
# CATALYST_KC_CLIENT_ID — OIDC client ID for the Catalyst-Zero
# UI (catalyst-zero-ui PKCE client). Defaults to "catalyst-zero-ui"
# in code; override here for multi-tenant or custom client names.
# optional=true: Sovereign clusters don't use this auth path.
- name: CATALYST_KC_CLIENT_ID
valueFrom:
secretKeyRef:
name: catalyst-magic-link-credentials
key: kc-client-id
optional: true
# CATALYST_KC_REDIRECT_URI — OAuth callback URL the Keycloak magic-
# link redirects to after verification (e.g.
# https://console.openova.io/sovereign/auth/callback).
# Per INVIOLABLE-PRINCIPLES #4: runtime configuration, not hardcoded.
- name: CATALYST_KC_REDIRECT_URI
valueFrom:
secretKeyRef:
name: catalyst-magic-link-credentials
key: kc-redirect-uri
optional: true
# CATALYST_SESSION_COOKIE_SECRET — HMAC-SHA256 key for signing the
# catalyst_session HttpOnly cookie value. 32 random bytes (base64url
# encoded). Rotation invalidates all active sessions.
- name: CATALYST_SESSION_COOKIE_SECRET
valueFrom:
secretKeyRef:
name: catalyst-magic-link-credentials
key: session-cookie-secret
optional: true
# CATALYST_POST_AUTH_REDIRECT — URL the browser is sent to after a
# successful magic-link / PIN callback. Defaults to /wizard in code.
# Catalyst-Zero (contabo) routes the UI under the /sovereign prefix
# (Traefik strip-prefix is transparent to the server-side Location
# header), so contabo overrides this to /sovereign/wizard via the
# per-environment overlay. The wizard is mothership-only, so on a
# freshly franchised Sovereign /sovereign/wizard renders an empty
# page. The post-handover Sovereign Console homepage is
# /sovereign/components, so that is the default the chart now
# ships (issue #901, 2026-05-05).
#
# DUAL-MODE CONTRACT — see CATALYST_POWERDNS_API_URL block above:
# this file is consumed by both Helm (Sovereign) and Kustomize
# (contabo-mkt). Helm template directives (curly-brace syntax) in
# `value:` break the Kustomize render with "yaml: invalid map key".
# So this default is a literal. Per-Sovereign overrides go through
# the HelmRelease overlay's `catalystApi.env` additional-env patch,
# NOT through this file.
#
# Per INVIOLABLE-PRINCIPLES #4: the override seam exists (overlay
# env patch); only the chart-shipped default is a literal.
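#
# One way the contabo-side override could look, sketched as a
# Kustomize strategic-merge patch. The container name and the
# placement of the patch are assumptions; strategic merge matches
# env entries by `name`, so only this variable is replaced:
#
#   patches:
#     - target:
#         kind: Deployment
#         name: catalyst-api
#       patch: |-
#         apiVersion: apps/v1
#         kind: Deployment
#         metadata:
#           name: catalyst-api
#         spec:
#           template:
#             spec:
#               containers:
#                 - name: catalyst-api   # hypothetical container name
#                   env:
#                     - name: CATALYST_POST_AUTH_REDIRECT
#                       value: "/sovereign/wizard"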
- name: CATALYST_POST_AUTH_REDIRECT
value: "/sovereign/components"
# ── Option-B magic-link: openova realm service account ───────────
# CATALYST_OPENOVA_KC_ADDR — Keycloak base URL for the openova realm.
# Defaults in code to keycloak-zero.keycloak-zero.svc (in-cluster
# on Catalyst-Zero). optional=true: Sovereign clusters don't run
# the openova realm.
- name: CATALYST_OPENOVA_KC_ADDR
valueFrom:
secretKeyRef:
name: catalyst-openova-kc-credentials
key: kc-addr
optional: true
- name: CATALYST_OPENOVA_KC_REALM
valueFrom:
secretKeyRef:
name: catalyst-openova-kc-credentials
key: kc-realm
optional: true
- name: CATALYST_OPENOVA_KC_SA_CLIENT_ID
valueFrom:
secretKeyRef:
name: catalyst-openova-kc-credentials
key: kc-sa-client-id
optional: true
- name: CATALYST_OPENOVA_KC_SA_CLIENT_SECRET
valueFrom:
secretKeyRef:
name: catalyst-openova-kc-credentials
key: kc-sa-client-secret
optional: true
# CATALYST_OPENOVA_KC_AUDIENCE — OIDC audience for KC token-exchange.
# Defaults to "catalyst-zero-ui" in code. optional=true.
- name: CATALYST_OPENOVA_KC_AUDIENCE
valueFrom:
secretKeyRef:
name: catalyst-openova-kc-credentials
key: kc-audience
optional: true
# CATALYST_SMTP_HOST / CATALYST_SMTP_PORT — Stalwart SMTP relay for
# magic-link email delivery. Defaults in code to
# stalwart-web.stalwart.svc.cluster.local:587. optional=true.
- name: CATALYST_SMTP_HOST
valueFrom:
secretKeyRef:
name: catalyst-openova-kc-credentials
key: smtp-host
optional: true
- name: CATALYST_SMTP_PORT
valueFrom:
secretKeyRef:
name: catalyst-openova-kc-credentials
key: smtp-port
optional: true
- name: CATALYST_SMTP_USER
valueFrom:
secretKeyRef:
name: catalyst-openova-kc-credentials
key: smtp-user
optional: true
- name: CATALYST_SMTP_PASS
valueFrom:
secretKeyRef:
name: catalyst-openova-kc-credentials
key: smtp-pass
optional: true
- name: CATALYST_SMTP_FROM
valueFrom:
secretKeyRef:
name: catalyst-openova-kc-credentials
key: smtp-from
optional: true
# CATALYST_SESSION_COOKIE_DOMAIN — optional domain scoping for the
# catalyst_session + catalyst_refresh cookies.
#
# Why this is empty by default (issue #910 Bug 2)
# ===============================================
# Pre-1.4.19 this was hardcoded `console.openova.io` because that
# was the host Catalyst-Zero (contabo) serves both /sovereign/wizard
# and /sovereign/auth/magic from. On contabo that worked: the
# request host == the cookie domain, so the browser accepted the
# Set-Cookie and re-presented it on every subsequent request.
#
# On a freshly franchised Sovereign (e.g. console.otech105.omani.
# works, caught live 2026-05-05) the same hardcoded value made the
# browser refuse to bind the cookie at all: the Set-Cookie header
# had `Domain=console.openova.io` while the request host was
# `console.otech105.omani.works`. RFC 6265 §5.3 step 6 ignores any
# Set-Cookie whose Domain attribute the request host does not
# domain-match (the host must equal that domain or be a sub-domain
# of it). The browser silently dropped the
# cookie → next /api/* request had no session → backend redirected
# to /login → infinite loop. Login broke for every Sovereign.
#
# Empty value contract: when CATALYST_SESSION_COOKIE_DOMAIN is
# empty, the auth handler omits the Domain attribute from
# Set-Cookie. Per RFC 6265 the browser binds the cookie to the
# exact request host. That is the correct behaviour on BOTH:
# - Sovereign: request host = console.<sov-fqdn>, cookie binds
# there, /api/* on the same host re-presents it.
# - Catalyst-Zero (contabo): request host = console.openova.io,
# cookie binds there. Wizard + magic-link callbacks are
# served from the same Ingress so a single cookie jar is
# sufficient.
#
# Per the dual-mode contract documented in the
# CATALYST_POWERDNS_API_URL block above, this MUST stay a literal
# value (no Helm template directives) so the Kustomize-mode
# contabo build keeps parsing. Per-Sovereign overlays MAY
# override via the `catalystApi.env` additional-env patch in the
# per-cluster HelmRelease (Helm-only codepath, takes precedence
# over this default at template-render time).
- name: CATALYST_SESSION_COOKIE_DOMAIN
value: ""
# CATALYST_CATALOG_URL — in-cluster Service URL of
# catalyst-catalog (EPIC-2 Slice L, #1148). Consumed by
# internal/handler/catalog_client.go::NewCatalogClientFromEnv
# for live-install and preview flows. The default literal
# below matches the Service rendered by templates/services/
# catalog/service.yaml: name `catalyst-catalog`, namespace
# `.Release.Namespace` (catalyst-system on Sovereigns), port
# 8080. The literal keeps the dual-mode contract (Helm +
# Kustomize on contabo) working without Helm directives in
# `value:` fields. Per Inviolable Principle #4 per-Sovereign
# overlays MAY override via the `catalystApi.env` additional-
# env patch when the catalog ships in a different namespace.
#
# qa-loop iter-1 root cause: services.catalog.enabled was
# default false → the Service didn't exist → catalyst-api's
# in-code default URL pointed at a non-existent endpoint →
# every /api/v1/catalog* call returned 404 from the upstream
# round-trip wrapper. Flipping enabled=true (above) renders
# the Service; this env var documents the wiring contract
# in the manifest itself rather than only in Go source.
- name: CATALYST_CATALOG_URL
value: "http://catalyst-catalog.catalyst-system.svc.cluster.local:8080"
# CATALYST_TEST_SESSION_ENABLED: gates POST /api/v1/auth/test-session
# (qa-loop iter-11 Cluster-A). Default "false": the endpoint
# returns 404 to the public, indistinguishable from "this
# route doesn't exist". On QA/chroot Sovereigns the value
# "true" enables the endpoint so the 5-agent QA executor
# can mint per-tier session cookies and assert tier-boundary
# 403/200 contracts (see qaFixtures.testSessionEnabled in
# values.yaml).
# LITERAL value (not a Helm template directive) per the dual-mode
# contract; see the KEYCLOAK_BOOTSTRAP_TIER_ROLES block above.
# qa-loop chroot Sovereigns enable it via the `catalystApi.env`
# additional-env patch in their per-cluster HelmRelease (Helm-only
# codepath, takes precedence over this default).
# NEVER enable on customer Sovereigns.
- name: CATALYST_TEST_SESSION_ENABLED
value: "false"
resources:
requests:
cpu: 100m
memory: 512Mi
limits:
# tofu provider plugins (hcloud ~80MB, dynadot ~30MB) + state +
# plan files easily exceed the prior 64Mi cap. 1Gi was the
# post-tofu floor.
# Multi-region (prov #61, 2026-05-13): the API now holds
# in-memory helmwatch.Watcher state for N regions plus
# composes per-snapshot region groups across N kubeconfigs.
# OOMKilled at 1Gi on a 3-region prov when concurrent
# /snapshot polls hit during phase 1. Bumped to 4Gi for
# headroom — same machine has plenty (Hetzner CPX42 mothership
# node is 16GB-resident).
cpu: 2000m
memory: 4Gi
# Liveness vs readiness — the split is REQUIRED, not cosmetic
# (issue #530). /healthz is liveness: it returns 200 whenever
# the catalyst-api process is up and the HTTP server is
# serving. /readyz is readiness: it returns 200 only when the
# primary Sovereign's Pod + Deployment informers are synced
# (or no Sovereigns are registered yet).
#
# The previous wiring pointed BOTH probes at /healthz, and
# /healthz performed the strict informer-sync check. That
# produced the following crashloop chain:
#
# 1. Operator POSTs a fresh deployment.
# 2. catalyst-api registers the Sovereign in k8scache and
# starts looking for a kubeconfig file on the PVC.
# 3. Kubeconfig will NOT arrive until the new Sovereign's
# cloud-init runs (~60-120s) and PUTs it back. Until
# then, informers cannot start, sync flips false.
# 4. /healthz returns 503. kubelet kills the Pod on the
# next liveness probe (~33s).
# 5. Restarted Pod restores deployments from the PVC,
# re-registers the Sovereign, re-enters the same
# no-kubeconfig state. Loop repeats.
# 6. Service has zero ready endpoints throughout. nginx
# returns 502 to cloud-init's kubeconfig PUT. The PUT
# never reaches catalyst-api. Provision stalls forever.
#
# The fix: liveness must be process-level (am I up?), NOT
# workload-level (do I have a kubeconfig?). The strict
# informer-sync check stays — moved to /readyz — so a Pod
# whose primary Sovereign is mid-sync briefly drops out of
# the Service rotation but is NOT restarted. The kubeconfig
# PUT endpoint reaches catalyst-api the moment cloud-init
# calls it, breaking the deadlock.
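#
# Timing sketch, assuming the Kubernetes probe defaults for the
# fields not set below (failureThreshold: 3, timeoutSeconds: 1,
# successThreshold: 1):
#
#   livenessProbe:                 # restart only if the process hangs
#     periodSeconds: 10
#     failureThreshold: 3          # default; 3 consecutive failures,
#                                  # roughly 20-30s of sustained
#                                  # failure before a restart
#   readinessProbe:                # gates Service endpoints only
#     periodSeconds: 5
#     failureThreshold: 3          # default; roughly 10-15s of failed
#                                  # /readyz before the Pod leaves
#                                  # the Service rotation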
livenessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 3
periodSeconds: 10
readinessProbe:
httpGet:
path: /readyz
port: 8080
initialDelaySeconds: 2
periodSeconds: 5
securityContext:
allowPrivilegeEscalation: false
# readOnlyRootFilesystem deliberately false: the bootstrap installer
# writes kubeconfig temp files (mode 0600) under /tmp and helm
# downloads chart caches under $HOME. Per Catalyst security policy
# these writes are scoped via emptyDir below, never to the image's
# actual root FS.
readOnlyRootFilesystem: false
runAsNonRoot: true
runAsUser: 65534
volumeMounts:
- name: tmp
mountPath: /tmp
- name: home
mountPath: /home/nonroot
# Catalyst PVC — mounted at /var/lib/catalyst so two
# subdirectories live on the same single-attach volume:
#
# deployments/<id>.json — flat-file deployment store.
# Rehydrating from this directory on every catalyst-api
# restart fixes the user-reported regression in which a
# deployment id created at 12:57 vanished after 6 image
# rolls. The store walks every *.json
# on startup; in-flight rows are rewritten to
# `failed` with operator instructions for purging
# orphaned Hetzner resources.
#
# kubeconfigs/<id>.yaml — plaintext kubeconfig PUT back
# from cloud-init via the bearer-token endpoint
# (issue #183, Option D). Mode 0600 per file. The
# path is persisted in the deployment record so a
# Pod restart mid-Phase-1 reattaches the helmwatch
# goroutine.
#
# One PVC, one mount — keeps the failure modes (PVC
# unbind, fs full) bounded to one volume, and lets the
# Go process create both subdirectories on startup
# without a second volume claim or init container.
- name: catalyst
mountPath: /var/lib/catalyst
# k8scache disk-snapshot mount (issue #321). Separate PVC
# so cache size is independent of deployment-record
# storage. The k8scache loop writes one JSON per
# (cluster, kind) here, mode 0600. Pruned by the loop
# itself when a snapshot ages past 1h.
- name: sov-cache
mountPath: /var/cache/sov-cache
# handover-jwt-public — RS256 public key JWK distributed by
# cloud-init from Catalyst-Zero's signing keypair. Mounted
# read-only as a directory under /etc/catalyst/ (NOT under
# /var/lib/catalyst because that is the catalyst-api PVC; a
# leftover empty directory at the legacy file path from
# pre-#606 installs would collide with a subPath file mount on
# re-provision). The JWK lives at /etc/catalyst/handover-jwt-
# public/public.jwk — see CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH
# above. optional=true on the Secret so pods start on
# Catalyst-Zero (which is the SIGNER, not the verifier) and
# in CI where the Secret may be absent.
- name: handover-jwt-public
mountPath: /etc/catalyst/handover-jwt-public
readOnly: true
volumes:
- name: tmp
emptyDir:
# 2Gi to hold the per-deployment OpenTofu workdir tree under
# /tmp/catalyst/tofu/<sovereign-fqdn>/ (provider plugins + state
# + plan binary). Each Sovereign run gets its own subdirectory.
sizeLimit: 2Gi
- name: home
emptyDir:
sizeLimit: 256Mi
# Persistent catalyst-api state — mounted at /var/lib/catalyst
# so deployments/ and kubeconfigs/ share one volume. The PVC
# must already exist in the same namespace under the name
# catalyst-api-deployments; see api-deployments-pvc.yaml in
# this chart. Single-attach (RWO) is fine because the
# Deployment is single-replica with the Recreate strategy
# declared above; a future HA rework would need RWX or a
# different persistence layer.
- name: catalyst
persistentVolumeClaim:
claimName: catalyst-api-deployments
# k8scache disk-snapshot PVC (issue #321). 5Gi RWO; see
# api-cache-pvc.yaml for the sizing + cold-start contract.
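#
# A sketch of the claim this volume expects, built only from the
# name, size, and access mode stated here. storageClassName is
# omitted on the assumption that the cluster default applies;
# api-cache-pvc.yaml remains the source of truth:
#
#   apiVersion: v1
#   kind: PersistentVolumeClaim
#   metadata:
#     name: catalyst-api-cache
#   spec:
#     accessModes: ["ReadWriteOnce"]
#     resources:
#       requests:
#         storage: 5Gi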
- name: sov-cache
persistentVolumeClaim:
claimName: catalyst-api-cache
# handover-jwt-public — RS256 public key JWK written by cloud-init
# from Catalyst-Zero's signing keypair. Secret is optional so
# Catalyst-Zero pods (the signer) and CI start without it.
- name: handover-jwt-public
secret:
secretName: catalyst-handover-jwt-public
optional: true
items:
- key: public.jwk
path: public.jwk