Helm install completes when manifests apply, not when pods reach Ready.
Flux dependsOn checks Ready=True on each HR independently, so
spec.install.disableWait + spec.upgrade.disableWait is the correct shape
for slow-Ready workloads. Blanket spec.timeout: Nm watchdogs from PR #221
were a band-aid that caused cascading HR failures and blocked downstream
HRs (bp-nats-jetstream, bp-openbao depended on bp-spire).

Founder direction (verbatim): "always event driven robust jobs"

Per-HR audit (drop spec.timeout: 15m, add disableWait, with reason):
- bp-cilium: envoyconfig CRD self-wait - agent crash-loops until its own CRDs land
- bp-cert-manager: webhook readiness depends on cainjector mutating Secret - multi-minute on cold start
- bp-flux: adopts cloud-init Flux objects; the helm-controller reconciling THIS HR is itself a chart target - Ready deadlock without disableWait
- bp-sealed-secrets: single-replica controller + CRD - install completes on manifest apply
- bp-spire: spire-controller-manager waits for CRD informer cache sync - multi-minute legitimate path; chart fix below
- bp-nats-jetstream: JetStream raft quorum formation across N replicas
- bp-openbao: 3-node Raft sealed-by-default; Ready=True only after operator runs `bao operator init` unseal flow
- bp-keycloak: DB schema migration + 100+ Liquibase changesets on first install
- bp-gitea: PostgreSQL DB init + admin user + Blueprint catalog mirror seeding
- bp-external-dns: pod readiness depends on PowerDNS API + pdns-pg CNPG cascade
- bp-catalyst-platform: ~10 services, inter-service NATS/OTel readiness is not Helm's concern

Intentionally NOT touched (other parallel agents own these):
- bp-crossplane (Agent A): chart split for intra-chart CRD-ordering
- bp-powerdns (Agent D): post-install hook for intra-chart Job-ordering

bp-spire chart fix (1.1.3 -> 1.1.4):

Root cause investigation on otech.omani.works (live):
spire-controller-manager has restarted 37 times with:

  "failed to wait for clusterstaticentry caches to sync: timed out
  waiting for cache to be synced for Kind *v1alpha1.ClusterStaticEntry"

`kubectl get crd | grep spire` returns nothing: the spire.spiffe.io
v1alpha1 CRDs (ClusterSPIFFEID / ClusterStaticEntry /
ClusterFederatedTrustDomain) are NOT registered. The upstream `spire`
chart does not install its own CRDs; the spiffe maintainers ship them via
the SEPARATE `spire-crds` chart, expected to be installed first.

Fix: platform/spire/chart/Chart.yaml now declares spire-crds 0.5.0 as the
FIRST dependency. Helm installs subcharts in dependency order, so listing
spire-crds first guarantees CRDs are applied before the spire subchart's
controller-manager Deployment starts. blueprint.yaml + both 06-spire.yaml
cluster references bumped to 1.1.4.

Live error this fixes (otech.omani.works, persistent ~5h):

  Helm upgrade failed for release spire-system/spire with chart
  bp-spire@1.1.3: context deadline exceeded

+ downstream cascade: bp-nats-jetstream / bp-openbao stuck at
  "dependency 'flux-system/bp-spire' is not ready"

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
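The Chart.yaml change described in the commit message might look roughly like the sketch below. Only the chart name, the 1.1.4 version, and the spire-crds 0.5.0 dependency come from the commit message; the repository URL and the placeholder for the upstream `spire` chart version are assumptions for illustration, not the contents of the real file.

```yaml
# platform/spire/chart/Chart.yaml (illustrative sketch, not the actual file)
apiVersion: v2
name: bp-spire
version: 1.1.4
dependencies:
  # CRDs chart declared first, as the commit message describes.
  - name: spire-crds
    version: 0.5.0
    # Assumed upstream repository; not stated in the commit message.
    repository: https://spiffe.github.io/helm-charts-hardened/
  - name: spire
    # Placeholder: the upstream spire chart version is not given in the source.
    version: "<upstream-spire-chart-version>"
    repository: https://spiffe.github.io/helm-charts-hardened/
```

After editing the dependency list, `helm dependency update` is normally run in the chart directory so that Chart.lock and the vendored subcharts match.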
58 lines
1.5 KiB
YAML
# bp-nats-jetstream - Catalyst bootstrap-kit Blueprint. Catalyst's control-plane event spine. Per-Org Account isolation. KV bucket per Environment.
#
# Wrapper chart: platform/nats-jetstream/chart/
# Catalyst-curated values: platform/nats-jetstream/chart/values.yaml
# Reconciled by: Flux on the new Sovereign's k3s control plane.

---
apiVersion: v1
kind: Namespace
metadata:
  name: nats-system
  labels:
    catalyst.openova.io/sovereign: otech.omani.works
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: bp-nats-jetstream
  namespace: flux-system
spec:
  type: oci
  interval: 15m
  url: oci://ghcr.io/openova-io
  secretRef:
    name: ghcr-pull
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-nats-jetstream
  namespace: flux-system
spec:
  interval: 15m
  releaseName: nats-jetstream
  targetNamespace: nats-system
  dependsOn:
    - name: bp-spire
  chart:
    spec:
      chart: bp-nats-jetstream
      version: 1.1.1
      sourceRef:
        kind: HelmRepository
        name: bp-nats-jetstream
        namespace: flux-system
  # Event-driven install: NATS StatefulSet with JetStream raft initialisation
  # - quorum formation across N replicas is legitimately multi-minute on
  # cold start. Helm install completes when manifests apply; downstream
  # dependsOn checks Ready=True independently. Replaces PR #221 timeout.
  install:
    disableWait: true
    remediation:
      retries: 3
  upgrade:
    disableWait: true
    remediation:
      retries: 3
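For contrast, the shape that this manifest replaces could be sketched as follows. This is a reconstruction under assumptions: the commit message only says that `spec.timeout: 15m` was dropped and `disableWait` added, so every other field here is copied from the current manifest, and the real PR #221 version may have differed.

```yaml
# Hypothetical "before" shape (reconstructed, not taken from PR #221 itself)
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-nats-jetstream
  namespace: flux-system
spec:
  interval: 15m
  # Blanket watchdog: with waiting enabled (the default), the Helm action
  # fails if resources are not Ready within this window.
  timeout: 15m
  releaseName: nats-jetstream
  targetNamespace: nats-system
  dependsOn:
    - name: bp-spire
  chart:
    spec:
      chart: bp-nats-jetstream
      version: 1.1.1
      sourceRef:
        kind: HelmRepository
        name: bp-nats-jetstream
        namespace: flux-system
```

With waiting enabled, a slow JetStream quorum can push the install past the timeout and mark the HR failed, which in turn blocks any HR that lists it under `dependsOn`. The `disableWait: true` form above avoids that by letting the Helm action finish once manifests are applied.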