478743db17
2 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
0dbdf3b327
|
fix(bp-trivy): node-collector tolerates control-plane taint (closes #769) (#772)
PR #755 added `node-role.kubernetes.io/control-plane=true:NoSchedule` to the CP node when worker_count > 0. Two bootstrap-kit charts have pods that MUST land on the CP and lacked the matching toleration:

bp-trivy
• node-collector: Pod pinned to each node via nodeSelector `kubernetes.io/hostname=<node>`. The CP-bound collector reads /var/lib/etcd, /var/lib/kubelet, /var/lib/kube-scheduler, /var/lib/kube-controller-manager via hostPath; these only exist on the CP. Without the toleration the collector sat Pending forever on otech93 (live evidence in #769).
• scanJobTolerations: per-workload scan jobs the operator spawns may target pods on CP-only system DaemonSets (kube-system kube-proxy in non-Cilium mode, etc.). Adding the toleration here so reports are produced for those workloads too.

bp-alloy
• DaemonSet: one pod MUST land on every node including the CP, so CP-local kubelet logs + node metrics flow into the LGTM stack. Without the toleration Alloy ran 3/4 nodes (Ready=N-1) on otech93 and CP telemetry was silently lost.

Both tolerations are no-ops on solo Sovereigns (worker_count=0): the CP is untainted in solo mode per PR #755's conditional.

Versions bumped:
• bp-trivy 1.0.2 → 1.0.3 (Chart.yaml + 3× HelmRelease pins)
• bp-alloy 1.0.0 → 1.0.1 (Chart.yaml + 3× HelmRelease pins)

Out of scope (audited, no change needed):
• bp-cilium: upstream defaults already tolerate everything (verified on otech93: cilium DaemonSet at 4/4 nodes).
• bp-falco: values.yaml already declares NoSchedule + NoExecute Exists tolerations (4/4 on otech93).
• cnpg/harbor: no kubelet-cert-renew Jobs in current charts.

Verified:
• `helm template` on both charts renders the expected toleration (alloy: pod-spec; trivy: trivy-operator-config ConfigMap consumed by the operator at scan-job spawn time).
• `bash scripts/check-bootstrap-deps.sh` PASSED (no DAG drift).

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
75128781b3
|
feat(platform): observability stack umbrellas (grafana/loki/mimir/tempo/alloy/otel/langfuse/velero) (#214)
* feat(bp-grafana): umbrella chart for observability stack

  Catalyst Blueprint umbrella for Grafana, the visualization layer of the LGTM observability stack (Loki/Grafana/Tempo/Mimir). Pinned to grafana/grafana 10.5.15 (appVersion 12.3.1), current stable on 2026-04-29. Solo-Sovereign defaults: 1 replica, 10Gi PVC, ServiceMonitor disabled per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch.

* feat(bp-loki): umbrella chart for observability stack

  Catalyst Blueprint umbrella for Grafana Loki, the log aggregation backend of the LGTM stack. SingleBinary mode by default (solo-Sovereign min); SimpleScalable/Distributed are values toggles. Pinned to grafana/loki 7.0.0 (appVersion 3.6.7) on 2026-04-29. Filesystem storage default; SeaweedFS S3 wiring is per-Sovereign overlay when scaling out. All observability toggles default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch.

* feat(bp-mimir): umbrella chart for observability stack

  Catalyst Blueprint umbrella for Grafana Mimir, the metrics storage tier of the LGTM stack. Pinned to grafana/mimir-distributed 6.0.6 (appVersion 3.0.4) on 2026-04-29. Solo-Sovereign defaults: every component scaled to 1 replica, zoneAwareReplication disabled, Kafka ingest-storage disabled. Bundled MinIO kept enabled as a stop-gap so the chart renders; SeaweedFS S3 wiring is per-Sovereign overlay. All metaMonitoring toggles default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch.

* feat(bp-tempo): umbrella chart for observability stack

  Catalyst Blueprint umbrella for Grafana Tempo, the distributed tracing backend of the LGTM stack. Single-binary mode by default (solo-Sovereign min); microservice mode (tempo-distributed) is a chart swap toggle. Pinned to grafana/tempo 1.24.4 (appVersion 2.9.0) on 2026-04-29. Local PVC storage default; SeaweedFS S3 wiring is per-Sovereign overlay. Metrics generator disabled by default (depends on bp-mimir). ServiceMonitor default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch.

* feat(bp-alloy): umbrella chart for observability stack

  Catalyst Blueprint umbrella for Grafana Alloy, the unified telemetry collector for the LGTM stack (logs, metrics, traces; OTLP-native). Pinned to grafana/alloy 1.8.0 (appVersion v1.16.0) on 2026-04-29. DaemonSet controller default (one Alloy per node) so node + container telemetry work out of the box. Empty Alloy config by default; per-Sovereign overlays populate forwarders to bp-loki/bp-mimir/bp-tempo once those reconcile. ServiceMonitor + ingress + CRDs default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch.

* feat(bp-opentelemetry): umbrella chart for observability stack

  Catalyst Blueprint umbrella for the OpenTelemetry Collector, a vendor-neutral telemetry collector. Sibling to bp-alloy; per-Sovereign overlays choose one. Pinned to open-telemetry/opentelemetry-collector 0.152.0 (appVersion 0.150.1) on 2026-04-29. Uses the contrib distribution (otel/opentelemetry-collector-contrib:0.150.1) so Loki/Mimir/Tempo exporters are bundled. Deployment mode default (1 replica); DaemonSet + StatefulSet are values toggles. All presets default false; ingress + ServiceMonitor + PodMonitor + PrometheusRule + NetworkPolicy default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch.

* feat(bp-langfuse): umbrella chart for observability stack

  Catalyst Blueprint umbrella for Langfuse, an LLM observability platform. Complements bp-grafana (infrastructure metrics) with AI-specific telemetry (traces, evaluations, prompts, cost attribution). Pinned to langfuse/langfuse 1.5.28 (appVersion 3.171.0) on 2026-04-29. Catalyst convention: ALL bundled Bitnami subcharts are disabled: PostgreSQL via cnpg.io/Cluster (bp-cnpg), Redis via bp-valkey, ClickHouse via bp-clickhouse, S3 via bp-seaweedfs. Per-Sovereign overlays wire external endpoints + Secret references. Telemetry to Langfuse Inc. defaulted false; signUpDisabled defaulted true. Part of issue #204 observability-stack umbrellas batch.

* feat(bp-velero): umbrella chart for observability stack

  Catalyst Blueprint umbrella for Velero, Kubernetes-native backup and disaster recovery. Per platform/velero/README.md, ALL Velero output goes to SeaweedFS (Catalyst's unified S3 encapsulation), which transitions to a cloud archival backend on the cold tier. Pinned to vmware-tanzu/velero 12.0.1 (appVersion 1.18.0) on 2026-04-29. Bundled velero-plugin-for-aws:v1.14.0 init container so SeaweedFS S3 is reachable. backupsEnabled/snapshotsEnabled defaulted false at this layer (placeholders for backupStorageLocation); per-Sovereign overlays flip on after wiring SeaweedFS endpoint + credentials. ServiceMonitor + PodMonitor + PrometheusRule default false per BLUEPRINT-AUTHORING.md §11.2. Part of issue #204 observability-stack umbrellas batch.

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> |
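
The scheduling behaviour described in 0dbdf3b327 (a DaemonSet stuck at 3/4 nodes until it tolerates the control-plane taint, and the same toleration being a no-op on a solo Sovereign) can be reproduced with a small model of Kubernetes taint/toleration matching. This is an illustrative sketch, not code from the charts: the node names, the `daemonset_coverage` helper, and the exact toleration shape (`operator: Exists` on the control-plane key) are assumptions, since the commit message does not show the rendered toleration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key + Exists matches every taint key
    operator: str = "Equal"  # Kubernetes defaults the operator to Equal
    value: str = ""
    effect: str = ""         # empty effect matches every effect


# Only these effects block scheduling; PreferNoSchedule is a soft preference.
HARD_EFFECTS = {"NoSchedule", "NoExecute"}


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key == "":
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value


def schedulable(taints: Sequence[Taint], tolerations: Sequence[Toleration]) -> bool:
    """A pod fits a node only if every hard taint is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint.effect in HARD_EFFECTS
    )


def daemonset_coverage(nodes: Dict[str, List[Taint]],
                       tolerations: Sequence[Toleration]) -> int:
    """Count the nodes a DaemonSet pod with these tolerations can land on."""
    return sum(schedulable(taints, tolerations) for taints in nodes.values())


# The taint PR #755 applies to the CP node when worker_count > 0.
CP_TAINT = Taint("node-role.kubernetes.io/control-plane", "true", "NoSchedule")

# Assumed toleration shape; the charts may use operator Equal instead.
CP_TOLERATION = Toleration(
    key="node-role.kubernetes.io/control-plane",
    operator="Exists",
    effect="NoSchedule",
)

# Hypothetical clusters: one tainted CP plus three workers, and a solo CP.
multi_node = {"cp-0": [CP_TAINT], "worker-0": [], "worker-1": [], "worker-2": []}
solo = {"cp-0": []}

if __name__ == "__main__":
    total = len(multi_node)
    print(f"without toleration: {daemonset_coverage(multi_node, [])}/{total}")
    print(f"with toleration:    {daemonset_coverage(multi_node, [CP_TOLERATION])}/{total}")
    print(f"solo, with toleration: {daemonset_coverage(solo, [CP_TOLERATION])}/{len(solo)}")
```

In the multi-node case the count moves from 3 to 4 once the toleration is present, while in the solo case the result is 1 with or without it, which matches the commit's claim that the toleration changes nothing when the CP is untainted.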