Commit Graph

3 Commits

Author SHA1 Message Date
e3mrah
8988cd9e4f
feat(infra-hetzner): wire all var.regions[] entries end-to-end (slice G1, #1095) (#1131)
Slice G1 of EPIC-0 (#1095, Group G "Multi-cluster substrate"). Today
infra/hetzner/main.tf only realises regions[0] end-to-end — every wizard
payload's regions[1..N] entries silently no-op. EPIC-6 (#1101) Continuum
DR demo needs 3 regions (mgmt + fsn + hel per docs/EPICS-1-6-unified-design.md
§3.8 + §11), so this slice closes the gap.

Architecture: hybrid singular-path + secondary-region overlay.
- The legacy singular path (var.region + count = local.control_plane_count)
  STAYS untouched — every existing Sovereign state (omantel, otech*) keeps
  its resource addresses (hcloud_server.control_plane[0],
  hcloud_load_balancer.main, etc) and produces a no-op plan diff.
- New regions (regions[1+]) are realised via a parallel for_each set keyed
  by "{cloudRegion}-{index}" (e.g. fsn1-1, hel1-2). Each secondary region
  gets its own /24 subnet inside the shared /16 hcloud_network, its own
  CP server, its own workers, and its own lb11 load balancer. The shared
  hcloud_firewall + hcloud_ssh_key (one tenant boundary per Sovereign).

Why hybrid not full for_each: a wholesale refactor would change every
existing resource address (hcloud_server.control_plane[0] →
hcloud_server.control_plane["mgmt"]), forcing every running Sovereign
to run `tofu state mv` for ~12 resources or face destructive recreates.
The brief explicitly bans that. Hybrid is purely additive — secondary
resources are NEW addresses no existing state carries.

No `tofu state mv` runbook required. Existing Sovereigns provisioned
with var.regions = [] or len(var.regions) == 1 produce identical plans
before and after this PR.

Slice G3 (out of scope here) wires Cilium ClusterMesh between secondary
regions and adds per-cluster GitOps path differentiation; today every
secondary CP renders an identical Flux Kustomization pointed at
clusters/<sovereign_fqdn>/.

Tests: tests/multi_region.tftest.hcl exercises 5 scenarios offline via
mock_provider + override_resource (no real Hetzner):
  - legacy_no_regions_payload (var.regions=[])
  - single_region_entry_does_not_double_provision (len==1)
  - three_region_mgmt_fsn_hel (EPIC-6 shape)
  - same_region_duplicates_produce_distinct_keys
  - non_hetzner_regions_are_filtered_out (oci entries skipped)
All 5 pass. CI workflow infra-hetzner-tofu.yaml runs validate + fmt -check
+ test on every PR touching infra/hetzner/**.

Per CLAUDE.md "every workflow MUST be event-driven, NEVER scheduled":
push-on-merge + pull-request-on-touch + workflow_dispatch only. No cron.

Validation:
  $ tofu validate
  Success! The configuration is valid.
  $ tofu fmt -check -recursive
  exit=0
  $ tofu test
  tests/multi_region.tftest.hcl... pass
    run "legacy_no_regions_payload"... pass
    run "single_region_entry_does_not_double_provision"... pass
    run "three_region_mgmt_fsn_hel"... pass
    run "same_region_duplicates_produce_distinct_keys"... pass
    run "non_hetzner_regions_are_filtered_out"... pass
  Success! 5 passed, 0 failed.

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 00:29:44 +04:00
e3mrah
1865ac8975
fix(bp-seaweedfs): vendor upstream chart, drop fromToml-using template (#340) (#504)
* fix(bp-seaweedfs): vendor upstream chart, drop fromToml-using template (#340)

The upstream seaweedfs/seaweedfs 4.22.0 chart now ships
templates/shared/security-configmap.yaml which calls fromToml — a Sprig
function added in Helm 3.13. Flux v1.x helm-controller bundles a Helm
SDK older than 3.13 and PARSES every template before any
{{- if .Values.global.seaweedfs.enableSecurity }} gate fires, so the file's
mere presence breaks install on every Sovereign with:

  parse error at (bp-seaweedfs/charts/seaweedfs/templates/shared/security-configmap.yaml:21):
    function "fromToml" not defined

even though enableSecurity defaults to false. Setting the gate value
does NOT skip parsing — only deleting / never-shipping the file does.

Fix shape (per ticket #340):

1. Vendor upstream seaweedfs/seaweedfs 4.22.0 into chart/charts/seaweedfs/
   (committed bytes, not auto-pulled at build time). Required because the
   upstream Helm repo overwrites 4.22.0 in place — re-pulling would
   re-introduce the broken file.
2. Delete charts/seaweedfs/templates/shared/security-configmap.yaml.
   Every other template that references the deleted ConfigMap is gated
   under {{- if enableSecurity }} so removing it is a no-op for our
   default deployment shape (Catalyst SeaweedFS auth happens at the S3
   layer via IAM creds from External Secrets, not via the upstream
   chart's TLS/JWT machinery).
3. Drop the dependencies: block from chart/Chart.yaml; add
   annotations.catalyst.openova.io/no-upstream=true so the
   blueprint-release workflow's hollow-chart guard (issue #181) skips
   the auto-pull/round-trip checks for this chart.
4. Whitelist platform/seaweedfs/chart/charts/ in .gitignore so the
   vendored bytes are tracked.
5. Bump bp-seaweedfs 1.0.1 → 1.1.0 (signal: vendored, not auto-pulled).
6. Add tests/no-fromtoml.sh — chart-test that asserts the offending
   file stays deleted across future re-vendors. Runs in
   .github/workflows/blueprint-release.yaml as a publish-gating check.

Unblocks Phase-8a observability + storage chain on otech (bp-loki,
bp-mimir, bp-tempo, bp-velero, bp-harbor, bp-grafana all dependsOn
bp-seaweedfs).

Closes #340

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(scripts): align expected-bootstrap-deps.yaml with bp-harbor's actual deps

The bp-harbor HR at clusters/_template/bootstrap-kit/19-harbor.yaml lines
35-37 already removed `bp-seaweedfs` from its dependsOn (cloud-direct
architecture per ADR-0001 §13 — Harbor writes blobs directly to cloud
Object Storage on Sovereigns, not via SeaweedFS), but the expected DAG
in scripts/expected-bootstrap-deps.yaml was never updated to match.

Pre-existing drift on main; surfaced by the dependency-graph-audit
check on PR #504 (bp-seaweedfs vendoring fix). Fixing it inline so the
audit passes on the same PR — the two changes are both about the
storage chain on Sovereigns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: alierenbaysal <alierenbaysal@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 01:20:59 +04:00
hatiyildiz
a75414a782 feat(wizard): admin landing = app card grid; per-app page with logs/prereqs/status/overview tabs (replaces DAG)
User decided to abandon the DAG bubble + edge layout in favour of a
common admin view: the same application card surface the wizard uses,
clickable through to a per-Application page with tabs.

New tree under products/catalyst/bootstrap/ui/src/pages/sovereign/:
  AdminPage.tsx              landing page — phase banners + app card grid
  AdminShell.tsx             topbar + left sidebar + status rollup
  ApplicationCard.tsx        per-component card (reuses corp-comp-card geom)
  ApplicationPage.tsx        per-app detail with 4 tabs
  PhaseBanners.tsx           Hetzner-infra + Cluster-bootstrap status
  StatusPill.tsx             pending|installing|installed|failed|degraded
  applicationCatalog.ts      bootstrap-kit + selected-components resolver
  eventReducer.ts            SSE event → per-component state machine
  + tests for each (5 test files, 35 new vitest cases)

Routes:
  /sovereign/provision/$deploymentId           → AdminPage
  /sovereign/provision/$deploymentId/app/$id  → ApplicationPage

ApplicationPage tabs:
  Overview      long-form copy from marketplaceCopy.ts + upstream link
  Logs          events filtered by component, replay from /events buffer
                + live SSE; auto-scroll, level color, expandable lines
  Dependencies  depends-on + depended-on-by graph from componentGroups.ts
                each clickable, navigates to /app/<dep>
  Status        helm-release name, namespace, chart version, condition

Reuses existing surface conventions: logoTone per-brand backplates,
family chips with palettes from marketplaceCopy, theme tokens
--wiz-*. No DAG SVG, no bubble layout, no edge routing — gone.

Removed: 1474 lines from ProvisionPage.tsx (DAG + supernode mapping +
hcloud sub-progress). The original ProvisionPage.tsx is gutted to a
thin redirect for backwards-compat, then deleted in a follow-up commit.

Tests: 188/188 green (153 prior + 35 new). Typecheck clean. Build
clean. Playwright manual verification pending — captured screenshots
will land in a follow-up commit on this same branch once the live
deployment is rolled with the new image.
2026-04-29 17:23:59 +02:00