openova/products/catalyst/bootstrap/api/internal
e3mrah 5b69247135
fix(clustermesh): secondary cluster name match tofu scheme (D11) (#1540)
Tofu's `secondary_region_cluster_mesh_name` local at
infra/hetzner/main.tf:389 generates secondary names as
`<sovereign-stem>-<region-stem-no-digits>` (e.g. `t129-nbg`,
`t129-sin`). The bootstrap-kit slot 01-cilium.yaml renders
cilium-config cluster.name from this value via the
CLUSTER_MESH_NAME envsubst.

The orchestrator's clusterName derivation was wrong: it appended
`-<region-key>` to the primary's name (e.g. `t129-mesh-nbg1-1`),
which matched NEITHER the tofu scheme NOR the cilium-config value.

Caught on t129 (6cddff7ef4432bdc, 2026-05-16): TLS, etcd RBAC,
and connection all working after PRs #1530, #1536, #1538, #1539 —
but agent reported `failed to retrieve cluster configuration:
not found` for every secondary peer because it queried
`cilium/cluster-config/v1/t129-mesh-nbg1-1` against an etcd that
only had `t129-nbg`.

Fix: export `DeriveSecondaryClusterMeshName(req, rs)` that
mirrors tofu's local exactly, plus a `stripTrailingDigits` helper.
Orchestrator's buildRegionSlots uses this for secondaries; primary
keeps the `<stem>-mesh` shape.
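
For illustration only, the naming rule described above can be sketched in Go as follows. This is not the code from the PR: the real `DeriveSecondaryClusterMeshName(req, rs)` takes request and region-slot arguments whose types are not shown here, so the sketch takes plain strings instead. The function names `secondaryClusterMeshName`, `primaryClusterMeshName` and `legacySecondaryName`, and the assumption that the region stem looks like `nbg1`, are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTrailingDigits removes any run of ASCII digits at the end of s,
// e.g. "nbg1" -> "nbg". A string with no trailing digits is returned unchanged.
func stripTrailingDigits(s string) string {
	return strings.TrimRight(s, "0123456789")
}

// primaryClusterMeshName keeps the `<stem>-mesh` shape (hypothetical helper).
func primaryClusterMeshName(sovereignStem string) string {
	return sovereignStem + "-mesh"
}

// secondaryClusterMeshName mirrors the tofu scheme
// `<sovereign-stem>-<region-stem-no-digits>` (hypothetical signature;
// the region stem is assumed to be a value such as "nbg1").
func secondaryClusterMeshName(sovereignStem, regionStem string) string {
	return sovereignStem + "-" + stripTrailingDigits(regionStem)
}

// legacySecondaryName reproduces the old, incorrect derivation:
// the region key appended to the primary's name.
func legacySecondaryName(sovereignStem, regionKey string) string {
	return primaryClusterMeshName(sovereignStem) + "-" + regionKey
}

func main() {
	fmt.Println(primaryClusterMeshName("t129"))           // primary name
	fmt.Println(secondaryClusterMeshName("t129", "nbg1")) // name per the tofu scheme
	fmt.Println(legacySecondaryName("t129", "nbg1-1"))    // name the old code produced
}
```

The legacy helper is included only to make the mismatch visible: the old name and the tofu-scheme name differ, so a lookup keyed on the former cannot find a config stored under the latter.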

Closes D11 incident chain: #1525 → #1528 → #1530 → #1536 → #1538 → #1539 → this.
With this PR landed, t129's secondary→primary connection already
works (verified on the live cluster: secondary agents show
"ready, 2 nodes, 113 endpoints, 326 identities"); primary→secondary
will work on a fresh provision once the name match is correct from
the start.

Refs DoD D11.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:08:55 +04:00
audit feat(catalyst-ui): RBAC member views — App Members tab + Org Members + access matrix + audit trail (slice U5-U8, #1098) (#1157) 2026-05-09 07:18:28 +04:00
auth feat(auth): parse groups + realm_access.roles + RBAC custom claims (slice D2, #1095) (#1118) 2026-05-08 22:56:35 +04:00
catalog fix(chart,api): qa-loop iter-7 Cluster-C — qa-wp install + apps API dual-shape (#1227) (#1231) 2026-05-10 01:09:24 +04:00
dynadot fix(pdm/dynadot): remove fictional ResponseHeader wrapper from api3.json adapter (#939) (#948) 2026-05-05 15:11:39 +04:00
flowemit refactor(openova-flow): CNPG-backed durable store + emit loop (#1471) 2026-05-14 14:16:11 +04:00
handler fix(clustermesh): secondary cluster name match tofu scheme (D11) (#1540) 2026-05-16 19:08:55 +04:00
handoverjwt feat(catalyst-ui+api): replace magic-link with 6-digit PIN auth (#688) (#694) 2026-05-03 20:26:05 +04:00
helmwatch fix(helmwatch): emit Succeeded events for HRs Ready at attach time (#1510) 2026-05-15 23:54:25 +04:00
hetzner fix(purge): second name-prefix pass for CCM-named clustermesh LBs (#1532) 2026-05-16 17:29:26 +04:00
infrastructure fix(catalyst): chroot JobDetail 'Job not found' + graph WorkerNode duplicates 2026-05-07 15:10:17 +02:00
jobs fix(canvas): canonicalise resolved DependsOn too — kill malformed prior values (#1501) 2026-05-15 17:24:33 +04:00
jtistore feat(catalyst-api): /auth/handover endpoint for seamless single-identity flow (Closes #606) (#612) 2026-05-02 17:34:26 +04:00
k8scache fix(canvas): skip TLS verify on Sovereign k3s self-signed CA — restore sibling deps (#1497) 2026-05-15 14:46:21 +04:00
keycloak fix(catalyst-api): Keycloak admin proxy for /admin/realms/* endpoints (qa-loop iter-1 prefetch Fix #104) (#1327) 2026-05-10 22:52:34 +04:00
newapi feat(unified-rbac): SME-tier extension + host-header tenant discovery (#802) (#816) 2026-05-04 22:34:11 +04:00
objectstorage wip(#425): vendor-agnostic OS rename — partial (rate-limited mid-run) (#435) 2026-05-01 18:05:19 +04:00
openbao feat(catalyst-api): handover finalisation flow (closes #317) (#444) 2026-05-01 18:48:29 +04:00
pdm fix(catalyst-api): PDM client must add basic auth for public ingress (#907) (#908) 2026-05-05 11:07:25 +04:00
powerdns fix(dns): auto-write per-Sovereign A records into parent zone after Phase-0 (#1505) 2026-05-15 21:12:38 +04:00
provisioner fix(clustermesh): secondary cluster name match tofu scheme (D11) (#1540) 2026-05-16 19:08:55 +04:00
store fix(api): unbreak 3 pre-existing CI test failures (EPIC-0 stretch) (#1132) 2026-05-09 00:37:31 +04:00