openova

e3mrah ef93a2cdbe feat(cloud-init): patch node providerID after k3s healthz (unblocks Gap A) (#1520)	2026-05-16 14:12:26 +04:00
cloudflare-worker-leases	feat(continuum): K-Cont-4 — Cloudflare Worker source + tofu wiring for lease witness (#1101) (#1159)	2026-05-09 08:01:44 +04:00
hetzner	feat(cloud-init): patch node providerID after k3s healthz (unblocks Gap A) (#1520)	2026-05-16 14:12:26 +04:00

feat(cloud-init): patch node providerID after k3s healthz (unblocks Gap A) (#1520)

Architecturally-clean replacement for the reverted PRs #1513 (k3s flag)
and #1516 (pre-install hcloud-ccm). Both prior approaches broke
cold-start (chicken-and-egg with the uninitialized taint).

This patch instead lets k3s boot normally with its default embedded
cloud controller (which sets `providerID=k3s://<hostname>` — the
problem), then immediately patches the local Node's `spec.providerID`
to `hcloud://<id>` using the Hetzner instance metadata endpoint
(169.254.169.254). The patch runs ONCE per CP node, right after k3s
apiserver healthz becomes reachable, BEFORE flux-bootstrap.yaml applies
the bootstrap-kit Kustomization.
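
The lookup-then-patch step described above can be sketched roughly as follows. This is a minimal illustration, not the script shipped in the commit: the metadata path `/hetzner/v1/metadata/instance-id` is an assumption (the commit message names only the IP), and the helper names are hypothetical. The sketch only builds the patch and the `kubectl` arguments; it does not wait for healthz, run the command, or check that the API server accepts the change.

```python
import json
import urllib.request

# Assumption: the metadata path below; the commit message gives only the IP.
METADATA_URL = "http://169.254.169.254/hetzner/v1/metadata/instance-id"


def fetch_instance_id(url: str = METADATA_URL, timeout: float = 5.0) -> str:
    """Return the server's numeric instance ID as a string."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        instance_id = resp.read().decode("utf-8").strip()
    # Reject anything that is not a plain ASCII number before using it.
    if not (instance_id.isascii() and instance_id.isdigit()):
        raise ValueError(f"unexpected instance id: {instance_id!r}")
    return instance_id


def provider_id_patch(instance_id: str) -> str:
    """Build a JSON merge patch that sets spec.providerID to hcloud://<id>."""
    return json.dumps({"spec": {"providerID": f"hcloud://{instance_id}"}})


def kubectl_patch_command(node_name: str, instance_id: str) -> list[str]:
    """Build the kubectl argv; the caller decides when and how to run it."""
    return [
        "kubectl", "patch", "node", node_name,
        "--type=merge", "-p", provider_id_patch(instance_id),
    ]
```

On a node, a caller would pass the result of `fetch_instance_id()` and the local hostname to `kubectl_patch_command` once the apiserver answers healthz.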

Once providerID has the canonical `hcloud://` prefix, bp-hcloud-ccm
(installed by Flux later in the bootstrap-kit chain) accepts the node
as a Hetzner-managed instance and allocates LBs for Service
type=LoadBalancer normally. That unblocks:

- D12: clustermesh-apiserver Service gets a real external IP
       instead of <pending>
- D10: AutoEstablishClusterMesh (PR #1508) can read each region's
       LB IP and write peer entries into cilium-clustermesh Secret
- D11: inter-region pod-to-pod traffic flows via Cilium WireGuard
       over the per-region LB IPs
- D5: child catalyst-api can reach secondary regions via mesh, so
      /cloud view aggregates all 3 regions instead of 1/1
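
The distinction between the two providerID forms mentioned above (`k3s://<hostname>` versus `hcloud://<id>`) can be illustrated with a small parser. This is a hypothetical helper written for this note, not code from bp-hcloud-ccm; it assumes only that the canonical form is the `hcloud://` prefix followed by a numeric server ID.

```python
from typing import Optional

HCLOUD_PREFIX = "hcloud://"


def hcloud_server_id(provider_id: str) -> Optional[int]:
    """Return the numeric server ID, or None if the ID is not hcloud://<number>."""
    if not provider_id.startswith(HCLOUD_PREFIX):
        return None
    raw = provider_id[len(HCLOUD_PREFIX):]
    # Accept plain ASCII digits only, so int() cannot fail on odd input.
    if raw.isascii() and raw.isdigit():
        return int(raw)
    return None
```

A node still carrying `k3s://<hostname>` yields `None` here, which corresponds to the state the commit is trying to move nodes out of.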

Failure is non-fatal: if metadata lookup or patch fails, we log and
continue (bp-hcloud-ccm has a chance to set providerID later via its
own node-list-and-match logic). Cold-start is never blocked.
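
The log-and-continue behaviour can be sketched as a small wrapper. The function and logger names are hypothetical, and the real cloud-init script may handle failures differently; the point is only that an error in the patch step is recorded and never propagates to the rest of the bootstrap.

```python
import logging
from typing import Callable

log = logging.getLogger("providerid-patch")


def run_non_fatal(step: Callable[[], None], name: str) -> bool:
    """Run one bootstrap step; log any failure and report success as a bool."""
    try:
        step()
    except Exception as exc:
        # Swallow the error on purpose: cold-start must not be blocked.
        log.warning("%s failed, continuing bootstrap: %s", name, exc)
        return False
    return True
```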

Canonical topology (1 cpx52 per region, workerCount=0) means every
node is a CP — covered by this patch. Operator-added workers
(workerCount>0) would also need providerID patched; a follow-up Job
in bp-providerid-patcher can iterate all nodes post-Flux.
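
As an illustration of what such a follow-up pass might select, the sketch below filters a node list for entries whose providerID lacks the `hcloud://` prefix. Nothing about bp-providerid-patcher's actual implementation is stated in this commit; the function name is hypothetical, and the input is assumed to have the shape of `kubectl get nodes -o json` output.

```python
def nodes_needing_patch(node_list: dict) -> list[str]:
    """Return names of nodes whose spec.providerID is missing or not hcloud://."""
    names = []
    for node in node_list.get("items", []):
        provider_id = node.get("spec", {}).get("providerID", "")
        if not provider_id.startswith("hcloud://"):
            names.append(node["metadata"]["name"])
    return names
```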

Co-authored-by: claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-16 14:12:26 +04:00