Architecturally clean replacement for the reverted PRs #1513 (k3s flag)
and #1516 (pre-install hcloud-ccm). Both prior approaches broke
cold-start (chicken-and-egg with the uninitialized taint).
This patch instead lets k3s boot normally with its default embedded
cloud controller, which sets `providerID=k3s://<hostname>` (the root of
the problem), then immediately patches the local Node's
`spec.providerID` to `hcloud://<id>`, reading the server ID from the
Hetzner instance metadata endpoint (169.254.169.254). The patch runs
once per control-plane (CP) node, right after the k3s apiserver healthz
becomes reachable and before flux-bootstrap.yaml applies the
bootstrap-kit Kustomization.
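A minimal Python sketch of the lookup-and-patch step, for illustration only; it is not the code shipped in this patch. Assumptions not stated above: the metadata path `/hetzner/v1/metadata/instance-id`, the use of plain `kubectl` (on a k3s host this may be `k3s kubectl`), and the helper names. Kubernetes node validation may reject changing a providerID that is already non-empty; if the apiserver refuses the patch, the non-fatal handling described in this message applies.

```python
import json
import urllib.request

# Assumed Hetzner metadata path; the commit message only names the address.
METADATA_URL = "http://169.254.169.254/hetzner/v1/metadata/instance-id"


def fetch_instance_id(url: str = METADATA_URL, timeout: float = 5.0) -> str:
    """Read the numeric server ID from the instance metadata endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        instance_id = resp.read().decode("utf-8").strip()
    if not instance_id.isdigit():
        raise ValueError(f"unexpected instance id: {instance_id!r}")
    return instance_id


def provider_id_patch(instance_id: str) -> str:
    """Build the JSON merge patch that sets spec.providerID to hcloud://<id>."""
    return json.dumps({"spec": {"providerID": f"hcloud://{instance_id}"}})


def kubectl_patch_argv(node_name: str, instance_id: str) -> list[str]:
    """Return the (hypothetical) kubectl invocation that patches one Node."""
    return [
        "kubectl", "patch", "node", node_name,
        "--type=merge", "-p", provider_id_patch(instance_id),
    ]
```

On a CP node the argv list would be handed to `subprocess.run(...)` once healthz answers; building it separately keeps the patch body easy to inspect in logs.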
Once providerID has the canonical `hcloud://` prefix, bp-hcloud-ccm
(installed by Flux later in the bootstrap-kit chain) accepts the node
as a Hetzner-managed instance and allocates LBs for Service
type=LoadBalancer normally. That unblocks:
- D12: clustermesh-apiserver Service gets a real external IP
  instead of <pending>
- D10: AutoEstablishClusterMesh (PR #1508) can read each region's
  LB IP and write peer entries into cilium-clustermesh Secret
- D11: inter-region pod-to-pod traffic flows via Cilium WG over the
  per-region LB IPs
- D5: child catalyst-api can reach secondary regions via mesh, so
  /cloud view aggregates all 3 regions instead of 1/1
Failure is non-fatal: if metadata lookup or patch fails, we log and
continue (bp-hcloud-ccm has a chance to set providerID later via its
own node-list-and-match logic). Cold-start is never blocked.
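The log-and-continue behaviour can be sketched as a small wrapper. This is an illustrative Python sketch, not the shipped implementation; `patch_non_fatal`, `fetch_id`, and `apply_patch` are hypothetical names.

```python
import logging
from typing import Callable

log = logging.getLogger("providerid-patch")


def patch_non_fatal(
    fetch_id: Callable[[], str],
    apply_patch: Callable[[str], None],
) -> bool:
    """Try the metadata lookup and the patch; never raise to the caller.

    Returns True when both steps succeed, False when either one fails.
    """
    try:
        instance_id = fetch_id()
        apply_patch(instance_id)
    except Exception as exc:  # deliberately broad: cold-start must not block
        log.warning("providerID patch skipped: %s", exc)
        return False
    return True
```

The caller ignores the return value for control flow and proceeds to the Flux bootstrap either way; the boolean is only useful for logging or metrics.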
Canonical topology (1 cpx52 per region, workerCount=0) means every
node is a CP, so this patch covers all of them. Operator-added workers
(workerCount>0) would also need their providerID patched; a follow-up
Job in bp-providerid-patcher can iterate all nodes post-Flux.
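As a rough sketch of what such a follow-up Job might do first, the Python function below picks out nodes whose providerID still lacks the `hcloud://` prefix. It assumes the input is the parsed output of `kubectl get nodes -o json`; the function name is hypothetical, and mapping each worker to its Hetzner server ID is left out.

```python
def nodes_needing_patch(node_list: dict) -> list[str]:
    """Return names of nodes whose spec.providerID is not hcloud://<id>."""
    names = []
    for item in node_list.get("items", []):
        provider_id = item.get("spec", {}).get("providerID", "")
        if not provider_id.startswith("hcloud://"):
            names.append(item["metadata"]["name"])
    return names
```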
Co-authored-by: claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>