fix(infra): body-supplied SKUs win over QA defaults (Fix #183) (#1386)

* fix(catalyst-ui): delete malformed `import type from react` line (Fix #181)

Fix #180 PR #1383 merged with sed -i error: produced `import type  from 'react'`
(empty import binding) which is a syntax error. Main build broken.
This PR removes the malformed line entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(infra): pin LB private IPs + revert hel1 zone (Fix #182)

Root cause of prov #32 FATAL "hcloud/inlineAttachServerToNetwork:
attach server to network: IP not available" on hcloud_server.control_plane[0]:

  hcloud_load_balancer_network.{main,secondary} both attached to the
  shared network WITHOUT an explicit `ip` argument. Hetzner auto-allocates
  the first free IP from the first matching-zone subnet. In the
  multi-region prov #32 the secondary LB-network (hel1) completed first
  at t+16s and took 10.0.1.2 from the only eu-central subnet existing
  at that moment (`main` = 10.0.1.0/24) — stealing the IP the primary
  CP claims explicitly via `ip = "10.0.1.${count.index + 2}"`.

  Fix: pin LB anchors to top-of-subnet (.254) so they live outside the
  CP/worker IP range (.2..N for CPs, .10+ for workers).

Also revert Fix #179 (`hel1 = "eu-north"`). Hetzner /v1/locations API
on 2026-05-11 returns network_zone=eu-central for hel1. Fix #179 caused
prov #32's secondary subnet to fail with `invalid input in field
'network_zone' [network zone does not exist]`. The original prov #29/#30
"IP not available on secondary[hel1-1]" was the same LB-IP collision —
this PR resolves both.

Multi-region apply now lands cleanly:
  10.0.1.2     -> primary CP (cp1)
  10.0.1.254   -> primary LB anchor
  10.0.10.2    -> secondary CP (hel1-1)
  10.0.10.254  -> secondary LB anchor (hel1-1)

Refs: openova-private prov-loop session 2026-05-11 Wave 26

* fix(infra): body-supplied SKUs win over QA defaults (Fix #183)

Fix #157 introduced `effective_cp_size = coalesce(var.qa_control_plane_size,
var.control_plane_size)` when qa_fixtures_enabled='true'. Because
qa_control_plane_size has a non-empty default (cpx32), coalesce always
returned the QA default and silently overrode whatever the body supplied
in `controlPlaneSize`.

Founder-supplied body for prov #32 specified `controlPlaneSize: "cpx42"`
explicitly (cheapest viable for the founder's collapsed-CP+worker
single-node-per-region topology with workerCount=0). The QA-default
override downgraded that to cpx32 at plan time — the explicit choice
never made it onto the hardware.

Fix #183 — invert the coalesce so body wins:

  effective_cp_size = local.qa_mode
    ? coalesce(var.control_plane_size, var.qa_control_plane_size)
    : var.control_plane_size

`provisioner.go` writeTfvars already emits control_plane_size / worker_size
only when the body's field is non-empty (so `var.control_plane_size`
inherits variables.tf's cost-optimised default when the body left it
blank). That means `coalesce(var.control_plane_size, var.qa_*)` always
has a non-empty first arg in normal flow; the QA-default fallback only
fires on a zero-override QA call that intentionally leaves the SKU empty.

No change to customer-Sovereign behaviour (qa_fixtures_enabled='false'
branch already used `var.control_plane_size` verbatim).

Refs: openova-private prov-loop session 2026-05-11 Wave 26

---------

Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
e3mrah 2026-05-11 13:04:41 +04:00 committed by GitHub
parent 515c3cf38d
commit 4e6bec7022
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

View File

@ -131,9 +131,19 @@ locals {
# already permits (solo Sovereign, worker_count=0): an empty
# qa_worker_size would otherwise short-circuit to "" coalesce() falls
# back to the production default in that mode.
qa_mode = var.qa_fixtures_enabled == "true"
effective_cp_size = local.qa_mode ? coalesce(var.qa_control_plane_size, var.control_plane_size) : var.control_plane_size
effective_worker_size = local.qa_mode ? coalesce(var.qa_worker_size, var.worker_size) : var.worker_size
qa_mode = var.qa_fixtures_enabled == "true"
# Fix #183: body's control_plane_size / worker_size win over QA defaults
# when present. Previously `coalesce(var.qa_*, var.*)` returned the QA
# default whenever it was non-empty, silently downgrading a body's
# explicit cpx42 cpx32. Now QA defaults only kick in when the body
# left the SKU empty (zero-override prov / legacy QA path). On customer
# Sovereigns (qa_fixtures_enabled='false') the QA defaults are never
# considered. See provisioner.go writeTfvars body-supplied SKUs are
# emitted as JSON only when non-empty, so var.control_plane_size /
# var.worker_size already inherit variables.tf defaults when the body
# left them blank; coalesce-with-body-first is the right precedence.
effective_cp_size = local.qa_mode ? coalesce(var.control_plane_size, var.qa_control_plane_size) : var.control_plane_size
effective_worker_size = local.qa_mode ? coalesce(var.worker_size, var.qa_worker_size) : var.worker_size
# k3s deterministic bootstrap token derived from project ID + sovereign FQDN.
# Workers join with this; k3s rotates it after first join.