5ee02f7d36
4 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
af4ed5ed94 |
fix(pdm/dynadot): auto-register NS glue records before set_ns (#1496)
Dynadot rejects set_ns when any NS hostname is not yet registered as a
glue record in the customer's account. The 31-line code comment above
SetNameservers documents this requirement, but the implementation never
landed at the adapter layer: only the per-request handler-side glueIP
path (BYO Flow B, issue #900) registered glue, leaving the mothership
parent-domain onboard flow exposed.
Live blocker on 2026-05-15: founder attempted zero-touch onboard of
fresh parent domain omani.homes; the flow stalled because
ns3.openova.io had never been registered as a Dynadot glue record on
this account (ns1/ns2 had been registered long ago when openova.io
itself was onboarded). Failure surface: "'ns3.openova.io' needs to be
registered with an ip address before it can be used." Unblocking
required out-of-band manual API calls, defeating the zero-touch
property the architecture is supposed to deliver.
Fix (adapter layer, no per-request flag, always-on when configured):
- Adapter gains NSGlueIP field; SetNameservers iterates every NS
hostname BEFORE set_ns, skips in-bailiwick children of the domain
being set, and calls RegisterGlueRecord(host, NSGlueIP) for the rest.
- RegisterGlueRecord (already idempotent per issue #900)
short-circuits via get_ns on identical IP, falls through to set_ns_ip
on a stale IP, and runs register_ns when the host is missing, so a
SetNameservers retry costs only get_ns probes, not extra writes.
- A typed registrar error inside the register loop returns immediately
without calling set_ns (fail-fast contract).
- POOL_DOMAIN_MANAGER_NS_GLUE_IP env var (canonical operator-config
pattern in this repo) threaded through cmd/pdm/main.go onto the
Dynadot adapter at PDM startup. Empty value preserves prior
pass-through behaviour, keeping BYO Flow B handler-level glue
authoritative for per-request Sovereign add-domain calls.
Tests (httptest server, 7 new cases) cover:
- AllFresh: 3 NS hostnames, all unregistered → 3× (get_ns+register_ns)
+ set_ns (7 API calls, in order).
- OneAlreadyRegistered: middle NS short-circuits via get_ns, others
register, set_ns runs.
- RegisterFails_SetNsNotCalled: 429 mid-register surfaces
ErrRateLimited unwrapped; set_ns must NOT execute.
- SetNsFailsAfterRegister: pre-register completes, set_ns returns
Dynadot error; ErrDomainNotInAccount surfaces.
- SkipsInBailiwick: in-bailiwick NS hostname (child of domain being
set) is skipped entirely (no get_ns, no register_ns).
- DisabledWhenNSGlueIPEmpty: backward-compat; bare SetNameservers
issues exactly one set_ns call when env var unset.
- IsInBailiwickHost: case- and trailing-dot-tolerant table test.
go build ./... and go test ./... both green across the entire
core/pool-domain-manager module.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
|
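The pre-registration flow this commit describes can be illustrated with a small sketch. It is not the adapter's code: the `glueRegistrar` interface, the `recordingRegistrar` fake, the lower-case function names, and the `192.0.2.53` / `example.*` values are all hypothetical, chosen only to show the ordering (glue first, in-bailiwick hosts skipped, fail-fast before set_ns, pass-through when the glue IP is empty).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errRateLimited stands in for the adapter's typed ErrRateLimited (assumption).
var errRateLimited = errors.New("rate limited")

// glueRegistrar is a hypothetical seam over the two registrar operations
// the flow needs; the real adapter talks to Dynadot directly.
type glueRegistrar interface {
	RegisterGlueRecord(host, ip string) error
	SetNS(domain string, nameservers []string) error
}

// isInBailiwick reports whether host is a child of domain, ignoring case
// and a trailing dot on either name.
func isInBailiwick(host, domain string) bool {
	h := strings.ToLower(strings.TrimSuffix(host, "."))
	d := strings.ToLower(strings.TrimSuffix(domain, "."))
	return d != "" && strings.HasSuffix(h, "."+d)
}

// setNameservers registers glue for every out-of-bailiwick NS host, then
// calls SetNS. An empty glueIP disables pre-registration entirely.
func setNameservers(r glueRegistrar, glueIP, domain string, nameservers []string) error {
	if glueIP != "" {
		for _, host := range nameservers {
			if isInBailiwick(host, domain) {
				continue // child of the domain being set: skipped entirely
			}
			if err := r.RegisterGlueRecord(host, glueIP); err != nil {
				return err // fail fast: SetNS is never attempted
			}
		}
	}
	return r.SetNS(domain, nameservers)
}

// recordingRegistrar is a test double that logs calls in order.
type recordingRegistrar struct {
	calls  []string
	failOn string
}

func (f *recordingRegistrar) RegisterGlueRecord(host, ip string) error {
	f.calls = append(f.calls, "register "+host)
	if host == f.failOn {
		return errRateLimited
	}
	return nil
}

func (f *recordingRegistrar) SetNS(domain string, _ []string) error {
	f.calls = append(f.calls, "set_ns "+domain)
	return nil
}

func main() {
	ns := []string{"ns1.example.net", "NS2.Example.COM.", "ns3.example.net"}

	ok := &recordingRegistrar{}
	fmt.Println(setNameservers(ok, "192.0.2.53", "example.com", ns), ok.calls)

	failing := &recordingRegistrar{failOn: "ns3.example.net"}
	fmt.Println(setNameservers(failing, "192.0.2.53", "example.com", ns), failing.calls)
}
```

In the second run the fake fails on the last host, so the call log ends at the failed registration and contains no set_ns entry, which is the property RegisterFails_SetNsNotCalled asserts.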
||
|
|
a6fb7410f4 |
feat(pdm): per-Sovereign PowerDNS zones for #168
Refactor pool-domain-manager to own per-Sovereign zones in PowerDNS,
replacing the previous Dynadot-set_dns2 record-write flow.
Phase 1 — internal/pdns: REST client for PowerDNS Authoritative API
- CreateZone / DeleteZone / EnsureZone / ZoneExists
- PatchRRSets (atomic batch RRset writes)
- AddARecord / AddNSDelegation / RemoveNSDelegation
- EnableDNSSEC: PUT dnssec flag, generate KSK+ZSK (algorithm 13
ECDSAP256SHA256 per docs/PLATFORM-POWERDNS.md), POST rectify
- retry-once-on-5xx with exponential backoff (250ms, 1s)
- X-API-Key header from K8s Secret, never logged
- 22 unit tests covering every method against httptest mock
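A minimal sketch of the 5xx retry policy listed above, under stated assumptions: the commit says "retry-once" but lists two delays (250ms, 1s), so this sketch simply treats the delay schedule as the retry budget. `retryOn5xx`, `pdnsDelays`, and `noSleep` are hypothetical names, and the real client presumably wraps `net/http` calls rather than a bare status-returning closure.

```go
package main

import (
	"fmt"
	"time"
)

// pdnsDelays mirrors the backoff schedule quoted in the commit message.
var pdnsDelays = []time.Duration{250 * time.Millisecond, time.Second}

// noSleep lets tests skip real waiting.
func noSleep(time.Duration) {}

// retryOn5xx calls do once, then retries after each delay for as long as
// the previous attempt returned a 5xx status without a transport error.
// Non-5xx statuses and transport errors are returned immediately.
func retryOn5xx(do func() (int, error), delays []time.Duration, sleep func(time.Duration)) (int, error) {
	status, err := do()
	for _, d := range delays {
		if err != nil || status < 500 || status > 599 {
			break
		}
		sleep(d)
		status, err = do()
	}
	return status, err
}

func main() {
	// Simulated upstream: two server errors, then success.
	responses := []int{503, 502, 200}
	attempt := 0
	do := func() (int, error) {
		s := responses[attempt]
		attempt++
		return s, nil
	}

	var slept []time.Duration
	status, err := retryOn5xx(do, pdnsDelays, func(d time.Duration) { slept = append(slept, d) })
	fmt.Println(status, err, attempt, slept)
}
```

Injecting the `sleep` function keeps the policy testable without wall-clock delays; passing `time.Sleep` would give the real behaviour.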
Phase 2 — allocator: DNSWriter interface + per-Sovereign lifecycle
- /reserve: insert pdm-pg row + create child zone with apex NS
RRset + add NS delegation into parent + enable DNSSEC on child
- /commit: write the canonical 6-record set (apex, *, console,
api, gitea, harbor) into child zone, TTL 300, atomic PATCH
- /release: drop child zone (DNSSEC keys retire) + remove parent
NS delegation, idempotent on 404
- sweeper tears down DNS for expired reservations before deleting
pdm-pg rows
- rollback path on Reserve failure preserves operator UX
- allocator_test.go: fake DNSWriter for state-machine assertions
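To make the /commit step above concrete, here is a hedged sketch of the request body an atomic RRset PATCH might carry. The JSON field names (`rrsets`, `name`, `type`, `ttl`, `changetype`, `records`, `content`, `disabled`) follow the PowerDNS Authoritative HTTP API; everything else is an assumption: the commit does not state the record type or targets, so the sketch uses A records that all point at one illustrative address, and `canonicalRRSets` is a hypothetical helper, not a function from internal/pdns.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type record struct {
	Content  string `json:"content"`
	Disabled bool   `json:"disabled"`
}

type rrset struct {
	Name       string   `json:"name"`
	Type       string   `json:"type"`
	TTL        int      `json:"ttl"`
	ChangeType string   `json:"changetype"`
	Records    []record `json:"records"`
}

type patchBody struct {
	RRSets []rrset `json:"rrsets"`
}

// canonicalRRSets builds the six-name set (apex, *, console, api, gitea,
// harbor) for the child zone <subdomain>.<poolDomain> with TTL 300.
// PowerDNS expects fully qualified names, hence the trailing dot.
func canonicalRRSets(subdomain, poolDomain, ip string) patchBody {
	zone := subdomain + "." + poolDomain + "."
	labels := []string{"", "*", "console", "api", "gitea", "harbor"}

	var body patchBody
	for _, label := range labels {
		name := zone
		if label != "" {
			name = label + "." + zone
		}
		body.RRSets = append(body.RRSets, rrset{
			Name:       name,
			Type:       "A", // assumption: the commit only says "6-record set"
			TTL:        300,
			ChangeType: "REPLACE",
			Records:    []record{{Content: ip}},
		})
	}
	return body
}

func main() {
	out, err := json.MarshalIndent(canonicalRRSets("acme", "example.org", "192.0.2.10"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Sending all six RRsets in one PATCH is what makes the write atomic from the zone's point of view: either every name is replaced or none is.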
Phase 3 — startup parent-zone bootstrap
- BootstrapParentZones runs at PDM startup before HTTP serves
- EnsureZone for every entry in DYNADOT_MANAGED_DOMAINS
- DNSSEC enabled on each parent zone (idempotent)
- PDM exits non-zero if bootstrap fails
Phase 4 — schema unchanged
- child zone name derived as <subdomain>.<poolDomain>, no new column
- existing pool_allocations table works as-is
Phase 5 — dynadot package trimmed
- removed AddSovereignRecords / DeleteSubdomainRecords / AddRecord /
getZone / writeZone (Dynadot DNS write code)
- kept IsManagedDomain / ManagedDomains / ResetManagedDomains /
ErrUnmanagedDomain (config-resolution helpers)
- registrar adapter at internal/registrar/dynadot/ untouched (handles
BYO Flow B NS-delegation via #170)
Phase 6 — env-var contract
PDM_PDNS_BASE_URL, PDM_PDNS_API_KEY, PDM_PDNS_SERVER_ID, PDM_NAMESERVERS
all runtime-configurable per docs/INVIOLABLE-PRINCIPLES.md #4.
Quality bar (all met):
- DNSSEC enabled on every child zone (mandatory per spec)
- parent NS delegation TTL 3600, child A-record TTL 300
- retry-once-on-5xx with exponential backoff in pdns client
- all credentials flow from env vars sourced from K8s Secrets
- no hardcoded URLs, regions, or NS endpoints
Closes openova#168 (DNS-side; private-repo manifest update lands separately).
|
||
|
|
567d7e1f60 |
feat(pdm): registrar adapters for Cloudflare, Namecheap, GoDaddy, OVH, Dynadot (#170)
Adds the BYO Flow B (#166) registrar-flip seam: PDM now exposes a
provider-agnostic Registrar interface and 5 adapter implementations,
plus a new HTTP endpoint that dispatches to them.
Wire surface
- POST /api/v1/registrar/{registrar}/set-ns
Body: {"domain":"...","token":"...","nameservers":["..."]}
Reply: {"success":true,"registrar":"...","domain":"...",
"nameservers":["..."],"propagation":"..."}
- GET /healthz now lists the wired-in registrar names.
Interface (internal/registrar/registrar.go)
- Name(), ValidateToken, SetNameservers, GetNameservers
- Typed errors: ErrInvalidToken, ErrRateLimited,
ErrDomainNotInAccount, ErrAPIUnavailable, ErrUnsupportedRegistrar
- Registry map[string]Registrar with Lookup + Names helpers
Adapters
- internal/registrar/cloudflare/: API v4 with Bearer token; verifies
via /user/tokens/verify, looks up zone by name, PATCHes name_servers
- internal/registrar/namecheap/: XML API;
ApiUser+ApiKey+UserName+ClientIp auth; getBalances probe + getList
domain check; setCustom for write. IP-whitelisting requirement
documented in source comments
- internal/registrar/godaddy/: v1 API with sso-key auth; GET
/v1/domains list + PATCH /v1/domains/{d} with nameServers body
- internal/registrar/ovh/: request signing (HMAC-SHA1 over
appSecret+consumerKey+method+url+body+timestamp); GET /domain probe;
POST /domain/{d}/nameServers/update for write; GET
.../nameServer[/{id}] for read
- internal/registrar/dynadot/: api3.json with key+secret as
colon-separated token; uses set_ns + domain_info commands. Distinct
from the existing internal/dynadot package, which is the DNS-record
writer for OpenOva-managed pool domains (different concern: pool DNS
vs. customer-domain registrar NS-flip)
Token hygiene (per docs/INVIOLABLE-PRINCIPLES.md #10)
- Tokens never persisted: in-memory only for the duration of the call
- Never logged: handler uses classifyOutcome to render redacted
outcome labels, never the raw error message or token
- Never echoed in responses
- TestSetNSResponseDoesNotEchoToken + TestSetNSHappy assert no token
bytes appear in JSON body or zerolog/slog output
Tests
- 74 new unit tests (httptest server per adapter): cloudflare 11,
dynadot 11, godaddy 11, namecheap 13, ovh 12, handler 14, registrar
interface 2
- Each adapter covers: happy path, bad-token, rate-limited (429),
bad-domain (404 / not-in-account), empty-NS guard, name+default
- OVH signature math verified deterministically via injected nowFn
Acceptance (issue #170)
- All 5 adapters pass their unit tests
- PDM /api/v1/registrar/{r}/set-ns endpoint live
- Wired into cmd/pdm/main.go: every adapter registered at startup
Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode), each adapter's
BaseURL is constructor-default + struct-overridable, so tests inject
httptest endpoints without environment shenanigans.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
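The interface, registry, and redacted-outcome pieces named in this commit could fit together roughly as follows. Only the method, error, and helper names come from the commit message; the method signatures, the outcome label strings, and `stubRegistrar` are assumptions made for illustration and may differ from internal/registrar/registrar.go.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sort"
)

// Typed errors named in the commit; the messages are illustrative.
var (
	ErrInvalidToken         = errors.New("invalid token")
	ErrRateLimited          = errors.New("rate limited")
	ErrDomainNotInAccount   = errors.New("domain not in account")
	ErrAPIUnavailable       = errors.New("registrar API unavailable")
	ErrUnsupportedRegistrar = errors.New("unsupported registrar")
)

// Registrar uses assumed signatures; the commit lists method names only.
type Registrar interface {
	Name() string
	ValidateToken(ctx context.Context, token string) error
	SetNameservers(ctx context.Context, domain, token string, ns []string) error
	GetNameservers(ctx context.Context, domain, token string) ([]string, error)
}

type Registry map[string]Registrar

// Lookup returns the adapter for name, or wraps ErrUnsupportedRegistrar.
func (r Registry) Lookup(name string) (Registrar, error) {
	reg, ok := r[name]
	if !ok {
		return nil, fmt.Errorf("%q: %w", name, ErrUnsupportedRegistrar)
	}
	return reg, nil
}

// Names returns the registered adapter names in sorted order.
func (r Registry) Names() []string {
	names := make([]string, 0, len(r))
	for n := range r {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// classifyOutcome maps an error to a fixed label so that logs never carry
// the raw error text, which could embed a token. Labels are hypothetical.
func classifyOutcome(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrInvalidToken):
		return "invalid_token"
	case errors.Is(err, ErrRateLimited):
		return "rate_limited"
	case errors.Is(err, ErrDomainNotInAccount):
		return "domain_not_in_account"
	case errors.Is(err, ErrUnsupportedRegistrar):
		return "unsupported_registrar"
	default:
		return "api_unavailable"
	}
}

// stubRegistrar always returns the configured error.
type stubRegistrar struct {
	name string
	err  error
}

func (s stubRegistrar) Name() string                                { return s.name }
func (s stubRegistrar) ValidateToken(context.Context, string) error { return s.err }
func (s stubRegistrar) SetNameservers(context.Context, string, string, []string) error {
	return s.err
}
func (s stubRegistrar) GetNameservers(context.Context, string, string) ([]string, error) {
	return nil, s.err
}

func main() {
	leaky := fmt.Errorf("upstream rejected key secret-token-123: %w", ErrRateLimited)
	reg := Registry{"dynadot": stubRegistrar{name: "dynadot", err: leaky}}

	r, err := reg.Lookup("dynadot")
	if err != nil {
		panic(err)
	}
	err = r.SetNameservers(context.Background(), "example.com", "secret-token-123", []string{"ns1.example.net"})
	fmt.Println(classifyOutcome(err)) // label only; the token-bearing message is dropped

	_, err = reg.Lookup("unknown")
	fmt.Println(classifyOutcome(err))
}
```

Because `classifyOutcome` relies on `errors.Is`, adapters remain free to wrap the typed errors with context while the handler still logs only a fixed label.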
||
|
|
585b046f5d |
feat(pdm): pool-domain-manager service skeleton (Phase 1 of #163)
Build a new Go service core/pool-domain-manager that becomes the SOLE
authority for OpenOva-pool subdomain allocation across the fleet.
Why this exists: today products/catalyst/bootstrap/api/internal/handler/
subdomains.go does naive net.LookupHost() to decide whether a candidate
subdomain is taken. Dynadot's wildcard parking record at the apex of
omani.works (and any future pool domain) makes EVERY subdomain resolve
to 185.53.179.128, so the check rejects everything. DNS is the wrong
source of truth for an OpenOva-managed pool — the central control plane
must own the allocation table.
What this commit adds (no integration with catalyst-api yet — that lands
in a follow-up commit):
core/pool-domain-manager/
cmd/pdm/main.go chi router, healthz, sweeper boot
api/openapi.yaml wire contract for every endpoint
Containerfile alpine final stage, UID 65534
internal/store/ pgx + CNPG; pool_allocations table
migrations.sql idempotent CREATE TABLE schema
store.go Reserve/Get/Commit/Release/List
store_test.go integration tests (PDM_TEST_DSN)
internal/dynadot/ moved + extended; SOLE Dynadot caller
dynadot.go AddRecord, AddSovereignRecords,
DeleteSubdomainRecords (read-modify-
write to honour feedback_dynadot_dns)
dynadot_test.go managed-domain resolution tests
internal/reserved/ centralised reserved-name list
reserved.go IsReserved/All; pulled out of
catalyst-api's subdomains.go
internal/handler/ HTTP surface
handler.go /api/v1/pool/{domain}/{check,reserve,
commit,release,list}, /healthz,
/api/v1/reserved
internal/allocator/ state machine + sweeper goroutine
Architecture choices and how they map to docs/INVIOLABLE-PRINCIPLES.md:
- Principle #4 (never hardcode): every value (PORT, PDM_DATABASE_URL,
DYNADOT_MANAGED_DOMAINS, PDM_RESERVATION_TTL, PDM_SWEEPER_INTERVAL)
flows from env vars; the K8s ExternalSecret will populate them at
deploy time. The reserved-subdomain list lives in ONE place
(internal/reserved); catalyst-api will not duplicate it.
- Principle #2 (no quality compromise): the state machine commits the
DB row before the Dynadot side-effect, so a crash between the two
leaves the system in a recoverable state (operator runs Release).
The reservation_token in the row protects against stale-tab commit
races. UPSERT semantics + a CHECK constraint mean two operators
racing /reserve get a clean 23505 (unique_violation) → HTTP 409.
- Principle #3 (follow architecture): PDM is a ClusterIP service in
openova-system — it is not a Crossplane provider, not a Flux
HelmRelease, not bespoke OpenTofu state. catalyst-api speaks to it
via plain HTTP. The Crossplane Composition that wraps PDM as a
declarative MR (XDynadotPoolAllocation) lands in a follow-up phase.
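The reservation semantics described above (conflict on a racing reserve, token-guarded commit, TTL expiry freeing a hold) can be sketched with an in-memory stand-in for the pool_allocations table. This is only an illustration: the real store uses Postgres constraints rather than a map, and `pool`, `fakeClock`, the error variables, and the counter-based tokens are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	errConflict      = errors.New("subdomain already allocated")
	errTokenMismatch = errors.New("reservation token mismatch")
	errNotFound      = errors.New("no active reservation")
)

type allocation struct {
	token     string
	committed bool
	expiresAt time.Time
}

// pool is an in-memory stand-in for the allocation table.
type pool struct {
	ttl  time.Duration
	now  func() time.Time
	rows map[string]*allocation
	seq  int
}

func newPool(ttl time.Duration, now func() time.Time) *pool {
	return &pool{ttl: ttl, now: now, rows: map[string]*allocation{}}
}

// Available is a pure lookup: no DNS query is involved.
func (p *pool) Available(name string) bool {
	row, ok := p.rows[name]
	return !ok || (!row.committed && !p.now().Before(row.expiresAt))
}

// Reserve places a TTL-bound hold and returns its token.
func (p *pool) Reserve(name string) (string, error) {
	if !p.Available(name) {
		return "", errConflict
	}
	p.seq++
	token := fmt.Sprintf("tok-%d", p.seq) // a real service would use a random token
	p.rows[name] = &allocation{token: token, expiresAt: p.now().Add(p.ttl)}
	return token, nil
}

// Commit makes a hold permanent, but only for the token that created it.
func (p *pool) Commit(name, token string) error {
	row, ok := p.rows[name]
	if !ok || (!row.committed && !p.now().Before(row.expiresAt)) {
		return errNotFound
	}
	if row.token != token {
		return errTokenMismatch
	}
	row.committed = true
	return nil
}

func (p *pool) Release(name string) { delete(p.rows, name) }

// fakeClock makes TTL behaviour deterministic.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clock := &fakeClock{t: time.Unix(0, 0)}
	p := newPool(10*time.Minute, clock.Now)

	stale, _ := p.Reserve("acme")
	_, err := p.Reserve("acme")
	fmt.Println("second reserve:", err)

	clock.Advance(11 * time.Minute) // the first hold expires
	fresh, _ := p.Reserve("acme")
	fmt.Println("stale-tab commit:", p.Commit("acme", stale))
	fmt.Println("fresh commit:", p.Commit("acme", fresh))
	fmt.Println("available after commit:", p.Available("acme"))
}
```

The stale-tab case is the reason for the token: after a hold expires and someone else reserves the same name, the original tab's commit is rejected instead of silently taking over the new reservation.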
The DNS-wildcard problem the issue describes is fixed STRUCTURALLY here:
PDM never calls net.LookupHost. The /check path is a single SELECT
against pool_allocations. omani.works's wildcard A record at the apex
becomes architecturally irrelevant.
Tests exercised in this commit:
- internal/reserved: full unit coverage (case-insensitive, sorted, set
membership)
- internal/dynadot: managed-domain runtime resolution (env-var,
legacy single-domain fallback, built-in defaults, list parsing)
- internal/store: integration suite gated on PDM_TEST_DSN env var,
covers reserve happy-path, reserve race (ErrConflict), TTL expiry
frees, commit happy-path, commit token mismatch, release removes
row, sweeper deletes expired rows
Closes phase 1 of #163. Phase 2 (catalyst-api wiring), Phase 3 (CI +
manifests), Phase 4 (Crossplane composition), Phase 6 (deploy +
verification curl) follow in separate commits.
Refs: #163
|