openova/core/pool-domain-manager/api/openapi.yaml
hatiyildiz 585b046f5d feat(pdm): pool-domain-manager service skeleton (Phase 1 of #163)
Build a new Go service core/pool-domain-manager that becomes the SOLE
authority for OpenOva-pool subdomain allocation across the fleet.

Why this exists: today products/catalyst/bootstrap/api/internal/handler/
subdomains.go calls net.LookupHost() to decide whether a candidate
subdomain is taken. Dynadot's wildcard parking record at the apex of
omani.works (and of any future pool domain) makes EVERY subdomain resolve
to 185.53.179.128, so the check reports every candidate as taken. DNS is
the wrong source of truth for an OpenOva-managed pool; the central
control plane must own the allocation table.
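
The failure mode can be reproduced without touching real DNS. The sketch
below is illustrative only: lookupFunc, wildcardLookup and availableByDNS
are hypothetical names, and the stub resolver merely imitates a wildcard
parking record; it is not the code in subdomains.go.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupFunc mirrors the shape of net.LookupHost so a stub can stand in for real DNS.
type lookupFunc func(host string) ([]string, error)

// wildcardLookup simulates a zone with a wildcard parking record: every name
// at or under the pool domain resolves to the same parking address.
func wildcardLookup(poolDomain, parkingIP string) lookupFunc {
	return func(host string) ([]string, error) {
		if host == poolDomain || strings.HasSuffix(host, "."+poolDomain) {
			return []string{parkingIP}, nil
		}
		return nil, fmt.Errorf("lookup %s: no such host", host)
	}
}

// availableByDNS is the naive check: a name counts as free only if it does not resolve.
func availableByDNS(lookup lookupFunc, fqdn string) bool {
	addrs, err := lookup(fqdn)
	return err != nil || len(addrs) == 0
}

func main() {
	lookup := wildcardLookup("omani.works", "185.53.179.128")
	for _, sub := range []string{"acme", "never-allocated-xyz"} {
		fqdn := sub + ".omani.works"
		// Both names resolve through the wildcard, so both look taken.
		fmt.Printf("%s available=%v\n", fqdn, availableByDNS(lookup, fqdn))
	}
}
```

Under the wildcard, a name that was never allocated is indistinguishable
from one that was, which is why resolution cannot serve as the allocation
record.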

What this commit adds (no integration with catalyst-api yet — that lands
in a follow-up commit):

  core/pool-domain-manager/
    cmd/pdm/main.go                     chi router, healthz, sweeper boot
    api/openapi.yaml                     wire contract for every endpoint
    Containerfile                        alpine final stage, UID 65534
    internal/store/                      pgx + CNPG; pool_allocations table
      migrations.sql                       idempotent CREATE TABLE schema
      store.go                             Reserve/Get/Commit/Release/List
      store_test.go                        integration tests (PDM_TEST_DSN)
    internal/dynadot/                    moved + extended; SOLE Dynadot caller
      dynadot.go                           AddRecord, AddSovereignRecords,
                                           DeleteSubdomainRecords (read-modify-
                                           write to honour feedback_dynadot_dns)
      dynadot_test.go                      managed-domain resolution tests
    internal/reserved/                   centralised reserved-name list
      reserved.go                          IsReserved/All; pulled out of
                                           catalyst-api's subdomains.go
    internal/handler/                    HTTP surface
      handler.go                           /api/v1/pool/{domain}/{check,reserve,
                                           commit,release,list}, /healthz,
                                           /api/v1/reserved
    internal/allocator/                  state machine + sweeper goroutine

Architecture choices and how they map to docs/INVIOLABLE-PRINCIPLES.md:

  - Principle #4 (never hardcode): every value (PORT, PDM_DATABASE_URL,
    DYNADOT_MANAGED_DOMAINS, PDM_RESERVATION_TTL, PDM_SWEEPER_INTERVAL)
    flows from env vars; the K8s ExternalSecret will populate them at
    deploy time. The reserved-subdomain list lives in ONE place
    (internal/reserved); catalyst-api will not duplicate it.

  - Principle #2 (no quality compromise): the state machine commits the
    DB row before the Dynadot side-effect, so a crash between the two
    leaves the system in a recoverable state (operator runs Release).
    The reservation_token in the row protects against stale-tab commit
    races. UPSERT semantics plus a uniqueness constraint mean that two
    operators racing /reserve get a clean 23505 (unique_violation),
    which maps to HTTP 409.

  - Principle #3 (follow architecture): PDM is a ClusterIP service in
    openova-system — it is not a Crossplane provider, not a Flux
    HelmRelease, not bespoke OpenTofu state. catalyst-api speaks to it
    via plain HTTP. The Crossplane Composition that wraps PDM as a
    declarative MR (XDynadotPoolAllocation) lands in a follow-up phase.

The DNS-wildcard problem the issue describes is fixed STRUCTURALLY here:
PDM never calls net.LookupHost. The /check path is a single SELECT
against pool_allocations. omani.works's wildcard A record at the apex
becomes architecturally irrelevant.
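
As a rough sketch of what a table-backed /check might return, the example
below answers purely from in-memory sets and fills the CheckResult fields
from api/openapi.yaml. The check function, the reason strings, the sample
reserved labels, and the assumption that /check consults the reserved list
are all hypothetical; the real handler issues a SELECT against
pool_allocations.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// checkResult mirrors the CheckResult schema in api/openapi.yaml.
type checkResult struct {
	Available bool   `json:"available"`
	Reason    string `json:"reason,omitempty"`
	Detail    string `json:"detail,omitempty"`
	FQDN      string `json:"fqdn"`
}

// check answers from the allocation set and the reserved list only; it never resolves DNS.
func check(allocated, reservedNames map[string]bool, domain, sub string) checkResult {
	sub = strings.ToLower(strings.TrimSpace(sub)) // reserved matching is case-insensitive
	fqdn := sub + "." + domain
	switch {
	case reservedNames[sub]:
		return checkResult{Reason: "reserved", Detail: "label is on the reserved list", FQDN: fqdn}
	case allocated[fqdn]:
		return checkResult{Reason: "taken", Detail: "an allocation row exists", FQDN: fqdn}
	}
	return checkResult{Available: true, FQDN: fqdn}
}

func main() {
	allocated := map[string]bool{"acme.omani.works": true}
	reservedNames := map[string]bool{"www": true, "api": true} // sample labels only
	for _, sub := range []string{"acme", "WWW", "fresh"} {
		out, _ := json.Marshal(check(allocated, reservedNames, "omani.works", sub))
		fmt.Println(string(out))
	}
}
```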

Tests exercised in this commit:
  - internal/reserved: full unit coverage (case-insensitive, sorted, set
    membership)
  - internal/dynadot: managed-domain runtime resolution (env-var,
    legacy single-domain fallback, built-in defaults, list parsing)
  - internal/store: integration suite gated on PDM_TEST_DSN env var,
    covers reserve happy-path, reserve race (ErrConflict), TTL expiry
    frees, commit happy-path, commit token mismatch, release removes
    row, sweeper deletes expired rows

Closes phase 1 of #163. Phase 2 (catalyst-api wiring), Phase 3 (CI +
manifests), Phase 4 (Crossplane composition), Phase 6 (deploy +
verification curl) follow in separate commits.

Refs: #163
2026-04-29 06:37:38 +02:00

191 lines · 5.7 KiB · YAML

openapi: 3.0.3
info:
  title: pool-domain-manager
  version: 1.0.0
  description: |
    Central authority for OpenOva-pool subdomain allocation. Closes #163.

    The PDM is the SOLE source of truth for which (poolDomain, subdomain)
    pairs have been reserved or activated across the OpenOva fleet, and the
    SOLE service in the fleet that calls api.dynadot.com.

    State machine per (domain, subdomain) pair:

      NULL ─reserve→ RESERVED ─commit→ ACTIVE
                        │                │
                     expire/          release/
                     release          destroy
                        ↓                ↓
                       NULL             NULL
servers:
  - url: http://pool-domain-manager.openova-system.svc.cluster.local:8080
    description: In-cluster catalyst-api → PDM call path
  - url: https://pool.openova.io
    description: Operator-facing endpoint (auth-gated)
paths:
  /healthz:
    get:
      summary: Liveness probe
      responses:
        '200':
          description: PDM is healthy and CNPG is reachable
          content:
            application/json:
              schema:
                type: object
                properties:
                  status: { type: string, example: ok }
                  managedDomains:
                    type: array
                    items: { type: string }
        '503':
          description: PDM is up but CNPG is unreachable
  /api/v1/reserved:
    get:
      summary: Canonical reserved-subdomain list
      responses:
        '200':
          description: List of reserved subdomain labels
          content:
            application/json:
              schema:
                type: object
                properties:
                  reserved:
                    type: array
                    items: { type: string }
  /api/v1/pool/{domain}/check:
    get:
      summary: Fast availability read
      parameters:
        - in: path
          name: domain
          required: true
          schema: { type: string }
        - in: query
          name: sub
          required: true
          schema: { type: string }
      responses:
        '200':
          description: Always 200; clients use body.available, not status
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/CheckResult'
  /api/v1/pool/{domain}/reserve:
    post:
      summary: Atomic reserve with TTL
      parameters:
        - in: path
          name: domain
          required: true
          schema: { type: string }
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [subdomain]
              properties:
                subdomain: { type: string }
                createdBy: { type: string }
      responses:
        '201':
          description: Reservation created
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ReserveResponse'
        '409':
          description: Subdomain already taken
        '422':
          description: Invalid input (format / unsupported pool)
  /api/v1/pool/{domain}/commit:
    post:
      summary: Promote reservation → active and write Dynadot records
      parameters:
        - in: path
          name: domain
          required: true
          schema: { type: string }
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [subdomain, reservationToken, sovereignFQDN, loadBalancerIP]
              properties:
                subdomain: { type: string }
                reservationToken: { type: string, format: uuid }
                sovereignFQDN: { type: string }
                loadBalancerIP: { type: string, format: ipv4 }
      responses:
        '200': { description: Committed }
        '202': { description: Committed in DB; Dynadot retry needed }
        '403': { description: Reservation token mismatch }
        '404': { description: No reservation exists }
        '409': { description: Already active }
        '410': { description: Reservation TTL expired }
  /api/v1/pool/{domain}/release:
    delete:
      summary: Free a (pool, subdomain) and remove Dynadot records
      parameters:
        - in: path
          name: domain
          required: true
          schema: { type: string }
      requestBody:
        content:
          application/json:
            schema:
              type: object
              required: [subdomain]
              properties:
                subdomain: { type: string }
      responses:
        '200': { description: Released }
        '202': { description: Row deleted; Dynadot delete partial }
        '404': { description: No allocation exists }
  /api/v1/pool/{domain}/list:
    get:
      summary: Operator-facing list of allocations
      parameters:
        - in: path
          name: domain
          required: true
          schema: { type: string }
      responses:
        '200':
          description: All allocations for the pool
components:
  schemas:
    CheckResult:
      type: object
      properties:
        available: { type: boolean }
        reason: { type: string }
        detail: { type: string }
        fqdn: { type: string }
    ReserveResponse:
      type: object
      properties:
        poolDomain: { type: string }
        subdomain: { type: string }
        state: { type: string, enum: [reserved] }
        reservedAt: { type: string, format: date-time }
        expiresAt: { type: string, format: date-time }
        reservationToken: { type: string, format: uuid }
        createdBy: { type: string }