openova/platform/cert-manager
e3mrah 2b60e944e2
fix(bp-cert-manager-powerdns-webhook): re-target to contabo PowerDNS, drop dynadot-webhook (#681)
* fix(bp-cert-manager-powerdns-webhook): re-target to contabo PowerDNS, drop dynadot-webhook

Caught live on otech43-46: cert-manager DNS-01 challenges for
*.otechN.omani.works failed because the Sovereign-side webhook wrote
challenge TXT records to the Sovereign's local PowerDNS. omani.works is
delegated from Dynadot to ns1/2/3.openova.io which run on contabo's
central PowerDNS — the Sovereign's local PowerDNS is INVISIBLE on the
public DNS chain until pool-domain-manager seals the per-Sovereign NS
delegation. Let's Encrypt resolvers walk the public chain, query
contabo, get NXDOMAIN, and the cert never issues. The manual workaround
was seeding challenge TXT directly in contabo PowerDNS.

This PR automates the right write path:

- bp-cert-manager-powerdns-webhook chart bumped to 1.0.4. Default
  powerdns.host flips from "" (skip-render) to https://pdns.openova.io
  (contabo's public PowerDNS API ingress, authoritative for omani.works).
- ClusterIssuer letsencrypt-dns01-prod-powerdns now usable with no
  per-cluster powerdns.host override for the omani.works pool.
  apiKeySecretRef.namespace clarified — upstream ignores it; the Secret
  must live in cert-manager namespace (= ChallengeRequest.ResourceNamespace
  for ClusterIssuers).
- bootstrap-kit slot 49 updated: drops bp-powerdns dependsOn (webhook
  calls out-of-cluster contabo, not local PowerDNS), bumps chart version,
  removes inline powerdns.host override (defaults are correct).
- bootstrap-kit slot 49b (bp-cert-manager-dynadot-webhook) DELETED
  entirely: Dynadot is NOT the API-level authority for omani.works
  subdomains, so the dynadot webhook silently fails the same way the
  Sovereign-local powerdns one did.
- clusters/_template/sovereign-tls/cilium-gateway-cert.yaml flips
  issuerRef from letsencrypt-dns01-prod (was dynadot-backed) to
  letsencrypt-dns01-prod-powerdns (the new contabo-backed issuer).
- bp-cert-manager chart: certManager.issuers.dns01.enabled defaults to
  false (deprecated dynadot path). letsencrypt-http01-prod retained for
  per-host certs. Cluster overlays MAY flip dns01.enabled=true for
  non-omani.works pools where Dynadot IS the API-level authority.
- scripts/expected-bootstrap-deps.yaml: drops slot 49b, drops bp-powerdns
  edge from slot 49.
- Documentation (README + blueprint.yaml + Chart.yaml description)
  rewritten to reflect contabo retarget and lifecycle reasoning.
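
As a rough illustration of the list above, the rendered ClusterIssuer might look like the sketch below. The issuer name, host, Secret name, and key come from this message. The `groupName` and `solverName` values are assumptions based on the upstream zachomedia webhook's defaults, and the `privateKeySecretRef` name and email placeholder are hypothetical; the chart may render different values.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01-prod-powerdns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@<domain>
    privateKeySecretRef:
      name: letsencrypt-dns01-prod-powerdns-key  # hypothetical name
    solvers:
      - dns01:
          webhook:
            groupName: acme.zacharyseguin.ca  # assumption: upstream default
            solverName: pdns                  # assumption: upstream default
            config:
              # contabo's public PowerDNS API ingress (chart default as of 1.0.4)
              host: https://pdns.openova.io
              apiKeySecretRef:
                # Must live in the cert-manager namespace for ClusterIssuers
                name: powerdns-api-credentials
                key: api-key
```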

Credential plumbing (out of scope here, must be done in cloud-init):
- Every Sovereign needs a `powerdns-api-credentials` Secret in the
  `cert-manager` namespace whose `api-key` value matches contabo's
  PowerDNS API key. Same seeding pattern as `dynadot-api-credentials`
  in infra/hetzner/cloudinit-control-plane.tftpl.
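
A minimal sketch of the Secret that cloud-init would need to seed. The name, namespace, and key are the ones stated above; the value is a placeholder, not a real credential.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: powerdns-api-credentials
  namespace: cert-manager
type: Opaque
stringData:
  # Placeholder: must match contabo's PowerDNS API key
  api-key: <contabo-powerdns-api-key>
```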

Caveat — basicAuth on contabo's PowerDNS API ingress: contabo currently
fronts pdns.openova.io with Traefik basicAuth (per
clusters/contabo-mkt/apps/powerdns/helmrelease.yaml). The upstream
zachomedia/cert-manager-webhook-pdns binary supports the X-API-Key
header but not HTTP Basic Auth out of the box. To make this end-to-end
green, either contabo's basicAuth requirement must be relaxed (X-API-Key
alone provides the auth posture, and contabo's API endpoint is
restricted to operator IPs by other means), or the Sovereign's webhook
needs an Authorization header injected via the chart's powerdns.headers
map (plaintext password in the ClusterIssuer config, not ideal). This PR
ships the chart side; the basicAuth question is a follow-up on the
contabo side.
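
For reference, the header-injection option could look roughly like the values override below. The nesting is an assumption inferred from the `powerdns.host` and `powerdns.headers` key names mentioned in this message, and the credential is a placeholder. As noted above, this places the basicAuth password in the ClusterIssuer config in plaintext.

```yaml
# Hypothetical cluster overlay for bp-cert-manager-powerdns-webhook
powerdns:
  host: https://pdns.openova.io
  headers:
    # Placeholder: base64 of "<user>:<password>" for contabo's Traefik basicAuth
    Authorization: "Basic <base64-user-colon-password>"
```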

Verified locally:
- helm lint platform/cert-manager-powerdns-webhook/chart -> PASS
- helm template platform/cert-manager-powerdns-webhook/chart -> renders
- helm template ... --set clusterIssuer.enabled=true -> renders the
  ClusterIssuer with host="https://pdns.openova.io" + correct apiKey
  Secret reference.
- helm template platform/cert-manager/chart -> renders ONLY
  letsencrypt-http01-prod (the dns01 dynadot issuer correctly gated off).
- scripts/check-bootstrap-deps.sh: net-zero new drift; my branch reduces
  pre-existing errors from 3 to 2 (the dropped slot 49b removed the only
  drift my branch was responsible for).

Closes follow-up to #373. Preconditions for handover URL TLS green
on otech43-46 lineage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(scripts): repair YAML structure in expected-bootstrap-deps.yaml

Two pre-existing drifts were blocking dependency-graph-audit CI:

1. Slot 5a (bp-reflector) was missing its closing list separator,
   causing yq to merge the bp-nats-jetstream entry into the bp-reflector
   map and effectively drop bp-reflector from the expected DAG.
   Added explicit `- slot: 7` for bp-nats-jetstream and quoted "5a" so
   yq treats it as a string slot (matches the convention with "49b").

2. bp-powerdns slot 11: actual bootstrap-kit declares dependsOn
   bp-cnpg (live since otech28 — pdns-pg-app secret race) but the
   expected DAG was missing this edge.
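
A sketch of what the repaired entries might look like. Only the `slot` key, the slot values, the blueprint names, and the bp-cnpg edge come from this message; the `blueprint` and `dependsOn` key names are hypothetical stand-ins for whatever scripts/expected-bootstrap-deps.yaml actually uses.

```yaml
# Each blueprint is its own list item; letter-suffixed slots are quoted
# to match the existing "49b" convention.
- slot: "5a"
  blueprint: bp-reflector        # hypothetical key name
- slot: 7
  blueprint: bp-nats-jetstream
- slot: 11
  blueprint: bp-powerdns
  dependsOn:                     # hypothetical key name
    - bp-cnpg                    # edge that was missing from the expected DAG
```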

This unblocks merging fix/cert-manager-powerdns-webhook-contabo (PR
above). These drifts existed on main but weren't surfaced until the
last expected-deps edit forced a re-run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:12:48 +04:00
chart fix(bp-cert-manager-powerdns-webhook): re-target to contabo PowerDNS, drop dynadot-webhook (#681) 2026-05-03 17:12:48 +04:00
blueprint.yaml feat(dns): cert-manager-dynadot-webhook for DNS-01 wildcard TLS (closes #159) (#291) 2026-04-30 19:37:47 +04:00
README.md docs(pass-8): role-in-Catalyst banners + dead-link fix in component READMEs 2026-04-27 21:39:03 +02:00

cert-manager

TLS certificate automation. Per-host-cluster infrastructure (see docs/PLATFORM-TECH-STACK.md §3.3) — runs on every host cluster a Sovereign owns.

Status: Accepted | Updated: 2026-04-27


Overview

cert-manager provides automated TLS certificate management using Let's Encrypt with automatic renewal and Kubernetes-native integration.


Architecture

flowchart TB
    subgraph CM["cert-manager"]
        Controller[Controller]
        Webhook[Webhook]
        CAInjector[CA Injector]
    end

    subgraph Issuers["Issuers"]
        LE[Let's Encrypt]
        CA[Internal CA]
    end

    subgraph Resources["K8s Resources"]
        Cert[Certificate]
        Secret[TLS Secret]
        Ingress[Gateway/Ingress]
    end

    Controller --> LE
    Controller --> CA
    Cert --> Controller
    Controller --> Secret
    Secret --> Ingress

Challenge Types

| Challenge | Use Case | DNS Provider |
|-----------|----------|--------------|
| HTTP-01 | Public endpoints | Not required |
| DNS-01 | Wildcards, internal | Cloudflare, Route53, etc. |

Recommended: DNS-01 for wildcard certificates (Let's Encrypt issues wildcards only via DNS-01)
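
For comparison with DNS-01, an HTTP-01 issuer for per-host certificates could be sketched as follows. This is an illustration under assumptions: it presumes cert-manager's Gateway API support is enabled and that the referenced Gateway exposes a plain HTTP listener on port 80 for the challenge route. The Gateway name and namespace are borrowed from the Gateway example in this README, and the `privateKeySecretRef` name is hypothetical.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-http01-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@<domain>
    privateKeySecretRef:
      name: letsencrypt-http01-prod-key  # hypothetical name
    solvers:
      - http01:
          # cert-manager creates a temporary HTTPRoute attached to this Gateway
          gatewayHTTPRoute:
            parentRefs:
              - name: main-gateway
                namespace: cilium-gateway
                kind: Gateway
```

HTTP-01 cannot validate wildcard names, so this solver only suits certificates that list explicit hostnames.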


Configuration

ClusterIssuer (Let's Encrypt)

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@<domain>
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token

Certificate

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-cert
  namespace: cilium-gateway
spec:
  secretName: wildcard-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - "*.<domain>"
    - "<domain>"

Gateway API Integration

cert-manager integrates with Cilium Gateway API:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: main-gateway
  namespace: cilium-gateway
spec:
  gatewayClassName: cilium
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: wildcard-tls

Renewal

| Setting | Value |
|---------|-------|
| Renewal window | 30 days before expiry |
| Check interval | 24 hours |
| Retry interval | 1 hour on failure |

cert-manager automatically renews certificates before expiration.
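
The renewal window can also be pinned explicitly on a Certificate. The sketch below reuses the wildcard Certificate from the Configuration section and assumes a 90-day certificate lifetime; `renewBefore: 720h` corresponds to the 30-day window in the table. Note that an ACME issuer may ignore the requested `duration` and issue with its own fixed lifetime.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-cert
  namespace: cilium-gateway
spec:
  secretName: wildcard-tls
  duration: 2160h     # 90 days (assumed lifetime)
  renewBefore: 720h   # start renewal 30 days before expiry
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - "*.<domain>"
    - "<domain>"
```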


Monitoring

| Metric | Description |
|--------|-------------|
| certmanager_certificate_expiration_timestamp_seconds | Certificate expiry time |
| certmanager_certificate_ready_status | Certificate readiness |
| certmanager_http_acme_client_request_count | ACME requests |
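
As one possible use of these metrics, the alert rules below flag certificates that are close to expiry or stuck in a not-ready state. This sketch assumes the Prometheus Operator is installed (for the `PrometheusRule` kind); the rule names, thresholds, and severities are illustrative choices, not values defined by this repository.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cert-manager-certificates  # hypothetical name
  namespace: cert-manager
spec:
  groups:
    - name: cert-manager.certificates
      rules:
        # Fires well inside the 30-day renewal window, i.e. renewal is failing
        - alert: CertificateExpiringSoon
          expr: certmanager_certificate_expiration_timestamp_seconds - time() < 14 * 24 * 3600
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} expires in under 14 days"
        # Fires when a Certificate reports Ready=False for a sustained period
        - alert: CertificateNotReady
          expr: certmanager_certificate_ready_status{condition="False"} == 1
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} is not ready"
```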

Part of OpenOva