helmwatch.Bridge writes SOME Job.DependsOn entries as bare names
("install-flux") rather than the canonical JobID form
("<deploymentId>:install-flux") — 71 such entries observed on prov
bfdccbdbd6f700e1 (2026-05-12). My flowSnapshotFromJobs emit copied
those bare names verbatim into Relationship.fromId. The canvas
reducer matches FlowNode.id by exact string, so the bare-name fromId
became a phantom edge pointing to a non-existent node. In the
force-directed layout these phantom edges visually routed through
the nearest real bubbles, manifesting as 5-edge fan-outs from every
Phase-0 tofu job to every install-* bubble (operator-reported on
install-cnpg, but symmetric across all install-*).
Normalise every fromId to jobs.JobID(deploymentID, dep) form when
the stored value lacks a ":" separator.
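A minimal sketch of the normalisation rule, written in TypeScript for
illustration only (the real emit is Go). `jobId` is a stand-in for
jobs.JobID and is assumed to produce "<deploymentId>:<jobName>"; the
function name `normaliseFromId` is hypothetical.

```typescript
// Stand-in for jobs.JobID(deploymentID, name); assumed output form
// "<deploymentId>:<jobName>".
function jobId(deploymentId: string, jobName: string): string {
  return `${deploymentId}:${jobName}`;
}

// Leave ids that already carry a ":" separator untouched; prefix bare names.
function normaliseFromId(deploymentId: string, dep: string): string {
  return dep.includes(":") ? dep : jobId(deploymentId, dep);
}

// One bare entry and one already-canonical entry, as seen in Job.DependsOn.
const deps = ["install-flux", "bfdccbdbd6f700e1:install-cnpg"];
const fromIds = deps.map((d) => normaliseFromId("bfdccbdbd6f700e1", d));
```

Applying the rule twice is a no-op, so it should be safe on stores that
mix both forms.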
Caught after operator reported "install-cnpg has 5 different
connections from terraform jobs — this is matter of a proper
chaining" — looking at the snapshot showed Job.DependsOn=[install-flux]
without the prefix.
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per products/openova-flow/core/src/types.ts line 112:
"contains — toId (parent) contains fromId (child)"
My emit had this inverted: I set FromID=parent, ToID=child, which
made the FE adapter (flowStreamToOrganic.ts line 134) interpret every
install-* leaf as a group containing the bootstrap-kit/provisioner
group nodes. Net result: only 2 bubbles ever rendered on the canvas
regardless of ?depth= because the hierarchy graph was upside-down.
Caught by opening the canvas in a browser via Playwright after the
operator reported "still showing only 2 bubbles, no drill-down".
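A small sketch of the corrected direction. The `Relationship` shape here
is a hypothetical minimal version (the `type` field name is an
assumption); the authoritative definition is the one in types.ts.

```typescript
// Hypothetical minimal shape, not the real type from flow-core.
interface Relationship {
  type: "contains" | "finish-to-start";
  fromId: string;
  toId: string;
}

// contains: toId (parent) contains fromId (child).
function containsRel(parentId: string, childId: string): Relationship {
  return { type: "contains", fromId: childId, toId: parentId };
}

// Build a child -> parent index the way a consumer of the contract would.
function parentIndex(rels: Relationship[]): Map<string, string> {
  const parents = new Map<string, string>();
  for (const r of rels) {
    if (r.type === "contains") parents.set(r.fromId, r.toId);
  }
  return parents;
}

const rels = [containsRel("bootstrap-kit", "install-cnpg")];
const parents = parentIndex(rels);
```

With the inverted emit, the same index would have recorded the leaf as
the parent of the group, which matches the two-bubble symptom.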
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the Pod restarts between PutKubeconfig writing the file AND the
next Result.Save() persisting the field, dep.Result.KubeconfigPath
comes back empty even though the file exists at the canonical
convention <kubeconfigsDir>/<deploymentID>.yaml. RefreshWatch was
returning 409 watch-not-resumable in this state, which left the
mothership canvas frozen because the live watcher couldn't re-attach
to source HR.spec.dependsOn for the install-* edge derivation.
Hit live on prov bfdccbdbd6f700e1 (2026-05-12): chart roll for
PR #1431 restarted catalyst-api Pod, the file
/var/lib/catalyst/kubeconfigs/bfdccbdbd6f700e1.yaml was on disk but
RefreshWatch refused to use it because the record field was empty.
Fix: when KubeconfigPath is empty AND h.kubeconfigsDir is configured
AND a file exists at <dir>/<depID>.yaml, use that path and patch the
record so subsequent /components/state + flow snapshot calls see a
populated field.
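The fallback can be sketched as follows, in TypeScript for illustration
only (the handler itself is Go). The record shape, the function name and
the injected `fileExists` predicate are assumptions made to keep the
sketch free of filesystem access.

```typescript
// Hypothetical stand-in for the persisted deployment result record.
interface ResultRecord {
  kubeconfigPath: string;
}

// Returns the kubeconfig path to use, or null when the watch cannot resume.
// Patches the record when the conventional file is found on disk.
function resolveKubeconfigPath(
  record: ResultRecord,
  kubeconfigsDir: string,
  deploymentId: string,
  fileExists: (path: string) => boolean,
): string | null {
  if (record.kubeconfigPath !== "") return record.kubeconfigPath;
  if (kubeconfigsDir === "") return null;
  const candidate = `${kubeconfigsDir}/${deploymentId}.yaml`;
  if (!fileExists(candidate)) return null;
  record.kubeconfigPath = candidate; // later readers see a populated field
  return candidate;
}

const onDisk = new Set(["/var/lib/catalyst/kubeconfigs/bfdccbdbd6f700e1.yaml"]);
const record: ResultRecord = { kubeconfigPath: "" };
const resolved = resolveKubeconfigPath(
  record,
  "/var/lib/catalyst/kubeconfigs",
  "bfdccbdbd6f700e1",
  (p) => onDisk.has(p),
);
```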
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs the operator hit on /sovereign/provision/<id>/jobs:
1) Phase-1 install-* Jobs rendered DISCONNECTED on the canvas —
helmwatch.Bridge doesn't persist Job.DependsOn (only the Phase-0
tofu chain + cluster-bootstrap is wired today). Pull HR.spec.dependsOn
from the live Watcher's informer cache via SnapshotComponents()
(ComponentSnapshot.DependsOn already populated by extractDependsOn)
at snapshot-time and emit finish-to-start edges from upstream
install-<dep> to install-<self>. Also add provisioner→bootstrap-kit
group-to-group finish-to-start so the Phase-0/Phase-1 ordering is
visible on the canvas.
2) Clicking a canvas node → "404 page not found" because
FlowPage.handleNodeDoubleClick passed the full
"<deploymentId>:install-X" id verbatim. The backend Store.GetJob
keys by bare jobName ("install-X"), so the colon-prefixed id missed
exact-match and JobDetail returned 404. Mirror useJobLinkBuilder
(JobsTable.tsx line 364): strip the "<deploymentId>:" prefix and
encodeURIComponent the remainder before pushing to the router.
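A sketch of the id handling described in 2), assuming a hypothetical
helper name `jobDetailSegment`; the actual change lives inside
FlowPage.handleNodeDoubleClick.

```typescript
// Strip the "<deploymentId>:" prefix when present, then escape the bare
// jobName so it is safe as a single path segment.
function jobDetailSegment(nodeId: string, deploymentId: string): string {
  const prefix = `${deploymentId}:`;
  const bare = nodeId.startsWith(prefix) ? nodeId.slice(prefix.length) : nodeId;
  return encodeURIComponent(bare);
}

const segment = jobDetailSegment("bfdccbdbd6f700e1:install-cnpg", "bfdccbdbd6f700e1");
```

Only the exact deployment prefix is removed, so ids that merely contain
a colon elsewhere are escaped rather than truncated.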
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(catalyst-api): add OPENOVA_FLOW_SERVER_URL env to chart template
Without this env the proxy resolveFlowServerURL() falls back to
per-deployment FQDN lookup (https://openova-flow.<sovereignFQDN>) which
only exists on Sovereigns that already installed bootstrap-kit slot 56
with httproute=enabled. Every other catalyst-api deployment (mothership
contabo + Sovereigns that haven't reached cutover yet) returns 502 on
/api/v1/flows/{deploymentId}/snapshot, which is the live regression the
founder saw at console.openova.io: "No nodes to render."
The env points at the in-cluster Service DNS for the LOCAL openova-flow-
server. Both the mothership (catalyst-system or catalyst namespace) and
each Sovereign chroot run the bp-openova-flow-server chart with a local
Service, so this URL is correct for every cluster catalyst-api runs in.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(flow-proxy): assemble snapshot from local jobs.Store before upstream proxy
Mothership canvas at /sovereign/provision/<id>/jobs was empty for the
first ~30 minutes of every fresh provision because the snapshot
endpoint went straight to https://openova-flow.<sovereignFQDN> which
can't serve until cilium + cert-manager + the HTTPRoute TLS cert are
all up on the chroot. The Phase-0 + Phase-1 lifecycle Jobs catalyst-api
ALREADY owns (tofu-init/plan/apply/output, flux-bootstrap,
install-bp-<chart>, ...) were invisible the whole time.
This change adds flowSnapshotFromJobs which assembles the canonical
FlowMessage envelope from h.jobsStore().ListJobs(deploymentID) — every
Job becomes a FlowNode with the legacy <deploymentId>:<jobName> id form
the canvas drill-down already expects, every Job.DependsOn becomes a
finish-to-start Relationship, every Job.ParentID becomes a contains
Relationship. HandleFlowSnapshot checks the local store first and
returns immediately when it has data; otherwise falls through to the
existing upstream proxy path.
HandleFlowStream gets the same treatment via flowStreamLocal: emit a
snapshot frame on connect AND every 3 seconds thereafter, plus a 15s
heartbeat. The OpenovaFlow consumer's reducer is idempotent on
snapshot replay so re-emitting an unchanged envelope is harmless;
in exchange the canvas reflects Job state transitions within ~3s
of when helmwatch.Bridge writes them.
No FE change required — the same /api/v1/flows/<id>/snapshot and
/stream endpoints serve the same envelope shape the chroot adapter
emits (products/openova-flow/adapter-flux/internal/types/flow.go),
named SSE events including 'snapshot' and 'heartbeat'.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
prov #44 (d9399223c3caa4f9) hit the catalyst-api 60m phase1 watch cap
with bp-catalyst-platform HR still mid-retry (failures=3) and 41/45 HRs
True. F1-F7 are correct and live on main (qa-finalizer-strip Completed,
autoscaler workers joined). The remaining wall is total bootstrap-kit
install time exceeding the outer watch budget on a fresh cpx42×1
Sovereign without a warm Harbor proxy-cache.
Two lock-step changes widen both bounds:
1. clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:
install.timeout 15m → 30m, upgrade.timeout 15m → 30m. The umbrella
chart genuinely needs >15m worst case when the full SME + Catalyst
service stack rolls cold.
2. products/catalyst/bootstrap/api/internal/helmwatch/helmwatch.go:
DefaultWatchTimeout 60m → 120m. Worst-case inner HR retry chain is
now 30m × 3 = 90m; the outer phase1 budget MUST be larger so the
watch never terminates while helm-controller still has remediation
attempts left. CATALYST_PHASE1_WATCH_TIMEOUT env-var override path
was already wired (issue #538 baseline) — chart template now
declares the explicit "120m" value so the runtime knob is
discoverable for capacity-bounded environments. Per INVIOLABLE-
PRINCIPLES.md #4 the knob remains runtime-configurable.
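The budget arithmetic above, restated as a small check. The numbers come
from this message; the variable and function names are illustrative and
do not exist in the codebase.

```typescript
// Values quoted in this commit message.
const hrTimeoutMinutes = 30; // install.timeout / upgrade.timeout after the change
const remediationAttempts = 3; // from the "30m x 3" retry chain
const watchTimeoutMinutes = 120; // DefaultWatchTimeout after the change

// Worst-case inner retry chain and the slack left in the outer watch.
const worstCaseRetryChain = hrTimeoutMinutes * remediationAttempts;
const headroomMinutes = watchTimeoutMinutes - worstCaseRetryChain;

// The outer watch must strictly outlive the inner retry chain.
function watchOutlivesRetries(watchMin: number, hrMin: number, attempts: number): boolean {
  return watchMin > hrMin * attempts;
}
```

Raising only the HR timeout to 30m while leaving the watch at 60m would
fail this check, which is why the two changes ship together.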
New unit test TestPhase1WatchConfig_ProductionDefaultIs120m pins the
F8 floor against future regression. Existing env-var override + field-
override tests still pass unchanged.
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause (4-layer trace on prov #41, omantel.biz, 2026-05-12 00:28 UTC):
bp-catalyst-platform HR install.timeout=15m
→ Helm pre-install hook: qa-finalizer-strip Job (weight -99)
→ Pod requests 50m CPU + 64Mi memory (tiny)
→ BUT no tolerations → scheduler restricted to worker
→ worker cpx32 (8vCPU/16GB) at 99% CPU requests
(7980m of 8000m allocated) after bootstrap-kit fan-out
→ FailedScheduling: "0/2 nodes are available: 1
Insufficient cpu, 1 node(s) had untolerated taint
{node-role.kubernetes.io/control-plane: true}"
→ autoscaler triggers scale-up worker 2→3 → "1 in backoff
after failed scale-up" → still Pending → 15m timeout
→ InstallFailed → Flux uninstall+rollback → installFailures: 3
→ Flux gives up entirely
Live evidence quoted from chroot kubeconfig on prov #41:
- bp-catalyst-platform HR `Reconciling=True, reason=Progressing,
message="Running 'install' action with timeout of 15m0s"`
- HR `Released=False, reason=InstallFailed, message="Helm install
failed for release catalyst-system/catalyst-platform with chart
bp-catalyst-platform@1.4.140: failed pre-install: 1 error occurred:
* timed out waiting for the condition"`
- Pod `qa-finalizer-strip-m2hdb` status=Pending; events:
`Warning FailedScheduling 108s default-scheduler 0/2 nodes are
available: 1 Insufficient cpu, 1 node(s) had untolerated taint
{node-role.kubernetes.io/control-plane: true}`
- Worker `Allocated cpu 7980m (99%) of 8000m capacity`
- Control-plane `Allocated cpu 635m (7%) of 8000m capacity` (idle)
Fix: add tolerations for the control-plane NoSchedule taint +
priorityClassName: system-cluster-critical so the qa-finalizer-strip
Job can ALWAYS schedule regardless of worker-node CPU saturation.
The hook is a defense-in-depth cleanup that runs in seconds on a
clean cluster; it legitimately belongs anywhere with free capacity
including the control-plane node (which on prov #41 had 7365m CPU
free vs. the hook's 50m request).
Why prior fixes didn't suffice:
- Fix#114 introduced this hook to break a finalizer-deadlock loop
on prov #9. Correct fix for that wedge; never anticipated worker
saturation as a scheduling failure mode for the hook itself.
- Fix#138 (chart 1.4.138) converted the qa-cnpg-backup-s3-seed +
qa-cnpg-status-seed hooks (weight 0/post-install) to regular
release resources to break a circular DAG dep. Different hook
surface.
- Fix#184 (chart 1.4.140) raised the gitea-token-mint pre-install
hook (weight +10) wait budget for cold-start autoscaler. That
hook runs AFTER qa-finalizer-strip (-99 < +10); if the -99 hook
never starts, the +10 hook never runs.
Recurring class: same family as Fix#114 (hook scheduling failure
wedges entire HR install). 3 consecutive recurrences (prov #38, #39,
#41) on chart pin 1.4.140 trigger the category-level audit threshold
(CLAUDE.md rule "CATEGORY-LEVEL THINKING"). Coupled chart hygiene
swept in same commit:
- Switch image from bitnamilegacy/kubectl:1.29.3 (Docker-Hub
redirect for deprecated Bitnami images, 2025-08 cutover
documented at platform/self-sovereign-cutover/chart/values.yaml:
252) → harbor.openova.io/proxy-dockerhub/alpine/k8s:1.31.4 —
the canonical alpine-based kubectl image already used by sibling
hook catalyst-gitea-token-mint (Fix#163). MIRROR-EVERYTHING +
ARCHITECT-FIRST rules.
Coordinator follow-up tickets:
- Sibling Jobs in templates/qa-fixtures/cnpg-clusters-qa.yaml
(qa-cnpgpair-status-seed) still reference bitnamilegacy/kubectl
:1.29.3 — same Bitnami-deprecation class. Out of scope for this
Fix (not part of the recurrence cluster); flagged for a sweep.
- Worker cpx32 sizing may be undersized for the bootstrap-kit fan-
out on omantel.biz — separate sizing ticket, not blocking.
Changes:
- products/catalyst/chart/templates/qa-fixtures/pre-install-
finalizer-strip.yaml: add tolerations + priorityClassName;
switch image to alpine/k8s:1.31.4. Inline doc comments explain
the 4-layer trace and the Fix #114/#138/#184 history.
- products/catalyst/chart/Chart.yaml: bump 1.4.140 → 1.4.141 with
changelog entry capturing root cause + budget arithmetic.
- clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:
bump HR pin 1.4.140 → 1.4.141.
Verification:
- helm template renders cleanly (exit 0, ~6700 lines).
- kubectl apply --dry-run=client validates the rendered Job
manifest (job.batch/qa-finalizer-strip created (dry run)).
- Rendered Job contains tolerations[control-plane Exists NoSchedule],
priorityClassName: system-cluster-critical, image: alpine/k8s:1.31.4.
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TC-035 (iter-2, 2026-05-11): OpenovaFlow rows merged into JobsPage
(PR #1413) lost their region-prefixed identity in the URL. The link
builder sliced the "<prefix>:" segment off every id with a colon —
intended to strip the legacy "<deploymentId>:install-keycloak" form,
but it also stripped "contabo:bp-openova-flow-server" → bare
"bp-openova-flow-server" in the href. The matrix asserts the
verbatim form "/jobs/contabo:bp-openova-flow-server" must appear in
the rendered DOM.
Fix: stop slicing. `encodeURIComponent` still escapes unsafe path
chars (`/` for live K8s job ids like "job/syft-grype/..."), then we
restore `:` because RFC 3986 permits it as a path-segment `pchar`.
FlowPage canvas navigation (PR #1411) and JobDetail flow-fallback
(PR #1412) already pass on the colon-present form, so this round-
trips end-to-end. Legacy "bp-cilium" / "cluster-bootstrap" hrefs are
unchanged (no `:` to encode). The previously-stripped legacy form
"<deploymentId>:install-keycloak" now lands as the full id in the
URL, and JobDetail's `jobsById` lookup is already keyed by BOTH the
canonical id AND the bare jobName (JobDetail.tsx:124-131), so the
resolution path is preserved.
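A sketch of the encode-then-restore step, assuming a hypothetical helper
name `jobHrefSegment`; the real logic sits in the link builder in
JobsTable.tsx.

```typescript
// Escape everything encodeURIComponent considers unsafe, then put ":" back,
// since RFC 3986 allows it inside a path segment.
function jobHrefSegment(id: string): string {
  return encodeURIComponent(id).replace(/%3A/g, ":");
}

const flowHref = `/jobs/${jobHrefSegment("contabo:bp-openova-flow-server")}`;
```

Ids without a colon pass through unchanged, and "/" inside live K8s job
ids is still percent-encoded so it cannot split the route.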
Test coverage: new Case 4 in JobsPage.flow-merge.test.tsx asserts
the openova-flow row's anchor `href` contains
`/jobs/contabo:bp-openova-flow-server` and is NOT the bare-jobName
form. All 4 flow-merge cases PASS. The 3 pre-existing failures in
JobsPage.test.tsx (back-to-apps href, canonical-columns header,
Show-as-Flow button) are the documented iter-2 baseline — untouched
by this change.
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TC-035 iter-1 FAIL (2026-05-11): /sovereign/provision/12e194090631a885/jobs
asserts rows for the openova-flow-server + openova-flow-emitter HRs but the
JobsTable only sourced from /api/v1/deployments/<id>/jobs (legacy event
stream) — verified live: GET /v1/flows/<id>/snapshot returns 2 leaf nodes
(contabo:bp-openova-flow-server, contabo:bp-openova-flow-emitter) whose ids
NEVER appear in the legacy /jobs payload. Sovereigns whose state lives only
in the OpenovaFlow snapshot silently drop these rows.
Fix: wire `useFlowStream({deploymentId})` alongside the existing legacy
reducer + live-jobs backfill. Synthesize a Job stub per FlowNode via
`synthesizeJobFromFlowNode` (PR #1412 — same adapter JobDetail's
flow-fallback path uses) and append the rows whose ids are absent from the
legacy set. Legacy wins dedup on id collisions because it carries real
execution timeline / appId / parentId / dependsOn — the flow synth is
intentionally a minimal stub.
Behavior unchanged for Sovereigns without an active flow stream: empty
FlowNode map → empty `flowJobs` → `legacyMerged` passes through untouched.
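The merge rule can be sketched like this. The `Job` shape is trimmed to
the fields needed for the example and `mergeJobs` is a hypothetical
name, not the function in JobsPage.

```typescript
// Trimmed, hypothetical Job shape for illustration.
interface Job {
  id: string;
  jobName: string;
  source: "legacy" | "flow";
}

// Append flow-synthesized rows whose ids are absent from the legacy set.
// Legacy rows are kept as-is, so they win on id collisions.
function mergeJobs(legacy: Job[], flowJobs: Job[]): Job[] {
  const legacyIds = new Set(legacy.map((j) => j.id));
  const extras = flowJobs.filter((j) => !legacyIds.has(j.id));
  return [...legacy, ...extras];
}

const legacyRows: Job[] = [
  { id: "bp-cilium", jobName: "bp-cilium", source: "legacy" },
  { id: "cluster-bootstrap", jobName: "cluster-bootstrap", source: "legacy" },
];
const flowRows: Job[] = [
  { id: "bp-cilium", jobName: "bp-cilium", source: "flow" },
  { id: "contabo:bp-openova-flow-server", jobName: "bp-openova-flow-server", source: "flow" },
];
const merged = mergeJobs(legacyRows, flowRows);
```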
Test coverage (JobsPage.flow-merge.test.tsx — 3 cases, all PASS):
1. Legacy 5 / flow empty → 5 rows, no behavior change.
2. Legacy 5 / flow has 2 distinct ids → 7 rows with the contabo:bp-*
ids present.
3. Legacy 5 / flow has 1 id-collision + 1 new → 6 rows, legacy wins
dedup (DOM scan asserts the colliding testid appears exactly once).
Validation:
vitest: 3/3 PASS on new file; 13 prior tests in JobsPage.test.tsx
unchanged from origin/main baseline (3 unrelated pre-existing failures
in chrome/columns/Show-as-Flow tests, untouched by this fix).
tsc --noEmit -p tsconfig.app.json: 27 errors, ALL pre-existing in
@openova/flow-canvas + @openova/flow-core workspaces — zero new errors
introduced.
Canonical seam reused (no new code paths):
- @/lib/openflow-adapter-sse → useFlowStream (FlowPage / JobDetail share)
- @/lib/synthesizeJobFromFlowNode (PR #1412 helper)
- @/lib/jobs.types → Job (single source of truth)
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JobDetail built `jobsById` from the legacy useDeploymentEvents reducer
+ useLiveJobsBackfill polling. For Sovereigns whose state lives ONLY in
the openova-flow snapshot (post-flux-only flow, fresh chroot before the
catalyst-api event bridge has emitted any rows), that lookup misses and
JobDetail short-circuited to "Job not found" — never mounting FlowPage,
the very surface that would have painted the node.
Verified live this turn against deployment 12e194090631a885:
GET /api/v1/flows/12e194090631a885/snapshot → 200, 2 leaf nodes
GET /api/v1/deployments/12e194090631a885/jobs/<nodeId> → 404
This blocks 21 of the 26 iter-1 FAILs on the OpenovaFlow canvas test
matrix (TC-019/020/021/023/024/025/027/028/033/034/036/037/038/039/040
/041/042/053/054/060/064).
Fix:
• JobDetail now reads the same useFlowStream hook FlowPage uses.
• When `jobsById[jobId]` is undefined, look up the node in the flow
snapshot's nodes Map. If found, synthesize a flat Job stub from the
FlowNode (id, label, status) so the canvas mounts with the right
hostJobId.
• Behaviour for Sovereigns WITH an active event stream is unchanged
— the legacy lookup wins and the synth stub is never read.
• "Job not found" panel renders ONLY when BOTH lookups miss.
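A sketch of the two-step lookup. `synthJob` is a simplified stand-in for
the real synthesizeJobFromFlowNode helper, and the JobStatus values
listed here are an assumed vocabulary for illustration.

```typescript
// Assumed status vocabulary; the real JobStatus type may differ.
type JobStatus = "pending" | "running" | "succeeded" | "failed";

interface Job {
  id: string;
  jobName: string;
  status: JobStatus;
  startedAt: string | null;
}

interface FlowNode {
  id: string;
  label?: string;
  status?: string;
}

const KNOWN_STATUSES: ReadonlySet<string> = new Set(["pending", "running", "succeeded", "failed"]);

// Minimal stub: unknown statuses coerce to "pending", an empty label falls
// back to the node id, and absent timestamps stay null.
function synthJob(node: FlowNode): Job {
  const status =
    node.status !== undefined && KNOWN_STATUSES.has(node.status)
      ? (node.status as JobStatus)
      : "pending";
  return { id: node.id, jobName: node.label || node.id, status, startedAt: null };
}

// Legacy lookup first, flow snapshot second, null only when both miss.
function resolveJob(
  jobId: string,
  jobsById: Map<string, Job>,
  nodes: Map<string, FlowNode>,
): Job | null {
  const legacy = jobsById.get(jobId);
  if (legacy) return legacy;
  const node = nodes.get(jobId);
  return node ? synthJob(node) : null;
}
```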
Tests:
Added JobDetail.flow-fallback.test.tsx (vitest, 3 cases):
1. Legacy has the job → FlowPage renders, no fallback.
2. Legacy empty, flow snapshot has the node → FlowPage renders
via synth job (the iter-1 FAIL scenario).
3. Both empty → "Job not found" panel.
All 3 new + 5 existing JobDetail tests pass.
No tsc regressions (27 → 27 baseline errors, all pre-existing
in flow-canvas/flow-core packages).
Refs INVIOLABLE-PRINCIPLES.md:
#1 (waterfall): target-state fallback, no MVP "show loading" stub.
#2 (no compromise): no field is faked with plausible data; absent
timestamps land as null / 0 so fmtTime renders "—".
#4 (never hardcode): the synth helper coerces FlowNode.status into
the JobStatus vocabulary; the label falls back to the node id when
`label` is empty.
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause of live crash 'TypeError: t.relationships is not iterable':
the Go server uses omitempty JSON tags on FlowMessage so empty slices
are dropped from the wire (snapshot with 2 nodes + 0 rels arrives as
'{"type":"snapshot","nodes":[...]}' with no 'relationships' key).
The reducer iterates msg.relationships, msg.nodes, msg.ids, msg.pairs
without nullish guards → crashes on first frame.
Defensive (?? []) on every reducer iteration. Same shape, idempotent.
Observed bundle: index-CEnQMVBy.js@2285:51356.
Snapshot proven empty-rel: GET /v1/flows/12e194090631a885/snapshot
returns {type:'snapshot',nodes:[2 items]} with relationships key absent.
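A reduced sketch of the guard. The message and state shapes are
hypothetical trimmed versions, not the real FlowMessage or reducer.

```typescript
// Trimmed shapes for illustration; both arrays may be absent on the wire
// because the server omits empty slices.
interface FlowNode { id: string }
interface Relationship { id: string }
interface SnapshotMessage {
  type: "snapshot";
  nodes?: FlowNode[];
  relationships?: Relationship[];
}
interface FlowState {
  nodes: Map<string, FlowNode>;
  relationships: Map<string, Relationship>;
}

// Every iteration falls back to an empty array when the key is missing.
function applySnapshot(msg: SnapshotMessage): FlowState {
  const state: FlowState = { nodes: new Map(), relationships: new Map() };
  for (const n of msg.nodes ?? []) state.nodes.set(n.id, n);
  for (const r of msg.relationships ?? []) state.relationships.set(r.id, r);
  return state;
}

// A frame shaped like the one observed: two nodes, no relationships key.
const wire = '{"type":"snapshot","nodes":[{"id":"a"},{"id":"b"}]}';
const state = applySnapshot(JSON.parse(wire) as SnapshotMessage);
```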
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Manual bump — Build & Deploy Catalyst workflow's deploy job lost the
push race twice on PR #1411 merge. Images exist in GHCR; this commit
lands the template+values bump so Flux on contabo-mkt reconciles and
the natural-view canvas restore (FlowCanvasOrganic + fold badges +
depth chip) takes effect.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Founder rejected the lane-layout + synthetic-phase scaffolding shipped
via PR #1399/#1400/#1407. This commit restores the founder-tuned
natural view (FlowCanvasOrganic) and adds the per-bubble fold-
disclosure badge + top-right depth chip on top of it.
Adapter (products/openova-flow/adapter-flux/):
- mapper.go: BuildFromHR now returns ONE leaf FlowNode + finish-to-
start edges from spec.dependsOn only. Deleted BuildRegionNode,
BuildPhaseNodes, BuildPhaseEdges, phaseLabels, phaseSortKey,
AllPhaseSuffixes, PhaseSuffix* constants, derivePhase, PhaseLabel,
PhaseSortKey. Node-id separator changed "/" → ":" so ids do not
collide with URL routing (founder hit "Not Found" drilling into
contabo/phase-0).
- hr_informer.go: dropped bootstrap(), tracker, nodeGroups,
reemitGroups(), buildGroupNode(). handle() is now single-leaf
upsert + dependsOn edges.
- rollup.go: deleted entirely (StatusTracker only existed for
synthetic group rollups).
- mapper_synthetic_test.go + rollup_test.go: deleted; mapper_test.go
updated for the ":" separator + no-synthetic-rels assertions.
UI (products/catalyst/bootstrap/ui/):
- FlowPage.tsx: switched from @openova/flow-canvas's FlowCanvas back
to FlowCanvasOrganic. Dropped lane-layout (regionDescriptorsFromFlow),
defaultFoldedAtDepth from @openova/flow-core, FoldControls chrome
strip. Kept useFlowStream + ?folded=/?depth= URL contract.
- flowStreamToOrganic.ts (new): bridges live SSE state to the Job[]
+ hints + region/family descriptors flowLayoutOrganic expects.
Treats `contains` rels as parent-child and FS/SS/FF/SF/triggers as
dependsOn.
- FlowCanvasOrganic.tsx: ADDITIVE optional props onFoldToggle,
badgeCounts, nodeActions, onNodeAction. Renders per-bubble "⊕ K"/
"⊖" disclosure badge on group bubbles when wired; right-click
opens a small action menu. Existing call sites are unchanged.
- Depth chip: ◀ L<n>/<max> ▶ pinned top-right of canvas host,
visible only when real groups exist in the data. Esc clears
manual fold overrides.
Verification:
- go build ./... in adapter-flux: clean
- go test ./... in adapter-flux: PASS (12 tests)
- tsc --noEmit on bootstrap/ui: clean
- vitest FlowPage + FlowCanvasOrganic.bounded: 25/25 PASS
- vitest JobDetail + distribution + flowLayoutOrganic + flow-bridge:
27/27 PASS
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(openova-flow-canvas): fold UX + lane layout + actions menu + cross-flow nav (Agent #9)
Wires the 6 founder-locked canvas views agreed 2026-05-11:
• Lane layout — `meta.layout: 'lane-vertical' | 'lane-horizontal'`
on a `contains`-parent renders the group as a rounded-rect
swim-lane; children pack inside (L→R horizontal, T→B vertical).
Lanes nest: region (vertical) → phase (horizontal) → HR bubbles.
Falls back to organic d3-force when no group declares a layout
hint, so single-region provisions look unchanged.
• Child-count badge `[N]` on every foldable parent — recursive
descendant count through `contains` edges, surfaced via
PositionedNode.descendantCount. Renders independent of fold
state per the founder-locked View 4 ASCII (region keeps `[43]`
even when expanded to phases only).
• Hover dim — onMouseEnter/Leave on a node dims non-neighbor
nodes + non-incident edges to 35% opacity. Selection / host /
neighbor rings keep full opacity per spec precedence.
• Right-click → adapter actions menu — new `actions` +
`onNodeAction` props on FlowCanvasProps. Renders the supplied
NodeAction[] (filtered by per-action `enabled` predicate) in a
NodeActionsMenu (click-outside + Esc dismissal, mirrors
ProfileMenu's canonical seam).
• `triggeredBy` cross-flow badge — when FlowInstance.triggeredBy
is non-empty, a top-left banner lists the parent flows with a
`[↗ open flow]` button → onNavigateFlow callback.
• Cross-flow edges — when a Relationship's `toFlowId` references a
flow not in the current canvas, the source node renders a
"→ flow" tag that calls onNavigateFlow.
FlowPage wires onNodeAction to POST /api/v1/flows/{id}/nodes/{nodeId}
/actions/{actionId} and onNavigateFlow to the router. Default action
list (Retry/Suspend/View logs) supplied by FlowPage; adapters can
override.
Canonical seam citations (per ARCHITECT-FIRST):
• core/src/layout.ts (Agent #1) — pure layout function. Extended
with LaneDescriptor[] + descendantCount, cycle-safe lane-depth
walks reusing the existing visited-set pattern. Lane geometry
stays in canvas (the layout is pure topology).
• widgets/auth/ProfileMenu.tsx — canonical click-outside + ESC
dismissal pattern. NodeActionsMenu mirrors this verbatim so we
stay consistent without a new radix/headless-ui dependency.
Tests: 25 core (was 20, +5 for lanes + descendantCount) + 22 canvas
(was 9, +13 for lane layout, badge math, hover dim, action menu,
triggeredBy banner, cross-flow tag). FlowPage tests still 8/8 green.
No vite/next builds (Rule 7). No kubectl writes (Rule 11). Lane
geometry has zero domain knowledge — the canvas never reads "phase"
or "region" as words; everything is `meta.layout` + `meta.isGroup`
+ `contains` edges driven by the adapter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(openflow-adapter-sse): subscribe to NAMED SSE events not just onmessage
Root cause of canvas "No nodes to render": the openova-flow-server
emits SSE frames with named event types per the contract:
event: snapshot
event: upsert-nodes
event: upsert-rels
...
EventSource's `onmessage` handler ONLY fires for the default
("message") event type. addEventListener with the explicit name is
required for named events. The hook only had `next.onmessage = onMessage`
so EVERY frame the server emitted was silently dropped; the local state
stayed at the initial empty value and FlowCanvas rendered the empty
fallback message.
Verified live: in-browser test showed onmessage_count=0,
addEventListener('snapshot') count=1 — exactly one snapshot frame
arrived but the hook ignored it.
Fix: register addEventListener for every event name in the contract
(snapshot, upsert-flow, upsert-nodes, upsert-rels, delete-nodes,
delete-rels, heartbeat). onmessage retained as defensive default.
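A sketch of the subscription, written against a small EventSource-like
interface so it can be exercised without a browser. The interface and
the `subscribeFlowEvents` name are assumptions; the event names are the
ones listed above.

```typescript
type Frame = { data: string };
type Listener = (ev: Frame) => void;

// The two EventSource members the hook relies on.
interface EventSourceLike {
  onmessage: Listener | null;
  addEventListener(type: string, listener: Listener): void;
}

const FLOW_EVENTS = [
  "snapshot",
  "upsert-flow",
  "upsert-nodes",
  "upsert-rels",
  "delete-nodes",
  "delete-rels",
  "heartbeat",
] as const;

// Named events only reach listeners registered under that exact name;
// onmessage stays as a fallback for unnamed frames.
function subscribeFlowEvents(source: EventSourceLike, onFrame: Listener): void {
  for (const name of FLOW_EVENTS) source.addEventListener(name, onFrame);
  source.onmessage = onFrame;
}
```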
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mothership catalyst-api serves /sovereign/api/v1/flows/{deploymentId}/* for
every Sovereign's user-facing job view, but the previous resolver only knew
about OPENOVA_FLOW_SERVER_URL (or the in-cluster Service DNS default). On
the mothership both paths end at a Service DNS name that does not resolve
in that cluster, so prov #34 hit:
HTTP/2 502 openova-flow-server unreachable:
Get "http://openova-flow-server.catalyst-system.svc.cluster.local:8080/v1/flows/.../snapshot":
dial tcp: lookup openova-flow-server.catalyst-system.svc.cluster.local: no such host
Resolution order is now:
1. OPENOVA_FLOW_SERVER_URL env override — wins (chroot catalyst-api).
2. h.deployments.Load(deploymentId) → Request.SovereignFQDN → build
`https://openova-flow.<sovereignFQDN>` (HTTPRoute pattern documented
in platform/openova-flow-server/chart/values.yaml comment + the
bootstrap-kit overlay clusters/_template/bootstrap-kit/56-bp-openova-
flow-server.yaml which sets `hostname: openova-flow.${SOVEREIGN_FQDN}`).
3. No deployment in store (and no env): return 404 instead of silently
dialing a Service URL the mothership can't reach.
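The resolution order, sketched in TypeScript for illustration only (the
resolver itself is Go). The function name, the `lookupFqdn` callback and
the result type are hypothetical; the lookup stands in for
h.deployments.Load(deploymentId) returning Request.SovereignFQDN.

```typescript
type Resolution =
  | { ok: true; baseUrl: string }
  | { ok: false; status: 404 };

// 1. env override wins; 2. derive from the deployment's FQDN;
// 3. otherwise 404 rather than dialing an unreachable Service URL.
function resolveFlowServerUrl(
  envUrl: string | undefined,
  lookupFqdn: (deploymentId: string) => string | undefined,
  deploymentId: string,
): Resolution {
  if (envUrl !== undefined && envUrl !== "") {
    return { ok: true, baseUrl: envUrl };
  }
  const fqdn = lookupFqdn(deploymentId);
  if (fqdn === undefined || fqdn === "") {
    return { ok: false, status: 404 };
  }
  return { ok: true, baseUrl: `https://openova-flow.${fqdn}` };
}

// Hypothetical store with one known deployment.
const store = new Map<string, string>([["dep-1", "omantel.biz"]]);
const derived = resolveFlowServerUrl(undefined, (id) => store.get(id), "dep-1");
```

An empty FQDN is treated the same as a missing deployment, mirroring the
EmptyFQDNReturns404 test listed below.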
Canonical patterns cited (ARCHITECT-FIRST rule):
- PDM-by-deploymentId lookup: deployments.go GetDeployment lines 1201-1216
(h.deployments.Load(id) → (*Deployment).Request.SovereignFQDN). The
chrootEnsureDeployment fallback (jobs.go lines 53-86) covers the
chroot case; on the mothership it returns nil and surfaces 404.
- Self-signed TLS skip-verify: deployment_handover_export.go line 62
(&tls.Config{InsecureSkipVerify: true} with nolint:gosec, gated by
explicit operator opt-in). Gated here on
OPENOVA_FLOW_TLS_SKIP_VERIFY=true so qa-loop Sovereigns minting
LE-staging "Fake LE Intermediate X1" certs are reachable, while
production stays strict.
SSE streaming logic is unchanged. Per docs/INVIOLABLE-PRINCIPLES.md #4
the only hostname literal added is the chart-documented prefix
`openova-flow.`; the FQDN suffix itself comes from the per-deployment
record at runtime.
Tests:
- TestFlowProxy_EnvOverride_TakesPrecedence — chroot path
- TestFlowProxy_DerivesURLFromDeploymentFQDN — mother path
- TestFlowProxy_DerivedURL_NotFoundReturns404
- TestFlowProxy_DerivedURL_EmptyFQDNReturns404
- TestFlowProxy_DerivedURL_PathAssembly
All 15 TestFlowProxy_* tests pass (go test ./internal/handler -run TestFlowProxy).
go vet ./... clean. go build ./cmd/api clean. The two pre-existing
TestHandleWhoami_* failures on origin/main are unrelated.
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build-ui on 841b6133 surfaced TS2304 "Cannot find name 'global'" in
several layout tests after the workspace-root npm ci fix exposed
errors that the prior react/d3-* failures had masked. The tests use
`global.fetch = vi.fn(...)` which requires @types/node ambient types.
tsconfig.app.json restricted `types` to ["vite/client"], so node
types weren't auto-loaded. Add "node" so the existing @types/node
devDep (^24.12.0) is in scope.
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #1399 (Agent #5) added npm workspaces at the repo root, but the
Containerfile still ran `npm ci` from /repo/products/catalyst/bootstrap/ui/
which bypasses workspace activation. Cross-workspace bare-spec imports
(react / d3-force / d3-drag / d3-selection) from the canvas package
source couldn't resolve, breaking the Docker build with ~120 TS2307
errors on commit 2c6595a3 (2026-05-11).
Fix: COPY the workspace-root package.json + package-lock.json + each
workspace's package.json BEFORE installing. Run `npm ci --workspaces
--include-workspace-root` from /repo. Then WORKDIR into the leaf for
the Vite build. This is the canonical npm workspaces flow.
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the OpenovaFlow Foundation end-to-end so the catalyst-ui FlowPage
consumes the new openova-flow-server's merged multi-region SSE stream
(`GET /api/v1/flows/{deploymentId}/stream`) and renders the per-region
adapter-flux emissions directly via @openova/flow-canvas. Closes the
revert from PR #1394 and unblocks the prov #34 multi-region 2-bubble
demo (fsn1 + hel1 each install bp-gateway-api → two bubbles).
# What ships
## A. npm workspaces at repo root
• New `package.json` declares `openova-monorepo` private root with
three workspaces: products/openova-flow/{core,canvas} +
products/catalyst/bootstrap/ui.
• Root `package-lock.json` resolves @openova/flow-* as workspace
symlinks into the hoisted node_modules tree.
• react / react-dom / d3-* are now hoisted into the monorepo's root
node_modules, so flow-canvas's bare `import 'react'` resolves via
standard upward-walking node_modules — no per-package sibling
node_modules required (the root cause of PR #1389's build break).
## B. Catalyst-ui consumes @openova/flow-* via file: deps
• catalyst-ui's `package.json` adds `@openova/flow-core` and
`@openova/flow-canvas` as `file:../../../openova-flow/{core,canvas}`
deps so `npm ci` from within catalyst-ui (today's CI path) keeps
working without needing root-level `npm ci -ws`.
• Vite `resolve.alias` + tsconfig `paths` bind `@openova/flow-core`
and `@openova/flow-canvas` to the source-only `./src/index.ts`
entry points. `dedupe: ['react', 'react-dom']` guards against
double-instancing.
• `tsconfig.app.json` `include` adds the two flow-package src trees
so tsc covers them with catalyst-ui's strict settings (instead of
each package's standalone `tsc -p tsconfig.json`, which lacks the
React/d3 node_modules siblings).
## C. New SSE consumer + bridge
• `src/lib/openflow-adapter-sse.ts` — `useFlowStream` React hook +
pure `reduceFlowMessage` reducer. Consumes the contract verbatim
(snapshot / upsert-flow / upsert-nodes / upsert-rels / delete-nodes
/ delete-rels). Owns the EventSource lifecycle, GET /snapshot
pre-paint, capped exponential reconnect.
• `src/lib/flow-bridge.ts` — catalyst-specific glue:
`CATALYST_STATUS_PALETTE` (mirrors `--bubble-*` CSS tokens onto
`StatusTone`), `flowStateToArrays` (Map→Array materialiser),
`regionDescriptorsFromFlow` (derives FlowCanvas regions from live
region tags + optional wizard-store augmentation), and
`rollupFlowStatus` (provisioning-status rollup on the new
contract).
• NOT a Job-shape bridge — the legacy Job adapter from PR #1389
is gone. catalyst-ui never goes through Catalyst's legacy Job model
again; the SSE stream IS the source of truth.
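The reducer semantics described above (ID-keyed upserts plus the delete-nodes rel-prune cascade) can be sketched as follows. This is a Go restatement for illustration only: the real reducer is the TypeScript `reduceFlowMessage`, every type and field name below is an assumption (including a relationship `ID` key), `upsert-flow` is omitted, and the sketch mutates state in place where the source describes a pure reducer.

```go
package main

// Hypothetical Go mirrors of the contract shapes; field names are assumptions.
type FlowNode struct{ ID, Region, Status string }

type Relationship struct{ ID, FromID, ToID, Type string }

type FlowMessage struct {
	Kind    string // "snapshot", "upsert-nodes", "upsert-rels", "delete-nodes", "delete-rels"
	Nodes   []FlowNode
	Rels    []Relationship
	NodeIDs []string
	RelIDs  []string
}

type FlowState struct {
	Nodes map[string]FlowNode
	Rels  map[string]Relationship
}

func NewFlowState() *FlowState {
	return &FlowState{Nodes: map[string]FlowNode{}, Rels: map[string]Relationship{}}
}

// Apply folds one message into the state. Upserts are keyed by ID, so a
// replayed event after a reconnect leaves the state unchanged.
func (s *FlowState) Apply(m FlowMessage) {
	switch m.Kind {
	case "snapshot":
		// A snapshot replaces everything seen so far.
		s.Nodes = make(map[string]FlowNode, len(m.Nodes))
		s.Rels = make(map[string]Relationship, len(m.Rels))
		s.upsert(m)
	case "upsert-nodes", "upsert-rels":
		s.upsert(m)
	case "delete-nodes":
		gone := make(map[string]bool, len(m.NodeIDs))
		for _, id := range m.NodeIDs {
			delete(s.Nodes, id)
			gone[id] = true
		}
		// Cascade: drop relationships that touch a deleted node.
		for id, r := range s.Rels {
			if gone[r.FromID] || gone[r.ToID] {
				delete(s.Rels, id)
			}
		}
	case "delete-rels":
		for _, id := range m.RelIDs {
			delete(s.Rels, id)
		}
	}
}

func (s *FlowState) upsert(m FlowMessage) {
	for _, n := range m.Nodes {
		s.Nodes[n.ID] = n
	}
	for _, r := range m.Rels {
		s.Rels[r.ID] = r
	}
}
```

Keying by ID is what makes the multi-region merge work in this sketch: two regions that each install the same chart stay distinct as long as their node IDs differ.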
## D. FlowPage.tsx rewired
• Drives `FlowCanvas` from `@openova/flow-canvas` directly off the
new hook.
• Multi-region support comes for free: per-region adapter-flux tags
every emitted FlowNode with `region: '<location-code>'`; the
canvas's swimlane layout buckets by `region`. Single-region
provisions render identically to before via a synthetic
fallback descriptor.
• Embedded mode preserved for JobDetail.
## E. Containerfile preserves CI build
• COPY products/openova-flow/{core,canvas}/{package.json,src/}
BEFORE `npm ci` so `file:` deps validate. Subsequent
`COPY products/` layers the rest (CONTRACT.md etc.) in.
# Tests
• 23 new tests, 0 regressions on adjacent areas:
- `openflow-adapter-sse.test.ts` (6) — reducer covers all 6
FlowMessage variants including delete-nodes' rel-prune cascade
AND a multi-region merge case (fsn1 + hel1 both install
bp-gateway-api).
- `flow-bridge.test.ts` (10) — palette completeness, Map→Array
ordering, region descriptor derivation/fallback, status rollup
including group-exclusion and terminal-failure detection.
- `FlowPage.test.tsx` (7) — empty-state mount, StatusStrip, no
legacy mode toggle, embedded variant.
• flow-core: 20/20 passing; flow-canvas: 9/9 passing.
• Vitest full suite: 1130 pass / 87 fail (87 fails are pre-existing
on main and unrelated — PinInput6, ProvisionPage, etc.). Baseline
on main is 1052 pass / 88 fail / 27 failed files; this PR brings
78 new passing tests and lowers failing files from 27 → 18.
# Constraints honoured (Rule 7)
• NO `vite build` / `next build` / `npm run build` / `npx playwright
test` / `npx playwright install`. Only `tsc --noEmit` + `vitest
run` + `npm install --package-lock-only`.
• NO `kubectl apply` / chart manifests touched (Rule 11).
• NO hardcoded URLs / regions / k3s flags. Endpoint composed from
`API_BASE`; regions derived from live FlowNode tags; deploymentId
from `useParams` (Rule 18).
• Two-repo discipline: openova-io/openova only (Rule 21).
• Conventional commit + Claude co-author footer (Rule 20).
• isolation:"worktree" — work landed in a dedicated worktree.
# Canonical-seam citations (ARCHITECT-FIRST)
1. PR #1389's `flow-bridge.ts` — reference for the shape of a
catalyst-ui→@openova/flow contract layer. NOT conflated: that
bridge translated legacy Catalyst Jobs into FlowNodes; this one
consumes the new SSE FlowMessage stream directly with no Job
intermediary.
2. `useDeploymentEvents.ts` (line 526+, `openStream` + `onerror`
reconnect + capped retry) — canonical SSE consumer pattern in
this codebase. `useFlowStream` mirrors it (capped exponential
backoff, idempotent reducer over replayed buffered events).
# Definition of Done — post-merge verification plan
1. CI green (catalyst-build builds the new Containerfile path).
2. `curl -k -b /tmp/cz-cookie-prov27.txt
'https://console.openova.io/sovereign/api/v1/flows/5a175e0a88c99cec/snapshot' | jq`
→ nodes[] contains BOTH `fsn1/bp-gateway-api` AND `hel1/bp-gateway-api`.
3. Browser test: navigate to
`https://console.openova.io/sovereign/provision/5a175e0a88c99cec/jobs/install-gateway-api`
→ expect TWO bubbles (one per region).
4. If snapshot is empty, inspect emitter DaemonSets:
`kubectl --context=omantel get pods -n openova-flow`.
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final integration piece for OpenovaFlow infrastructure path —
catalyst-api proxy + cloud-init substitution for SOVEREIGN_DEPLOYMENT_ID
+ SOVEREIGN_REGION_KEY, so bp-openova-flow-emitter (slot 57) emits
distinct region tags on every FlowNode and the snapshot returns 2× per
HR on a multi-region Sovereign.
Builds on PR #1389 (TS core + canvas packages on disk), PR #1390 (Go
server + flux adapter + bootstrap-kit slots 56/57), PR #1394 (temporary
catalyst-ui revert until npm workspaces land), PR #1395 (chart no-op).
## Scope vs original Agent #3 brief
The brief planned a 4-section PR (proxy + cloud-init + FlowPage rewire +
runbook). Section 3 (catalyst-ui rewire of @openova/flow-*) is deferred:
PR #1394 reverted Agent #1's UI wiring because the Docker UI build has
no node_modules for the cross-workspace canvas source. Founder note on
#1394: "Agent #3 (or a follow-up) will re-wire them properly once npm
workspaces are configured at repo root."
This PR ships the infrastructure half (proxy + cloud-init + runbook).
The canvas-side rewire is a separate follow-up PR that needs npm
workspaces, not surgical edits to FlowPage.
## What ships
### 1. catalyst-api proxy /api/v1/flows/{deploymentId}/{snapshot,stream,events}
products/catalyst/bootstrap/api/internal/handler/openova_flow_proxy.go:
- GET /snapshot — JSON pass-through, headers + status forwarded
- GET /stream — unbuffered SSE pass-through using http.Flusher (NOT
httputil.ReverseProxy; that buffers and breaks text/event-stream)
- POST /events — body forwarded byte-for-byte
- Upstream URL from env OPENOVA_FLOW_SERVER_URL (default Sovereign
in-cluster Service DNS)
Routes registered in cmd/api/main.go inside the auth-gated chi.Group.
11 table-driven tests cover snapshot/events/stream pass-through, upstream
404/400/unreachable propagation, empty-deploymentId guard, SSE frames
arrive AS EMITTED, and env-default fallback.
### 2. Cloud-init threads SOVEREIGN_DEPLOYMENT_ID + SOVEREIGN_REGION_KEY
- infra/hetzner/cloudinit-control-plane.tftpl — two new
postBuild.substitute keys alongside SOVEREIGN_FQDN/SOVEREIGN_LB_IP
- infra/hetzner/main.tf — primary CP renders var.region as region key;
secondary CP renders each.key (e.g. "hel1-1") from for_each over
local.secondary_regions
- infra/hetzner/variables.tf — new sovereign_deployment_id var (string,
default "" for tofu mocks)
- provisioner.go writeTfvars — writes vars["sovereign_deployment_id"]
= req.DeploymentID
- bootstrap-kit slot 57 — swap placeholder ${SOVEREIGN_FQDN} / literal
"primary" for the new ${SOVEREIGN_DEPLOYMENT_ID} / ${SOVEREIGN_REGION_KEY}
envsubst keys
### 3. Deployment record flag
handler/deployments.go State() — emits `openovaFlowEnabled: true` on
every deployment. The catalyst-ui rewire (follow-up PR) will read this
to enable the openova-flow-server adapter; legacy provisions without
the flag will keep the bridge once the rewire lands.
### 4. Verification runbook
docs/runbooks/openova-flow-multi-region-verify.md — prov #34 POST body
(multi-region cpx42 fsn1+hel1, qaTestEnabled=true,
sovereignFQDN=omantel.biz), step-by-step kubectl/curl gates, visual
canvas checks (gated on the follow-up UI rewire), and a failure-class
triage table.
## Canonical-seam citations
1. SSE pattern —
products/catalyst/bootstrap/api/internal/handler/deployments.go:1244-1287
(StreamLogs): identical Content-Type +
Cache-Control + X-Accel-Buffering header set; identical
http.Flusher.Flush() after each write; identical r.Context().Done()
cancel path.
2. postBuild.substitute pattern — infra/hetzner/cloudinit-control-plane.tftpl:884-893
(SOVEREIGN_FQDN + SOVEREIGN_LB_IP): same indentation, same KEY: ${var}
form, dual emission at primary + secondary CP for_each in main.tf.
## Verification
```
$ go build ./...
(clean)
$ go vet ./...
(clean)
$ go test ./internal/handler/ -run TestFlowProxy -count=1 -race
ok github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/handler 1.410s
$ go test ./internal/provisioner/... -count=1
ok github.com/openova-io/openova/products/catalyst/bootstrap/api/internal/provisioner 0.025s
```
3 pre-existing test failures (TestHandleWhoami_NoRBACOmitsFields,
TestHandleWhoami_PinSessionRBACClaims,
TestUnstructuredToUserAccess_NilApplicationsBecomesEmpty) reproduce on
main HEAD without this PR — unrelated baseline state.
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #1389 wired the new @openova/flow-core + @openova/flow-canvas
packages into catalyst-ui via Vite alias + tsconfig paths. Build-image
tsc then tried to typecheck the canvas source
(`products/openova-flow/canvas/src/`), which has no sibling node_modules, so
bare imports for react/d3-* fell off the resolution chain and the Docker UI
build broke on 16ec3399 with ~120 TS2307 errors.
PR #1392 attempted to add explicit paths for react/d3-* but pointed
at runtime .js dirs (no .d.ts), which broke ALL of catalyst-ui's
type resolution.
Cleanest emergency revert: undo the FlowPage refactor, restore vite
alias + tsconfig paths to pre-#1389 state, delete flow-bridge.{ts,test.ts}.
The new openova-flow/{core,canvas} source packages remain on disk —
Agent #3 (or a follow-up) will re-wire them properly once npm
workspaces are configured at repo root. Until then catalyst-ui uses
the legacy flowLayoutOrganic + FlowCanvasOrganic stack and builds
cleanly.
The multi-region rendering goal is not blocked by this revert: Agent #2's
openova-flow-server + adapter-flux still deploy via
bp-openova-flow-{server,emitter} HRs; the canvas-side rewiring is the
follow-up.
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Build-ui failed on 16ec3399 with TS2307 'Cannot find module react/d3-*'
when typechecking ../../../openova-flow/canvas/src/FlowCanvas.tsx.
Vite's bundler-mode module resolution starts from the imported file's
location. Canvas source lives at products/openova-flow/canvas/src/
with no sibling node_modules — bare-spec imports for react / react-dom /
d3-force / d3-drag / d3-selection fall off the resolution chain.
Fix: extend catalyst-ui tsconfig.app.json with explicit `paths` entries
mapping those bare specs to catalyst-ui's installed node_modules. Mirrors
the vite.config.ts alias additions Agent #1 introduced; both resolvers
now agree on the path. Also expands `include` to typecheck the canvas +
core sources from catalyst-ui's compilation root, so future regressions
land at PR-CI time, not build-image time.
Workspaces will eventually supersede this — Agent #2+#3 plan to land
real npm workspaces. Until then, paths is the canonical seam.
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(openova-flow): extract flow-core + flow-canvas packages (drop parentId, adopt PMI temporal types)
OpenovaFlow Foundation — Agent #1 of 3. Splits flow visualisation out
of Catalyst into two standalone packages:
• @openova/flow-core: plugin-shaped contract (FlowInstance, FlowNode,
Relationship, FlowMessage, FlowAdapter) + pure layout engine.
• @openova/flow-canvas: React SVG canvas, zero OpenOva imports,
theme-decoupled via CSS variables.
Founder-locked design adopted:
• FlowInstance is first-class (definitionId / parentFlowId /
triggeredBy) — DAG vs DAG-run distinction works for Argo,
Temporal, Flux, custom.
• Node hierarchy moves from FlowNode.parentId to
Relationship{type:'contains'}. The legacy parentId field is gone
from the new contract (the bridge still adapts legacy Job.parentId
so catalyst-ui keeps working against today's catalyst-api).
• Edge types follow the PMI temporal taxonomy: finish-to-start (FS),
start-to-start (SS), finish-to-finish (FF), start-to-finish (SF)
+ 'triggers' (event-driven) + 'contains' (hierarchy).
Failure-conditioned edges render as overlays and are NOT counted
toward depth.
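With `parentId` gone, a consumer recovers the hierarchy from 'contains' relationships. The sketch below assumes the contract's direction that toId (parent) contains fromId (child); the Go type and the names `parentIndex` and `nestingLevel` are hypothetical and only illustrate the lookup.

```go
package main

// Relationship mirrors the fields named in the contract; the Go shape is an
// assumption made for this sketch.
type Relationship struct {
	FromID string // the child, when Type is "contains"
	ToID   string // the parent, when Type is "contains"
	Type   string
}

// parentIndex derives child -> parent from "contains" relationships. All
// other relationship types (FS, SS, FF, SF, triggers) are ignored.
func parentIndex(rels []Relationship) map[string]string {
	parents := make(map[string]string)
	for _, r := range rels {
		if r.Type == "contains" {
			parents[r.FromID] = r.ToID
		}
	}
	return parents
}

// nestingLevel counts how many containers enclose id. The hop limit guards
// against a cyclic input, since an acyclic chain cannot be longer than the
// number of entries in the index.
func nestingLevel(id string, parents map[string]string) int {
	level := 0
	for hops := 0; hops < len(parents); hops++ {
		p, ok := parents[id]
		if !ok {
			break
		}
		level++
		id = p
	}
	return level
}
```

Getting the direction wrong inverts the whole tree, so a producer-side test that asserts a known leaf has a non-zero nesting level is a cheap guard.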
Layout engine port:
• Verbatim cycle-safety + parent-elision + MAX_VISIBLE_DEPTH cap
invariants from products/catalyst/.../flowLayoutOrganic.ts.
• Adds component-detection (weak connected components on the
blocking-DAG graph) so future UIs can paint gutters.
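Weak connected components can be computed with a union-find pass that ignores edge direction. The sketch below is one common way to do it and is not the layout engine's implementation; `Edge` and `weakComponents` are hypothetical names, and node IDs are assumed to be unique.

```go
package main

import "sort"

// Edge is a hypothetical directed blocking edge between two node IDs.
type Edge struct{ From, To string }

// weakComponents groups node IDs into weakly connected components: edge
// direction is ignored, so A->B and C->B place A, B and C together.
// The result is sorted so that output is deterministic.
func weakComponents(nodes []string, edges []Edge) [][]string {
	parent := make(map[string]string, len(nodes))
	for _, n := range nodes {
		parent[n] = n
	}
	find := func(x string) string {
		for parent[x] != x {
			parent[x] = parent[parent[x]] // path halving
			x = parent[x]
		}
		return x
	}
	for _, e := range edges {
		_, okFrom := parent[e.From]
		_, okTo := parent[e.To]
		if !okFrom || !okTo {
			continue // skip edges that reference unknown nodes
		}
		parent[find(e.From)] = find(e.To)
	}
	groups := make(map[string][]string)
	for _, n := range nodes {
		root := find(n)
		groups[root] = append(groups[root], n)
	}
	out := make([][]string, 0, len(groups))
	for _, g := range groups {
		sort.Strings(g)
		out = append(out, g)
	}
	sort.Slice(out, func(i, j int) bool { return out[i][0] < out[j][0] })
	return out
}
```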
Catalyst-ui refactor:
• New products/catalyst/bootstrap/ui/src/lib/flow-bridge.ts adapts
legacy Job[] → FlowNode + Relationship[]. Single-responsibility
seam — the only place that still knows about the legacy shape.
• FlowPage now drives @openova/flow-canvas via the bridge.
• Legacy lib/flowLayoutOrganic.ts + sovereign/FlowCanvasOrganic.tsx
remain in place for non-FlowPage consumers (JobDetail breadcrumbs,
JobsTable rollups) until Agent #3 retires them with the real
catalyst-api FlowAdapter.
Tests:
• core: 20 tests (cycle-safety, parent-elision, RelType tagging,
component detection, defaultFoldedAtDepth) — all passing.
• canvas: 9 tests (render shape, RelType edge attrs, host/selection
rings, single-click debounce, fold toggle, navigate) — all passing.
• catalyst-ui: bridge 11 tests + FlowPage 9 tests (testid updated
flow-job-* → flow-node-* to match new contract) — all passing.
• tsc --noEmit: clean on all three workspaces.
Constraints honoured:
• Two-repo discipline: lands entirely in openova-io/openova (public).
• No npm run build / playwright install / playwright test.
• No kubectl apply / chart manifests touched.
• No hardcoded URLs, regions, k3s flags, chart versions.
• vitest --pool=threads --maxWorkers=2 --no-isolate everywhere.
Canonical-seam citations (ARCHITECT-FIRST):
• Monorepo packages alias via tsconfig + vite resolve (no top-level
`workspaces:` field exists in this monorepo today). Pattern
mirrors core/console + products/axon path-mapping style.
• CSS-variable theming follows the data-theme="light/dark" pattern
already in catalyst-ui's globals.css (line 87+).
Agents #2/#3 (out of scope for this PR):
• Agent #2: catalyst-api server that emits FlowMessage events on
a SSE endpoint per CONTRACT.md.
• Agent #3: replace lib/flow-bridge.ts with a real FlowAdapter
against catalyst-api, then delete legacy flowLayoutOrganic +
FlowCanvasOrganic.
Prov #34 readiness: the bridge forwards Job.region (when catalyst-api
begins emitting it) opaquely; perNodeHints feed region descriptors
to the new layout. Multi-region rendering is shape-ready end-to-end —
the catalyst-api just needs to emit region per job.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(openova-flow): resolve react/d3-* from ui node_modules — restore /wizard rendering
The flow-core/flow-canvas alias targets in products/openova-flow/{core,canvas}/src/
have no sibling node_modules tree (workspaces wiring lands with Agent #2), so
Vite/Rolldown could not resolve their peer-dependency imports (react, react-dom,
d3-force, d3-drag, d3-selection) from those source files. The production build
failed with "Rolldown failed to resolve import 'react' from .../FlowLogFeed.tsx",
no dist/ was emitted, and the CI Playwright smoke lane therefore got 404 on
/wizard (which itself does NOT use FlowPage, but the whole bundle was missing).
Fix: alias each peer dep bare-spec to this package's local node_modules, and
add resolve.dedupe for react/react-dom. Also reorders @openova/* entries above
the '@' prefix entry — both are correct in @rollup/plugin-alias today since
matching is whole-name not prefix, but reordering follows the documented
"longer key first" convention defensively.
Verified:
- `npx vite build --mode production` succeeds (3.5s, dist/index.html + asset
chunks emitted, wizard route in bundle).
- `npx vitest run` flow-related tests: src/lib/flow-bridge.test.ts +
src/pages/sovereign/FlowPage.test.tsx → 2 files / 21 tests / all pass
(baseline pre-fix had FlowPage.test.tsx failing).
- Other vitest failures present in baseline are pre-existing and flaky
across runs; not introduced by this fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(openova-flow): clarify alias-matching comment — the bare-spec react/d3 aliases are the real /wizard fix
The previous fix commit (3b19501) shipped two changes bundled together:
1. Reorder `@openova/flow-core` + `@openova/flow-canvas` above the
`@` alias (claimed: "@ would otherwise shadow @openova/...").
2. Add bare-spec aliases for react / react-dom / d3-force / d3-drag /
d3-selection pointing at this package's local node_modules.
Reading Vite's alias matcher (node_modules/vite/dist/node/chunks/node.js
line ~27349, function `matches`) shows that the `@` alias is matched
with EXACT equality OR `startsWith(@ + '/')` — so `@/foo` matches but
`@openova/flow-core` does NOT. The reorder was harmless but the comment
explaining it was misleading.
The bare-spec aliases (#2) ARE the actual fix. The aliased
`@openova/flow-{core,canvas}` source files live OUTSIDE this package
and have no sibling node_modules tree (workspace wiring lands with
Agent #2). Vite resolution from inside those source files would walk
up the filesystem looking for `node_modules/d3-drag`, find nothing,
and throw "Failed to resolve import 'd3-drag'" — which surfaces as a
white-screen wizard at `/wizard`. The aliases redirect bare imports
to the absolute paths under catalyst-ui's own node_modules.
Verification on this commit:
• `npx tsc --noEmit` from products/catalyst/bootstrap/ui — clean.
• `npx vitest run --pool=threads --maxWorkers=2 --no-isolate
src/pages/sovereign/FlowPage.test.tsx src/lib/flow-bridge.test.ts`
— 2 files / 21 tests / all pass.
• Reverting the prior fix and re-running the same vitest produces:
"Failed to resolve import 'd3-drag' from
../../../openova-flow/canvas/src/FlowCanvas.tsx" — proves the
aliases are load-bearing.
• `vite build` / `vite dev` / playwright NOT run locally (Rule 7);
CI on this push exercises the dev-server path the Playwright
smoke uses.
No behavior change vs 3b19501 — this commit only rewrites the inline
comment block so the next maintainer sees the real reason the aliases
exist.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(catalyst-platform): gitea-token-mint hook 60->180 iters for autoscaler cold-start (Fix#184)
Raise the catalyst-gitea-token-mint pre-install hook's Gitea-API wait
loop from a hardcoded 60x5s (300s = 5m) budget to a values-driven knob
(giteaWait.iterations x giteaWait.intervalSeconds, default 168x5 =
840s = 14m). Pairs with HR install.timeout=15m to leave 60s slack for
the rest of the umbrella install action.
Root-cause trace (4-layer) on prov #33 (multi-region fsn1+hel1, cpx42
workerCount=0+autoscaler):
bp-catalyst-platform HR (15m HR-timeout)
-> Helm pre-install hook Job: catalyst-gitea-token-mint
-> pod runs alpine/k8s curl loop:
while ! curl gitea-http.gitea.svc.cluster.local; do
sleep 5; i=$((i+1))
done
-> Hook gave up at iter 60 (= 5 min wall-time)
-> Meanwhile gitea Pod is Pending: autoscaler-hcloud still
scaling up workers in fsn1/hel1 (Fix#157 sizing default
workerCount=0 means cold start).
Budget arithmetic (post-Fix #184 default):
hook_wait_time = iterations x intervalSeconds = 168 x 5 = 840s (14 min)
HR install.timeout = 900s (15 min)
slack within HR budget = 60s ( 1 min)
The hook MUST complete strictly before the HR remediates; the 60s slack
covers the regular release resources rolling out plus any post-install
hooks that run after the pre-install Job.
Canonical-seam citations:
- The hook lives at
products/catalyst/chart/templates/catalyst-gitea-token-secret.yaml
(line ~303 pre-Fix), in the catalyst-gitea-token-mint Job's `args` block.
- Prior pattern: bp-keycloak chart 1.4.5 (Fix#146) introduced
keycloakConfigCli.availabilityCheck.timeout as a values knob -
same shape (chart-internal hook timing knob, distinct from the
outer HR timeout). See platform/keycloak/chart/values.yaml:413.
- The HR's install.timeout=15m lives at
clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:484; the
chart-internal wait budget MUST stay strictly less than this.
Recurring class: same family as Fix#127 (bp-cutover HR 15m),
Fix#131 (bp-gitea HR 15m), Fix#150 (bp-harbor HR 15m), Fix#154
(HR-timeout audit). Those bumped the HelmRelease install.timeout.
This bumps the chart-INTERNAL wait loop budget inside the
pre-install hook Job, which is a different (lower) seam.
Per INVIOLABLE-PRINCIPLES #4 (never hardcode) the budget is fully
runtime-configurable via .Values.giteaWait. Operators may shorten on
known-warm-cluster overlays or extend on air-gapped Sovereigns.
Changes:
- products/catalyst/chart/templates/catalyst-gitea-token-secret.yaml:
replace hardcoded `seq 1 60` + `sleep 5` with templated
ITERATIONS/INTERVAL vars driven by .Values.giteaWait.{iterations,
intervalSeconds}.
- products/catalyst/chart/values.yaml: add giteaWait block with
defaults (iterations: 168, intervalSeconds: 5 = 14m budget).
- products/catalyst/chart/Chart.yaml: bump 1.4.139 -> 1.4.140 with
changelog entry capturing the 4-layer trace + budget arithmetic.
- clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml: bump
HelmRelease pin 1.4.138 -> 1.4.140 (skip 1.4.139 which is a no-op
packaging bump on main).
Verification:
- helm template renders cleanly (2799 lines, exit 0).
- Force-render with lookup gate bypassed shows ITERATIONS=168 +
INTERVAL=5 substituted into the rendered Job args.
- --set giteaWait.iterations=240 --set giteaWait.intervalSeconds=10
override confirmed to emit ITERATIONS=240 + INTERVAL=10.
Test plan (post-merge, on prov #34):
- kubectl logs -n catalyst-system catalyst-gitea-token-mint-* should
emit `waiting for gitea api ($i/168)` instead of `($i/60)`.
- bp-catalyst-platform HR reaches Ready=True within the 15m HR
budget (previously installFailures: 2 on prov #33).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(bootstrap-deps): reconcile pre-existing dep-graph audit drift
Two pre-existing drift items surfaced when dep-graph-audit ran on the
Fix#184 PR — both are in `main` already, not introduced here, but the
gate blocks any PR until the expected DAG matches the actual HRs.
1. `bp-catalyst-platform` (slot 13) — actual HR file declares
`bp-crossplane-claims` as an additional dependsOn edge (added in
chart-roll-rca iter-15, 2026-05-10, for the XRD-ordering race that
caused the omantel.biz 90-min wedge). Update expected-deps to
include it.
2. `bp-hcloud-ccm` (slot 55) — present on disk but absent from
expected-deps. Cloud-provider seam, no upstream dependencies.
Added with empty depends_on.
---------
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Fix#180 PR #1383 merged with a `sed -i` error that produced
`import type from 'react'` (an empty import binding), which is a syntax
error and broke the main build. This PR removes the malformed line entirely.
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix#178 PR #1382 introduced a new test file but left an unused `ReactNode`
import. The Containerfile's `tsc -b` (strict mode) fails with TS6133, so the
CI Build & Deploy Catalyst workflow was blocked and the Fix#178 features
(sortable cols + 2-mode delete) never reached production.
Caught live: `npx tsc --noEmit` (the fix author's local check) does NOT
enforce TS6133, but production `tsc -b` does.
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds operator-friendly admin controls to /sovereign/deployments:
* Sortable column headers — click any of FQDN / Status / Started /
Finished / Region to sort the table; second click toggles ASC↔DESC.
Default is Started DESC (newest first). Sort is client-side; the
list is small enough that round-tripping via ?sort= would only add
latency without operator benefit.
* Per-row Delete button → opens DeleteDeploymentModal with TWO modes
via a radio group:
1. "Delete record only (mother)" — DELETE /api/v1/deployments/{id}.
Removes the catalyst-api row (in-memory map + on-disk store +
kubeconfig file) but LEAVES THE HETZNER SOVEREIGN RUNNING.
2. "Delete record AND wipe Sovereign (kill the kid)" — POSTs to
the existing /wipe endpoint (tofu destroy + Hetzner orphan
purge + PDM release + record cleanup in one pass).
Both modes require typing the deployment FQDN to confirm (same
safety pattern WipeDeploymentModal uses, per Fix#46 / #914).
Deep-delete additionally requires the Hetzner token, which flows
straight through to the wipe handler (S3 + Hetzner creds never
logged, per principle #10).
Backend:
* New DeleteDeployment handler (record-only). Refuses adopted (422)
+ in-flight (409) + unknown (404, matching the issue #689
anti-enumeration posture). Idempotent: a second DELETE on a
vanished row returns 404 cleanly.
* Route wired in cmd/api/main.go alongside the existing /wipe and
/release-subdomain endpoints, inside the session-required group.
* 5 unit tests covering happy path / adopted / in-flight / unknown /
terminal-wiped paths.
Frontend:
* DeploymentsList now mounts the new modal and invalidates the
React Query cache (`catalyst, deployments, list`) on success so
the table refreshes without a hard reload.
* 8 unit tests covering default sort order, header-click sort
switching, ASC↔DESC toggle, status sort, delete button rendering
(enabled for terminal rows, disabled for in-flight), modal open
with both radios, conditional Hetzner-token field per mode.
Files:
* products/catalyst/bootstrap/api/internal/handler/deployments_delete.go
* products/catalyst/bootstrap/api/internal/handler/deployments_delete_test.go
* products/catalyst/bootstrap/api/cmd/api/main.go (route)
* products/catalyst/bootstrap/ui/src/components/CrudModals/DeleteDeploymentModal.tsx
* products/catalyst/bootstrap/ui/src/components/CrudModals/index.ts (export)
* products/catalyst/bootstrap/ui/src/pages/sovereign/DeploymentsList.tsx
* products/catalyst/bootstrap/ui/src/pages/sovereign/DeploymentsList.test.tsx
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>