openova

Author	SHA1	Message	Date
e3mrah	88c34c24ba	fix(rbac): cutover-driver permissions for catalyst.openova.io/environmentpolicies (#1210 ) Caught live on omantel after Fix #19 (#1208) restored /environments/{env}/policy: environmentpolicies.catalyst.openova.io is forbidden: User "system:serviceaccount:catalyst-system:catalyst-api-cutover-driver" cannot list resource environmentpolicies in API group catalyst.openova.io Slice X (#1147) shipped the policy-mode toggle handler. Slice B5 (#1108) shipped the EnvironmentPolicy CRD. Neither slice updated the cutover-driver ClusterRole. Fix #19's handler restoration surfaced the gap end-to-end. Per feedback_chroot_in_cluster_fallback.md: every new GVR added to catalyst-api dynamic-client paths MUST get matching ClusterRole rules in the same PR. Same pattern as PRs #1173/#1179. Live: applied on omantel via kubectl patch + verified TC-101 PUT /environments/test-env/policy returns HTTP 200 with full contract body. Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 18:20:48 +04:00
github-actions[bot]	0de2a8f14e	deploy: update catalyst images to `3679a0d`	2026-05-09 14:08:14 +00:00
e3mrah	3679a0d7e0	fix(chart): exclude crds/tests/ from packaged bp-catalyst-platform (qa-loop iter-3 Fix #18 follow-up) (#1209 ) Helm's `crds/` directory installs every YAML inside as a CRD at the pre-render install hook — Helm does NOT filter by `kind:` and does NOT honour resource Namespaces during this phase. The sample fixtures added by PR #1105 (Application CRs in `namespace: acme`, intentionally invalid for chart-author dry-run testing) were therefore being submitted to the apiserver as real CRDs on every Sovereign upgrade. Result: every chart ≥ 1.4.85 install/upgrade failed with: failed to create CustomResourceDefinition bad-app: namespaces "acme" not found Caught live on omantel 2026-05-09 attempting 1.4.84 -> 1.4.95. Fix: add `crds/tests/` to .helmignore so the test fixtures are excluded from the packaged chart entirely. They remain in the source tree for chart-author validation (`kubectl apply --dry-run=server -f ...`); they just don't ship in the OCI artifact. Bump bp-catalyst-platform 1.4.95 -> 1.4.96 + bootstrap-kit pin. Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 18:06:10 +04:00
github-actions[bot]	6637a664e4	deploy: update catalyst images to `e2aa7fd`	2026-05-09 14:05:17 +00:00
e3mrah	e2aa7fd0f9	fix(api): /rbac/assign POST 500 + policy_mode body shape (qa-loop iter-3) (#1208 ) Root cause #1 (TC-091, TC-094, TC-104, TC-216, TC-239 cluster): HandleRBACAssign called client.Resource(UserAccessGVR()).Namespace("").Create(...) on a Namespaced CRD. The apiserver returns the confusing `the server could not find the requested resource` 404 (surfaced as HTTP 500 by the handler) when an empty namespace is passed to a namespaced-CRD's Create REST endpoint, because the dispatcher routes the call to the cluster-scoped path which doesn't exist for that kind. Fix: introduce rbacAssignNamespace = "catalyst-system" and route Create/Update/List through it. Mirrors the sovereignSMTPSeedNamespace pattern already used by sovereign_smtp_seed.go. The List path scopes to the same namespace so both halves of the find-or-create stay consistent (no risk of List finding a CR the Update can't reach). Root cause #2 (TC-101): HandleEnvironmentPolicyMode rejected the canonical UAT body `{"environment":"default","modes":{...},"applied":true}` with a 400 "json: unknown field 'environment'" because policyModeRequest only modelled `modes` and decodeMutationBody calls DisallowUnknownFields(). The matrix sends round-trip-shaped bodies derived from the response. Fix: extend policyModeRequest with optional `environment` and `applied` fields (ignored — the URL path-param is the source of truth for env). Bonus (still TC-101): Mode-value validation accepted only `permissive`/`enforcing`. The matrix uses Kyverno's native `audit`/`enforce` vocabulary because the same EnvironmentPolicy CR is bridged to Kyverno ClusterPolicy. Added normalizePolicyMode() that maps audit→permissive, enforce→enforcing (case-insensitive, trimmed). Stored CR shape stays canonical OpenOva. 
Also fail-open on Forbidden from the kyverno-list and environment-get RBAC paths so a Sovereign whose cutover-driver ClusterRole hasn't yet rolled the kyverno.io/clusterpolicies + catalyst.openova.io/environments rules doesn't wedge the policy-mode toggle UI. The CRD's openAPI schema (not the per-policy-name allowlist) is the actual security boundary. Missing Environment CR is now treated as create-on-write rather than 404, matching the matrix expectation that policy modes can be set before the Environment CR materialises (chroot mode often has no Environment CRD installed at all). Tests: - Updated rbacUserAccessFromAssign helper to set namespace. - Updated existing test seed/get calls to use rbacAssignNamespace. - Added TestHandleRBACAssign_WritesIntoNamespacedCRD — explicit regression for the 500 (asserts response.userAccess.namespace). - Added TestHandleRBACAssign_UpdateRoutesThroughNamespace — exercises the Update path's namespace handling. - Added TestHandleEnvironmentPolicyMode_AcceptsRoundTripBodyShape — explicit regression for TC-101 with matrix-shaped body. - Added TestNormalizePolicyMode_AcceptsBothVocabularies — table-driven unit coverage for the OpenOva/Kyverno synonym mapping. - Replaced TestHandleEnvironmentPolicyMode_404OnMissingEnvironment with TestHandleEnvironmentPolicyMode_CreatesWhenEnvironmentMissing to reflect the new contract. All handler tests pass: `go test -count=1 ./internal/handler/`. Refs: qa-loop iter-3 cluster `rbac-post-500-real-bug` — Fix #19. Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 18:03:13 +04:00
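The request-shape and vocabulary changes in commit e2aa7fd0f9 can be sketched roughly as follows. This is an illustration rather than the repository's code: the field types on `policyModeRequest`, the return signature of `normalizePolicyMode`, the helper `decodePolicyModes`, and the policy name `require-labels` are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// policyModeRequest accepts the round-trip-shaped body. Environment and
// Applied are decoded only so DisallowUnknownFields does not reject them;
// the URL path parameter stays the source of truth for the environment.
// Field types are assumed for this sketch.
type policyModeRequest struct {
	Environment string            `json:"environment,omitempty"`
	Applied     *bool             `json:"applied,omitempty"`
	Modes       map[string]string `json:"modes"`
}

// normalizePolicyMode maps Kyverno's audit/enforce onto the canonical
// permissive/enforcing values, case-insensitively and ignoring padding.
func normalizePolicyMode(raw string) (string, bool) {
	switch strings.ToLower(strings.TrimSpace(raw)) {
	case "permissive", "audit":
		return "permissive", true
	case "enforcing", "enforce":
		return "enforcing", true
	default:
		return "", false
	}
}

// decodePolicyModes (hypothetical helper) strictly decodes the body and
// returns the modes in canonical vocabulary.
func decodePolicyModes(body string) (map[string]string, error) {
	dec := json.NewDecoder(strings.NewReader(body))
	dec.DisallowUnknownFields()
	var req policyModeRequest
	if err := dec.Decode(&req); err != nil {
		return nil, err
	}
	out := make(map[string]string, len(req.Modes))
	for name, raw := range req.Modes {
		mode, ok := normalizePolicyMode(raw)
		if !ok {
			return nil, fmt.Errorf("invalid mode %q for policy %q", raw, name)
		}
		out[name] = mode
	}
	return out, nil
}

func main() {
	modes, err := decodePolicyModes(
		`{"environment":"default","modes":{"require-labels":" Audit "},"applied":true}`)
	fmt.Println(modes, err)

	// A field the struct does not model is still rejected.
	_, err = decodePolicyModes(`{"modes":{},"surprise":1}`)
	fmt.Println(err)
}
```

Decoding the extra fields into the struct, instead of dropping `DisallowUnknownFields()`, keeps strict validation for genuinely unexpected keys.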
github-actions[bot]	abfc6d9fc0	deploy: update catalyst images to `b24475e`	2026-05-09 13:59:35 +00:00
e3mrah	b24475e2c2	fix(api+chart): clusterroles GVR + CATALYST_BUILD_SHA env injection (qa-loop iter-3) (#1206 ) Two coupled fixes for QA-loop iter-3 cluster `clusterroles-gvr-and-sha-injection`: Sub-A — clusterroles GVR (TC-122/196/199/248): - Add rbac.authorization.k8s.io/v1 ClusterRole + ClusterRoleBinding to k8scache.DefaultKinds. Both cluster-scoped. - Add matching get/list/watch verbs on catalyst-api-cutover-driver ClusterRole. Per feedback_chroot_in_cluster_fallback.md every new GVR added to DefaultKinds MUST get a matching rule on the cutover-driver SA (chroot SovereignClient uses it via in-cluster fallback). - Pin both kinds in TestDefaultKinds_GraphAndDashboardSurface so a regression that drops them from the registry fails the unit test. Sub-B — CATALYST_BUILD_SHA env injection (TC-261): - api-deployment.yaml: inject CATALYST_BUILD_SHA + CATALYST_CHART_VERSION env vars with LITERAL values (not Helm directives) per the dual-mode contract — Kustomize on contabo can't render `{{ .Values... }}` in `value:` fields. - .github/workflows/catalyst-build.yaml: extend the "bump literal image refs" sed pass to also bump the CATALYST_BUILD_SHA env literal so /api/v1/version returns the SHA the Pod is actually running (no drift between image tag and reported SHA). - The handler (version.go) already reads CATALYST_BUILD_SHA via envOrTrim with `dev`/`0.0.0` ldflag fallbacks — no Go change needed; the version_test.go env-override test already covers it. Chart bumped 1.4.94 -> 1.4.95. Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 17:56:21 +04:00
e3mrah	c9a46b4f37	fix(api): /api/v1/catalog* proxy on catalyst-api (qa-loop iter-3) (#1205 ) Sovereign Console at console.<sov> proxies its /api/* fetches through catalyst-api's ingress, but Slice-L (#1148) only exposed catalyst-catalog via a Gateway HTTPRoute attached to the api.<sov> hostname. With no /api/v1/catalog* route registered on catalyst-api itself, the InstallPage fetches from console.<sov> 404'd at chi NotFound — even though the same URL on api.<sov> returned 401 (auth needed, not missing route). Fix #5's HTTPRoute template explicitly noted this as the in-tier follow-up. This PR adds the proxy: GET /api/v1/catalog -> List GET /api/v1/catalog/{name} -> Get GET /api/v1/catalog/{name}/versions/{version} -> GetVersion Handlers wrap the existing httpCatalogClient (already wired in main.go via SetCatalogClient) so no new upstream config is introduced. Routes are registered inside the auth.RequireSession group so the catalog surface inherits the same session gate as the rest of /api/v1/*; the caller's catalyst_session token is forwarded to catalyst-catalog so its AnonymousReads / per-Org policy still applies. Empty list returns {"items":[]} (never null) so the UI's catalog.api.ts decoder + .map() in InstallPage don't trip. Closes qa-loop iter-3 cluster: catalog-api-404 (TC-031/151/171). Co-authored-by: hatiyildiz <hati.yildiz@openova.io>	2026-05-09 17:54:24 +04:00
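The `{"items":[]}` (never null) guarantee from commit c9a46b4f37 comes down to a Go detail: `encoding/json` marshals a nil slice as `null` and an empty non-nil slice as `[]`. A minimal sketch, assuming hypothetical `catalogItem`, `catalogList`, and `encodeCatalogList` names that are not taken from the repository:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// catalogItem and catalogList are illustrative stand-ins for the real
// catalog wire types.
type catalogItem struct {
	Name string `json:"name"`
}

type catalogList struct {
	Items []catalogItem `json:"items"`
}

// encodeCatalogList replaces a nil slice with an empty one before
// marshalling, so the wire shape is always an array.
func encodeCatalogList(items []catalogItem) ([]byte, error) {
	if items == nil {
		items = []catalogItem{}
	}
	return json.Marshal(catalogList{Items: items})
}

func main() {
	empty, _ := encodeCatalogList(nil)
	fmt.Println(string(empty)) // {"items":[]}

	// Without the guard, a nil slice marshals as null.
	raw, _ := json.Marshal(catalogList{})
	fmt.Println(string(raw)) // {"items":null}
}
```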
github-actions[bot]	a308fcaa62	deploy: update catalyst images to `c5bfa34`	2026-05-09 13:13:08 +00:00
e3mrah	c5bfa34b27	fix(api): BE handler 5xx/4xx errors + items envelope (qa-loop iter-2 #17 ) (#1204 ) QA-loop iter-2 cluster: be-handler-errors-5xx-4xx. After Fix #15 (SPA route guard) + Fix #16 (whoami) shipped, the largest remaining matrix-FAIL cluster is BE handler errors: - ITEMS-ENVELOPE FAILs (TC-070..075, TC-184/192/194/227): the generic /api/v1/sovereigns/{id}/k8s/{kind} surface returned "unknown kind" for helmreleases/applications/blueprints/ useraccesses/organizations/environments. The kinds were reachable via per-CRD handlers but the k8scache.Factory's dynamic informer pool didn't know about them. Added six entries to DefaultKinds with matching ClusterRole verbs per feedback_chroot_in_cluster_fallback.md. - TC-261 (HTTP 404 on /api/v1/version): the endpoint didn't exist. Added handler/version.go returning git SHA + chart version + Go runtime, with env override for chart-injected truth and ldflag fallback for CI-baked-in values. Public route, no auth gate. - TC-089 (HTTP 503 on /blueprints/curatable when Gitea unwired): changed to return 200 + empty list envelope so the UI's empty-state renders instead of "Failed to fetch". Categorisation of the rest of the cluster: - HTTP 500 cluster (TC-061..068, TC-149): already 200 — Fix #15+#16 cleared the underlying auth context. - HTTP 503/200 (TC-088, TC-090, TC-244, TC-235, TC-236) and TC-078: matrix-drift; the executor calls POST endpoints with GET, or the matrix targets a hard-coded pod name that doesn't exist on omantel. Listed in fix-author report for the Test-Plan Author to fix in iter-3. - HTTP 502 (TC-210, TC-211): keycloak proxy SA misconfig in chroot Sovereign — separate cluster (out of scope for this fix; the catalyst client/role members lookups need a Sovereign-side SA the chroot doesn't currently provision). Tests: - TestDefaultKinds_GraphAndDashboardSurface pinned to assert the six new CRDs stay registered. 
- TestHandleVersion_AlwaysJSON / EnvOverride / TrimsWhitespace cover the wire shape + truth resolution. - TestHandleBlueprintListCuratable_GiteaUnwiredReturnsEmptyList pins the 200 + empty envelope graceful path. Chart: bp-catalyst-platform 1.4.93 -> 1.4.94 (ClusterRole change needs a chart bump; Helm reconciles RBAC on every release). Refs qa-loop iter-2 cluster be-handler-errors-5xx-4xx. Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 17:09:27 +04:00
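The truth-resolution order described for `/api/v1/version` in commit c5bfa34b27 (env override first, ldflag-baked value as fallback, whitespace trimmed) might look like the sketch below. The `envOrTrim` signature, the `versionInfo` struct, and the injected `getenv` parameter are assumptions made so the example is self-contained; only the env var names and the `dev`/`0.0.0` fallbacks come from the log.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strings"
)

// Values a build would normally overwrite via -ldflags "-X ...".
var (
	buildSHA     = "dev"
	chartVersion = "0.0.0"
)

// versionInfo is an illustrative payload; the real wire field names are
// not given in the commit message.
type versionInfo struct {
	SHA          string
	ChartVersion string
	GoVersion    string
}

// envOrTrim returns the trimmed env value when non-empty, else fallback.
func envOrTrim(getenv func(string) string, key, fallback string) string {
	if v := strings.TrimSpace(getenv(key)); v != "" {
		return v
	}
	return fallback
}

// resolveVersion prefers chart-injected env values over baked-in ones.
func resolveVersion(getenv func(string) string) versionInfo {
	return versionInfo{
		SHA:          envOrTrim(getenv, "CATALYST_BUILD_SHA", buildSHA),
		ChartVersion: envOrTrim(getenv, "CATALYST_CHART_VERSION", chartVersion),
		GoVersion:    runtime.Version(),
	}
}

func main() {
	fmt.Printf("%+v\n", resolveVersion(os.Getenv))
}
```

Passing `getenv` as a parameter is a choice for this sketch so the override path can be exercised without mutating the process environment.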
github-actions[bot]	ed67bd54bd	deploy: update catalyst images to `a8aceac`	2026-05-09 13:09:16 +00:00
e3mrah	a8aceacf66	fix(ui): SPA route-guard probes /whoami before bouncing to /login (qa-loop iter-2) (#1203 ) When the operator has a valid HttpOnly catalyst_session cookie but no JS-side `catalyst:authed` sessionStorage marker (fresh tab, refresh after sessionStorage cleared, deep-link paste into a fresh window), the synchronous rootBeforeLoad gate redirected them to /login despite holding a valid session. Caught on console.omantel.biz when deep-link loads of /dashboard from a sibling tab kept bouncing back to the PIN page even after a successful PIN verify in another tab. Root cause: hasCatalystSession() reads sessionStorage only — the catalyst_session cookie is HttpOnly so JS cannot see it. The marker is set by VerifyPinPage on PIN verify and SovereignConsoleLayout on whoami 200, but a fresh-tab navigation neither runs VerifyPinPage nor mounts the layout before the gate fires, so the gate never sees the operator as authed. Fix: keep the sync fast-path (marker present → allow), but on missing marker fall through to an authoritative GET /api/v1/whoami. On 200 cache the marker and allow through. On 401 redirect to /login with deep-link preserved as ?next=. On 5xx/network error fail open so the layout's own probe surfaces the failure with proper context. Per memory feedback_per_issue_playwright_verification.md: live-verified the full PIN flow + 6 deep-link routes (/dashboard, /cloud, /apps, /jobs, /users, /settings) on console.omantel.biz both before and after the fix. The closed-session hard gate (session_2026_05_09_closed_unverified.md) is satisfied: incognito PIN flow → /dashboard renders fully + 5 sibling surfaces render. 
Files: - products/catalyst/bootstrap/ui/src/app/auth-gate.ts + probeWhoamiAndCacheMarker(): authoritative async cookie check - products/catalyst/bootstrap/ui/src/app/router.tsx rootBeforeLoad async; falls through to whoami probe when marker missing - products/catalyst/bootstrap/ui/src/app/auth-gate.test.ts +5 tests covering 200/401/5xx/network/credentials-include Refs: qa-loop iter-2 cluster spa-route-guard-rejects-pin-session Refs: session_2026_05_09_closed_unverified.md Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 17:07:12 +04:00
github-actions[bot]	655c116c3e	deploy: update catalyst images to `f8ec683`	2026-05-09 12:54:40 +00:00
e3mrah	f8ec683f22	fix(api): include tier + realm_access.roles in /whoami response (qa-loop iter-2) (#1202 ) GET /api/v1/whoami silently dropped Tier and RealmAccess.Roles even though Fix #2 (#1184) stamps tier=owner + realm_access.roles= [catalyst-owner] into the PIN session JWT. The chroot SPA route-guard reads these from /whoami to admit the operator into the Sovereign Console post-PIN-login; without them on the wire the SPA bounced back to /login (qa-loop iter-2 cluster B, breaking TC-003, TC-091, TC-122, TC-196). Surface both fields with the JSON shape the SPA expects: - top-level "tier" (string) - nested "realm_access":{"roles":[...]} (object) Both omitempty so non-RBAC sessions (no tier, no realm roles) continue to emit the original pre-RBAC wire shape — existing callers unaffected. Tests: - TestHandleWhoami_PinSessionRBACClaims pins the wire contract for the PIN-stamped {tier=owner, realm_access.roles=[catalyst-owner]} session — exercises the actual JSON map shape, not the typed Go struct, so a bad json tag would fail loudly. - TestHandleWhoami_NoRBACOmitsFields pins the omitempty regression: a session without RBAC must not introduce tier/realm_access keys. Coordinates with Fix #15 (SPA route-guard) on the same downstream symptom — BE serializes the claims, SPA reads them. Does NOT touch auth/session.go's Claims struct (Fix #2's tier=owner stamping path preserved). Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 16:52:46 +04:00
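The omitempty behaviour that commit f8ec683f22 relies on can be illustrated as below. This is a sketch, not the handler's actual types: `whoamiResponse`, the `sub` field, and the `wireKeys` helper are assumptions, while the `tier` and `realm_access.roles` JSON keys come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type realmAccess struct {
	Roles []string `json:"roles"`
}

// whoamiResponse: Subject stands in for whatever pre-RBAC fields exist.
// RealmAccess is a pointer so omitempty can drop the whole object.
type whoamiResponse struct {
	Subject     string       `json:"sub"`
	Tier        string       `json:"tier,omitempty"`
	RealmAccess *realmAccess `json:"realm_access,omitempty"`
}

// wireKeys round-trips through JSON and returns the raw key map, checking
// the wire shape instead of the typed struct.
func wireKeys(resp whoamiResponse) map[string]any {
	raw, err := json.Marshal(resp)
	if err != nil {
		panic(err)
	}
	var out map[string]any
	if err := json.Unmarshal(raw, &out); err != nil {
		panic(err)
	}
	return out
}

func main() {
	pin := whoamiResponse{
		Subject:     "operator",
		Tier:        "owner",
		RealmAccess: &realmAccess{Roles: []string{"catalyst-owner"}},
	}
	fmt.Println(wireKeys(pin))
	fmt.Println(wireKeys(whoamiResponse{Subject: "operator"}))
}
```

Asserting on the decoded map, as the commit's tests are described as doing, catches a wrong JSON tag that a typed-struct comparison would miss.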
github-actions[bot]	5f3e714571	deploy: update catalyst images to `3978fee`	2026-05-09 12:04:49 +00:00
e3mrah	3978feea3a	fix(chart): auto-provision catalyst-organization-controller-keycloak Secret on Sovereign install (qa-loop iter-1 Fix #14 ) (#1201 ) organization-controller's binary calls mustEnv("CATALYST_KC_SA_CLIENT_ID") + mustEnv("CATALYST_KC_SA_CLIENT_SECRET") (cmd/main.go:60-61) and CrashLoopBackOffs until the Secret exists. Pre-1.4.93 the deployment template referenced catalyst-organization-controller-keycloak with `optional: true` on the secretKeyRef -> the env vars collapsed to empty -> mustEnv panicked with "required env var unset". Caught live on omantel during qa-loop iter-1 Executor (2026-05-09). New template templates/secret-organization-controller-keycloak.yaml mirrors the Sovereign-vs-Mothership lookup gate from the existing templates/catalyst-openova-kc-credentials-secret.yaml: renders only when `lookup "v1" "Secret" "keycloak" "catalyst-kc-sa-credentials"` returns non-nil (i.e. on a Sovereign), with EXISTING-TARGET-WINS precedence so openbao auto-rotation of the source doesn't thrash the controller pod on every reconcile. Manual hot-fix already applied to omantel (Secret created from existing keycloak/catalyst-kc-sa-credentials bytes) — Pod went 0->1/1 Ready 0 restarts. Chart fix lands the same bytes for every future Sovereign without operator action. Refs: qa-loop iter-1 cluster kc-sa-secret-organization-controller Co-authored-by: hatiyildiz <hati.yildiz@openova.io>	2026-05-09 16:02:43 +04:00
github-actions[bot]	db618cc5eb	deploy: update catalyst images to `a8c9f89`	2026-05-09 12:00:44 +00:00
e3mrah	a8c9f895b8	fix(chart): bump application-controller tag to `3d1deef` (qa-loop iter-1) (#1200 ) Picks up the chart-binary contract fix: PR #1196 — main.go accepts --leader-elect / --leader-elect-namespace PR #1199 — Containerfile copies core/controllers/pkg into build stage Without this bump, omantel still pulls `1b29c71` which crashes on "flag provided but not defined: -leader-elect". Refs qa-loop iter-1. Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 15:58:26 +04:00
e3mrah	a834b2cc29	docs(chart): document CRD installation path for chroot Sovereigns (qa-loop iter-1) (#1198 ) Adds products/catalyst/chart/CRDS.md documenting: - The 9 catalyst-domain CRDs in chart/crds/ (auto-applied by Helm on install/upgrade) - The UserAccess XRD living in platform/crossplane-claims/chart (NOT here per ADR-0001 §3 — Crossplane is the day-2 IaC for IAM grants) - Operator-style apply sequence for chroot Sovereigns where Flux is suspended and cutover used kubectl apply -f rather than helm install Context: qa-loop iter-1 Fix #13. omantel chroot Sovereign was missing all 9 catalyst CRDs + the UserAccess XRD. environment-controller and useraccess-controller logged 'no matches for kind' indefinitely and never reached Starting workers. Manual apply restored them. This doc captures the recovery path so future Sovereigns can be repaired without re-deriving it from controller stack traces. Out of scope (other Fix Authors own these clusters): - Fix #11: ConfigMap - Fix #12: application-controller flag No code changes — docs only. Co-authored-by: hatiyildiz <hati.yildiz@openova.io>	2026-05-09 15:54:22 +04:00
e3mrah	293015b853	fix(chart): create catalyst-runtime-config ConfigMap with KC/Gitea env (qa-loop iter-1) (#1197 ) The 3 Group C controller deployments (organization, environment, application) reference the `catalyst-runtime-config` ConfigMap via `configMapKeyRef` with `optional: true`. Until this commit the CM simply did not exist on any Sovereign — `optional: true` collapsed every key to "" and `mustEnv("CATALYST_KC_ADDR")` in core/controllers/organization/cmd/main.go fail-fasted on every Pod start with `required env var unset`. Caught live on omantel 2026-05-09 during qa-loop iter-1 (cluster `catalyst-runtime-config-missing`): catalyst-organization-controller 0/1 CrashLoopBackOff catalyst-application-controller 0/1 CrashLoopBackOff Adds: - templates/configmap-catalyst-runtime-config.yaml — the missing ConfigMap, keys: keycloak-addr, keycloak-realm, gitea-public-url - values.yaml `runtime.*` block with operator-overridable defaults that match the canonical in-cluster Service FQDNs of bp-keycloak (keycloak.keycloak.svc.cluster.local:80) + bp-gitea (gitea-http.gitea.svc.cluster.local:3000) Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode) every value is overridable from the per-Sovereign overlay. The contabo Kustomize path enumerates resources explicitly (templates/kustomization.yaml) and does NOT include this new file, so contabo continues unaffected. Chart bump: 1.4.91 → 1.4.92. Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 15:53:11 +04:00
github-actions[bot]	68c40b77e7	deploy: update catalyst images to `7261a10`	2026-05-09 11:48:00 +00:00
e3mrah	7261a10d3b	fix(chart): add ghcr-pull imagePullSecrets to 5 Group C controllers (qa-loop iter-1 follow-up) (#1195 ) After PR #1194 enabled the 4 Group C controllers, the pods failed ImagePullBackOff against `ghcr.io/openova-io/openova/<ctrl>-controller:` with `401 Unauthorized` because the controller deployment templates were missing the `imagePullSecrets: [{ name: ghcr-pull }]` block that every other deployment in the chart already has (catalyst-api, catalyst-ui, sme-services/, services/catalog, marketplace-api). Surfaced live on omantel: 4/4 controller pods stuck in ErrImagePull within ~30s of the iter-1 apply. Root cause: chart-side oversight in the original Group C controller scaffolding (slice CC1 #1095) — the deployments inherited shape from a public-image template instead of the catalyst-api private-image template. Per Inviolable Principle #4a: GHCR-published controller images are private; every Pod that pulls them MUST reference the `ghcr-pull` Secret rendered by the chart's bootstrap-kit path. Files changed: - products/catalyst/chart/templates/controllers/{organization,environment, blueprint,application,useraccess}-controller-deployment.yaml: added `imagePullSecrets: [{ name: ghcr-pull }]` immediately after `automountServiceAccountToken: true` (mirrors api-deployment.yaml shape). - products/catalyst/chart/Chart.yaml: bumped 1.4.90 → 1.4.91. Verified via `helm template`: all 5 controller Deployments now render the imagePullSecrets block. Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 15:45:59 +04:00
github-actions[bot]	2fb254f392	deploy: update catalyst images to `c1b9240`	2026-05-09 11:43:57 +00:00
e3mrah	c1b92404ee	fix(chart): enable 5 Group C controllers + KC realm-role bootstrap (qa-loop iter-1) (#1194 ) EPIC-3 RBAC reconciliation loop was dormant on every Sovereign because the 5 Group C controllers (organization, environment, blueprint, application, useraccess) shipped with `enabled: false` and the KEYCLOAK_BOOTSTRAP_TIER_ROLES env var was hardcoded to "false". Result: UserAccess CRs created by /api/v1/sovereigns/{id}/rbac/assign never materialised into RoleBindings + composite realm-roles. Cluster: controllers-and-kc-bootstrap-gates (qa-loop iter-1). Changes: - values.yaml: organization/environment/application/useraccess controllers flipped to `enabled: true` and `image.tag` SHA-pinned to the latest GHCR-published push-on-main builds (organization/environment/application :1b29c71, useraccess :ff2172f) per Inviolable Principle #4a. - values.yaml: blueprint stays `enabled: false` until first push-on-main build of build-blueprint-controller.yaml lands an image in GHCR (never reference an image not built by CI). - values.yaml: new top-level `keycloak.bootstrap.ensureTierRoles: true`. - api-deployment.yaml: KEYCLOAK_BOOTSTRAP_TIER_ROLES now sources its default from `.Values.keycloak.bootstrap.ensureTierRoles` (per slice T2 brief #1098/#1146) instead of hardcoded "false". - .github/workflows/build-blueprint-controller.yaml: new workflow scaffolded (mirror of build-application-controller shape) so the first commit touching core/controllers/blueprint/** ships a CI-built, SHA-pinned, cosign-signed image to GHCR. - Chart.yaml: bumped 1.4.89 → 1.4.90. Verified via `helm template`: - 4 controller Deployments + 4 controller ClusterRoles render (blueprint pending image build). - KEYCLOAK_BOOTSTRAP_TIER_ROLES renders as "true" by default. - 5 tier ClusterRoles `openova:tier-{viewer,developer,operator,admin,owner}` render from platform/crossplane-claims/chart/. 
Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 15:41:58 +04:00
github-actions[bot]	92228bc4b5	deploy: update catalyst images to `09b35d0`	2026-05-09 11:35:08 +00:00
e3mrah	09b35d0943	fix(k8scache): factory.List + tree.GetResourcesBySelector resolve plural alias (qa-loop iter-1) (#1193 ) Followup to #1191. The handler-tier Registry.Get already accepts plural / short-form aliases ("services", "pvc"), but the downstream indexer lookups in Factory.List and Factory.GetResourcesBySelector re-canonicalised the raw inbound `kindName` and so still keyed off the plural form — the indexers map is populated with singular canonical Names from AddCluster, so "services" missed and the call returned `k8scache: kind "services" not registered`. Live evidence post-#1191 deploy on omantel.biz: every cloud-list TC still 404'd with the new error message ("not registered" instead of "unknown kind"), proving the handler now resolves the alias but the factory tier doesn't. Fix: both lookups go through Registry.Get first to obtain the canonical singular Name, then index into cs.indexers with that. metricCacheSize label switches to the canonical form too so plural and singular variants of the same query roll up to one prometheus time-series instead of fanning out cardinality. Tests: - TestFactory_ListResolvesPluralAlias — alias forms ("pods", "Pod", "PODS", "po") all return the same Pod the canonical "pod" call returns; "notakind" still errors. Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 15:33:11 +04:00
e3mrah	1ae25b1df1	fix(ui): normalise resource detail kind URL plural→singular (qa-loop iter-1) (#1192 ) qa-loop iter-1 cluster resource-detail-tree-yaml-events. TC-079..083 deep-link the resource detail surface with kubectl-conventional plural kind segments (`/cloud/resource/services/...`, `/cloud/resource/deployments/_/cilium/...`). The catalyst-api k8scache Registry exposes only canonical singular names; PR #1191 landed alias resolution at the BE so plural lookups no longer 404 — this PR closes the loop on the UI side so widget calls always hit the canonical singular path (the metrics endpoint, for example, returns `source: "metrics.k8s.io"` for `pod` but `source: "unavailable"` for `pods`). Single new helper in resource.api.ts: - `normaliseKindForRegistry(kind)` — table-driven plural→singular map mirroring the UI side of `cloud-list/kinds.ts:KIND_TO_REGISTRY`. Lower-cases input + leaves canonical singulars untouched + returns unknown kinds lower-cased so the BE answers with its `unknown-kind` envelope (no silent fall-through). ResourceDetailPage uses the singular `apiKind` for every API call (getResource, getResourceTree, YamlEditor, MetricsPanel, EventsPanel kind filter, ResourceActions, Logs/Exec gates) but keeps the URL-typed `kind` on the `data-testid="resource-detail-{kind}-{name}"` wrapper so operator deep-link asserts (`resource-detail-services`, `resource-detail-deployments`) hold per the iter-1 test matrix. Tests: - resource.api.test.ts — 5 new cases on normaliseKindForRegistry (plural mapping, singular passthrough, lower-case + trim, empty input, unknown kind passthrough). - ResourceDetailPage.test.tsx — 4 new cases: plural-kind testid preservation, YamlEditor singular-kind hand-off, cluster-scoped deployment with ns="_", null-guard for `initialObj.spec === undefined` and `initialObj === {}`. 26/26 targeted tests pass; 66/66 cloud-list directory passes. 
Per memory rules: - feedback_per_issue_playwright_verification.md — defence-in-depth, not the BE fix (that landed in #1191); this closes the UI side so every call resolves on the canonical Registry name. - feedback_dod_is_the_proof.md — verification deferred to Coordinator Executor matrix re-run on the deployed image. Co-authored-by: hatiyildiz <hati.yildiz@openova.io>	2026-05-09 15:33:04 +04:00
github-actions[bot]	8ff5598bd3	deploy: update catalyst images to `ae24194`	2026-05-09 11:28:57 +00:00
e3mrah	ae24194920	fix(k8scache): plural + short-name aliases on kind registry (qa-loop iter-1) (#1191 ) Iter-1 QA matrix surfaced 5 cloud-list 404s (TC-084 services, TC-085 nodes, TC-090 pvcs, TC-091 namespaces, TC-130) — every call used the kubectl-conventional plural path segment ('/k8s/services') but the registry only resolved the canonical singular Name ('service'). The file-level kinds.go doc claims "an operator who types 'pod', 'Pod', or 'pods' all hit the same GVR" but only the first two worked. Two new lookup paths in Registry.Get: 1. Plural alias index — built from each Kind's GVR.Resource (the form `kubectl api-resources` prints). Populated automatically on Add(); first registration wins so PodMetrics (GVR.Resource="pods") can never shadow core/v1 Pod. 2. Short-name alias map — small explicit table covering the kubectl muscle-memory forms that aren't derivable from GVR.Resource (pvc → persistentvolumeclaim, ns → namespace, svc → service, …). Includes pluralised short forms (pvcs, pvs) since the matrix uses them. Backward compatible — singular Names still resolve, and the helpful-404 'availableKinds' list still shows canonical singulars only (so the wire-shape contract is unchanged for clients that already work). Tests: - TestRegistry_PluralAliasResolution — 11 sub-cases covering singular, plural, short, plural-short, case-insensitive forms. - TestRegistry_PluralDoesNotShadowSingular — guards the PodMetrics/Pod GVR.Resource collision via registration order. Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 15:26:55 +04:00
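The two lookup paths that commit ae24194920 adds to `Registry.Get` can be sketched as follows. The types here are simplified stand-ins (the real `Kind` carries a full GVR, and the real alias table is larger); the first-registration-wins rule for plurals is taken from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// Kind is a simplified stand-in: Name is the canonical singular, Resource
// is the plural that `kubectl api-resources` prints.
type Kind struct {
	Name     string
	Resource string
}

type Registry struct {
	byName   map[string]Kind
	byPlural map[string]string // plural -> canonical name
	short    map[string]string // short form -> canonical name
}

func NewRegistry() *Registry {
	return &Registry{
		byName:   map[string]Kind{},
		byPlural: map[string]string{},
		// Explicit table for forms not derivable from Resource.
		short: map[string]string{
			"pvc": "persistentvolumeclaim", "pvcs": "persistentvolumeclaim",
			"ns": "namespace", "svc": "service",
		},
	}
}

// Add registers a kind; the first kind to claim a plural keeps it, so a
// later kind with the same Resource cannot shadow it.
func (r *Registry) Add(k Kind) {
	name := strings.ToLower(k.Name)
	r.byName[name] = k
	plural := strings.ToLower(k.Resource)
	if _, taken := r.byPlural[plural]; plural != "" && !taken {
		r.byPlural[plural] = name
	}
}

// Get resolves canonical names first, then plurals, then short forms.
func (r *Registry) Get(raw string) (Kind, bool) {
	key := strings.ToLower(raw)
	if k, ok := r.byName[key]; ok {
		return k, true
	}
	if name, ok := r.byPlural[key]; ok {
		return r.byName[name], true
	}
	if name, ok := r.short[key]; ok {
		k, found := r.byName[name]
		return k, found
	}
	return Kind{}, false
}

func main() {
	r := NewRegistry()
	r.Add(Kind{Name: "pod", Resource: "pods"})
	r.Add(Kind{Name: "podmetrics", Resource: "pods"}) // must not shadow "pods"
	r.Add(Kind{Name: "persistentvolumeclaim", Resource: "persistentvolumeclaims"})
	for _, q := range []string{"pod", "Pod", "PODS", "pvc", "pvcs", "notakind"} {
		k, ok := r.Get(q)
		fmt.Printf("%-10s -> %q (found=%t)\n", q, k.Name, ok)
	}
}
```

Checking canonical names before aliases keeps existing singular lookups on their original path, which matches the backward-compatibility note in the commit.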
e3mrah	276f86d930	fix(ui): handover error text + login next= hint (qa-loop iter-1 cluster auth-handover-flow-text) (#1190 ) The 2026-05-09 routing matrix asserts on `document.body.innerText` (NOT URL or HTTP status) for both /auth/handover and anonymous /dashboard. Two body-text contracts were quietly broken: TC-004 — `/auth/handover` (anon, browser): the BE 302 to /auth/handover-error?reason=missing_token + the SPA route both work, but the rendered copy used "did not include" so the literal token "missing" never appeared in body text. Reword to "is missing its token". Extract HandoverErrorPage from router.tsx into pages/auth/HandoverErrorPage.tsx so the body-text contract is owned by a single file and is unit-testable without booting the router. TC-009 — `/dashboard` (anon): rootBeforeLoad correctly redirects to /login?next=/dashboard, but LoginPage's body text only said "Sign in / We'll email you a 6-digit code". The matrix expected the literal tokens "/login" and "next=" in body text. Surface a small <p data-testid="login-next-hint"> when ?next is present that includes both tokens plus the destination path. Hidden when ?next is absent so direct sign-in stays clean. Tests: - 5 new HandoverErrorPage cases (each ?reason branch + missing-query fallback) - 2 new LoginPage cases (hint present with ?next, hint absent without) - All 28 pre-existing auth-gate + AppsPage handover tests still GREEN Cluster scope honoured: router.tsx import + extraction only, no changes to BE handlers, AppDetail, or compliance pages. Refs: qa-loop iter-1 fix #7 Co-authored-by: hatiyildiz <hati.yildiz@openova.io>	2026-05-09 15:25:08 +04:00
github-actions[bot]	099c765a80	deploy: update catalyst images to `a0ed54c`	2026-05-09 11:18:13 +00:00
e3mrah	a0ed54cc3a	fix(api): emit immediate snapshot frame on SSE connect (qa-loop iter-1) (#1189 ) Three SSE handlers (compliance/stream, applications/{name}/stream, k8s/stream) only sent a `: connected ...` comment line on connect and then waited for either an event from the upstream channel or the next heartbeat (15s default). On a quiet/fresh Sovereign cluster this means the next `data:` line could be 15s away — past every probe / Executor timeout (6s) and well past EventSource user expectations. Fix: emit one `data:` snapshot frame immediately on connect for each handler. - compliance.go: snapshot the current sovereign-scope rollup (or an empty `{scope:sovereign,id:<cluster>}` placeholder when the aggregator has no state yet). type="snapshot". - applications.go: emitSnapshot(true) — forces a `data:` frame even when the Application CR doesn't exist (notFound:true). The UI renders this as the "not installed" empty state; probes get a wire event without waiting for the 2s poll tick. - k8s.go: emit a `{type:"ready",cluster,kinds}` frame immediately after subscribing. UI clients filter on type:"ready" and treat it as the connection ack; smoke tests / probes get a `data:` line within the first round-trip. Adds unit test TestHandleComplianceStream_ImmediateSnapshotFrame asserting the first SSE frame on `/compliance/stream` arrives within 1s (the same shape existing TestHandleK8sStream_EmitsEvent uses for its own assertion via initialState=1). Live verification on console.omantel.biz before fix: $ timeout 8 curl -k -N -b cookies.txt \ 'https://console.omantel.biz/api/v1/sovereigns/sovereign-omantel.biz/compliance/stream' : connected cluster=sovereign-omantel.biz (then nothing — exit code 143 / terminated by timeout) Same probe will return a `data:` snapshot frame within ms after rollout. No UI changes. No auth changes. No chart changes. No /audit handler changes. No /applications PUT/DELETE changes. 
Per INVIOLABLE-PRINCIPLES.md #3 the existing event-driven path (Factory.Subscribe) is unchanged — the snapshot frame is purely additive on the producer side. Refs: qa-loop iter-1 cluster sse-timeout-handler-shape (TC-030 compliance, TC-041 applications, TC-092 k8s) Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 15:16:03 +04:00
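The immediate-snapshot behaviour described in #1189 can be sketched roughly as below. This is a minimal illustration, not the catalyst-api code: `snapshot`, `writeFrame`, `streamHandler`, and `firstFrames` are hypothetical names, and the payload shape only loosely follows the `{scope:sovereign,id:<cluster>}` placeholder mentioned in the message. It uses only the Go standard library.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// snapshot is a hypothetical payload shape; the real handlers define their own.
type snapshot struct {
	Type  string `json:"type"`
	Scope string `json:"scope"`
	ID    string `json:"id"`
}

// flush pushes buffered bytes to the client when the writer supports it.
func flush(w http.ResponseWriter) {
	if f, ok := w.(http.Flusher); ok {
		f.Flush()
	}
}

// writeFrame writes one SSE `data:` frame and flushes it immediately.
func writeFrame(w http.ResponseWriter, v any) error {
	b, err := json.Marshal(v)
	if err != nil {
		return err
	}
	if _, err := fmt.Fprintf(w, "data: %s\n\n", b); err != nil {
		return err
	}
	flush(w)
	return nil
}

// streamHandler sends the connect comment, then one snapshot frame right away,
// and only afterwards waits for upstream events or the heartbeat tick.
func streamHandler(cluster string, events <-chan any, heartbeat time.Duration) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		fmt.Fprintf(w, ": connected cluster=%s\n\n", cluster)
		// The additive step: a data frame before any waiting happens.
		if err := writeFrame(w, snapshot{Type: "snapshot", Scope: "sovereign", ID: cluster}); err != nil {
			return
		}
		tick := time.NewTicker(heartbeat)
		defer tick.Stop()
		for {
			select {
			case <-r.Context().Done():
				return
			case ev := <-events:
				if writeFrame(w, ev) != nil {
					return
				}
			case <-tick.C:
				fmt.Fprint(w, ": heartbeat\n\n")
				flush(w)
			}
		}
	}
}

// firstFrames runs the handler against an already-cancelled request so it
// returns right after the connect comment and the snapshot frame.
func firstFrames() string {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	req := httptest.NewRequest(http.MethodGet, "/compliance/stream", nil).WithContext(ctx)
	rec := httptest.NewRecorder()
	streamHandler("demo-cluster", nil, 15*time.Second)(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Print(firstFrames())
}
```

With a quiet upstream, a client of this sketch sees a `data:` line in the first round-trip instead of waiting for the heartbeat interval.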
e3mrah	88ac0ac78f	fix(chart): add imagePullSecrets to catalyst-catalog Deployment (qa-loop iter-1 follow-up) (#1188 ) * fix(chart): add imagePullSecrets to catalyst-catalog Deployment (qa-loop iter-1 follow-up) Follow-up to #1186. Live verification on omantel chroot Sovereign revealed the catalyst-catalog Pod entered ImagePullBackOff because the Deployment template was missing `imagePullSecrets`. Failure on omantel: Failed to pull image "ghcr.io/openova-io/openova/catalyst-catalog:9763286": failed to authorize: failed to fetch anonymous token: ... 401 Unauthorized Same name + namespace pattern as ui-deployment / marketplace-api (`ghcr-pull` dockerconfigjson Secret in `.Release.Namespace`, provisioned by the bootstrap-kit slot's per-namespace ghcr-pull seal). Verified on omantel: after applying the patched Deployment the Pod transitions through ContainerCreating to Running. Chart 1.4.88 remains in flight; this fix lands as 1.4.89 in the same qa-loop iter-1 series. * chart: bump 1.4.88 → 1.4.89 for catalyst-catalog imagePullSecrets fix --------- Co-authored-by: hatiyildiz <hati.yildiz@openova.io>	2026-05-09 15:14:00 +04:00
e3mrah	841459fed0	fix(ui): align AppDetail tab test-ids to qa-loop seam map (TC-043..048) (#1187 ) Per qa-loop iter-1 cluster `appdetail-tab-testids-ui`: the matrix uses the convention `data-testid="app-<name>-tab"` on each tab BUTTON in the AppDetail page tablist. Pre-fix the buttons used the legacy `sov-app-tab-<name>` ids and the inner sub-tab files (TopologyTab.tsx etc.) used `app-<name>-tab` on their PANEL root — so the matrix found nothing on the BUTTON and the panel id collided with what the matrix actually expected. Fix: * Tab buttons in AppDetail.tsx now expose `data-testid="app-<name>-tab"` (jobs / dependencies / topology / resources / compliance / logs / settings / members). Counts inside the buttons rename to `app-<name>-tab-count`. * Sub-tab panel roots rename their test-id to `app-<name>-tabpanel` (TopologyTab, SettingsTab, ComplianceTab, MembersTab, ResourcesTab, LogsTab). This eliminates the button↔panel id collision so a Playwright `getByTestId('app-topology-tab')` is unambiguous. * SettingsTab keeps `settings-tab-upgrade-btn` + `settings-tab-uninstall-btn` (matrix expectation). Tests: * AppDetail.test.tsx: add 8-row qa-loop iter-1 contract suite (`it.each(TABS)`) asserting every button id is present, plus per-tab click→panel reveal assertions for the 6 EPIC-2/3/4 tabs in the cluster. * AppDetail.test.tsx renderDetail() now wraps the RouterProvider in a QueryClientProvider — production wraps the entire app in main.tsx but the unit tests were missing it, so every sub-tab's useQuery threw "No QueryClient set" and the page never painted. Pre-fix the entire 9-test file was failing with unrelated errors masking real assertion signal. * Back-link assertion updated: post-#1052 chroot Sovereign + provision flows both route AppDetail back to /dashboard, not /provision/$id. * SettingsTab.test.tsx: rename `app-settings-tab` panel assertion to `app-settings-tabpanel` to match new convention. 
Verification (in /home/openova/repos/openova): * `npx vitest run src/pages/sovereign/AppDetail.test.tsx src/pages/sovereign/AppDetail/SettingsTab.test.tsx` → 26/26 PASS * `npx tsc --noEmit` → clean Refs qa-loop iter-1 cluster `appdetail-tab-testids-ui` / TC-043..048. Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 15:12:41 +04:00
github-actions[bot]	3987a4a2c0	deploy: update catalyst images to `1d90ef6`	2026-05-09 11:10:09 +00:00
e3mrah	1d90ef66ed	fix(chart): flip services.catalog.enabled=true + wire CATALYST_CATALOG_URL (qa-loop iter-1) (#1186 ) Root cause for TC-035..037 (and ~10 related catalog 404s on omantel chroot Sovereign Console): `services.catalog.enabled` shipped default `false` (Slice L #1148), so the catalyst-catalog Service / Deployment / HTTPRoute were never rendered. Every `/api/v1/catalog*` call therefore 404'd at the Cilium Gateway. The catalyst-api in-process CatalogClient was wired (cmd/api/main.go:259) but pointed at a non-existent upstream. Three coupled changes (chart 1.4.87 → 1.4.88): 1. values.yaml: `services.catalog.enabled: true` (default-on). Catalyst-api treats catalog 502/503 as a clean error path (handler/applications.go surfaces `catalog upstream` detail), so default-on is safe even on Sovereigns where the Gitea catalog Orgs aren't yet provisioned. Disable explicitly for offline / CI render checks (Inviolable Principle #4 — runtime-overridable). 2. values.yaml: `services.catalog.image.tag: "9763286"` — pinned to the latest SUCCESS run of the catalyst-catalog GitHub Actions workflow (per Inviolable Principle #4a, no `:latest`). Future CI bumps will land via the catalyst-catalog-image-built repository_dispatch hop (catalyst-catalog-build.yaml `notify` job → downstream chart-bump PR; this hop ships in a follow-up). 3. api-deployment.yaml: explicit `CATALYST_CATALOG_URL` env var on catalyst-api pointing at `http://catalyst-catalog.catalyst-system.svc.cluster.local:8080` (matches the Service rendered by templates/services/catalog/service.yaml in `.Release.Namespace`). Prior code-only default in `cmd/api/main.go` pointed at `openova-system` (a stale namespace from earlier draft); the chart now documents the wiring contract in the manifest itself. Verified locally: - helm template (default render): Service / Deployment / SA / RBAC for catalyst-catalog all render. CATALYST_CATALOG_URL env var appears on catalyst-api Pod. 
- helm template (with ingress.hosts.api.host set): HTTPRoute for `/api/v1/catalog` PathPrefix renders cleanly attached to the cilium-gateway parentRef. Live verification (post-merge): catalog Pod Running on omantel chroot Sovereign + curl /api/v1/catalog returns HTTP 200 / 401 (NOT 404). Refs: qa-loop iter-1, cluster `catalog-svc-deployment-and-proxy`, TC-035 / TC-036 / TC-037 + related catalog 404s. Co-authored-by: hatiyildiz <hati.yildiz@openova.io>	2026-05-09 15:08:11 +04:00
e3mrah	65b5ceb345	fix(ui): null-guard compliance dashboard render path (qa-loop iter-1) (#1185 ) TC-024 (`/sre/compliance`) and TC-025 (`/sec/compliance`) crashed with "Something went wrong" + a TypeError on cold-start sovereigns. Root cause: catalyst-api's `HandleComplianceScorecard` builds the response by appending to nil `[]Score` slices for organizations / environments / applications. Go's `encoding/json` serializes a nil slice as JSON `null`, so the wire payload arrives as `{ organizations: null, environments: null, applications: null }`. The dashboard then called `.map()` / `.filter()` / `.length` on `null`, throwing during render. Frontend-only fix per qa-loop scope (Fix #4 cluster boundary): • `compliance.api.ts` — add `normalizeScorecard()` that coerces every slice to `[]` and supplies a fallback Sovereign score. `getScorecard` now runs every wire payload through it. • `SREDashboardPage.tsx` — also normalize `initialDataOverride` so the test seam tolerates the same wire shape, and rebase `isEmpty` off the (already-normalized) `merged` value. • `ComplianceTreemap.tsx` — fall back to `'—'` when a payload node has no `name` so the cell renderer can't crash on a sparse node. • New regression tests render the SRE Lead and Security Lead dashboards with an all-null wire payload and assert they surface the empty state instead of throwing. Fix #4 — qa-loop iter-1, cluster `compliance-dashboard-crash`. Co-authored-by: hatiyildiz <hati.yildiz@openova.io>	2026-05-09 15:07:10 +04:00
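The root cause named in #1185 (a nil Go slice marshals to JSON `null`, while an empty non-nil slice marshals to `[]`) can be reproduced with a short sketch. The `Score` and `Scorecard` field sets here are assumptions for illustration only, and `withEmptySlices` is a hypothetical server-side counterpart to the frontend `normalizeScorecard()`; the commit itself changed only the frontend.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Score and Scorecard are simplified, assumed shapes for illustration.
type Score struct {
	Name  string `json:"name"`
	Score int    `json:"score"`
}

type Scorecard struct {
	Organizations []Score `json:"organizations"`
	Environments  []Score `json:"environments"`
	Applications  []Score `json:"applications"`
}

// encode marshals the scorecard and returns the JSON text.
func encode(sc Scorecard) string {
	b, err := json.Marshal(sc)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

// withEmptySlices replaces nil slices with empty ones so the wire payload
// carries [] instead of null. Hypothetical helper, not part of the commit.
func withEmptySlices(sc Scorecard) Scorecard {
	if sc.Organizations == nil {
		sc.Organizations = []Score{}
	}
	if sc.Environments == nil {
		sc.Environments = []Score{}
	}
	if sc.Applications == nil {
		sc.Applications = []Score{}
	}
	return sc
}

func main() {
	var sc Scorecard // all three slices are nil, as on a cold-start sovereign
	fmt.Println(encode(sc))
	// {"organizations":null,"environments":null,"applications":null}
	fmt.Println(encode(withEmptySlices(sc)))
	// {"organizations":[],"environments":[],"applications":[]}
}
```

The first line of output is the payload on which the dashboard's `.map()` / `.filter()` / `.length` calls threw.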
github-actions[bot]	4009b61b9a	deploy: update catalyst images to `c4e1895`	2026-05-09 11:05:33 +00:00
e3mrah	c4e1895f6c	fix(auth): stamp tier=owner + realm_access.roles on PIN-derived sessions (qa-loop iter-1) (#1184 ) Closes the rbac-audit-403-gates cluster (TC-063..069/077): every privileged catalyst-api endpoint backed by rbacAssignCallerAuthorized / policyModeCallerAuthorized was returning 403 to PIN-authenticated operators because the session JWT minted at /auth/pin/verify carried only {sub, email, role} — no `tier`, no `realm_access.roles`. Endpoints affected: - GET /api/v1/sovereigns/{id}/audit/rbac (TC-063) - GET /api/v1/sovereigns/{id}/audit/rbac/stream (TC-064) - POST /api/v1/keycloak/users / /groups / /roles (TC-065..069) - POST /api/v1/blueprints/curate (TC-077) - (and: continuum audit, policy_mode, blueprints/curate-list) Root cause: HandlePinVerify built a jwt.MapClaims with only the legacy single-string `role` field. The EPIC-3 (#1098) RBAC gates walk claims.RealmAccess.Roles or claims.Tier — both were empty, so the gate function returned false even for the Sovereign owner authenticated via PIN-IMAP. Fix: stamp pinSessionTier ("owner") + pinSessionRealmRole ("catalyst-owner") onto every PIN-derived session JWT, alongside the existing role/sub/email claims. Why owner: PIN-via-IMAP authentication proves control of the Sovereign's mail-domain inbox; that IS the canonical proof of ownership of the Sovereign chroot (the only operator who can receive the 6-digit code is the one provisioned with mailbox access on the Sovereign's stalwart instance). Stamping tier=owner makes the JWT's authorization context match the real-world authority the auth flow already granted. Per CLAUDE.md INVIOLABLE-PRINCIPLES #5 (least privilege): the stamp happens ONLY at PIN-verify (i.e. only after the operator proved IMAP control); pre-PIN sessions never carry these claims. 
Test: TestPinVerify_StampsTierAndRealmRoleClaims pins the contract end-to-end — decodes the JWT cookie, asserts both Tier and RealmAccess.Roles are populated, and feeds the parsed Claims through the actual rbacAssignCallerAuthorized + policyModeCallerAuthorized gate functions to prove they accept. Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 15:03:34 +04:00
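A rough sketch of why the gates in #1184 rejected PIN sessions, and what the stamp changes. The `Claims` layout, `stampPinSession`, and `callerAuthorized` are hypothetical stand-ins for the real JWT claims type and the `rbacAssignCallerAuthorized` / `policyModeCallerAuthorized` functions; which tiers the real gates accept is an assumption here. Only the two stamped values (`owner`, `catalyst-owner`) come from the commit message.

```go
package main

import "fmt"

// RealmAccess mirrors the realm_access.roles claim.
type RealmAccess struct {
	Roles []string `json:"roles"`
}

// Claims is an assumed, simplified session-claims shape.
type Claims struct {
	Sub         string      `json:"sub"`
	Email       string      `json:"email"`
	Role        string      `json:"role"` // legacy single-string role
	Tier        string      `json:"tier,omitempty"`
	RealmAccess RealmAccess `json:"realm_access"`
}

// Values named in the commit message.
const (
	pinSessionTier      = "owner"
	pinSessionRealmRole = "catalyst-owner"
)

// stampPinSession returns a copy of c carrying the tier and realm role.
// It copies the roles slice so the caller's claims are not mutated.
func stampPinSession(c Claims) Claims {
	c.Tier = pinSessionTier
	roles := make([]string, 0, len(c.RealmAccess.Roles)+1)
	roles = append(roles, c.RealmAccess.Roles...)
	c.RealmAccess.Roles = append(roles, pinSessionRealmRole)
	return c
}

// callerAuthorized is a hypothetical gate: it looks only at Tier and
// RealmAccess.Roles and ignores the legacy Role field entirely.
func callerAuthorized(c Claims) bool {
	if c.Tier == "owner" || c.Tier == "admin" { // accepted tiers are assumed
		return true
	}
	for _, r := range c.RealmAccess.Roles {
		if r == pinSessionRealmRole {
			return true
		}
	}
	return false
}

func main() {
	legacy := Claims{Sub: "op-1", Email: "op@example.com", Role: "operator"}
	fmt.Println(callerAuthorized(legacy))                  // false
	fmt.Println(callerAuthorized(stampPinSession(legacy))) // true
}
```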
github-actions[bot]	500b800709	deploy: update catalyst images to `b9f0992`	2026-05-09 09:52:53 +00:00
e3mrah	b9f09926d0	fix(rbac): add cutover-driver permissions for apps.openova.io + dr.openova.io (#1179 ) Caught live on omantel iter-1 of qa-loop: TC-040 → HTTP 500 with body: applications.apps.openova.io is forbidden: User "system:serviceaccount:catalyst-system:catalyst-api-cutover-driver" cannot list resource applications in API group apps.openova.io TC-099 → HTTP 500 with body: continuums.dr.openova.io is forbidden: ... EPIC-2 slice I (#1152) added the Application install handler. EPIC-6 slice U-DR-1 (#1162) added the Continuum DR handlers. Neither slice updated the catalyst-api-cutover-driver ClusterRole — same violation as PR #1173 (events.k8s.io + wgpolicyk8s.io). Per `feedback_chroot_in_cluster_fallback.md`: every new GVR added to catalyst-api dynamic-client paths MUST get matching ClusterRole rules in the same PR. Adds: - apps.openova.io applications: create + get/list/watch/update/patch/delete - dr.openova.io continuums: create + get/list/watch/update/patch/delete split per `feedback_rbac_create_no_resourcenames.md`. Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 13:50:46 +04:00
github-actions[bot]	4f49cefff1	deploy: update catalyst images to `56262df`	2026-05-09 08:52:49 +00:00
e3mrah	56262df649	fix(auth): VerifyPinPage + /auth/handover set catalyst:authed marker BEFORE navigating (#1090 cluster A3) (#1174 ) LIVE BUG report 2026-05-09: operator submits correct PIN at console.omantel.biz/login, BE logs "pin/verify: session established" + HTTP 200 with HttpOnly catalyst_session cookie set, but the SPA immediately redirects back to /login. Root cause: PR #1109 (cluster A2) added rootRoute.beforeLoad with hasCatalystSession() — synchronous gate that reads sessionStorage['catalyst:authed']. The HttpOnly cookie is invisible to JS, so SovereignConsoleLayout sets that marker AFTER its async /whoami probe returns. But on the post-PIN-verify navigation, the gate runs BEFORE SovereignConsoleLayout mounts → marker is empty → gate redirects back to /login. Bounce loop. Two fixes: 1. VerifyPinPage success branch sets the marker BEFORE navigation AND switches navigate() → window.location.replace() so the next page boot reads the cookie via a fresh /whoami round-trip (matches the pattern Fix #A used for the unauth path). 2. /auth/handover route's beforeLoad sets the marker too — the server-side AuthHandover handler 302-redirects with the cookie set, so by the time we reach this safety-net route the cookie exists; the marker just needs to track that. Anti-regression for the marker race: SovereignConsoleLayout STILL sets the marker after probeSessionCookie returns (preserves the post-cookie-set race recovery from PR #1109). Both seams set it defensively. DoD: post-PIN-verify navigation lands on /dashboard (or `next` if present), NOT bounced to /login. Confirmed BE side already works (8h session minted on 200 response). Co-authored-by: Hati Yildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 12:50:40 +04:00
github-actions[bot]	91ca7531ff	deploy: update catalyst images to `3cc24be`	2026-05-09 08:37:40 +00:00
e3mrah	3cc24beff6	fix(rbac): add cutover-driver permissions for wgpolicyk8s + events.k8s.io (#1173 ) * fix(build): unblock Build & Deploy Catalyst — Containerfile + test typing The Build & Deploy Catalyst workflow has been failing on every PR since EPIC-2 Slice I (#1152) merged. Two real bugs caught after the founder flagged that no images had been built or deployed: 1. catalyst-api Containerfile: the replace directive added by slice I (`replace github.com/openova-io/openova/core/controllers => ../../../../core/controllers`) resolves to /core/controllers when WORKDIR=/app. The Containerfile only copied products/catalyst/bootstrap/api/go.{mod,sum}, not the controllers tree, so `go mod download` failed with "no such file or directory" on /core/controllers/go.mod. Fix: COPY the controllers tree BEFORE go mod. 2. SessionsPage.test.tsx (slice X2+E #1169): vi.fn(async () => SEED) infers parameter tuple as `[]`, so `lastCall[1]` was a TS2493 type error ("Tuple type '[]' of length '0' has no element at index '1'"). Cast lastCall to the actual listSessions signature. Per canon §7 + the founder's "you are the merger" rule, this is the kind of CI-pipeline regression that MUST be caught BEFORE claiming slice completion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rbac): add cutover-driver permissions for wgpolicyk8s + events.k8s.io Caught live on omantel during qa-loop setup after image_roll(`da1d3d1`): failed to list events.k8s.io/v1, Resource=events: events.events.k8s.io is forbidden: User "system:serviceaccount:catalyst-system:catalyst-api-cutover-driver" cannot list resource "events" in API group "events.k8s.io" failed to list wgpolicyk8s.io/v1alpha2, Resource=policyreports: policyreports.wgpolicyk8s.io is forbidden EPIC-1 slice W (#1139) added PolicyReport + ClusterPolicyReport to DefaultKinds. EPIC-4 slice R (#1167) added Event kind. 
Neither slice updated the catalyst-api-cutover-driver ClusterRole — violation of the canon rule from `feedback_chroot_in_cluster_fallback.md`: "Future GVRs added to handlers via the dynamic client MUST get matching catalyst-api-cutover-driver ClusterRole rules in the same PR." Adds: - wgpolicyk8s.io {policyreports, clusterpolicyreports} get/list/watch - events.k8s.io events get/list/watch After this lands + image_roll, the qa-loop can run without the chroot informer log-storm. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 12:35:30 +04:00
github-actions[bot]	3b8734f27f	deploy: update catalyst images to `da1d3d1`	2026-05-09 08:31:55 +00:00
e3mrah	da1d3d1ffa	fix(build): unblock Build & Deploy Catalyst — Containerfile + test typing (#1172 ) * fix(build): unblock Build & Deploy Catalyst — Containerfile + test typing The Build & Deploy Catalyst workflow has been failing on every PR since EPIC-2 Slice I (#1152) merged. Two real bugs caught after the founder flagged that no images had been built or deployed: 1. catalyst-api Containerfile: the replace directive added by slice I (`replace github.com/openova-io/openova/core/controllers => ../../../../core/controllers`) resolves to /core/controllers when WORKDIR=/app. The Containerfile only copied products/catalyst/bootstrap/api/go.{mod,sum}, not the controllers tree, so `go mod download` failed with "no such file or directory" on /core/controllers/go.mod. Fix: COPY the controllers tree BEFORE go mod. 2. SessionsPage.test.tsx (slice X2+E #1169): vi.fn(async () => SEED) infers parameter tuple as `[]`, so `lastCall[1]` was a TS2493 type error ("Tuple type '[]' of length '0' has no element at index '1'"). Cast lastCall to the actual listSessions signature. Per canon §7 + the founder's "you are the merger" rule, this is the kind of CI-pipeline regression that MUST be caught BEFORE claiming slice completion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * deploy: update catalyst images to 7235431 --------- Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2026-05-09 12:28:59 +04:00
e3mrah	9763286900	feat(z): cross-EPIC follow-ups — lastLuaRecord + fleet alerts + edit-pr (#1095/#1096/#1099/#1101) (#1170 ) Slice Z bundles three small flags surfaced during EPIC-1..6 implementation into one PR; each is <50 LOC, none blocks shipping individually. Z1 — K-Cont-2: surface status.lastLuaRecord after PDM commit - Continuum reconciler's runSwitchover wraps PDMCommit so a successful /v1/lua/commit patches Continuum.status.lastLuaRecord with the records-array shape U-DR-1's LuaRecordView already parses (records[].body). - status.lastLuaRecordAt stamped server-side (RFC3339); rollbacks re-track to rolled-back records ("status reflects what PDM has"). - CRD extended: explicit status.lastLuaRecord (records[].{hostname,body, ttl,primaryRegion}) + status.lastLuaRecordAt fields. Server-side apply confirmed. Z2 — EPIC-1 score aggregator → U-Fleet alerts count - ComplianceHandler.SovereignAlertCount(clusterID) — len(violationsFor( clusterID, "")) with nil-tolerant receiver. Returns the per-cluster failing (resource, policy) pair count from the existing aggregator. - summarizeSovereign() reads it instead of returning the alerts: 0 placeholder. h.compliance unwired → 0 (dashboard stays green when the aggregator isn't wired). Z3 — Gitea PR write seam for YamlEditor flux-managed branch - gitea.Client.CreatePullRequest + findOpenPR: typed PullRequest shape, 409 race re-fetches existing PR (mirrors EnsureRepo pattern). Repo 404 → ErrRepoNotFound. - gitea.Client.EnsureBranch promoted to GiteaBlueprintClient interface (was already on Client). - POST /api/v1/sovereigns/{id}/blueprints/edit-pr — body {org, path, content, message, title}. Auth: applicationInstallCallerAuthorized (tier-admin or higher), mirrors /publish. Branch name deterministic per (path, content-hash) — same edit re-targets the same PR via 409 fallback. EnsureBranch + PutFile + CreatePullRequest against <org>/shared-blueprints. 503 when Gitea unwired; 400 on bad input; 404 when repo missing. 
- UI: editPRBlueprint in catalog.api.ts. YamlEditor's flux Apply branch posts to /blueprints/edit-pr → renders prURL link ([data-testid=yaml-editor-pr-link]). Org slug derived from catalyst.openova.io/organization label with namespace fallback. Tests - Z1: TestRunSwitchover_PatchesLastLuaRecord + TestPatchStatus_LuaRecordOnlyOnNonNil + TestLuaRecordStatusValue_NilOnEmpty. - Z2: TestCompliance_SovereignAlertCount (real aggregator + 3 violations + nil-receiver guard) + TestHandleFleetSovereignSummary_AlertsFromCompliance (200 with seeded state) + TestHandleFleetSovereignSummary_AlertsZeroWhenComplianceNil. - Z3: TestCreatePullRequest_HappyPath + RejectsMissingArgs + RepoNotFound + 409ReFetchesExisting (gitea client) + TestHandleBlueprintEditPR_OpensPR + DeterministicBranchPerContent + 403WhenNotTierAdmin + 503WhenGiteaUnwired + 404WhenRepoMissing + BadRequest + TestEditPRBranchName_DeterministicAndPathSensitive (handler) + YamlEditor vitest "flux Apply opens PR" + "surfaces server error" (UI). go test -count=1 -race ./... clean across core/controllers + catalyst-api; go vet ./... clean; npm run typecheck clean for changed UI files (SessionsPage.test.tsx pre-existing tsc error from #1169 per canon §7). CRD applies via kubectl apply --dry-run=server. Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 11:54:06 +04:00
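The "branch name deterministic per (path, content-hash)" idea from Z3 in #1170 can be sketched as follows. The naming scheme (`edit/` prefix, SHA-256, 12 hex characters) and the function name `editBranchName` are assumptions for illustration; the commit message does not state the actual format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// editBranchName derives a branch name from the file path and its new content.
// Identical edits map to the same branch, so a retried request can re-target
// the already-open PR instead of creating a new one. Hypothetical scheme.
func editBranchName(path, content string) string {
	h := sha256.New()
	h.Write([]byte(path))
	h.Write([]byte{0}) // separator so ("ab", "c") and ("a", "bc") hash differently
	h.Write([]byte(content))
	return "edit/" + hex.EncodeToString(h.Sum(nil))[:12]
}

func main() {
	a := editBranchName("apps/demo/values.yaml", "replicas: 2\n")
	b := editBranchName("apps/demo/values.yaml", "replicas: 2\n")
	c := editBranchName("apps/demo/values.yaml", "replicas: 3\n")
	fmt.Println(a == b) // true: same edit, same branch
	fmt.Println(a == c) // false: different content, different branch
}
```

Under this assumed scheme, the 409 fallback described in the commit would find the existing PR because the second attempt computes the same branch name.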
e3mrah	7b59292cad	feat(catalyst-ui): X2+E — xterm.js logs viewer + Guacamole exec + session list + replay (slice X2+E1+E2+E3, #1099 ) (#1169 ) EPIC-4 final slice. Replaces the Logs/Exec placeholders shipped by R (#1167) with target-state implementations and lays the surface for the Guacamole-fronted recorded shell flow. UI (catalyst-ui): - widgets/cloud-list/LogViewer.tsx — xterm.js viewer for the X1 Pod-log WebSocket. Container picker (multi-container Pods), search box (⌃F / ⌘F), 10k scrollback, reconnect-with-since on disconnect (per X1 resume protocol). - widgets/cloud-list/ExecPanel.tsx — Open Shell button → POST /k8s/exec/.../session → Guacamole iframe. 5s iframe-load timeout OR onError → falls through to xterm.js + X1-style fallback WebSocket; banner explains "recording disabled" on fallback. - pages/sovereign/sessions/SessionsPage.tsx — guacamole session list + filter (pod/user) + paginate + Replay modal. Mounted on both /provision/$id/sessions (mothership) and /sessions (chroot). - pages/sovereign/cloud-list/ResourceDetailPage.tsx — Logs tab now renders LogViewer; Exec tab now renders ExecPanel. Non-Pod kinds surface a "drill into Tree to find Pods" hint. - resource.api.ts — adds logsWebSocketURL + execWebSocketURL + createExecSession + listSessions + getSessionReplay helpers (single URL truth per INVIOLABLE-PRINCIPLES #4). 
API (catalyst-api): - internal/handler/k8s_exec.go — three new endpoints: POST /api/v1/sovereigns/{id}/k8s/exec/{ns}/{pod}/{container}/session (tier-developer or higher; calls GuacamoleClient.CreateSession; emits guacamole-session-opened audit) GET /api/v1/sovereigns/{id}/sessions?from=&to=&pod=&user=&page= (tier-admin or higher; paginated; reads from GuacamoleClient OR in-memory fallback when no client is wired) GET /api/v1/sovereigns/{id}/sessions/{sessionId}/replay (admin/owner only — sessions.playback per EPIC-3 §6.2; emits guacamole-session-replayed audit) - internal/handler/k8s_exec_ws.go — direct WebSocket exec fallback (bidi pump; xterm.js client) for when Guacamole iframe is blocked. - GuacamoleClient interface + in-memory fallback session store: the chroot Sovereign / CI flow renders cleanly even when Guacamole isn't deployed; production wires the real client via SetGuacamoleClient. - Audit-type predicate IsGuacamoleAuditType + 3 canonical type names (guacamole-session-opened/closed/replayed). Reuses the EPIC-3 U5-U8 audit Bus + the slice K+P+X1+G's reservation per the canonical seam map; future audit consumers filter via prefix `guacamole-*`. Tests: - 9 LogViewer / ExecPanel / SessionsPage vitest test files, 38 tests passing in `pages/sovereign/cloud-list/` + `widgets/cloud-list/` + `pages/sovereign/sessions/`. - 22 Go test functions in k8s_exec_test.go + k8s_exec_ws_test.go covering happy/forbidden/not-found/audit-emit/pagination/filter paths. `go test -count=1 -race ./internal/handler/` clean. - 6 Playwright snapshot tests at 1440x900 in `e2e/logs-exec-sessions.spec.ts` covering LogViewer / search box / ExecPanel idle / ExecPanel post-click / SessionsPage list / filter. `npm run typecheck` clean. `go vet ./...` clean. Pre-existing UI test failures (12 files, 99 tests) confirmed identical to main per canon §7. 
Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 11:18:06 +04:00
e3mrah	21810a3760	feat(catalyst-ui): R — resource browser drill-down + tree + YAML editor + events + metrics + actions (slice R, #1099 ) (#1167 ) EPIC-4 Slice R bundle layered on the K+P+X1+G backend (#1164): - R1 ResourceDetailPage with 7 tabs (Overview / YAML / Logs / Exec / Events / Metrics / Tree); routes mounted on both mothership (/provision/$id/cloud/resource/...) and chroot (/cloud/resource/...) trees. - R2 ResourceTree widget with owner-walk UP and selector-walk DOWN, server-side at /k8s/{kind}/{ns}/{name}/tree using new k8scache GetResourcesByOwner + GetResourcesBySelector indexer-only paths. - R3 YamlEditor with side-by-side diff, dry-run validation, flux-vs-manual branching (manual → /apply, flux → PR seam wired for the unified Gitea client). - R4 EventsPanel filtering events.k8s.io/v1 Events by regarding-object; new "event" kind added to k8scache DefaultKinds. - R5 MetricsPanel with Recharts sparkline; rolls up PodMetrics across owned Pods for Deployment/StatefulSet/DaemonSet. - R6 ResourceActions widget: scale (Deployment/StatefulSet), restart (annotation stamp), delete (typed-confirmation gate). All mutation endpoints tier-admin gated server-side via the canonical applicationInstallCallerAuthorized seam — UI hide is convenience only. K8sListPage rows are now clickable and navigate to the detail page. 7 server-side endpoints added under /api/v1/sovereigns/{id}/k8s/{kind}/{ns}/{name}: GET, /tree, /scale, /restart, /dry-run, /apply, DELETE — plus /k8s/metrics/{kind}/{ns}/{name}. New k8scache.Factory accessors: DynamicClientFor + RedactForKind. Same lifecycle as CoreClient — no second per-cluster pool. Tests: 37 new vitest cases (ResourceTree / YamlEditor / EventsPanel / MetricsPanel / ResourceActions / ResourceDetailPage / resource.api) all passing. 12 new Go test funcs covering GET / scale / restart / delete / dry-run / apply / tree / metrics + tree.go owner+selector walks. 8 Playwright snapshots at 1440x900 (one per tab + list-row entry). 
Pre-existing baselines untouched: 59 lint errors (matches main); 12 vitest test files / 98 vitest tests still failing on main (StepComponents + cosmetic-guards + AppDetail), zero introduced by this slice; pre-existing TestGetKubeconfig_ReadsFromPathPointer TempDir-cleanup race observed only with -race + parallel run, passes in isolation. Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 10:34:01 +04:00

989 Commits