Catalyst-authored umbrella charts for the W2.5.D AI-inference stack. None of the three upstream projects publishes a Helm chart, so each chart hand-wires the upstream container as Deployment + Service + ConfigMap + ServiceMonitor + NetworkPolicy + HPA, with the sigstore/common library subchart declared to satisfy the hollow-chart gate (issue #181).

bp-vllm (slot 39): wraps vllm/vllm-openai:v0.6.4. GPU-aware (nvidia.com/gpu when vllm.gpu.enabled=true; CPU fallback for dev). Default model meta-llama/Llama-3.1-8B-Instruct, port 8000, OpenAI-compatible /v1/chat/completions. All engine knobs (maxModelLen, gpuMemoryUtilization, dtype, quantization, tensorParallelSize, prefix-caching) are overlay-tunable. Closes #266.

bp-bge (slot 42): wraps ghcr.io/huggingface/text-embeddings-inference:cpu-1.5. Default model BAAI/bge-small-en-v1.5, with a BAAI/bge-reranker-base sidecar in the same Pod. Two-port Service (8080 embed, 8081 rerank) annotated for bp-llm-gateway discovery. CPU-friendly defaults; an overlay swaps in BAAI/bge-m3 on GPU Sovereigns. Closes #269.

bp-nemo-guardrails (slot 43): wraps the upstream NVIDIA/NeMo-Guardrails Dockerfile (nemoguardrails server, FastAPI, port 8000). LLM endpoint, model, and engine are all overlay-tunable; the Colang flow bundle mounts via configMap.externalName for production rails. A ConfigMap stub renders a default rail for smoke testing. Closes #270.

All three charts:
- Default observability toggles to false per BLUEPRINT-AUTHORING.md §11.2
- Pin upstream image tags (no :latest) per INVIOLABLE-PRINCIPLES.md #4
- Non-root securityContext (runAsUser 1000, drop ALL capabilities)
- prometheus.io scrape annotations on the Pod for fallback discovery
- Operator-tunable NetworkPolicy gating ingress to bp-llm-gateway and egress to HuggingFace / bp-vllm / bp-bge as appropriate

helm template (default values) per chart:
  bp-vllm: ConfigMap, Deployment, Service, ServiceAccount
  bp-bge: ConfigMap, Deployment, Service, ServiceAccount
  bp-nemo-guardrails: ConfigMap, Deployment, Service, ServiceAccount

helm template (--set serviceMonitor.enabled=true,networkPolicy.enabled=true,hpa.enabled=true):
  All three render ConfigMap + Deployment + Service + ServiceAccount + ServiceMonitor + NetworkPolicy + HorizontalPodAutoscaler.

helm lint: 0 chart(s) failed for all three (single INFO on missing icon; icons land with the marketplace card work).

Closes #266
Closes #269
Closes #270

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
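The three optional resources named in the commit message (ServiceMonitor, NetworkPolicy, HorizontalPodAutoscaler) are switched on through serviceMonitor.enabled, networkPolicy.enabled, and hpa.enabled. As a rough sketch, the same toggles could live in an overlay values file instead of on the command line. This assumes the keys sit at the top level of each chart's values.yaml, as the flag paths suggest; the file name is hypothetical.

```yaml
# values-observability.yaml (hypothetical file name, not part of the commit)
# Assumption: these keys are top-level in each chart's values.yaml.
serviceMonitor:
  enabled: true   # render the ServiceMonitor (defaults to false)
networkPolicy:
  enabled: true   # render the NetworkPolicy (defaults to false)
hpa:
  enabled: true   # render the HorizontalPodAutoscaler (defaults to false)
```

A file like this would be passed to `helm template` or `helm install` with `-f values-observability.yaml`, which keeps the toggles reviewable in version control rather than buried in a command line.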
63 lines
1.8 KiB
YAML
apiVersion: catalyst.openova.io/v1alpha1
kind: Blueprint
metadata:
  name: bp-bge
  labels:
    catalyst.openova.io/category: ai-runtime
    catalyst.openova.io/section: pts-4-6-llm-serving
spec:
  version: 1.0.0
  card:
    title: BGE Embeddings + Reranker
    summary: BAAI General Embedding (sentence-transformers + bge-reranker). CPU-friendly multilingual embeddings + cross-encoder reranking. Default model bge-small-en-v1.5; bp-llm-gateway discovers via Service annotation.
    icon: bge.svg
    category: ai-runtime
    tags: [embeddings, reranker, rag, sentence-transformers, ai]
    documentation: https://huggingface.co/BAAI
    license: MIT
  visibility: listed
  owner:
    team: ai-platform
    contact: ai-platform@openova.io
  configSchema:
    type: object
    properties:
      embeddingModel:
        type: string
        default: "BAAI/bge-small-en-v1.5"
        description: HuggingFace model ID for the embeddings endpoint.
      rerankerModel:
        type: string
        default: "BAAI/bge-reranker-base"
        description: HuggingFace model ID for the reranker endpoint.
      enableReranker:
        type: boolean
        default: true
        description: Whether to start the reranker container alongside embeddings.
      replicas:
        type: integer
        default: 1
        minimum: 1
        maximum: 8
      maxBatchSize:
        type: integer
        default: 32
      maxLength:
        type: integer
        default: 512
        description: Token cap per request (bge-small-en-v1.5 supports 512; bge-m3 supports 8192).
  placementSchema:
    modes: [single-region, active-active]
    default: active-active
  manifests:
    chart: ./chart
  depends:
    - blueprint: bp-cnpg
      version: ^1.0
      alias: cnpg
  upgrades:
    from: ["0.x"]
  observability:
    metrics: prometheus
    logs: stdout
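
The configSchema in this Blueprint defines the knobs an operator can override. As an illustration only, a set of override values that stays within the schema's types and bounds might look like the following. It uses the bge-m3 model and the 8192-token cap mentioned in the maxLength description. How Catalyst actually receives these values (file name, wrapping key, CLI) is not shown in this manifest, so the flat layout here is an assumption.

```yaml
# Hypothetical override values for bp-bge; key names and limits are taken
# from spec.configSchema, the flat layout is an assumption.
embeddingModel: "BAAI/bge-m3"             # replaces the bge-small-en-v1.5 default
rerankerModel: "BAAI/bge-reranker-base"   # schema default, left unchanged
enableReranker: true                      # keep the reranker container running
replicas: 2                               # schema allows 1 through 8
maxBatchSize: 32                          # schema default
maxLength: 8192                           # bge-m3 cap per the schema description
```

The schema does not cover the container image, so moving to a GPU build of the serving image would have to happen elsewhere in the chart values.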