Kubernetes Deployment
Forail Platform deploys natively on Kubernetes via the official Helm chart at forail-platform/forail-helm, with an optional operator at forail-platform/forail-operator (v1.0.0) that lets you manage nine Forail resource kinds — organizations, teams, projects, inventories, credentials, job templates, schedules, workflows (DAG), and remote Forail instances — declaratively as Kubernetes Custom Resources, with support for routing different CRs to different Forail backends.
This page is the long-form companion to the Docker Compose Deployment guide. If you already run a Kubernetes cluster, the Helm path is the recommended deployment for production: it manages secrets natively, supports rolling upgrades, and integrates with Ingress controllers, External Secrets, and the wider ecosystem.
When to Pick Kubernetes
| | Docker Compose | Kubernetes (Helm + Operator) |
|---|---|---|
| Best for | Single-host deployments, evaluation, small teams | Multi-node clusters, GitOps, declarative job-template management |
| HA | No (single node) | Yes (multiple replicas, multi-master cluster) |
| Storage | Bind mounts | PVCs (any CSI driver) |
| Secrets | .env file | k8s Secrets / External Secrets / Sealed Secrets |
| Ingress | nginx + Let’s Encrypt scripts | Any IngressController + cert-manager |
| Job templates | Created via UI / API only | UI / API or kubectl apply -f jt.yaml |
| Upgrade | docker compose pull && up -d | helm upgrade (rolling) |
Architecture on Kubernetes
The chart deploys six application workloads plus the Forail backend itself. Every workload runs as its own Deployment (or StatefulSet for stateful data), and is fronted by a ClusterIP Service for in-cluster traffic. Ingress traffic enters through a single Ingress resource that fans out by URL path.
                  ┌──────────────────┐
                  │  Ingress (TLS)   │  forail.lan
                  │ Traefik / NGINX  │
                  └─────────┬────────┘
        ┌───────────────────┼──────────────────────┐
        │ /api /admin /static /sso /websocket      │ /
        ▼                                          ▼
┌───────────────┐                         ┌─────────────────┐
│  forail-web   │ Django + uwsgi + daphne │ forail-frontend │ React SPA
│ (Deployment)  │ ports: 8013 8015        │  (Deployment)   │ port: 80
└──┬──────┬──┬──┘                         └─────────────────┘
   │      │  │
   │      │  └──── OPA (sidecar via service forail-opa:8181) ──── policy decisions
   │      │
   │      └──── OTel Collector (forail-otel-collector:4317) ──── traces / metrics
   │
   ├──── Postgres (StatefulSet, forail-postgres:5432, PVC 8 Gi)
   │
   ├──── Redis (Deployment, forail-redis:6379, PVC 2 Gi)
   │
   └──── forail-task (Deployment, no Service)
           ├── ansible-runner + podman in privileged container
           ├── Receptor mesh (control socket /var/run/awx-receptor/receptor.sock)
           └── shared PVC forail-projects (mounted in both web + task)
Two extra resources run as one-shots / CRDs:
- forail-init Job: runs once per release revision. It applies migrations, creates the admin user, provisions the instance, registers the controlplane + default queues, seeds preload data, and writes CSRF_TRUSTED_ORIGINS into the DB. Its name is suffixed with {{ .Release.Revision }} so each upgrade gets a fresh Job (a Job spec is immutable, so re-applying the same name would fail).
- forail-operator: runs in its own namespace, watches Organization / Team / Project / Inventory / Credential / JobTemplate / Schedule / Workflow / ForailInstance CRs cluster-wide, and reconciles each against the Forail REST API. It authenticates with an OAuth2 personal access token issued by Forail; the optional ForailInstance CR lets one operator deployment fan out to multiple Forail backends.
Namespaces
The chart and operator each install into their own namespace. This separation lets you grant the operator only the permissions it needs and lets you upgrade Forail without touching the operator (or vice versa).
| Namespace | Contents | Created by |
|---|---|---|
| forail | Postgres, Redis, OPA, OTel Collector, forail-web, forail-task, forail-frontend, forail-init Job, Ingress, all chart Secrets and ConfigMaps | helm install forail ./forail-helm -n forail --create-namespace (or pre-created if you need to seed pull-secrets first) |
| forail-operator | The operator Deployment, its ServiceAccount + ClusterRole + ClusterRoleBinding, and one Secret holding the Forail OAuth2 token | helm install forail-operator ./forail-operator/helm -n forail-operator --create-namespace |
| any | The nine CRD instances (Organization, Team, Project, Inventory, Credential, JobTemplate, Schedule, Workflow, ForailInstance). Each CR is namespace-scoped; the operator watches all namespaces and reconciles them centrally against Forail. | You — kubectl apply -f cr.yaml |
The chart’s top-level namespace.create value defaults to false because pre-creating the namespace is the common path: you usually want to create it with kubectl first so you can add a registry pull-secret (needed only if you mirror images to a private registry such as Harbor) and a TLS secret before the Pods start trying to mount them.
Networking
Services and ports
Every workload that other workloads (or the Ingress) need to reach gets a ClusterIP Service. forail-task is the only one without a Service — nothing reaches into it; it pulls work off Redis and pushes it through Receptor.
| Service | Ports | Selector | Reached by |
|---|---|---|---|
| forail-web | 8013 (HTTP API), 8015 (websocket) | component=web | Ingress, forail-task callback, forail-operator |
| forail-frontend | 80 (HTTP) | component=frontend | Ingress (path /) |
| forail-postgres | 5432 | component=postgres | forail-web, forail-task, forail-init |
| forail-redis | 6379 | component=redis | forail-web, forail-task |
| forail-opa | 8181 (REST policy decision API) | component=opa | forail-web (policy checks) |
| forail-otel-collector | 4317 (gRPC), 4318 (HTTP) | component=otel-collector | forail-web, forail-task (traces + metrics) |
Cluster DNS
Workloads reach each other through the cluster DNS at <service>.<namespace>.svc.cluster.local. Inside the forail namespace the bare service name resolves too, so forail.otel.endpoint defaults to http://forail-otel-collector:4317. The operator reaches the API at the FQDN http://forail-web.forail.svc.cluster.local:8013 because it lives in a different namespace.
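The naming rule above can be sketched as a small shell helper. The function name svc_fqdn is hypothetical and used only for illustration; the service, namespace, and port are the ones quoted in this section, and cluster.local is assumed to be the cluster domain.

```shell
# Build the cluster-internal DNS name for a Service.
# Usage: svc_fqdn <service> <namespace>
# (svc_fqdn is an illustrative helper, not part of the chart or operator.)
svc_fqdn() {
  printf '%s.%s.svc.cluster.local\n' "$1" "$2"
}

# The operator lives outside the forail namespace, so it needs the full name.
api_url="http://$(svc_fqdn forail-web forail):8013"
echo "$api_url"
```

Inside the forail namespace the short form (forail-web:8013) resolves to the same Service, which is why the chart’s in-namespace defaults can omit the suffix.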
Ingress and URL routing
A single Ingress resource named forail routes forail.lan by URL path. The chart defaults to Traefik (ingress.className: traefik), but any IngressController works: change className to nginx, contour, etc. Path order matters because Forail owns five distinct path prefixes that must take precedence over the SPA catch-all.
| Path | Target Service | Why |
|---|---|---|
| /api | forail-web:8013 | REST API (/api/v2/*) |
| /admin | forail-web:8013 | Django admin |
| /sso | forail-web:8013 | SAML / OIDC / social-auth callbacks |
| /websocket | forail-web:8015 | Job event streaming over WS |
| /static/forail | forail-frontend:80 | SPA assets; must come before /static below |
| /static | forail-web:8013 | Django staticfiles (admin CSS, DRF browser) |
| / | forail-frontend:80 | SPA catch-all |
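To make the ordering constraint concrete, here is a minimal sketch that models the table as a first-match-wins lookup. It is a simplification: real IngressControllers apply their own precedence rules, and the function name route_for and the sample request paths are hypothetical.

```shell
# Resolve a request path to its backend, checking prefixes in table order.
# (route_for is an illustrative model, not how any controller is implemented.)
route_for() {
  case "$1" in
    /api|/api/*)                     echo "forail-web:8013" ;;
    /admin|/admin/*)                 echo "forail-web:8013" ;;
    /sso|/sso/*)                     echo "forail-web:8013" ;;
    /websocket|/websocket/*)         echo "forail-web:8015" ;;
    /static/forail|/static/forail/*) echo "forail-frontend:80" ;;  # must precede /static
    /static|/static/*)               echo "forail-web:8013" ;;
    *)                               echo "forail-frontend:80" ;;  # SPA catch-all
  esac
}

route_for /static/forail/main.js   # SPA asset
route_for /static/admin/base.css   # Django staticfiles
route_for /jobs/42                 # anything else falls through to the SPA
```

If the two /static arms were swapped in this model, SPA assets would be sent to forail-web, which is the failure the table ordering guards against.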
Hostname TLD pitfall (.local vs .lan)
The chart’s ingress.host default is forail.lan, not forail.local. The reason: most desktop Linux distros ship nsswitch.conf with mdns_minimal [NOTFOUND=return] ahead of files, which intercepts every .local lookup, fails it, and bypasses /etc/hosts entirely. The browser then shows Server Not Found even though curl --resolve forail.local:30080:<node-ip> ... works fine. The .lan TLD has no such hijack.
For host-based access from a developer laptop, add to /etc/hosts:
192.168.56.32 forail.lan
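To check whether a given laptop is affected by the .local pitfall, a sketch along these lines inspects the hosts: line of nsswitch.conf. The function name mdns_shadows_files is hypothetical, and the check assumes the usual one-line "hosts:" format.

```shell
# Succeeds when an mDNS source is listed before "files" on the hosts: line
# of the given nsswitch.conf-style file. (Illustrative helper only.)
mdns_shadows_files() {
  [ -r "$1" ] || return 1
  awk '$1 == "hosts:" {
         for (i = 2; i <= NF; i++) {
           if ($i == "files") { print "no";  exit }
           if ($i ~ /^mdns/)  { print "yes"; exit }
         }
       }' "$1" | grep -q yes
}

if mdns_shadows_files /etc/nsswitch.conf; then
  echo ".local lookups are answered by mDNS before /etc/hosts; prefer forail.lan"
else
  echo "mDNS is not listed ahead of files on this machine"
fi
```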
Exposing the cluster
How Ingress traffic actually reaches your nodes depends on the IngressController service type:
- NodePort (test clusters): Traefik’s service type defaults to NodePort on 30080 (HTTP) and 30443 (HTTPS). The browser hits http://forail.lan:30080/ and kube-proxy forwards from any node to the Traefik pod.
- LoadBalancer (cloud / MetalLB): set the IngressController service type to LoadBalancer and use the assigned external IP / DNS name.
- hostNetwork (bare metal, single-node): runs the IngressController on the host network so 80/443 are reachable directly.
CNI quirk on VirtualBox dev clusters
If you build a multi-VM cluster on VirtualBox host-only networks, Flannel’s default --iface auto-detect picks eth0 (the NAT adapter). Pod-to-pod traffic across nodes then dies, with symptoms like connect: no route to host from any cross-node service VIP. Patch the Flannel DaemonSet to bind to the host-only interface explicitly:
kubectl -n kube-flannel patch ds kube-flannel-ds --type=json \
-p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--iface=eth1"}]'
The forail-dev-cluster Vagrant repo bakes this patch into master-init.sh.
Storage
Five PVCs are created. Sizes are set through values overrides: start with the defaults and grow the underlying storage when you outgrow them.
| PVC | Default size | Mode | Mounted in | Purpose |
|---|---|---|---|---|
| forail-postgres | 8 Gi | RWO | StatefulSet forail-postgres | Database files |
| forail-redis | 2 Gi | RWO | Deployment forail-redis | RDB snapshots / AOF |
| forail-projects | 4 Gi | RWX | forail-web + forail-task | Synced project repos (_<id>__<slug>) |
| forail-receptor | 2 Gi | RWO | forail-task | Receptor work units (/tmp/receptor) |
| forail-backups | 10 Gi | RWO | forail-web (mount-only) | Where backup.sh writes .sql.gz archives |
RWX requirement: forail-projects is mounted in both forail-web and forail-task. If those Pods land on different nodes, RWX is mandatory. If you’re forced to ReadWriteOnce (e.g. local-path-provisioner on bare metal), pin both Deployments to the same node with a nodeSelector or podAffinity; otherwise the second Pod will sit Pending forever.
Override the StorageClass per-PVC with values like postgres.storage.storageClass: ceph-rbd. Leaving it empty falls back to the cluster default StorageClass.
Prerequisites
- Kubernetes 1.27+ (tested on 1.30)
- An IngressController (Traefik / NGINX / Contour). forail-dev-cluster’s post-cluster-setup.sh installs Traefik via Helm
- A default StorageClass (or per-PVC overrides); local-path-provisioner is fine for dev
- helm 3.12+
- Optional but recommended: cert-manager (real TLS) or a hand-rolled kubernetes.io/tls Secret named forail-tls
Container images
Forail images are published to the public GitHub Container Registry: ghcr.io/forail-platform/*. No pull secret is required.
kubectl create namespace forail
If you mirror the images to your own registry, override images.backend.repository + images.frontend.repository in values.yaml and add an imagePullSecrets entry pointing at your in-cluster docker-registry Secret.
TLS secret
Pre-create a TLS Secret if you want HTTPS on first boot. Self-signed is fine for dev:
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout tls.key -out tls.crt -subj '/CN=forail.lan' \
-addext 'subjectAltName=DNS:forail.lan,DNS:*.forail.lan'
kubectl -n forail create secret tls forail-tls --cert=tls.crt --key=tls.key
Install Forail core
git clone git@github.com:forail-platform/forail-helm.git
cd forail-helm
helm install forail . -n forail \
--set forail.admin.user=admin \
--set secrets.forailAdminPassword='<strong-password>' \
--set secrets.postgresPassword='<random-32-bytes>' \
--set secrets.forailSecretKey="$(openssl rand -hex 32)" \
--set secrets.forailBroadcastWebsocketSecret="$(openssl rand -hex 32)"
The install runs synchronously through the forail-init Job. Watch progress with:
kubectl -n forail get pods -w
Expected end state — every Pod Running 1/1, forail-init-1 Completed:
NAME READY STATUS RESTARTS AGE
forail-frontend-7f8c5d9c8b-abcde 1/1 Running 0 3m
forail-init-1-xxxxx 0/1 Completed 0 3m
forail-opa-6cdfb9d79c-fghij 1/1 Running 0 3m
forail-otel-collector-... 1/1 Running 0 3m
forail-postgres-0 1/1 Running 0 3m
forail-redis-... 1/1 Running 0 3m
forail-task-... 1/1 Running 0 3m
forail-web-... 1/1 Running 0 3m
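As an alternative to passing secrets with repeated --set flags, the same keys can be kept in a values file. This is a sketch under assumptions: the file name forail-secrets.yaml is hypothetical, and generating the admin password randomly is an illustrative choice rather than something the chart requires. The value paths are the ones used in the install command above.

```shell
# Generate the secrets once and store them in a file only the current user
# can read, so they stay out of shell history.
# (forail-secrets.yaml is a hypothetical file name.)
(
  umask 077
  cat > forail-secrets.yaml <<EOF
forail:
  admin:
    user: admin
secrets:
  forailAdminPassword: "$(openssl rand -hex 16)"
  postgresPassword: "$(openssl rand -hex 32)"
  forailSecretKey: "$(openssl rand -hex 32)"
  forailBroadcastWebsocketSecret: "$(openssl rand -hex 32)"
EOF
)

# Then install with the file instead of the --set flags (needs cluster access):
#   helm install forail . -n forail -f forail-secrets.yaml
```

Keeping the file also means later upgrades can pass the same -f argument instead of depending on values stored in the release.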
Optional: AI Assistant
Chart 0.3.0 introduces an optional forail-assistant deployment that wraps the all-in-one image (Ollama + ChromaDB + RAG API in one container). It is disabled by default — enable it on a fresh install or on an upgrade:
helm upgrade forail . -n forail \
--reuse-values \
--set assistant.enabled=true
The chart provisions a PersistentVolumeClaim (forail-assistant-data, default 20 Gi) so the LLM model cache and Chroma vector store survive pod restarts. On first boot the entrypoint pulls the configured model (gemma3:1b by default, ≈800 MB) and the embedding model (nomic-embed-text), then indexes the bundled documentation — allow ~5 minutes before the pod is Ready. The startupProbe is sized for this (failureThreshold: 30 at periodSeconds: 10) so the liveness probe will not kill the pod mid-bootstrap.
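The probe sizing quoted above works out as follows. This is plain arithmetic on the two documented values; it ignores any initial delay the chart may also set.

```shell
failure_threshold=30
period_seconds=10

# Worst-case time the startupProbe tolerates before the container is restarted.
startup_budget=$(( failure_threshold * period_seconds ))
echo "startup budget: ${startup_budget}s ($(( startup_budget / 60 )) min)"
# prints: startup budget: 300s (5 min)
```

If you switch to a larger model whose first pull takes longer than this budget, the probe threshold is likely the value to raise.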
Key values you may want to override:
| Value | Default | Notes |
|---|---|---|
| assistant.model | gemma3:1b | Switch to llama3.1:8b or mistral:7b for higher answer quality at the cost of memory and latency. |
| assistant.storage.size | 20Gi | Bump to 30 Gi+ if you change models or grow the indexed corpus. |
| assistant.resources.limits.memory | 4Gi | gemma3:1b inference peaks ~1.5 Gi; larger models need 8 Gi+. |
The Service is reachable cluster-internally at forail-assistant.forail.svc.cluster.local:8100. Ingress routing is intentionally not wired up by this release — expose it with a follow-up middleware (Traefik stripPrefix on /assistant) once you decide on a URL contract.
Install the Operator
The operator is shipped separately because not everyone wants declarative-via-CRD management — some teams prefer the UI. Install only after Forail core is healthy (the operator needs to authenticate to it on startup).
Generate a Forail OAuth2 PAT
The operator authenticates with a personal access token issued by Forail. Generate one inside the running web Pod:
TOKEN=$(kubectl -n forail exec deploy/forail-web -- \
forail-manage create_oauth2_token --user admin | tail -1)
echo "$TOKEN"
The token is shown once, so copy it now. To revoke it later, run forail-manage revoke_oauth2_tokens --user admin.
Install with Helm
git clone git@github.com:forail-platform/forail-operator.git
cd forail-operator
helm install forail-operator helm/ \
-n forail-operator --create-namespace \
--set forail.token="$TOKEN" \
--set forail.url=http://forail-web.forail.svc.cluster.local:8013
Verify:
kubectl -n forail-operator logs deploy/forail-operator -f
You should see Starting workers for each of the nine reconcilers (organization, team, project, inventory, credential, jobtemplate, schedule, workflow, forailinstance).
For full coverage of the v1.0.0 release — multi-cluster routing via ForailInstance, the OLM bundle, and the declarative Workflow DAG model — see the dedicated Operator v1.0.0 guide.
Custom Resources
Each CR maps to a Forail primary-key after first reconcile (status.forailId). The operator owns the resource via a finalizer — deleting the CR deletes the Forail resource too. Re-applying a CR after manual edits in the Forail UI overwrites the UI changes (drift is detected on a 60s requeue and reconciled toward the CR).
Sample CRs live in forail-operator/config/samples/:
| Kind | Maps to | Sensitive fields |
|---|---|---|
| Organization | /api/v2/organizations | None (top-level tenant container with max-host quota) |
| Team | /api/v2/teams + member sync at /teams/{id}/users/ | None (spec.users[] resolves usernames at reconcile) |
| Project | /api/v2/projects | SCM credential resolved by name from an existing Credential CR |
| Inventory | /api/v2/inventories + nested hosts & groups | None (pure spec) |
| Credential | /api/v2/credentials | spec.inputsFrom[] reads sensitive values from k8s Secrets in the same namespace; the operator watches those Secrets and re-syncs on rotation |
| JobTemplate | /api/v2/job_templates with credential / project / inventory references resolved by name | None |
| Schedule | /api/v2/schedules | None (RFC 5545 RRULE drives recurrence) |
| Workflow | /api/v2/workflow_job_templates + node DAG at /workflow_job_templates/{id}/workflow_nodes/ + edges | Declarative spec.nodes[] keyed by identifier with successNodes / failureNodes / alwaysNodes graph references |
| ForailInstance | Control-plane only: describes a Forail backend that other CRs target via spec.forailInstance | Bearer token sourced from a k8s Secret via spec.tokenSecretRef |
Verification
After both Helm releases land:
# Forail API healthy
kubectl -n forail port-forward svc/forail-web 8013:8013 &
curl http://localhost:8013/api/v2/ping/ | jq
# Browser access (assuming /etc/hosts is set)
open http://forail.lan:30080/
# Operator wired up
kubectl get crd | grep forail-platform.io
kubectl apply -f https://raw.githubusercontent.com/forail-platform/forail-operator/main/config/samples/inventory-sample.yaml
kubectl get inventory production -o jsonpath='{.status}'
Day-2 Operations
Upgrade
git -C forail-helm pull
helm upgrade forail ./forail-helm -n forail --reuse-values
The chart re-runs forail-init on each upgrade (its name embeds {{ .Release.Revision }}). The init script is idempotent. If you upgrade against a populated DB, the migrate step finishes in milliseconds; the heavier preload-data and execution-environment registration steps short-circuit when records already exist.
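Because the Job name embeds the release revision, the name to watch after an upgrade can be derived from the revision number. A minimal sketch, assuming the forail-init-<revision> pattern seen in the install output (forail-init-1); the helper name init_job_name is hypothetical.

```shell
# Name of the init Job for a given Helm release revision.
# (init_job_name is an illustrative helper, not shipped with the chart.)
init_job_name() {
  printf 'forail-init-%s\n' "$1"
}

init_job_name 1   # first install
init_job_name 2   # first upgrade

# With cluster access, block until the Job for revision $rev has finished,
# where $rev is the revision shown by: helm history forail -n forail
#   kubectl -n forail wait --for=condition=complete --timeout=10m \
#     "job/$(init_job_name "$rev")"
```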
Backup
kubectl -n forail exec deploy/forail-web -- /usr/local/bin/backup.sh
kubectl -n forail cp forail-web-xxx:/var/backups/forail/forail-<ts>.sql.gz ./forail.sql.gz
Scale forail-task
Forail dispatches jobs to any forail-task replica via the Receptor mesh. Increase task.replicas in values.yaml for more concurrent jobs. Keep in mind:
- forail-task runs privileged: true with podman, so restrict it to a node pool you trust
- The chart bumps the task memory limit to 4 Gi by default, because supervisord + dispatcher + Receptor + ansible-runner + an EE container together blow past 2 Gi during the first project sync. Lower this only if you’ve profiled your specific workload.
Customizing values.yaml
The shipped values.yaml is heavily commented. The most commonly tuned values:
| Path | Default | Why you’d change it |
|---|---|---|
| images.backend.repository | ghcr.io/forail-platform/forail-backend | Mirror to your own registry |
| images.backend.tag | latest | Pin to a CalVer release |
| secrets.forailAdminPassword | changeme-admin | Always; it’s a placeholder |
| ingress.host | forail.lan | Your real DNS name |
| ingress.tls.enabled | true | Disable for plain-HTTP test clusters |
| postgres.storage.size | 8Gi | Production sizing |
| postgres.enabled | true | Set false + override DB env to point at an external Postgres (RDS, Cloud SQL) |
| otelCollector.enabled | true | Disable if you ship an OTel Collector at the cluster level (DaemonSet) |
| web.replicas / task.replicas | 1 / 1 | HA / throughput |
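Several of these overrides can be collected in one file and passed with -f. The sketch below uses only value paths from the table; the file name, the host name, the tag placeholder, and the chosen sizes and replica counts are illustrative assumptions, not recommendations from the chart.

```shell
# Write an override file; every concrete value here is an example.
cat > forail-overrides.yaml <<'EOF'
images:
  backend:
    tag: "<calver-tag>"        # placeholder: pin to a release instead of latest
ingress:
  host: forail.example.com     # hypothetical DNS name
  tls:
    enabled: true
postgres:
  storage:
    size: 50Gi                 # example production size
web:
  replicas: 2
task:
  replicas: 2
EOF

# Apply on install or upgrade (needs cluster access):
#   helm upgrade --install forail ./forail-helm -n forail -f forail-overrides.yaml
```

Running more than one web or task replica brings the RWX requirement for forail-projects into play as soon as the Pods land on different nodes.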
Troubleshooting
Browser shows “Server Not Found” for forail.local
That’s the mDNS hijack described above. Use the chart default (forail.lan) and update /etc/hosts accordingly. If you’re locked into a .local hostname, edit /etc/nsswitch.conf on every developer laptop to put files before mdns_minimal — but this is system-wide and may break Avahi-discovered devices on the LAN.
Jobs fail with “unknown work type kubernetes-incluster-auth”
The Forail instance is registered as node_type=control instead of hybrid. Control nodes only orchestrate; they refuse to execute jobs locally and try to dispatch them to a ContainerGroup. The chart’s init.sh runs an explicit ORM update after provision_instance to fix this — if you wrote a custom init, replicate that step:
forail-manage shell -c "
from forail.main.models import Instance
i = Instance.objects.get(hostname='<node>')
i.node_type = 'hybrid'; i.save(update_fields=['node_type'])
"
The same script also flips the auto-created default InstanceGroup off is_container_group=True, which is what the post-migrate signal sets when Forail detects it’s running in k8s.
forail-task pod restarts with exit code 137
OOMKilled — the default memory limit was 2 Gi in earlier chart versions. Bump task.resources.limits.memory to 4Gi (the current default). Memory is tight because supervisord runs uwsgi + dispatcher + Receptor + ansible-runner + a podman EE container concurrently.
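The link between exit code 137 and an OOM kill comes from the convention that exit codes above 128 encode the terminating signal. A short sketch of the arithmetic, followed by a commented kubectl query whose pod name is a placeholder:

```shell
# Exit codes above 128 mean "terminated by signal (code - 128)".
exit_code=137
signal=$(( exit_code - 128 ))
echo "exit ${exit_code} -> signal ${signal} ($(kill -l "$signal"))"

# With cluster access, confirm the reason recorded for the last restart
# (replace <task-pod> with the actual Pod name):
#   kubectl -n forail get pod <task-pod> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```

Signal 9 is SIGKILL, which is what the kernel OOM killer sends; the kubectl query should report OOMKilled when the memory limit was the cause.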
Job error “Error updating status file /tmp/receptor/.../status.lock: no such file or directory”
The Receptor work-unit directory disappeared. This used to happen when forail-task was OOMKilled mid-job and the pod restart wiped /tmp. Mitigation:
- Bump task memory (above)
- The chart now mounts the forail-receptor PVC at /tmp/receptor so work units survive Pod restarts
Pods on different nodes can’t reach each other (VirtualBox only)
See the CNI quirk note — Flannel must be patched to bind to the host-only interface (eth1), not the NAT-mode eth0. forail-dev-cluster bakes this into provisioning.
Companion Repositories
| Repo | What it ships |
|---|---|
| forail-helm | The Helm chart described on this page |
| forail-operator | The Kubernetes operator with nine CRDs |
| forail-dev-cluster | 4-VM Vagrant + VirtualBox cluster (2 master + 2 worker, k8s 1.30) for chart and operator integration testing; includes the Flannel eth1 patch and a post-cluster-setup.sh that installs Traefik, local-path-provisioner, and the forail namespace prerequisites |
| forail-devops | The Docker Compose deployment described on the Deployment page (same image set, single-host topology) |