Kubernetes Deployment

Forail Platform deploys natively on Kubernetes via the official Helm chart at forail-platform/forail-helm. An optional operator at forail-platform/forail-operator (v1.0.0) lets you manage nine Forail resource kinds (organizations, teams, projects, inventories, credentials, job templates, schedules, DAG workflows, and remote Forail instances) declaratively as Kubernetes Custom Resources, and can route different CRs to different Forail backends.

This page is the long-form companion to the Docker Compose Deployment guide. If you already run a Kubernetes cluster, the Helm path is the recommended deployment for production: it manages secrets natively, supports rolling upgrades, and integrates with Ingress controllers, External Secrets, and the wider ecosystem.


When to Pick Kubernetes

| | Docker Compose | Kubernetes (Helm + Operator) |
|---|---|---|
| Best for | Single-host deployments, evaluation, small teams | Multi-node clusters, GitOps, declarative job-template management |
| HA | No (single node) | Yes (multiple replicas, multi-master cluster) |
| Storage | Bind mounts | PVCs (any CSI driver) |
| Secrets | .env file | k8s Secrets / External Secrets / Sealed Secrets |
| Ingress | nginx + Let’s Encrypt scripts | Any IngressController + cert-manager |
| Job templates | Created via UI / API only | UI / API or kubectl apply -f jt.yaml |
| Upgrade | docker compose pull && docker compose up -d | helm upgrade (rolling) |

Architecture on Kubernetes

The chart deploys six application workloads plus the Forail backend itself. Every workload runs as its own Deployment (or StatefulSet for stateful data), and is fronted by a ClusterIP Service for in-cluster traffic. Ingress traffic enters through a single Ingress resource that fans out by URL path.

                        ┌──────────────────┐
                        │   Ingress (TLS)  │  forail.lan
                        │  Traefik / NGINX │
                        └─────────┬────────┘
              ┌───────────────────┼─────────────────────┐
              │   /api /admin /static /sso /websocket   │  /
              ▼                                          ▼
      ┌───────────────┐                          ┌────────────────┐
      │  forail-web   │  Django + uwsgi + daphne │ forail-frontend│  React SPA
      │  (Deployment) │  ports: 8013 8015        │  (Deployment)  │  port: 80
      └──┬──────┬──┬──┘                          └────────────────┘
         │      │  │
         │      │  └──── OPA (sidecar via service forail-opa:8181) ──── policy decisions
         │      │
         │      └──── OTel Collector (forail-otel-collector:4317) ──── traces / metrics
         │
         ├──── Postgres (StatefulSet, forail-postgres:5432, PVC 8 Gi)
         │
         ├──── Redis (Deployment, forail-redis:6379, PVC 2 Gi)
         │
         └──── forail-task (Deployment, no Service)
                ├── ansible-runner + podman in privileged container
                ├── Receptor mesh (control socket /var/run/awx-receptor/receptor.sock)
                └── shared PVC forail-projects (mounted in both web + task)

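Two extra resources sit outside this picture: the forail-init Job, a one-shot that runs once per release revision, and the operator’s Custom Resource Definitions.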


Namespaces

The chart and operator each install into their own namespace. This separation lets you grant the operator only the permissions it needs and lets you upgrade Forail without touching the operator (or vice versa).

| Namespace | Contents | Created by |
|---|---|---|
| forail | Postgres, Redis, OPA, OTel Collector, forail-web, forail-task, forail-frontend, forail-init Job, Ingress, all chart Secrets and ConfigMaps | helm install forail ./forail-helm -n forail --create-namespace (or pre-created if you need to seed pull-secrets first) |
| forail-operator | The operator Deployment, its ServiceAccount + ClusterRole + ClusterRoleBinding, and one Secret holding the Forail OAuth2 token | helm install forail-operator ./forail-operator/helm -n forail-operator --create-namespace |
| any | The nine CRD instances (Organization, Team, Project, Inventory, Credential, JobTemplate, Schedule, Workflow, ForailInstance). Each CR is namespace-scoped; the operator watches all namespaces and reconciles them centrally against Forail. | You, via kubectl apply -f cr.yaml |

The chart’s top-level namespace.create value defaults to false because pre-creating the namespace is the common path: you usually want to kubectl create it first so you can add a registry pull-secret (only needed if you mirror the images to a private registry) and a TLS secret before the Pods start trying to mount them.


Networking

Services and ports

Every workload that other workloads (or the Ingress) need to reach gets a ClusterIP Service. forail-task is the only one without a Service — nothing reaches into it; it pulls work off Redis and pushes it through Receptor.

| Service | Ports | Selector | Reached by |
|---|---|---|---|
| forail-web | 8013 (HTTP API), 8015 (websocket) | component=web | Ingress, forail-task callback, forail-operator |
| forail-frontend | 80 (HTTP) | component=frontend | Ingress (path /) |
| forail-postgres | 5432 | component=postgres | forail-web, forail-task, forail-init |
| forail-redis | 6379 | component=redis | forail-web, forail-task |
| forail-opa | 8181 (REST policy decision API) | component=opa | forail-web (policy checks) |
| forail-otel-collector | 4317 (gRPC), 4318 (HTTP) | component=otel-collector | forail-web, forail-task (traces + metrics) |

Cluster DNS

Workloads reach each other through the cluster DNS at <service>.<namespace>.svc.cluster.local. Inside the forail namespace the bare service name resolves too, so forail.otel.endpoint defaults to http://forail-otel-collector:4317. The operator reaches the API at the FQDN http://forail-web.forail.svc.cluster.local:8013 because it lives in a different namespace.
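
The naming rule above can be captured in a small helper. This is a minimal sketch: svc_fqdn is an illustrative function name rather than anything the chart ships, and cluster.local assumes the cluster uses the default DNS domain.

```shell
#!/bin/sh
# Build the in-cluster DNS name for a Service.
# svc_fqdn is a hypothetical helper; "cluster.local" assumes the default cluster domain.
svc_fqdn() {
    service=$1
    namespace=$2
    printf '%s.%s.svc.cluster.local\n' "$service" "$namespace"
}

# URL the operator uses to reach the API from another namespace (port 8013).
FORAIL_URL="http://$(svc_fqdn forail-web forail):8013"
echo "$FORAIL_URL"
# prints: http://forail-web.forail.svc.cluster.local:8013
```

Inside the forail namespace the short form (http://forail-web:8013) would resolve as well; the FQDN is only required across namespaces.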

Ingress and URL routing

A single Ingress resource named forail routes forail.lan by URL path. The chart’s default uses Traefik (the chart’s ingress.className: traefik), but any IngressController works — change className to nginx, contour, etc. Path order matters because Forail owns five distinct path prefixes that must take precedence over the SPA catch-all.

| Path | Target Service | Why |
|---|---|---|
| /api | forail-web:8013 | REST API (/api/v2/*) |
| /admin | forail-web:8013 | Django admin |
| /sso | forail-web:8013 | SAML / OIDC / social-auth callbacks |
| /websocket | forail-web:8015 | Job event streaming over WS |
| /static/forail | forail-frontend:80 | SPA assets; must come before /static below |
| /static | forail-web:8013 | Django staticfiles (admin CSS, DRF browser) |
| / | forail-frontend:80 | SPA catch-all |
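
To see why the order matters, the table can be simulated as a first-match lookup. This is only a sketch of the ordering logic: route_path is a hypothetical helper, and a real IngressController applies its own precedence rules rather than a shell case statement.

```shell
#!/bin/sh
# Simulate first-match routing over the path table, most specific prefix first.
# route_path is a hypothetical helper, not something the chart ships.
route_path() {
    case "$1" in
        /api|/api/*)                     echo "forail-web:8013" ;;
        /admin|/admin/*)                 echo "forail-web:8013" ;;
        /sso|/sso/*)                     echo "forail-web:8013" ;;
        /websocket|/websocket/*)         echo "forail-web:8015" ;;
        /static/forail|/static/forail/*) echo "forail-frontend:80" ;;  # must precede /static
        /static|/static/*)               echo "forail-web:8013" ;;
        *)                               echo "forail-frontend:80" ;;  # SPA catch-all
    esac
}

route_path /static/forail/main.js   # forail-frontend:80
route_path /static/admin/base.css   # forail-web:8013
route_path /api/v2/ping/            # forail-web:8013
route_path /jobs/42                 # forail-frontend:80
```

If the /static/forail and /static branches were swapped, the first request would be sent to forail-web and the SPA assets would not load.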

Hostname TLD pitfall (.local vs .lan)

The chart’s ingress.host default is forail.lan, not forail.local. The reason: most desktop Linux distros ship nsswitch.conf with mdns_minimal [NOTFOUND=return] ahead of files, which intercepts every .local lookup, fails it, and bypasses /etc/hosts entirely. The browser then shows Server Not Found even though curl --resolve forail.local:30080:<node-ip> ... works fine. The .lan TLD has no such hijack.
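
Whether a given laptop is affected can be checked from its hosts: line. The following is a rough sketch; mdns_before_files is a hypothetical helper written for this page, and it only inspects the order of the entries on one line.

```shell
#!/bin/sh
# Report whether an nsswitch "hosts:" line consults mdns_minimal before files.
# mdns_before_files is a hypothetical helper written for this page.
mdns_before_files() {
    # $1 is the full "hosts:" line from nsswitch.conf
    echo "$1" | awk '{
        mdns = 0; files = 0
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^mdns[46]?_minimal$/ && mdns == 0) mdns = i
            if ($i == "files" && files == 0) files = i
        }
        # exit 0 (true) when mdns_minimal appears and files is absent or later
        exit !(mdns > 0 && (files == 0 || mdns < files))
    }'
}

# On a real machine: line=$(grep '^hosts:' /etc/nsswitch.conf)
line='hosts: mdns_minimal [NOTFOUND=return] files dns'
if mdns_before_files "$line"; then
    echo ".local lookups are answered by mDNS before /etc/hosts"
fi
```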

For host-based access from a developer laptop, add to /etc/hosts:

192.168.56.32  forail.lan

Exposing the cluster

How Ingress traffic actually reaches your nodes depends on the IngressController service type:

CNI quirk on VirtualBox dev clusters

If you build a multi-VM cluster on VirtualBox host-only networks, Flannel’s default --iface auto-detect picks eth0 (the NAT adapter). Pod-to-pod traffic across nodes then dies, with symptoms like connect: no route to host from any cross-node service VIP. Patch the Flannel DaemonSet to bind to the host-only interface explicitly:

kubectl -n kube-flannel patch ds kube-flannel-ds --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--iface=eth1"}]'

The forail-dev-cluster Vagrant repo bakes this patch into master-init.sh.


Storage

Five PVCs are created. Their sizes can be overridden in values.yaml; start with the defaults and grow the underlying storage when you outgrow them.

| PVC | Default size | Mode | Mounted in | Purpose |
|---|---|---|---|---|
| forail-postgres | 8 Gi | RWO | StatefulSet forail-postgres | Database files |
| forail-redis | 2 Gi | RWO | Deployment forail-redis | RDB snapshots / AOF |
| forail-projects | 4 Gi | RWX | forail-web + forail-task | Synced project repos (_<id>__<slug>) |
| forail-receptor | 2 Gi | RWO | forail-task | Receptor work units (/tmp/receptor) |
| forail-backups | 10 Gi | RWO | forail-web (mount-only) | Where backup.sh writes .sql.gz archives |

RWX requirement: forail-projects is mounted in both forail-web and forail-task. If those Pods land on different nodes, RWX is mandatory. If you’re forced to use ReadWriteOnce (e.g. local-path-provisioner on bare metal), pin both Deployments to the same node with a nodeSelector or podAffinity; otherwise the second Pod will sit Pending forever.

Override the StorageClass per-PVC with values like postgres.storage.storageClass: ceph-rbd. Leaving it empty falls back to the cluster default StorageClass.


Prerequisites

Container images

Forail images are published to the public GitHub Container Registry: ghcr.io/forail-platform/*. No pull secret is required, so the only preparation is to create the namespace:

kubectl create namespace forail

If you mirror the images to your own registry, override images.backend.repository and images.frontend.repository in values.yaml and add an imagePullSecrets entry pointing at your in-cluster docker-registry Secret.

TLS secret

Pre-create a TLS Secret if you want HTTPS on first boot. Self-signed is fine for dev:

openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -keyout tls.key -out tls.crt -subj '/CN=forail.lan' \
    -addext 'subjectAltName=DNS:forail.lan,DNS:*.forail.lan'

kubectl -n forail create secret tls forail-tls --cert=tls.crt --key=tls.key

Install Forail core

git clone git@github.com:forail-platform/forail-helm.git
cd forail-helm

helm install forail . -n forail \
    --set forail.admin.user=admin \
    --set secrets.forailAdminPassword='<strong-password>' \
    --set secrets.postgresPassword='<random-32-bytes>' \
    --set secrets.forailSecretKey="$(openssl rand -hex 32)" \
    --set secrets.forailBroadcastWebsocketSecret="$(openssl rand -hex 32)"
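
The secret values can be generated once up front and reused across retries. A minimal sketch, assuming openssl is on the PATH; the variable names are illustrative, and only the --set keys come from the install command above.

```shell
#!/bin/sh
# Generate the install-time secrets once and keep them in shell variables.
# Variable names are illustrative; pass them to the --set keys shown in the install command.
FORAIL_SECRET_KEY=$(openssl rand -hex 32)   # 32 random bytes -> 64 hex characters
FORAIL_WS_SECRET=$(openssl rand -hex 32)
POSTGRES_PASSWORD=$(openssl rand -hex 32)   # hex avoids characters that need escaping in --set

echo "secret key length: ${#FORAIL_SECRET_KEY}"
# prints: secret key length: 64
```

Hex output is used for the Postgres password as well because helm treats commas in --set values as separators, so a password containing one would have to be escaped.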

The install runs synchronously through the forail-init Job. Watch progress with:

kubectl -n forail get pods -w

Expected end state — every Pod Running 1/1, forail-init-1 Completed:

NAME                               READY   STATUS      RESTARTS   AGE
forail-frontend-7f8c5d9c8b-abcde   1/1     Running     0          3m
forail-init-1-xxxxx                0/1     Completed   0          3m
forail-opa-6cdfb9d79c-fghij        1/1     Running     0          3m
forail-otel-collector-...          1/1     Running     0          3m
forail-postgres-0                  1/1     Running     0          3m
forail-redis-...                   1/1     Running     0          3m
forail-task-...                    1/1     Running     0          3m
forail-web-...                     1/1     Running     0          3m
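
The end state above can also be checked mechanically instead of by eye. This is a sketch under the assumption that the default kubectl get pods column layout is used; pods_settled is a hypothetical helper.

```shell
#!/bin/sh
# Read `kubectl get pods` output on stdin; succeed only if every Pod is
# Running with all containers ready, or Completed (the forail-init Job).
# pods_settled is a hypothetical helper.
pods_settled() {
    awk 'NR > 1 {
        split($2, ready, "/")
        ok = ($3 == "Completed") || ($3 == "Running" && ready[1] == ready[2])
        if (!ok) { print "not ready: " $1; bad = 1 }
    }
    END { exit bad }'
}

# Usage against a live cluster: kubectl -n forail get pods | pods_settled
printf '%s\n' \
    'NAME                READY   STATUS      RESTARTS   AGE' \
    'forail-init-1-xxxxx 0/1     Completed   0          3m' \
    'forail-postgres-0   1/1     Running     0          3m' | pods_settled && echo "all pods settled"
```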

Optional: AI Assistant

Chart 0.3.0 introduces an optional forail-assistant deployment that wraps the all-in-one image (Ollama + ChromaDB + RAG API in one container). It is disabled by default — enable it on a fresh install or on an upgrade:

helm upgrade forail . -n forail \
    --reuse-values \
    --set assistant.enabled=true

The chart provisions a PersistentVolumeClaim (forail-assistant-data, default 20 Gi) so the LLM model cache and Chroma vector store survive pod restarts. On first boot the entrypoint pulls the configured model (gemma3:1b by default, ≈800 MB) and the embedding model (nomic-embed-text), then indexes the bundled documentation — allow ~5 minutes before the pod is Ready. The startupProbe is sized for this (failureThreshold: 30 at periodSeconds: 10) so the liveness probe will not kill the pod mid-bootstrap.
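
The probe sizing works out as follows. The arithmetic ignores any initial delay, which the text above does not mention.

```shell
#!/bin/sh
# Startup budget = failureThreshold x periodSeconds (values quoted above).
failure_threshold=30
period_seconds=10
budget_seconds=$((failure_threshold * period_seconds))
echo "startup budget: ${budget_seconds}s ($((budget_seconds / 60)) min)"
# prints: startup budget: 300s (5 min)
```

If you switch to a larger model whose first pull takes longer than this budget, the threshold would need to grow accordingly.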

Key values you may want to override:

| Value | Default | Notes |
|---|---|---|
| assistant.model | gemma3:1b | Switch to llama3.1:8b or mistral:7b for higher answer quality at the cost of memory and latency. |
| assistant.storage.size | 20Gi | Bump to 30 Gi+ if you change models or grow the indexed corpus. |
| assistant.resources.limits.memory | 4Gi | gemma3:1b inference peaks ~1.5 Gi; larger models need 8 Gi+. |

The Service is reachable cluster-internally at forail-assistant.forail.svc.cluster.local:8100. Ingress routing is intentionally not wired up by this release — expose it with a follow-up middleware (Traefik stripPrefix on /assistant) once you decide on a URL contract.


Install the Operator

The operator is shipped separately because not everyone wants declarative-via-CRD management — some teams prefer the UI. Install only after Forail core is healthy (the operator needs to authenticate to it on startup).

Generate a Forail OAuth2 PAT

The operator authenticates with a personal access token issued by Forail. Generate one inside the running web Pod:

TOKEN=$(kubectl -n forail exec deploy/forail-web -- \
    forail-manage create_oauth2_token --user admin | tail -1)
echo "$TOKEN"

The token is shown only once, so copy it now. To revoke it later, run forail-manage revoke_oauth2_tokens --user admin.

Install with Helm

git clone git@github.com:forail-platform/forail-operator.git
cd forail-operator

helm install forail-operator helm/ \
    -n forail-operator --create-namespace \
    --set forail.token="$TOKEN" \
    --set forail.url=http://forail-web.forail.svc.cluster.local:8013

Verify:

kubectl -n forail-operator logs deploy/forail-operator -f

You should see Starting workers for each of the nine reconcilers (organization, team, project, inventory, credential, jobtemplate, schedule, workflow, forailinstance).

For full coverage of the v1.0.0 release — multi-cluster routing via ForailInstance, the OLM bundle, and the declarative Workflow DAG model — see the dedicated Operator v1.0.0 guide.

Custom Resources

Each CR maps to a Forail primary-key after first reconcile (status.forailId). The operator owns the resource via a finalizer — deleting the CR deletes the Forail resource too. Re-applying a CR after manual edits in the Forail UI overwrites the UI changes (drift is detected on a 60s requeue and reconciled toward the CR).

Sample CRs live in forail-operator/config/samples/:

| Kind | Maps to | Sensitive fields |
|---|---|---|
| Organization | /api/v2/organizations | None (top-level tenant container with max-host quota) |
| Team | /api/v2/teams + member sync at /teams/{id}/users/ | None (spec.users[] resolves usernames at reconcile) |
| Project | /api/v2/projects | SCM credential resolved by name from an existing Credential CR |
| Inventory | /api/v2/inventories + nested hosts & groups | None (pure spec) |
| Credential | /api/v2/credentials | spec.inputsFrom[] reads sensitive values from k8s Secrets in the same namespace; the operator watches those Secrets and re-syncs on rotation |
| JobTemplate | /api/v2/job_templates with credential / project / inventory references resolved by name | None |
| Schedule | /api/v2/schedules | None (RFC 5545 RRULE drives recurrence) |
| Workflow | /api/v2/workflow_job_templates + node DAG at /workflow_job_templates/{id}/workflow_nodes/ + edges | Declarative spec.nodes[] keyed by identifier with successNodes / failureNodes / alwaysNodes graph references |
| ForailInstance | Control-plane only: describes a Forail backend that other CRs target via spec.forailInstance | Bearer token sourced from a k8s Secret via spec.tokenSecretRef |

Verification

After both Helm releases land:

# Forail API healthy
kubectl -n forail port-forward svc/forail-web 8013:8013 &
sleep 2   # give the port-forward time to bind before the first request
curl -s http://localhost:8013/api/v2/ping/ | jq

# Browser access (assuming /etc/hosts is set)
open http://forail.lan:30080/

# Operator wired up
kubectl get crd | grep forail-platform.io
kubectl apply -f https://raw.githubusercontent.com/forail-platform/forail-operator/main/config/samples/inventory-sample.yaml
kubectl get inventory production -o jsonpath='{.status}'

Day-2 Operations

Upgrade

git -C forail-helm pull
helm upgrade forail ./forail-helm -n forail --reuse-values

The chart re-runs forail-init on each upgrade (its name embeds {{ .Release.Revision }}). The init script is idempotent. If you upgrade against a populated DB, the migrate step finishes in milliseconds; the heavier preload-data and execution-environment registration steps short-circuit when records already exist.

Backup

kubectl -n forail exec deploy/forail-web -- /usr/local/bin/backup.sh
kubectl -n forail cp forail-web-xxx:/var/backups/forail/forail-<ts>.sql.gz ./forail.sql.gz
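
After copying an archive out of the Pod, it is worth confirming that the file arrived intact. A minimal sketch, assuming gzip is installed locally; verify_backup is a hypothetical helper and checks only the gzip stream, not the SQL inside it.

```shell
#!/bin/sh
# Check that a downloaded backup archive is non-empty and a valid gzip stream.
# verify_backup is a hypothetical helper; it does not validate the SQL inside.
verify_backup() {
    archive=$1
    [ -s "$archive" ] || { echo "missing or empty: $archive" >&2; return 1; }
    gzip -t "$archive" 2>/dev/null || { echo "corrupt gzip: $archive" >&2; return 1; }
    echo "ok: $archive"
}

# Demo on a throwaway file so the sketch runs without a cluster.
tmpdir=$(mktemp -d)
printf 'SELECT 1;\n' | gzip > "$tmpdir/forail.sql.gz"
verify_backup "$tmpdir/forail.sql.gz"
rm -rf "$tmpdir"
```

In practice you would call verify_backup ./forail.sql.gz right after the kubectl cp step.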

Scale forail-task

Forail dispatches jobs to any forail-task replica via the Receptor mesh. Increase task.replicas in values.yaml for more concurrent jobs. Keep in mind:


Customizing values.yaml

The shipped values.yaml is heavily commented. The most commonly tuned values:

| Path | Default | Why you’d change it |
|---|---|---|
| images.backend.repository | ghcr.io/forail-platform/forail-backend | Mirror to your own registry |
| images.backend.tag | latest | Pin to a CalVer release |
| secrets.forailAdminPassword | changeme-admin | Always: it’s a placeholder |
| ingress.host | forail.lan | Your real DNS name |
| ingress.tls.enabled | true | Disable for plain-HTTP test clusters |
| postgres.storage.size | 8Gi | Production sizing |
| postgres.enabled | true | Set false + override DB env to point at an external Postgres (RDS, Cloud SQL) |
| otelCollector.enabled | true | Disable if you ship an OTel Collector at the cluster level (DaemonSet) |
| web.replicas / task.replicas | 1 / 1 | HA / throughput |
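
Several of these overrides are easier to review when kept in a file than when passed as --set flags. The sketch below uses only value paths from the table; the file name forail-overrides.yaml and every right-hand value are placeholders to replace.

```shell
#!/bin/sh
# Write a small override file using only value paths listed in the table.
# The file name and every value on the right-hand side are placeholders.
cat > forail-overrides.yaml <<'EOF'
images:
  backend:
    tag: "<calver-release>"          # pin instead of latest
secrets:
  forailAdminPassword: "<strong-password>"
ingress:
  host: forail.example.com           # hypothetical DNS name
  tls:
    enabled: true
postgres:
  storage:
    size: 20Gi                       # example production size
EOF

echo "apply with: helm upgrade forail ./forail-helm -n forail --reuse-values -f forail-overrides.yaml"
```

Keeping the admin password in a file means the file itself needs protecting; treat it like any other secret material.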

Troubleshooting

Browser shows “Server Not Found” for forail.local

That’s the mDNS hijack described above. Use the chart default (forail.lan) and update /etc/hosts accordingly. If you’re locked into a .local hostname, edit /etc/nsswitch.conf on every developer laptop to put files before mdns_minimal — but this is system-wide and may break Avahi-discovered devices on the LAN.

Jobs fail with “unknown work type kubernetes-incluster-auth”

The Forail instance is registered as node_type=control instead of hybrid. Control nodes only orchestrate; they refuse to execute jobs locally and try to dispatch them to a ContainerGroup. The chart’s init.sh runs an explicit ORM update after provision_instance to fix this — if you wrote a custom init, replicate that step:

forail-manage shell -c "
from forail.main.models import Instance
i = Instance.objects.get(hostname='<node>')
i.node_type = 'hybrid'; i.save(update_fields=['node_type'])
"

The same script also sets is_container_group=False on the auto-created default InstanceGroup; the post-migrate signal sets it to True when Forail detects it’s running in k8s.

forail-task pod restarts with exit code 137

OOMKilled — the default memory limit was 2 Gi in earlier chart versions. Bump task.resources.limits.memory to 4Gi (the current default). Memory is tight because supervisord runs uwsgi + dispatcher + Receptor + ansible-runner + a podman EE container concurrently.
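
Exit code 137 follows the usual convention that a status above 128 encodes the terminating signal. A quick way to decode it:

```shell
#!/bin/sh
# Exit statuses above 128 mean "terminated by signal (status - 128)".
exit_code=137
signal=$((exit_code - 128))
echo "exit $exit_code = signal $signal ($(kill -l "$signal"))"
# prints: exit 137 = signal 9 (KILL)
```

SIGKILL on its own does not prove an out-of-memory kill; kubectl describe pod reports the termination reason (OOMKilled) alongside the exit code.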

Job error “Error updating status file /tmp/receptor/.../status.lock: no such file or directory”

The Receptor work-unit directory disappeared. This used to happen when forail-task was OOMKilled mid-job and the pod restart wiped /tmp. Mitigation:

  1. Bump task memory (above)
  2. The chart now mounts the forail-receptor PVC at /tmp/receptor so work units survive Pod restarts

Pods on different nodes can’t reach each other (VirtualBox only)

See the CNI quirk note — Flannel must be patched to bind to the host-only interface (eth1), not the NAT-mode eth0. forail-dev-cluster bakes this into provisioning.


Companion Repositories

| Repo | What it ships |
|---|---|
| forail-helm | The Helm chart described on this page |
| forail-operator | The Kubernetes operator with nine CRDs |
| forail-dev-cluster | 4-VM Vagrant + VirtualBox cluster (2 master + 2 worker, k8s 1.30) for chart and operator integration testing; includes the Flannel eth1 patch and a post-cluster-setup.sh that installs Traefik, local-path-provisioner, and the forail namespace prerequisites |
| forail-devops | The Docker Compose deployment described on the Deployment page: same image set, single-host topology |