ConnectAI Docs

Advanced setup

Self-hosting (advanced)

The Quickstart installer (npx @connectai/selfhost run) is the recommended way to stand up ConnectAI: it pulls prebuilt images and needs no source. This page is the advanced path for operators who want to drive docker-compose directly, tune the environment, or build from source. If you only want a running brain on localhost, use the Quickstart and skip this page.

For the exhaustive operator reference (every connector model, the first-run wizard, day-2 operations, and the full troubleshooting matrix), see the SELF_HOSTING.md guide that ships in your checkout. This page covers the v1 setup funnel: the compose path, environment, vault, and models.

Prerequisites

  • Docker with the Compose plugin (docker compose version reports v2 or newer).
  • `bash`, `curl`, `openssl`, `python3` on PATH (the boot script uses them to generate secrets and provision the vault).
  • Disk and RAM headroom. The default profile runs Postgres, Infisical (plus its own Postgres and Redis), the API, the loop worker, the console, and Ollama. Budget roughly 6 to 8 GB RAM to boot and 8 GB or more free disk for the Ollama models.
  • Inference horsepower for real workloads. The bundled Ollama runs the chat model on CPU by default, which is fine for evaluation but slow under load. For production latency, run on a host with a supported GPU (Ollama uses it automatically) or bring your own hosted inference and drop Ollama entirely.
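
The tool prerequisites above can be checked before the first boot. This is a minimal preflight sketch, not part of the shipped scripts: `missing_tools` is a hypothetical helper name, and the tool list is copied from the bullets above.

```shell
# Print the name of every listed tool that is not on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools named in the prerequisites list.
missing=$(missing_tools docker bash curl openssl python3)
if [ -n "$missing" ]; then
  printf 'missing from PATH:\n%s\n' "$missing" >&2
else
  echo "all prerequisite tools found"
fi

# Compose v2 is a Docker CLI plugin, so this fails when only the
# legacy standalone docker-compose binary is installed.
if command -v docker >/dev/null 2>&1; then
  docker compose version || echo "Compose plugin not found" >&2
fi
```

The sketch only reports; it does not check RAM, disk, or GPU support.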

The compose path

From a clean checkout, one command stands up the whole stack:

bash
make selfhost

That is equivalent to bash scripts/selfhost/up.sh. It creates deploy/selfhost/.env from the example, generates the random secrets, brings up the Infisical vault and waits for it, provisions a vault machine identity, brings up the full stack, and waits for the API /health check and the console.

If you prefer to drive the steps yourself (or to debug one), the command above is exactly these four, with the same --env-file flag:

bash
cp deploy/selfhost/.env.example deploy/selfhost/.env   # then edit / fill secrets
docker compose --env-file deploy/selfhost/.env -f docker-compose.selfhost.yml up -d infisical
./scripts/selfhost/provision-infisical.sh              # writes INFISICAL_* to the .env
docker compose --env-file deploy/selfhost/.env -f docker-compose.selfhost.yml up -d

The --env-file deploy/selfhost/.env flag matters. Compose reads that file for ${VAR} interpolation (the Infisical encryption and auth secrets). A service's env_file: only injects variables into that container; it does not feed Compose's own ${VAR} substitution. Without the flag, Infisical boots with a blank key and crash-loops.
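
Because a blank interpolated secret only shows up as a crash loop, it can help to confirm the values exist before starting Infisical. The sketch below is an illustration under assumptions: `env_value` and `require_env_keys` are hypothetical helpers, and the key names `ENCRYPTION_KEY` and `AUTH_SECRET` are placeholders for whatever `${VAR}` names your compose file actually interpolates.

```shell
# Print the value assigned to a key in a dotenv-style file (last assignment wins).
env_value() {
  # $1 = env file, $2 = key
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Succeed only if every listed key has a non-empty value in the file.
require_env_keys() {
  env_file=$1; shift
  status=0
  for key in "$@"; do
    if [ -z "$(env_value "$env_file" "$key")" ]; then
      echo "blank or missing: $key" >&2
      status=1
    fi
  done
  return $status
}

# Key names here are assumptions; substitute the ${VAR} names from your compose file.
if [ -f deploy/selfhost/.env ]; then
  require_env_keys deploy/selfhost/.env ENCRYPTION_KEY AUTH_SECRET \
    || echo "fill the blank values before booting Infisical" >&2
fi
```

To see what Compose will actually substitute, `docker compose --env-file deploy/selfhost/.env -f docker-compose.selfhost.yml config` prints the fully resolved file.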

When it finishes:

Console:  http://localhost:5273
API:      http://localhost:4000/health
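
When driving the steps by hand, nothing waits for these two URLs the way the boot script does. A minimal polling sketch follows; `wait_until` is a hypothetical helper, and it assumes /health answers with a 2xx status once the API is ready.

```shell
# Retry a command until it succeeds or the attempts run out.
# $1 = max attempts, $2 = seconds between attempts, rest = command to run.
wait_until() {
  attempts=$1; delay=$2; shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    "$@" >/dev/null 2>&1 && return 0
    n=$((n + 1))
    sleep "$delay"
  done
  return 1
}

# Example calls (curl -f turns an HTTP error status into a non-zero exit):
# wait_until 60 5 curl -fsS http://localhost:4000/health
# wait_until 60 5 curl -fsS http://localhost:5273
```

Sixty attempts five seconds apart gives roughly five minutes, which is a guess; a cold source build or a first model pull may need longer.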

Get the one-time setup token (printed once to the API logs at boot):

bash
make selfhost-token

Then open http://localhost:5273, which routes you into the first-run wizard because the instance is unconfigured.

Prebuilt images (skip the source build)

The default compose builds the api, loop-svc, and console images from source on the first up, which can take several minutes. To run prebuilt images instead, docker-compose.selfhost.images.yml references the published GHCR images and only ever pulls:

bash
docker compose -f docker-compose.selfhost.images.yml --env-file ./connectai.env up -d

This is exactly what the npx @connectai/selfhost installer wraps. The three first-party services pin the published prebuilt connectai-app and connectai-console images at ${CONNECTAI_IMAGE_TAG}.

Deploying beyond localhost

The default boot points every origin at localhost. The moment you serve from a real host or domain, set three URLs together, because one of them is baked into the console at build time:

| Variable | Set it to | Why |
| --- | --- | --- |
| VITE_API_BASE_URL | your public API origin | Baked into the console bundle. Every browser calls the API here. If it stays localhost, requests hit the visitor's own machine. This is the most common self-host mistake. |
| PUBLIC_BASE_URL | your public API origin | The API's public URL: the OAuth redirect base and the session-cookie Secure decision derive from it. |
| WEB_ORIGIN | your public console origin | The API trusts this for CORS and the post-connect redirect. It must byte-match the browser Origin (scheme, host, port), with no path or trailing slash. |

Use https for all three and terminate TLS at your reverse proxy. Keep the console and API on the same registrable domain (subdomains like console.example.com and api.example.com are fine), because the __session cookie is SameSite=Lax.
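
The WEB_ORIGIN rule (scheme, host, optional port, nothing after) is easy to get wrong by one trailing slash. The check below is a rough sketch of that rule, not the API's own validation: `is_bare_origin` is a hypothetical helper, and the pattern deliberately ignores IPv6 literals.

```shell
# Succeed if $1 is scheme://host[:port] with no path, query, or trailing slash.
is_bare_origin() {
  printf '%s\n' "$1" | grep -Eq '^https?://[A-Za-z0-9.-]+(:[0-9]+)?$'
}

for origin in "https://console.example.com" "https://console.example.com/"; do
  if is_bare_origin "$origin"; then
    echo "ok: $origin"
  else
    echo "rejected: $origin"
  fi
done
# prints:
#   ok: https://console.example.com
#   rejected: https://console.example.com/
```

Passing this check does not guarantee a byte-match with what the browser sends; compare against the Origin header in a real request if CORS still fails.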

On a prebuilt console image you cannot rebuild VITE_API_BASE_URL per boot. Set CONSOLE_API_BASE_URL in the env file instead: the console entrypoint writes /config.js at container start and the SPA prefers it over the baked value.

Environment reference

These variables live in deploy/selfhost/.env (created from .env.example). The boot script fills in the secrets for you; these are the values that matter most:

| Variable | What it is | Default / note |
| --- | --- | --- |
| PUBLIC_BASE_URL | Public URL the API is reached at; the OAuth redirect base. | http://localhost:4000 for local eval. |
| WEB_ORIGIN | The console origin the API trusts for CORS. Byte-match the browser Origin. | http://localhost:5273. |
| VITE_API_BASE_URL | Baked into the console bundle at build time. | http://localhost:4000. |
| AUTH_MODE | local (first-party password, the self-host default), oidc, or firebase (cloud only). | local. Pinned in the compose so it cannot be unset by editing .env. |
| COBRAIN_DEPLOYMENT_MODE | selfhost arms the fail-closed inference guard. | selfhost. Pinned in the compose. |
| COBRAIN_INFERENCE_PROVIDER | ollama (bundled, offline) or byo (your own endpoint). Env-pinned: change here and restart. | ollama. |
| OLLAMA_MODEL / OLLAMA_EMBED_MODEL | The bundled chat and embedding models. | qwen2.5:1.5b / mxbai-embed-large (the embed model must emit 1024-dim vectors). |
| POSTGRES_PASSWORD / DATABASE_URL | The brain Postgres credentials. | Example default. Change for anything but local eval. |

Most config is env-pinned: the inference provider, OAuth client credentials, auth mode, and public base URL change only by editing .env and restarting (re-run make selfhost). The console Settings page shows these read-only. The one genuinely runtime-editable value is the admin password.
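
Since env-pinned values change only through the file, a small helper that rewrites one key without leaving duplicates can make the edit-and-restart loop less error-prone. This is a sketch under assumptions: `set_env_var` is a hypothetical helper, and it treats the file as plain KEY=VALUE lines.

```shell
# Set KEY=VALUE in a dotenv-style file, replacing any existing assignment.
set_env_var() {
  # $1 = env file, $2 = key, $3 = value
  tmp=$(mktemp) || return 1
  # Keep every line except existing assignments of this key.
  # grep exits non-zero when nothing is kept or the file is absent; that is fine here.
  grep -v "^$2=" "$1" 2>/dev/null > "$tmp" || true
  printf '%s=%s\n' "$2" "$3" >> "$tmp"
  # The rewritten file takes mktemp's owner-only permissions.
  mv "$tmp" "$1"
}
```

For example, `set_env_var deploy/selfhost/.env COBRAIN_INFERENCE_PROVIDER byo` followed by `make selfhost` switches the provider. The new assignment is appended at the end, so comments that sat next to the old line no longer sit beside it.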

The vault

Self-host runs the bundled Infisical vault by default. Credential and session plaintext (OAuth tokens, the session signing key) live only in Infisical under the cred_<id> convention; the brain database holds only vault keys plus non-sensitive metadata, never plaintext. This satisfies the key-separation requirement of the binding database privacy rule: the vault key is separate from the encrypted data zone.

The Infisical image is pinned (infisical/infisical:v0.159.28), not latest, because its self-hosted admin API drifts between versions and the provisioning script is written against the pinned version. If you bump the pin, re-verify the provisioning flow, or provision the machine identity in the Infisical UI at http://localhost:8082 and paste the three INFISICAL_* values into your .env.

Inference and model right-sizing

Bundled Ollama (default, offline). No external key, and your data never leaves the box. The trade-off is the model pull: the chat model and the 1024-dim embedding model are several GB and the first pull takes minutes. To pull manually:

bash
docker compose --env-file deploy/selfhost/.env -f docker-compose.selfhost.yml \
  exec ollama ollama pull qwen2.5:1.5b
docker compose --env-file deploy/selfhost/.env -f docker-compose.selfhost.yml \
  exec ollama ollama pull mxbai-embed-large

Host RAM and choosing the chat model. The chat model and the embedder both load into RAM at the same time during ingest and ask. A chat model that does not leave room for the embedder gets OOM-killed, Ollama returns HTTP 500, and per-record ingest fails. The default is right-sized so the substrate works on a modest box:

| Model | RAM (with embedder) | Fits | Trade-off |
| --- | --- | --- | --- |
| qwen2.5:1.5b (default) | ~8 GB | modest box | Ingest, retrieval, and citation work. Composes ask prose weakly. |
| qwen2.5:3b | ~8 to 12 GB | mid box | Composes better; slow on CPU-only. |
| qwen3.5:latest | ~16 GB or more | large box / GPU | Strongest local composer; OOMs a small box. |
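
One way to turn the table into a starting point is to read the host's total RAM and map it to a row. The thresholds below are one conservative reading of the table, not values from ConnectAI itself, and `suggest_chat_model` is a hypothetical helper.

```shell
# Map total host RAM (whole GB) to a chat model from the table above.
# Thresholds are an assumption: 16+ GB for the largest model, 12+ GB for the 3b.
suggest_chat_model() {
  if [ "$1" -ge 16 ]; then
    echo "qwen3.5:latest"
  elif [ "$1" -ge 12 ]; then
    echo "qwen2.5:3b"
  else
    echo "qwen2.5:1.5b"
  fi
}

# Linux only: /proc/meminfo reports MemTotal in kB.
if [ -r /proc/meminfo ]; then
  total_gb=$(awk '/^MemTotal:/ { printf "%d", $2 / 1024 / 1024 }' /proc/meminfo)
  echo "host RAM: ${total_gb} GB, suggested OLLAMA_MODEL: $(suggest_chat_model "$total_gb")"
fi
```

MemTotal is usually a little under the installed amount, so a nominal 16 GB host tends to round down to 15 and land on the 3b model, which errs toward not OOM-killing the embedder.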

Retrieval (search) is reliable on every option; only ask prose needs the stronger model. For demo-grade answers without a large box, use BYO with a Sonnet-class model rather than a big local model. On a slow CPU-only box, widen the sync lease with COBRAIN_SYNC_LEASE_MS=3600000 so one cold-start ingest finishes inside the lease.
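
The lease is expressed in milliseconds, so 3600000 is one hour. A tiny conversion sketch (`minutes_to_ms` is a hypothetical helper) makes other lease lengths easier to write down:

```shell
# Convert whole minutes to milliseconds for COBRAIN_SYNC_LEASE_MS.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}

echo "COBRAIN_SYNC_LEASE_MS=$(minutes_to_ms 60)"   # prints COBRAIN_SYNC_LEASE_MS=3600000
```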

Bring your own hosted inference (BYO). Point ConnectAI at your own OpenAI-compatible endpoint (your provider account, or a gateway or server such as LiteLLM, vLLM, or Azure OpenAI). In deploy/selfhost/.env:

bash
COBRAIN_INFERENCE_PROVIDER=byo
BYO_INFERENCE_BASE_URL=https://your-endpoint.example/v1
BYO_INFERENCE_API_KEY=...                 # your key; stored in the vault, never logged
BYO_CHAT_MODEL=gpt-4o-mini
BYO_EMBED_MODEL=text-embedding-3-small     # MUST emit 1024-dim vectors (this model returns 1536 by default; verify what your endpoint sends back)

You can then drop the ollama service for a much lighter stack. In self-host mode, ConnectAI's billed hosted gateway is structurally forbidden at the inference chokepoint for both text and embeddings: the only hosted path that can exist is the one you configure, and the guard fails closed.
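
Hosted embedding models vary in vector size, so it is worth probing a BYO endpoint before the first ingest. The sketch below counts the dimensions in an embeddings response. `embedding_dim` is a hypothetical helper that relies on `python3` from the prerequisites, and the commented request assumes the standard OpenAI-compatible `/embeddings` route under your base URL.

```shell
# Read an embeddings response on stdin and print the vector length.
# Accepts the OpenAI-style shape {"data":[{"embedding":[...]}]} and
# the Ollama /api/embeddings shape {"embedding":[...]}.
embedding_dim() {
  python3 -c '
import json, sys
body = json.load(sys.stdin)
vec = body["data"][0]["embedding"] if "data" in body else body["embedding"]
print(len(vec))
'
}

# Example probe against a BYO endpoint (uses the values from your .env):
# curl -fsS "$BYO_INFERENCE_BASE_URL/embeddings" \
#   -H "Authorization: Bearer $BYO_INFERENCE_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "{\"model\": \"$BYO_EMBED_MODEL\", \"input\": \"dimension probe\"}" | embedding_dim
```

A result other than 1024 means the model or the endpoint configuration needs to change before ingest will work.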

Day-2 operations

bash
make selfhost-ps       # service health at a glance
make selfhost-logs     # tail the api logs (where the setup token prints)
make selfhost-down     # stop the stack (keeps the data volumes)

Migrations run automatically on API boot under a single advisory lock, so you never run them by hand. To upgrade, git pull (or pull a new image tag) and re-run make selfhost; the next API boot brings the schema forward.

For the full troubleshooting matrix (Infisical key errors, OAuth redirect mismatches, port conflicts, cold-start timeouts), see the SELF_HOSTING.md guide that ships in your checkout.

Where to go next

  • Connect your agent (MCP): point Claude, Cursor, or a CLI at your brain over the read-only MCP endpoint.