Self-hosting (advanced)
The Quickstart installer (`npx @connectai/selfhost run`) is the recommended way to stand up ConnectAI: it pulls prebuilt images and needs no source checkout. This page is the advanced path for operators who want to drive Docker Compose directly, tune the environment, or build from source. If you only want a running brain on localhost, use the Quickstart and skip this page.
For the exhaustive operator reference (every connector model, the first-run wizard, day-2 operations, and the full troubleshooting matrix), see the SELF_HOSTING.md guide that ships in your checkout. This page covers the v1 setup funnel: the compose path, environment, vault, and models.
Prerequisites
- Docker with the Compose plugin (`docker compose version` reports v2 or newer).
- `bash`, `curl`, `openssl`, `python3` on PATH (the boot script uses them to generate secrets and provision the vault).
- Disk and RAM headroom. The default profile runs Postgres, Infisical (plus its own Postgres and Redis), the API, the loop worker, the console, and Ollama. Budget roughly 6 to 8 GB RAM to boot and 8 GB or more free disk for the Ollama models.
- Inference horsepower for real workloads. The bundled Ollama runs the chat model on CPU by default, which is fine for evaluation but slow under load. For production latency, run on a host with a supported GPU (Ollama uses it automatically) or bring your own hosted inference and drop Ollama entirely.
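A quick preflight for the list above can be scripted. This is a minimal bash sketch, not something ConnectAI ships: the function names are hypothetical, the RAM check reads Linux's `/proc/meminfo` only, and the 6 GiB floor is simply the low end of the budget above.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; not part of ConnectAI.

# Print every tool from the argument list that is not on PATH.
# Returns 1 if any are missing, 0 otherwise.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# Extract the major version from `docker compose version` output, e.g.
# "Docker Compose version v2.24.5" -> 2 (the leading "v" is optional).
compose_major() {
  sed -n 's/.*version v\{0,1\}\([0-9][0-9]*\)\..*/\1/p' <<<"$1" | head -n 1
}

# Succeed if MemTotal is at least $1 GiB. Reads /proc/meminfo (Linux only)
# unless a different file is passed as $2.
enough_ram_gib() {
  local want_gib=$1 total_kib
  total_kib=$(awk '/^MemTotal:/ { print $2 }' "${2:-/proc/meminfo}")
  [ -n "$total_kib" ] && [ "$total_kib" -ge $((want_gib * 1024 * 1024)) ]
}
```

For example: `missing_tools docker bash curl openssl python3 && [ "$(compose_major "$(docker compose version)")" -ge 2 ] && enough_ram_gib 6`.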
The compose path
From a clean checkout, one command stands up the whole stack:
```bash
make selfhost
```

That is equivalent to `bash scripts/selfhost/up.sh`. It creates `deploy/selfhost/.env` from the example, generates the random secrets, brings up the Infisical vault and waits for it, provisions a vault machine identity, brings up the full stack, and waits for the API `/health` check and the console.
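Two of the steps `make selfhost` performs, filling blank secrets and waiting for `/health`, are easy to reproduce when you drive the stack by hand. The sketch below assumes plain `KEY=` lines in the env file and takes the key names as arguments, because this page does not list the real secret names. `random_hex32`, `fill_blank_secrets`, and `wait_for_http` are hypothetical helpers, not the contents of `scripts/selfhost/up.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not the contents of scripts/selfhost/up.sh.

# Print 32 random bytes as 64 hex characters, preferring openssl.
random_hex32() {
  openssl rand -hex 32 2>/dev/null || od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
}

# For each KEY argument whose line in FILE is exactly "KEY=" (present but
# empty), write a fresh random value. Keys that already have a value are kept.
fill_blank_secrets() {
  local file=$1 key tmp
  shift
  for key in "$@"; do
    grep -q "^${key}=\$" "$file" || continue
    tmp=$(mktemp)
    K="$key" V="$(random_hex32)" awk '$0 == ENVIRON["K"] "=" { print ENVIRON["K"] "=" ENVIRON["V"]; next } { print }' "$file" > "$tmp"
    cat "$tmp" > "$file"   # rewrite in place, keeping FILE's permissions
    rm -f "$tmp"
  done
}

# Poll URL until curl gets a non-error response, up to ATTEMPTS tries
# spaced DELAY seconds apart. Returns 1 if it never comes up.
wait_for_http() {
  local url=$1 attempts=${2:-60} delay=${3:-5} i
  for ((i = 1; i <= attempts; i++)); do
    curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1
}
```

Usage might look like `fill_blank_secrets deploy/selfhost/.env SOME_SECRET OTHER_SECRET` (placeholder names) followed, once the stack is started, by `wait_for_http http://localhost:4000/health 60 5`.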
If you prefer to drive the steps yourself (or need to debug one), `make selfhost` runs exactly these four, each with the same `--env-file` flag:

```bash
cp deploy/selfhost/.env.example deploy/selfhost/.env   # then edit / fill secrets
docker compose --env-file deploy/selfhost/.env -f docker-compose.selfhost.yml up -d infisical
# wait for Infisical to finish starting before provisioning
./scripts/selfhost/provision-infisical.sh              # writes INFISICAL_* to the .env
docker compose --env-file deploy/selfhost/.env -f docker-compose.selfhost.yml up -d
```

The `--env-file deploy/selfhost/.env` flag matters. Compose reads that file for `${VAR}` interpolation (the Infisical encryption and auth secrets). A service's `env_file:` only injects variables into that container; it does not feed Compose's own `${VAR}` substitution. Without the flag, Infisical boots with a blank key and crash-loops.

When the stack is up:
- Console: http://localhost:5273
- API: http://localhost:4000/health

Get the one-time setup token (printed once to the API logs at boot):
```bash
make selfhost-token
```

Then open http://localhost:5273, which routes you into the first-run wizard because the instance is unconfigured.
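Every manual compose call on this page repeats the same `--env-file` and `-f` flags, and forgetting the first is what leaves Infisical with a blank key. A small wrapper removes that foot-gun; this sketch assumes you run it from the checkout root, and `dc` and `dc_blank_vars` are hypothetical names. `docker compose config` renders the interpolated file, and Compose prints a warning on stderr for each `${VAR}` it defaults to a blank string, which the second helper extracts (the exact warning wording may vary by Compose version).

```bash
#!/usr/bin/env bash
# Hypothetical wrapper; run from the checkout root. Override the paths by
# exporting SELFHOST_ENV / SELFHOST_COMPOSE before sourcing this.
SELFHOST_ENV=${SELFHOST_ENV:-deploy/selfhost/.env}
SELFHOST_COMPOSE=${SELFHOST_COMPOSE:-docker-compose.selfhost.yml}

# docker compose with the env file that feeds ${VAR} interpolation.
dc() {
  docker compose --env-file "$SELFHOST_ENV" -f "$SELFHOST_COMPOSE" "$@"
}

# List variables Compose had to default to a blank string while rendering
# the config. Relies on Compose's stderr warning text.
dc_blank_vars() {
  dc config 2>&1 >/dev/null |
    grep -oE 'The "?[A-Za-z_][A-Za-z0-9_]*"? variable is not set' |
    sed -E 's/^The "?([A-Za-z_][A-Za-z0-9_]*)"? .*/\1/' |
    sort -u
}
```

With this sourced, `dc up -d infisical` replaces the long form, and an empty `dc_blank_vars` output before `dc up -d` means interpolation found every variable.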
Prebuilt images (skip the source build)
The default compose builds the api, loop-svc, and console images from source on the first up, which can take several minutes. To run prebuilt images instead, docker-compose.selfhost.images.yml references the published GHCR images and only ever pulls:
```bash
docker compose -f docker-compose.selfhost.images.yml --env-file ./connectai.env up -d
```

This is exactly what the `npx @connectai/selfhost` installer wraps. The three first-party services pin the published prebuilt `connectai-app` and `connectai-console` images at `${CONNECTAI_IMAGE_TAG}`.
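If you drive the prebuilt path yourself, pinning `CONNECTAI_IMAGE_TAG` in the env file and pulling before `up` keeps every first-party service on the same release. A minimal sketch; the helper names are hypothetical, and the tag value is yours to supply, since this page does not list published tags.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the prebuilt-image path.
IMAGES_COMPOSE=docker-compose.selfhost.images.yml

# Replace (or add) the CONNECTAI_IMAGE_TAG line in an env file.
set_image_tag() {
  local file=$1 tag=$2 tmp
  tmp=$(mktemp)
  grep -v '^CONNECTAI_IMAGE_TAG=' "$file" > "$tmp"   # keep every other line
  printf 'CONNECTAI_IMAGE_TAG=%s\n' "$tag" >> "$tmp"
  cat "$tmp" > "$file"                               # keep the file's permissions
  rm -f "$tmp"
}

# Pull the pinned images, then start (or recreate) the stack.
prebuilt_up() {
  local envfile=${1:-./connectai.env}
  docker compose -f "$IMAGES_COMPOSE" --env-file "$envfile" pull &&
    docker compose -f "$IMAGES_COMPOSE" --env-file "$envfile" up -d
}
```

For example: `set_image_tag ./connectai.env <tag> && prebuilt_up ./connectai.env`.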
Deploying beyond localhost
The default boot points every origin at localhost. The moment you serve from a real host or domain, set three URLs together, because one of them is baked into the console at build time:
| Variable | Set it to | Why |
|---|---|---|
| `VITE_API_BASE_URL` | Your public API origin | Baked into the console bundle at build time; every browser calls the API here. If it stays `localhost`, requests hit the visitor's own machine. This is the most common self-host mistake. |
| `PUBLIC_BASE_URL` | Your public API origin | The API's public URL: the OAuth redirect base and the session cookie's `Secure` decision both derive from it. |
| `WEB_ORIGIN` | Your public console origin | The API trusts this for CORS and the post-connect redirect. It must byte-match the browser `Origin` (scheme, host, port), with no path or trailing slash. |
Use https for all three and terminate TLS at your reverse proxy. Keep the console and API on the same registrable domain (subdomains such as console.example.com and api.example.com are fine): the `__session` cookie is `SameSite=Lax`, and browsers do not send Lax cookies on cross-site API requests.
A prebuilt console image already has `VITE_API_BASE_URL` baked in, and you cannot rebuild it on each boot. Set `CONSOLE_API_BASE_URL` in the env file instead: the console entrypoint writes `/config.js` at container start, and the SPA prefers that value over the baked one.
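The rules in the table and the paragraph after it can be checked before you boot. This is a sketch under simplifying assumptions: the helper names are hypothetical, and "same registrable domain" is approximated by comparing the last two host labels, which is wrong for multi-label public suffixes such as `co.uk` (a Public Suffix List-aware tool is the robust choice).

```bash
#!/usr/bin/env bash
# Hypothetical pre-boot checks for the three public URLs.

# True if $1 is a bare https origin: scheme, host, optional port, and no
# path, query, or trailing slash (what WEB_ORIGIN must byte-match).
is_https_origin() {
  local re='^https://[A-Za-z0-9.-]+(:[0-9]+)?$'
  [[ $1 =~ $re ]]
}

# Host part of a URL, without scheme, port, or path.
host_of() {
  local h=${1#*://}
  h=${h%%/*}
  echo "${h%%:*}"
}

# Approximate "same registrable domain" by the last two host labels.
same_site() {
  local a b
  a=$(host_of "$1" | awk -F. '{ if (NF >= 2) print $(NF-1) "." $NF; else print $0 }')
  b=$(host_of "$2" | awk -F. '{ if (NF >= 2) print $(NF-1) "." $NF; else print $0 }')
  [ "$a" = "$b" ]
}

# Usage: check_public_urls VITE_API_BASE_URL PUBLIC_BASE_URL WEB_ORIGIN
check_public_urls() {
  local vite=$1 public=$2 web=$3 status=0
  is_https_origin "$vite" || { echo "VITE_API_BASE_URL is not a bare https origin: $vite"; status=1; }
  [ "$vite" = "$public" ] || { echo "VITE_API_BASE_URL and PUBLIC_BASE_URL differ"; status=1; }
  is_https_origin "$web" || { echo "WEB_ORIGIN is not a bare https origin: $web"; status=1; }
  same_site "$public" "$web" || { echo "console and API are on different sites"; status=1; }
  return "$status"
}
```

On a prebuilt console image, pass your `CONSOLE_API_BASE_URL` value as the first argument, since that is the URL the browser will actually use.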
Environment reference
The environment lives in `deploy/selfhost/.env`, created from `.env.example`. The boot fills in the secrets for you; these are the values that matter most:
| Variable | What it is | Default / note |
|---|---|---|
| `PUBLIC_BASE_URL` | Public URL the API is reached at; the OAuth redirect base. | `http://localhost:4000` for local eval. |
| `WEB_ORIGIN` | The console origin the API trusts for CORS. Must byte-match the browser `Origin`. | `http://localhost:5273` |
| `VITE_API_BASE_URL` | API URL baked into the console bundle at build time. | `http://localhost:4000` |
| `AUTH_MODE` | `local` (first-party password, the self-host default), `oidc`, or `firebase` (cloud only). | `local`. Pinned in the compose file so it cannot be unset by editing `.env`. |
| `COBRAIN_DEPLOYMENT_MODE` | `selfhost` arms the fail-closed inference guard. | `selfhost`. Pinned in the compose file. |
| `COBRAIN_INFERENCE_PROVIDER` | `ollama` (bundled, offline) or `byo` (your own endpoint). Env-pinned: change it here and restart. | `ollama` |
| `OLLAMA_MODEL` / `OLLAMA_EMBED_MODEL` | The bundled chat and embedding models. | `qwen2.5:1.5b` / `mxbai-embed-large` (the embed model must emit 1024-dim vectors). |
| `POSTGRES_PASSWORD` / `DATABASE_URL` | The brain Postgres credentials. | Example default. Change it for anything but local eval. |
Most config is env-pinned: the inference provider, OAuth client credentials, auth mode, and public base URL change only by editing .env and restarting (re-run make selfhost). The console Settings page shows these read-only. The one genuinely runtime-editable value is the admin password.
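Because these values are env-pinned, changing one means editing `deploy/selfhost/.env` and re-running `make selfhost`. The helpers below read and update plain `KEY=value` lines without sourcing the file. They are a hypothetical sketch that ignores quoting and `export` prefixes.

```bash
#!/usr/bin/env bash
# Hypothetical dotenv helpers for plain KEY=value lines.

# Print the value of KEY (last assignment wins) without sourcing the file.
env_get() {
  K="$2" awk 'index($0, ENVIRON["K"] "=") == 1 { v = substr($0, length(ENVIRON["K"]) + 2) } END { print v }' "$1"
}

# Set KEY=VALUE: replace the first assignment, drop later duplicates, or
# append if the key is absent. Values pass through ENVIRON, so backslashes
# survive unmangled (awk -v would interpret escapes).
env_set() {
  local file=$1 tmp
  tmp=$(mktemp)
  K="$2" V="$3" awk '
    index($0, ENVIRON["K"] "=") == 1 { if (!done) print ENVIRON["K"] "=" ENVIRON["V"]; done = 1; next }
    { print }
    END { if (!done) print ENVIRON["K"] "=" ENVIRON["V"] }
  ' "$file" > "$tmp"
  cat "$tmp" > "$file"   # rewrite in place, keeping the file's permissions
  rm -f "$tmp"
}
```

For example, `env_set deploy/selfhost/.env COBRAIN_INFERENCE_PROVIDER byo && make selfhost`. Comparing `env_get deploy/selfhost/.env POSTGRES_PASSWORD` with the same key in `.env.example` also catches the example default before you move past local eval.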
The vault
Self-host runs the bundled Infisical vault by default. Credential and session plaintext (OAuth tokens, the session signing key) live only in Infisical under the cred_<id> convention; the brain database holds only vault keys plus non-sensitive metadata, never plaintext. This satisfies the key-separation requirement of the binding database privacy rule: the vault key is separate from the encrypted data zone.
The Infisical image is pinned (infisical/infisical:v0.159.28), not latest, because its self-hosted admin API drifts between versions and the provisioning script is written against the pinned version. If you bump the pin, re-verify the provisioning flow, or provision the machine identity in the Infisical UI at http://localhost:8082 and paste the three INFISICAL_* values into your .env.
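Before re-running the provisioning script after a pin change, or after pasting values from the Infisical UI, two quick checks help: which Infisical tag the compose file pins, and how many `INFISICAL_*` keys in `.env` are non-empty. The helpers are hypothetical. Because this page does not name the three variables, the second check only counts them, so compare the names with your `.env.example`.

```bash
#!/usr/bin/env bash
# Hypothetical checks around the pinned Infisical vault.
EXPECTED_INFISICAL_TAG=v0.159.28   # the pin this page documents

# Print the first infisical/infisical image tag referenced in a compose file.
infisical_pin() {
  grep -oE 'infisical/infisical:[^"[:space:]]+' "$1" | head -n 1 | cut -d: -f2
}

# Count distinct INFISICAL_* keys that have a non-empty value.
infisical_vars_set() {
  grep -E '^INFISICAL_[A-Z0-9_]*=.+' "$1" | cut -d= -f1 | sort -u | wc -l | tr -d ' '
}

# Warn (and fail) when the pin has moved away from the documented version.
check_infisical_pin() {
  local pin
  pin=$(infisical_pin "${1:-docker-compose.selfhost.yml}")
  if [ "$pin" != "$EXPECTED_INFISICAL_TAG" ]; then
    echo "Infisical pin is '$pin', not $EXPECTED_INFISICAL_TAG: re-verify provisioning"
    return 1
  fi
}
```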
Inference and model right-sizing
Bundled Ollama (default, offline). No external key, and your data never leaves the box. The trade-off is the model pull: the chat model and the 1024-dim embedding model are several GB and the first pull takes minutes. To pull manually:
```bash
docker compose --env-file deploy/selfhost/.env -f docker-compose.selfhost.yml \
  exec ollama ollama pull qwen2.5:1.5b
docker compose --env-file deploy/selfhost/.env -f docker-compose.selfhost.yml \
  exec ollama ollama pull mxbai-embed-large
```

Host RAM and choosing the chat model. The chat model and the embedder are both loaded into RAM at the same time during ingest and ask. If the chat model leaves no room for the embedder, it gets OOM-killed, Ollama returns HTTP 500, and per-record ingest fails. The default is sized so the substrate works on a modest box:
| Model | RAM (with embedder) | Fits | Trade-off |
|---|---|---|---|
| `qwen2.5:1.5b` (default) | ~8 GB | Modest box | Ingest, retrieval, and citation work; `ask` prose is weak. |
| `qwen2.5:3b` | ~8 to 12 GB | Mid box | Better `ask` prose; slow on CPU-only. |
| `qwen3.5:latest` | ~16 GB or more | Large box / GPU | Strongest local composer; OOMs a small box. |
Retrieval (search) is reliable with every option; only `ask` prose needs the stronger model. For demo-grade answers without a large box, use BYO with a Sonnet-class model rather than a large local model. On a slow CPU-only box, widen the sync lease with `COBRAIN_SYNC_LEASE_MS=3600000` (one hour) so a single cold-start ingest finishes inside the lease.
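The table reads naturally as a lookup from host RAM to a chat model. The thresholds below are this sketch's conservative reading of the table (it picks the larger model only at the top of its range), not logic ConnectAI applies, and the helper names are hypothetical. The lease helper just makes the arithmetic explicit: 60 × 60 × 1000 = 3,600,000 ms.

```bash
#!/usr/bin/env bash
# Hypothetical sizing helpers based on the model table.

# Suggest an OLLAMA_MODEL for a host with $1 GiB of RAM (chat + embedder).
pick_chat_model() {
  local ram_gib=$1
  if [ "$ram_gib" -ge 16 ]; then
    echo "qwen3.5:latest"    # large box or GPU
  elif [ "$ram_gib" -ge 12 ]; then
    echo "qwen2.5:3b"        # mid box; slow on CPU-only
  else
    echo "qwen2.5:1.5b"      # the default; fits a modest box
  fi
}

# Convert minutes to the millisecond value COBRAIN_SYNC_LEASE_MS takes.
lease_ms() {
  echo $(( $1 * 60 * 1000 ))
}
```

For example, `pick_chat_model 12` prints `qwen2.5:3b`; set that as `OLLAMA_MODEL`, pull it with the `ollama pull` command above, and re-run `make selfhost`.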
Bring your own hosted inference (BYO). Point ConnectAI at your own OpenAI-compatible endpoint: your provider account, or a gateway or inference server such as LiteLLM, vLLM, or Azure OpenAI. In `deploy/selfhost/.env`:
```bash
COBRAIN_INFERENCE_PROVIDER=byo
BYO_INFERENCE_BASE_URL=https://your-endpoint.example/v1
BYO_INFERENCE_API_KEY=...                 # your key; stored in the vault, never logged
BYO_CHAT_MODEL=gpt-4o-mini
BYO_EMBED_MODEL=text-embedding-3-small    # MUST emit 1024-dim vectors
```

You can then drop the `ollama` service for a much lighter stack. In self-host mode, ConnectAI's billed hosted gateway is structurally forbidden at the inference chokepoint for both text and embeddings: the only hosted path that can exist is the one you configure, and the guard fails closed.
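Because the embedding model must emit 1024-dim vectors, it is worth probing the endpoint before switching. One caution from OpenAI's own documentation: `text-embedding-3-small` returns 1536 dimensions by default and only shortens its output when the request includes a `dimensions` parameter. Whether ConnectAI sends one is not stated on this page, so check what your endpoint actually returns. The sketch assumes an OpenAI-compatible `POST {base}/embeddings` route and the usual `data[0].embedding` response shape; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical probe for an OpenAI-compatible embeddings endpoint.

# Read an embeddings response on stdin; print the first vector's length.
embedding_dim() {
  python3 -c 'import json, sys; print(len(json.load(sys.stdin)["data"][0]["embedding"]))'
}

# Request one embedding and succeed only if it has 1024 dimensions.
# Reads BYO_INFERENCE_BASE_URL, BYO_INFERENCE_API_KEY, BYO_EMBED_MODEL.
check_byo_embed_dim() {
  local dim body
  body=$(printf '{"model": "%s", "input": "dimension probe"}' "$BYO_EMBED_MODEL")
  # The auth header comes from a process substitution, so the key is not
  # placed on curl's command line (-H @file needs curl 7.55 or newer).
  dim=$(curl -fsS "${BYO_INFERENCE_BASE_URL%/}/embeddings" \
    -H "Content-Type: application/json" \
    -H @<(printf 'Authorization: Bearer %s\n' "$BYO_INFERENCE_API_KEY") \
    -d "$body" | embedding_dim) || return 1
  echo "$BYO_EMBED_MODEL returned $dim dimensions"
  [ "$dim" = "1024" ]
}
```

Export the three `BYO_*` values in your shell (they are not read from `.env` by this sketch) and run `check_byo_embed_dim` before setting `COBRAIN_INFERENCE_PROVIDER=byo`.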
Day-2 operations
```bash
make selfhost-ps     # service health at a glance
make selfhost-logs   # tail the API logs (where the setup token prints)
make selfhost-down   # stop the stack (keeps the data volumes)
```

Migrations run automatically on API boot under a single advisory lock, so you never run them by hand. To upgrade, `git pull` (or pull a new image tag) and re-run `make selfhost`; the next API boot brings the schema forward.
For the full troubleshooting matrix (Infisical key errors, OAuth redirect mismatches, port conflicts, cold-start timeouts), see the SELF_HOSTING.md guide that ships in your checkout.
Where to go next
- Connect your agent (MCP): point Claude, Cursor, or a CLI at your brain over the read-only MCP endpoint.