fix(deploy): make LXC deploys atomic and fail-fast

Rebuild the deployment flow to prepare releases remotely, validate env/sudo prerequisites, run migrations in-release, and auto-rollback on health failures. Consolidate deployment docs and add a manual CI workflow so laptop and CI use the same push-based deploy path.
Piotr Oleszczyk 2026-03-07 01:14:30 +01:00
parent d228b44209
commit 2efdb2b785
8 changed files with 1057 additions and 319 deletions

@@ -0,0 +1,97 @@
# Deployment Quickstart
This is the short operator checklist. Full details are in `docs/DEPLOYMENT.md`.
Canonical env file locations (and only these):
- `/opt/innercontext/shared/backend/.env`
- `/opt/innercontext/shared/frontend/.env.production`
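On the server, a quick preflight can confirm both files exist with owner-only permissions. A sketch; `check_env_files` is an illustrative helper, not part of the repo:

```shell
# Sketch: verify the canonical env files exist and are mode 600.
check_env_files() {
  shared=$1  # e.g. /opt/innercontext/shared
  for f in "$shared/backend/.env" "$shared/frontend/.env.production"; do
    if [ ! -f "$f" ] || [ "$(stat -c '%a' "$f")" != "600" ]; then
      echo "missing or wrong mode: $f"
      return 1
    fi
  done
  echo "env files ok"
}

# On the server:
# check_env_files /opt/innercontext/shared
```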
## 1) Server prerequisites (once)
```bash
mkdir -p /opt/innercontext/releases
mkdir -p /opt/innercontext/shared/backend
mkdir -p /opt/innercontext/shared/frontend
mkdir -p /opt/innercontext/scripts
chown -R innercontext:innercontext /opt/innercontext
```
Create shared env files:
```bash
cat > /opt/innercontext/shared/backend/.env <<'EOF'
DATABASE_URL=postgresql+psycopg://innercontext:change-me@<pg-ip>/innercontext
GEMINI_API_KEY=your-key
EOF
cat > /opt/innercontext/shared/frontend/.env.production <<'EOF'
PUBLIC_API_BASE=http://127.0.0.1:8000
ORIGIN=http://innercontext.lan
EOF
chmod 600 /opt/innercontext/shared/backend/.env
chmod 600 /opt/innercontext/shared/frontend/.env.production
chown innercontext:innercontext /opt/innercontext/shared/backend/.env
chown innercontext:innercontext /opt/innercontext/shared/frontend/.env.production
```
Deploy sudoers:
```bash
cat > /etc/sudoers.d/innercontext-deploy << 'EOF'
innercontext ALL=(root) NOPASSWD: \
/usr/bin/systemctl restart innercontext, \
/usr/bin/systemctl restart innercontext-node, \
/usr/bin/systemctl restart innercontext-pricing-worker, \
/usr/bin/systemctl is-active innercontext, \
/usr/bin/systemctl is-active innercontext-node, \
/usr/bin/systemctl is-active innercontext-pricing-worker
EOF
chmod 440 /etc/sudoers.d/innercontext-deploy
visudo -c -f /etc/sudoers.d/innercontext-deploy
sudo -u innercontext sudo -n -l
```
## 2) Local SSH config
`~/.ssh/config`:
```
Host innercontext
HostName <lxc-ip>
User innercontext
```
## 3) Deploy from your machine
```bash
./deploy.sh
./deploy.sh backend
./deploy.sh frontend
./deploy.sh list
./deploy.sh rollback
```
## 4) Verify
```bash
curl -sf http://innercontext.lan/api/health-check
curl -sf http://innercontext.lan/
```
## 5) Common fixes
Lock stuck:
```bash
rm -f /opt/innercontext/.deploy.lock
```
Show service logs:
```bash
journalctl -u innercontext -n 100
journalctl -u innercontext-node -n 100
journalctl -u innercontext-pricing-worker -n 100
```

@@ -1,376 +1,259 @@
# Deployment Guide (LXC + systemd + nginx)
This project deploys from an external machine (developer laptop or CI runner) to a Debian LXC host over SSH.
Deployments are push-based, release-based, and atomic:
- Build and validate locally
- Upload to `/opt/innercontext/releases/<timestamp>`
- Run backend dependency sync and migrations in that release directory
- Promote once by switching `/opt/innercontext/current`
- Restart services and run health checks
- Auto-rollback on failure
Environment files have exactly two persistent locations on the server:
- `/opt/innercontext/shared/backend/.env`
- `/opt/innercontext/shared/frontend/.env.production`
Each release links to those files from:
- `/opt/innercontext/current/backend/.env` -> `../../../shared/backend/.env`
- `/opt/innercontext/current/frontend/.env.production` -> `../../../shared/frontend/.env.production`
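A release-preparation step can create those links with `ln -sfn`, which stays idempotent across repeat deploys. A sketch using a throwaway directory in place of `/opt/innercontext` (paths and release name illustrative, not the actual deploy.sh):

```shell
# Sketch: link shared env files into a freshly uploaded release.
ROOT=$(mktemp -d)                   # stand-in for /opt/innercontext
RELEASE="$ROOT/releases/example"    # stand-in for releases/<timestamp>
mkdir -p "$RELEASE/backend" "$RELEASE/frontend" \
         "$ROOT/shared/backend" "$ROOT/shared/frontend"
touch "$ROOT/shared/backend/.env" "$ROOT/shared/frontend/.env.production"

# Relative targets: three levels up from backend/ or frontend/ reaches the root.
ln -sfn ../../../shared/backend/.env "$RELEASE/backend/.env"
ln -sfn ../../../shared/frontend/.env.production "$RELEASE/frontend/.env.production"

readlink "$RELEASE/backend/.env"
```

Relative targets keep the release tree relocatable, and resolve identically whether reached directly or through the `current` symlink.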
## Architecture
```
external machine (manual now, CI later)
    |
    | ssh + rsync
    v
LXC host
  /opt/innercontext/
    current -> releases/<timestamp>
    releases/<timestamp>
    shared/backend/.env
    shared/frontend/.env.production
    scripts/
```
> **Frontend is never built on the server.** The `vite build` + `adapter-node`
> esbuild step is CPU/RAM-intensive and will hang on a small LXC. Build locally,
> deploy the `build/` artifact via `deploy.sh`.
Services:
- `innercontext` (FastAPI, localhost:8000)
- `innercontext-node` (SvelteKit Node, localhost:3000)
- `innercontext-pricing-worker` (background worker)
nginx routes:
- `/api/*` -> `http://127.0.0.1:8000/*`
- `/*` -> `http://127.0.0.1:3000/*`
## Run Model
- Manual deploy: run `./deploy.sh ...` from the repo root on your laptop.
- Optional CI deploy: run the same script from a manual workflow (`workflow_dispatch`).
- The server never builds frontend assets.
## Prerequisites
- Proxmox VE host with an existing PostgreSQL LXC and a reverse proxy
- LAN hostname `innercontext.lan` resolvable on the network (via router DNS or `/etc/hosts`)
- The PostgreSQL LXC must accept connections from the innercontext LXC IP
## Create the LXC container
In the Proxmox UI (or via CLI):
```bash
# CLI example — adjust storage, bridge, IP to your environment
pct create 200 local:vztmpl/debian-13-standard_13.0-1_amd64.tar.zst \
  --hostname innercontext \
  --cores 2 \
  --memory 1024 \
  --swap 512 \
  --rootfs local-lvm:8 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp \
  --unprivileged 1 \
  --start 1
```
Note the container's IP address after it starts (`pct exec 200 -- ip -4 a`).
## One-Time Server Setup
Run on the LXC host as root (`pct enter 200` or SSH into the container).
### 1) Install runtime dependencies
```bash
apt update && apt upgrade -y
apt install -y git nginx curl ca-certificates libpq5 rsync python3 python3-venv
curl -LsSf https://astral.sh/uv/install.sh | UV_INSTALL_DIR=/usr/local/bin sh
```
Installing `uv` to `/usr/local/bin` makes it available system-wide (required for `sudo -u innercontext uv sync`).
### Node.js 24 LTS + pnpm
The server needs Node.js to **run** the pre-built frontend bundle, and pnpm to
**install production runtime dependencies** (`clsx`, `bits-ui`, etc. —
`adapter-node` bundles the SvelteKit framework but leaves these external).
The frontend is never **built** on the server.
```bash
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.4/install.sh | bash
. "$HOME/.nvm/nvm.sh"
nvm install 24
```
Copy `node` to `/usr/local/bin` so it is accessible system-wide
(required for `sudo -u innercontext` and for systemd).
Use `--remove-destination` to replace any existing symlink with a real file:
```bash
cp --remove-destination "$(nvm which current)" /usr/local/bin/node
```
Install pnpm as a standalone binary — self-contained, no wrapper scripts,
works system-wide:
```bash
curl -fsSL "https://github.com/pnpm/pnpm/releases/latest/download/pnpm-linux-x64" \
-o /usr/local/bin/pnpm
chmod 755 /usr/local/bin/pnpm
```
### 2) Create app user
```bash
useradd --system --create-home --shell /bin/bash innercontext
```
---
## Create the database on the PostgreSQL LXC
Run on the **PostgreSQL LXC**:
```bash
psql -U postgres <<'SQL'
CREATE USER innercontext WITH PASSWORD 'change-me';
CREATE DATABASE innercontext OWNER innercontext;
SQL
```
Edit `/etc/postgresql/18/main/pg_hba.conf` and add (replace `<lxc-ip>` with the innercontext container IP):
```
host innercontext innercontext <lxc-ip>/32 scram-sha-256
```
Then reload:
```bash
systemctl reload postgresql
```
---
## Create the deployment directory tree
```bash
mkdir -p /opt/innercontext/releases
mkdir -p /opt/innercontext/shared/backend
mkdir -p /opt/innercontext/shared/frontend
mkdir -p /opt/innercontext/scripts
chown -R innercontext:innercontext /opt/innercontext
```
---
### 3) Create shared env files
Backend:
```bash
cat > /opt/innercontext/shared/backend/.env <<'EOF'
DATABASE_URL=postgresql+psycopg://innercontext:change-me@<pg-ip>/innercontext
GEMINI_API_KEY=your-key
# GEMINI_MODEL=gemini-flash-latest  # optional, this is the default
EOF
chmod 600 /opt/innercontext/shared/backend/.env
chown innercontext:innercontext /opt/innercontext/shared/backend/.env
```
### Database migrations
Migrations run automatically during each deploy: `deploy.sh` executes `uv sync` and `alembic upgrade head` inside the release directory before promoting it. The first deploy creates all tables; subsequent deploys apply only new migrations.
> **Existing database (tables already created by `create_db_and_tables`):**
> Run `uv run alembic stamp head` once from the release's `backend/` directory to mark the current schema as migrated without re-running DDL.
---
### Frontend env file
The frontend is **built locally and uploaded** via `deploy.sh`, never built on the server. Its only server-side config is the shared env file:
```bash
cat > /opt/innercontext/shared/frontend/.env.production <<'EOF'
PUBLIC_API_BASE=http://127.0.0.1:8000
ORIGIN=http://innercontext.lan
EOF
chmod 600 /opt/innercontext/shared/frontend/.env.production
chown innercontext:innercontext /opt/innercontext/shared/frontend/.env.production
```
### 4) Grant deploy sudo permissions
```bash
cat > /etc/sudoers.d/innercontext-deploy << 'EOF'
innercontext ALL=(root) NOPASSWD: \
  /usr/bin/systemctl restart innercontext, \
  /usr/bin/systemctl restart innercontext-node, \
  /usr/bin/systemctl restart innercontext-pricing-worker, \
  /usr/bin/systemctl is-active innercontext, \
  /usr/bin/systemctl is-active innercontext-node, \
  /usr/bin/systemctl is-active innercontext-pricing-worker
EOF
chmod 440 /etc/sudoers.d/innercontext-deploy
visudo -c -f /etc/sudoers.d/innercontext-deploy
# Must work without a password or TTY prompt:
sudo -u innercontext sudo -n -l
```
If `sudo -n -l` fails, deployments will fail during restart/rollback with
`sudo: a terminal is required` or `sudo: a password is required`.
### 5) Install systemd and nginx configs
After the first deploy (or after copying repo content to `/opt/innercontext/current`), install the configs:
```bash
cp /opt/innercontext/current/systemd/innercontext.service /etc/systemd/system/
cp /opt/innercontext/current/systemd/innercontext-node.service /etc/systemd/system/
cp /opt/innercontext/current/systemd/innercontext-pricing-worker.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable innercontext
systemctl enable innercontext-node
systemctl enable innercontext-pricing-worker
# Do NOT start yet — build/ is empty until the first deploy.sh run
```
---
Then the nginx site:
```bash
cp /opt/innercontext/current/nginx/innercontext.conf /etc/nginx/sites-available/innercontext
ln -sf /etc/nginx/sites-available/innercontext /etc/nginx/sites-enabled/innercontext
rm -f /etc/nginx/sites-enabled/default
nginx -t && systemctl reload nginx
```
---
## Reverse proxy configuration
Point your existing reverse proxy at the innercontext LXC's nginx (`<innercontext-lxc-ip>:80`).
Example — Caddy:
```
innercontext.lan {
reverse_proxy <innercontext-lxc-ip>:80
}
```
Example — nginx upstream:
```nginx
server {
listen 80;
server_name innercontext.lan;
location / {
proxy_pass http://<innercontext-lxc-ip>:80;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}
```
Reload your reverse proxy after applying the change.
---
## Local Machine Setup
All deploys (including the first) run `deploy.sh` from your local machine.
### SSH config
Add to `~/.ssh/config`:
```
Host innercontext
HostName <lxc-ip>
User innercontext
```
Ensure your SSH public key is in `/home/innercontext/.ssh/authorized_keys` on the server.
## Deploy Commands
From the repository root:
```bash
./deploy.sh           # full deploy (default = all)
./deploy.sh all
./deploy.sh backend
./deploy.sh frontend
./deploy.sh list
./deploy.sh rollback
```
Optional overrides:
```bash
DEPLOY_SERVER=innercontext ./deploy.sh all
DEPLOY_ROOT=/opt/innercontext ./deploy.sh backend
DEPLOY_ALLOW_DIRTY=1 ./deploy.sh frontend
```
After a successful deploy, the web UI is reachable at `http://innercontext.lan`.
## What `deploy.sh` Does
For `backend` / `frontend` / `all`:
1. Local checks (strict, fail-fast)
2. Acquire `/opt/innercontext/.deploy.lock`
3. Create `<timestamp>` release directory
4. Upload selected component(s)
5. Link shared env files in the release directory
6. `uv sync` + `alembic upgrade head` (backend scope)
7. Upload `scripts/`, `systemd/`, `nginx/`
8. Switch `current` to the prepared release
9. Restart affected services
10. Run health checks
11. Remove old releases (keep last 5)
12. Write deploy entry to `/opt/innercontext/deploy.log`
If anything fails after promotion, the script automatically rolls back to the previous release.
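The promote-then-rollback core reduces to an atomic symlink swap: `ln -sfn` to a temporary name, then `mv -T` over `current`, so readers never see a half-switched state. A sketch under a throwaway root (illustrative, not the actual script):

```shell
# Sketch: atomic promotion of a release, with rollback to the previous one.
ROOT=$(mktemp -d)                          # stand-in for /opt/innercontext
mkdir -p "$ROOT/releases/old" "$ROOT/releases/new"
ln -sfn releases/old "$ROOT/current"       # previous release is live

previous=$(readlink "$ROOT/current")

# Promote: build the new link under a temp name, rename it into place atomically.
ln -sfn releases/new "$ROOT/current.tmp"
mv -T "$ROOT/current.tmp" "$ROOT/current"

# If health checks fail after promotion, swap back the same way.
health_ok=false                            # simulate a failed health check
if [ "$health_ok" = false ]; then
  ln -sfn "$previous" "$ROOT/current.tmp"
  mv -T "$ROOT/current.tmp" "$ROOT/current"
fi
readlink "$ROOT/current"
```

`mv -T` maps to a single `rename(2)` call, which is why the swap is atomic; a plain `ln -sf` onto a live symlink is not.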
## Health Checks
- Backend: `http://127.0.0.1:8000/health-check`
- Frontend: `http://127.0.0.1:3000/`
- Worker: `systemctl is-active innercontext-pricing-worker`
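Deploy-side health probes usually retry briefly, since services take a moment to come up after restart. A sketch; the `check_health` helper and retry budget are illustrative:

```shell
# Sketch: poll a URL until it responds or the retry budget is exhausted.
check_health() {
  url=$1
  tries=${2:-10}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -sf --max-time 2 "$url" > /dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example: check_health http://127.0.0.1:8000/health-check 15 || exit 1
```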
Manual checks:
```bash
curl -sf http://127.0.0.1:8000/health-check
curl -sf http://127.0.0.1:3000/
systemctl is-active innercontext
systemctl is-active innercontext-node
systemctl is-active innercontext-pricing-worker
```
---
## Troubleshooting
### Lock exists
```bash
cat /opt/innercontext/.deploy.lock
rm -f /opt/innercontext/.deploy.lock
```
Only remove the lock if no deployment is running.
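The lock itself can be taken race-free with a noclobber write, which fails if the file already exists. A sketch using an illustrative lock path (not the actual deploy.sh):

```shell
# Sketch: acquire the deploy lock atomically; record who holds it.
LOCK=$(mktemp -d)/.deploy.lock   # stand-in for /opt/innercontext/.deploy.lock
if ( set -C; echo "pid=$$ host=$(hostname) at=$(date -Is)" > "$LOCK" ) 2>/dev/null; then
  echo "lock acquired"
  trap 'rm -f "$LOCK"' EXIT      # release on exit
else
  echo "deploy already in progress:"
  cat "$LOCK"
  exit 1
fi
```

Recording pid/host/time in the lock file is what makes the `cat /opt/innercontext/.deploy.lock` check above useful when deciding whether a stale lock is safe to remove.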
### Sudo password prompt during deploy
Re-check `/etc/sudoers.d/innercontext-deploy` and run:
```bash
visudo -c -f /etc/sudoers.d/innercontext-deploy
sudo -u innercontext sudo -n systemctl is-active innercontext
```
### Backend migration failure
Validate env file and DB connectivity:
```bash
ls -la /opt/innercontext/shared/backend/.env
grep '^DATABASE_URL=' /opt/innercontext/shared/backend/.env
```
### Database connection refused
```bash
# Note: psql takes a plain postgresql:// URL; the +psycopg suffix is SQLAlchemy-only.
psql "postgresql://innercontext:change-me@<pg-lxc-ip>/innercontext" -c "SELECT 1"
# If it fails, check pg_hba.conf on the PG LXC and verify the IP matches
```
### Service fails after deploy
```bash
journalctl -u innercontext -n 100
journalctl -u innercontext-node -n 100
journalctl -u innercontext-pricing-worker -n 100
```
## Manual CI Deploy (Optional)
Use the manual Forgejo workflow (`workflow_dispatch`) to run the same `./deploy.sh all` path from CI once server secrets and SSH trust are configured.