A comprehensive guide to the testing, CI/CD, and guardrail systems we built at Glolly to enable confident AI-assisted development. This document explains the philosophy, architecture, and implementation details so you can adapt it to your own stack.
When AI writes most of your code — whether that’s Claude Code, Copilot, or any other LLM-powered tool — the failure mode is different from human-written code. Humans make typos and forget edge cases. LLMs produce code that looks correct, compiles, and even passes a casual review, but has subtle issues: unchecked null access, forgotten await on promises, types coerced through any, test files that pass but don’t actually assert anything meaningful.
The guardrails are the product. Every layer in this system exists to make a specific class of AI-generated bug loud and impossible to ignore. The goal: if AI-generated code passes all checks, you can trust it structurally — and your human review is only about logic and intent.
| Layer | Tool | What It Catches | AI Anti-Pattern It Prevents (Examples) |
|---|---|---|---|
| Compile | TypeScript strict mode | Type errors, null safety, unchecked index access | AI using any, assuming arrays have elements, forgetting null checks |
| Lint | ESLint (strict, type-checked rules) | Floating promises, unsafe any propagation, non-null assertions | AI using !, forgetting await, leaking any through assignments |
| Format | Prettier | Inconsistent formatting | AI mixing styles across generated files |
| Unit Test | Vitest | Function/component behavior regressions | AI changing behavior while “refactoring” |
| Integration Test | Vitest + Testcontainers | Database/service interaction bugs | AI writing queries that don’t match real DB behavior |
| E2E Test | Playwright | Full-flow regressions across frontend + backend | AI breaking navigation, data display, or checkout flows |
| Visual | Storybook | Component rendering regressions | AI breaking UI without functional test failures |
| Pre-commit | Husky + lint-staged | All of the above, at commit time | AI-generated code getting committed without checks |
| CI | GitHub Actions | All of the above, on every PR | Code that passes locally but fails in a clean environment |
Each layer catches things the previous one misses. The AI has to get past all of them to land code on main.
Our system has two codebases (backend API and frontend web app) with interconnected testing and deployment pipelines. Here’s the high-level flow:
flowchart TB
subgraph BE["Backend Repository"]
BE_CODE["Code Change"] --> BE_PRECOMMIT["Pre-commit Hook<br/>lint-staged + type-check"]
BE_PRECOMMIT --> BE_PR["Pull Request to main"]
BE_PR --> BE_CI["CI Pipeline<br/>Format + Lint + Type Check + Tests w/ Coverage"]
BE_CI -->|merge to main| BE_IMG_STG["Build Docker Image<br/>Tag: staging"]
BE_CI -->|release published| BE_IMG_PROD["Build Docker Image<br/>Tag: latest + version"]
BE_CI -->|release published| BE_DEPLOY["Deploy to Railway<br/>Production"]
end
subgraph FE["Frontend Repository"]
FE_CODE["Code Change"] --> FE_PRECOMMIT["Pre-commit Hook<br/>lint-staged + type-check + codegen"]
FE_PRECOMMIT --> FE_PR["Pull Request to main"]
FE_PR --> FE_CI["CI Pipeline<br/>Format + Lint + Type Check + Tests<br/>+ Codegen Sync + Storybook Build + App Build"]
FE_PR --> FE_E2E["E2E Pipeline<br/>Pulls backend Docker image<br/>Spins up full test stack<br/>Runs Playwright"]
FE_CI -->|release published| FE_DEPLOY["Deploy to Railway<br/>Production"]
end
BE_IMG_STG --> FE_E2E
BE_IMG_PROD --> FE_E2E
%% Subgraph styles
style BE fill:#2a1a1a,stroke:#e94560,color:#e0e0e0
style FE fill:#1a1a2e,stroke:#0f3460,color:#e0e0e0
%% Node classes
classDef beNode fill:#1e1e2e,stroke:#e94560,color:#e0e0e0,stroke-width:2px
classDef feNode fill:#1e1e2e,stroke:#0f3460,color:#e0e0e0,stroke-width:2px
class BE_CODE,BE_PRECOMMIT,BE_PR,BE_CI,BE_IMG_STG,BE_IMG_PROD,BE_DEPLOY beNode
class FE_CODE,FE_PRECOMMIT,FE_PR,FE_CI,FE_E2E,FE_DEPLOY feNode
%% Arrow styles
linkStyle default stroke:#888,stroke-width:2px
The key insight: the backend publishes a Docker image on every merge to main. The frontend E2E tests pull that image and spin up a real API (with real PostgreSQL, Redis, and S3-compatible storage) as the test backend. This means frontend E2E tests exercise the actual backend, not mocks — catching integration bugs that unit tests miss.
TypeScript’s strict mode is the single highest-value guardrail for AI-generated code. It catches type errors at compile time before any test runs.
// Backend tsconfig.json — key settings beyond the default "strict: true"
{
"compilerOptions": {
"strict": true, // Umbrella for all strict flags
"noUncheckedIndexedAccess": true, // Forces null checks on array[index] and obj[key]
"noPropertyAccessFromIndexSignature": true, // Forces bracket notation for dynamic keys
"noUnusedLocals": true, // Catches leftover variables
"noUnusedParameters": true, // Catches unused function params
"noImplicitReturns": true, // Every code path must return
"noFallthroughCasesInSwitch": true, // switch cases must break/return
"noImplicitOverride": true, // Explicit override keyword required
"verbatimModuleSyntax": false // Needed for some CJS interop
}
}
The frontend tsconfig is similar, and additionally enables exactOptionalPropertyTypes:
{
"compilerOptions": {
"strict": true,
"noUncheckedIndexedAccess": true,
"exactOptionalPropertyTypes": true, // Prevents assigning undefined to optional props
"noImplicitReturns": true,
"noFallthroughCasesInSwitch": true,
"noImplicitOverride": true,
"noUnusedLocals": true,
"noUnusedParameters": true
}
}
Start with maximum strictness. Only relax a flag if it fights you across an entire codebase, not just in one file. We found exactOptionalPropertyTypes created enough friction on the backend to omit it there, but it coexists fine with React props patterns on the frontend.
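To make the payoff concrete, here is a small hypothetical snippet showing what noUncheckedIndexedAccess changes: an indexed read is typed as possibly undefined, so the classic AI shortcut users[0].name fails to compile until the empty case is handled.

```typescript
interface User { name: string }

function firstName(users: User[]): string {
  // const n = users[0].name;  // ts(18048): 'users[0]' is possibly 'undefined'
  const first = users[0];      // type: User | undefined, not User
  return first !== undefined ? first.name : "anonymous";
}

console.log(firstName([]));                // prints: anonymous
console.log(firstName([{ name: "Ada" }])); // prints: Ada
```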
We use ESLint’s type-checked rules, which go beyond what the TypeScript compiler checks. The key difference: ESLint can analyze patterns and intent, not just types.
Backend (stricter — no React complexity):
// Key rule categories enabled:
// 1. Promise safety (the #1 AI footgun)
'@typescript-eslint/no-floating-promises': 'error', // Catch forgotten await
'@typescript-eslint/no-misused-promises': 'error', // Catch promises in wrong contexts
'@typescript-eslint/require-await': 'error', // Catch async functions without await
'@typescript-eslint/return-await': ['error', 'always'], // Consistent return await
// 2. Type safety (prevent any leakage)
'@typescript-eslint/no-explicit-any': 'error',
'@typescript-eslint/no-unsafe-assignment': 'error',
'@typescript-eslint/no-unsafe-call': 'error',
'@typescript-eslint/no-unsafe-member-access': 'error',
'@typescript-eslint/no-unsafe-return': 'error',
// 3. Code quality
'@typescript-eslint/no-non-null-assertion': 'error', // AI loves "!" — ban it
'@typescript-eslint/strict-boolean-expressions': 'error', // No implicit truthiness
'@typescript-eslint/switch-exhaustiveness-check': 'error', // All cases handled
'@typescript-eslint/consistent-type-imports': 'error', // type imports separated
// 4. Catch dev leftovers
'no-console': 'error', // No console.log in production code
'no-debugger': 'error',
Frontend (similar, with React additions):
// All the above, plus:
'react-hooks/exhaustive-deps': 'error', // Catch missing useEffect deps
'react/self-closing-comp': 'error',
// Icon library enforcement (prevent mixing icon sets)
'no-restricted-imports': ['error', {
patterns: [
{ group: ['lucide-react', 'lucide-react/*'], message: 'Use @tabler/icons-react instead.' },
]
}],
Both in pre-commit hooks and CI, we run ESLint with --max-warnings=0. This means warnings are effectively errors.
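To illustrate the promise rules in action, here is a hedged sketch (function names are hypothetical): the commented-out line is exactly the bug no-floating-promises flags, an async call whose result and errors silently vanish even though the code compiles and appears to work.

```typescript
async function chargeCard(orderId: string): Promise<string> {
  return `charged:${orderId}`;
}

async function checkout(orderId: string): Promise<string> {
  // chargeCard(orderId);           // ← floating promise: result and errors vanish,
  //                                //    yet this compiles and passes casual review
  return await chargeCard(orderId); // no-floating-promises + return-await force this
}

checkout("42").then((r) => console.log(r)); // prints: charged:42
```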
Prettier handles all formatting.
{
"printWidth": 100,
"semi": true,
"singleQuote": false,
"trailingComma": "all",
"plugins": ["prettier-plugin-tailwindcss"]
}
Test files get relaxed type-safety rules because mocks inherently use any:
// For *.test.ts and *.spec.ts files:
'@typescript-eslint/no-explicit-any': 'off',
'@typescript-eslint/no-unsafe-assignment': 'off',
'@typescript-eslint/no-unsafe-call': 'off',
'@typescript-eslint/no-unsafe-member-access': 'off',
'@typescript-eslint/no-non-null-assertion': 'off',
'no-console': 'off',
This is a deliberate tradeoff — strict typing in test mocks creates more friction than it prevents bugs.
One common failure mode in backend tests is a mismatch between mocked and production database behavior. Testcontainers solves this by spinning up a real PostgreSQL instance in a Docker container for each test run.
// tests/setup/global-setup.ts — simplified concept
import { PostgreSqlContainer } from '@testcontainers/postgresql';
export default async function globalSetup() {
// Start a real PostgreSQL container
const container = await new PostgreSqlContainer('postgres:18-alpine')
.withDatabase('test_db')
.start();
// Run migrations against it
// Set the DATABASE_URL for test processes
process.env.DATABASE_URL = container.getConnectionUri();
// Return teardown function
return async () => {
await container.stop();
};
}
What we mock vs. what we don’t:
| Dependency | Mocked? | Why |
|---|---|---|
| PostgreSQL | No — real via Testcontainers | SQL behavior must match production |
| Redis | No — real via Testcontainers or Docker Compose | Cache/queue behavior must be real |
| Payment integrations | Yes | External API, costly, rate-limited |
| SMS provider | Yes | External API, costs money per message |
| Email provider | Yes | External API |
| S3/R2 (object storage) | Sometimes — MinIO for integration, mocked for unit | Depends on what’s being tested |
// vitest.config.ts
export default defineConfig({
test: {
coverage: {
provider: 'v8',
reporter: ['text', 'html', 'lcov', 'json-summary'],
include: ['src/**/*.ts'],
exclude: [
'src/services/email/templates/**', // HTML templates
'src/index.ts', // Entry point
'src/yoga.ts', // Server setup
'src/schema.ts', // Generated schema
'.config/**',
'db/**', // Migrations
],
},
fileParallelism: false, // Tests share a DB — run sequentially
hookTimeout: 120_000, // Testcontainers need time to start
testTimeout: 30_000,
},
});
We aim for 90%+ coverage. The json-summary reporter is important — it’s what the CI pipeline reads to post coverage comments on PRs.
Because tests share a single database container (for speed — starting a new container per test is too slow), we run tests sequentially with fileParallelism: false. Each test suite handles its own cleanup (truncating tables, resetting state). This is a deliberate tradeoff: slower tests, but real database behavior and simpler setup.
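As a sketch of what that per-suite cleanup can look like (table names hypothetical, and RESTART IDENTITY CASCADE is one reasonable choice rather than a prescription):

```typescript
// Build one TRUNCATE statement covering every table a suite touched.
// RESTART IDENTITY resets sequences so auto-increment IDs are predictable
// from suite to suite; CASCADE handles foreign-key dependents.
function truncateSql(tables: string[]): string {
  return `TRUNCATE TABLE ${tables.join(", ")} RESTART IDENTITY CASCADE;`;
}

// In a beforeEach/afterEach hook this string would be executed against the
// shared Testcontainers database.
console.log(truncateSql(["orders", "order_items", "users"]));
// prints: TRUNCATE TABLE orders, order_items, users RESTART IDENTITY CASCADE;
```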
Frontend tests focus on component behavior and business logic, not implementation details:
// Example: Testing a component renders correctly
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
describe('ProductCard', () => {
it('shows out-of-stock badge when stock is zero', () => {
render(<ProductCard product={{ ...mockProduct, stock: 0 }} />);
expect(screen.getByText('Out of Stock')).toBeInTheDocument();
});
it('calls onAddToCart with correct quantity', async () => {
const onAddToCart = vi.fn();
render(<ProductCard product={mockProduct} onAddToCart={onAddToCart} />);
await userEvent.click(screen.getByRole('button', { name: /add to cart/i }));
expect(onAddToCart).toHaveBeenCalledWith(mockProduct.id, 1);
});
});
// vitest.config.ts
export default defineConfig({
test: {
globals: true,
environment: 'jsdom',
setupFiles: ['./vitest.setup.ts'],
coverage: {
provider: 'v8',
reporter: ['text', 'text-summary', 'lcov', 'json-summary'],
include: ['lib/**', 'hooks/**', 'stores/**', 'components/**'],
exclude: [
'**/types/**',
'**/*.d.ts',
'**/generated-types.ts', // GraphQL codegen output
'**/*.stories.tsx', // Storybook files
],
thresholds: {
statements: 75,
branches: 70,
functions: 80,
lines: 75,
},
},
},
});
This is where the backend Docker image comes into play. Playwright tests exercise the entire application stack — real frontend, real backend API, real database, real object storage.
The frontend repository includes a docker-compose.test.yml that spins up everything the E2E tests need:
services:
# Real PostgreSQL
postgres:
image: postgres:18.3-alpine
ports: ["5433:5432"]
environment:
POSTGRES_DB: app_test
POSTGRES_USER: testuser
POSTGRES_PASSWORD: testpass
tmpfs:
- /var/lib/postgresql # RAM-backed for speed
healthcheck:
test: ["CMD-SHELL", "pg_isready -U testuser -d app_test"]
interval: 5s
timeout: 3s
retries: 10
# Real Redis
redis:
image: redis:8.4.0-alpine
ports: ["6380:6379"]
tmpfs:
- /data
# S3-compatible object storage (MinIO)
minio:
image: minio/minio:latest
ports: ["9100:9000"]
environment:
MINIO_ROOT_USER: minioadmin
MINIO_ROOT_PASSWORD: minioadmin
command: server /data --console-address ":9001"
# Initialize MinIO bucket
minio-init:
image: minio/mc:latest
depends_on:
minio:
condition: service_healthy
entrypoint: >
sh -c "
mc alias set local http://minio:9000 minioadmin minioadmin &&
mc mb --ignore-existing local/app-uploads
"
# The REAL backend API — pulled from container registry
api:
image: ghcr.io/your-org/your-api:staging
platform: linux/amd64
ports: ["4100:4000"]
depends_on:
postgres:
condition: service_healthy
redis:
condition: service_started
minio-init:
condition: service_completed_successfully
environment:
# Override backend env vars to point at test services
DB_HOST: postgres
DB_PORT: "5432"
DB_NAME: app_test
REDIS_URL: redis://redis:6379
S3_ENDPOINT: http://minio:9000
NODE_ENV: test
MOCK_EXTERNAL_SERVICES: "true" # Backend mocks Paystack, SMS, etc.
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:4000/health"]
interval: 5s
timeout: 5s
retries: 20
start_period: 15s
Key design decisions:
- tmpfs for PostgreSQL and Redis — tests run in RAM, much faster than disk I/O
- MOCK_EXTERNAL_SERVICES: "true" tells the backend to mock external APIs (payment providers, SMS) so E2E tests don't hit real services
- Offset host ports (5433 instead of 5432, 6380 instead of 6379) so dev and test stacks can run concurrently

// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';
export default defineConfig({
testDir: './e2e',
timeout: 60_000,
fullyParallel: false, // Sequential — tests may share state
retries: process.env.CI ? 1 : 0,
workers: 1,
use: {
baseURL: 'http://localhost:3001',
trace: 'on-first-retry', // Capture traces for debugging
screenshot: 'only-on-failure',
},
projects: [
{
name: 'chromium',
use: { ...devices['Desktop Chrome'] },
},
{
name: 'webkit',
use: { ...devices['Desktop Safari'] },
},
{
name: 'mobile-android',
use: {
viewport: { width: 375, height: 812 },
isMobile: true,
hasTouch: true,
},
},
],
// Start the frontend dev server for tests
webServer: {
command: 'next dev --port 3001',
url: 'http://localhost:3001',
reuseExistingServer: !process.env.CI,
timeout: 120_000,
},
});
E2E tests cover the critical user flows that, if broken, would lose money or trust:
Each test exercises the full stack: browser → Next.js frontend → GraphQL API → PostgreSQL → response rendered in browser.
Unit tests mock the backend. If the backend changes its GraphQL schema, adds a required field, or changes a response shape, unit tests with mocked responses still pass. E2E tests hit the real backend and fail immediately.
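A toy illustration of that drift, with hypothetical types: the mock freezes the old response shape, so mocked unit tests stay green after the backend adds a required field; only a test that talks to the real backend notices.

```typescript
interface ProductOld { id: string; price: number }
interface ProductNew { id: string; price: number; currency: string } // backend added a field

// The unit-test mock still satisfies the OLD shape, so mocked tests keep passing:
const mockResponse: ProductOld = { id: "p1", price: 4500 };
console.log(mockResponse.price); // prints: 4500

// Code written against the new schema only meets the real response in E2E,
// which is exactly the layer that catches the mismatch:
function formatPrice(p: ProductNew): string {
  return `${p.currency} ${p.price}`;
}
console.log(formatPrice({ id: "p1", price: 4500, currency: "NGN" })); // prints: NGN 4500
```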
Storybook serves two purposes in our setup: component documentation and visual regression detection.
// .storybook/main.ts
const config: StorybookConfig = {
stories: ['../components/**/*.stories.@(ts|tsx)'],
addons: ['@storybook/addon-a11y', '@storybook/addon-designs'],
framework: {
name: '@storybook/nextjs',
options: {},
},
};
Our CI pipeline runs storybook build as a verification step. This catches stories that no longer build (broken imports, type errors in stories), along with accessibility issues surfaced by the @storybook/addon-a11y plugin. It's not a full visual regression test (that would require Chromatic or similar), but the build verification alone catches a class of errors that unit tests miss.
We configure Storybook viewports to match our actual user base — mid-range Android phones, not just iPhone and desktop:
viewport: {
viewports: {
androidSmall: { name: 'Android Small', styles: { width: '360px', height: '800px' } },
androidLarge: { name: 'Android Large', styles: { width: '412px', height: '915px' } },
iphoneSE: { name: 'iPhone SE', styles: { width: '375px', height: '667px' } },
tablet: { name: 'Tablet', styles: { width: '768px', height: '1024px' } },
laptop: { name: 'Laptop', styles: { width: '1366px', height: '768px' } },
},
},
Pre-commit hooks are the last line of defense before code enters the repository. We use Husky to run lint-staged, which applies checks only to staged files (fast feedback).
#!/bin/sh
# .husky/pre-commit
# Encrypt all env files first (prevent accidental secret commits)
pnpm env:encrypt:all
# Stage encrypted env files (excluding sensitive ones)
git diff --name-only | grep -E '^\.env\..+$' | \
grep -v -E '^\.(env\.keys|env\.local|env)$' | \
xargs -r git add || true
# Guard: block commit if secrets are staged
BLOCKED=$(git diff --cached --name-only | grep -E '^(\.env|\.env\.keys|\.env\.local)$' || true)
if [ -n "$BLOCKED" ]; then
echo "Sensitive env files should not be committed:"
echo "$BLOCKED"
exit 1
fi
# Run lint-staged (ESLint + Prettier on staged files)
npx lint-staged
# Full type-check (not just staged files — a change in one file can break another)
pnpm type-check
#!/bin/sh
# Same env encryption and guarding, then:
npm run precommit:check
# Which runs: typecheck + lint + format:check
{
"lint-staged": {
"*.{ts,tsx}": [
"eslint --fix",
"prettier --write"
],
"*.{json,md,yml,yaml}": [
"prettier --write"
]
}
}
lint-staged only runs on staged files, which is fast. But TypeScript type-checking must run on the entire project because changing a type in one file can break imports in files you didn’t touch. The full tsc --noEmit run takes a few seconds and catches cross-file breakage that staged-only checking would miss.
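A toy two-file illustration (names hypothetical) of the cross-file breakage that staged-only checking misses:

```typescript
// types.ts (the staged file) — a field was renamed:
//   export interface Order { total: number }   // was: { amount: number }
//
// consumer.ts (NOT staged, silently broken):
//   const sum = (o: Order) => o.amount;        // ts(2339) under `tsc --noEmit`
//
// Only a whole-project type-check sees both files together and forces the fix:
interface Order { total: number }
const sum = (o: Order) => o.total;
console.log(sum({ total: 5 })); // prints: 5
```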
We use dotenvx to encrypt environment files. The pre-commit hook automatically encrypts .env.* files and stages them, while blocking .env, .env.keys, and .env.local (which contain secrets) from ever being committed. This is defense-in-depth — .gitignore should also exclude these, but the hook catches cases where someone force-adds them.
The backend CI runs on every pull request to main:

name: CI
on:
pull_request:
branches: [main]
jobs:
qa:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: pnpm/action-setup@v5
- uses: actions/setup-node@v6
with:
node-version-file: '.nvmrc'
cache: 'pnpm'
- run: pnpm install --frozen-lockfile
- run: pnpm format:check
- run: pnpm lint
- run: pnpm type-check
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: pnpm/action-setup@v5
- uses: actions/setup-node@v6
- run: pnpm install --frozen-lockfile
- run: pnpm test:coverage
- uses: actions/upload-artifact@v7
if: always()
with:
name: coverage-report
path: coverage/
# Posts coverage summary as a PR comment
coverage-comment:
needs: [qa, test]
runs-on: ubuntu-latest
permissions:
pull-requests: write
steps:
- uses: actions/download-artifact@v8
with:
name: coverage-report
path: ./coverage
- uses: actions/github-script@v8
with:
script: |
// Reads json-summary, posts formatted table to PR
// Shows: Statements, Branches, Functions, Lines with color indicators
The frontend CI also runs on every pull request to main; it is more comprehensive because it has more moving parts:
jobs:
check-quality:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/setup-node@v6
- run: npm ci
# 1. Type safety
- run: npm run typecheck
# 2. Code quality
- run: npm run lint
- run: npm run format:check
# 3. Unit + integration tests with coverage
- run: npx vitest run --coverage
# 4. GraphQL codegen sync check
- name: Verify GraphQL Codegen Sync
run: |
npm run codegen
npm run format
if [[ -n $(git status --porcelain) ]]; then
echo "GraphQL types are out of sync with staging!"
exit 1
fi
# 5. Storybook build verification
- run: npm run storybook:build
# 6. Application build verification
- run: npm run build:ci
name: E2E Tests
on:
pull_request:
branches: [main, staging]
jobs:
e2e:
runs-on: ubuntu-latest
timeout-minutes: 25
steps:
- uses: actions/checkout@v6
- uses: actions/setup-node@v6
- run: npm ci
- run: npx playwright install --with-deps chromium webkit
# Authenticate to container registry
- run: echo "${{ secrets.GHCR_TOKEN }}" | docker login ghcr.io -u "${{ github.repository_owner }}" --password-stdin
# Pull the backend image
- run: docker compose -f docker-compose.test.yml pull api
# Start the full test stack
- name: Start test stack
run: |
# Decrypt test env, source it for Docker Compose
npx dotenvx decrypt -f .env.test
set -a && source .env.test && set +a
docker compose -f docker-compose.test.yml up -d --wait
# Wait for API health
- name: Wait for API
run: |
timeout 90 bash -c '
until curl -sf http://localhost:4100/health > /dev/null 2>&1; do
sleep 3
done
'
# Run Playwright
- run: npx dotenvx run -f .env.test -- npx playwright test
# Artifacts for debugging
- uses: actions/upload-artifact@v7
if: always()
with:
name: playwright-report
path: playwright-report/
# Clean up
- if: always()
run: docker compose -f docker-compose.test.yml down -v --remove-orphans
Both backend and frontend pipelines post coverage summaries as PR comments. This gives reviewers immediate visibility into test coverage without digging through CI logs:
## 📊 Coverage Summary
| Category | Coverage | |
|------------|----------|-|
| Statements | 91.2% | 🟢 |
| Branches | 87.4% | 🟢 |
| Functions | 93.1% | 🟢 |
| Lines | 90.8% | 🟢 |
The comment updates on each push (not create-and-duplicate), keeping the PR thread clean.
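One way to implement that update-in-place behavior (a sketch with hypothetical types; the actual GitHub API calls for listing and editing comments are omitted) is to tag the comment with a hidden marker and search for it on each run:

```typescript
const MARKER = "<!-- coverage-summary -->";

interface ExistingComment { id: number; body: string }

// Decide whether to update our previous comment or create a new one.
function planUpsert(comments: ExistingComment[]): { action: "update" | "create"; id?: number } {
  const mine = comments.find((c) => c.body.includes(MARKER));
  return mine ? { action: "update", id: mine.id } : { action: "create" };
}

const thread = [
  { id: 3, body: "LGTM!" },
  { id: 7, body: `${MARKER}\n| Statements | 90.1% | 🟢 |` },
];

console.log(planUpsert(thread)); // update comment 7 in place
console.log(planUpsert([]));     // no marker found: create a fresh comment
```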
This is the piece that ties the backend and frontend testing together.
flowchart LR
A["Push to main<br/>(backend repo)"] --> B["GitHub Actions:<br/>Build Docker Image"]
B --> C["Push to GHCR<br/>Tag: staging"]
D["Release published<br/>(backend repo)"] --> E["GitHub Actions:<br/>Build Docker Image"]
E --> F["Push to GHCR<br/>Tag: latest + version"]
C --> G["Frontend E2E Tests<br/>Pull staging image"]
F --> G
%% Accent nodes
style B fill:#1a2e2a,stroke:#00b894,color:#e0e0e0,stroke-width:2px
style E fill:#2e1a1a,stroke:#e17055,color:#e0e0e0,stroke-width:2px
style G fill:#1a1e2e,stroke:#0984e3,color:#e0e0e0,stroke-width:2px
%% Plain nodes
classDef plain fill:#1e1e2e,stroke:#888,color:#e0e0e0,stroke-width:1.5px
class A,C,D,F plain
%% Arrows
linkStyle default stroke:#888,stroke-width:2px
The backend Dockerfile is a multi-stage build optimized for small image size and fast deploys:
# Stage 1: Install all dependencies
FROM node:22-slim AS deps
WORKDIR /app
RUN corepack enable && corepack prepare pnpm@10.27.0 --activate
COPY package.json pnpm-lock.yaml ./
RUN pnpm install --frozen-lockfile
# Stage 2: Build TypeScript
FROM deps AS build
COPY . .
RUN pnpm build
# Stage 3: Production dependencies only
FROM node:22-slim AS prod-deps
WORKDIR /app
RUN corepack enable && corepack prepare pnpm@10.27.0 --activate
COPY package.json pnpm-lock.yaml ./
RUN pnpm install --frozen-lockfile --prod --ignore-scripts
# Stage 4: Minimal runtime
FROM node:22-slim AS runtime
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
RUN npm install -g @dotenvx/dotenvx
COPY --from=prod-deps /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY --from=build /app/db ./db
COPY package.json .env.test ./
# dotenvx decrypts env vars at runtime
ENTRYPOINT ["dotenvx", "run", "-f", ".env.test", "--"]
CMD ["node", "dist/src/index.js"]
name: Build & Push API Image
on:
push:
branches: [main] # Build staging image
release:
types: [published] # Build production image
jobs:
build-and-push:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v6
- uses: docker/setup-buildx-action@v4
- uses: docker/login-action@v4
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Determine tags
id: tags
run: |
IMAGE="ghcr.io/${{ github.repository }}"
if [ "${{ github.event_name }}" = "release" ]; then
echo "tags=${IMAGE}:latest,${IMAGE}:${{ github.event.release.tag_name }}" >> "$GITHUB_OUTPUT"
else
echo "tags=${IMAGE}:staging,${IMAGE}:sha-${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"
fi
- uses: docker/build-push-action@v7
with:
context: .
push: true
tags: ${{ steps.tags.outputs.tags }}
platforms: linux/amd64,linux/arm64
Our backend runs on Railway, which deploys directly from code — we don't need a Docker image for production hosting. The image exists purely to give the frontend E2E tests a hermetic, reproducible backend environment, which is why it only carries the tags the E2E pipeline consumes (staging on every merge to main, latest on release).

This is the most distinctive part of our setup. We use Claude Code for development. Claude Code can run bash commands, edit files, and make commits. Without guardrails, it could rm -rf /, force-push to main, or install malicious packages. Our guardrail system prevents all of this.
Claude Code has a settings.json file that defines allowed and denied bash command patterns. Here’s the structure:
{
"permissions": {
"allow": [
"Bash(pnpm run build*)",
"Bash(pnpm run lint*)",
"Bash(pnpm run test*)",
"Bash(git add *)",
"Bash(git commit *)",
"Bash(git push *)",
"Bash(ls *)",
"Bash(cat *)",
"Bash(grep *)",
"Bash(find *)"
// ... read-only and safe build commands
],
"deny": [
"Bash(pnpm add *)", // Can't install packages
"Bash(pnpm install*)", // Can't modify node_modules
"Bash(git push --force*)", // Can't force-push
"Bash(git push -f*)",
"Bash(git reset --hard*)", // Can't destroy history
"Bash(rm -rf *)", // Can't delete recursively
"Bash(sudo *)", // No root access
"Bash(kill *)", // Can't kill processes
"Bash(chmod *)", // Can't change permissions
"Bash(curl * | sh*)", // Can't pipe downloads to shell
"Bash(curl * | bash*)"
]
}
}
The philosophy: allow everything the AI needs to do its job (build, test, lint, commit, view files), deny everything that could cause damage (install packages, force-push, delete files, run arbitrary scripts).
Claude Code’s built-in pattern matching doesn’t handle piped commands (cmd1 | cmd2) or complex bash expressions. This is a known bug in Claude Code and may be fixed in the future. Our custom hook script (auto-approve-pipes.sh) parses every bash command the AI wants to run, extracts individual commands, and checks each one against the allow/deny lists.
# Simplified concept of how the hook works:
# 1. Read the command Claude Code wants to run
COMMAND="git log --oneline | head -5"
# 2. Parse into individual commands using shfmt (if available) or regex
# Extracted: ["git log --oneline", "head -5"]
# 3. Check each against deny list first
# "git log --oneline" → not denied ✓
# "head -5" → not denied ✓
# 4. Check each against allow list
# "git log --oneline" → matches "Bash(git log*)" ✓
# "head -5" → matches "Bash(head *)" ✓
# 5. All commands allowed → auto-approve without prompting
If any command in the pipeline is denied, the entire command is blocked. If any command isn’t in the allow list, it falls through to manual approval (the human gets prompted).
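A simplified sketch of that per-command check (pattern handling reduced to prefix matching; the real hook is more careful): deny wins over allow, and anything matching neither falls through to a human.

```typescript
type Verdict = "allow" | "deny" | "ask";

// Convert "Bash(git push --force*)" into the prefix "git push --force".
function toPrefix(pattern: string): string {
  return pattern.replace(/^Bash\(/, "").replace(/\*?\)$/, "");
}

function check(cmd: string, allow: string[], deny: string[]): Verdict {
  if (deny.some((p) => cmd.startsWith(toPrefix(p)))) return "deny"; // deny always wins
  if (allow.some((p) => cmd.startsWith(toPrefix(p)))) return "allow";
  return "ask"; // not on either list: prompt the human
}

const allowList = ["Bash(git push *)", "Bash(head *)"];
const denyList = ["Bash(git push --force*)"];

console.log(check("git push --force origin main", allowList, denyList)); // prints: deny
console.log(check("git push origin feature/x", allowList, denyList));    // prints: allow
console.log(check("rm -rf /", allowList, denyList));                     // prints: ask
```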
The hook also prevents the AI from operating on main directly:
check_main_branch() {
local cmd="$1"
# Block: git checkout main, git push ... main, git reset ... main
echo "$cmd" | grep -qE "git (checkout|switch).*main" && return 0
echo "$cmd" | grep -qE "git push.*main" && return 0
echo "$cmd" | grep -qE "git reset.*main" && return 0
return 1
}
The AI can create branches, commit, push to feature branches, and open PRs — but it can never touch main directly. All code reaches main through PRs, which must pass CI.
Here’s the complete lifecycle of a code change, from AI-generated code to production:
flowchart TB
A["AI generates code<br/>(Claude Code / Copilot / etc)"] --> B{"Pre-commit hook"}
B -->|fail| A
B -->|pass| C["Commit to feature branch"]
C --> D["Push → Open PR"]
D --> E{"CI: Format check"}
E -->|fail| A
E -->|pass| F{"CI: Lint (zero warnings)"}
F -->|fail| A
F -->|pass| G{"CI: Type check (strict)"}
G -->|fail| A
G -->|pass| H{"CI: Unit + Integration tests"}
H -->|fail| A
H -->|pass| I{"CI: Coverage threshold met?"}
I -->|fail| A
I -->|pass| J{"CI: Storybook builds?"}
J -->|fail| A
J -->|pass| K{"CI: App builds?"}
K -->|fail| A
K -->|pass| L{"CI: E2E tests pass?"}
L -->|fail| A
L -->|pass| M["Human review"]
M -->|approved| N["Merge to main"]
N --> O["Backend: Build Docker image (staging)"]
N --> P["Deploy staging"]
P --> Q["Manual QA"]
Q -->|ready| R["Publish release"]
R --> S["Backend: Build Docker image (latest)"]
R --> T["Deploy production"]
%% Accent nodes
style A fill:#1e1a2e,stroke:#6c5ce7,color:#e0e0e0,stroke-width:2px
style N fill:#1a2e2a,stroke:#00b894,color:#e0e0e0,stroke-width:2px
style T fill:#2e1a1a,stroke:#e17055,color:#e0e0e0,stroke-width:2px
%% Decision diamonds
classDef decision fill:#2e2a1a,stroke:#fdcb6e,color:#e0e0e0,stroke-width:1.5px
class B,E,F,G,H,I,J,K,L decision
%% Plain nodes
classDef plain fill:#1e1e2e,stroke:#888,color:#e0e0e0,stroke-width:1.5px
class C,D,M,O,P,Q,R,S plain
%% Arrows
linkStyle default stroke:#888,stroke-width:2px
Every arrow labeled “fail” sends the developer (or AI) back to the start. There are no shortcuts. The system is designed so that by the time a human sees the PR, the code has already passed: formatting, linting, type-checking, unit tests, integration tests, coverage thresholds, Storybook build, app build, and end-to-end tests. The human review can focus entirely on logic, intent, and architecture — not on whether the code works.
This guide describes our specific implementation (Node.js, TypeScript, Next.js, Vitest, Playwright, GitHub Actions, Railway). But the principles are stack-agnostic. Here’s how to adapt each layer:
| Our Stack | Alternatives |
|---|---|
| TypeScript strict mode | Mypy strict (Python), Rust’s borrow checker, Go’s type system |
| ESLint type-checked rules | Ruff (Python), Clippy (Rust), golangci-lint (Go) |
The principle: turn on the strictest settings your language supports. Relax only when a specific rule fights your entire codebase.
| Our Stack | Alternatives |
|---|---|
| Vitest | Jest, pytest, Go testing |
| Testcontainers (real DB) | SQLite in-memory (lighter but less realistic), Docker Compose per test |
| React Testing Library | Vue Test Utils, Svelte Testing Library |
| Playwright | Cypress, Selenium |
The principle: test against real infrastructure where possible (real database, real cache). Mock only external services you don’t control (payment APIs, SMS providers).
| Our Stack | Alternatives |
|---|---|
| GitHub Actions | GitLab CI, CircleCI, Buildkite |
| GHCR (container registry) | Docker Hub, ECR, GCR |
| Railway (hosting) | Vercel, Render, Fly.io, AWS |
The principle: CI runs the exact same checks as pre-commit, but in a clean environment. No caching tricks that could hide failures.
| Our Stack | Alternatives |
|---|---|
| Claude Code hooks + permissions | Cursor rules, Copilot workspace policies, custom MCP servers |
The principle: define an explicit allow-list of what the AI can do. Everything else requires human approval. Never let the AI install packages, modify CI, or push to protected branches without review.