Lesson 2: Infrastructure Planning

Learning Objectives

Design a production-grade Hermes deployment architecture
Choose between VPS, Docker, Modal, and Daytona hosting options
Plan for high availability, disaster recovery, and scaling

Priya’s Infrastructure Decision

NovaCraft runs on AWS. Priya needs Hermes to:

Run 24/7 with minimal downtime
Handle requests from 50 team members across 3 time zones
Connect securely to internal APIs
Keep all data within the company’s AWS account

2.1 Deployment Options

Option	Best For	Cost	Complexity	Data Control
VPS (direct install)	Small teams (<20)	$20-50/mo	Low	Full
Docker (recommended)	Mid teams (20-200)	$50-200/mo	Medium	Full
Modal (serverless)	Burst workloads	Pay-per-use	Low	Partial
Daytona (cloud dev)	Dev/test	$30-100/mo	Low	Full
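The team-size thresholds in the table can be written down as a small helper. This is only a sketch: the function name recommend_deployment is hypothetical, it looks at headcount alone, and it leaves out Modal and Daytona because the table positions those by workload type rather than team size.

```shell
# Map a team size to the hosting option suggested by the table above.
# The thresholds (<20, 20-200) come from the table; larger teams are
# outside what the table covers, so the function says so explicitly.
recommend_deployment() {
    local team_size="$1"
    case "$team_size" in
        ''|*[!0-9]*)
            echo "usage: recommend_deployment <team-size>" >&2
            return 2
            ;;
    esac
    if [ "$team_size" -lt 20 ]; then
        echo "VPS"
    elif [ "$team_size" -le 200 ]; then
        echo "Docker"
    else
        echo "outside table range"
    fi
}
```

For NovaCraft, recommend_deployment 50 prints Docker, which matches Priya's choice in the next section.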

2.2 Docker Deployment (Priya’s Choice)

Server Sizing

NovaCraft: 50 users, ~200 requests/day

Recommended:
  CPU:    4 vCPU
  RAM:    8 GB
  Disk:   50 GB SSD
  OS:     Ubuntu 24.04 LTS
  AWS:    t3.xlarge (4 vCPU / 16 GB, ~$120/month); c5.xlarge is the closest 4 vCPU / 8 GB match
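Before installing anything, it can help to confirm that a host meets these minimums. The sketch below uses a hypothetical helper, meets_recommended_size, that takes the measured values as arguments. The commented usage line assumes a Linux host with nproc, /proc/meminfo, and GNU df, and reported sizes often come in slightly under the nominal figure.

```shell
# Compare a host's resources with the recommended minimums above.
# Arguments: <vcpus> <ram_gb> <disk_gb>. Prints one line per shortfall
# and returns non-zero if anything falls short.
meets_recommended_size() {
    local vcpus="$1" ram_gb="$2" disk_gb="$3" short=0
    [ "$vcpus" -ge 4 ]    || { echo "CPU: need 4 vCPU, have $vcpus"; short=1; }
    [ "$ram_gb" -ge 8 ]   || { echo "RAM: need 8 GB, have $ram_gb"; short=1; }
    [ "$disk_gb" -ge 50 ] || { echo "Disk: need 50 GB, have $disk_gb"; short=1; }
    return "$short"
}

# Example on the target host (values rounded to whole GB):
#   meets_recommended_size "$(nproc)" \
#     "$(awk '/MemTotal/ {print int($2/1024/1024 + 0.5)}' /proc/meminfo)" \
#     "$(df -BG --output=size / | tail -n 1 | tr -dc '0-9')"
```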

Docker Compose

# docker-compose.yml
# The top-level "version" key is obsolete in Compose v2, so it is omitted.

services:
  hermes:
    image: nousresearch/hermes-agent:latest
    container_name: hermes-agent
    restart: unless-stopped
    ports:
      - "127.0.0.1:8080:8080"
    volumes:
      - ./config:/root/.hermes
      - ./data:/root/.hermes/data
    environment:
      - HERMES_API_KEY=${HERMES_API_KEY}
      - TZ=UTC
    healthcheck:
      test: ["CMD", "hermes", "doctor"]
      interval: 60s
      timeout: 10s
      retries: 3
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: "2.0"

Initial Setup

# 1. Install Docker
curl -fsSL https://get.docker.com | bash

# 2. Create project directory
mkdir -p /opt/hermes/{config,data}
cd /opt/hermes

# 3. Create docker-compose.yml (as above)

# 4. Set API key (keep the file readable by its owner only)
echo "HERMES_API_KEY=your-key-here" > .env
chmod 600 .env

# 5. Start
docker compose up -d

# 6. Verify
docker compose logs -f hermes
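Following the logs shows activity, but not whether the health check from docker-compose.yml has passed. One way to check is to poll the health status that Docker records for the container. In this sketch, wait_for_healthy is a hypothetical helper, hermes-agent is the container_name from the compose file, and the defaults (20 checks, 10 seconds apart) are an assumption chosen because the first health result arrives only after the 60-second interval.

```shell
# Poll a container's health status until it reports "healthy".
# Arguments: <container> [max_attempts] [sleep_seconds]
wait_for_healthy() {
    local name="$1" attempts="${2:-20}" delay="${3:-10}" status="" i=0
    while [ "$i" -lt "$attempts" ]; do
        # Health.Status is "starting", "healthy", or "unhealthy".
        status="$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)" || status=""
        if [ "$status" = "healthy" ]; then
            echo "$name is healthy"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$name not healthy after $attempts checks (last status: ${status:-unknown})" >&2
    return 1
}

# Usage after "docker compose up -d":
#   wait_for_healthy hermes-agent
```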

2.3 Network Architecture

┌────────────────────────────────────────────┐
│                  AWS VPC                    │
│                                             │
│  ┌──────────┐     ┌──────────────────┐     │
│  │ ALB/Nginx│────→│  Hermes Agent     │     │
│  │ (HTTPS)  │     │  (Docker)         │     │
│  └──────────┘     └───────┬──────────┘     │
│                           │                 │
│  ┌─────────┐  ┌──────────▼──────────┐     │
│  │ Slack   │  │  Internal Services   │     │
│  │ Webhook │  │  Jira · GitHub ·     │     │
│  └─────────┘  │  Datadog · etc.      │     │
│               └─────────────────────┘     │
└────────────────────────────────────────────┘

Nginx Reverse Proxy

# /etc/nginx/sites-available/hermes
server {
    listen 443 ssl http2;
    server_name hermes.novacraft.internal;

    ssl_certificate     /etc/ssl/hermes.crt;
    ssl_certificate_key /etc/ssl/hermes.key;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
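After enabling the site, running nginx -t validates the configuration before a reload. A quick smoke test through the proxy can then confirm the whole path. The sketch below assumes a hypothetical helper, check_proxy, and requests the root path only because this lesson does not name a Hermes HTTP endpoint. It treats any non-5xx status as "proxy and backend reachable". A 502 usually means Nginx is up but nothing is answering on 127.0.0.1:8080.

```shell
# Return success when the proxy answers with a non-5xx HTTP status.
# Argument: URL to request (defaults to the server_name configured above).
check_proxy() {
    local url="${1:-https://hermes.novacraft.internal/}" code
    # -w prints only the status code; curl reports 000 when it cannot connect.
    code="$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$url")" || code="000"
    case "$code" in
        000|5??)
            echo "proxy check failed: HTTP $code"
            return 1
            ;;
        *)
            echo "proxy reachable: HTTP $code"
            return 0
            ;;
    esac
}
```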

2.4 Configuration

SOUL.md for NovaCraft

# SOUL.md — NovaCraft AI Assistant

## Identity
You are NovaCraft's AI assistant, helping a 50-person B2B SaaS team.
We build project management tools for mid-size companies.

## Company Facts
- Founded: 2022
- Team: 50 people (SF, London, Bangalore)
- Stack: Python/FastAPI backend, React frontend, PostgreSQL, AWS
- Revenue: $5M ARR
- Key product: NovaCraft PM (project management SaaS)

## Communication Style
- Professional but friendly
- Use Slack-appropriate formatting (bullet points, code blocks)
- Keep responses concise for Slack (< 300 words unless asked for detail)
- When uncertain, say so—don't hallucinate internal data

LLM Provider Configuration

hermes config edit

llm:
  provider: openrouter
  model: anthropic/claude-sonnet
  fallback: nous/hermes-3-70b
  max_tokens: 4096
  temperature: 0.3     # Lower for enterprise (more deterministic)

2.5 High Availability

Health Monitoring

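# Systemd watchdog: a oneshot service triggered by a timer every 5 minutes.
# systemd reads the [Timer] section only from a separate .timer unit.

# /etc/systemd/system/hermes-watchdog.service
[Unit]
Description=Hermes Agent Watchdog
After=docker.service

[Service]
Type=oneshot
# SLACK_WEBHOOK must be defined in this file (the path is an example)
EnvironmentFile=/opt/hermes/.env
ExecStart=/opt/hermes/scripts/healthcheck.sh

# /etc/systemd/system/hermes-watchdog.timer
[Unit]
Description=Run the Hermes Agent Watchdog every 5 minutes

[Timer]
OnCalendar=*:0/5

[Install]
WantedBy=timers.target

#!/bin/bash
# /opt/hermes/scripts/healthcheck.sh
# Start Hermes again if no service container is running, then notify Slack.

COMPOSE_FILE=/opt/hermes/docker-compose.yml

# "ps --status running -q" prints the IDs of running containers only.
if [ -z "$(docker compose -f "$COMPOSE_FILE" ps --status running -q)" ]; then
    echo "Hermes is down, restarting..."
    # "up -d" also recreates the container if it was removed.
    docker compose -f "$COMPOSE_FILE" up -d
    curl -fsS -X POST -H 'Content-Type: application/json' \
      -d '{"text":"⚠️ Hermes Agent was down and has been restarted."}' \
      "$SLACK_WEBHOOK"
fi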
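The unit files do nothing until systemd has loaded them and the timer is enabled. The steps can be sketched as below, assuming the timer is saved as hermes-watchdog.timer next to the service unit. enable_watchdog is a hypothetical wrapper name, and the commands need root privileges.

```shell
# Make the watchdog script executable, load the new units, and start the timer.
enable_watchdog() {
    chmod +x /opt/hermes/scripts/healthcheck.sh &&
    systemctl daemon-reload &&
    systemctl enable --now hermes-watchdog.timer &&
    # Show the next scheduled run to confirm the timer is active.
    systemctl list-timers hermes-watchdog.timer --no-pager
}

# Usage (as root):
#   enable_watchdog
```

Only the timer is enabled. The oneshot service is started by the timer and does not need its own [Install] section.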

Backup Strategy

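#!/bin/bash
# Daily backup of config and data
# /opt/hermes/scripts/backup.sh
set -euo pipefail

# Compute the date once so the local and S3 paths always match
STAMP="$(date +%Y%m%d)"
BACKUP_DIR="/opt/hermes/backups/$STAMP"
mkdir -p "$BACKUP_DIR/config" "$BACKUP_DIR/data"

# Backup config (copying "dir/." keeps a same-day re-run from nesting config/config)
cp -a /opt/hermes/config/. "$BACKUP_DIR/config/"

# Backup data (memory, skills, etc.)
cp -a /opt/hermes/data/. "$BACKUP_DIR/data/"

# Upload to S3
aws s3 sync "$BACKUP_DIR" "s3://novacraft-backups/hermes/$STAMP/"

# Rotate: keep 30 days (-mindepth 1 stops find from matching the backups directory itself)
find /opt/hermes/backups -mindepth 1 -maxdepth 1 -mtime +30 -exec rm -rf {} +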
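Rotation by -mtime depends on directory timestamps, which can change when files are copied or restored. Because the backup directories are named by date, an alternative is to select expired backups by name. The sketch below is one possible approach, not part of the script above: expired_backups is a hypothetical helper, and the commented usage line assumes GNU date.

```shell
# Print the dated backup names (YYYYMMDD) that are older than the cutoff.
# Reads one name per line on stdin; names that are not 8 digits are ignored.
expired_backups() {
    local cutoff="$1" name
    while IFS= read -r name; do
        case "$name" in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
            *) continue ;;
        esac
        if [ "$name" -lt "$cutoff" ]; then
            echo "$name"
        fi
    done
}

# Dry run: list what a 30-day retention would remove.
#   ls /opt/hermes/backups | expired_backups "$(date -d '30 days ago' +%Y%m%d)"
```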

2.6 Disaster Recovery

Recovery Plan

Scenario	RTO	RPO	Recovery Steps
Container crash	1 min	0	Auto-restart (Docker policy)
Server reboot	5 min	0	Auto-start (Docker service enabled in systemd + restart policy)
Data corruption	30 min	24h	Restore from S3 backup
Full server loss	2h	24h	Provision new server + restore

Quick Restore

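# On new server:
# 1. Install Docker
# 2. Pull a backup. backup.sh uploads to dated prefixes, so name the date to restore.
BACKUP_DATE=YYYYMMDD   # replace with the date of the backup to restore
mkdir -p /opt/hermes
aws s3 sync "s3://novacraft-backups/hermes/$BACKUP_DATE/" /opt/hermes/

# 3. Recreate docker-compose.yml and .env (the backup covers only config/ and data/)

# 4. Start
cd /opt/hermes && docker compose up -d

# 5. Verify
docker compose logs -f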
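The backup script writes to prefixes named by date, so a restore first has to find the newest one. A sketch of that lookup follows. latest_backup_date is a hypothetical helper that parses the "PRE <name>/" lines that aws s3 ls prints for prefixes, and the bucket path is the one used by the backup script.

```shell
# Pick the newest YYYYMMDD prefix from "aws s3 ls" output on stdin.
# Prefix lines look like "   PRE 20250114/"; object lines are ignored.
latest_backup_date() {
    sed -n 's#^ *PRE \([0-9]\{8\}\)/ *$#\1#p' | sort | tail -n 1
}

# Usage on the new server:
#   latest="$(aws s3 ls s3://novacraft-backups/hermes/ | latest_backup_date)"
#   aws s3 sync "s3://novacraft-backups/hermes/$latest/" /opt/hermes/
```

If the bucket holds no dated prefixes, the function prints nothing, so it is worth checking that the result is non-empty before syncing.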

2.7 Hands-On Exercise

Deploy Hermes on Docker:

mkdir -p /opt/hermes && cd /opt/hermes
# Create docker-compose.yml from section 2.2
docker compose up -d

Configure health check: Add the watchdog script
Set up daily backup: Create the backup cron job

crontab -e
# Add: 0 3 * * * /opt/hermes/scripts/backup.sh

Write your SOUL.md: Customize with your company context
Test failover: Stop the container and verify auto-restart

docker stop hermes-agent
# A manual stop bypasses the "unless-stopped" restart policy, so recovery
# comes from the watchdog timer. Wait up to 5 minutes, then verify:
docker ps

Lesson Summary

Key Point	Details
Deployment	Docker Compose recommended for 20-200 users
Sizing	4 vCPU / 8 GB for 50 users
Network	Nginx reverse proxy + HTTPS
HA	Health checks + auto-restart + daily backups
DR	RTO 2h, RPO 24h with S3 backups

Next Lesson: Multi-Platform Gateway—connecting Slack, Discord, Email, and more.