Lesson 2: Infrastructure Planning
Learning Objectives
- Design a production-grade Hermes deployment architecture
- Choose between VPS, Docker, Modal, and Daytona hosting options
- Plan for high availability, disaster recovery, and scaling
Priya’s Infrastructure Decision
NovaCraft runs on AWS. Priya needs Hermes to:
- Run 24/7 with minimal downtime
- Handle requests from 50 team members across 3 time zones
- Connect securely to internal APIs
- Keep all data within the company’s AWS account
2.1 Deployment Options
| Option | Best For | Cost | Complexity | Data Control |
|---|---|---|---|---|
| VPS (direct install, no containers) | Small teams (<20) | $20-50/mo | Low | Full |
| Docker (recommended) | Mid teams (20-200) | $50-200/mo | Medium | Full |
| Modal (serverless) | Burst workloads | Pay-per-use | Low | Partial |
| Daytona (cloud dev) | Dev/test | $30-100/mo | Low | Full |
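The table reads as a simple decision rule. Below is a minimal sketch of it as a shell function. The function name and the workload labels (`steady`, `burst`, `devtest`) are hypothetical. The team-size thresholds come straight from the table.

```shell
# Hypothetical helper: the deployment table above as a rule of thumb.
# Usage: pick_deployment TEAM_SIZE [steady|burst|devtest]
pick_deployment() {
  local team_size=$1 workload=${2:-steady}
  case "$workload" in
    burst)   echo "modal" ;;    # pay-per-use for spiky load
    devtest) echo "daytona" ;;  # dev/test environments
    *)
      if [ "$team_size" -lt 20 ]; then
        echo "vps"
      elif [ "$team_size" -le 200 ]; then
        echo "docker"
      else
        echo "no table row (over 200 users)"
      fi
      ;;
  esac
}
pick_deployment 50   # NovaCraft: 50 users, steady traffic
```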
2.2 Docker Deployment (Priya’s Choice)
Server Sizing
NovaCraft: 50 users, ~200 requests/day
Recommended:
CPU: 4 vCPU
RAM: 8 GB
Disk: 50 GB SSD
OS: Ubuntu 24.04 LTS
AWS: t3.xlarge (4 vCPU, 16 GiB RAM, more than the 8 GB minimum; ~$120/month on-demand)
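Little's law gives a quick sanity check on this sizing: average requests in flight = arrival rate × time per request. The sketch below is illustrative only. It assumes NovaCraft's ~200 daily requests fall within a 10-hour busy window and that each request takes about 30 seconds end to end. Neither figure comes from this lesson.

```shell
# Little's law: average requests in flight L = lambda * W, where lambda is
# the arrival rate and W the time each request spends in the system.
# ASSUMPTIONS (not from the lesson): traffic falls inside a 10-hour busy
# window, and an agent request takes about 30 seconds end to end.
estimate_concurrency() {
  local requests_per_day=$1 busy_hours=$2 seconds_per_request=$3
  # lambda = requests per second during the busy window; print lambda * W
  awk -v r="$requests_per_day" -v h="$busy_hours" -v w="$seconds_per_request" \
    'BEGIN { printf "%.2f\n", r / (h * 3600) * w }'
}
estimate_concurrency 200 10 30   # NovaCraft's ~200 requests/day
```

Under those assumptions, about 0.17 requests are in flight on average. Even a tenfold burst (`estimate_concurrency 2000 10 30`) gives about 1.67. That suggests the 4 vCPU / 8 GB recommendation leaves generous headroom, especially since the LLM itself is served by OpenRouter (section 2.4) rather than on this host.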
Docker Compose
# docker-compose.yml
# (The top-level "version:" key is obsolete in Compose v2 and is omitted.)
services:
  hermes:
    # Pin a specific tag in production so upgrades are deliberate
    image: nousresearch/hermes-agent:latest
    container_name: hermes-agent
    restart: unless-stopped
    ports:
      # Bind to localhost only; Nginx terminates HTTPS in front (section 2.3)
      - "127.0.0.1:8080:8080"
    volumes:
      - ./config:/root/.hermes
      - ./data:/root/.hermes/data
    environment:
      - HERMES_API_KEY=${HERMES_API_KEY}
      - TZ=UTC
    healthcheck:
      test: ["CMD", "hermes", "doctor"]
      interval: 60s
      timeout: 10s
      retries: 3
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: "2.0"
Initial Setup
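# Run as root (for example after `sudo -i`): /opt and Docker both need it
# 1. Install Docker (official convenience script)
curl -fsSL https://get.docker.com | sh
# 2. Create project directory (scripts/ holds the section 2.5 scripts)
mkdir -p /opt/hermes/{config,data,scripts}
cd /opt/hermes
# 3. Create docker-compose.yml (as above)
# 4. Set API key, readable by root only
echo "HERMES_API_KEY=your-key-here" > .env
chmod 600 .env
# 5. Start
docker compose up -d
# 6. Verify (Ctrl+C stops following the logs)
docker compose logs -f hermes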
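`docker compose logs -f` follows the logs until you press Ctrl+C, which is awkward in scripts. The sketch below instead polls the container's health status, which comes from the `healthcheck` block in docker-compose.yml, until it reports healthy or a timeout expires. The function name and default timeout are assumptions.

```shell
# Poll Docker's health status for a container until it reports "healthy"
# or the timeout (seconds) expires. Relies on the compose healthcheck.
wait_for_healthy() {
  local container=${1:-hermes-agent} timeout=${2:-180} interval=${3:-5}
  local waited=0 status
  while [ "$waited" -lt "$timeout" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    if [ "$status" = "healthy" ]; then
      echo "healthy after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "not healthy after ${timeout}s (last status: ${status:-unknown})"
  return 1
}
```

With `interval: 60s` and no start period configured, Docker runs the first check about a minute after start. The status therefore reads `starting` until then, which is why the default timeout is several minutes.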
2.3 Network Architecture
┌────────────────────────────────────────────┐
│ AWS VPC │
│ │
│ ┌──────────┐ ┌──────────────────┐ │
│ │ ALB/Nginx│────→│ Hermes Agent │ │
│ │ (HTTPS) │ │ (Docker) │ │
│ └──────────┘ └───────┬──────────┘ │
│ │ │
│ ┌─────────┐ ┌──────────▼──────────┐ │
│ │ Slack │ │ Internal Services │ │
│ │ Webhook │ │ Jira · GitHub · │ │
│ └─────────┘ │ Datadog · etc. │ │
│ └─────────────────────┘ │
└────────────────────────────────────────────┘
Nginx Reverse Proxy
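# /etc/nginx/sites-available/hermes
server {
    # nginx 1.25.1+ prefers "listen 443 ssl;" plus a separate "http2 on;"
    listen 443 ssl http2;
    server_name hermes.novacraft.internal;
    ssl_certificate /etc/ssl/hermes.crt;
    ssl_certificate_key /etc/ssl/hermes.key;
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Long agent responses can exceed nginx's 60s default read timeout
        proxy_read_timeout 300s;
    }
}
# Redirect plain HTTP to HTTPS
server {
    listen 80;
    server_name hermes.novacraft.internal;
    return 301 https://$host$request_uri;
}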
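With Ubuntu's nginx package, a file in sites-available takes effect only once it is linked into sites-enabled and nginx is reloaded. The sketch below wraps those steps and refuses to reload if `nginx -t` rejects the configuration. The function name is hypothetical, and the directory arguments exist only so it can be tried against scratch directories.

```shell
# Enable an nginx site (Debian/Ubuntu layout), validate, then reload.
enable_nginx_site() {
  local name=${1:-hermes}
  local available=${2:-/etc/nginx/sites-available}
  local enabled=${3:-/etc/nginx/sites-enabled}
  [ -f "$available/$name" ] || { echo "no such site: $available/$name"; return 1; }
  # -n replaces an existing link instead of following it
  ln -sfn "$available/$name" "$enabled/$name"
  # nginx -t parses the full config (including certificate files) first
  if nginx -t; then
    systemctl reload nginx
  else
    echo "nginx config test failed; not reloading"
    return 1
  fi
}
```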
2.4 Configuration
SOUL.md for NovaCraft
# SOUL.md — NovaCraft AI Assistant
## Identity
You are NovaCraft's AI assistant, helping a 50-person B2B SaaS team.
We build project management tools for mid-size companies.
## Company Facts
- Founded: 2022
- Team: 50 people (SF, London, Bangalore)
- Stack: Python/FastAPI backend, React frontend, PostgreSQL, AWS
- Revenue: $5M ARR
- Key product: NovaCraft PM (project management SaaS)
## Communication Style
- Professional but friendly
- Use Slack-appropriate formatting (bullet points, code blocks)
- Keep responses concise for Slack (< 300 words unless asked for detail)
- When uncertain, say so—don't hallucinate internal data
LLM Provider Configuration
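# Hermes runs inside the container, so open its config editor there:
docker exec -it hermes-agent hermes config edit
# Then set:
llm:
  provider: openrouter
  model: anthropic/claude-sonnet   # use the exact model IDs your provider lists
  fallback: nous/hermes-3-70b
  max_tokens: 4096
  temperature: 0.3  # Lower for enterprise: more consistent, repeatable answers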
2.5 High Availability
Health Monitoring
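# Systemd watchdog: a oneshot service plus a timer that runs it every 5 minutes
# /etc/systemd/system/hermes-watchdog.service
[Unit]
Description=Hermes Agent Watchdog
After=docker.service
Requires=docker.service
[Service]
Type=oneshot
# Provides SLACK_WEBHOOK=... to the script (example path; keep it chmod 600)
EnvironmentFile=/opt/hermes/.watchdog.env
ExecStart=/opt/hermes/scripts/healthcheck.sh
# /etc/systemd/system/hermes-watchdog.timer
[Unit]
Description=Run Hermes Agent Watchdog every 5 minutes
[Timer]
OnCalendar=*:0/5
[Install]
WantedBy=timers.target
# Enable: systemctl daemon-reload && systemctl enable --now hermes-watchdog.timer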
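#!/bin/bash
# /opt/hermes/scripts/healthcheck.sh
# Restart Hermes unless it is running and healthy (or still starting up),
# using the healthcheck defined in docker-compose.yml
STATE=$(docker inspect --format '{{.State.Status}}/{{if .State.Health}}{{.State.Health.Status}}{{end}}' hermes-agent 2>/dev/null)
case "$STATE" in
  running/healthy|running/starting) ;;  # OK, nothing to do
  *)
    echo "Hermes state is '${STATE:-missing}', restarting..."
    # up -d recreates a missing container; --force-recreate replaces an unhealthy one
    docker compose -f /opt/hermes/docker-compose.yml up -d --force-recreate hermes
    curl -fsS -X POST -H 'Content-type: application/json' \
      -d '{"text":"⚠️ Hermes Agent was down or unhealthy and has been restarted."}' "$SLACK_WEBHOOK"
    ;;
esac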
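The watchdog's alert is a fixed string inside single quotes. Dynamic details such as the container state or hostname must be JSON-escaped before they go into the payload. Below is a minimal sketch of a hypothetical `notify_slack` helper for Slack incoming webhooks that handles single-line messages only.

```shell
# Hypothetical helper: post a one-line message to a Slack incoming webhook.
# Escapes backslashes and double quotes so the JSON payload stays valid;
# messages containing newlines would also need "\n" escaping.
notify_slack() {
  [ -n "${SLACK_WEBHOOK:-}" ] || { echo "SLACK_WEBHOOK is not set" >&2; return 1; }
  local msg
  msg=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  curl -fsS -X POST -H 'Content-type: application/json' \
    --data "{\"text\":\"$msg\"}" "$SLACK_WEBHOOK"
}
```

Inside the watchdog, a call such as `notify_slack "Hermes on $(hostname) was '$STATE', restarted"` could replace the fixed curl payload.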
Backup Strategy
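#!/bin/bash
# Daily backup of config and data
# /opt/hermes/scripts/backup.sh
set -euo pipefail
# cron starts jobs with a minimal PATH; make sure the aws CLI is found
export PATH=/usr/local/bin:/usr/bin:/bin
STAMP=$(date +%Y%m%d)
BACKUP_DIR="/opt/hermes/backups/$STAMP"
# Start clean so a second run on the same day doesn't nest config/config
rm -rf "$BACKUP_DIR"
mkdir -p "$BACKUP_DIR"
# Backup config, plus the Compose file a new server needs to start
cp -r /opt/hermes/config "$BACKUP_DIR/config"
cp /opt/hermes/docker-compose.yml "$BACKUP_DIR/"
# Backup data (memory, skills, etc.)
cp -r /opt/hermes/data "$BACKUP_DIR/data"
# Upload to S3: a dated copy, plus "latest" for quick restores
aws s3 sync "$BACKUP_DIR" "s3://novacraft-backups/hermes/$STAMP/"
aws s3 sync --delete "$BACKUP_DIR" "s3://novacraft-backups/hermes/latest/"
# Rotate local copies: keep 30 days (-mindepth 1 spares the backups dir itself).
# S3 copies need their own lifecycle rule.
find /opt/hermes/backups -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +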
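The 24h RPO in the recovery plan holds only if the backup actually runs every day. Below is a hypothetical freshness check that the watchdog or a separate cron job could run. The 26-hour default is an assumption chosen to leave some slack for a slow run.

```shell
# Report whether the newest backup directory is younger than max_hours.
backup_is_fresh() {
  local dir=${1:-/opt/hermes/backups} max_hours=${2:-26}
  local recent
  # -mmin -N matches entries modified less than N minutes ago
  recent=$(find "$dir" -mindepth 1 -maxdepth 1 -type d -mmin "-$((max_hours * 60))" | head -n 1)
  if [ -n "$recent" ]; then
    echo "fresh: $recent"
  else
    echo "stale: no backup in the last ${max_hours}h"
    return 1
  fi
}
```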
2.6 Disaster Recovery
Recovery Plan
| Scenario | RTO (time to recover) | RPO (max data loss) | Recovery Steps |
|---|---|---|---|
| Container crash | 1 min | 0 | Auto-restart (Docker unless-stopped policy) |
| Server reboot | 5 min | 0 | systemd starts Docker at boot, which restarts the container |
| Data corruption | 30 min | 24h | Restore from S3 backup |
| Full server loss | 2h | 24h | Provision new server + restore |
Quick Restore
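# On new server (as root):
# 1. Install Docker and the AWS CLI (with read access to the backup bucket)
# 2. Pull the newest dated backup (prefixes are named YYYYMMDD/)
LATEST=$(aws s3 ls s3://novacraft-backups/hermes/ | awk '$1 == "PRE" {print $2}' | grep -E '^[0-9]{8}/$' | sort | tail -n 1)
aws s3 sync "s3://novacraft-backups/hermes/${LATEST:?no dated backup found}" /opt/hermes/
# 3. Recreate docker-compose.yml (section 2.2) and .env if the backup lacks them, then start
cd /opt/hermes && docker compose up -d
# 4. Verify
docker compose logs -f hermes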
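Before starting the restored stack, confirm that everything Compose needs is on disk. The .env file is never in the backup, and docker-compose.yml is there only if the backup script copies it. Below is a hypothetical check.

```shell
# Hypothetical post-restore check: list anything Compose needs that is missing.
check_restore() {
  local root=${1:-/opt/hermes} missing=0 item
  for item in docker-compose.yml .env config data; do
    if [ ! -e "$root/$item" ]; then
      echo "missing: $root/$item"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "restore looks complete"
  return "$missing"
}
```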
2.7 Hands-On Exercise
- Deploy Hermes on Docker:
mkdir -p /opt/hermes && cd /opt/hermes
# Create docker-compose.yml and .env from section 2.2
docker compose up -d
- Configure health check: Add the watchdog script
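- Set up daily backup: Make the backup script executable, then create the cron job
chmod +x /opt/hermes/scripts/backup.sh
crontab -e
# Add: 0 3 * * * /opt/hermes/scripts/backup.sh >> /var/log/hermes-backup.log 2>&1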
- Write your SOUL.md: Customize it with your company context
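- Test failover: Stop the container and verify it comes back. Docker's unless-stopped policy does not restart a container you stop manually, so recovery here comes from the watchdog.
docker stop hermes-agent
# Run the watchdog now instead of waiting up to 5 minutes for its timer
systemctl start hermes-watchdog.service
docker ps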
Lesson Summary
| Key Point | Details |
|---|---|
| Deployment | Docker Compose recommended for 20-200 users |
| Sizing | 4 vCPU / 8 GB for 50 users |
| Network | Nginx reverse proxy + HTTPS |
| HA | Health checks + auto-restart + daily backups |
| DR | RTO 2h, RPO 24h with S3 backups |
Next Lesson: Multi-Platform Gateway—connecting Slack, Discord, Email, and more.