# Production Best Practices
This guide covers best practices for deploying FIRST Inference Gateway in production environments.
## Security

### Authentication & Authorization
Restrict access to:

- specific identity providers (`AUTHORIZED_IDP_DOMAINS` and a Globus High-Assurance policy)
- specific groups (`GLOBUS_GROUPS` and `AUTHORIZED_GROUPS_PER_IDP`)
See the example environment file and the Globus Setup guide for more details.
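A hedged sketch of the relevant entries in the environment file (the variable names come from this guide; the placeholder values and exact formats should be taken from the example environment file):

```bash
# .env (placeholder values; see the example environment file for exact formats)
AUTHORIZED_IDP_DOMAINS=example.edu
GLOBUS_GROUPS=<globus-group-uuid>
AUTHORIZED_GROUPS_PER_IDP=<per-idp-group-mapping>
```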
### Secrets Management
Never store secrets in code or version control.
#### Use Environment Files
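Keep runtime configuration in an environment file that is never committed, and pass it to the stack explicitly at startup. A minimal sketch with Docker Compose (assuming a `.env` file in the project root):

```bash
# Keep the env file out of version control and readable only by the deploy user
echo ".env" >> .gitignore
chmod 600 .env

# Load it explicitly when starting the stack
docker compose --env-file .env up -d
```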
#### Docker Secrets

```yaml
services:
  gateway:
    secrets:
      - db_password
      - globus_secret

secrets:
  db_password:
    file: ./secrets/db_password.txt
  globus_secret:
    file: ./secrets/globus_secret.txt
```
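Docker mounts each secret inside the container at `/run/secrets/<secret_name>`, so the application should read these values from files rather than from environment variables.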
#### Vault Integration
For enterprise deployments, integrate with HashiCorp Vault or similar.
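As a sketch, an init step could pull secrets from Vault at deploy time; the mount and key names here are hypothetical:

```bash
# Hypothetical Vault paths; adjust to your secret layout
export DB_PASSWORD=$(vault kv get -field=password secret/inference-gateway/db)
export GLOBUS_SECRET=$(vault kv get -field=client_secret secret/inference-gateway/globus)
```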
### HTTPS/TLS
Always use HTTPS in production.
#### Let's Encrypt with Certbot
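A typical flow with the nginx plugin (the domain is a placeholder):

```bash
sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d gateway.example.org

# Confirm that automatic renewal is configured
sudo certbot renew --dry-run
```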
#### Custom Certificates

```nginx
server {
    listen 443 ssl http2;

    ssl_certificate     /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;
}
```
### Firewall Configuration

```bash
# Ubuntu/Debian
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw deny 8000/tcp   # Don't expose Django directly

# CentOS/RHEL
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --reload
```
## Performance

### Database Optimization

#### Connection Pooling

```python
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'CONN_MAX_AGE': 600,  # Persistent connections
        'OPTIONS': {
            'connect_timeout': 10,
        },
    }
}
```
#### Indexes

Ensure proper indexes on frequently queried fields:

```bash
python manage.py dbshell
```

```sql
CREATE INDEX idx_endpoint_slug ON resource_server_endpoint(endpoint_slug);
CREATE INDEX idx_created_at ON resource_server_listendpointslog(created_at);
```
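On a busy production database, consider `CREATE INDEX CONCURRENTLY`, which builds the index without blocking writes at the cost of a slower build.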
### Caching

#### Redis Configuration

```python
# settings.py
CACHES = {
    'default': {
        # django-redis backend; the OPTIONS below are django-redis features
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': 'redis://redis:6379/0',
        'OPTIONS': {
            'CLIENT_CLASS': 'django_redis.client.DefaultClient',
            'CONNECTION_POOL_KWARGS': {
                'max_connections': 50,
            },
        },
    }
}
```
### Gunicorn Configuration

#### Worker Calculation
For a 16-core machine:
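Using the common Gunicorn heuristic of `(2 × CPU cores) + 1`, this works out to `(2 × 16) + 1 = 33` workers, which is what the `multiprocessing.cpu_count() * 2 + 1` line in the config below computes.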
#### Production Config

```python
# gunicorn_asgi.config.py
import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"
worker_connections = 1000
timeout = 120
keepalive = 5
max_requests = 1000
max_requests_jitter = 50
```
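Gunicorn is then started with this config file and the project's ASGI module (the module path here is a placeholder):

```bash
gunicorn -c gunicorn_asgi.config.py inference_gateway.asgi:application
```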
### Nginx Optimization

```nginx
upstream gateway {
    least_conn;  # Load balancing algorithm
    server 127.0.0.1:8000 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8001 max_fails=3 fail_timeout=30s;
    keepalive 64;
}

server {
    listen 443 ssl http2;

    # Gzip compression
    gzip on;
    gzip_types text/plain text/css application/json application/javascript;
    gzip_min_length 1000;

    # Client body size
    client_max_body_size 100M;
    client_body_buffer_size 1M;

    # Timeouts
    proxy_connect_timeout 600s;
    proxy_send_timeout 600s;
    proxy_read_timeout 600s;
    send_timeout 600s;

    # Buffering
    proxy_buffering off;  # Important for streaming
    proxy_request_buffering off;

    # Headers
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;

    location /static/ {
        alias /path/to/staticfiles/;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }

    location / {
        proxy_pass http://gateway;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```
## Monitoring

### Application Monitoring

#### Prometheus Metrics

Add to docker-compose.yml:

```yaml
services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"

volumes:
  prometheus_data:
```
#### Grafana Dashboards

```yaml
services:
  grafana:
    image: grafana/grafana
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"
    environment:
      # Supply via an env file or Docker secret rather than hard-coding
      - GF_SECURITY_ADMIN_PASSWORD=secure_password

volumes:
  grafana_data:
```
### Log Aggregation

#### Structured Logging

```python
# logging_config.py
LOGGING = {
    'version': 1,
    'formatters': {
        'json': {
            'class': 'pythonjsonlogger.jsonlogger.JsonFormatter',
            'format': '%(asctime)s %(name)s %(levelname)s %(message)s',
        }
    },
    'handlers': {
        'file': {
            'class': 'logging.handlers.RotatingFileHandler',
            'filename': 'logs/gateway.log',  # the logs/ directory must exist
            'maxBytes': 10485760,  # 10 MB
            'backupCount': 10,
            'formatter': 'json',
        }
    },
    # Attach the handler so log records actually reach the file
    'root': {
        'handlers': ['file'],
        'level': 'INFO',
    },
}
```
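The `json` formatter relies on the `python-json-logger` package, which must be installed separately (`pip install python-json-logger`).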
#### ELK Stack Integration
For large deployments, consider Elasticsearch + Logstash + Kibana.
### Health Checks

#### Kubernetes Probes

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 5
```
#### Custom Health Endpoint
Create a health check view in Django to verify database, Redis, and Globus Compute connectivity.
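A minimal sketch of such a view (URL wiring omitted; the Globus Compute check is left as a placeholder because the right status call depends on how the gateway holds its client):

```python
# health.py: a minimal sketch, not the gateway's actual implementation
from django.core.cache import cache
from django.db import connections
from django.http import JsonResponse


def health(request):
    checks = {}

    # Database: run a trivial query against the default connection
    try:
        with connections['default'].cursor() as cursor:
            cursor.execute('SELECT 1')
        checks['database'] = 'ok'
    except Exception as exc:
        checks['database'] = f'error: {exc}'

    # Redis: round-trip a value through the cache
    try:
        cache.set('health_check', '1', timeout=5)
        checks['redis'] = 'ok' if cache.get('health_check') == '1' else 'error'
    except Exception as exc:
        checks['redis'] = f'error: {exc}'

    # Globus Compute: placeholder, replace with a real connectivity check
    checks['globus_compute'] = 'not checked'

    healthy = all(v == 'ok' for k, v in checks.items() if k != 'globus_compute')
    return JsonResponse(checks, status=200 if healthy else 503)
```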
## Backup & Recovery

### Database Backups

#### Automated Backups
```bash
#!/bin/bash
# backup_db.sh
set -euo pipefail

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups/postgres"
BACKUP_FILE="$BACKUP_DIR/backup_$DATE.sql.gz"

pg_dump -h localhost -U inferencedev inferencegateway | gzip > "$BACKUP_FILE"

# Keep only the last 30 days
find "$BACKUP_DIR" -name "backup_*.sql.gz" -mtime +30 -delete
```
Add to crontab:
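For a daily 2 a.m. run (`crontab -e`):

```bash
0 2 * * * /path/to/backup_db.sh >> /var/log/backup_db.log 2>&1
```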
#### Point-in-Time Recovery
Configure PostgreSQL for WAL archiving:
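The relevant `postgresql.conf` settings (the archive destination is a placeholder):

```ini
# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /backups/wal/%f && cp %p /backups/wal/%f'
```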
### Configuration Backups
```bash
# Back up the environment file, fixtures, and server configs
tar -czf config_backup_$(date +%Y%m%d).tar.gz \
    .env \
    fixtures/ \
    nginx_app.conf \
    gunicorn_asgi.config.py
```
## Scaling

### Horizontal Scaling

#### Multiple Gateway Instances
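With Docker Compose, extra instances can be started behind the nginx upstream shown earlier (a sketch, assuming the service is named `gateway` and has no fixed host-port mapping):

```bash
# Run three gateway replicas behind the load balancer
docker compose up -d --scale gateway=3
```

All instances must share the same database and Redis so that no state lives in an individual gateway process.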
#### Session Affinity

For stateful sessions, pin each client to a single gateway instance:
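nginx's `ip_hash` directive does this; it replaces the `least_conn` line in the upstream block shown earlier:

```nginx
upstream gateway {
    ip_hash;  # Route each client IP to the same instance
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
}
```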
### Database Scaling

#### Read Replicas

```python
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'primary.db.internal',
    },
    'replica': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'replica.db.internal',
    },
}

DATABASE_ROUTERS = ['path.to.ReplicaRouter']
```
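A minimal router sketch to match that setting (the class and module path are yours to choose; this is illustrative, not part of the gateway):

```python
# replica_router.py: a minimal sketch
class ReplicaRouter:
    """Send reads to the replica and writes to the primary."""

    def db_for_read(self, model, **hints):
        return 'replica'

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        # Primary and replica hold the same data, so relations are safe
        return True
```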
#### Connection Pooling (PgBouncer)

```ini
; pgbouncer.ini
[databases]
inferencegateway = host=localhost port=5432 dbname=inferencegateway

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```
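Django then connects to PgBouncer (it listens on port 6432 by default) instead of PostgreSQL directly. With `pool_mode = transaction`, also set `'DISABLE_SERVER_SIDE_CURSORS': True` in the database settings, which Django requires under transaction pooling.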
### Inference Backend Scaling

#### Federated Endpoints
Deploy multiple Globus Compute endpoints and configure federated routing for automatic load balancing.
#### Auto-Scaling
Configure Globus Compute endpoints to auto-scale based on demand:
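A sketch of the relevant part of an endpoint's `config.yaml`, assuming a Parsl-style provider (the exact schema varies across Globus Compute versions, so treat the field names as illustrative):

```yaml
# config.yaml (illustrative; check your Globus Compute version's schema)
engine:
  type: GlobusComputeEngine
  provider:
    type: LocalProvider
    init_blocks: 1
    min_blocks: 0   # Scale to zero when idle
    max_blocks: 4   # Cap growth under peak demand
```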
## Maintenance

### Zero-Downtime Deployments

#### Blue-Green Deployment
1. Deploy the new version alongside the old one.
2. Switch traffic to the new version (see the sketch below).
3. Monitor for issues.
4. Decommission the old version.
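One lightweight way to do the traffic switch with nginx (paths and file names are hypothetical):

```bash
# Point nginx at the "green" stack and reload without dropping connections
ln -sfn /etc/nginx/upstreams/green.conf /etc/nginx/conf.d/active_upstream.conf
nginx -t && nginx -s reload
```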
#### Rolling Updates

```bash
# Update one instance at a time
for server in gateway1 gateway2 gateway3; do
    ssh "$server" "cd /app && git pull && systemctl restart gateway"
    sleep 60  # Allow time to stabilize
done
```
### Database Migrations

Always test migrations in staging first:

```bash
# Back up before migrating
./backup_db.sh

# Run the migration
python manage.py migrate

# If issues occur, restore the most recent backup (backups are gzip-compressed)
gunzip -c /backups/postgres/backup_<timestamp>.sql.gz | psql -h localhost -U inferencedev inferencegateway
```
## Disaster Recovery

### Disaster Recovery Plan
- Recovery Time Objective (RTO): 2 hours
- Recovery Point Objective (RPO): 1 hour
### Backup Strategy
- Hourly: Database transaction logs
- Daily: Full database backup
- Weekly: Complete system backup (config, logs, data)
- Monthly: Archived to off-site storage
### Failover Procedures
Document step-by-step procedures for:
- Gateway failure → Switch to backup gateway
- Database failure → Promote read replica
- Complete site failure → Activate DR site
## Checklist

### Pre-Production
- [ ] All secrets are externalized
- [ ] HTTPS/TLS configured
- [ ] Firewall rules applied
- [ ] `DEBUG=False`
- [ ] Strong passwords set
- [ ] Database backed up
- [ ] Monitoring configured
- [ ] Log aggregation set up
- [ ] Health checks working
- [ ] Load testing completed
- [ ] Disaster recovery plan documented
### Post-Deployment
- [ ] Monitor logs for errors
- [ ] Verify all endpoints responding
- [ ] Check database performance
- [ ] Test authentication flow
- [ ] Verify Globus Compute connectivity
- [ ] Run integration tests
- [ ] Document any issues