Administrator Guide¶
Welcome to the FIRST Inference Gateway Administrator Guide. This guide will help you deploy and configure the gateway for your organization.
Overview¶
Setting up FIRST involves three main components:
- Globus Setup: The applications and clients that communicate with Globus services
- Gateway Installation: The central API service that handles authentication and routing
- Inference Backend Setup: The actual inference servers where models run
These can be deployed independently and connected together through configuration.
Prerequisites¶
Before you begin, ensure you have:
- Python 3.12 or later
- Docker and Docker Compose (for Docker deployment)
- PostgreSQL Server (or use Docker)
- Globus Account
- Access to compute resources (for inference backends)
Globus Applications¶
Globus applications are required to operate the service and manage the authentication and authorization layer.
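Once you have registered an application with Globus, you can sanity-check the client ID and secret before wiring them into the gateway. Below is a minimal sketch using the globus-sdk Python package; the CLIENT_ID and CLIENT_SECRET values are placeholders for your own registration.

```python
import globus_sdk

# Placeholders: substitute the values from your own Globus app registration.
CLIENT_ID = "YOUR-CLIENT-ID"
CLIENT_SECRET = "YOUR-CLIENT-SECRET"

# A confidential app client authenticates with its own credentials.
auth_client = globus_sdk.ConfidentialAppAuthClient(CLIENT_ID, CLIENT_SECRET)

# Request a token via the client-credentials grant; failure here usually
# means the ID/secret pair is wrong or the app registration is incomplete.
token_response = auth_client.oauth2_client_credentials_tokens()
print(token_response.by_resource_server)
```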
Deployment Architecture¶
Choose your deployment approach:
Gateway Deployment Options¶
Docker Deployment (Recommended)
- Quick setup with Docker Compose
- Pros: Easy to deploy, includes all dependencies, portable
- Cons: Requires Docker knowledge
- Docker Guide →
Bare Metal Deployment
- Direct installation on your server infrastructure
- Pros: More control, better performance, easier debugging
- Cons: Manual dependency management
- Bare Metal Guide →
Inference Backend Options¶
Globus Compute + vLLM (Recommended for Production)
- Deploy vLLM on HPC clusters with Globus Compute for remote execution
- Best for: Multi-cluster, federated deployments, HPC environments
- Globus Compute Setup →
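As a quick connectivity check for this option, you can submit a trivial function to a Globus Compute endpoint from the machine running the gateway. A minimal sketch with the globus-compute-sdk package is shown below; the endpoint UUID is a placeholder for one you have configured on your cluster.

```python
from globus_compute_sdk import Executor

# Placeholder: the UUID printed when you configured your endpoint.
ENDPOINT_ID = "00000000-0000-0000-0000-000000000000"

def hello(name: str) -> str:
    # Runs remotely on the HPC cluster, inside the endpoint's environment.
    return f"hello from the endpoint, {name}"

with Executor(endpoint_id=ENDPOINT_ID) as ex:
    future = ex.submit(hello, "gateway")
    print(future.result())  # blocks until the task returns
```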
Local vLLM
- Run vLLM inference server locally without Globus Compute
- Best for: Single-node deployments, development
- Local vLLM Setup →
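Before pointing the gateway at a local vLLM server, it can help to confirm that vLLM can load a small model on your hardware at all. The sketch below uses vLLM's offline Python API with the OPT-125M model referenced in Pattern 1 later on; your actual deployment will serve models through vLLM's OpenAI-compatible HTTP server instead.

```python
from vllm import LLM, SamplingParams

# Load a small model as a smoke test; swap in your target model once this works.
llm = LLM(model="facebook/opt-125m")

params = SamplingParams(max_tokens=32, temperature=0.8)
outputs = llm.generate(["The gateway forwards requests to"], params)

for out in outputs:
    print(out.outputs[0].text)
```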
Direct API Connection
- Connect to existing OpenAI-compatible APIs (OpenAI, Anthropic, etc.)
- Best for: Simple setup, using commercial APIs
- Direct API Setup →
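Whichever backend you choose, it ultimately speaks the OpenAI-compatible chat API, so you can exercise it directly with the openai Python client before connecting the gateway. The base URL, API key, and model name below are placeholders; for a local vLLM server the default base URL is typically http://localhost:8000/v1.

```python
from openai import OpenAI

# Placeholders: point these at your backend (commercial API or local vLLM server).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="facebook/opt-125m",  # the model your backend serves
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```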
Setup Workflow¶
Phase 1: Gateway Installation¶
- Choose your deployment method (Docker or Bare Metal)
- Register Globus applications
- Configure environment variables
- Initialize the database
- Start the gateway service
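After starting the gateway, it is worth scripting a basic liveness probe so you can later tell configuration problems apart from backend problems. The sketch below is assumption-laden: the host, port, and health route are hypothetical and should be replaced with whatever your deployment actually exposes.

```python
import requests

# Hypothetical values: adjust host, port, and path to your deployment.
GATEWAY_URL = "http://localhost:8080"
HEALTH_PATH = "/health"  # assumed route; check your gateway's configuration

resp = requests.get(GATEWAY_URL + HEALTH_PATH, timeout=5)
resp.raise_for_status()  # raises if the gateway returned an error status
print("gateway is up:", resp.status_code)
```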
Phase 2: Inference Backend Setup¶
- Choose your backend type
- Install required software (vLLM, Globus Compute, etc.)
- Configure the backend
- Register endpoints/functions
- Test the connection
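For the Globus Compute backend, "register endpoints/functions" means publishing the inference function so the gateway can invoke it by UUID. A minimal sketch with globus-compute-sdk is shown below; the function body is a stand-in for whatever actually drives your vLLM server.

```python
from globus_compute_sdk import Client

def run_inference(prompt: str) -> str:
    # Stand-in body: in a real deployment this would call your vLLM server.
    return f"echo: {prompt}"

gcc = Client()
function_id = gcc.register_function(run_inference)
# Record this UUID; the gateway's fixture files will reference it.
print("registered function:", function_id)
```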
Phase 3: Connect Gateway and Backend¶
- Update fixture files with backend details
- Load fixtures into the gateway database
- Verify end-to-end functionality
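A simple end-to-end verification is to issue the same OpenAI-style request as in the backend check, but through the gateway and with the Globus access token its auth layer expects. Everything below is a placeholder sketch: the gateway URL, model name, and how you obtain the token all depend on your deployment.

```python
from openai import OpenAI

# Placeholders: your gateway URL and a valid Globus access token.
GATEWAY_URL = "https://gateway.example.org/v1"
ACCESS_TOKEN = "GLOBUS-ACCESS-TOKEN"

client = OpenAI(base_url=GATEWAY_URL, api_key=ACCESS_TOKEN)
response = client.chat.completions.create(
    model="facebook/opt-125m",  # a model registered via your fixtures
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```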
Common Patterns¶
Pattern 1: Quick Local Development¶
```mermaid
graph LR
    A[Gateway] --> B[Local vLLM]
    B --> C[Small Model<br/>OPT-125M]
```

Use: Development and testing
Setup Time: ~15 minutes
Resources: 1 GPU or CPU
Pattern 2: Production Single Cluster¶
```mermaid
graph LR
    A[Gateway] --> B[Globus Compute]
    B --> C[HPC Cluster]
    C --> D[Multiple Models]
```

Use: Production deployment on a single HPC cluster
Setup Time: ~2 hours
Resources: HPC cluster access
Pattern 3: Federated Multi-Cluster¶
```mermaid
graph LR
    A[Gateway] --> B[Cluster 1<br/>Globus Compute]
    A --> C[Cluster 2<br/>Globus Compute]
    A --> D[Cluster 3<br/>Globus Compute]
```

Use: Maximum availability and resource pooling
Setup Time: ~4 hours
Resources: Multiple HPC clusters
Next Steps¶
Ready to get started? Choose your path:
- Quick Start: Docker Deployment
- Full Setup: Bare Metal Deployment
- Backend Setup: Inference Backend Overview
Production Examples¶
ALCF Sophia Cluster¶
We provide production-ready examples from our deployment on the Sophia cluster at the Argonne Leadership Computing Facility (ALCF):
- Modular launcher scripts with automatic Ray setup for multi-node models
- Environment management with dynamic version selection
- Production configurations for single-node and multi-node deployments (up to 405B parameter models)
- Advanced features: chunked prefill, prefix caching, tool calling
These examples are located in the `compute-endpoints/` and `compute-functions/` directories and should be adapted for your specific HPC environment.
See ALCF Examples
View the complete ALCF Sophia production setup in the Globus Compute Guide, including:
- `sophia_env_setup_with_ray.sh` - Environment and Ray cluster management
- `launch_vllm_model.sh` - Flexible vLLM launcher with multi-node support
- Example YAML configurations for various model sizes