# FIRST

Welcome to the documentation for the Federated Inference Resource Scheduling Toolkit (FIRST). FIRST provides AI model inference as a service across distributed HPC clusters through an OpenAI-compatible API.
## What is FIRST?

FIRST (Federated Inference Resource Scheduling Toolkit) is a system for secure, remote execution of inference on AI models through an OpenAI-compatible API. It validates and authorizes inference requests to scientific computing clusters using Globus Auth and Globus Compute.
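Because the API is OpenAI-compatible, a client request is an ordinary chat completion call carrying a Globus-issued bearer token. The sketch below builds such a request with the standard library; the gateway URL, token placeholder, and model name are illustrative assumptions, not values from this documentation.

```python
import json
import urllib.request

def build_chat_request(base_url: str, token: str, model: str,
                       messages: list) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request authorized
    with a Globus Auth bearer token (URL and model are placeholders)."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # Globus-issued access token
        },
        method="POST",
    )

req = build_chat_request(
    "https://gateway.example.org",   # hypothetical gateway URL
    "GLOBUS_ACCESS_TOKEN",           # obtained via Globus Auth beforehand
    "demo-model",
    [{"role": "user", "content": "Hello"}],
)
```

Since the request follows the standard OpenAI wire format, the same call works unchanged through the OpenAI SDK by pointing `base_url` at the gateway and passing the Globus token as the API key.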
## System Architecture

The Inference Gateway consists of several components:
- API Gateway: Django-based REST API (Django Ninja) that handles authorization and request routing
- Globus Auth: Authentication and authorization service
- Globus Compute Endpoints: Remote execution framework on HPC clusters (or local machines)
- Inference Server Backend: High-performance inference service for LLMs (e.g., vLLM)
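To make the component list concrete, the sketch below mimics the last hop of the request path: a function that forwards an OpenAI-format payload to an inference server backend. In a FIRST-style deployment, a function like this would be registered with a Globus Compute endpoint on the cluster; here a local stub stands in for the backend (e.g., vLLM) so the flow can run anywhere. All names are illustrative.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubBackend(BaseHTTPRequestHandler):
    """Minimal stand-in for an OpenAI-compatible backend such as vLLM."""
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        reply = {"model": body["model"],
                 "choices": [{"message": {"role": "assistant", "content": "ok"}}]}
        data = json.dumps(reply).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):  # silence per-request logging
        pass

def run_inference(payload: dict, backend_url: str) -> dict:
    """Forward an OpenAI-format payload to the backend and return its reply.
    A Globus Compute endpoint would execute a function like this remotely."""
    req = urllib.request.Request(
        backend_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Start the stub backend on a free local port and send one request through.
server = HTTPServer(("127.0.0.1", 0), StubBackend)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/v1/chat/completions"
result = run_inference(
    {"model": "demo-model", "messages": [{"role": "user", "content": "hi"}]}, url)
server.shutdown()
```

The gateway's job, not shown here, is everything before this hop: validating the Globus token and routing the request to the right endpoint.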
## Quick Links

### For Administrators
- Globus Setup - Create Globus project and register applications
- Docker Deployment - Fast-track Docker deployment in under 10 minutes
- Bare Metal Setup - Complete installation on your own infrastructure
- Inference Backend - Connect to OpenAI API, local vLLM, or Globus Compute
- Kubernetes - Deploy on Kubernetes clusters (Coming Soon)
### For Users
- User Guide - Complete guide for authentication and making requests
- API Reference - API endpoint documentation
- Examples - Code examples and tutorials
## Key Features
- Federated Access: Route requests across multiple HPC clusters automatically
- OpenAI-Compatible: Works with existing OpenAI SDK and tools
- Secure: Globus Auth integration with group-based access control
- High Performance: Support for vLLM and other optimized inference backends
- Flexible: Deploy via Docker, bare metal, or Kubernetes
- Scalable: Auto-scaling and resource management for HPC environments
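Federated routing means the gateway, not the user, decides which cluster serves a request. A minimal sketch of that idea, assuming a simple model-to-endpoints table with first-available failover (the endpoint names and policy are illustrative, not FIRST's actual routing logic):

```python
from typing import Callable, Optional

# Hypothetical routing table: model name -> candidate cluster endpoints.
ROUTES = {
    "demo-llm": ["cluster-a/endpoint-1", "cluster-b/endpoint-2"],
}

def pick_endpoint(model: str, is_online: Callable[[str], bool]) -> Optional[str]:
    """Return the first online endpoint serving `model`, else None."""
    for endpoint in ROUTES.get(model, []):
        if is_online(endpoint):
            return endpoint
    return None

# Example: cluster-a is down, so the request fails over to cluster-b.
choice = pick_endpoint("demo-llm", lambda e: e.startswith("cluster-b"))
```

Real deployments would fold in load, queue depth, and authorization checks when choosing among healthy endpoints.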
## Example Deployment
For a production example, see the ALCF Inference Endpoints documentation.
## Getting Help
- GitHub: Report issues or contribute
- Citation: Research Paper
- API Reference: Complete API documentation
## License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
## Quick Start Paths
- Just want to try it out? → Docker Quickstart
- Need full control? → Bare Metal Setup
- Want to use the API? → User Guide