FIRST

Welcome to the documentation for the Federated Inference Resource Scheduling Toolkit (FIRST). FIRST provides AI model inference as a service across distributed HPC clusters through an OpenAI-compatible API.

What is FIRST?

FIRST (Federated Inference Resource Scheduling Toolkit) is a system for secure, remote execution of inference on AI models through an OpenAI-compatible API. It validates and authorizes inference requests to scientific computing clusters using Globus Auth and Globus Compute.
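
Because the API is OpenAI-compatible, a client talks to FIRST with a standard chat-completions request, passing a Globus access token as the bearer credential. The sketch below builds such a request with only the Python standard library; the gateway URL, token value, and model identifier are hypothetical placeholders, and the request is constructed but not sent.

```python
"""Sketch: an OpenAI-style chat-completions request aimed at a FIRST gateway.

The gateway host, token, and model id below are illustrative placeholders;
in practice the bearer token is an access token issued by Globus Auth.
"""
import json
import urllib.request

GATEWAY_URL = "https://first.example.org"   # hypothetical gateway host
GLOBUS_TOKEN = "<globus-access-token>"      # obtained via Globus Auth

payload = {
    "model": "example/model-id",            # hypothetical model identifier
    "messages": [
        {"role": "user", "content": "Summarize the FIRST toolkit."}
    ],
}

# Standard OpenAI-compatible route; FIRST authorizes the token, then routes
# the request to an inference backend (e.g. vLLM) via Globus Compute.
request = urllib.request.Request(
    url=f"{GATEWAY_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {GLOBUS_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# To actually send it (requires a reachable gateway):
#   with urllib.request.urlopen(request) as resp:
#       print(json.load(resp))
```

The same request works with any OpenAI-compatible client; only the base URL and credential differ from a vanilla OpenAI setup.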

System Architecture

The Inference Gateway consists of several components:

  • API Gateway: Django-based REST/Ninja API that handles authorization and request routing
  • Globus Auth: Authentication and authorization service
  • Globus Compute Endpoints: Remote execution framework on HPC clusters (or local machines)
  • Inference Server Backend: High-performance inference service for LLMs (e.g., vLLM)


Key Features

  • Federated Access: Route requests across multiple HPC clusters automatically
  • OpenAI-Compatible: Works with existing OpenAI SDK and tools
  • Secure: Globus Auth integration with group-based access control
  • High Performance: Support for vLLM and other optimized inference backends
  • Flexible: Deploy via Docker, bare metal, or Kubernetes
  • Scalable: Auto-scaling and resource management for HPC environments
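
Since the gateway speaks the OpenAI wire protocol, existing tooling such as the official OpenAI Python SDK can be pointed at it by changing only the base URL and credential. A minimal sketch, assuming a hypothetical gateway URL and model id (the import is guarded so the snippet is illustrative even without the SDK installed):

```python
"""Sketch: reusing the official OpenAI Python SDK against a FIRST gateway."""
GATEWAY_URL = "https://first.example.org/v1"  # hypothetical gateway base URL
GLOBUS_TOKEN = "<globus-access-token>"        # Globus token in place of an API key

try:
    from openai import OpenAI  # pip install openai

    client = OpenAI(base_url=GATEWAY_URL, api_key=GLOBUS_TOKEN)
    # A standard chat-completions call, routed by FIRST to an HPC-hosted model
    # (requires a reachable gateway, so it is left commented here):
    # response = client.chat.completions.create(
    #     model="example/model-id",  # hypothetical model identifier
    #     messages=[{"role": "user", "content": "Hello"}],
    # )
except ImportError:
    client = None  # SDK not installed; snippet shown for illustration only
```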

Example Deployment

For a production example, see the ALCF Inference Endpoints documentation.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
