gpt-oss-20b
gpt-oss-20b is an open-weight mixture-of-experts model from OpenAI with 21 billion total parameters and 3.6 billion active parameters per forward pass, designed for low-latency inference and deployment on consumer or single-GPU hardware. It supports the same reasoning level configuration, fine-tuning, and agentic capabilities as gpt-oss-120b — including function calling, tool use, and structured outputs — at significantly lower cost and latency, making it ideal for cost-sensitive or edge deployments.
Features
Serverless API
GPT-OSS is available via sciforium's serverless API, where you pay per token. There are several ways to call the sciforium API, including sciforium's Python client, the REST API, or OpenAI's Python client.
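As a minimal sketch of calling the serverless API over REST with only the Python standard library: the endpoint URL and model id below are assumptions (check the sciforium docs for the exact values), and the request body follows the common OpenAI-compatible chat completions shape.

```python
import json
import os
import urllib.request

# Hypothetical endpoint and model id -- verify both against the
# sciforium documentation before relying on them.
API_URL = "https://api.sciforium.ai/v1/chat/completions"  # assumed URL
MODEL_ID = "gpt-oss-20b"                                  # assumed model id

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request (not yet sent)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Read the key from the environment; never hard-code secrets.
            "Authorization": f"Bearer {os.environ.get('SCIFORIUM_API_KEY', '')}",
        },
        method="POST",
    )

req = build_request("Say hello in one sentence.")
# To actually send the request (requires a valid API key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request shape is OpenAI-compatible, swapping in OpenAI's official Python client is typically just a matter of pointing its `base_url` at the sciforium endpoint.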
Docs
On-demand Deployments
On-demand deployments let you run GPT-OSS on dedicated GPUs using sciforium's high-performance serving stack, with high reliability and no rate limits.
Docs