gpt-oss-120b
gpt-oss-120b is an open-weight mixture-of-experts model from OpenAI with 117 billion total parameters and 5.1 billion active parameters per forward pass, optimized to run on a single H100 or AMD MI300X GPU using native MXFP4 quantization. It delivers o3/o4-mini-class reasoning with configurable thinking depth, full chain-of-thought access, and native tool use — including function calling, browsing, and structured output generation — making it well-suited for production, general-purpose, and high-reasoning use cases.
Features
Serverless API
GPT-OSS is available via sciforium's serverless API, where you pay per token. There are several ways to call the sciforium API, including sciforium's Python client, the REST API, or OpenAI's Python client.
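As a minimal sketch of the REST path, the request below builds an OpenAI-compatible chat-completions call against a hypothetical sciforium base URL. The URL, the `reasoning_effort` field, and the `$SCIFORIUM_API_KEY` placeholder are assumptions; check sciforium's docs for the real values.

```python
import json
from urllib.request import Request

# Hypothetical base URL -- substitute the endpoint from sciforium's docs.
BASE_URL = "https://api.sciforium.ai/v1"

payload = {
    "model": "gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Summarize MXFP4 quantization in one sentence."}
    ],
    # gpt-oss supports configurable thinking depth; the exact field name may differ.
    "reasoning_effort": "medium",
}

# Build (but do not send) the POST request to the chat completions endpoint.
req = Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer $SCIFORIUM_API_KEY",  # placeholder, do not hard-code keys
        "Content-Type": "application/json",
    },
    method="POST",
)

print(req.full_url)
print(payload["model"])
```

Because the request shape follows the OpenAI chat-completions convention, the same payload works unchanged through OpenAI's Python client by pointing its `base_url` at the sciforium endpoint.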
On-demand Deployments
On-demand deployments allow you to run GPT-OSS on dedicated GPUs served by sciforium's high-performance stack, with high reliability and no rate limits.