gpt-oss-20b
gpt-oss-20b is an open-weight mixture-of-experts model from OpenAI with 21 billion total parameters and 3.6 billion active parameters per forward pass, designed for low-latency inference and deployment on consumer or single-GPU hardware. It supports the same reasoning level configuration, fine-tuning, and agentic capabilities as gpt-oss-120b — including function calling, tool use, and structured outputs — at significantly lower cost and latency, making it ideal for cost-sensitive or edge deployments.
Features
Serverless API
GPT-OSS is available via sciforium's serverless API, where you pay per token. There are several ways to call the sciforium API, including sciforium's Python client, the REST API, or OpenAI's Python client.
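As a minimal sketch of calling the serverless API over REST with only the Python standard library: the endpoint URL and model id below are assumptions (check the sciforium docs for the exact values), and the request body follows the common OpenAI-compatible chat completions shape.

```python
import json
import os
import urllib.request

# Hypothetical endpoint and model id -- verify both against the
# sciforium documentation before relying on them.
API_URL = "https://api.sciforium.ai/v1/chat/completions"  # assumed URL
MODEL_ID = "gpt-oss-20b"                                  # assumed model id

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request (not yet sent)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Read the key from the environment; never hard-code secrets.
            "Authorization": f"Bearer {os.environ.get('SCIFORIUM_API_KEY', '')}",
        },
        method="POST",
    )

req = build_request("Say hello in one sentence.")
# To actually send the request (requires a valid API key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request shape is OpenAI-compatible, swapping in OpenAI's official Python client is typically just a matter of pointing its `base_url` at the sciforium endpoint.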
Docs
On-demand Deployments
On-demand deployments let you run GPT-OSS on dedicated GPUs using sciforium's high-performance serving stack, with high reliability and no rate limits.
Docs