Sciforium

Qwen3 VL Instruct

Qwen3 VL Instruct is a vision-language mixture-of-experts model with 30 billion total parameters and 3 billion active parameters, designed for visual understanding, multimodal agent tasks, and high-resolution image analysis up to megapixel-level inputs. It supports strong multilingual OCR, visual grounding, GUI automation, and spatial reasoning — making it well-suited for document processing, visual question answering, and agentic workflows that require tight integration of language and vision.

Features

Serverless API

Qwen3 VL is available via sciforium' serverless API, where you pay per token. There are several ways to call the sciforium API, including sciforium' Python client, the REST API, or OpenAI's Python client.

Docs

Agentic Capabilities

Docs

Metadata

State

Ready

Type

LLM

Creator

Alibaba / Qwen

Hugging Face

Qwen3-VL-235B-A22B-Instruct

Specification

Model Weights

BF16

Activation

BF16

KV Cache

BF16

Supported Functionality

Fine-tuning

Contact Sales

Serverless

Supported

Context Length

256K

Embeddings

Input Modality

Output Modality

Text

MiniMax M2.5

Kimi K2.5

GLM 5

DeepSeek V3.2

gpt-oss-120b

gpt-oss-20b

Qwen3 Instruct

Qwen3 Thinking

Qwen3 Coder

Qwen3.5

Qwen3 VL Instruct

Qwen3 ASR

Qwen-Image

Qwen-Image-Edit

Flux2

Stable Diffusion 3.5

Hunyuan Image

Z-Image

Wan2.2-I2V

Wan2.2-T2V

Hunyuan Image

Z-Image