Qwen3 VL Instruct
Qwen3 VL Instruct is a vision-language mixture-of-experts model with 30 billion total parameters and 3 billion active parameters, designed for visual understanding, multimodal agent tasks, and high-resolution image analysis up to megapixel-level inputs. It supports strong multilingual OCR, visual grounding, GUI automation, and spatial reasoning — making it well-suited for document processing, visual question answering, and agentic workflows that require tight integration of language and vision.
Features
Serverless API
Qwen3 VL is available via sciforium' serverless API, where you pay per token. There are several ways to call the sciforium API, including sciforium' Python client, the REST API, or OpenAI's Python client.
DocsAgentic Capabilities
Qwen3 VL is available via sciforium' serverless API, where you pay per token. There are several ways to call the sciforium API, including sciforium' Python client, the REST API, or OpenAI's Python client.
Docs