Qwen 3.6 Model Configs

llama-server configurations tuned for coding workloads on the Vulkan backend.

Qwen3.6-27B MTP

27B-parameter model with Multi-Token Prediction (MTP) for speculative decoding. Uses a Q8 nextn draft model to speed up token generation. Best all-round config for coding.
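To make the idea of speculative decoding concrete, here is a minimal Python sketch of greedy draft verification. It is an illustration only, not llama.cpp's implementation: `accept_draft` and `toy_target` are hypothetical names, and a real server verifies all draft tokens in one batched forward pass rather than one call per token.

```python
from typing import Callable, List


def accept_draft(
    context: List[int],
    draft: List[int],
    target_next: Callable[[List[int]], int],
) -> List[int]:
    """Keep draft tokens for as long as the target model agrees with them."""
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(context + accepted)
        if tok != expected:
            # First disagreement: take the target's token and drop the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # Every draft token matched, so the target adds one more token.
    accepted.append(target_next(context + accepted))
    return accepted


def toy_target(tokens: List[int]) -> int:
    """Stand-in for the large model: always predicts previous token + 1."""
    return tokens[-1] + 1


print(accept_draft([1, 2, 3], [4, 5, 9, 10], toy_target))  # [4, 5, 6]
```

Two of the four drafted tokens are accepted and the third is corrected, so one verification step yields three tokens. The output matches what the target alone would have produced, which is why a good draft raises speed without changing results under greedy sampling.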

Qwen3.6-27B XXS

IQ4_XS-quantized 27B model with ngram-based speculative decoding. The lighter quantization gives faster inference at the cost of some quality.
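Ngram-based drafting needs no second model: it proposes tokens by looking for an earlier occurrence of the most recent tokens and copying what followed. The Python sketch below shows the general technique under simple assumptions (exact match, most recent occurrence wins). `ngram_draft` and its parameters are hypothetical and do not mirror llama-server options.

```python
from typing import List


def ngram_draft(tokens: List[int], n: int = 3, max_draft: int = 4) -> List[int]:
    """Propose draft tokens from the latest earlier match of the last n tokens."""
    if len(tokens) <= n:
        return []
    key = tokens[-n:]
    # Scan backwards, starting just before the trailing n-gram itself.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == key:
            # Copy the tokens that followed the earlier occurrence.
            return tokens[start + n:start + n + max_draft]
    return []


history = [10, 11, 12, 13, 14, 10, 11, 12]
print(ngram_draft(history))  # [13, 14, 10, 11]
```

Source code tends to repeat identifiers and boilerplate, so lookups like this often find a match during coding sessions. The proposed tokens still have to be verified by the model before they are kept.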

Qwopus3.5-9B Coder

9B coder model (Qwopus3.5) with MTP speculative decoding. Smaller and faster, good for quick coding tasks.

Qwen3.6-35B-A3B MoE

Mixture-of-Experts model with 35B total / 3B active parameters. Multiple configs, ranging from a fast IQ3_M quant to a larger Q4_K_XL.
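The total/active split can be put into rough numbers with the Python sketch below. The 3.66 bits per weight used for IQ3_M is an assumed nominal figure; real GGUF files mix tensor types, so actual file sizes will differ. Both helper functions are hypothetical.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters used per token in a Mixture-of-Experts model."""
    return active_b / total_b


def approx_weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Rough weight size in GB (10^9 bytes) for a parameter count in billions."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9


# 35B total / 3B active, as stated for Qwen3.6-35B-A3B.
print(f"active per token: {active_fraction(35, 3):.1%}")  # 8.6%

# Assumption: IQ3_M averages about 3.66 bits per weight.
print(f"approx. IQ3_M weights: {approx_weight_gb(35, 3.66):.1f} GB")  # 16.0 GB
```

All 35B parameters still have to fit in memory, but each token only runs through about 3B of them, so per-token compute is closer to that of a small dense model.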

Qwen3.6-35B-A3B Base

Base configuration for the 35B-A3B UD model with Q4_K_S quantization. A good reference config with an 80K context.