Qwen 3.6 Model Configs

llama-server configurations optimized for coding on Vulkan.

Qwen3.6-27B MTP

27B-parameter IQ4 model with Multi-Token Prediction (MTP) for speculative decoding. Uses a Q8 nextn draft model for fast token generation. Best all-round config for coding. Runs on a single 16GB AMD 6800.
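
As a rough way to reason about what speculative decoding buys, the sketch below computes the expected number of tokens emitted per pass of the main model. It assumes every drafted token is accepted independently with the same probability, which is a simplification; `accept_rate` and `draft_len` are illustrative parameters, not values measured for this config.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass of the main model.

    Assumes each drafted token is accepted independently with probability
    accept_rate (a simplifying assumption). The pass always yields at least
    one token, and at most draft_len + 1.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be >= 0")
    if accept_rate == 1.0:
        # Every drafted token is accepted, plus the one token from the main model.
        return float(draft_len + 1)
    # Geometric series: 1 + a + a^2 + ... + a^draft_len
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


if __name__ == "__main__":
    for rate in (0.3, 0.6, 0.9):
        print(rate, round(expected_tokens_per_step(rate, draft_len=4), 2))
```

The curve is steep near high acceptance rates, which is why a draft source that predicts code well pays off most.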

Qwen3.6-27B NGRAM

IQ4_XS-quantized 27B model with lightweight ngram-based speculative decoding instead of MTP. Runs on a single 16GB AMD 6800. It uses less VRAM and compute than the MTP version and leaves more context available, at about 1/3 of the MTP version's prediction performance. Better suited to domains where prediction doesn't matter much.
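
To illustrate the general idea behind ngram-based drafting, here is a minimal sketch that proposes draft tokens by finding the most recent earlier occurrence of the last few tokens and copying what followed. It is not llama.cpp's implementation; the function name `ngram_draft` and its parameters are hypothetical.

```python
from typing import List, Sequence


def ngram_draft(tokens: Sequence[int], ngram_size: int = 3, max_draft: int = 4) -> List[int]:
    """Propose draft tokens by matching the trailing n-gram earlier in the context.

    Returns up to max_draft tokens that followed the most recent earlier
    occurrence of the last ngram_size tokens, or an empty list if none exists.
    """
    if ngram_size <= 0 or max_draft <= 0 or len(tokens) <= ngram_size:
        return []
    key = tuple(tokens[-ngram_size:])
    # Scan backwards so the most recent earlier match wins; the trailing
    # n-gram itself is excluded by starting one position before it.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tuple(tokens[start:start + ngram_size]) == key:
            follow = start + ngram_size
            return list(tokens[follow:follow + max_draft])
    return []


if __name__ == "__main__":
    history = [1, 2, 3, 4, 5, 1, 2, 3]
    print(ngram_draft(history, ngram_size=3, max_draft=2))
```

No extra model is loaded for this kind of drafting, which is where the VRAM and compute savings come from. The drafts are only useful when the output repeats earlier context.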

Qwopus3.5-9B Coder

9B coder model (Qwopus3.5) with MTP speculative decoding. Smaller and faster than the 27B configs and good for quick coding tasks, yet slower than the MoE configs.

Qwen3.6-35B-A3B MoE

Mixture-of-Experts model with 35B total / 3B active parameters. Runs on a single 16GB AMD 6800. Multiple configs, ranging from a fast IQ3_M quant to the larger Q4_K_XL.
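
The total/active split matters for sizing: the weight footprint follows the total parameter count, while per-token compute follows the active count. The sketch below does that arithmetic under stated assumptions. The 3.66 bits-per-weight figure for IQ3_M is a nominal value; real GGUF files mix quant types per tensor, so actual file sizes differ. KV cache and compute buffers are not included.

```python
def weight_size_gib(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight footprint in GiB for a given parameter count and average bits per weight."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def active_fraction(active_billion: float, total_billion: float) -> float:
    """Share of the parameters used for each generated token in an MoE model."""
    return active_billion / total_billion


if __name__ == "__main__":
    # Nominal bits per weight for IQ3_M (assumption; actual files vary).
    iq3_m_bpw = 3.66
    print(f"35B weights at ~{iq3_m_bpw} bpw: {weight_size_gib(35, iq3_m_bpw):.1f} GiB")
    print(f"active share per token: {active_fraction(3, 35):.1%}")
```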

Qwen3.6-35B-A3B Base

Base configuration for the 35B-A3B UD model with Q4_K_S quantization and no MTP heads. Good reference config with 80K context. Runs on a single 16GB AMD 6800.
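
For orientation, a reference launch along these lines could be assembled as below. This is a sketch, not the actual config: the model path is a hypothetical placeholder, "80K" is assumed to mean 80 * 1024 tokens, and the GPU layer count is an example value to tune to whatever fits in VRAM. Only the common llama-server options `-m`, `-c` and `-ngl` are used.

```python
import shlex
from typing import List


def build_llama_server_cmd(model_path: str, ctx_size: int, n_gpu_layers: int) -> List[str]:
    """Assemble a llama-server argument list from a few basic settings."""
    return [
        "llama-server",
        "-m", model_path,          # path to the GGUF model file
        "-c", str(ctx_size),       # context size in tokens
        "-ngl", str(n_gpu_layers), # number of layers to offload to the GPU
    ]


if __name__ == "__main__":
    cmd = build_llama_server_cmd(
        model_path="models/example-35b-a3b-q4_k_s.gguf",  # hypothetical path
        ctx_size=80 * 1024,                                # assumed meaning of "80K"
        n_gpu_layers=99,                                   # example value, tune to fit
    )
    # Print a shell-safe command line; pass cmd to subprocess.run to launch it.
    print(shlex.join(cmd))
```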