Qwen 3.6 Model Configs

llama-server configurations tuned for coding workloads, running on the Vulkan backend on an AMD 6800 (16GB).

Qwen3.6-27B Dense (IQ3_M, MTP)

27B uncensored heretic v2 build with MTP speculative decoding. Fits ~128K context on a single 16GB GPU with the KV cache quantized to Q8/Q5.
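As a rough illustration of how a KV Q8/Q5 setup at ~128K context might be expressed, the sketch below assembles a llama-server command line in Python. The model path is hypothetical, the mapping of Q8 to the K cache and Q5 to the V cache (as `q8_0` and `q5_0`) is an assumption, and MTP-related options are left out because the exact settings are not stated here.

```python
import shlex


def build_llama_server_args(model_path, ctx_size, cache_type_k, cache_type_v,
                            n_gpu_layers=99):
    """Return a llama-server argument list with a quantized KV cache."""
    return [
        "llama-server",
        "--model", model_path,
        "--ctx-size", str(ctx_size),
        # A large layer count asks llama.cpp to offload every layer to the GPU.
        "--n-gpu-layers", str(n_gpu_layers),
        "--cache-type-k", cache_type_k,
        "--cache-type-v", cache_type_v,
    ]


# Hypothetical file name; 131072 tokens corresponds to ~128K context.
# Assumption: "KV Q8/Q5" means K cache q8_0 and V cache q5_0.
args = build_llama_server_args(
    "models/qwen3.6-27b-iq3_m.gguf", 131072, "q8_0", "q5_0"
)
command = shlex.join(args)
print(command)
```

Building the list first and joining it with `shlex.join` keeps the arguments safe to pass directly to `subprocess.run(args)` as well as to paste into a shell.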

Qwen3.6-27B Dense (IQ4_XS, no MTP)

cHunter789 27B IQ4_XS variant without MTP heads. ~115K context on a single 16GB GPU with KV Q5, or 91K with KV Q8/Q5.
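The drop from ~115K to 91K context when moving the KV cache from Q5 to Q8/Q5 is roughly what the per-element storage cost predicts. The sketch below works through that arithmetic, assuming that Q5 means `q5_0` and Q8 means `q8_0`, that the memory left over for the KV cache is the same in both setups, and that cache size grows linearly with context length. It is an estimate, not a measurement.

```python
# Bits per cached element for llama.cpp block formats (32 elements per block).
BITS_PER_ELEMENT = {
    "f16": 16.0,
    "q8_0": 34 * 8 / 32,  # 2-byte scale + 32 one-byte values = 34 bytes
    "q5_0": 22 * 8 / 32,  # 2-byte scale + 4 bytes high bits + 16 bytes low bits
}


def kv_bits(cache_type_k, cache_type_v):
    """Combined bits needed to store one K element and one V element."""
    return BITS_PER_ELEMENT[cache_type_k] + BITS_PER_ELEMENT[cache_type_v]


def scaled_context(base_ctx, base_types, new_types):
    """Context that fits in the same KV budget after changing cache types."""
    return base_ctx * kv_bits(*base_types) / kv_bits(*new_types)


# Start from ~115K tokens at Q5/Q5 and switch to Q8/Q5.
estimate = scaled_context(115_000, ("q5_0", "q5_0"), ("q8_0", "q5_0"))
print(round(estimate))
```

Under these assumptions the estimate lands near 90K tokens, which is in line with the 91K figure quoted above.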

Qwen3.6-35B-A3B MoE

35B total / 3B active MoE. ByteShape IQ3_S MTP variant (128K context, ~140 t/s) and IQ3_X no-MTP variant (200K context), both running on a single 16GB AMD 6800.

Qwopus3.5-9B Coder

9B coder model (Qwopus3.5) with MTP speculative decoding and Q6_K quantization. ~81K context when running headless. Optimized for fast coding tasks on a 16GB GPU.