Qwen 3.6 Model Configs

llama-server configurations optimized for coding, using the Vulkan backend on a single AMD 6800 (16GB).

Qwen3.6-27B Dense (IQ3_M, MTP)

27B uncensored heretic v2 variant with MTP (multi-token prediction) speculative decoding. Fits ~128K context on a single 16GB GPU with the KV cache quantized to Q8 (K) / Q5 (V).
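In general, MTP speculative decoding has the model's multi-token-prediction head draft a few tokens, which the full model then verifies in a single forward pass. As a rough intuition for the speedup, the sketch below uses the standard estimate from the speculative decoding literature: if each of `k` drafted tokens is accepted independently with probability `alpha`, the expected number of tokens emitted per verification pass is (1 - alpha^(k+1)) / (1 - alpha). The independence assumption and the `alpha`/`k` values are illustrative only, not measurements from these configs.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model forward pass.

    Assumes each of the k drafted tokens is accepted independently with
    probability alpha (a simplification). The verify pass always yields
    one extra token of its own, hence the k + 1 exponent.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        # Every draft accepted: k drafted tokens plus the bonus token.
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Hypothetical acceptance rates and draft lengths, not measurements.
    for alpha in (0.5, 0.7, 0.9):
        for k in (1, 2, 3):
            print(f"alpha={alpha:.1f} k={k}: "
                  f"{expected_tokens_per_step(alpha, k):.2f} tokens/step")
```

The wall-clock gain is smaller than this ratio, since running the draft head and verifying several tokens at once both cost some time.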

dense_mtp_iq3.sh — Think — reasoning on, MTP draft spec, 128K ctx

Qwen3.6-27B Dense (IQ4_XS, no MTP)

cHunter789's 27B IQ4_XS variant, without MTP heads. Fits ~115K context on a single 16GB GPU with the KV cache at Q5, or ~91K with Q8/Q5.
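Because both figures use the same IQ4_XS weights, the VRAM left for the KV cache is roughly fixed, so the context that fits scales inversely with the bytes stored per token. The sketch below assumes "Q5" means llama.cpp's `q5_0` cache type and "Q8/Q5" means `--cache-type-k q8_0 --cache-type-v q5_0`; it uses the GGML block sizes of those types (q8_0: 34 bytes per 32 values, q5_0: 22 bytes per 32 values) and ignores all other buffers. Under those assumptions, rescaling the 91K Q8/Q5 figure lands close to the ~115K Q5 figure.

```python
# Bits per stored value for some GGML KV-cache types, derived from their
# block layouts: bytes per 32-value block * 8 bits / 32 values.
BITS_PER_VALUE = {
    "f16": 16.0,
    "q8_0": 34 * 8 / 32,  # 8.5 bits: 32 int8 values + fp16 scale
    "q5_0": 22 * 8 / 32,  # 5.5 bits: 32 5-bit values + fp16 scale
    "q4_0": 18 * 8 / 32,  # 4.5 bits: 32 4-bit values + fp16 scale
}


def scale_context(known_ctx: float, known_k: str, known_v: str,
                  new_k: str, new_v: str) -> float:
    """Estimate the context that fits after switching K/V cache types.

    Assumes K and V hold the same number of values per token and that
    the VRAM available to the KV cache stays the same.
    """
    old_bits = BITS_PER_VALUE[known_k] + BITS_PER_VALUE[known_v]
    new_bits = BITS_PER_VALUE[new_k] + BITS_PER_VALUE[new_v]
    return known_ctx * old_bits / new_bits


if __name__ == "__main__":
    # 91K tokens with K=q8_0, V=q5_0 (assumed), re-estimated for q5_0/q5_0.
    est = scale_context(91_000, "q8_0", "q5_0", "q5_0", "q5_0")
    print(f"estimated context with q5_0/q5_0: ~{est / 1000:.0f}K")
```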

dense_iq4.sh — Think — reasoning on, no MTP, 115K ctx

Qwen3.6-35B-A3B MoE

35B total / 3B active MoE. Two ByteShape quants: an IQ3_S MTD variant (128K ctx, ~140 t/s) and an IQ3_X no-MTD variant (200K ctx), both running on a single 16GB AMD 6800.
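Single-GPU token generation is usually limited by memory bandwidth, and an MoE model only reads its active parameters for each token. The back-of-envelope sketch below compares the weight bytes read per generated token for the 27B dense model and this 35B-A3B MoE, then divides an assumed memory bandwidth by that to get a ceiling on tokens per second. The 512 GB/s bandwidth is an assumption based on the RX 6800's published spec, ~3.66 bpw is llama.cpp's nominal figure for IQ3_M, and 3.48 bpw is the figure quoted for the IQ3_S MoE file. KV-cache reads, compute, and speculative decoding are all ignored, so treat the results as rough ceilings for comparison only.

```python
from dataclasses import dataclass

# Assumed peak memory bandwidth of an RX 6800 (256-bit GDDR6 at 16 Gbps).
BANDWIDTH_BYTES_PER_S = 512e9


@dataclass
class Model:
    name: str
    params_read_per_token: float  # dense: all params; MoE: active params only
    bits_per_weight: float

    def bytes_per_token(self) -> float:
        # Weight bytes streamed from VRAM to produce one token.
        return self.params_read_per_token * self.bits_per_weight / 8

    def max_tokens_per_s(self, bandwidth: float = BANDWIDTH_BYTES_PER_S) -> float:
        # Upper bound if decoding were purely bandwidth-limited.
        return bandwidth / self.bytes_per_token()


if __name__ == "__main__":
    models = [
        Model("27B dense, IQ3_M", 27e9, 3.66),
        Model("35B-A3B MoE, IQ3_S", 3e9, 3.48),
    ]
    for m in models:
        print(f"{m.name}: {m.bytes_per_token() / 1e9:.2f} GB/token, "
              f"<= {m.max_tokens_per_s():.0f} tok/s")
```

Under these assumptions the MoE ceiling is roughly 9x the dense one, which is why the MoE configs are the fast ones even though their total weights are larger.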

fastest_moe.sh — MTD Think — IQ3_S, reasoning on, MTD, 128K ctx
moe.sh — No-MTD — IQ3_S 3.48bpw, reasoning on, 200K ctx

Qwopus3.5-9B Coder

9B coder model (Qwopus3.5) with MTP speculative decoding, quantized to Q6_K. Fits ~81K context when the GPU runs headless (no display attached). Optimized for fast coding tasks on a 16GB GPU.

dense_9b.sh — Normal — 81K context, MTP draft spec, Q6_K
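As a rough check of how much of the 16GB each config leaves for the KV cache and compute buffers, the sketch below estimates weight size as parameters times bits per weight. The bpw values are assumptions: nominal llama.cpp figures for IQ3_M (~3.66), IQ4_XS (4.25) and Q6_K (6.5625), plus the 3.48 bpw quoted for the MoE. Real GGUF files mix quant types per tensor and MTP head weights are not counted, so actual file sizes will differ somewhat.

```python
# Nominal bits per weight (assumed; real GGUF files mix tensor types).
CONFIGS = {
    "27B dense, IQ3_M": (27e9, 3.66),
    "27B dense, IQ4_XS": (27e9, 4.25),
    "35B-A3B MoE, IQ3_S": (35e9, 3.48),
    "9B coder, Q6_K": (9e9, 6.5625),
}

VRAM_GIB = 16.0
GIB = 2 ** 30


def weight_gib(params: float, bpw: float) -> float:
    """Approximate weight size in GiB: params * bits per weight / 8."""
    return params * bpw / 8 / GIB


if __name__ == "__main__":
    for name, (params, bpw) in CONFIGS.items():
        w = weight_gib(params, bpw)
        print(f"{name:22s} weights ~{w:5.1f} GiB, "
              f"left for KV/buffers ~{VRAM_GIB - w:4.1f} GiB")
```

Under these assumptions the IQ4_XS build leaves roughly 1.9 GiB less than IQ3_M for the KV cache, which is consistent with its smaller context.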

Generated from lm/ scripts — 4 model families, 5 configs total.