Qwen 3.6 Model Configs
llama-server configurations optimized for coding on Vulkan.
27B parameter model with Multi-Token Prediction (MTP) for speculative decoding. Uses a Q8 nextn draft model for fast token generation. Best all-round config for coding.
new_27B_mtp.sh - Normal - optimized for coding, 32K context, headless
new_27B_mtp.sh_think - Think - reasoning mode with higher temperature
new_27B_mtp.sh_chat - Chat - 81K context, higher temperature for conversational use
new_27B_mtp.sh_opti - Optimized - 81K context, tuned batch size and cache
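As a rough illustration of how the mode variants above might differ, the sketch below maps a mode name to context-size and temperature flags and prints the resulting command instead of running it. The helper name `mode_args`, the temperature values, the Think context size, and the model path are all assumptions made for illustration; the real scripts may use different values. "81K" is read here as 81 * 1024 tokens, which is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a mode name to llama-server flags.
# All numeric values are illustrative placeholders, not the scripts' real settings.
mode_args() {
  case "$1" in
    normal) echo "--ctx-size $((32 * 1024)) --temp 0.6" ;;  # 32K context, lower temperature
    think)  echo "--ctx-size $((32 * 1024)) --temp 1.0" ;;  # higher temperature for reasoning
    chat)   echo "--ctx-size $((81 * 1024)) --temp 1.0" ;;  # 81K context, higher temperature
    *)      echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Dry run: print the command line rather than starting the server.
# MODEL is a hypothetical path; the unquoted substitution is split into separate flags on purpose.
MODEL="${MODEL:-/path/to/model.gguf}"
echo llama-server --model "$MODEL" $(mode_args normal)
```

With `MODEL` unset, the last line prints `llama-server --model /path/to/model.gguf --ctx-size 32768 --temp 0.6`. The Optimized variant is left out because its batch and cache settings are not stated here.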
IQ4_XS quantized 27B model with ngram-based speculative decoding. Lighter quantization for faster inference at the cost of some quality.
new_27B_xxs.sh - Normal - ngram-mod speculative decoding, coding optimized
new_27B_xxs.sh_think - Think - higher temperature for reasoning
9B coder model (Qwopus3.5) with MTP speculative decoding. Smaller and faster, good for quick coding tasks.
new_9b.sh - Normal - 81K context, Q6_K quantization
new_9b.sh_think - Think - 30K context, higher temperature
Mixture-of-Experts model with 35B total / 3B active parameters. Multiple configs ranging from fast IQ3_M quant to larger Q4_K_XL.
qwen_MoE.sh_ngram - Normal - IQ3_M quant, ngram speculative decoding
qwen_MoE.sh_ngram_think - Think - 121K context, higher fit-target
fastest_MoE.sh_mtd - Fastest MTD - Q4_K_S UD with MTP draft
moe_mtd_xl.sh - MTD XL - Q4_K_XL quant, 80K fit-ctx
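The fit-ctx and fit-target settings mentioned above could be passed along the lines of the sketch below, which only assembles and prints a command line. The flag spellings `--fit-ctx` and `--fit-target` are assumed from the terms used in this list and should be checked against `llama-server --help` for your build; the function name `build_moe_cmd`, the model path, and the fit-target value are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assemble a llama-server command line for an MoE config.
# Nothing is executed; the command is printed for inspection.
build_moe_cmd() {
  local model="$1" fit_ctx="$2" fit_target="$3"
  local -a cmd=(llama-server --model "$model")
  # Assumed flag names; verify with `llama-server --help` before relying on them.
  cmd+=(--fit-ctx "$fit_ctx" --fit-target "$fit_target")
  printf '%s\n' "${cmd[*]}"
}

# Dry run for an 80K fit-ctx config (80 * 1024 tokens); the path and 1024 are placeholders.
build_moe_cmd /path/to/moe-Q4_K_XL.gguf $((80 * 1024)) 1024
```

Keeping the arguments in an array makes it easy to add or drop flags per variant without re-quoting the whole command.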
Base configuration for the 35B-A3B UD model with Q4_K_S quantization. Good reference config with 80K context.
qwen3.6.sh — Base — Q4_K_S UD, ngram speculative decoding
Generated from lm/ scripts — 5 model families, 13 configs total.
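The config names listed in this document place any variant suffix after `.sh` (for example `new_27B_mtp.sh_think`). A minimal sketch of splitting that suffix off with bash parameter expansion follows; the helper name `config_suffix` and the fallback label `base` are hypothetical and not part of the scripts themselves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the suffix that follows ".sh" in a config name.
config_suffix() {
  local name="$1"
  local suffix="${name##*.sh}"   # drop everything up to and including the last ".sh"
  suffix="${suffix#_}"           # drop the leading underscore, if present
  printf '%s\n' "${suffix:-base}"  # names with no suffix are reported as "base"
}

# Example over a few names taken from the lists above.
for f in new_27B_mtp.sh new_27B_mtp.sh_think qwen_MoE.sh_ngram_think; do
  printf '%s -> %s\n' "$f" "$(config_suffix "$f")"
done
```

The suffix is returned as written in the file name, so `qwen_MoE.sh_ngram_think` yields `ngram_think` rather than a normalized mode name.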