llama-server configurations optimized for coding with the Vulkan backend on an AMD 6800 16GB.
27B uncensored heretic v2 with MTP speculative decoding. ~128K context on a single 16GB GPU with KV Q8/Q5.
cHunter789 27B IQ4_XS variant without MTP heads. ~115K context on a single 16GB GPU with KV Q5, ~91K with KV Q8/Q5.
35B total / 3B active MoE. ByteShape IQ3_S MTP variant (128K ctx, ~140 t/s) and IQ3_X no-MTP variant (200K ctx). Runs on a single 16GB AMD 6800.
9B coder model (Qwopus3.5) with MTP speculative decoding. ~81K context headless, Q6_K quantization. Optimized for fast coding tasks on a 16GB GPU.
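
The context figures above depend heavily on the KV cache types. As a rough sanity check, the sketch below estimates how much context fits in the same KV budget when the cache types change. It assumes "KV Q5" means q5_0 for both K and V and "KV Q8/Q5" means q8_0 for K and q5_0 for V (an interpretation, not stated in the entries), and that nothing else in VRAM changes. The function names are hypothetical; the bits-per-element values follow from the llama.cpp block layouts (32 values per block plus a 2-byte scale).

```python
# Bits per cached element for common llama.cpp cache types.
# q8_0: 34 bytes per 32 values, q5_0: 22 bytes, q4_0: 18 bytes, f16: 2 bytes per value.
BITS_PER_ELEMENT = {"f16": 16.0, "q8_0": 8.5, "q5_0": 5.5, "q4_0": 4.5}


def kv_bits(k_type: str, v_type: str) -> float:
    """Bits needed to store one K element plus one V element."""
    return BITS_PER_ELEMENT[k_type] + BITS_PER_ELEMENT[v_type]


def scaled_context(base_ctx: int, base: tuple[str, str], new: tuple[str, str]) -> int:
    """Context that fits in the same KV memory budget after switching cache types.

    Assumes KV memory grows linearly with context and that the budget is fixed.
    """
    return int(base_ctx * kv_bits(*base) / kv_bits(*new))


if __name__ == "__main__":
    # Start from the ~115K figure quoted for KV Q5 (assumed q5_0/q5_0)
    # and estimate what fits with K at q8_0 and V at q5_0.
    estimate = scaled_context(115_000, ("q5_0", "q5_0"), ("q8_0", "q5_0"))
    print(f"Estimated context with q8_0/q5_0: ~{estimate // 1000}K tokens")
```

Under these assumptions the estimate lands close to the ~91K quoted for the 27B IQ4_XS entry, which suggests the drop from ~115K is mostly explained by the larger K cache.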