Qwen3.6-27B Dense (IQ4_XS, no MTP)

cHunter789 27B IQ4_XS variant without MTP heads. ~115K context on a single 16GB with KV Q5, 91K Q8/Q5

dense_iq4.sh

Think — reasoning on, no MTP, 115K ctx

# optimized for coding, no MTP heads, 14.3GB
# cHunter789/Qwen3.6-27B-i1-IQ4_XS-GGUF
# max context:    90K headless K q8_0, 115712K K q5_0, 92672ctx on KDE
# 1. Set Environment Variables
export LD_LIBRARY_PATH="/home/eaman/llama/bin_vulkan" 

# 2. Run the Server
/home/eaman/llama/bin_vulkan/llama-server \
 -m /home/eaman/lm/models/mradermacher/Qwen3.6-27B/Qwen3.6-27B.i1-IQ4_XS-attn_qkv-IQ4_XS.gguf \
    --host 0.0.0.0     -np 1 -fa on --no-mmap --jinja \
        -b 1024 -ub 128 \
    --fit-target 50 \
    -ctk q8_0 -ctv q5_0 \
    --temp 0.6  --top-k 25 --top-p 0.95 --min-p 0.0 \
        --presence-penalty 0.0 --repeat-penalty 1.0 \
        --reasoning on --reasoning-budget 4096 --reasoning-budget-message " -- Reasoning budget exceeded, proceed to final answer." \
        --cache-ram 6000 -ngl 99 -lv 4 --no-warmup  --timeout 900 \