2025-03-12 – OddMeta

Open WebUI does not display the think tag correctly for qwq32b and deepseek

Symptom

The entire think tag is rendered incorrectly, as shown in the screenshot.

Solution 1: failed

Reference: https://github.com/open-webui/open-webui/discussions/11348

Quoted from the discussion: This might be a quick fix: in backend/open_webui/utils/middleware.py, line 1313, function tag_content_handler: change the elif to if.

I opened middleware.py, went to line 1313, found that elif, and changed it to if. After restarting Open WebUI, the problem was still there.

Solution 2: OK

Reference: https://github.com/open-webui/open-webui/issues/11259

Quoted from the issue: With TabbyAPI, I’m able to get the “normal” tag when removing it from the chat template. The end looks like this after the modification:

{%- if add_generation_prompt %}\n […]
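To make the failure mode concrete, here is a minimal Python sketch of a client-side workaround. It assumes, based on the issue quoted above, that the opening think tag is emitted by the chat template and therefore missing from the completion, so only the closing tag arrives. The function name split_reasoning is hypothetical and is not part of Open WebUI, vLLM, or TabbyAPI.

```python
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer).

    Hypothetical helper: accepts both a complete <think>...</think> pair
    and a completion whose opening tag is missing (assumed to have been
    emitted by the chat template instead of the model).
    """
    close_at = text.find(THINK_CLOSE)
    if close_at == -1:
        # No closing tag at all: treat the whole text as the answer.
        return "", text.strip()

    reasoning = text[:close_at].lstrip()
    # Drop the opening tag if the model did emit one.
    if reasoning.startswith(THINK_OPEN):
        reasoning = reasoning[len(THINK_OPEN):]

    answer = text[close_at + len(THINK_CLOSE):]
    return reasoning.strip(), answer.strip()


if __name__ == "__main__":
    # Opening tag missing, as in the symptom described above.
    print(split_reasoning("plan the reply\n</think>\n\nHello"))
```

This only post-processes the text on the client side. Editing the chat template, as in Solution 2, fixes the output at the source and is what the post reports as working.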

LLM

Open WebUI does not display the think tag correctly for qwq32b and deepseek

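Compared with frameworks such as ollama and llama.cpp, vLLM is a solution that can be deployed in production, suited to large-scale deployment and high-concurrency inference. It uses PagedAttention, which reduces memory fragmentation and improves memory utilization, and so significantly speeds up inference. The advantage is more pronounced with long input sequences. So today I will first use vLLM to check how QWQ32B behaves.

Hardware environment

The tests ran on a GPU server rented from AutoDL.

• Software: PyTorch 2.5.1, Python 3.12 (ubuntu22.04), CUDA 12.1
• Hardware:
  ￮ GPU: RTX 4090 (24GB) * 2
  ￮ CPU: 64 vCPU Intel(R) Xeon(R) Gold 6430
  ￮ Memory: 480G (at least 382G required)
  ￮ Disk: 1.8T (about 380G actually used)

1. Virtual environment

  conda create --prefix=/root/autodl-tmp/jacky/envs/vllm python==3.12.3
  conda activate /root/autodl-tmp/jacky/envs/vllm/
  pip install vllm

2. Install vLLM

  export VLLM_VERSION=0.6.1.post1
  export PYTHON_VERSION=310
  pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118

3. Download the model from Hugging Face

Planning to test […]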
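The wheel URL in the install step is assembled from the two exported variables. As a rough sketch, the Python below shows how that template expands. The function name vllm_wheel_url and its cuda_tag parameter are hypothetical, and the code only builds the string: it does not check that a wheel with that name was actually published.

```python
def vllm_wheel_url(vllm_version: str, python_tag: str, cuda_tag: str = "cu118") -> str:
    """Expand the release-wheel URL template used in the install step.

    Hypothetical helper: mirrors the shell template from the post and
    does not verify that the resulting file exists on GitHub.
    """
    base = "https://github.com/vllm-project/vllm/releases/download"
    wheel = (
        f"vllm-{vllm_version}+{cuda_tag}"
        f"-cp{python_tag}-cp{python_tag}-manylinux1_x86_64.whl"
    )
    return f"{base}/v{vllm_version}/{wheel}"


if __name__ == "__main__":
    # Same values as the export lines in the post.
    print(vllm_wheel_url("0.6.1.post1", "310"))
```

The cp tag has to match the interpreter in the active environment, so a Python 3.12 environment would need 312 here, provided a wheel with that tag exists for the chosen release.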

LLM

Functional and performance testing of vLLM 0.7.3 with the Q4-quantized QWQ32B
