
PaddleOCR-VL-1.5 (vLLM) Inference Guide

Overview
Inference Environment Preparation
Quick Start

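Overview

PaddleOCR-VL-1.5 is a document parsing model released by Baidu and the successor to the PaddleOCR-VL model.

This document describes the deployment workflow for PaddleOCR-VL-1.5, covering inference environment preparation, model deployment, and functional verification, so that users can deploy and validate the model quickly.

Note: the complete PaddleOCR-VL-1.5 pipeline consists of the VLM component PaddleOCR-VL-1.5-0.9B and the layout analysis model PP-DocLayoutV3. PaddleOCR-VL-1.5-0.9B is not a model variant of PaddleOCR-VL-1.5.

Version information: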

url=https://www.modelscope.cn/models/PaddlePaddle/PaddleOCR-VL-1.5
model_name=PaddleOCR-VL-1.5

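Inference Environment Preparation

The model requires the components and drivers listed below.
Only Atlas 800I A2 and Atlas 800T A2 are supported.

Table 1 Version compatibility

Component	Version	Setup guide
Firmware and driver	25.2.RC1	PyTorch framework inference environment preparation
CANN	8.5.1	-
Python	3.11.14	-
vLLM-Ascend	0.19.1rc1	-
PyTorch	2.9.0	-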
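
As a quick sanity check against Table 1, the sketch below compares installed package versions with the expected ones. The pip distribution names `torch` and `vllm-ascend` are assumptions about how the packages are registered in your environment; adjust them if your image uses different names. Firmware, driver, and CANN versions are not covered by this check.

```python
import sys
from importlib import metadata

# Expected versions taken from Table 1. The pip distribution names used as
# keys are assumptions and may differ in your image.
EXPECTED = {
    "torch": "2.9.0",
    "vllm-ascend": "0.19.1rc1",
}
EXPECTED_PYTHON = "3.11.14"


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED):
    """Map each distribution name to (expected, installed, matches)."""
    report = {}
    for name, want in expected.items():
        have = installed_version(name)
        # Prefix match, so local build suffixes such as "+cpu" are tolerated.
        report[name] = (want, have, have is not None and have.startswith(want))
    return report


if __name__ == "__main__":
    py = ".".join(map(str, sys.version_info[:3]))
    print(f"python: expected {EXPECTED_PYTHON}, found {py}")
    for name, (want, have, ok) in check_environment().items():
        print(f"{name}: expected {want}, found {have}, match={ok}")
```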

Quick Start

Environment Setup

Fetch the source code

git clone https://gitcode.com/ascend/ModelZoo-PyTorch.git
cd ModelZoo-PyTorch/ACL_PyTorch/built-in/ocr/PaddleOCR-VL-1.5

Pull the image

# Reuse the vLLM base image, or build a vLLM environment following the vllm-ascend community documentation:
docker pull quay.io/ascend/vllm-ascend:v0.19.1rc1

Get the weights

Download the PaddleOCR-VL-1.5 model weights and place them in the local directory PaddleOCR-VL-1.5.

mkdir PaddleOCR-VL-1.5
modelscope download --model PaddlePaddle/PaddleOCR-VL-1.5 --local_dir ./PaddleOCR-VL-1.5

PaddleOCR-VL-1.5-0.9B Model Inference

Start the vLLM service

export TASK_QUEUE_ENABLE=1
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
vllm serve ./PaddleOCR-VL-1.5/ \
          --host 0.0.0.0 \
          --port 8000 \
          --max-num-batched-tokens 16384 \
          --served-model-name PaddleOCR-VL-1.5-0.9B \
          --trust-remote-code \
          --no-enable-prefix-caching \
          --mm-processor-cache-gb 4 \
          --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'
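
Model loading can take a while, so it may help to wait until the service answers before sending requests. The following is a minimal sketch, assuming the server listens on the host and port from the command above and exposes the OpenAI-compatible `/v1/models` endpoint; the function names are illustrative and not part of this repository.

```python
import time
import urllib.error
import urllib.request


def server_is_up(base_url, timeout_s=2.0):
    """Return True if GET {base_url}/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_for_server(base_url, total_wait_s=600.0, interval_s=5.0, probe=server_is_up):
    """Poll probe(base_url) until it succeeds or total_wait_s elapses."""
    deadline = time.monotonic() + total_wait_s
    while True:
        if probe(base_url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_server("http://localhost:8000/v1")` returns `True` once the service responds and `False` if the wait budget runs out. The `probe` parameter exists only so the polling loop can be exercised without a running server.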

Send a request

Once the service is up, it can be queried with an OpenAI API client.

from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
    timeout=3600
)

# Task-specific base prompts
TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
                }
            },
            {
                "type": "text",
                "text": TASKS["ocr"]
            }
        ]
    }
]

response = client.chat.completions.create(
    model="PaddleOCR-VL-1.5-0.9B",
    messages=messages,
    temperature=0.0,
)
print(f"Generated text: {response.choices[0].message.content}")
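
The request above reads the image from a public URL. For a local file, one common approach is to embed the image as a base64 data URL in the same `image_url` field. The sketch below builds such a message list; the helper names are hypothetical, and it assumes the service accepts data URLs, as OpenAI-compatible servers generally do.

```python
import base64

# Task prompts copied from the request example above.
TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}


def image_to_data_url(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_messages(image_bytes, task="ocr", mime_type="image/png"):
    """Build a chat message list with one image and one task prompt."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASKS)}")
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": image_to_data_url(image_bytes, mime_type)},
                },
                {"type": "text", "text": TASKS[task]},
            ],
        }
    ]
```

The returned list can be passed as `messages` to `client.chat.completions.create(...)`, for example `build_messages(open("table.png", "rb").read(), task="table")`, where `table.png` is a hypothetical local file.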

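End-to-End Inference with PP-DocLayoutV3

Deploy the client by following the PP-DocLayoutV3 guide.

Note: when deploying the end-to-end PaddleOCR-VL-1.5 pipeline, it is recommended to keep the PP-DocLayoutV3 and PaddleOCR-VL-1.5-0.9B environments isolated from each other to avoid dependency conflicts.

Run the inference script on the client to perform end-to-end inference. The input image is the official example "paddleocr_vl_demo.png".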

python infer.py

End-to-End Performance Test

Measure the end-to-end latency on the official example image.

python infer_preformance.py
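
The contents of infer_preformance.py are not reproduced in this guide. As a rough illustration of how an end-to-end latency figure can be obtained, the sketch below times a callable with warm-up runs excluded. Here `fn` is a hypothetical stand-in for one full pipeline call (layout analysis plus VLM requests), not a function from this repository.

```python
import statistics
import time


def measure_latency(fn, warmup=1, repeats=5):
    """Call fn() repeatedly and return the timed per-run latencies in seconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are excluded from the statistics
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Return the mean, median, and minimum latency."""
    return {
        "mean_s": statistics.mean(latencies),
        "median_s": statistics.median(latencies),
        "min_s": min(latencies),
    }
```

Warm-up runs matter here because the first request typically includes one-off costs such as graph compilation and cache population.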

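Accuracy Test

Get the dataset

Create the dataset directory OmniDocBenchV1.5, then download the pdfs and annotations of OmniDocBench, a benchmark for diverse document parsing. Extract them and place them under OmniDocBenchV1.5. The directory layout is roughly as follows: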

📁 workdir
 ├── infer.py
 ├── ……
 └── 📁 OmniDocBenchV1.5
     ├── OmniDocBench.json
     └── 📁 pdfs
         └── ***.pdf

Inference results

Run the test script test_omnidocbench.py

python3 test_omnidocbench.py --data_dir=OmniDocBenchV1.5/pdfs

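Optional arguments
- data_dir: dataset path
- output_path: directory where the output Markdown files are stored
- layout_detection_model_name: layout detection model name, default PP-DocLayoutV3
- layout_detection_model_dir: layout detection model path, default PP-DocLayoutV3-weight
- vllm_ip: service endpoint of the model, default "http://127.0.0.1:8000/v1"

After inference completes, the parsed results are stored in the OmniDocBenchV1.5_out_pdf directory. Collect them with the steps below so that the file names match the dataset annotations.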

mkdir OmniDocBenchV1.5_out_pdf/end2end
cp OmniDocBenchV1.5_out_pdf/*.md OmniDocBenchV1.5_out_pdf/end2end
for f in OmniDocBenchV1.5_out_pdf/end2end/*_0.md; do mv "$f" "${f%_0.md}.md"; done
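
For environments where a shell loop is inconvenient, the same collection step can be sketched in Python. This is an illustrative equivalent of the three commands above, not a script shipped with the repository; `collect_results` is a hypothetical name.

```python
import shutil
from pathlib import Path


def collect_results(out_dir, subdir="end2end", suffix="_0.md"):
    """Copy top-level .md files into out_dir/subdir, dropping the page suffix."""
    out_dir = Path(out_dir)
    target = out_dir / subdir
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(out_dir.glob("*.md")):
        name = src.name
        if name.endswith(suffix):
            # "doc_0.md" becomes "doc.md", matching the shell rename step.
            name = name[: -len(suffix)] + ".md"
        dst = target / name
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

Calling `collect_results("OmniDocBenchV1.5_out_pdf")` fills the `end2end` subdirectory and returns the list of copied paths.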

Evaluation Environment Setup

Get the evaluation source code and build the environment

Install the OmniDocBench base environment

git clone https://github.com/opendatalab/OmniDocBench.git
cd OmniDocBench
git reset --hard 176a7813e41427d21acac3c243308cb2fdff9054
conda create -n omnidocbench python=3.10
conda activate omnidocbench
pip install -r requirements.txt

The formula accuracy metric CDM requires an additional environment

step.1 install nodejs

wget https://nodejs.org/dist/v16.13.1/node-v16.13.1-linux-arm64.tar.xz
tar -xf node-v16.13.1-linux-arm64.tar.xz
mkdir -p /usr/local/nodejs
mv node-v16.13.1-linux-arm64/* /usr/local/nodejs/
ln -s /usr/local/nodejs/bin/node /usr/local/bin
ln -s /usr/local/nodejs/bin/npm /usr/local/bin
node -v

step.2 install imagemagick

git clone https://github.com/ImageMagick/ImageMagick.git ImageMagick-7.1.2
cd ImageMagick-7.1.2
apt-get update && apt-get install -y libpng-dev zlib1g-dev libgs-dev libxml2-dev libfontconfig1-dev
apt-get install -y ghostscript
./configure
make
sudo make install
sudo ldconfig /usr/local/lib
convert --version

step.3 install latexpdf

sudo apt-get install texlive-full

step.4 install python requirements

pip install -r metrics/cdm/requirements.txt

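Evaluation Configuration

Edit the config file in the OmniDocBench evaluation code. This guide uses the end-to-end evaluation configuration: in configs/end2end.yaml, set the data_path under ground_truth to the path of the downloaded OmniDocBench.json, and set the data_path under prediction to the folder that holds the collected inference results, as follows: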
```
# ----- Fields to modify are shown below -----
...
dataset:
   dataset_name: end2end_dataset
   ground_truth:
      data_path: ../OmniDocBenchV1.5/OmniDocBench.json
   prediction:
      data_path: ../OmniDocBenchV1.5_out_pdf/end2end
```
Accuracy Measurement

Once the config file is ready, pass it as an argument and run the following command to start the evaluation:
```
python pdf_validation.py --config ./configs/end2end.yaml
```
The evaluation results are stored in the result directory. The Overall metric is computed as:

$\text{Overall} = \frac{(1-\textit{Text Edit Distance}) \times 100 + \textit{Table TEDS} + \textit{Formula CDM}}{3}$
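
The formula can be worked through directly in code. The sketch below assumes, as the formula implies, that Text Edit Distance lies in [0, 1] while Table TEDS and Formula CDM are already on a 0 to 100 scale. The inputs are illustrative values, not measured results, and the function is not taken from overall_metric.py.

```python
def overall_score(text_edit_distance, table_teds, formula_cdm):
    """Combine the three metrics into the Overall score on a 0-100 scale.

    text_edit_distance is expected in [0, 1]; table_teds and formula_cdm
    are expected in [0, 100] (an assumption inferred from the formula).
    """
    if not 0.0 <= text_edit_distance <= 1.0:
        raise ValueError("text_edit_distance must be in [0, 1]")
    # Lower edit distance is better, so convert it to a 0-100 score first.
    text_score = (1.0 - text_edit_distance) * 100.0
    return (text_score + table_teds + formula_cdm) / 3.0


# Illustrative inputs only, not measured results.
example = overall_score(text_edit_distance=0.05, table_teds=90.0, formula_cdm=91.0)
print(f"Overall = {example:.2f}")
```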

Run overall_metric.py to obtain the accuracy result:
```
python overall_metric.py
```

Accuracy Results

Accuracy on the OmniDocBenchV1.5 dataset:

Model	Hardware	Overall	Official accuracy
PaddleOCR-VL-1.5	Atlas 800I A2	94.38%	94.93%

Model Inference Performance

End-to-end latency measured on the official test image paddleocr_vl_demo.png:

Model	Hardware	Latency (s)
PaddleOCR-VL-1.5	Atlas 800I A2	3.2