A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source data extraction tool that converts PDF files into Markdown and JSON formats.
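WanJuan3.0 ("WanJuan Silk Road") is a comprehensive plain-text corpus that collects publicly available web information, literature, patents, and other materials from multiple countries and regions. Its total data size exceeds 1.2 TB and its token count exceeds 300B (300 billion), which places it at an internationally leading level. The first open-source release consists mainly of five subsets (Thai, Russian, Arabic, Korean, and Vietnamese), each exceeding 150 GB.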
About OpenDataLab
No description available.
[CVPR 2024] 3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions
Featured Projects
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
[ICCV25 Highlight] The official implementation of the paper "LEGION: Learning to Ground and Explain for Synthetic Image Detection"
[NeurIPS 2025] FakeVLM: Advancing Synthetic Image Detection through Explainable Multimodal Models and Fine-Grained Artifact Analysis

