language:
- en
license: apache-2.0
size_categories:
- 1K<n<10K
task_categories:
- video-classification
- text-to-video
- text-classification
pretty_name: Veo 3.1 Human Preferences
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: video1
    dtype: string
  - name: video2
    dtype: string
  - name: weighted_results1_Alignment
    dtype: float64
  - name: weighted_results2_Alignment
    dtype: float64
  - name: detailedResults_Alignment
    list:
    - name: userDetails
      struct:
      - name: age
        dtype: string
      - name: country
        dtype: string
      - name: gender
        dtype: string
      - name: language
        dtype: string
      - name: occupation
        dtype: string
      - name: userScores
        struct:
        - name: global
          dtype: float64
    - name: votedFor
      dtype: string
  - name: weighted_results1_Coherence
    dtype: float64
  - name: weighted_results2_Coherence
    dtype: float64
  - name: detailedResults_Coherence
    list:
    - name: userDetails
      struct:
      - name: age
        dtype: string
      - name: country
        dtype: string
      - name: gender
        dtype: string
      - name: language
        dtype: string
      - name: occupation
        dtype: string
      - name: userScores
        struct:
        - name: global
          dtype: float64
    - name: votedFor
      dtype: string
  - name: weighted_results1_Preference
    dtype: float64
  - name: weighted_results2_Preference
    dtype: float64
  - name: detailedResults_Preference
    list:
    - name: userDetails
      struct:
      - name: age
        dtype: string
      - name: country
        dtype: string
      - name: gender
        dtype: string
      - name: language
        dtype: string
      - name: occupation
        dtype: string
      - name: userScores
        struct:
        - name: global
          dtype: float64
    - name: votedFor
      dtype: string
  - name: file_name1
    dtype: string
  - name: file_name2
    dtype: string
  - name: model1
    dtype: string
  - name: model2
    dtype: string
  splits:
  - name: train
    num_bytes: 6227078
    num_examples: 1643
  download_size: 660798
  dataset_size: 6227078
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
tags:
- videos
- t2v
- text-2-video
- text2video
- text-to-video
- human
- annotations
- preferences
- likert
- coherence
- alignment
- wan
- wan 2.1
- veo2
- veo
- pikka
- alpha
- sora
- hunyuan
- veo3
- mochi-1
- seedance-1-pro
- seedance
- seedance 1
- Marey
- moonvalley
- sora2
- openai
- veo 3.1

Rapidata Video Generation Veo 3.1 Human Preference

In this dataset, ~74k human responses from ~23k human annotators were collected to evaluate the Veo 3.1 video generation model on our benchmark. This dataset was collected using the Rapidata Python API, which is accessible to anyone and well suited to large-scale data annotation.

Explore our latest model rankings on our website.

If you get value from this dataset and would like to see more in the future, please consider liking it ❤️

Overview

In this dataset, ~74k human responses from ~23k human annotators were collected to evaluate the Veo 3.1 video generation model on our benchmark. The data was collected in roughly 30 minutes using the Rapidata Python API, which is accessible to anyone and well suited to large-scale data annotation. The benchmark data is accessible directly on Hugging Face.

Explanation of the columns

The dataset contains paired video comparisons. Each entry includes 'video1' and 'video2' fields, which contain links to downscaled GIFs for easy viewing. The full-resolution videos can be found here.

The weighted_results columns (weighted_results1_* and weighted_results2_*, one pair per criterion) contain scores ranging from 0 to 1, representing aggregated user responses. Individual user responses can be found in the corresponding detailedResults_* column.
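
As a rough illustration of how the aggregated and per-user columns relate, the sketch below tallies the votes in a detailedResults_* list, once unweighted and once weighted by each annotator's userScores.global value. The responses are made up for the example, the votedFor labels ("video1", "video2") are placeholders, and weighting by userScores.global is an assumption: the card does not state how the weighted_results scores are computed, so the output need not match the published values.

```python
from collections import defaultdict


def vote_shares(detailed_results, weighted=True):
    """Return each votedFor value's share of the (optionally weighted) votes."""
    totals = defaultdict(float)
    for response in detailed_results:
        # Assumption: userScores.global acts as the weight of a response.
        weight = response["userDetails"]["userScores"]["global"] if weighted else 1.0
        totals[response["votedFor"]] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {choice: value / grand_total for choice, value in totals.items()}


# Hypothetical responses; real entries also carry age, country, gender, etc.
example_responses = [
    {"userDetails": {"userScores": {"global": 0.5}}, "votedFor": "video1"},
    {"userDetails": {"userScores": {"global": 0.25}}, "votedFor": "video2"},
    {"userDetails": {"userScores": {"global": 0.25}}, "votedFor": "video1"},
]

unweighted = vote_shares(example_responses, weighted=False)
weighted = vote_shares(example_responses, weighted=True)
```

Comparing the two results for a real row is one way to check whether this weighting assumption is plausible for the data.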

Alignment

The alignment score quantifies how well a video matches its prompt. Users were asked: "Which video fits the description better?".

Examples

A 3D animated journey through a magical forest where trees glow with neon colors and mythical creatures roam. The camera weaves between luminescent flora and sparkling streams under a twilight sky.

Veo 3.1

(Score: 80.78%)

Mochi 1

(Score: 19.22%)

A hyper-realistic view of a motorcycle racing through a neon-lit city at night, reflecting on wet streets. The camera follows closely as the rider leans into sharp turns, capturing speed and agility.

Veo 3.1

(Score: 29.62%)

Veo 3

(Score: 70.37%)

Coherence

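The coherence score measures whether the generated video is logically consistent and free from artifacts or visual glitches. Without seeing the original prompt, users were asked: "Which video has more glitches and is more likely to be AI generated?" Because the question asks for the glitchier video, a lower glitch rating in the examples below indicates the more coherent video.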

Examples

Veo 3.1

(Glitch Rating: 31.24%)

Veo 2

(Glitch Rating: 68.76%)

Veo 3.1

(Glitch Rating: 82.31%)

Marey

(Glitch Rating: 17.69%)

Preference

The preference score reflects how visually appealing participants found each video, independent of the prompt. Users were asked: "Which video do you prefer aesthetically?"

Examples

Veo 3.1

(Score: 64.23%)

Veo 2

(Score: 35.77%)

Veo 3.1

(Score: 22.76%)

Kling v2.1

(Score: 77.24%)
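
The per-pair scores for the three criteria above can be summarized per model. The following is a minimal sketch that averages a model's score over every pair it appears in. It assumes that weighted_results1_* belongs to model1 and weighted_results2_* to model2, which the column names suggest but the card does not state outright. The rows and model names are invented for the example.

```python
from collections import defaultdict


def mean_score_per_model(rows, criterion):
    """Average weighted_results{1,2}_<criterion> over every pair a model appears in."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        for slot in ("1", "2"):
            # Assumption: slot 1 scores belong to model1, slot 2 scores to model2.
            model = row[f"model{slot}"]
            sums[model] += row[f"weighted_results{slot}_{criterion}"]
            counts[model] += 1
    return {model: sums[model] / counts[model] for model in sums}


# Made-up rows containing only the columns this sketch needs.
rows = [
    {"model1": "model_a", "model2": "model_b",
     "weighted_results1_Preference": 0.75, "weighted_results2_Preference": 0.25},
    {"model1": "model_a", "model2": "model_c",
     "weighted_results1_Preference": 0.25, "weighted_results2_Preference": 0.75},
]

preference_means = mean_score_per_model(rows, "Preference")
```

The same function works for "Alignment" and "Coherence". For Coherence, the examples report a glitch rating, so if the column stores that value, a lower mean is the better result.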

About Rapidata

Rapidata's technology makes collecting human feedback at scale faster and more accessible than ever before. Visit rapidata.ai to learn more about how we're revolutionizing human feedback collection for AI development.

Other Datasets

We run a benchmark of the major video generation models; the results can be found on our website. We rank the models according to their coherence/plausibility, their alignment with the given prompt, and style preference. The underlying 2M+ annotations can be found here:

Link to the Rich Video Annotation dataset
Link to the Coherence dataset
Link to the Text-2-Image Alignment dataset
Link to the Preference dataset