Large-Scale TV Dataset
Partial Video Copy Detection
(STVD-PVCD)

The STVD-PVCD dataset supports the performance evaluation of partial video copy detection methods in computer vision. It is built with a TV capture protocol [1, 2] that provides scalability, robust groundtruthing and controlled degradations for fine-grained performance evaluation. It is the largest public dataset for this task, with nearly 83k videos totaling 10,660 hours.

The dataset consists of a reference set and six test sets, A to F, which present different categories and levels of degradation, together with the groundtruth. It is distributed as separate files containing:

short reference videos with ids,
long positive/negative videos for testing/training,
the groundtruth with the reference ids, timestamps and durations.

The groundtruth is provided as a CSV file with the format
Ref_Video; Pos_Video; Ref_Length; Pos_Length; Start_Copy, where

Ref_Video is the label / file name of the reference video,
Pos_Video is the label / file name of the positive video,
Ref_Length is the length of the reference video in frames at 30 FPS,
Pos_Length is the length of the positive video in frames at 30 FPS, with Pos_Length > Ref_Length,
Start_Copy is the index of the first frame of the reference video copy in the positive video, such that Start_Copy ∈ [0; Ref_Length-1]
and Start_Copy < Pos_Length − Ref_Length.

e.g. ref_a; pos_a; 112; 842; 100
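
As an illustration of the format above, the following minimal Python sketch parses groundtruth rows and checks the stated constraints. It is not part of the dataset tooling: the names GroundtruthRow, parse_groundtruth and violations are hypothetical, the presence of a header row in the actual file is an assumption (the sketch skips one if found), and the conversion to seconds assumes that Start_Copy is a frame index in the positive video and that the copy spans Ref_Length frames, which may not hold for speed-altered videos.

```python
import csv
import io
from dataclasses import dataclass

FPS = 30  # frame rate stated for the Ref_Length and Pos_Length fields


@dataclass(frozen=True)
class GroundtruthRow:
    ref_video: str
    pos_video: str
    ref_length: int  # frames
    pos_length: int  # frames
    start_copy: int  # frame index

    def copy_span_seconds(self):
        """Return (start, end) of the copy in seconds.

        Assumption: Start_Copy indexes the positive video and the copy
        lasts Ref_Length frames.
        """
        start = self.start_copy / FPS
        return start, start + self.ref_length / FPS


def parse_groundtruth(text):
    """Parse semicolon-separated groundtruth text into GroundtruthRow objects."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=";"):
        fields = [f.strip() for f in fields]
        # Skip blank lines and a possible header row (non-numeric lengths).
        if len(fields) != 5 or not fields[2].isdigit():
            continue
        rows.append(GroundtruthRow(fields[0], fields[1],
                                   int(fields[2]), int(fields[3]), int(fields[4])))
    return rows


def violations(row):
    """List the constraints stated in the format description that a row breaks."""
    problems = []
    if not row.pos_length > row.ref_length:
        problems.append("Pos_Length > Ref_Length")
    if not 0 <= row.start_copy <= row.ref_length - 1:
        problems.append("Start_Copy in [0; Ref_Length-1]")
    if not row.start_copy < row.pos_length - row.ref_length:
        problems.append("Start_Copy < Pos_Length - Ref_Length")
    return problems


if __name__ == "__main__":
    sample = "ref_a; pos_a; 112; 842; 100\n"
    for row in parse_groundtruth(sample):
        print(row, row.copy_span_seconds(), violations(row))
```

Reading a real groundtruth file would amount to passing its text content to parse_groundtruth; the file name and encoding are not specified here and would need to be checked against the downloaded archive.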

The test sets A to F are detailed in [1, 2] and summarized below.

Set A: a root capture to tune the characterization tasks.
Set B: a "hello world" test set.
Set C: a test set with scalability and pixel attack.
Set D: a test set with scalability and global transformations.
Set E: a test set that applies video speeding with scalability.
Set F: a combination of the test sets C, D and E.

For visualization and testing, samples (reference, positive and negative videos with the groundtruth) are given in the following table for the different test sets.

	Reference	Positive	Groundtruth	Negative
sample A	ref_a	pos_a	gth_a	neg_a
sample B	ref_b	pos_b	gth_b	neg_b
sample C	ref_c	pos_c	gth_c	neg_c
sample D	ref_d	pos_d	gth_d	neg_d
sample E	ref_e	pos_e	gth_e	neg_e
sample F	ref_f	pos_f	gth_f	neg_f

The files constituting the dataset are listed below and are password-protected. The dataset is available for non-commercial research purposes. Before downloading the dataset, get the agreement (English or French version) and sign it. Then send the scanned version to Mathieu Delalandre. After verifying your request, we will contact you with the password to unzip the dataset.

We first provide the files for the reference videos and groundtruth. The test sets A to F are given in the following table (STVD is still under publication; test set F will be delivered later).

	Positive videos	Negative videos	Total duration (h)	Size (GiB)	Link
set A	3,780	12,165	1,960	458	download
set B	3,780	3,780	860	18.6	download
set C	3,780	12,165	1,960	6.5	download
set D	3,780	12,165	1,960	20.8	download
set E	3,780	12,165	1,960	21.8	download
set F	3,780	12,165	1,960	16.1	download

NB. Our storage service at the UT delivers downloads at 3-16 MB/s (for low- and high-speed connections, respectively) under concurrent access.
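
To help plan a download, the short Python sketch below estimates transfer times from the sizes in the table and the 3-16 MB/s range quoted above. It is only a rough estimate under stated assumptions: MB/s is taken as 10^6 bytes per second, GiB as 2^30 bytes, the rate is treated as constant, and the function name download_hours is hypothetical.

```python
GIB = 2**30   # bytes per GiB
MB = 10**6    # assumption: MB/s means 10^6 bytes per second

# Sizes in GiB, copied from the table of test sets.
SIZES_GIB = {"A": 458, "B": 18.6, "C": 6.5, "D": 20.8, "E": 21.8, "F": 16.1}


def download_hours(size_gib, rate_mb_s):
    """Estimated download time in hours at a constant transfer rate."""
    return size_gib * GIB / (rate_mb_s * MB) / 3600


if __name__ == "__main__":
    for name, size in SIZES_GIB.items():
        fast = download_hours(size, 16)  # high-speed connection
        slow = download_hours(size, 3)   # low-speed connection
        print(f"set {name}: {fast:.1f} h to {slow:.1f} h")
```

Under these assumptions, set A dominates the total transfer time, so it may be worth fetching the smaller sets first.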

As a starting point, we list here works reporting experiments on the STVD-PVCD dataset.

Set	Refs
B	[LVH2022], `........`
C	[TNF2022], [LVH2022]
D	[LVH2023], `........`

Please cite one of the following papers, in English [1] or French [2], if you use this dataset.

[1] V.H. Le, M. Delalandre and D. Conte. A Large-Scale TV Dataset for Partial Video Copy Detection. International Conference on Image Analysis and Processing (ICIAP), Lecture Notes in Computer Science (LNCS), vol. 13233, pp. 388-399, 2022.
[2] V.H. Le, M. Delalandre and D. Conte. Une large base de données pour la détection de segments de vidéos TV. Journées Francophones des Jeunes Chercheurs en Vision par Ordinateur (ORASIS), 2021.
