Large-Scale TV Dataset
Parallel Machine Scheduling
(STVD-PMS)

The STVD-PMS dataset is related to the problem of Parallel Machine Scheduling (PMS) for the capture of live videos within the TV Workstation [1-4]. This PMS problem can be characterized as follows.

The dataset offers two collections, with offline and online latency, respectively. The process used to build these collections is detailed here.

The files constituting the dataset are listed below and are password-protected. The dataset is available for non-commercial research purposes. Before downloading the dataset, get the agreement (English or French version) and sign it. Then send the scanned version to Mathieu Delalandre by email. After verifying your request, we will contact you with the password to unzip the dataset.

The two collections composing the dataset are listed in the table below. They are provided as CSV files with semicolon-separated fields in the format
Channel; Hashcode; Start; Stop; Latency(s); Mean(s); Std(s)

e.g.
1; 00808c3868b1e82f; 20210130204500; 20210130205000; 273; 117; 245
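
As an illustration of the record format above, the following minimal Python sketch parses one line into typed fields. It rests on assumptions that are not stated on this page: Start and Stop are read as YYYYMMDDhhmmss timestamps, the channel is an integer identifier, and the last three fields are numbers of seconds. The names `Record`, `parse_record` and `TIME_FORMAT` are hypothetical and are not part of the dataset tools.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed timestamp layout for Start/Stop (YYYYMMDDhhmmss); not confirmed by the page.
TIME_FORMAT = "%Y%m%d%H%M%S"


@dataclass
class Record:
    channel: int       # assumed to be an integer channel identifier
    hashcode: str
    start: datetime
    stop: datetime
    latency_s: float
    mean_s: float
    std_s: float


def parse_record(line: str) -> Record:
    """Parse one semicolon-separated line of a collection file."""
    fields = [field.strip() for field in line.split(";")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    channel, hashcode, start, stop, latency, mean, std = fields
    return Record(
        channel=int(channel),
        hashcode=hashcode,
        start=datetime.strptime(start, TIME_FORMAT),
        stop=datetime.strptime(stop, TIME_FORMAT),
        latency_s=float(latency),
        mean_s=float(mean),
        std_s=float(std),
    )


# The example record given above.
record = parse_record(
    "1; 00808c3868b1e82f; 20210130204500; 20210130205000; 273; 117; 245"
)
duration_s = (record.stop - record.start).total_seconds()
```

Under the timestamp assumption, the example record spans five minutes (`duration_s` is 300.0). If the files start with a header line, it would need to be skipped before calling `parse_record`.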

For testing purposes, samples of the two collections are provided here: sample_1, sample_2.

Collection    Latency  Duration  Period         Channels  Jobs  Hashcodes  m    Link
Collection 1  Offline  170 days  12/20 - 05/21  26        99k   5,615      8    download
Collection 2  Online   30 days   02/21          8         6k    223        2;4  download

To help you get started, the STVD-PMS dataset comes with a "hello world" processing algorithm. The parallel scheduling of jobs within the dataset can be characterized as a Weighted Job Interval Selection Problem (WJISP). This can be addressed with the parameterized algorithm GREEDYα [ErlebachT2001]. A C++ implementation of that algorithm is provided here: PRD_GreedyAlpha (with functions to convert XMLTV files into CSV). The output of the algorithm can be used for characterization but also for online control of the video capture. This requires a video platform (e.g. [4]) with a control solution. The TvStation_Remote project provides a library for the infrared control of video devices using PhidgetIR sensors.
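
To give an idea of how a GREEDYα-style selection behaves, here is a minimal single-machine sketch in Python. It follows our reading of the general scheme (scan intervals by non-decreasing end time, and accept a candidate only if the selected intervals it conflicts with weigh at most α times its own weight). It is not the PRD_GreedyAlpha code, it ignores the multi-machine case, and the `Interval` type, the toy weights and the value of `alpha` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    job: str        # job identifier; at most one interval per job may be kept
    start: int      # inclusive start time
    stop: int       # exclusive stop time
    weight: float   # non-negative weight of the interval


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals intersect in time."""
    return a.start < b.stop and b.start < a.stop


def greedy_alpha(intervals, alpha):
    """Single-machine GREEDY-alpha style selection (illustrative sketch)."""
    selected = []
    # Scan candidates in order of non-decreasing end time.
    for cand in sorted(intervals, key=lambda iv: iv.stop):
        # Selected intervals that must be dropped to accept the candidate:
        # those overlapping it in time or belonging to the same job.
        conflicts = [s for s in selected
                     if s.job == cand.job or overlaps(s, cand)]
        # Accept only if the conflicting weight is small enough.
        if sum(s.weight for s in conflicts) <= alpha * cand.weight:
            selected = [s for s in selected if s not in conflicts]
            selected.append(cand)
    return selected


# Toy instance (hypothetical data, not taken from the dataset).
toy = [
    Interval("a", 0, 4, 2.0),
    Interval("b", 3, 6, 5.0),
    Interval("a", 6, 8, 1.0),
    Interval("c", 7, 9, 1.0),
]
result = greedy_alpha(toy, alpha=0.5)
total_weight = sum(iv.weight for iv in result)
```

On this toy instance, the interval of job "b" replaces the first interval of job "a", the second interval of job "a" is then accepted, and the interval of job "c" is rejected because the interval it overlaps weighs more than α times its own weight. The parameter α trades off how readily already selected intervals are displaced by later ones.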

Please cite the following papers, in English [1] or French [2], if you use this dataset.

  1. V.H. Le, M. Delalandre and D. Conte. A large-Scale TV Dataset for partial video copy detection. International Conference on Image Analysis and Processing (ICIAP), Lecture Notes in Computer Science (LNCS), vol 13233, pp. 388-399, 2022.
  2. V.H. Le, M. Delalandre and D. Conte. Une large base de données pour la détection de segments de vidéos TV. Journées Francophones des Jeunes Chercheurs en Vision par Ordinateur (ORASIS), 2021.
  3. V.H. Le, M. Delalandre and D. Conte. Real-time detection of partial video copy on TV workstation. Conference on Content-Based Multimedia Indexing (CBMI), pp. 1-4, 2021.
  4. M. Delalandre. The TV Workstation project: a research scope. FICT seminar, Thanh Hóa, Vietnam, 25th of October 2022.