Action (Action Classification in Video)
- Classes: 10 actions, including Diving (14 videos), Golf Swing (18 videos), Kicking (20 videos), and Lifting (6 videos).
- This dataset consists of a set of actions collected from various sports that are typically featured on broadcast television channels such as the BBC and ESPN. The video sequences were obtained from a wide range of stock footage websites, including BBC Motion Gallery and GettyImages.
- This dataset features video sequences that were obtained using an R/C-controlled blimp equipped with an HD camera mounted on a gimbal. The collection represents a diverse pool of actions captured at different heights and aerial viewpoints. Multiple instances of each action were recorded at different flying altitudes, which ranged from 400 to 450 feet, and were performed by different actors.
- It contains 11 action categories collected from YouTube.
- Walk, Run, Jump, Gallop sideways, Bend, One-hand wave, Two-hands wave, Jump in place, Jumping Jack, Skip.
- UCF50 is an action recognition dataset with 50 action categories, consisting of realistic videos taken from YouTube.
- The Action Similarity Labeling (ASLAN) Challenge.
MSR Action Recognition Datasets
- The dataset was captured by a Kinect device. It contains 12 dynamic American Sign Language (ASL) gestures performed by 10 people. Each person performs each gesture 2-3 times.
KTH Recognition of human actions
- Contains six types of human actions (walking, jogging, running, boxing, hand waving and hand clapping) performed several times by 25 subjects in four different scenarios: outdoors, outdoors with scale variation, outdoors with different clothes and indoors.
Hollywood-2 Human Actions and Scenes dataset
- The Hollywood-2 dataset contains 12 classes of human actions and 10 classes of scenes distributed over 3669 video clips and approximately 20.1 hours of video in total.
- This dataset contains 5 different collective activities (crossing, walking, waiting, talking, and queueing) in 44 short video sequences, some of which were recorded by a consumer hand-held digital camera with varying viewpoints.
- The Olympic Sports Dataset contains YouTube videos of athletes practicing different sports.
- Surveillance-type videos
- The dataset is designed to be more realistic, natural, and challenging for video surveillance domains than existing action recognition datasets in terms of its resolution, background clutter, diversity in scenes, and human activity/event categories.
HMDB: A Large Video Database for Human Motion Recognition
- Collected from various sources, mostly from movies, and a small proportion from public databases, YouTube and Google videos. The dataset contains 6849 clips divided into 51 action categories, each containing a minimum of 101 clips.
- Dataset of 9,532 images of humans performing 40 different actions, annotated with bounding boxes.
- Fully annotated dataset of RGB-D video data and data from accelerometers attached to kitchen objects, capturing 25 people preparing two mixed salads each (4.5 hours of annotated data). Annotated activities correspond to steps in the recipe and include the phase (pre-/core-/post-) and the ingredient acted upon.
Penn Sports Action
- The dataset contains 2326 video sequences of 15 different sport actions and human body joint annotations for all sequences.
- A Kinect dataset for hand detection in naturalistic driving settings, as well as a challenging recognition dataset of 19 dynamic hand gestures for human-machine interfaces.
- Observations of several subjects setting a table in different ways. Contains videos, motion capture data, RFID tag readings,…
- This dataset comprises 10 actions related to breakfast preparation, performed by 52 different individuals in 18 different kitchens.
MPII Cooking Activities Dataset
- Cooking Activities dataset.
- This dataset consists of seven meal-preparation activities, each performed by 10 subjects. Subjects perform the activities based on the given cooking recipes.
UTD-MHAD: multimodal human action recognition dataset
- The dataset consists of four temporally synchronized data modalities. These modalities include RGB videos, depth videos, skeleton positions, and inertial signals (3-axis acceleration and 3-axis angular velocity) from a Kinect RGB-D camera and a wearable inertial sensor for a comprehensive set of 27 human actions.
Action Recognition Datasets: “NTU RGB+D” Dataset and “NTU RGB+D 120” Dataset
- “NTU RGB+D” contains 60 action classes and 56,880 video samples. “NTU RGB+D 120” extends “NTU RGB+D” by adding another 60 classes and another 57,600 video samples, i.e., “NTU RGB+D 120” has 120 classes and 114,480 samples in total. These two datasets both contain RGB videos, depth map sequences, 3D skeletal data, and infrared (IR) videos for each sample. Each dataset is captured by three Kinect V2 cameras concurrently.
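The totals quoted for “NTU RGB+D 120” follow directly from the two sets of figures above. As a minimal sketch, the arithmetic can be checked as follows; the variable names are illustrative and not part of any dataset tooling.

```python
# Figures quoted in the description above; the names are hypothetical.
ntu60 = {"classes": 60, "samples": 56_880}
ntu120_added = {"classes": 60, "samples": 57_600}

# "NTU RGB+D 120" is the original dataset plus the added classes and samples.
ntu120 = {key: ntu60[key] + ntu120_added[key] for key in ntu60}

print(ntu120)  # {'classes': 120, 'samples': 114480}
```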