Vision Datasets

👀

1.Recognition/Classification
2.Localization
- General
3.Detection
4.Segmentation
- Segmentation(Image and Videos)
- Semantic Labeling
5.Identification
6.Action Classification
- Action
7.Pose Estimation
- Pose Estimation
8.Visual Relationship Detection
- Visual Relationship Detection
9.OCR
- OCR
10.2D code reading
- 2D Code Reading
11.Tracking
- Tracking
12.Optical Flow
- Optical Flow
13.Egomotion
- Egomotion
14.Reconstruction or Inpainting
15.Stereo Vision
- Stereo Vision
16.Intrinsic Image Decomposition
- Intrinsic
17.Visual Surveillance
- Visual Surveillance
18.Image Captioning
- Image Captioning
19.Foreground Background
- Foreground Background
20.Image Stitching
- Image Stitching
21.Content Based Image Retrieval
- General
22.Enhancement,Restoration,merging
23.Kinematic Motion Detection
- General
- Performance Prediction
24.Multi-level Classification
- General
25.Audio-Visual Learning
- General
26.Text-to-Image and Image-to-Text
- Image to Text
- Text to Image
27.Handwritten Mathematical Expression
- Handwritten mathematical expression
28.Depth Estimation
- Depth Estimation
29.Image Generation
- Image Generation
30.Anti-Spoofing and Face Manipulation
- Face Manipulation
- Face Anti Spoofing
31.Document Analysis
- Forms
- Invoices
32.Visual Question Answering
- Visual Question Answering
33.Neural Radiance Fields and Rendering
- Neural Radiance Fields
- Image Based Rendering
34.Energy,Calorie Estimation
- Calorie
- Energy
35.Navigation
- Photo-based walkthroughs
36.Image2image
- Image2image

Change Log and References

26.Text-to-Image and Image-to-Text

Image to Text

Image-to-text is closely related to image captioning: both produce a text description from an input image.

Image to text or text to image

LAION-5B: A NEW ERA OF OPEN LARGE-SCALE MULTI-MODAL DATASETS

dataset of 5.85 billion