Video dataset for object detection.
FBD-SV-2024 comprises 483 video clips, amounting to 28,694 frames in total.
The Tencent Video Dataset (TVD) contains 86 video sequences with a variety of content coverage. TVD is established to serve various purposes such as training neural network-based coding tools and testing machine vision tasks including object detection and tracking.
This tutorial is intended to provide some hints to clear the path for you.
Video object detection aims to detect targets in videos using both spatial and temporal information.
More than 10 million high-quality bounding boxes are manually labeled through a carefully designed three-step annotation pipeline.
YOLO is trained on a dataset like COCO (Common Objects in Context), which includes a wide range of objects, and this file provides labels for those objects.
Objectron is a dataset of short, object-centric video clips.
Faster R-CNN and the Single Shot MultiBox Detector (SSD) are typical object detection algorithms and perform very well on the PASCAL VOC, COCO, and ILSVRC datasets.
Object Tracking vs. Object Detection: Video object tracking is an extension of object detection and applies the same principles to video.
This repository contains notebooks and resources used to train a state-of-the-art military vehicle tracker.
We introduce Few-Shot Video Object Detection (FSVOD) with three contributions to the real-world visual learning challenge in our highly diverse and dynamic world, including 1) a large-scale video dataset, FSVOD-500, comprising 500 classes with class-balanced videos in each category for few-shot learning, and 2) a novel Tube Proposal Network (TPN).
The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset.
An overview of the existing datasets for video object detection, together with commonly used evaluation metrics, is first presented.
UCSD Ped1 includes clips of groups of people walking towards and away from the camera at a low spatial resolution of 158 × 238 pixels.
Developers need access to top-notch GPUs and data-handling tools, which can be expensive.
In practice, choose whichever method best fits your use case.
FBD-SV-2024: Flying Bird Object Detection Dataset in Surveillance Video, by Zi-Wei Sun, Ze-Xi Hua, Heng-Chao Li (Senior Member, IEEE), Zhi-Peng Qi, Xiang Li, Yan Li, and Jin-Chi Zhang. Separate files are used for object detection and video object detection.
YOLOv8's architecture supports high-speed, accurate object detection, which is essential for real-time tracking applications.
"Fast and accurate object detection in high resolution 4K and 8K video using GPUs."
This is a large-scale dataset for moving object detection and tracking in satellite videos, consisting of 40 satellite videos captured by Jilin-1 satellite platforms.
Rethinking Camouflaged Object Detection: Models and Datasets, by Hongbo Bi, Cong Zhang, Kang Wang, Jinghui Tong, and Feng Zheng (Paper).
Because video object detection is a compute-intensive task, we advise you to perform this experiment on a computer with an NVIDIA GPU and the GPU version of TensorFlow installed.
The annotation format is unclear and possibly MATLAB-specific.
This article builds the largest-scale satellite video dataset with the most task types and object categories supported, named the satellite video multimission benchmark (SAT-MTB), and establishes the first public benchmark of multitask algorithms for satellite video object detection, object tracking, and object segmentation.
Each TVD video sequence consists of 65 frames at 3840x2160 spatial resolution.
Running python train.py trains the model with the default parameters defined in train.py.
The ONCE dataset consists of 1 million LiDAR scenes and 7 million corresponding camera images.
Video object detection is a fundamental research task for scene understanding.
The intelligent processing and analysis of satellite video have become a research hotspot in the field of remote sensing.
COCO: Common Objects in Context (COCO) is a large-scale object detection, segmentation, and captioning dataset with 80 object categories.
Overview video: AVI, 30 MB, Xvid compressed.
We gathered 5,845 hours of video data.
This paper introduces a new dataset named Large-Scale Pornographic Dataset for detection and classification (LSPD).
Surveillance Perspective Human Action Recognition Dataset: 7,759 videos from 14 action classes, aggregated from multiple sources, all cropped spatio-temporally and filmed from a surveillance-camera-like position.
The success of algorithms for these tasks relies heavily on the availability of high-quality datasets for training and evaluation.
zhengbo-zhang/fade (11 Aug 2024): Falling objects from buildings can cause severe injuries to pedestrians due to the great impact force they exert.
We evaluate the proposed FADE-Net and other methods, i.e., MOD methods, generic object detection methods, and video object detection methods, comprehensively on our FADE dataset, which can serve as a benchmark for future research on FODB.
Its main focus is on building a dataset of relevant images and annotations to fine-tune pre-trained object detection models, namely a YOLOv8 model.
The images range from 800x800 to 200,000x200,000 pixels in resolution and contain objects of many different types, shapes, and sizes.
In addition, this dataset contains 30 object categories.
Salient Object Detection for RGBD Video via Spatial Interaction and Depth-Based Boundary Refinement: this is a dataset for 3D video salient object detection.
Satellite video cameras can provide continuous observation of a large-scale area, which is important for many remote sensing applications.
Small Video Object Detection (SVOD) is a crucial subfield in modern computer vision, essential for early object discovery and detection.
There are 3,862 training and 555 validation videos with objects from 30 classes.
If your objects are known in advance, you could use a game engine to create a dataset: create a 3D model of your object, place the model in a randomized environment, then export images with annotations. Alternatively, you can take images from several angles and distances.
Video object detection, single object detection, and multiple object detection are crucial tasks in computer vision, enabling various real-world applications.
VisDrone is an extensive dataset collected using drones, spanning various urban and suburban regions across 14 distinct cities in China.
In image or video ML datasets, objects can be detected using either traditional image-processing methods or deep learning methods.
Object detection is one of the most important and challenging branches of computer vision; its main task is to classify and localize objects in images or videos.
The dataset continues to be updated regularly and is expected to grow.
The TVD dataset includes 86 video sequences with a variety of content coverage.
LSPD: A Large-Scale Pornographic Dataset for Detection and Classification, by Dinh Duy Phan, Thanh Thien Nguyen, Quang Huy Nguyen, and Hoang Loc Tran. To the authors' knowledge, the LSPD dataset is the first dataset for both object detection and image/video classification tasks in this area.
Compared with object detection in images, object detection in videos has been less researched due to a shortage of labelled video datasets.
An Extra Small Object Video Detection Dataset, Big Challenges.
"Fast and accurate object detection in high resolution 4K and 8K video using GPUs." Vít Růžička, Franz Franchetti.
Usage is the same as for the original YOLOv5 method.
In the former, the paper combines fast single-image object detection with convolutional long short-term memory (LSTM) layers called Bottleneck-LSTM to create an interweaved recurrent-convolutional architecture.
It's usually deeply integrated with tasks such as object detection and object tracking.
Multiple Object Tracking, Single Object Tracking, Video Object Detection, Video Dataset, VOD dataset, MOT dataset, SOT dataset.
xahidbuffon/Awesome_Underwater_Datasets: Pointers to large-scale underwater datasets.
For video object detection, the most commonly used benchmark is the ImageNet VID dataset.
Abstract: Video satellites can continuously image large areas and provide dynamic, real-time monitoring of hotspots and objects.
References: [1] Olga Russakovsky et al., "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision, 2015.
The 3D bounding box describes the object's position, orientation, and dimensions.
Tracking objects through complex video scenes.
Bounding Box: The output is an interactive way to visualize the object detection results in the video, making it easier to understand how the YOLO model is performing and what objects it detects.
It comprises 171,191 video segments from 346 high-quality soccer games.
The course "YOLOv8: Video Object Detection with Python on Custom Dataset" explores YOLOv8.
"Fast Object Detection in Compressed Video." Shiyao Wang, Hongchao Lu, Pavel Dmitriev, Zhidong Deng.
The proposed dataset, DATS_2022, is a comprehensive dataset for object detection specific to Indian traffic scenarios.
Motivation: With poaching becoming widespread around the world [50], aerial surveillance with UAVs is becoming increasingly common.
To address this challenge, we introduce KOLOMVERSE, an open large-scale image dataset for object detection in the maritime domain.
XS-VID is a comprehensive dataset for Extra Small Object Video Detection, including diverse day and night scenes such as rivers, forests, skyscrapers, and streets.
This paper presents a comprehensive categorization of datasets.
OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), and Video Instance Segmentation (VIS) with a unified framework.
Object detection and recognition is a fundamental research topic in the computer vision community.
The UCSD Ped1 and Ped2 datasets [13, 39] are the most widely used datasets for video anomaly detection.
Completely occluded bounding boxes are ignored in both training and testing of detection methods.
Objects365 is a large-scale object detection dataset with 365 object categories and over 600K training images.
While earlier datasets like UCF101-24 [40] and JHMDB [18] primarily focus on action detection in single-person scenarios, recent large-scale datasets such as AVA [14] and MultiSports [26] have emphasized the importance of modeling human-human interactions.
However, it is natural to create a custom dataset of your choice for object detection tasks.
However, existing SVOD datasets are scarce and suffer from issues such as insufficiently small objects, limited object categories, and lack of scene diversity, leading to unitary application scenarios for corresponding methods.
Largest Open Object Recognition Video Dataset: BDD100K Dataset; Best Action Recognition Video Dataset: Something-something-v2 Dataset; Best Sign Detection Video Dataset: LISA Traffic Light Dataset.
TianhaoFu/Awesome-3D-Object-Detection: Papers, code, and datasets about deep learning for 3D object detection.
Each image has a resolution of 12000x5000 and contains a great number of objects at different scales.
Among the FBD-SV-2024 frames, 23,833 contain 28,366 instances of flying birds.
The ground truth for this dataset was expertly curated and is presented in JSON format (standard COCO), offering vital information about the dataset, video frames, and annotations, including precise bounding boxes outlining detected objects.
Unlock the potential of YOLOv8, a cutting-edge technology for video object detection.
Source: "Mobile Video Object Detection with Temporally-Aware Feature Maps", Liu, Mason and Zhu, Menglong, CVPR 2018.
The object detection model and the object tracking architecture interacted to maintain the consistency of the tracker on yellowfin bream individuals.
In each video, the camera moves around and above the object and captures it from different views.
The images were captured with Android phones (Redmi Note 8 Pro and Redmi Note 5 Pro) with a 64 MP camera.
Four common types of vehicles (plane, car, ship, and train) are manually annotated.
This dataset provides cluster masks of 29 videos to train object detection and/or tracking algorithms, for instance, Mask R-CNN [3] or PointTrack [5].
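As a minimal sketch of how ground truth in standard COCO JSON can be read, the snippet below groups bounding boxes by frame. It assumes the usual COCO top-level keys (`images`, `annotations`, `categories`) and the `[x, y, width, height]` box layout. The function name `boxes_per_image`, the file names, and the sample values are hypothetical and are not taken from any dataset described here.

```python
import json
from collections import defaultdict


def boxes_per_image(coco):
    """Group COCO boxes by image file name as (label, x1, y1, x2, y2) tuples."""
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    labels = {cat["id"]: cat["name"] for cat in coco["categories"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO stores the top-left corner plus size
        grouped[names[ann["image_id"]]].append(
            (labels[ann["category_id"]], x, y, x + w, y + h)
        )
    return dict(grouped)


# Hypothetical two-frame sample; a real file would be read with json.load().
sample = json.loads("""
{
  "images": [{"id": 1, "file_name": "clip01_0001.jpg"},
             {"id": 2, "file_name": "clip01_0002.jpg"}],
  "categories": [{"id": 1, "name": "bird"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [100, 50, 20, 10]},
    {"id": 11, "image_id": 2, "category_id": 1, "bbox": [104, 52, 20, 10]}
  ]
}
""")

result = boxes_per_image(sample)
print(result["clip01_0001.jpg"])
```

Converting to corner coordinates up front is only one possible choice; it tends to simplify later overlap computations, but keeping the native COCO layout works equally well.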
Video object detection methods are then categorized.
DOTA is a highly popular dataset for object detection in aerial images, collected from a variety of sources, sensors, and platforms.
Thus, we construct an IOD-Video dataset comprising 600 videos (141,017 frames) covering various distances, sizes, visibility, and scenes captured in different spectral ranges.
PyTorch implementation for the CVPR 2022 paper "Explore Spatio-temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline".
"Detect or Track: Towards Cost-Effective Video Object Detection/Tracking." Hao Luo, Wenxuan Xie, Xinggang Wang, Wenjun Zeng.
Once a number plate is detected, EasyOCR is used to extract the text - jayy1809/real-time-number
To provide an example of how object detection can be performed using this dataset, all the grape bunches observed (not hidden because of leaf occlusion) have been annotated with red masks.
SAIL-VOS 3D Dataset: For more accurate 3D video object shape prediction, we propose the SAIL-VOS 3D dataset. We collect an object-centric 3D video dataset from the photorealistic game engine GTA-V.
Because of tiny targets, complex backgrounds, and complete or partial occlusion, accurately detecting moving objects in each image frame is challenging.
Then replace the background with a randomized image, and your dataset is ready to go.
Object detection in the maritime domain is crucial for ensuring the safety and navigation of ships.
A mix of the camera's photo and video modes was used to capture the data.
RELATED WORK: Moving Object Detection Dataset.
Most of the existing surveillance video datasets [16] used in previous works are relatively small and simple, which makes them less suitable for assessing performance in real-world applications with increasingly congested and complex scenarios.
Download the dataset from here; unzip the dataset and put it in the dataset folder; run python train.py.
WildLife Documentary is an animal object detection dataset.
This paper studies moving object detection in satellite videos, which plays a significant role in large-scale video monitoring and dynamic analysis.
Video object detection plays a pivotal role in various applications, from surveillance to autonomous vehicles.
We sampled and transformed video HOIs (i.e., image HOIs in continuous frames) from an existing video dataset, VidOR.
YOLOv8, a version of YOLO ("You Only Look Once"), is a deep convolutional neural network known for its speed and accuracy in identifying objects within videos.
Thus, the dataset can easily be used for both single-object and multiple-object detection.
The proposed dataset comprises 398 videos, each featuring an individual engaged in specific video surveillance actions.
Computation Costs: Handling large image or video datasets requires considerable expertise and computational power.
By combining YOLOv8 with tracking algorithms, it's possible to maintain consistent identities for objects as they move through video frames.
Derived from the PASCAL VOC format.
A few words about object detection: In computer vision, object detection is a major concern.
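Annotations derived from the PASCAL VOC format store corner coordinates in pixels, whereas YOLO-style label files commonly use a class index followed by a normalized box center, width, and height. The sketch below shows that conversion under those assumptions. `voc_to_yolo` and `yolo_line` are hypothetical helpers written for illustration, not functions from any library or repository named above.

```python
def voc_to_yolo(box, img_w, img_h):
    """Convert (xmin, ymin, xmax, ymax) in pixels to normalized (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2 / img_w
    cy = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h


def yolo_line(class_id, box, img_w, img_h):
    """Format one label line: class index followed by four normalized values."""
    values = voc_to_yolo(box, img_w, img_h)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


# Hypothetical box covering the central quarter of a 1920x1080 frame.
line = yolo_line(0, (480, 270, 1440, 810), 1920, 1080)
print(line)
```

One such line per object, written to a text file per frame, is the layout assumed here; check the label format expected by the training script you actually use.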
Real-Time Object Detection is a computer vision task that involves identifying and locating objects of interest in real-time video sequences with fast inference while maintaining a base level of accuracy.
The video streams are annotated on each frame at a frame rate of 25 or 30 fps.
AnyLabeling: Effortless data labeling with AI support from YOLO and Segment Anything! AnyLabeling = LabelImg + Labelme + Improved UI + Auto-labeling.
The WildLife Documentary dataset contains 15 documentary films downloaded from YouTube.
Identifying the foreground in RGB-D videos is a fundamental and important task.
Virtual KITTI is a photo-realistic synthetic video dataset designed to learn and evaluate computer vision models for several video understanding tasks: object detection and multi-object tracking, scene-level and instance-level semantic segmentation, optical flow, and depth estimation.
open-mmlab/mmtracking
ImageNet VID is a large-scale public dataset for video object detection and contains more than 1M frames for training and more than 100k frames for validation.
These models can be useful for out-of-the-box inference if you are interested in categories already in those datasets.
TABLE I: Main Information in the Label File.
The system employs the YOLOv8 model trained on a custom dataset to accurately detect various objects, with a primary focus on detecting number plates.
Size of objects: Covering a wide range of object sizes results in good model performance and robustness.
The game engine simulates a three-dimensional world by modeling a real-life city.
Extracting detailed 3D information of objects from video data is an important goal for holistic scene understanding.
The dataset consists of images captured using a high-resolution camera in an Android phone.
The COCO dataset consists of 328K images.
Exploring what humans pay attention to in dynamic panoramic scenes is useful for many fundamental applications, including augmented reality (AR) in retail, AR-powered recruitment, and visual language navigation. With this goal in mind, we propose PV-SOD, a new task that aims to segment salient objects from panoramic videos.
Compared with state-of-the-art multi-object trackers, TGraM achieves efficient collaborative learning between detection and ReID.
Similar to many other object detection datasets, bounding-box labels are annotated by a well-trained professional team following a strict guideline.
This research addresses the need for real-time object detection in videos.
The ImageNet VID dataset is split into a training set and a validation set, containing 3,862 video snippets and 555 video snippets, respectively.
"Fast Object Detection in Compressed Video."
In recent years, YOLO series models have been widely applied to underwater video object detection.
Video salient object detection (VSOD) is essential for understanding the underlying mechanism behind the human visual system (HVS) during free-viewing in general and is instrumental to a wide range of real-world applications, e.g., video segmentation, video captioning, video compression, autonomous driving, robotic interaction, and weakly supervised attention.
More training and testing datasets, especially good-quality video datasets, are highly desirable for related research and standardization activities.
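One way to check whether a dataset covers a wide range of object sizes is to bucket its boxes by pixel area. The sketch below uses the small/medium/large thresholds from the COCO evaluation convention (areas of 32² and 96² pixels) as an assumption; other benchmarks define size classes differently. The helper names and the sample boxes are hypothetical.

```python
from collections import Counter

SMALL_MAX = 32 ** 2   # COCO convention: area below 32x32 pixels is "small"
MEDIUM_MAX = 96 ** 2  # area below 96x96 pixels is "medium", otherwise "large"


def size_bucket(width, height):
    """Classify one box by its pixel area."""
    area = width * height
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"


def size_histogram(boxes):
    """Count boxes per size bucket; boxes are (width, height) pairs in pixels."""
    return Counter(size_bucket(w, h) for w, h in boxes)


# Hypothetical box sizes, for illustration only.
hist = size_histogram([(10, 10), (40, 40), (200, 100), (31, 33)])
print(dict(hist))
```

A histogram dominated by a single bucket suggests that a model trained on the data may generalize poorly to objects of other sizes.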
The dataset encompasses three distinct categories for object detection, including "Handgun" and "Machine_Gun".
Video action detection involves localizing action performers in both space and time, as well as recognizing their action class.
You can use Google Colab for this experiment, as it has an NVIDIA K80 GPU.
A large-scale benchmark dataset for object detection in optical remote sensing images, which consists of 23,463 images and 192,518 object instances annotated with horizontal bounding boxes.
Busy-parking-lot-dataset---vehicle-detection-in-UAV-video: vehicle instance segmentation.
SegTrack contains 6 videos of animals and humans with 244 frames in total.
A UHD video dataset, referred to as Tencent Video Dataset (TVD), is established to serve various purposes such as training neural network-based coding tools and testing machine vision tasks including object detection and segmentation.
While recent methods have shown impressive results when reconstructing meshes of objects from a single image, results often remain ambiguous because part of the object is unobserved.
Each object is annotated with a 3D bounding box.
This repository provides an overview of the dataset contents, including an exploration of the types and format of the annotations as well as download links.
Object detection technology has been developing for more than 20 years, from early traditional detection methods to current deep learning methods.
VidHOI is one of the first large-scale video-based HOI detection benchmarks.
Each of the UCSD Ped1 and Ped2 datasets has videos from a different static camera observing a pedestrian walkway.
Underwater video object detection is a challenging task due to the poor quality of underwater videos, including blurriness and low contrast. However, there is still a lack of publicly available large-scale datasets in this domain.
Performing video object detection on a CPU will be slower than on a computer with an NVIDIA GPU.
LabelImg: 🖍️ LabelImg is a graphical image annotation tool for labeling object bounding boxes in images.
In this section, we review the most closely related datasets and models from all these areas.
Datasets: SegTrack [22] is a popular dataset for video object segmentation.
FADE: A Dataset for Detecting Falling Objects around Buildings in Video.
In addition, to address the data scarcity in this field, a large-scale, high-resolution Jilin-1 satellite video dataset for multi-object tracking (AIR-MOT) is built for the experiments.
JVET NNVC exploration activities have used this video dataset as a training set.
The database contains 702,096 bounding boxes, 37,709 essential event labels with time boundaries, and 17,115 highlight annotations for object detection, action recognition, temporal action localization, and highlight detection tasks.
EgoObjects is a large-scale egocentric dataset for fine-grained object understanding, which features videos captured by various wearable devices at worldwide locations, objects from a diverse set of categories commonly seen in indoor environments, and videos of the same object instance captured under diverse conditions.
XS-VID: An Extra Small Object Video Detection Dataset.
All images for object detection in our dataset are sampled from the sequences.
Object tracking involves following an object across multiple frames in a video.
In addition, the videos also contain AR session metadata including camera poses, sparse point clouds, and planes.
We start by building a training dataset from images available in open-source object detection datasets (such as ImageNet and OpenImages).
This project implements real-time object detection to identify vehicles and their associated number plates in live video streams.
However, the lack of high-quality satellite video datasets limits the development of relevant object detection, object tracking, and object segmentation.
Thanks to major datasets and benchmark efforts [3, 4, 6, 7, 11], today's object understanding systems have demonstrated prominent capabilities to detect and recognize objects under various levels of supervision and data distributions.
This dataset is curated to facilitate four essential computer vision tasks, namely image object detection, video object detection, single object tracking, and multi-object tracking.
Preferred Object Detection Format for GluonCV and MXNet.
We evaluate the performance of well-known techniques for the tasks of object detection, single and multi-object tracking, and domain adaptation (Sec. 5) before finally concluding the paper (Sec. 6).
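To illustrate what following an object across frames can look like in its simplest form, the sketch below links detections in a new frame to existing tracks by greedy intersection-over-union (IoU) matching. This is a generic baseline written for illustration, not the method of any tracker or paper mentioned here. The function names, the 0.3 threshold, and the sample boxes are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, threshold=0.3):
    """Greedily match detections to tracks by IoU.

    tracks maps track id -> last known box; detections is a list of boxes.
    Returns the track dict for the new frame. Unmatched detections start new
    tracks, and tracks without a match are dropped in this simple sketch.
    """
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    updated, used = {}, set()
    for score, tid, j in pairs:
        if score < threshold:
            break  # remaining pairs overlap too little to be the same object
        if tid in updated or j in used:
            continue
        updated[tid] = detections[j]
        used.add(j)
    next_id = max(tracks, default=-1) + 1
    for j, det in enumerate(detections):
        if j not in used:
            updated[next_id] = det
            next_id += 1
    return updated


# Hypothetical frame: two objects shift by one pixel and a third one appears.
previous = {0: (0, 0, 10, 10), 1: (50, 50, 60, 60)}
current = associate(previous, [(51, 50, 61, 60), (1, 0, 11, 10), (200, 200, 210, 210)])
print(current)
```

Practical trackers usually add motion models, appearance features, and logic for keeping temporarily occluded tracks alive, all of which this sketch omits.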
Note that in contrast to action detection datasets such as AVA/Kinetics, the interacting objects are explicitly annotated in VidHOI.
The TensorFlow Object Detection API provides a collection of detection models pre-trained on the COCO dataset, the KITTI dataset, the Open Images dataset, the AVA v2.1 dataset, and the iNaturalist Species Detection Dataset.
Description: The ONCE (One millioN sCenEs) dataset can be used for 3D object detection in the autonomous driving scenario.
However, existing salient object detection (SOD) works focus only on either static RGB-D images or RGB videos, ignoring the collaboration of RGB-D and video information.
However, achieving moving object detection and tracking in satellite videos remains challenging due to the insufficient appearance information of objects and the lack of high-quality datasets.
AIR 2025: SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention, by Muhammad Nawfal Meeran, Gokul Adethya T, and Bhanu Pratyush Mantha (Paper, Code).
We are excited to release Endoscapes2023, a comprehensive laparoscopic video dataset for surgical anatomy and tool segmentation, object detection, and Critical View of Safety (CVS) assessment.
Here is a list of the supported datasets and a brief description for each: Argoverse: A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations.
Image/video object detection is an extensively studied topic, and a large number of deep networks have been proposed and have achieved excellent performance in recent years.
Object detection lays the groundwork for numerous other computer vision tasks, such as AI image recognition, instance and image segmentation, image captioning, and object tracking.
Label Studio: Label Studio is a multi-type data labeling and annotation tool with a standardized output format.
A Flying Bird Dataset for Surveillance Videos (FBD-SV-2024) is introduced and tailored for the development and performance evaluation of flying bird detection algorithms in surveillance videos.
Additional sets for various exploration activities are available as described below.
We evaluated the relevance of the database by measuring the performance of an algorithm from each of three distinct domains: multi-class object recognition, pedestrian detection, and label propagation.
With the rapid development of depth sensors, more and more RGB-D videos can be obtained.
"Detect or Track: Towards Cost-Effective Video Object Detection/Tracking."
The dataset consists of images with a single object and multiple objects in the frame.
A unique property of this dataset is that all videos are accompanied by subtitles that are automatically generated from speech by YouTube. The videos vary from 9 minutes to as long as 50 minutes, with resolution ranging from 360p to 1080p.