Small object detection: Difference between revisions

{{Short description|Detecting small objects in digital images}}
 
'''Small object detection''' is a particular case of [[object detection]] where various techniques are employed to detect small objects in digital images and videos. "Small objects" are objects having a small pixel footprint in the input image. In areas such as [[Aerial photography|aerial imagery]], [[State of the art|state-of-the-art]] object detection techniques underperform because of small objects.
 
== Uses ==
[[File:Track_Results.webm|thumb|An example of object tracking]]
Small object detection has applications in various fields such as traffic video [[surveillance]],<ref>{{Cite journal |last=Saran K B |last2=Sreelekha G |title=Traffic video surveillance: Vehicle detection and classification |url=http://ieeexplore.ieee.org/document/7432948/ |journal=2015 International Conference on Control Communication & Computing India (ICCC) |location=Trivandrum, Kerala, India |publisher=IEEE |pages=516–521 |doi=10.1109/ICCC.2015.7432948 |isbn=978-1-4673-7349-4}}</ref><ref>{{Cite journal |last=Nemade |first=Bhushan |date=2016-01-01 |title=Automatic Traffic Surveillance Using Video Tracking |url=https://www.sciencedirect.com/science/article/pii/S1877050916001836 |journal=Procedia Computer Science |series=Proceedings of International Conference on Communication, Computing and Virtualization (ICCCV) 2016 |language=en |volume=79 |pages=402–409 |doi=10.1016/j.procs.2016.03.052 |issn=1877-0509}}</ref> [[Content-based image retrieval|small object retrieval]],<ref>{{Cite journal |last=Guo |first=Haiyun |last2=Wang |first2=Jinqiao |last3=Xu |first3=Min |last4=Zha |first4=Zheng-Jun |last5=Lu |first5=Hanqing |date=2015-10-13 |title=Learning Multi-view Deep Features for Small Object Retrieval in Surveillance Scenarios |url=https://doi.org/10.1145/2733373.2806349 |journal=Proceedings of the 23rd ACM international conference on Multimedia |series=MM '15 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=859–862 |doi=10.1145/2733373.2806349 |isbn=978-1-4503-3459-4}}</ref> [[anomaly detection]],<ref>{{Cite journal |last=Ingle |first=Palash Yuvraj |last2=Kim |first2=Young-Gab |date=2022-05-19 |title=Real-Time Abnormal Object Detection for Video Surveillance in Smart Cities |url=https://www.mdpi.com/1424-8220/22/10/3862 |journal=Sensors |language=en |volume=22 |issue=10 |pages=3862 |doi=10.3390/s22103862 |issn=1424-8220 |pmc=9143895 |pmid=35632270}}</ref> [[maritime surveillance]], [[Aerial survey|drone surveying]], [[Traffic flow|traffic flow analysis]],<ref>{{Cite journal |last=Tsuboi |first=Tsutomu |last2=Yoshikawa |first2=Noriaki |date=2020-03-01 |title=Traffic flow analysis in Ahmedabad (India) |url=https://www.sciencedirect.com/science/article/pii/S2213624X18301974 |journal=Case Studies on Transport Policy |language=en |volume=8 |issue=1 |pages=215–228 |doi=10.1016/j.cstp.2019.06.001 |issn=2213-624X}}</ref> and [[Video tracking|object tracking]].
 
== Problems with small objects ==
 
* Modern-day object detection algorithms such as You Only Look Once (YOLO)<ref>{{Cite journal |last=Redmon |first=Joseph |last2=Divvala |first2=Santosh |last3=Girshick |first3=Ross |last4=Farhadi |first4=Ali |date=2016-05-09 |title=You Only Look Once: Unified, Real-Time Object Detection |url=http://arxiv.org/abs/1506.02640 |journal=arXiv:1506.02640 [cs] |doi=10.48550/arxiv.1506.02640}}</ref><ref>{{Cite journal |last=Redmon |first=Joseph |last2=Farhadi |first2=Ali |date=2016-12-25 |title=YOLO9000: Better, Faster, Stronger |url=http://arxiv.org/abs/1612.08242 |journal=arXiv:1612.08242 [cs] |doi=10.48550/arxiv.1612.08242}}</ref><ref>{{Cite journal |last=Redmon |first=Joseph |last2=Farhadi |first2=Ali |date=2018-04-08 |title=YOLOv3: An Incremental Improvement |url=http://arxiv.org/abs/1804.02767 |journal=arXiv:1804.02767 [cs] |doi=10.48550/arxiv.1804.02767}}</ref><ref>{{Cite journal |last=Bochkovskiy |first=Alexey |last2=Wang |first2=Chien-Yao |last3=Liao |first3=Hong-Yuan Mark |date=2020-04-22 |title=YOLOv4: Optimal Speed and Accuracy of Object Detection |url=http://arxiv.org/abs/2004.10934 |journal=arXiv:2004.10934 [cs, eess] |doi=10.48550/arxiv.2004.10934}}</ref><ref>{{Cite journal |last=Wang |first=Chien-Yao |last2=Bochkovskiy |first2=Alexey |last3=Liao |first3=Hong-Yuan Mark |date=2021-02-21 |title=Scaled-YOLOv4: Scaling Cross Stage Partial Network |url=http://arxiv.org/abs/2011.08036 |journal=arXiv:2011.08036 [cs] |doi=10.48550/arxiv.2011.08036}}</ref><ref>{{Cite journal |last=Li |first=Chuyi |last2=Li |first2=Lulu |last3=Jiang |first3=Hongliang |last4=Weng |first4=Kaiheng |last5=Geng |first5=Yifei |last6=Li |first6=Liang |last7=Ke |first7=Zaidan |last8=Li |first8=Qingyuan |last9=Cheng |first9=Meng |last10=Nie |first10=Weiqiang |last11=Li |first11=Yiduo |last12=Zhang |first12=Bo |last13=Liang |first13=Yufei |last14=Zhou |first14=Linyuan |last15=Xu |first15=Xiaoming |date=2022-09-07 |title=YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications |url=http://arxiv.org/abs/2209.02976 |journal=arXiv:2209.02976 [cs] |doi=10.48550/arxiv.2209.02976}}</ref><ref>{{Cite journal |last=Wang |first=Chien-Yao |last2=Bochkovskiy |first2=Alexey |last3=Liao |first3=Hong-Yuan Mark |date=2022-07-06 |title=YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors |url=http://arxiv.org/abs/2207.02696 |journal=arXiv:2207.02696 [cs] |doi=10.48550/arxiv.2207.02696}}</ref> rely heavily on convolution layers to learn [[Feature (computer vision)|features]]. As an object passes through successive convolution layers, its footprint in the resulting feature maps shrinks. A small object can therefore disappear after several layers and become undetectable.
* Sometimes, the shadow of an object is detected as part of the object itself.<ref>{{Cite journal |last=Zhang |first=Mingrui |last2=Zhao |first2=Wenbing |last3=Li |first3=Xiying |last4=Wang |first4=Dan |date=2020-12-11 |title=Shadow Detection Of Moving Objects In Traffic Monitoring Video |url=https://ieeexplore.ieee.org/document/9338958/ |journal=2020 IEEE 9th Joint International Information Technology and Artificial Intelligence Conference (ITAIC) |location=Chongqing, China |publisher=IEEE |pages=1983–1987 |doi=10.1109/ITAIC49862.2020.9338958 |isbn=978-1-7281-5244-8}}</ref> As a result, the bounding box tends to be centred on the shadow rather than on the object. In vehicle detection, the detection of [[pedestrian]]s and two-wheelers suffers from this effect.
* At present, [[Unmanned aerial vehicle|drones]] are very widely used in aerial imagery.<ref>{{Cite journal |title=Interactive workshop "How drones are changing the world we live in" |url=http://ieeexplore.ieee.org/document/7486437/ |journal=2016 Integrated Communications Navigation and Surveillance (ICNS) |location=Herndon, VA |publisher=IEEE |pages=1–17 |doi=10.1109/ICNSURV.2016.7486437 |isbn=978-1-5090-2149-9}}</ref> They are equipped with hardware ([[Sensor|sensors]]) and software ([[Algorithm|algorithms]]) that help maintain a stable position during flight. In windy conditions, the drone automatically makes fine movements to maintain its position, which changes the view near the image boundary, so new objects may appear there. Overall, these effects affect classification, detection, and eventually tracking accuracy.
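
The shrinkage described in the first point can be illustrated with simple arithmetic. The sketch below assumes a detector whose feature maps are downsampled by total strides of 8, 16 and 32; these values and the function names are illustrative assumptions, not taken from any specific model. It reports how many feature-map cells an object of a given pixel width spans at each stride.

```python
# Assumed, illustrative total downsampling factors of three feature maps.
STRIDES = (8, 16, 32)


def footprint_on_feature_map(object_px, stride):
    """Approximate width, in feature-map cells, of an object that is
    object_px pixels wide once the image has been downsampled by `stride`."""
    return object_px / stride


def footprints(object_px, strides=STRIDES):
    """Map each stride to the object's approximate footprint at that stride."""
    return {s: footprint_on_feature_map(object_px, s) for s in strides}


if __name__ == "__main__":
    for size in (16, 256):
        print(size, "px wide object:", footprints(size))
```

Under these assumptions, an object 16 pixels wide covers less than one cell at stride 32, whereas a 256-pixel object still spans several cells.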
 
[[File:Disp_shadow.jpg|thumb|Shadow and drone movement effect]]
 
== Methods ==
Various methods<ref>{{Cite web |title=An Evaluation of Deep Learning Methods for Small Object Detection |url=https://www.hindawi.com/journals/jece/2020/3189691/ |access-date=2022-09-14 |website=www.hindawi.com |language=en |doi=10.1155/2020/3189691}}</ref> are available to detect small objects, which fall under three categories:
[[File:Yolov5.jpg|thumb|YOLOv5 detection result]]
[[File:Y5_sahi.jpg|thumb|YOLOv5 and SAHI interface]]
 
==== Choosing a data set that has small objects ====
The [[machine learning]] model's output depends on "How well it is trained."<ref name=":0">{{Cite journal |last=Gong |first=Zhiqiang |last2=Zhong |first2=Ping |last3=Hu |first3=Weidong |date=2019 |title=Diversity in Machine Learning |url=https://ieeexplore.ieee.org/document/8717641/ |journal=IEEE Access |volume=7 |pages=64323–64350 |doi=10.1109/ACCESS.2019.2917620 |issn=2169-3536}}</ref> So, the data set must include small objects for the model to detect such objects. Also, modern-day detectors, such as YOLO, rely on anchors.<ref>{{Cite web |last=Christiansen |first=Anders |date=2022-06-10 |title=Anchor Boxes — The key to quality object detection |url=https://towardsdatascience.com/anchor-boxes-the-key-to-quality-object-detection-ddf9d612d4f9 |access-date=2022-09-14 |website=Medium |language=en}}</ref> The latest versions of YOLO (starting from YOLOv5<ref>{{Citation |last=Jocher |first=Glenn |title=ultralytics/yolov5: v6.2 - YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai integrations |date=2022-08-17 |url=https://zenodo.org/record/7002879 |publisher=Zenodo |doi=10.5281/zenodo.3908559 |access-date=2022-09-14 |last2=Chaurasia |first2=Ayush |last3=Stoken |first3=Alex |last4=Borovec |first4=Jirka |last5=NanoCode012 |last6=Kwon |first6=Yonghye |last7=TaoXie |last8=Michael |first8=Kalen |last9=Fang |first9=Jiacong}}</ref>) use an auto-anchor algorithm to find good anchors based on the object sizes in the data set. Therefore, it is mandatory to have small objects in the data set.
 
==== Generating more data via augmentation, if required ====
[[Deep learning]] models can have millions to billions of parameters that settle down to some weights after training. Therefore, they require a large quantity of good-quality data for better training.<ref>{{Cite web |title=The Size and Quality of a Data Set {{!}} Machine Learning |url=https://developers.google.com/machine-learning/data-prep/construct/collect/data-size-quality |access-date=2022-09-14 |website=Google Developers |language=en}}</ref> [[Data augmentation]] is a useful technique to generate more diverse data<ref name=":0" /> from an existing data set.
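
As a minimal sketch of one common augmentation, the example below flips an image horizontally and adjusts its bounding boxes to match. The image representation (a list of pixel rows), the box format (<code>x_min, y_min, x_max, y_max</code> in pixels, with the maximum coordinates exclusive) and the function name are assumptions made for illustration.

```python
def horizontal_flip(image, boxes):
    """Mirror an image left to right and move its bounding boxes with it.

    image: list of rows, each row a list of pixel values (assumed format).
    boxes: list of (x_min, y_min, x_max, y_max), max coordinates exclusive.
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # A box spanning [x_min, x_max) ends up spanning [width - x_max, width - x_min).
    flipped_boxes = [
        (width - x_max, y_min, width - x_min, y_max)
        for (x_min, y_min, x_max, y_max) in boxes
    ]
    return flipped_image, flipped_boxes


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8]]
    boxes = [(0, 0, 1, 2)]  # a one-pixel-wide object in the left column
    print(horizontal_flip(image, boxes))
```

Each flipped copy is a new training sample whose labels stay consistent with its pixels, which matters for small objects because even a one-pixel labelling error is large relative to their size.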
 
==== Increasing image capture resolution and model’s input resolution ====
 
==== Auto learning anchors ====
Selecting anchor size plays a vital role in small object detection.<ref>{{Cite journal |last=Zhong |first=Yuanyi |last2=Wang |first2=Jianfeng |last3=Peng |first3=Jian |last4=Zhang |first4=Lei |date=2020-01-26 |title=Anchor Box Optimization for Object Detection |url=http://arxiv.org/abs/1812.00469 |journal=arXiv:1812.00469 [cs] |doi=10.48550/arxiv.1812.00469}}</ref> Instead of hand-picking anchor sizes, algorithms can be used to identify them from the data set. YOLOv5 uses a [[K-means clustering|K-means algorithm]] to define anchor size.
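
The idea of learning anchors from data can be sketched with plain k-means clustering over the widths and heights of the labelled boxes. This is an illustrative simplification using Euclidean distance and a deterministic initialisation; it is not the YOLOv5 implementation, and the function name and parameters are hypothetical.

```python
def kmeans_anchors(sizes, k, iterations=50):
    """Cluster (width, height) pairs into k anchor sizes with plain k-means.

    sizes: list of (width, height) tuples of labelled boxes, in pixels.
    Returns k (width, height) centres sorted by area, smallest first.
    """
    if len(sizes) < k:
        raise ValueError("need at least k boxes")
    # Deterministic initialisation: pick k boxes spread across the area range.
    ordered = sorted(sizes, key=lambda wh: wh[0] * wh[1])
    step = len(ordered) / k
    centers = [ordered[int((i + 0.5) * step)] for i in range(k)]

    for _ in range(iterations):
        # Assignment step: attach every box to its nearest centre.
        clusters = [[] for _ in range(k)]
        for w, h in sizes:
            nearest = min(
                range(k),
                key=lambda i: (w - centers[i][0]) ** 2 + (h - centers[i][1]) ** 2,
            )
            clusters[nearest].append((w, h))
        # Update step: move each centre to the mean of its members.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(center)  # keep an empty cluster's centre
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return sorted(centers, key=lambda wh: wh[0] * wh[1])


if __name__ == "__main__":
    boxes = [(10, 10), (12, 12), (11, 11), (100, 100), (102, 98), (98, 102)]
    print(kmeans_anchors(boxes, k=2))
```

If the data set contains no small boxes, no cluster centre can settle on a small size, which is why the composition of the data set determines the anchors obtained.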
 
==== Tiling approach during training and inference ====
State-of-the-art object detectors accept only a fixed image size and resize the input image to match it. This resizing may deform the small objects in the image. The tiling approach<ref>{{Cite journal |last=Unel |first=F. Ozge |last2=Ozkalayci |first2=Burak O. |last3=Cigla |first3=Cevahir |title=The Power of Tiling for Small Object Detection |url=https://ieeexplore.ieee.org/document/9025422/ |journal=2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) |location=Long Beach, CA, USA |publisher=IEEE |pages=582–591 |doi=10.1109/CVPRW.2019.00084 |isbn=978-1-7281-2506-0}}</ref> helps when an image has a higher resolution than the model's fixed input size; instead of scaling it down, the image is broken down into tiles that are then used in training. The same approach is used during inference as well.
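
A minimal sketch of how tile coordinates might be computed is given below. The tile size, the overlap between neighbouring tiles and the function names are assumptions chosen for illustration; the cited work may use a different layout.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover `length` pixels,
    with neighbouring tiles sharing at least `overlap` pixels."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    if tile >= length:
        return [0]  # a single tile already covers this dimension
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the border
    return starts


def tile_boxes(width, height, tile=640, overlap=64):
    """Return (x_min, y_min, x_max, y_max) crops covering the whole image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


if __name__ == "__main__":
    for box in tile_boxes(1920, 1080):
        print(box)
```

Overlapping the tiles reduces the chance that a small object is cut in half at a tile border and missed in both neighbouring tiles.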
 
==== Feature Pyramid Network (FPN) ====
A feature [[Pyramid (image processing)|pyramid]] network<ref>{{Cite journal |last=Lin |first=Tsung-Yi |last2=Dollár |first2=Piotr |last3=Girshick |first3=Ross |last4=He |first4=Kaiming |last5=Hariharan |first5=Bharath |last6=Belongie |first6=Serge |date=2017-04-19 |title=Feature Pyramid Networks for Object Detection |url=http://arxiv.org/abs/1612.03144 |journal=arXiv:1612.03144 [cs] |doi=10.48550/arxiv.1612.03144}}</ref> can be used to learn features at multiple scales, e.g., Twin Feature Pyramid Networks (TFPN)<ref>{{Cite journal |last=Liang |first=Yi |last2=Changjian |first2=Wang |last3=Fangzhao |first3=Li |last4=Yuxing |first4=Peng |last5=Qin |first5=Lv |last6=Yuan |first6=Yuan |last7=Zhen |first7=Huang |title=TFPN: Twin Feature Pyramid Networks for Object Detection |url=https://ieeexplore.ieee.org/document/8995365/ |journal=2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI) |location=Portland, OR, USA |publisher=IEEE |pages=1702–1707 |doi=10.1109/ICTAI.2019.00251 |isbn=978-1-7281-3798-8}}</ref> and Extended Feature Pyramid Network (EFPN).<ref>{{Cite journal |last=Deng |first=Chunfang |last2=Wang |first2=Mengmeng |last3=Liu |first3=Liang |last4=Liu |first4=Yong |date=2020-04-09 |title=Extended Feature Pyramid Network for Small Object Detection |url=http://arxiv.org/abs/2003.07021 |journal=arXiv:2003.07021 [cs] |doi=10.48550/arxiv.2003.07021}}</ref> FPN helps to sustain features of small objects against convolution layers.
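
The core merging step of a feature pyramid can be sketched as follows: a coarse feature map is upsampled and added to the finer map from the level below, so the finer level receives context from deeper layers while keeping its resolution. The example is a deliberately reduced illustration on single-channel grids. The 1×1 lateral and 3×3 smoothing convolutions of the published FPN are omitted, and the function names are hypothetical.

```python
def upsample2x(feature):
    """Nearest-neighbour 2x upsampling of a 2D grid given as a list of rows."""
    out = []
    for row in feature:
        wide = [value for value in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))  # repeat each row
    return out


def top_down_merge(fine, coarse):
    """Add the upsampled coarse map to the fine map, element by element.

    `fine` is assumed to have exactly twice the height and width of `coarse`.
    """
    upsampled = upsample2x(coarse)
    return [
        [f + u for f, u in zip(fine_row, up_row)]
        for fine_row, up_row in zip(fine, upsampled)
    ]


if __name__ == "__main__":
    coarse = [[1, 2]]                        # deep, low-resolution level
    fine = [[10, 10, 10, 10],
            [10, 10, 10, 10]]                # shallow, high-resolution level
    print(top_down_merge(fine, coarse))
```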
 
=== Add-on techniques ===
Instead of modifying existing methods, some add-on techniques can be placed directly on top of existing approaches to detect smaller objects. One such technique is Slicing Aided Hyper Inference (SAHI).<ref>{{Cite journal |last=Akyon |first=Fatih Cagatay |last2=Altinuc |first2=Sinan Onur |last3=Temizel |first3=Alptekin |date=2022-07-12 |title=Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection |url=http://arxiv.org/abs/2202.06934 |journal=arXiv:2202.06934 [cs] |doi=10.48550/arxiv.2202.06934}}</ref> The image is sliced into multiple overlapping patches of different sizes. [[Hyperparameter (machine learning)|Hyper-parameters]] define their dimensions. Then, the patches are resized, while maintaining the aspect ratio, during fine-tuning. These patches are then provided for training the model.
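
The aspect-ratio-preserving resize mentioned above can be sketched as a small calculation. The function and its <code>target_long_side</code> parameter are hypothetical names introduced for illustration and are not part of the SAHI software.

```python
def resized_dimensions(width, height, target_long_side):
    """Scale a patch so its longer side equals target_long_side,
    keeping the width-to-height ratio unchanged (up to rounding)."""
    scale = target_long_side / max(width, height)
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    return new_width, new_height


if __name__ == "__main__":
    # A 512 x 256 patch enlarged so that its longer side is 640 pixels.
    print(resized_dimensions(512, 256, 640))
```

Enlarging a patch in this way increases the pixel footprint of the small objects inside it without distorting their shape.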
 
=== Well-Optimised techniques for small object detection ===
Various deep learning techniques are available that focus on such object detection problems, e.g., Feature-Fused SSD<ref>{{Cite journal |last=Cao |first=Guimei |last2=Xie |first2=Xuemei |last3=Yang |first3=Wenzhe |last4=Liao |first4=Quan |last5=Shi |first5=Guangming |last6=Wu |first6=Jinjian |date=2018-04-10 |title=Feature-fused SSD: fast detection for small objects |url=https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10615/106151E/Feature-fused-SSD-fast-detection-for-small-objects/10.1117/12.2304811.full |journal=Ninth International Conference on Graphic and Image Processing (ICGIP 2017) |publisher=SPIE |volume=10615 |pages=381–388 |doi=10.1117/12.2304811}}</ref> and YOLO-Z.<ref>{{Cite journal |last=Benjumea |first=Aduen |last2=Teeti |first2=Izzeddin |last3=Cuzzolin |first3=Fabio |last4=Bradley |first4=Andrew |date=2021-12-23 |title=YOLO-Z: Improving small object detection in YOLOv5 for autonomous vehicles |url=http://arxiv.org/abs/2112.11798 |journal=arXiv:2112.11798 [cs] |doi=10.48550/arxiv.2112.11798}}</ref> Such methods work on "How to sustain features of small objects while they pass through convolution networks."
 
== Other applications ==