== Uses ==
[[File:Track_Results.webm|thumb|An example of object tracking]]
Small object detection has applications in various fields, such as video [[surveillance]] (e.g. traffic video surveillance<ref>{{Cite book |last1=Saran K B |last2=Sreelekha G |title=2015 International Conference on Control Communication & Computing India (ICCC) |chapter=Traffic video surveillance: Vehicle detection and classification |chapter-url=https://ieeexplore.ieee.org/document/7432948 |year=2015 |___location=Trivandrum, Kerala, India |publisher=IEEE |pages=516–521 |doi=10.1109/ICCC.2015.7432948 |isbn=978-1-4673-7349-4 |s2cid=14779393}}</ref><ref>{{Cite journal |last=Nemade |first=Bhushan |date=2016-01-01 |title=Automatic Traffic Surveillance Using Video Tracking |url=https://www.sciencedirect.com/science/article/pii/S1877050916001836 |journal=Procedia Computer Science |series=Proceedings of International Conference on Communication, Computing and Virtualization (ICCCV) 2016 |language=en |volume=79 |pages=402–409 |doi=10.1016/j.procs.2016.03.052 |issn=1877-0509 |doi-access=free}}</ref>), [[Content-based image retrieval|small object retrieval]],<ref>{{Cite book |last1=Guo |first1=Haiyun |last2=Wang |first2=Jinqiao |last3=Xu |first3=Min |last4=Zha |first4=Zheng-Jun |last5=Lu |first5=Hanqing |title=Proceedings of the 23rd ACM International Conference on Multimedia |chapter=Learning Multi-view Deep Features for Small Object Retrieval in Surveillance Scenarios |date=2015-10-13 |chapter-url=https://doi.org/10.1145/2733373.2806349 |series=MM '15 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=859–862 |doi=10.1145/2733373.2806349 |isbn=978-1-4503-3459-4 |s2cid=9041849}}</ref><ref>{{Cite journal |last1=Galiyawala |first1=Hiren |last2=Raval |first2=Mehul S. |last3=Patel |first3=Meet |date=2022-05-20 |title=Person retrieval in surveillance videos using attribute recognition |url=https://doi.org/10.1007/s12652-022-03891-0 |journal=Journal of Ambient Intelligence and Humanized Computing |volume=15 |pages=291–303 |language=en |doi=10.1007/s12652-022-03891-0 |s2cid=248951090 |issn=1868-5145 |url-access=subscription}}</ref> [[anomaly detection]],<ref>{{Cite journal |last1=Ingle |first1=Palash Yuvraj |last2=Kim |first2=Young-Gab |date=2022-05-19 |title=Real-Time Abnormal Object Detection for Video Surveillance in Smart Cities |journal=Sensors |language=en |volume=22 |issue=10 |pages=3862 |doi=10.3390/s22103862 |issn=1424-8220 |pmc=9143895 |pmid=35632270 |bibcode=2022Senso..22.3862I |doi-access=free}}</ref> [[maritime surveillance]], [[Aerial survey|drone surveying]], [[Traffic flow|traffic flow analysis]],<ref>{{Cite journal |last1=Tsuboi |first1=Tsutomu |last2=Yoshikawa |first2=Noriaki |date=2020-03-01 |title=Traffic flow analysis in Ahmedabad (India) |url=https://www.sciencedirect.com/science/article/pii/S2213624X18301974 |journal=Case Studies on Transport Policy |language=en |volume=8 |issue=1 |pages=215–228 |doi=10.1016/j.cstp.2019.06.001 |s2cid=195543435 |issn=2213-624X |doi-access=free}}</ref> and [[Video tracking|object tracking]].
 
== Problems with small objects ==
 
* Modern object detection algorithms such as [[You Only Look Once]] (YOLO)<ref>{{cite arXiv |last1=Redmon |first1=Joseph |last2=Divvala |first2=Santosh |last3=Girshick |first3=Ross |last4=Farhadi |first4=Ali |date=2016-05-09 |title=You Only Look Once: Unified, Real-Time Object Detection |class=cs.CV |eprint=1506.02640}}</ref><ref>{{cite arXiv |last1=Redmon |first1=Joseph |last2=Farhadi |first2=Ali |date=2016-12-25 |title=YOLO9000: Better, Faster, Stronger |class=cs.CV |eprint=1612.08242}}</ref><ref>{{cite arXiv |last1=Redmon |first1=Joseph |last2=Farhadi |first2=Ali |date=2018-04-08 |title=YOLOv3: An Incremental Improvement |class=cs.CV |eprint=1804.02767}}</ref><ref>{{cite arXiv |last1=Bochkovskiy |first1=Alexey |last2=Wang |first2=Chien-Yao |last3=Liao |first3=Hong-Yuan Mark |date=2020-04-22 |title=YOLOv4: Optimal Speed and Accuracy of Object Detection |class=cs.CV |eprint=2004.10934}}</ref><ref>{{cite arXiv |last1=Wang |first1=Chien-Yao |last2=Bochkovskiy |first2=Alexey |last3=Liao |first3=Hong-Yuan Mark |date=2021-02-21 |title=Scaled-YOLOv4: Scaling Cross Stage Partial Network |class=cs.CV |eprint=2011.08036}}</ref><ref>{{cite arXiv |last1=Li |first1=Chuyi |last2=Li |first2=Lulu |last3=Jiang |first3=Hongliang |last4=Weng |first4=Kaiheng |last5=Geng |first5=Yifei |last6=Li |first6=Liang |last7=Ke |first7=Zaidan |last8=Li |first8=Qingyuan |last9=Cheng |first9=Meng |last10=Nie |first10=Weiqiang |last11=Li |first11=Yiduo |last12=Zhang |first12=Bo |last13=Liang |first13=Yufei |last14=Zhou |first14=Linyuan |last15=Xu |first15=Xiaoming |date=2022-09-07 |title=YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications |class=cs.CV |eprint=2209.02976}}</ref><ref>{{cite arXiv |last1=Wang |first1=Chien-Yao |last2=Bochkovskiy |first2=Alexey |last3=Liao |first3=Hong-Yuan Mark |date=2022-07-06 |title=YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors |class=cs.CV |eprint=2207.02696}}</ref> rely heavily on convolutional layers to learn [[Feature (computer vision)|features]]. As an object passes through successive convolutional layers, its spatial footprint on the feature map shrinks, so a small object can vanish after several layers and become undetectable.
* Sometimes the shadow of an object is detected as part of the object itself.<ref>{{Cite book |last1=Zhang |first1=Mingrui |last2=Zhao |first2=Wenbing |last3=Li |first3=Xiying |last4=Wang |first4=Dan |title=2020 IEEE 9th Joint International Information Technology and Artificial Intelligence Conference (ITAIC) |chapter=Shadow Detection of Moving Objects in Traffic Monitoring Video |chapter-url=https://ieeexplore.ieee.org/document/9338958 |date=2020-12-11 |volume=9 |___location=Chongqing, China |publisher=IEEE |pages=1983–1987 |doi=10.1109/ITAIC49862.2020.9338958 |isbn=978-1-7281-5244-8 |s2cid=231824327}}</ref> The bounding box then tends to be centred on the shadow rather than the object. In vehicle detection, [[pedestrian]] and two-wheeler detection suffer particularly from this.
* At present, [[Unmanned aerial vehicle|drones]] are widely used for aerial imagery.<ref>{{Cite book |title=2016 Integrated Communications Navigation and Surveillance (ICNS) |chapter=Interactive workshop "How drones are changing the world we live in" |chapter-url=https://ieeexplore.ieee.org/document/7486437 |year=2016 |___location=Herndon, VA |publisher=IEEE |pages=1–17 |doi=10.1109/ICNSURV.2016.7486437 |isbn=978-1-5090-2149-9 |s2cid=21388151}}</ref> They are equipped with hardware ([[sensor]]s) and software ([[algorithm]]s) that keep them in a stable position during flight. In windy conditions, a drone makes continual fine adjustments to hold its position, which changes the view near the image boundary, and new objects may enter the frame at the edges. Together, these effects degrade classification, detection, and ultimately tracking accuracy.
 
[[File:Disp_shadow.jpg|thumb|Shadow and drone movement effect|alt=Both images are from the same video. Note how object shadows affect detection accuracy, and how the drone's self-movement changes the scene near the boundary (see the "car" at the bottom-left corner).]]
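The feature-map shrinkage described above can be illustrated with a back-of-the-envelope calculation (an illustrative sketch only; real detectors differ in their exact stride schedules):

```python
def downsampled_size(object_px, num_stride2_stages):
    """Approximate width (in feature-map cells) of an object after a number
    of stride-2 convolution/pooling stages, each halving spatial resolution."""
    return object_px / (2 ** num_stride2_stages)

# A 20-pixel-wide object after 5 downsampling stages spans less than one
# feature-map cell, so it is effectively invisible to the detection head.
print(downsampled_size(20, 5))  # 0.625
```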
== Methods ==
Various methods<ref>{{Cite journal |title=An Evaluation of Deep Learning Methods for Small Object Detection |journal=Journal of Electrical and Computer Engineering |year=2020 |language=en |doi=10.1155/2020/3189691|doi-access=free |last1=Nguyen |first1=Nhat-Duy |last2=Do |first2=Tien |last3=Ngo |first3=Thanh Duc |last4=Le |first4=Duy-Dinh |volume=2020 |pages=1–18 }}</ref> are available for detecting small objects; they fall into three categories:
[[File:Yolov5 (Ariel top view of Ahmedabad, Gujarat, India, 2022).jpg|thumb|YOLOv5 detection result]]
[[File:YOLOv5 and SAHI interface (Ariel top view of Ahmedabad, Gujarat, India, 2022).jpg|thumb|YOLOv5 and SAHI interface]]
[[File:Yolov7 (Ariel top view of Ahmedabad, Gujarat, India, 2022).jpg|thumb|YOLOv7 detection output]]
 
=== Improving existing techniques ===
 
==== Choosing a data set that has small objects ====
A [[machine learning]] model's output depends on how well it is trained.<ref name=":0">{{Cite journal |last1=Gong |first1=Zhiqiang |last2=Zhong |first2=Ping |last3=Hu |first3=Weidong |date=2019 |title=Diversity in Machine Learning |url=https://ieeexplore.ieee.org/document/8717641 |journal=IEEE Access |volume=7 |pages=64323–64350 |doi=10.1109/ACCESS.2019.2917620 |s2cid=206491718 |issn=2169-3536|doi-access=free |arxiv=1807.01477 |bibcode=2019IEEEA...764323G }}</ref> The data set must therefore include small objects if such objects are to be detected. Modern detectors, such as YOLO, also rely on anchors.<ref>{{Cite web |last=Christiansen |first=Anders |date=2022-06-10 |title=Anchor Boxes — The key to quality object detection |url=https://towardsdatascience.com/anchor-boxes-the-key-to-quality-object-detection-ddf9d612d4f9 |access-date=2022-09-14 |website=Medium |language=en}}</ref> Recent versions of YOLO (from YOLOv5<ref>{{cite journal |last1=Jocher |first1=Glenn |title=ultralytics/yolov5: v6.2 - YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai integrations |date=2022-08-17 |url=https://zenodo.org/record/7002879 |doi=10.5281/zenodo.3908559 |access-date=2022-09-14 |last2=Chaurasia |first2=Ayush |last3=Stoken |first3=Alex |last4=Borovec |first4=Jirka |last5=NanoCode012 |last6=Kwon |first6=Yonghye |last7=TaoXie |last8=Michael |first8=Kalen |last9=Fang |first9=Jiacong }}</ref> onwards) use an auto-anchor algorithm that finds good anchors based on the distribution of object sizes in the data set, so it is essential that the data set contains small objects.
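Anchor selection from a data set can be sketched as k-means clustering over the (width, height) pairs of training boxes. This is a simplified illustration only: YOLOv5's actual auto-anchor additionally uses an IoU-based fitness metric and a genetic refinement step on top of clustering.

```python
import numpy as np

def kmeans_anchors(box_wh, k=3, iters=50, seed=0):
    """Cluster (width, height) pairs of training boxes into k anchor sizes.

    box_wh: float array of shape (N, 2). Returns k anchors sorted by area,
    so the first anchor corresponds to the smallest objects in the data set.
    """
    rng = np.random.default_rng(seed)
    centers = box_wh[rng.choice(len(box_wh), size=k, replace=False)]
    for _ in range(iters):
        # assign each box to the nearest current anchor
        dists = np.linalg.norm(box_wh[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each anchor to the mean of its assigned boxes
        for j in range(k):
            if np.any(labels == j):
                centers[j] = box_wh[labels == j].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]  # smallest area first
```

If the data set lacks small boxes entirely, no cluster center can land at a small scale, which is one concrete way the "small objects must be in the data set" requirement shows up.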
 
==== Generating more data via augmentation, if required ====
 
==== Tiling approach during training and inference ====
State-of-the-art object detectors accept only a fixed input size and resize images to match it. This resizing can deform small objects. The tiling approach<ref>{{Cite book |last1=Unel |first1=F. Ozge |last2=Ozkalayci |first2=Burak O. |last3=Cigla |first3=Cevahir |title=2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) |chapter=The Power of Tiling for Small Object Detection |chapter-url=https://ieeexplore.ieee.org/document/9025422 |year=2019 |___location=Long Beach, CA, USA |publisher=IEEE |pages=582–591 |doi=10.1109/CVPRW.2019.00084 |isbn=978-1-7281-2506-0 |s2cid=198903617}}</ref> helps when an image has a higher resolution than the model's fixed input size: instead of scaling the image down, it is split into tiles, which are used for training. The same approach is applied during inference.
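A minimal tiling sketch (illustrative, not any particular library's implementation; overlapping tiles reduce the chance that an object is cut at a tile border, and the returned offsets let tile-level detections be mapped back to full-image coordinates):

```python
import numpy as np

def tile_image(image, tile_size=640, overlap=0.2):
    """Split an (H, W, C) image into overlapping tiles of tile_size.

    Returns a list of (tile, (x0, y0)) pairs; (x0, y0) is the tile's
    top-left corner in full-image coordinates.
    """
    h, w = image.shape[:2]
    stride = max(1, int(tile_size * (1 - overlap)))
    ys = list(range(0, max(h - tile_size, 0) + 1, stride))
    xs = list(range(0, max(w - tile_size, 0) + 1, stride))
    # make sure the bottom and right edges are always covered
    if ys[-1] != max(h - tile_size, 0):
        ys.append(max(h - tile_size, 0))
    if xs[-1] != max(w - tile_size, 0):
        xs.append(max(w - tile_size, 0))
    tiles = []
    for y0 in ys:
        for x0 in xs:
            tiles.append((image[y0:y0 + tile_size, x0:x0 + tile_size], (x0, y0)))
    return tiles
```

Each tile is fed to the detector at (or near) its native resolution, and per-tile boxes are shifted by the stored offsets before a final non-maximum suppression merges duplicates from overlapping tiles.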
 
==== Feature Pyramid Network (FPN) ====
Use a feature [[Pyramid (image processing)|pyramid]] network<ref>{{cite arXiv |last1=Lin |first1=Tsung-Yi |last2=Dollár |first2=Piotr |last3=Girshick |first3=Ross |last4=He |first4=Kaiming |last5=Hariharan |first5=Bharath |last6=Belongie |first6=Serge |date=2017-04-19 |title=Feature Pyramid Networks for Object Detection |class=cs.CV |eprint=1612.03144}}</ref> to learn features at multiple scales, e.g. Twin Feature Pyramid Networks (TFPN)<ref>{{Cite book |last1=Liang |first1=Yi |last2=Changjian |first2=Wang |last3=Fangzhao |first3=Li |last4=Yuxing |first4=Peng |last5=Qin |first5=Lv |last6=Yuan |first6=Yuan |last7=Zhen |first7=Huang |title=2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI) |chapter=TFPN: Twin Feature Pyramid Networks for Object Detection |chapter-url=https://ieeexplore.ieee.org/document/8995365 |year=2019 |___location=Portland, OR, USA |publisher=IEEE |pages=1702–1707 |doi=10.1109/ICTAI.2019.00251 |isbn=978-1-7281-3798-8 |s2cid=211211764}}</ref> and the Extended Feature Pyramid Network (EFPN).<ref>{{cite arXiv |last1=Deng |first1=Chunfang |last2=Wang |first2=Mengmeng |last3=Liu |first3=Liang |last4=Liu |first4=Yong |date=2020-04-09 |title=Extended Feature Pyramid Network for Small Object Detection |class=cs.CV |eprint=2003.07021}}</ref> An FPN helps preserve the features of small objects through successive convolutional layers.
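The FPN top-down pathway can be sketched as follows. This is a simplified NumPy illustration under two assumptions not required by the real architecture: all levels share the same channel count (the actual FPN uses 1×1 lateral convolutions to match channels) and adjacent levels differ by exactly a factor of two in resolution.

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbour upsampling of a (C, H, W) feature map
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(c_maps):
    """Build pyramid outputs from backbone maps [C3, ..., C5] (coarsest last).

    Each output merges the backbone map at that scale (lateral connection,
    spatially precise) with the upsampled, semantically stronger map from
    the level above — so fine levels keep small-object detail AND semantics.
    """
    p = [c_maps[-1]]  # start from the coarsest, most semantic map
    for c in reversed(c_maps[:-1]):
        p.append(c + upsample2x(p[-1]))
    return list(reversed(p))  # finest level first
```

Small objects are then detected on the finest pyramid level, which retains high spatial resolution but has been enriched with semantics from the coarse levels.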
 
=== Add-on techniques ===
 
=== Well-optimised techniques for small object detection ===
Various deep learning techniques focus specifically on this problem, e.g. Feature-Fused SSD<ref>{{Cite book |last1=Cao |first1=Guimei |last2=Xie |first2=Xuemei |last3=Yang |first3=Wenzhe |last4=Liao |first4=Quan |last5=Shi |first5=Guangming |last6=Wu |first6=Jinjian |editor-last1=Dong |editor-first1=Junyu |editor-last2=Yu |editor-first2=Hui |title=Ninth International Conference on Graphic and Image Processing (ICGIP 2017) |chapter=Feature-fused SSD: Fast detection for small objects |chapter-url=https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10615/106151E/Feature-fused-SSD-fast-detection-for-small-objects/10.1117/12.2304811.full |date=2018-04-10 |publisher=SPIE |volume=10615 |pages=381–388 |doi=10.1117/12.2304811 |arxiv=1709.05054 |bibcode=2018SPIE10615E..1EC |isbn=9781510617414 |s2cid=20592770}}</ref> and YOLO-Z.<ref>{{cite arXiv |last1=Benjumea |first1=Aduen |last2=Teeti |first2=Izzeddin |last3=Cuzzolin |first3=Fabio |last4=Bradley |first4=Andrew |date=2021-12-23 |title=YOLO-Z: Improving small object detection in YOLOv5 for autonomous vehicles |class=cs.CV |eprint=2112.11798}}</ref> Such methods concentrate on preserving the features of small objects as they pass through convolutional networks.
 
== Other applications ==
 
* Crowd counting<ref>{{Cite book |last1=Rajendran |first1=Logesh |last2=Shyam Shankaran |first2=R |title=2021 IEEE International Conference on Big Data and Smart Computing (BigComp) |chapter=Bigdata Enabled Realtime Crowd Surveillance Using Artificial Intelligence and Deep Learning |chapter-url=https://ieeexplore.ieee.org/document/9373133 |year=2021 |___location=Jeju Island, Korea (South) |publisher=IEEE |pages=129–132 |doi=10.1109/BigComp51126.2021.00032 |isbn=978-1-7281-8924-6 |s2cid=232236614}}</ref><ref>{{Cite book |last1=Sivachandiran |first1=S. |last2=Mohan |first2=K. Jagan |last3=Nazer |first3=G. Mohammed |title=2022 6th International Conference on Computing Methodologies and Communication (ICCMC) |chapter=Deep Transfer Learning Enabled High-Density Crowd Detection and Classification using Aerial Images |chapter-url=https://ieeexplore.ieee.org/document/9753982 |date=2022-03-29 |___location=Erode, India |publisher=IEEE |pages=1313–1317 |doi=10.1109/ICCMC53470.2022.9753982 |isbn=978-1-6654-1028-1 |s2cid=248131806}}</ref><ref>{{Cite book |last1=Santhini |first1=C. |last2=Gomathi |first2=V. |title=2018 International Conference on Current Trends towards Converging Technologies (ICCTCT) |chapter=Crowd Scene Analysis Using Deep Learning Network |chapter-url=https://ieeexplore.ieee.org/document/8550851 |year=2018 |pages=1–5 |doi=10.1109/ICCTCT.2018.8550851 |isbn=978-1-5386-3702-9 |s2cid=54438440}}</ref><ref>{{Cite book |last1=Sharath |first1=S.V. |last2=Biradar |first2=Vidyadevi |last3=Prajwal |first3=M.S. |last4=Ashwini |first4=B. |title=2021 IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER) |chapter=Crowd Counting in High Dense Images using Deep Convolutional Neural Network |chapter-url=https://ieeexplore.ieee.org/document/9663716 |date=2021-11-19 |___location=Nitte, India |publisher=IEEE |pages=30–34 |doi=10.1109/DISCOVER52564.2021.9663716 |isbn=978-1-6654-1244-5 |s2cid=245707782}}</ref>
* Vehicle re-identification<ref>{{Cite journal |last1=Wang |first1=Hongbo |last2=Hou |first2=Jiaying |last3=Chen |first3=Na |date=2019 |title=A Survey of Vehicle Re-Identification Based on Deep Learning |url=https://ieeexplore.ieee.org/document/8915694 |journal=IEEE Access |volume=7 |pages=172443–172469 |doi=10.1109/ACCESS.2019.2956172 |bibcode=2019IEEEA...7q2443W |s2cid=209319743 |issn=2169-3536|doi-access=free }}</ref>
* Animal detection<ref>{{Cite book |last1=Santhanam |first1=Sanjay |last2=B |first2=Sudhir Sidhaarthan |last3=Panigrahi |first3=Sai Sudha |last4=Kashyap |first4=Suryakant Kumar |last5=Duriseti |first5=Bhargav Krishna |title=2021 International Conference on Computational Intelligence and Computing Applications (ICCICA) |chapter=Animal Detection for Road safety using Deep Learning |chapter-url=https://ieeexplore.ieee.org/document/9697287 |date=2021-11-26 |___location=Nagpur, India |publisher=IEEE |pages=1–5 |doi=10.1109/ICCICA52458.2021.9697287 |isbn=978-1-6654-2040-2 |s2cid=246663727}}</ref><ref>{{Cite book |last1=Li |first1=Nopparut |last2=Kusakunniran |first2=Worapan |last3=Hotta |first3=Seiji |title=2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON) |chapter=Detection of Animal Behind Cages Using Convolutional Neural Network |chapter-url=https://ieeexplore.ieee.org/document/9158137 |year=2020 |___location=Phuket, Thailand |publisher=IEEE |pages=242–245 |doi=10.1109/ECTI-CON49241.2020.9158137 |isbn=978-1-7281-6486-1 |s2cid=221086279}}</ref><ref>{{Cite book |last1=Oishi |first1=Yu |last2=Matsunaga |first2=Tsuneo |title=2010 IEEE International Geoscience and Remote Sensing Symposium |chapter=Automatic detection of moving wild animals in airborne remote sensing images |chapter-url=https://ieeexplore.ieee.org/document/5654227 |year=2010 |pages=517–519 |doi=10.1109/IGARSS.2010.5654227 |isbn=978-1-4244-9565-8 |s2cid=16812504}}</ref><ref>{{Cite journal |last1=Ramanan |first1=D. |last2=Forsyth |first2=D.A. |last3=Barnard |first3=K. |title=Building models of animals from video |url=https://ieeexplore.ieee.org/document/1642665 |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |year=2006 |volume=28 |issue=8 |pages=1319–1334 |doi=10.1109/TPAMI.2006.155 |pmid=16886866 |bibcode=2006ITPAM..28.1319R |s2cid=1699015 |issn=0162-8828}}</ref>
* Fish detection<ref>{{Cite journal |title=Fish Detection Using Deep Learning |journal=Applied Computational Intelligence and Soft Computing |year=2020 |language=en |doi=10.1155/2020/3738108|doi-access=free |last1=Cui |first1=Suxia |last2=Zhou |first2=Yu |last3=Wang |first3=Yonghui |last4=Zhai |first4=Lujun |volume=2020 |pages=1–13 }}</ref>
 
* [https://github.com/VisDrone/VisDrone-Dataset VisDrone] dataset by AISKYEYE team at Lab of Machine Learning and Data Mining, Tianjin University, China.
 
{{Computer vision}}
[[Category:Image sensors]]
[[Category:Imaging]]