Revision as of 04:09, 26 September 2022 by Citation bot (edit summary: Alter: url, journal, template type. URLs might have been anonymized. Add: pmid, isbn, arxiv, pages, journal, volume, eprint, doi-access, bibcode, s2cid, year, authors 1-4. editors 1-2. Removed proxy/dead URL that duplicated identifier. Removed access-date with no URL. Removed parameters. Formatted dashes. Some additions/deletions were parameter name changes. Suggested by Headbomb, #UCB_toolbar)
Revision as of 04:11, 26 September 2022 by Headbomb (edit summary: ce)
Line 9:
== Problems with small objects ==
* Modern-day object detection algorithms such as You Only Look Once (YOLO)<ref>{{cite arXiv \|last1=Redmon \|first1=Joseph \|last2=Divvala \|first2=Santosh \|last3=Girshick \|first3=Ross \|last4=Farhadi \|first4=Ali \|date=2016-05-09 \|title=You Only Look Once: Unified, Real-Time Object Detection \|eprint=1506.02640}}</ref><ref>{{cite arXiv \|last1=Redmon \|first1=Joseph \|last2=Farhadi \|first2=Ali \|date=2016-12-25 \|title=YOLO9000: Better, Faster, Stronger \|eprint=~~arxiv.~~1612.08242}}</ref><ref>{{cite arXiv \|last1=Redmon \|first1=Joseph \|last2=Farhadi \|first2=Ali \|date=2018-04-08 \|title=YOLOv3: An Incremental Improvement \|eprint=~~arxiv.~~1804.02767}}</ref><ref>{{cite arXiv \|last1=Bochkovskiy \|first1=Alexey \|last2=Wang \|first2=Chien-Yao \|last3=Liao \|first3=Hong-Yuan Mark \|date=2020-04-22 \|title=YOLOv4: Optimal Speed and Accuracy of Object Detection \|eprint=~~arxiv.~~2004.10934}}</ref><ref>{{cite arXiv \|last1=Wang \|first1=Chien-Yao \|last2=Bochkovskiy \|first2=Alexey \|last3=Liao \|first3=Hong-Yuan Mark \|date=2021-02-21 \|title=Scaled-YOLOv4: Scaling Cross Stage Partial Network \|eprint=~~arxiv.~~2011.08036}}</ref><ref>{{cite arXiv \|last1=Li \|first1=Chuyi \|last2=Li \|first2=Lulu \|last3=Jiang \|first3=Hongliang \|last4=Weng \|first4=Kaiheng \|last5=Geng \|first5=Yifei \|last6=Li \|first6=Liang \|last7=Ke \|first7=Zaidan \|last8=Li \|first8=Qingyuan \|last9=Cheng \|first9=Meng \|last10=Nie \|first10=Weiqiang \|last11=Li \|first11=Yiduo \|last12=Zhang \|first12=Bo \|last13=Liang \|first13=Yufei \|last14=Zhou \|first14=Linyuan \|last15=Xu \|first15=Xiaoming \|date=2022-09-07 \|title=YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications \|eprint=~~arxiv.~~2209.02976}}</ref><ref>{{cite arXiv \|last1=Wang \|first1=Chien-Yao \|last2=Bochkovskiy \|first2=Alexey \|last3=Liao \|first3=Hong-Yuan Mark \|date=2022-07-06 \|title=YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors \|eprint=~~arxiv.~~2207.02696}}</ref> heavily use convolution layers to learn [[Feature (computer vision)\|features]]. As an object passes through convolution layers, its size gets reduced. Therefore, a small object can disappear after several layers and become undetectable.
* Sometimes, the shadow of an object is detected as part of the object itself.<ref>{{Cite journal \|last1=Zhang \|first1=Mingrui \|last2=Zhao \|first2=Wenbing \|last3=Li \|first3=Xiying \|last4=Wang \|first4=Dan \|date=2020-12-11 \|title=Shadow Detection Of Moving Objects In Traffic Monitoring Video \|url=https://ieeexplore.ieee.org/document/9338958 \|journal=2020 IEEE 9th Joint International Information Technology and Artificial Intelligence Conference (ITAIC) \|volume=9 \|location=Chongqing, China \|publisher=IEEE \|pages=1983–1987 \|doi=10.1109/ITAIC49862.2020.9338958 \|isbn=978-1-7281-5244-8\|s2cid=231824327 }}</ref> So, the bounding box tends to be centred on the shadow rather than on the object. In the case of vehicle detection, [[pedestrian]] and two-wheeler detection suffer because of this.
* At present, [[Unmanned aerial vehicle\|drones]] are very widely used in aerial imagery.<ref>{{Cite journal \|title=Interactive workshop "How drones are changing the world we live in" \|url=https://ieeexplore.ieee.org/document/7486437 \|journal=2016 Integrated Communications Navigation and Surveillance (ICNS) \|year=2016 \|location=Herndon, VA \|publisher=IEEE \|pages=1–17 \|doi=10.1109/ICNSURV.2016.7486437 \|isbn=978-1-5090-2149-9\|s2cid=21388151 }}</ref> They are equipped with hardware ([[sensor]]s) and software ([[algorithm]]s) that help maintain a stable position during flight. In windy conditions, the drone automatically makes fine movements to maintain its position, which changes the view near the image boundary, so new objects may appear there.
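The first problem listed above (an object's size shrinking as it passes through convolution layers) can be illustrated with simple arithmetic. The following is a minimal sketch, not taken from any of the cited detectors: it assumes a backbone whose cumulative downsampling strides are 2, 4, 8, 16 and 32, and the function names are hypothetical. It only divides the object's width in pixels by each stride to estimate how many feature-map cells the object still covers.

```python
# Illustrative sketch only: the stride values are an assumption about a
# typical convolutional backbone, not a property of a specific detector.
ASSUMED_STRIDES = (2, 4, 8, 16, 32)


def footprint_on_feature_maps(object_size_px, strides=ASSUMED_STRIDES):
    """Map each cumulative stride to the object's extent in feature-map cells."""
    return {stride: object_size_px / stride for stride in strides}


def first_subcell_stride(object_size_px, strides=ASSUMED_STRIDES):
    """Return the first stride at which the object covers less than one cell,
    or None if it still covers at least one cell at every listed stride."""
    for stride in strides:
        if object_size_px / stride < 1.0:
            return stride
    return None


if __name__ == "__main__":
    for size in (12, 64):
        cells = footprint_on_feature_maps(size)
        print(f"{size}px object:", cells, "sub-cell at stride", first_subcell_stride(size))
```

Under these assumed strides, a 12-pixel object already covers less than one cell at stride 16, while a 64-pixel object still spans two cells at stride 32.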
Overall, these problems affect classification, detection, and eventually tracking accuracy.
Line 34:
==== Auto learning anchors ====
Selecting anchor size plays a vital role in small object detection.<ref>{{cite arXiv \|last1=Zhong \|first1=Yuanyi \|last2=Wang \|first2=Jianfeng \|last3=Peng \|first3=Jian \|last4=Zhang \|first4=Lei \|date=2020-01-26 \|title=Anchor Box Optimization for Object Detection \|eprint=~~arxiv.~~1812.00469}}</ref> Instead of hand-picking it, use algorithms that identify it based on the data set. YOLOv5 uses a [[K-means clustering\|K-means algorithm]] to define anchor size.
==== Tiling approach during training and inference ====
Line 40:
==== Feature Pyramid Network (FPN) ====
Use a feature [[Pyramid (image processing)\|pyramid]] network<ref>{{cite arXiv \|last1=Lin \|first1=Tsung-Yi \|last2=Dollár \|first2=Piotr \|last3=Girshick \|first3=Ross \|last4=He \|first4=Kaiming \|last5=Hariharan \|first5=Bharath \|last6=Belongie \|first6=Serge \|date=2017-04-19 \|title=Feature Pyramid Networks for Object Detection \|eprint=~~arxiv.~~1612.03144}}</ref> to learn features at multiple scales: e.g., Twin Feature Pyramid Networks (TFPN),<ref>{{Cite journal \|last1=Liang \|first1=Yi \|last2=Changjian \|first2=Wang \|last3=Fangzhao \|first3=Li \|last4=Yuxing \|first4=Peng \|last5=Qin \|first5=Lv \|last6=Yuan \|first6=Yuan \|last7=Zhen \|first7=Huang \|title=TFPN: Twin Feature Pyramid Networks for Object Detection \|url=https://ieeexplore.ieee.org/document/8995365 \|journal=2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI) \|year=2019 \|location=Portland, OR, USA \|publisher=IEEE \|pages=1702–1707 \|doi=10.1109/ICTAI.2019.00251 \|isbn=978-1-7281-3798-8\|s2cid=211211764 }}</ref> Extended Feature Pyramid Network (EFPN).<ref>{{cite arXiv \|last1=Deng \|first1=Chunfang \|last2=Wang \|first2=Mengmeng \|last3=Liu \|first3=Liang \|last4=Liu \|first4=Yong \|date=2020-04-09 \|title=Extended Feature Pyramid Network for Small Object Detection \|eprint=~~arxiv.~~2003.07021}}</ref> FPN helps to sustain features of small objects against convolution layers.
=== Add-on techniques ===
Instead of modifying existing methods, some add-on techniques can be placed directly on top of existing approaches to detect smaller objects. One such technique is Slicing Aided Hyper Inference (SAHI).<ref>{{cite arXiv \|last1=Akyon \|first1=Fatih Cagatay \|last2=Altinuc \|first2=Sinan Onur \|last3=Temizel \|first3=Alptekin \|date=2022-07-12 \|title=Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection \|eprint=~~arxiv.~~2202.06934}}</ref> The image is sliced into multiple overlapping patches of different sizes. [[Hyperparameter (machine learning)\|Hyper-parameters]] define their dimensions. Then patches are resized, while maintaining the aspect ratio during fine-tuning. These patches are then provided for training the model.
=== Well-Optimised techniques for small object detection ===
Various deep learning techniques are available that focus on such object detection problems: e.g., Feature-Fused SSD,<ref>{{Cite journal \|last1=Cao \|first1=Guimei \|last2=Xie \|first2=Xuemei \|last3=Yang \|first3=Wenzhe \|last4=Liao \|first4=Quan \|last5=Shi \|first5=Guangming \|last6=Wu \|first6=Jinjian \|editor-first1=Junyu \|editor-first2=Hui \|editor-last1=Dong \|editor-last2=Yu \|date=2018-04-10 \|title=Feature-fused SSD: fast detection for small objects \|url=https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10615/106151E/Feature-fused-SSD-fast-detection-for-small-objects/10.1117/12.2304811.full \|journal=Ninth International Conference on Graphic and Image Processing (ICGIP 2017) \|publisher=SPIE \|volume=10615 \|pages=381–388 \|doi=10.1117/12.2304811\|arxiv=1709.05054 \|bibcode=2018SPIE10615E..1EC \|isbn=9781510617414 \|s2cid=20592770 }}</ref> YOLO-Z.<ref>{{cite arXiv \|last1=Benjumea \|first1=Aduen \|last2=Teeti \|first2=Izzeddin \|last3=Cuzzolin \|first3=Fabio \|last4=Bradley \|first4=Andrew \|date=2021-12-23 \|title=YOLO-Z: Improving small object detection in YOLOv5 for autonomous vehicles \|eprint=~~arxiv.~~2112.11798}}</ref> Such methods work on "How to sustain features of small objects while they pass through convolution networks."
== Other applications ==

Small object detection: Difference between revisions