Image segmentation: Difference between revisions

Line 352:
[[Pulse-coupled networks|Pulse-coupled neural networks (PCNNs)]] are neural models derived from models of the cat's visual cortex and developed for high-performance [[biomimetic]] [[image processing]]. In 1989, Reinhard Eckhorn introduced a neural model to emulate the mechanism of a cat's visual cortex. The Eckhorn model provided a simple and effective tool for studying the visual cortex of small mammals, and was soon recognized as having significant application potential in image processing. In 1994, John L. Johnson adapted the Eckhorn model into an image processing algorithm, which he termed the Pulse-Coupled Neural Network.<ref>{{cite journal|last1=Johnson|first1=John L.|date=September 1994|title=Pulse-coupled neural nets: translation, rotation, scale, distortion, and intensity signal invariance for images|doi=10.1364/AO.33.006239|pmid=20936043|publisher=OSA|volume=33|journal=Applied Optics|number=26|pages=6239–6253|bibcode=1994ApOpt..33.6239J}}</ref> Since then, PCNNs have been used for a variety of image processing applications, including image segmentation, feature generation, face extraction, motion detection, region growing, and noise reduction.

A PCNN is a two-dimensional neural network. Each neuron in the network corresponds to one pixel in the input image and receives that pixel's color information (e.g. intensity) as an external stimulus. Each neuron is also connected to its neighboring neurons and receives local stimuli from them. The external and local stimuli are combined in an internal activation system, which accumulates the stimuli until the activation exceeds a dynamic threshold, at which point the neuron emits a pulse. Through iterative computation, the PCNN neurons produce a temporal series of pulse outputs. This series contains information about the input image and can be used for various image processing applications, such as image segmentation and feature generation. Compared with conventional image processing methods, PCNNs have several significant merits, including robustness against noise, invariance to geometric variations in input patterns, and the ability to bridge minor intensity variations in input patterns.
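The pulse mechanism described above can be sketched as follows. This is a minimal, simplified illustration and not Johnson's or Eckhorn's exact formulation: the feeding input is taken to be the normalized pixel intensity with no leaky accumulation, the linking input is the number of 8-connected neighbors that pulsed in the previous iteration, and the function names and the parameters <code>beta</code>, <code>v_theta</code> and <code>alpha_theta</code> are assumptions chosen for the example.

```python
import numpy as np

def pcnn_pulses(image, steps=12, beta=0.2, v_theta=20.0, alpha_theta=0.2):
    """Simplified PCNN sketch: returns one binary pulse map per iteration."""
    s = np.asarray(image, dtype=float)
    if s.max() > 0:
        s = s / s.max()                  # external stimulus, scaled to [0, 1]
    h, w = s.shape
    y = np.zeros_like(s)                 # pulse output of the previous iteration
    theta = np.ones_like(s)              # dynamic threshold of each neuron
    pulses = []
    for _ in range(steps):
        # Local stimulus: count of 8-connected neighbors that pulsed last time.
        p = np.pad(y, 1)
        link = sum(p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0))
        u = s * (1.0 + beta * link)      # internal activity
        y = (u > theta).astype(float)    # pulse where activity exceeds threshold
        # Threshold decays exponentially and jumps after a neuron fires.
        theta = np.exp(-alpha_theta) * theta + v_theta * y
        pulses.append(y.copy())
    return pulses

def first_fire_time(pulses):
    """Iteration index at which each neuron first pulsed (-1 if it never did)."""
    stack = np.stack(pulses)
    return np.where(stack.any(axis=0), stack.argmax(axis=0), -1)

# Usage: a bright region next to a dark region pulses at different times,
# so the first-firing map can serve as a crude segmentation.
img = np.full((6, 8), 0.3)
img[:, :4] = 0.9
segments = first_fire_time(pcnn_pulses(img))
```

In this sketch, pixels of similar intensity tend to fire in the same iteration, which is why grouping pixels by firing time yields a rough segmentation.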
 
In 2015, [[convolutional neural network|convolutional neural networks]] achieved state-of-the-art results in semantic segmentation.<ref>{{Cite journal |last=Long |first=Jonathan |last2=Shelhamer |first2=Evan |last3=Darrell |first3=Trevor |date=2015 |title=Fully Convolutional Networks for Semantic Segmentation |url=https://openaccess.thecvf.com/content_cvpr_2015/html/Long_Fully_Convolutional_Networks_2015_CVPR_paper.html |pages=3431–3440}}</ref> [[U-Net]] is an architecture that takes an image as input and outputs a label for each pixel.<ref>{{cite arXiv|last1=Ronneberger|first1=Olaf|last2=Fischer|first2=Philipp|last3=Brox|first3=Thomas|title=U-Net: Convolutional Networks for Biomedical Image Segmentation|eprint=1505.04597|date=2015|class=cs.CV}}</ref> U-Net was initially developed to detect cell boundaries in biomedical images. It follows the classical [[autoencoder]] architecture and, as such, contains two sub-structures: an encoder and a decoder. The encoder follows the traditional stack of convolutional and max pooling layers, which increases the receptive field from layer to layer, and is used to capture the context in the image. The decoder uses transposed convolution layers for upsampling so that the output dimensions are close to those of the input image. Skip connections are placed between convolution and transposed convolution layers of the same shape in order to preserve details that would otherwise be lost.
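The encoder, decoder and skip-connection data flow can be sketched with array shapes alone. The following is a toy illustration under stated assumptions, not the published U-Net: the learned convolutions in each stage are omitted, the transposed convolution uses one shared 2×2 kernel per channel, the skip connections are modeled as channel concatenation, all weights are random and untrained, and the function names are hypothetical.

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling with stride 2 on a (C, H, W) array; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def transposed_conv2x2(x, kernel):
    """Stride-2 transposed convolution applying one shared 2x2 kernel to every channel.

    Each input value is expanded into a 2x2 output block scaled by the kernel,
    so spatial size doubles: (C, H, W) -> (C, 2H, 2W).
    """
    return np.stack([np.kron(channel, kernel) for channel in x])

def tiny_unet_labels(image, n_classes=3, seed=0):
    """Toy U-shaped data flow; H and W must be divisible by 4. Weights are random."""
    rng = np.random.default_rng(seed)
    kernel = rng.normal(size=(2, 2))

    # Encoder: pooling halves the resolution (convolutions omitted in this sketch).
    enc1 = image                                   # (C, H, W)
    enc2 = max_pool2x2(enc1)                       # (C, H/2, W/2)
    bottleneck = max_pool2x2(enc2)                 # (C, H/4, W/4)

    # Decoder: upsample, then concatenate the encoder features of the same shape.
    up2 = transposed_conv2x2(bottleneck, kernel)   # (C, H/2, W/2)
    dec2 = np.concatenate([enc2, up2], axis=0)     # skip connection -> (2C, H/2, W/2)
    up1 = transposed_conv2x2(dec2, kernel)         # (2C, H, W)
    dec1 = np.concatenate([enc1, up1], axis=0)     # skip connection -> (3C, H, W)

    # 1x1 convolution to per-class scores, then one label per pixel.
    weights = rng.normal(size=(n_classes, dec1.shape[0]))
    scores = np.einsum("kc,chw->khw", weights, dec1)
    return scores.argmax(axis=0)                   # (H, W) label map

labels = tiny_unet_labels(np.random.default_rng(1).random((1, 16, 16)))
```

Because the weights are untrained, the resulting labels are meaningless; the sketch only shows that the output label map has the same height and width as the input and where the skip connections join the decoder.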
 
In addition to pixel-level semantic segmentation tasks, which assign a category to each pixel, modern segmentation applications include instance-level semantic segmentation tasks, in which each individual object in a given category must be uniquely identified, as well as panoptic segmentation tasks, which combine these two tasks to provide a more complete scene segmentation.<ref name="Panoptic Segmentation"/>
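The difference between the three outputs can be illustrated on a small label map. The sketch below is only an illustration of the data representations: it derives instance identifiers from 4-connected components of one category, which is an assumption made for simplicity (practical instance segmentation methods must also separate touching objects), and the category numbering and function name are hypothetical.

```python
import numpy as np
from collections import deque

def label_instances(mask):
    """Assign ids 1..n to the 4-connected components of a boolean mask (0 = none)."""
    ids = np.zeros(mask.shape, dtype=int)
    count = 0
    for start in zip(*np.nonzero(mask)):
        if ids[start]:
            continue                      # pixel already belongs to an instance
        count += 1
        ids[start] = count
        queue = deque([start])
        while queue:                      # breadth-first flood fill
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not ids[nr, nc]):
                    ids[nr, nc] = count
                    queue.append((nr, nc))
    return ids, count

# Semantic segmentation: one category per pixel
# (assumed numbering: 0 = sky, 1 = car, 2 = road).
semantic = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [2, 2, 2, 2, 2, 2],
])

# Instance segmentation: each individual car receives its own id.
instance_ids, n_cars = label_instances(semantic == 1)

# Panoptic segmentation: a (category, instance id) pair for every pixel;
# uncountable regions such as sky and road keep instance id 0.
panoptic = np.stack([semantic, instance_ids], axis=-1)
```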