Deep Learning

www.DLR.de • IMF

BigSkyEarth – Sorrento October 2016

Deep Learning – An interdisciplinary View of Learning Algorithms for Remote Sensing Image Analysis

Who am I?

➢ Dimitrios Marmanis
➢ PhD Candidate
    ➢ German Aerospace Center → DLR-IMF
    ➢ Technical University of Munich → TUM
➢ Supervisors
    ➢ U. Stilla - TUM
    ➢ M. Datcu - DLR-IMF
    ➢ K. Schindler - ETHZ
➢ Interests
    ➢ Deep Learning applications on EO data
    ➢ Advancement of Deep Learning models
➢ Focus
    ➢ Machine Learning
    ➢ Remote Sensing
    ➢ Computer Vision

Presentation Outline

➢ Brief Introduction to Deep Learning & CNN
➢ Notable Breakthroughs in Computer Vision
➢ Important Findings in Remote Sensing & Astronomy
➢ Intriguing Properties of CNNs
➢ How to Get Into Deep Learning


What is Deep Learning?

Multiple definitions exist; however, they all agree on the following aspects:

➢ Multiple layers of processing units
➢ End-to-end automatic learning of features
➢ Hierarchical feature representation → from low- to high-level abstraction
➢ Supervised and unsupervised frameworks exist


Traditional Methods vs. Deep Learning


Historical Evolution of Deep Learning Models



How Do CNNs Work?

➢ Architecture designed for processing data with spatial consistency – local trainable kernels
➢ Learn hierarchical representations → depth of network
➢ Efficient for large image extents – local computations through shared weights (trainable kernels)
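The points on local kernels and shared weights can be illustrated with a minimal sketch. It is plain Python rather than any particular library, and the toy image, the 2x2 edge kernel and the function names are assumptions made for illustration only.

```python
def conv2d_valid(image, kernel):
    """Slide one small kernel over the image ('valid' mode, stride 1).

    Strictly this is cross-correlation, which is what CNN layers compute.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # The same kernel weights are reused at every position (weight sharing).
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


def parameter_counts(height, width, kernel_size):
    """Weights needed for one output map: fully connected layer vs. one shared kernel."""
    out_h = height - kernel_size + 1
    out_w = width - kernel_size + 1
    dense = (height * width) * (out_h * out_w)
    shared = kernel_size * kernel_size
    return dense, shared


# Toy 4x4 image with a vertical edge, and a hypothetical 2x2 edge kernel.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d_valid(image, kernel)
dense_weights, shared_weights = parameter_counts(256, 256, 3)
print(feature_map)                    # responds only where the edge is
print(dense_weights, shared_weights)  # billions of weights vs. 9
```

The kernel responds only at the column where the intensity changes. For a 256x256 image, a single shared 3x3 kernel needs 9 weights, whereas a fully connected mapping to the same output size would need several billion.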

The First Breakthrough 2012

➢ AlexNet: large CNN → won the 2012 ImageNet Large-Scale Visual Recognition Challenge (= Olympics of Computer Vision)
➢ First CNN breakthrough in a “real-problem” task
➢ CNN achieved an error rate of 15.4% → second-best entry had an error rate of 26.2%
➢ Trained on 2 GPUs for ~5 days
➢ Learned content from 15 million images

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).


Object Classification and Detection in Photos

➢ Train a CNN on many millions of image examples
➢ Current systems achieve superhuman performance → error rate of 3.57% in the classification task (human error rate = 5.1%)
➢ Knowledge acquired during training is transferable to a plethora of different vision-related tasks (transfer learning)

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385.
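ImageNet classification results of this kind are usually reported as top-5 error: a prediction counts as correct when the true label is among the five highest-scoring classes. The sketch below shows how such a metric can be computed. The scores and labels are made up, and k=2 is used only because the toy example has six classes.

```python
def top_k_error(scores, true_labels, k=5):
    """Fraction of samples whose true label is not among the k highest scores."""
    misses = 0
    for sample_scores, label in zip(scores, true_labels):
        ranked = sorted(range(len(sample_scores)),
                        key=lambda idx: sample_scores[idx],
                        reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)


# Made-up class scores for 4 samples over 6 classes.
scores = [
    [0.10, 0.50, 0.20, 0.10, 0.05, 0.05],
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],
    [0.05, 0.05, 0.10, 0.20, 0.50, 0.10],
    [0.60, 0.10, 0.10, 0.10, 0.05, 0.05],
]
true_labels = [1, 2, 0, 0]

top_1 = top_k_error(scores, true_labels, k=1)
top_2 = top_k_error(scores, true_labels, k=2)
print(top_1, top_2)
```

Allowing more guesses per image can only lower the error, which is why top-5 figures look much better than top-1 figures for the same model.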

Grayscale Image Colorization

➢ Colorize multimedia images & video frames
➢ Impressive results for images similar to ImageNet, but not necessarily for every type of image

Zhang, R., Isola, P., & Efros, A. A. (2016). Colorful Image Colorization. arXiv preprint arXiv:1603.08511.
http://richzhang.github.io/colorization/

Automatic Handwriting Generation

➢ Learn the relationship between the pen movement (coordinates) and the respective letters
➢ With the gained knowledge, new text can be generated on the fly in a learned style

Graves, A. (2013). Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850.
http://www.cs.toronto.edu/~graves/handwriting.html

Automatic Caption Generation

➢ Generate coherent sentences describing the image content
➢ Method: break the problem into parts → object detection with CNNs and sentence generation with LSTMs (Long Short-Term Memory networks)
➢ Works on images and video as well

Karpathy, A., & Fei-Fei, L. (2015). Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3128-3137).
http://cs.stanford.edu/people/karpathy/deepimagesent/

Image and Video Style Transfer

➢ Image style transfer (texture transfer)
➢ Main intuition: retain the structure of the image (content) and “superimpose” a particular style (texture)
➢ Used for artistic style transfer (paintings) and photorealistic image styling

Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576.
Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2414-2423).
http://www.genekogan.com/works/style-transfer.html
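The content/style split can be made concrete with a small sketch. In the cited Gatys et al. papers, style is described through Gram matrices of CNN feature maps, while content is compared position by position. The tiny feature maps below are made up, and normalisation constants are omitted.

```python
def gram_matrix(feature_maps):
    """Channel-by-channel inner products; spatial positions are summed away.

    feature_maps: list of channels, each a flat list of activations.
    """
    channels = len(feature_maps)
    gram = [[0.0] * channels for _ in range(channels)]
    for a in range(channels):
        for b in range(channels):
            gram[a][b] = float(sum(x * y for x, y in
                                   zip(feature_maps[a], feature_maps[b])))
    return gram


def content_loss(generated, content):
    """Position-wise squared difference: sensitive to image structure."""
    return 0.5 * sum((g - c) ** 2
                     for gen_ch, con_ch in zip(generated, content)
                     for g, c in zip(gen_ch, con_ch))


# Two channels, three spatial positions (made-up activations).
original = [[1, 2, 3], [4, 5, 6]]
# Same activations with the spatial positions shuffled.
shuffled = [[3, 1, 2], [6, 4, 5]]

print(gram_matrix(original))             # [[14.0, 32.0], [32.0, 77.0]]
print(gram_matrix(shuffled))             # identical: the texture statistics agree
print(content_loss(shuffled, original))  # 6.0: the structure differs
```

Shuffling positions leaves the Gram matrix unchanged but changes the content loss, which is why the Gram matrix captures texture while ignoring layout.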

Detecting Location of Image on a Global Scale

➢ Train a large CNN on images and their respective locations in the world (grid-like locations)
➢ Training dataset: ~91 million images and respective locations
➢ Test the model on 2.3 million images (Flickr) – indoor & outdoor scenes
➢ Tested against humans – model achieves superhuman performance

Weyand, T., Kostrikov, I., & Philbin, J. (2016). PlaNet - Photo geolocation with convolutional neural networks. arXiv preprint arXiv:1602.05314.
https://www.geoguessr.com/
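Treating location as a grid turns geolocation into a classification problem: every grid cell is one class. The sketch below uses a uniform 10-degree latitude/longitude grid, which is a simplification chosen for illustration (the cited paper partitions the globe adaptively). The function names and the cell size are assumptions.

```python
def latlon_to_cell(lat, lon, cell_size_deg=10.0):
    """Map a coordinate to the index of its grid cell (the class label)."""
    rows = int(180 / cell_size_deg)
    cols = int(360 / cell_size_deg)
    # Clamp so that lat = 90 or lon = 180 fall into the last cell.
    row = min(int((lat + 90.0) / cell_size_deg), rows - 1)
    col = min(int((lon + 180.0) / cell_size_deg), cols - 1)
    return row * cols + col


def cell_center(cell_id, cell_size_deg=10.0):
    """Turn a predicted class back into a coordinate (the cell centre)."""
    cols = int(360 / cell_size_deg)
    row, col = divmod(cell_id, cols)
    lat = -90.0 + (row + 0.5) * cell_size_deg
    lon = -180.0 + (col + 0.5) * cell_size_deg
    return lat, lon


# Approximate coordinates of Sorrento, Italy.
label = latlon_to_cell(40.63, 14.37)
print(label, cell_center(label))
```

The network's softmax output over all cells can then be read as a probability map over the globe, and the cell size sets the best achievable localisation accuracy.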

RS Image Classification with Pre-Trained CNNs

➢ Proposed model → two-step approach for semantic classification of RS images
➢ Approach: feature extraction from a pre-trained CNN model (OverFeat), with a trainable CNN model on top
➢ The pre-trained model generates its feature descriptors from the ImageNet dataset – no knowledge of remote sensing images
➢ Performance tested on the UC Merced Land Use classification benchmark → 21 semantic classes such as sparse residential, medium residential, buildings, tennis courts, etc.

Marmanis, D., Datcu, M., Esch, T., & Stilla, U. (2016). Deep Learning Earth Observation Classification Using ImageNet Pretrained Networks. IEEE Geoscience and Remote Sensing Letters, 13(1), 105-109.
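The two-step idea (frozen feature extractor, trainable model on top) can be sketched in a few lines. This is not the published pipeline: a hand-written function stands in for the pre-trained network, a logistic regression stands in for the trainable CNN, and the toy patches are made up.

```python
import math


def frozen_features(patch):
    """Stand-in for a pre-trained network: fixed, never updated during training."""
    h, w = len(patch), len(patch[0])
    mean_intensity = sum(sum(row) for row in patch) / (h * w)
    edge_energy = sum(abs(row[c + 1] - row[c])
                      for row in patch for c in range(w - 1)) / (h * (w - 1))
    return [mean_intensity, edge_energy]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_classifier(features, labels, lr=0.5, steps=2000):
    """Step 2: only this small model on top of the fixed features is trained."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for i, v in enumerate(x):
                grad_w[i] += error * v / n
            grad_b += error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def predict(weights, bias, x):
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0


def flat_patch(value):
    return [[value] * 4 for _ in range(4)]


def striped_patch(low, high):
    return [[low, high, low, high] for _ in range(4)]


# Toy data: class 0 = smooth patches, class 1 = striped patches.
patches = [flat_patch(0.2), flat_patch(0.5), flat_patch(0.8),
           striped_patch(0.0, 1.0), striped_patch(0.2, 0.8), striped_patch(0.3, 0.9)]
labels = [0, 0, 0, 1, 1, 1]

features = [frozen_features(p) for p in patches]   # step 1: extract once
weights, bias = train_classifier(features, labels)  # step 2: train on top
predictions = [predict(weights, bias, x) for x in features]
print(predictions)
```

Because the extractor is never updated, features can be computed once and cached, so training on a small remote sensing dataset stays cheap.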

Semantic Annotation of VHSR Images using CNNs

➢ Use an ensemble of pre-trained computer vision models for annotating remotely sensed data → resolution 5 to 9 cm/pixel
➢ Data intensities have extensive intra-class variability with an overall decreased inter-class separation – increased importance of topology and context
➢ We showed that different pre-trained models over the same architecture yield complementary outcomes → when combined, they achieve superior performance
➢ Structured models seem not to improve results – the CNN already produces structured predictions

Marmanis, D., Wegner, J. D., Galliani, S., Schindler, K., Datcu, M., & Stilla, U. (2016). Semantic segmentation of aerial images with an ensemble of CNNs. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 3
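One common way to combine complementary models is to average their per-pixel class probabilities before taking the most likely class. The sketch below assumes this averaging scheme for illustration. The class names and probabilities are made up.

```python
def average_probabilities(model_outputs):
    """Average the class-probability vectors of several models for one pixel."""
    n_models = len(model_outputs)
    n_classes = len(model_outputs[0])
    return [sum(output[k] for output in model_outputs) / n_models
            for k in range(n_classes)]


def argmax(values):
    return max(range(len(values)), key=lambda k: values[k])


# Hypothetical class names and per-model probabilities for a single pixel.
classes = ["building", "road", "tree"]
model_a = [0.50, 0.45, 0.05]
model_b = [0.30, 0.60, 0.10]
model_c = [0.40, 0.50, 0.10]

combined = average_probabilities([model_a, model_b, model_c])
print(classes[argmax(model_a)])   # building
print(classes[argmax(combined)])  # road
```

In this toy case a single model narrowly prefers one class, while the averaged vote prefers the class on which the other two models agree.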




Results on ISPRS Benchmark

Classification of Galaxies Based on Morphology

➢ Competition with open-source code → Galaxy Zoo Challenge
➢ Task: human-like performance on classification of galaxies → 37 galaxy classes
➢ Labels were acquired through a crowdsourcing project → Galaxy Zoo website
➢ Best model: CNN with 7 layers (4 convolutional & 3 fully connected) – no use of a pre-trained model (this aspect was not investigated)
➢ RMSE → 0.07492 → highly accurate model
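The RMSE quoted above measures how far predicted class values lie from the crowdsourced ones. A minimal sketch of the metric follows. The two-class toy values are made up, and treating the labels as per-class fractions is an assumption.

```python
import math


def rmse(predicted, target):
    """Root-mean-square error over all samples and all classes."""
    squared = [(p - t) ** 2
               for pred_row, target_row in zip(predicted, target)
               for p, t in zip(pred_row, target_row)]
    return math.sqrt(sum(squared) / len(squared))


# Two galaxies, two classes each (made-up fractions).
target = [[1.0, 0.0], [0.5, 0.5]]
predicted = [[0.9, 0.1], [0.5, 0.5]]

score = rmse(predicted, target)
print(round(score, 4))
```

Because errors are squared before averaging, a few large mistakes weigh more than many small ones.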

Intriguing Property of CNNs

Network Initialization through Transfer Learning

➢ In various cases there is no need to train a network from scratch (random initialization) → using an information-rich pre-trained network may provide better results and reduced training time
➢ Pre-trained models → state-of-the-art performance in a variety of vision-related tasks
➢ Plethora of freely available, ready-to-use pre-trained models
➢ The same network with a different initialization and/or a different architecture will probably produce slightly different results → non-convex optimization problem
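Why initialization matters in a non-convex setting can be seen on a one-dimensional toy loss. The polynomial below is an arbitrary non-convex function chosen for illustration, not a network loss. The learning rate and step count are assumptions.

```python
def loss(x):
    """A toy non-convex loss with two separate minima."""
    return x ** 4 - 3 * x ** 2 + x


def loss_gradient(x):
    return 4 * x ** 3 - 6 * x + 1


def gradient_descent(start, lr=0.01, steps=2000):
    x = start
    for _ in range(steps):
        x -= lr * loss_gradient(x)
    return x


# Same optimizer, same loss, two different initializations.
x_from_left = gradient_descent(-2.0)
x_from_right = gradient_descent(2.0)
print(x_from_left, loss(x_from_left))
print(x_from_right, loss(x_from_right))
```

Both runs converge, but to different minima with different loss values. A good starting point, such as pre-trained weights, can therefore change where training ends up.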

Model Zoo: An Open Repository for Pre-Trained CNN Models

➢ Online repository with dozens of pre-trained CNN models
➢ Model specializations vary → visual classification, image similarity, robotics, speech, 3D reconstruction, contour detection, etc.
➢ Standardized format for easy sharing and use → through the Caffe library
➢ Unrestricted use (BVLC license)
➢ Ready to run with minimal effort

https://github.com/BVLC/caffe/wiki/Model-Zoo
https://bitbucket.org/deeplab/deeplab-public/

How to Get Into Deep Learning

Software Libraries

➢ More than 50 different libraries
➢ Each library has a different approach and scope → e.g. scientific experimentation, application-oriented, ease of use, etc.
➢ Google is taking over through its newly released library → TensorFlow
➢ There are a few alternatives that won’t die soon → support from IBM and Facebook
➢ Important to think before you make a choice

Three Most Important Software Libraries

➢ Caffe
    ➢ Written mainly in C++
    ➢ Bindings in Matlab & Python
    ➢ Mainly targeting vision problems
    ➢ Very fast and modular
➢ TensorFlow / Theano
    ➢ Symbolic expression compiler
    ➢ Allows automatic differentiation
    ➢ Symbolic flow graphs (TensorFlow)
    ➢ Distributed computation (TensorFlow)
    ➢ Most popular library – supported by Google
➢ Torch
    ➢ Matlab-like environment – LuaJIT
    ➢ Very advanced framework
    ➢ Allows changes to the models on the fly
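Automatic differentiation means that the library derives gradients from the model definition, so no derivative has to be written by hand. The sketch below shows the idea with forward-mode dual numbers in plain Python. This is a simplification for illustration: the libraries above work on symbolic graphs and typically use reverse-mode differentiation. The class and function names are assumptions.

```python
class Dual:
    """A number that carries its value and its derivative together."""

    def __init__(self, value, deriv=0.0):
        self.value = float(value)
        self.deriv = float(deriv)

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._wrap(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._wrap(other)
        # Product rule, applied automatically at every multiplication.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def derivative(func, x):
    """Evaluate func at x while tracking d(func)/dx alongside the value."""
    return func(Dual(x, 1.0)).deriv


def polynomial(x):
    return 3 * x * x + 2 * x + 1


slope_at_2 = derivative(polynomial, 2.0)
print(slope_at_2)  # 14.0, matching the hand-derived 6x + 2 at x = 2
```

The function `polynomial` is written once as ordinary code, and the derivative follows from operator overloading.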

Further Reading on Deep Learning

Very rich online repository:

➢ Books
➢ Important publications
➢ Courses
➢ Video lectures
➢ Tutorials

https://github.com/priyaank/deep-learning


The End

Questions?