Machine Learning Transfer Learning Survey Explains Key Concepts & Real-World Apps

Abstract:
Machine learning and data mining techniques have been used in numerous real-world
applications. Traditional machine learning methodologies assume that the training
data and testing data are taken from the same domain, so that the input feature space
and data distribution characteristics are the same. However, in some real-world
machine learning scenarios, this assumption does not hold. There are cases where
training data is expensive or difficult to collect. Therefore, there is a need to create
high-performance learners trained with more easily obtained data from different
domains. This methodology is referred to as transfer learning. This survey paper formally
defines transfer learning, presents information on current solutions, and reviews
applications of transfer learning. Lastly, it lists software downloads for various
transfer learning solutions and discusses possible future research work. The transfer
learning solutions surveyed are independent of data size and can be applied to big
data environments.
Keywords: Transfer learning, Convolutional Neural Networks (ConvNets), ImageNet, VGG16,
Machine Learning.
Background:
Machine learning and data mining techniques have been used successfully in
numerous real-world scenarios, from early-onset cancer detection to autonomous cars. They
have been widely applied to complex applications where patterns from past
information (training data) can be extracted in order to predict future outcomes [1]. Classic
machine learning is characterized by training and testing data that have the same input feature
characteristics and distribution. When the two differ, the performance of a predictive
model may degrade [2].
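The effect of a distribution mismatch can be illustrated with a deliberately tiny sketch. The one-dimensional data, the midpoint-threshold classifier, and the shift of +1.5 are all assumptions made for illustration; none of them comes from the cited works.

```python
def fit_threshold(xs, ys):
    """Place a decision threshold midway between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2.0


def accuracy(threshold, xs, ys):
    """Fraction of points whose predicted label (x > threshold) matches y."""
    correct = sum(int(x > threshold) == y for x, y in zip(xs, ys))
    return correct / len(xs)


# Hypothetical source domain: class 0 around 0.0, class 1 around 2.0.
source_x = [-0.5, 0.0, 0.5, 1.5, 2.0, 2.5]
labels = [0, 0, 0, 1, 1, 1]

# Hypothetical target domain: same labels, every feature shifted by +1.5.
target_x = [x + 1.5 for x in source_x]

threshold = fit_threshold(source_x, labels)           # learned on source only
source_acc = accuracy(threshold, source_x, labels)    # same distribution
target_acc = accuracy(threshold, target_x, labels)    # shifted distribution

print(threshold, source_acc, target_acc)
```

The threshold learned on the source data separates the source classes perfectly, but it misclassifies part of the shifted target data, which is the kind of degradation described above.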
As algorithms and applications grow more complex, training high-performance learners
demands high-end hardware. A large dataset is also necessary for training learners that
achieve high accuracy. Collecting and processing the data to keep the input
features and distribution homogeneous is an expensive and time-consuming endeavor.
Therefore, there is a need to create robust machine learning models for a target domain trained
from a related source domain. This is the fundamental motivation behind transfer learning.
Transfer Learning:
Transfer learning aims to create high-performance learners and models for a target
domain trained from a related source domain. The need for transfer learning arises when
training data is limited: the data may be scarce or inaccessible, or collecting and
curating it may be too expensive. With more and more data
repositories being released every day, using models trained on existing repositories that are
related to the target domain makes transfer learning a valuable approach that can cut
down costs and hardware requirements.
Transfer learning focuses on improving a model by storing knowledge gained while
solving one problem and applying it to a different but related domain. One can find
ample real-world examples of transfer learning that are non-technical in nature. Consider two
people who aspire to become proficient at the piano. One is completely new
to music, while the other is already familiar with the guitar. The person with extensive
knowledge of the guitar will be able to pick up the piano with greater
ease than the newcomer, because the guitar player can transfer existing skills
and knowledge to learn the piano more effectively [3]. Transfer learning has also been applied
successfully to machine learning solutions. Consider predicting sentiment through text analysis
of product reviews, and suppose that labeled data on digital cameras is more plentiful
than labeled data on food reviews. If both training and target data are drawn from the
subdomain of camera reviews, traditional machine learning models will produce accurate
predictions. If a model trained on camera review data were applied to predict the sentiment of
food reviews, the results would likely be inaccurate because the domains
are not the same. However, camera reviews and food reviews are similar in
that both are written in the same language, with similar words, strings, sentences,
and structure. Both describe people's experience with products and
are used to gauge satisfaction [84]. Because the two domains are related, transfer learning can
be applied to improve the performance of the model. An alternative way to view data
domains in a transfer learning environment is that the target and training data exist as
subdomains linked by a high-level common domain. Piano playing and guitar playing are
subdomains of the music domain. Food reviews and digital camera reviews are subdomains
of the review domain. High-level common domains determine how the subdomains are related.
How Transfer Learning Works:
Consider image classification as an example. We proceed in two steps:
Step 1 : We take a large source dataset such as the ImageNet challenge subset, with about
1.2 million images in 1,000 categories (cats, trucks, computers, etc.), and train a deep
convolutional neural network (CNN) to classify the images of this source dataset.
Step 2 : We use this CNN as a starting point and train it to classify a target dataset, for
example a dataset of medical images. This procedure is called fine-tuning from a warm start.
Another option is to extract the representation of the images at the penultimate layer of the
deep neural network and train any standard classification algorithm on these new features. This
is called feature extraction. There are many other approaches, such as penalization of the
weights by similarity or multi-task learning.
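The feature-extraction option can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the procedure of any surveyed paper: `FROZEN_W` is a hypothetical stand-in for the weights of a network already trained on the source dataset, the four target samples are made up, and logistic regression plays the role of the "standard classification algorithm".

```python
import math

# Frozen "pre-trained" layer: hypothetical weights standing in for a network
# trained on the source dataset. They are never updated below.
FROZEN_W = [[1.0, -1.0], [0.5, 0.5]]


def extract_features(x):
    """Penultimate-layer representation: frozen linear map followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_W]


def train_head(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression classifier on the extracted features."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b


def predict(w, b, x):
    """Classify a raw input by passing it through the frozen extractor first."""
    f = extract_features(x)
    return int(sum(wi * fi for wi, fi in zip(w, f)) + b > 0.0)


# Tiny hypothetical target dataset (two samples per class).
target_x = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
target_y = [1, 1, 0, 0]

features = [extract_features(x) for x in target_x]
w, b = train_head(features, target_y)
predictions = [predict(w, b, x) for x in target_x]
print(predictions)
```

Only the small classifier on top is trained here. Fine-tuning would differ in that the pre-trained weights themselves would also be updated on the target data, starting from their source-trained values.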
Transfer Learning applications:
The works surveyed in this paper demonstrate that transfer learning has been applied to
many real-world applications. A number of examples pertain to natural
language processing, specifically sentiment classification, text
classification, spam email detection, and multiple-language text classification.
Other well-represented transfer learning applications include image classification and video
concept classification. Applications addressed less often in the surveyed papers
include WiFi localization classification, muscle fatigue classification, drug efficacy
classification, human activity classification, software defect classification, and cardiac
arrhythmia classification.
Transfer learning has been successfully applied to many machine learning applications
such as image classification [4, 5], software defect classification [7], text sentiment classification [6],
multi-language text classification [8, 9], and human activity classification [10]. The
application-specific solutions tend to be related to the fields of natural language processing and image
processing. In the literature, there are also a number of transfer learning solutions that are specific
to recommendation systems. Recommendation systems provide users with
recommendations or ratings for a particular domain (e.g. movies, books, etc.), which are based
on historical information. However, when the system does not have sufficient historical
information (referred to as the data sparsity issue in Moreno et al. [11]), the
recommendations are not reliable. In cases where the system does not have sufficient
domain data to make reliable predictions (for example, when a movie has just been released), there is a
need to use previously collected information from a different domain (books, for example).
This problem has been directly addressed using transfer learning
methodologies in the paper by Moreno et al. [11].
Transfer learning solutions continue to be applied to a diverse range of real-world
applications, and in some cases the applications are quite obscure. In head
pose classification, a learner trained with previously captured, labeled head positions is used to
predict a new head position. Head pose classification is used for determining the attentiveness
of drivers, analyzing social behavior, and studying human interaction with robots. Head positions
captured in the source training data will have different head tilt ranges and angles than those of the
predicted target. The paper by Rajagopal et al. [12] addresses these head pose classification issues
using transfer learning solutions.
Transfer learning makes use of pre-trained models, which retain the weights and biases
learned from the dataset they were trained on. The core
idea is that learned features are transferable to different data.
Pre-trained models are especially favored in the data science community for creating powerful
image classifiers from limited datasets. Keras Applications provides a set of convolutional
neural networks that have been pre-trained on the ImageNet dataset and are open source.
Convolutional Neural Networks:
Convolutional networks (ConvNets) have recently enjoyed great success in large-scale
image and video recognition [14,13], which has become possible due to large public
image repositories, such as ImageNet [15], and high-performance computing systems, such as
GPUs or large-scale distributed clusters.
In particular, an important role in the advance of deep visual recognition architectures
has been played by the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC)
(Russakovsky et al., 2014), which has served as a testbed for a few generations of large-scale
image classification systems, from high-dimensional shallow feature encodings (Perronnin et
al., 2010) (the winner of ILSVRC-2011) to deep ConvNets [14] (the winner of ILSVRC-2012).
ImageNet is a project that aims to categorize and label images into roughly 22,000 discrete object
categories for computer vision research. The aim of the ILSVRC challenge is to train a model
that can classify an input image into 1,000 object categories. Models are trained on 1.2 million
training images provided in the dataset; 50,000 images are used to validate the model and
another 100,000 images are used for testing. Keras Applications provides an open-source set of
pre-trained networks that have performed strongly on the ImageNet challenge.
These models have been used successfully to classify images outside the ImageNet dataset
via transfer learning.
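ILSVRC classification results, including the VGG figures quoted later in this survey, are commonly reported as top-5 error: a prediction counts as correct if the true label is among the five highest-scoring classes. The sketch below shows one way to compute it. The function name `top_k_error`, the six-class score table, and the labels are assumptions made for illustration.

```python
def top_k_error(scores, true_labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, true_labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)


# Hypothetical scores for 3 images over 6 classes (ILSVRC uses 1,000 classes).
scores = [
    [0.05, 0.10, 0.50, 0.20, 0.10, 0.05],
    [0.30, 0.25, 0.20, 0.15, 0.08, 0.02],
    [0.02, 0.08, 0.15, 0.20, 0.25, 0.30],
]
true_labels = [2, 5, 4]

top1 = top_k_error(scores, true_labels, k=1)
top5 = top_k_error(scores, true_labels, k=5)
print(top1, top5)
```

In this made-up example the third image is wrong under top-1 but correct under top-5, so the top-5 error is lower than the top-1 error.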
With ConvNets becoming more of a commodity in the computer vision field, a number
of attempts have been made to improve the original architecture of Krizhevsky et al. [14] in a bid
to achieve better accuracy. For instance, the best-performing submissions to ILSVRC-2013
utilized a smaller receptive window size and a smaller stride in the first convolutional layer. Another
line of improvements dealt with training and testing the networks densely over the whole image
and over multiple scales.
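To see what a smaller window and stride change in practice, the standard output-size formula for a convolution, floor((n + 2p - k) / s) + 1, can be evaluated for two settings. The 227-pixel input width and the two kernel/stride pairs below are illustrative assumptions, not values taken from the text above.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Illustrative first-layer settings on a hypothetical 227-pixel-wide input.
large_window = conv_output_size(227, kernel_size=11, stride=4)
small_window = conv_output_size(227, kernel_size=7, stride=2)

print(large_window, small_window)
```

With the smaller window and stride, the first layer produces a feature map about twice as wide, so finer spatial detail from the input is retained for later layers.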
VGG 16 ConvNet:
The VGG network architecture was introduced by Simonyan and Zisserman in their 2014
paper [13], Very Deep Convolutional Networks for Large-Scale Image Recognition, in which
they address another important aspect of ConvNet architecture design: its depth. To this end,
Simonyan and Zisserman fix the other parameters of the architecture and steadily increase the depth
of the network by adding more convolutional layers, which is feasible due to the use of very
small (3×3) convolution filters in all layers. The network also uses max-pooling layers and ends
with a softmax function. It achieves 7.5% top-5 error on ILSVRC-2012-val and 7.4% top-5 error on
ILSVRC-2012-test. The authors, from the Visual Geometry Group at the University of Oxford, have
released the model and its weights publicly.
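One way to see why very small filters make extra depth affordable is to compare a stack of 3×3 layers with a single larger filter covering the same receptive field. The sketch below assumes stride-1 convolutions, equal input and output channel counts, and no bias terms. The helper names and the channel count of 64 are hypothetical.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels):
    """Weights in one conv layer with `channels` inputs and outputs (no bias)."""
    return kernel_size * kernel_size * channels * channels


channels = 64  # hypothetical channel count, chosen only for illustration

rf_two = stacked_receptive_field(2)    # two 3x3 layers see a 5x5 region
rf_three = stacked_receptive_field(3)  # three 3x3 layers see a 7x7 region

stacked = 3 * conv_weights(3, channels)  # three stacked 3x3 layers
single = conv_weights(7, channels)       # one 7x7 layer, same receptive field

print(rf_two, rf_three, stacked, single)
```

Three stacked 3×3 layers cover the same 7×7 region as one 7×7 layer with roughly half the weights (27C² versus 49C² for C channels), and they apply a nonlinearity after each layer.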
VGG 16 Architecture
Layers :
Convolution : Convolutional layers slide filters across the image to detect edges, lines, blobs of
color, and other visual elements. The hyperparameters of a convolutional layer are the number of
filters, filter size, stride, padding, and the activation function used to introduce nonlinearity.
MaxPooling : Pooling layers reduce the spatial dimensions of their input by downsampling it.
Max pooling replaces each n x n area of the input with the maximum value
from that area.
Dropout : Dropout is a simple and effective technique to prevent the neural network from
overfitting during training. It is implemented by keeping a neuron active only with
some probability p and setting its output to 0 otherwise. This prevents neurons from co-adapting
too closely, so the network cannot rely on any single neuron.
Flatten : Flattens the output of the convolutional layers into a vector to feed into the Dense layers.
Dense : Dense layers are traditional fully connected layers that map the features produced by the
convolutional layers to class scores, followed by an activation function (softmax is used here).
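The layer types listed above can be sketched in plain Python on a toy input. This is a minimal illustration, not the VGG 16 implementation: the 5×5 image, the 2×2 edge filter, and the dense weights are made-up values, and the dropout function rescales the kept values by 1/p ("inverted dropout"), which is an assumption beyond what the text describes.

```python
import random


def conv2d(image, kernel):
    """Stride-1 convolution without padding (the kernel is not flipped,
    as is usual in ConvNet implementations)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]


def max_pool(feature_map, n=2):
    """Replace each non-overlapping n x n block with its maximum value."""
    return [[max(feature_map[i + a][j + b] for a in range(n) for b in range(n))
             for j in range(0, len(feature_map[0]) - n + 1, n)]
            for i in range(0, len(feature_map) - n + 1, n)]


def flatten(feature_map):
    """Turn a 2D feature map into a flat vector."""
    return [v for row in feature_map for v in row]


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob, otherwise set it to 0.
    Kept values are scaled by 1/keep_prob (inverted dropout, an assumption)."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Toy 5x5 image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # hypothetical filter responding to vertical edges

feature_map = conv2d(image, kernel)  # 4x4 map, nonzero only at the edge
pooled = max_pool(feature_map)       # downsampled to 2x2
flat = flatten(pooled)               # vector of length 4
dropped = dropout([1.0] * 8, keep_prob=0.5, rng=random.Random(0))
outputs = dense(flat, [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]], [0.0, 0.0])

print(feature_map, pooled, flat, outputs)
```

The convolution responds only where the edge is, pooling halves each spatial dimension while keeping that response, and the dense layer combines the flattened values into one score per output unit.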
Activation functions :
Activation layers apply a nonlinear operation to the output of the other layers such as
convolutional layers or dense layers.
ReLU Activation : ReLU, or Rectified Linear Unit, computes the function $f(x)=\max(0,x)$ to
threshold the activation at 0.
Softmax Activation : Softmax function is applied to the output layer to convert the scores into
probabilities that sum to 1.
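Both activation functions are short enough to write out directly. The sketch below is a plain-Python illustration with made-up scores. Subtracting the maximum score inside `softmax` is a common numerical-stability step and is an addition not described in the text.

```python
import math


def relu(x):
    """Rectified Linear Unit: f(x) = max(0, x)."""
    return max(0.0, x)


def softmax(scores):
    """Convert scores to probabilities; subtracting the max avoids overflow."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, -1.0, 0.5]  # hypothetical outputs of a previous layer
activated = [relu(s) for s in scores]
probs = softmax(scores)

print(activated, probs)
```

ReLU zeroes the negative score and leaves the others unchanged, while softmax keeps the ordering of the scores and rescales them so that they sum to 1.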
Conclusion:
The subject of transfer learning is a well-researched area, as evidenced by more than 700
academic papers addressing the topic in the last 5 years. In this survey, the background of
transfer learning, how it works, and its applications are discussed. Convolutional
neural networks, particularly for image classification, are found to be a promising avenue for
transfer learning applications. The architecture of VGG 16, a popular source convolutional
neural network, is also discussed. With the recent proliferation of sensors being deployed in cell
phones, vehicles, buildings, roadways, and computers, larger and more diverse information is
being collected. The diversity in data collection makes heterogeneous transfer learning
solutions more important moving forward. Larger data collection sizes highlight the potential for
big data solutions being deployed concurrently with current transfer learning solutions. How the
diversity and large size of sensor data integrate into transfer learning solutions is an interesting
topic for future research. Another area of future work pertains to the scenario where the output
label space differs between domains. With new datasets being captured and made
available, this topic could be a needed area of focus for the future. Lastly, the literature has
very few transfer learning solutions addressing the scenario of unlabeled source and unlabeled
target data, which is certainly an area for expanded research.
References:
1. Witten IH, Frank E. Data mining: practical machine learning tools and techniques. 3rd ed.
San Francisco: Morgan Kaufmann Publishers; 2011.
2. Shimodaira H. Improving predictive inference under covariate shift by weighting the
log-likelihood function. J Stat Plan Inf. 2000;90(2):227–44.
3. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng.
2010;22(10):1345–59.
4. Zhu Y, Chen Y, Lu Z, Pan S, Xue G, Yu Y, Yang Q. Heterogeneous transfer learning for
image classification. In: Proceedings of the national conference on artificial intelligence, vol.
2. 2011. p. 1304–9.
5. Kulis B, Saenko K, Darrell T. What you saw is not what you get: domain adaptation using
asymmetric kernel transforms. In: IEEE 2011 conference on computer vision and pattern
recognition. 2011. p. 1785–92.
6. Wang C, Mahadevan S. Heterogeneous domain adaptation using manifold alignment. In:
Proceedings of the 22nd international joint conference on artificial intelligence, vol. 2. 2011.
p. 541–46.
7. Nam J, Kim S. Heterogeneous defect prediction. In: Proceedings of the 2015 10th
joint meeting on foundations of software engineering. 2015. p. 508–19.
8. Prettenhofer P, Stein B. Cross-language text classification using structural
correspondence learning. In: Proceedings of the 48th annual meeting of the association for
computational linguistics. 2010. p. 1118–27.
9. Zhou JT, Tsang IW, Pan SJ, Tan M. Heterogeneous domain adaptation for multiple classes.
In: International conference on artificial intelligence and statistics. 2014. p. 1095–103.
10. Harel M, Mannor S. Learning from multiple outlooks. In: Proceedings of the 28th
international conference on machine learning. 2011. p. 401–8.
11. Moreno O, Shapira B, Rokach L, Shani G. TALMUD: transfer learning for multiple
domains. In: Proceedings of the 21st ACM international conference on information and
knowledge management. 2012. p. 425–34.
12. Rajagopal AN, Subramanian R, Ricci E, Vieriu RL, Lanz O, Ramakrishnan KR, Sebe N.
Exploring transfer learning approaches for head pose classification from multi-view
surveillance images. Int J Comput Vis. 2014;109(1–2):146–67.
13. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image
recognition. arXiv:1409.1556.
14. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep
convolutional neural networks. In: NIPS. 2012. p. 1106–14.
15. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L. ImageNet: a large-scale
hierarchical image database. In: Proc. CVPR. 2009.
