
Chapter 3: Methodology

Introduction

This chapter gives a detailed description of the study's research design and methodology.
It describes the methods, design and process used to achieve the goals of this research. The goal
of this research is to create a machine learning model that uses less training data and yet still
achieves high accuracy in real-world scenarios where few training examples are available.

Research aims

The goal of this research is to optimize artificial neural network models by designing a model that requires
less training data while still achieving high accuracy. The research findings are intended to contribute
to the development of artificial neural networks. The research targets are:
1. Build a model that uses less training data and yet achieves high accuracy.
2. Implement an algorithm that achieves high accuracy with less training data.

Research Design

Deep learning refers to a subfield of machine learning focused on learning levels of
representation, that is, a hierarchy of features, variables or concepts in which higher-level
concepts are defined from lower-level concepts, and the same lower-level concepts can help
define other higher-level concepts.

Deep learning learns multiple levels of representation and abstraction, which helps in making sense of
information such as images, video, and text. The concept of deep learning comes from the study of
artificial neural networks; a deep learning structure is a multilayer perceptron that contains more
hidden layers. The figure shows the flow of data in a deep learning convolutional neural network
architecture.
Model Architecture

CNN

Convolutional neural networks are feed-forward neural networks specialized mainly in image data
processing. Since image features are invariant under position shifts, weight sharing and receptive
field techniques are used to construct filter banks that extract geometrically related features from
the image data. The network is structured hierarchically over many layers so that higher-level
features are obtained after each layer. To function in a similar way to perceptron classifiers, fully connected
layers are applied at the top of the architecture. Typically, the network is trained using gradient
backpropagation techniques.

A CNN's filter banks have a kernel size attribute that determines the number of weights used to extract
features from an image patch. The proposed hybrid model uses five-by-five kernels. There are
thirty-two kernels in the first convolution layer and sixty-four filters in the second.
The third layer is fully connected with one hundred and twenty-eight nodes. The fourth layer is
also fully connected, with ten nodes. The last layer is a softmax function.
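
For orientation only, the layer sizes above can be sketched as a conventional Keras model in Python. Note that this sketch uses ordinary ReLU and softmax layers rather than the rectified linear SVM units introduced later, and the 28x28x1 input shape and the pooling layers are assumptions not stated in this section:

import tensorflow as tf

def build_cnn():
    # Layer sizes follow the description above; pooling and input shape are assumed.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, (5, 5), activation="relu"),   # 32 kernels, 5x5
        tf.keras.layers.MaxPooling2D((2, 2)),                    # assumed
        tf.keras.layers.Conv2D(64, (5, 5), activation="relu"),   # 64 filters, 5x5
        tf.keras.layers.MaxPooling2D((2, 2)),                    # assumed
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),           # 128 fully connected nodes
        tf.keras.layers.Dense(10, activation="softmax"),         # 10 output nodes with softmax
    ])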

SVM and Kernels

The Support Vector Machine algorithm is used for the nodes of the network in the proposed system. It
is a highly successful and elegant mathematical approach that is widely used by the machine learning
community. It poses an optimization problem with a convex error surface. It is typically combined
with kernel methods to allow classification of data sets that are not linearly separable.
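
As an illustration of kernel methods only (scikit-learn is used here as a stand-in library, not as the implementation of the proposed system), the following sketch classifies a data set that is not linearly separable using an RBF-kernel SVM:

from sklearn.svm import SVC
from sklearn.datasets import make_circles

# Concentric circles: not separable by any straight line in the input space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
clf = SVC(kernel="rbf", C=1.0)   # the kernel implicitly lifts the data
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))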

Hybrid architecture

Before proceeding, the notation below will be used in this section.

w^l_{ij} denotes weight j of node i in layer l; this weight connects to node j in layer l−1.
z^l_i denotes the linear activation value of node i in layer l.
o^l_i denotes the rectified activation value of node i in layer l.

Here o^l_i = φ(z^l_i), where φ(x) = max(0, x) is the rectifier function and z^l_i = ⟨o^{l−1}, w^l_i⟩. Omitting
the node subscript denotes a vector; for example, o^l denotes the vector output of layer l.
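
A minimal NumPy sketch of this notation for a single fully connected layer, assuming W stores the weight vectors w^l_i as rows and bias terms are omitted:

import numpy as np

def layer_forward(o_prev, W):
    # o_prev: outputs o^{l-1} of the previous layer; W: one weight vector w^l_i per row.
    z = W @ o_prev              # linear activations z^l_i = <o^{l-1}, w^l_i>
    o = np.maximum(0.0, z)      # rectified activations o^l_i = phi(z^l_i)
    return z, o

z1, o1 = layer_forward(np.array([0.5, -1.0, 2.0]), np.random.randn(4, 3))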

The emphasis is on stacking SVMs on top of each other in multi-layered SVM networks. Several
options are available:
1. Use non-linear kernels in non-linear SVM units.
2. Use linear SVM units but with sufficient output non-linearity.

The model uses linear SVM units and passes their outputs through a rectifier. This makes them
just like typical rectified linear units (ReLU), with the change that these units are
hooked to a local hinge loss during training, which results in a rectified linear SVM unit.

That is, in a typical linear SVM unit the actual output ŷ is passed to the hinge loss without any
non-linearity, whereas in a rectified linear SVM unit the output is first passed through a rectifier and then into
the hinge loss. Rectified linear SVM units therefore do not learn when they are off, because the gradient of the
rectifier is zero, and they also stop learning when they are on if they can already clearly classify the input, because the
derivative of the hinge loss is zero for points that the node classifies with sufficient margin.

The architecture therefore consists of typical ReLUs that are hooked to local hinge-loss functions, plus
a single linear SVM output layer that is also hooked to its own hinge-loss function. This results in a
deep rectified linear SVM network that can, in theory, go infinitely deep, because the sign signal can
continue to propagate through an arbitrarily large number of layers without the gradient
vanishing or exploding.
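
A minimal NumPy sketch of one rectified linear SVM unit and the gradient behaviour described above (single sample, bias omitted; this is an illustration under those assumptions, not the authors' implementation):

import numpy as np

def rectified_svm_unit(x, w, y):
    # x: input vector, w: unit weights, y: target sign in {-1, +1}
    z = np.dot(w, x)                 # linear activation
    o = max(0.0, z)                  # rectifier output
    loss = max(0.0, 1.0 - o * y)     # local hinge loss on the rectified output
    # The unit does not learn when it is off (z <= 0) or when it already
    # classifies the input with enough margin (loss == 0).
    if z <= 0.0 or loss == 0.0:
        grad_w = np.zeros_like(w)
    else:
        grad_w = -y * x              # hinge-loss gradient through the rectifier (slope 1)
    return o, loss, grad_w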

Back Sign Signal Propagation (BSSP)

Every layer l has a local hinge-loss function L^l. As long as we have the sign signal y^l_i for node i,
the derivatives with respect to the node's weights w^l_i can be calculated quickly, and the weights can be
updated by ordinary gradient descent. If the layer is an output layer, the sign signal can easily be
obtained from the desired output; for example, we can get it from the desired one-hot output vector.
The signal feeds into the local hinge loss as

L^l_i = max(0, 1 − o^l_i y^l_i)

The sign signal carries information about the side on which the input should fall for the
given node: negative means that the input should fall on the negative side of the hyperplane, and positive means that it
should fall on the positive side. The signal can therefore take three values, +1, 0
and −1, where zero means no signal. Each node in the layer tries to change its weights in order to learn
an approximate maximum-margin hyperplane that separates the outputs as indicated by the signal. Therefore, by
hooking these nodes to a local hinge loss and feeding them the sign signal as the target, each
layer is able to optimize itself given the sign signal from the layer above.
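
For the output layer, one way the sign signal could be derived from the desired one-hot vector is sketched below; the convention of mapping 0 entries to −1 is an assumption for illustration:

import numpy as np

def one_hot_to_sign(one_hot):
    # 1 entries become +1, 0 entries become -1.
    return np.where(np.asarray(one_hot) > 0, 1.0, -1.0)

print(one_hot_to_sign([0, 0, 1, 0]))   # [-1. -1.  1. -1.]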

In a feed-forward architecture, layer l feeds into layer l+1. The sign signal is generated so that the
local loss in layer l+1 is also minimized while it learns from the outputs of layer l.

Since the relative magnitudes of the derivatives are important but are discarded by the sign
operation, there is a need to ensure that nodes with strong gradient feedback still learn more
quickly than those with weak gradient feedback. This can be achieved by letting only the node with the
maximum absolute delta learn and setting all the other deltas to zero. In short, this is competitive credit
assignment, or CCA, because nodes with strong absolute deltas are credited or blamed for the
overall performance of a given layer. Note that CCA is not applied at the output layer; it is
used only for hidden units. This means that competition between nodes sharing the same inputs, the depth-column
nodes, for modelling the inputs is based on the absolute magnitude of their deltas (feedback)
and not on the magnitude of their respective activations. Modelling the input is assigned to the nodes
that receive the strongest feedback, which helps neurons specialize during learning.

The reason CCA is important is that, given two nodes A and B that receive feedback deltas of
significantly different magnitude, the node with the stronger absolute feedback is allowed to learn
while the other is suppressed. For example, if node A receives a feedback value of +100
while node B receives a feedback value of +0.2, passing them through the sign operation yields +1 in
both cases. Without CCA, node A, which receives the stronger feedback, and node B, which receives the
weaker feedback, would learn at the same rate, which can cause the learning algorithm to fail to
learn complex functions. CCA breaks this by allowing A to learn while freezing B for that
case. Where B receives feedback of greater magnitude than A, then B will learn while A is frozen.
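
A minimal sketch of the CCA rule over one depth column of feedback deltas (the handling of ties and the exact delta convention are assumptions):

import numpy as np

def cca(deltas):
    # Keep only the delta with the largest absolute value in the depth column;
    # all other deltas are set to zero so that only that node learns.
    deltas = np.asarray(deltas, dtype=float)
    out = np.zeros_like(deltas)
    winner = np.argmax(np.abs(deltas))
    out[winner] = deltas[winner]
    return out

print(cca([100.0, 0.2]))   # [100.   0.]  -> node A learns, node B is frozen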

Data Collection

MNIST stands for the Modified National Institute of Standards and Technology. The original MNIST
dataset is a collection of handwritten digits assembled and published by Yann LeCun, Corinna Cortes,
and Christopher J.C. Burges. It is presented by its authors as a solid dataset on which machine
learning and pattern recognition techniques can be evaluated without intensive data pre-processing
effort. This paper uses the MNIST dataset provided by the Google TensorFlow package for
benchmarking machine learning algorithms.

The MNIST dataset can be accessed by installing TensorFlow directly from the Python tensorflow
package, or by using a machine learning library that already contains MNIST as an
integrated dataset. The total size of the MNIST dataset is about 11.6 megabytes. Table 1 provides a
complete overview of the file sizes of the training and test sets provided by the MNIST dataset.

Name                        Description          Number   Size
train-images-idx3-ubyte.gz  Training set images  60 000   9.9 megabytes
train-labels-idx1-ubyte.gz  Training set labels  60 000   28.6 kilobytes
t10k-images-idx3-ubyte.gz   Test set images      10 000   1.6 megabytes
t10k-labels-idx1-ubyte.gz   Test set labels      10 000   4.5 kilobytes
Every observation in the dataset is a 28-by-28-pixel greyscale image, and each observation is
labelled with one of ten category labels. The standard training and test sets contain 60,000 and
10,000 observations respectively.
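
For example, the dataset can be loaded directly through the TensorFlow package mentioned above; the resulting shapes match the 60,000/10,000 split listed in Table 1:

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
print(x_train.shape, y_train.shape)   # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)     # (10000, 28, 28) (10000,)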

Research Procedure

The research's main objective is to design and implement a model that reduces the amount of training data
required while still achieving high accuracy. The procedure is illustrated in the figure below.

Data Collection and Feature Extraction

The collected data consist of measurements, and each sample should include the corresponding
features of its source. Each sample can be divided into two parts: the input features and the
class label.

After sufficient data have been obtained, the samples should be divided into a training dataset and a
test dataset. The former is used to construct the prediction model, while the latter is used to
verify and improve the performance of the model.
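A minimal sketch of this split, using scikit-learn and placeholder data; the 80/20 ratio is an illustrative assumption, not the split used for MNIST:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 4)           # placeholder feature matrix
y = np.random.randint(0, 2, 100)     # placeholder class labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)   # 80% training, 20% test
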
Feature Selection and Scaling

In practice, the data used for machine learning can contain hundreds of features.
Leaving out relevant features or retaining irrelevant ones can lead to poor
predictor performance. The goal of feature selection is to pick the optimal subset with the fewest
number of features that contribute most to the accuracy of learning.

There are typically three alternative approaches to feature selection, namely filter, wrapper and
embedded methods, depending on the relationship between the feature selection process and the model design.
The filter approach evaluates feature importance independently of the proposed model.
The wrapper method takes the quality of the prediction into account when scoring features.
The embedded approach combines feature selection and prediction accuracy within its
method. The stopping conditions for the different algorithms depend on the choice of search
algorithm, the feature evaluation criteria, and the specific requirements of the application.

Many machine learning algorithms, such as ANN, SVR, and KNN, are sensitive to the scale of the input
space. Therefore, normalization should be completed before training starts.
In other words, all input values should be rescaled to lie between −1 and 1 or between 0
and 1.
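
For greyscale image inputs such as MNIST, this normalization step can be as simple as the following sketch, assuming raw pixel values in the range [0, 255]:

import numpy as np

def scale_01(x):
    # Rescale pixel values from [0, 255] into [0, 1].
    return x.astype(np.float32) / 255.0

def scale_pm1(x):
    # Rescale pixel values from [0, 255] into [-1, 1].
    return x.astype(np.float32) / 127.5 - 1.0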

Model Selection

There may be several candidate machine learning models, and model selection should consider both
accuracy and complexity requirements. This paper uses the model discussed in the model architecture
section above.

Hyperparameter Setting and Model Training

Hyperparameters are parameters whose values are set before the learning process starts.
Typical hyperparameters include the number of hidden layers and neurons in the CNN, and the SVM's kernel
function, regularization coefficients and kernel parameters. A collection of suitable hyperparameters should
be carefully selected to optimize performance. Grid search, random search, and
Bayesian optimization are the main hyperparameter optimization methods. In this paper,
the final hyperparameter values are obtained using a grid search. Grid search is a systematic
search method that goes through all the candidate parameter combinations and takes the
best-performing parameters as the end result.

Model parameters, in contrast, are the parameters learned from the training samples. It is worth mentioning
that different learning methods have different model parameters. During the model training process,
model parameters such as weights and biases are learned automatically.
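
The grid search step can be sketched as follows; the hyperparameter grid and the train_and_evaluate helper are hypothetical placeholders, not the settings used for the hybrid model:

from itertools import product

def train_and_evaluate(lr, c):
    # Placeholder: train the model with these settings and return validation accuracy.
    return 0.0

learning_rates = [1e-2, 1e-3]     # illustrative grid values
regularization = [1e-3, 1e-4]     # illustrative grid values

best_params, best_acc = None, -1.0
for lr, c in product(learning_rates, regularization):
    acc = train_and_evaluate(lr, c)
    if acc > best_acc:
        best_params, best_acc = (lr, c), acc
print("best hyperparameters:", best_params)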

Model Evaluation

In general, the performance of optimization-based machine learning is evaluated on samples in the
test data set that do not appear in the model training cycle. Assessment metrics include estimation
accuracy, generalization ability, and complexity. In this work, the test data are used to measure
the model's accuracy after training.
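
As a simple sketch, test-set classification accuracy can be computed as the fraction of correct predictions:

import numpy as np

def accuracy(y_true, y_pred):
    # Fraction of test samples whose predicted label matches the true label.
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

print(accuracy([3, 1, 4, 1], [3, 1, 5, 1]))   # 0.75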

Based on the evaluated performance, the model and the hyperparameters are adjusted and the
prediction model is further improved. High accuracy can be achieved once the optimal model has
been built.

Research limitations

Use cases for machine learning are increasing, and there are many applications that use
supervised and unsupervised learning as well as classification and regression algorithms. This work is
limited to supervised learning and classification with convolutional neural networks, yet there are
many other use cases beyond classification that involve training data optimization, namely
unsupervised learning and regression problems.
