
A Hybrid Machine Learning Approach for Daily Prediction of Solar Radiation

Mehrnoosh Torabi1(✉), Amir Mosavi2,3,4, Pinar Ozturk4,
Annamaria Varkonyi-Koczy3,5, and Vajda Istvan3

1 Hormozgan Regional Electric Co, Bandarabbas, Iran
2 Institute of Advanced Studies Koszeg, iASK, Kőszeg, Hungary
3 Institute of Automation, Kando Kalman, Faculty of Electrical Engineering,
  Obuda University, Budapest 1431, Hungary
4 Department of Computer Science, Norwegian University of Science
  and Technology, Trondheim, Norway
5 Department of Mathematics and Informatics, J. Selye University,
  Komarno, Slovakia

Abstract. In this paper, we present a Cluster-Based Approach (CBA) that
utilizes the support vector machine (SVM) and an artificial neural network
(ANN) to estimate and predict the daily horizontal global solar radiation. In the
proposed CBA-ANN-SVM approach, we first conduct a clustering analysis and
divide the global solar radiation data into clusters according to the calendar
months. Our approach aims at maximizing the homogeneity of data within the
clusters and the heterogeneity between the clusters. The proposed CBA-ANN-SVM
approach is validated and its precision is compared with that of the ANN and SVM
techniques. The mean absolute percentage error (MAPE) of the proposed
approach was lower than those of ANN and SVM.

Keywords: Global solar radiation · Prediction · Support vector machine (SVM) ·
Machine learning · Artificial neural networks (ANN)

1 Introduction

Renewable energy systems aim to satisfy ever-increasing energy demands in a
sustainable manner by reducing greenhouse gas emissions and the risk of climate
change [1, 2]. Among renewable energy sources, solar energy is generally considered
the most promising source, partly due to its availability [3, 4]. As a consequence,
we are seeing an increase in solar energy technologies. However, maximizing
the utilization and efficiency of solar energy remains a difficult task, partly due to
challenges in collecting and accurately analyzing solar radiation data. Nevertheless,
solar energy projects can benefit greatly from reliable solar radiation
information. In fact, global solar radiation is a highly relevant parameter in monitoring,
simulating, predicting, and sizing solar energy technologies [5–9]. Thus, it is

© Springer Nature Switzerland AG 2019

G. Laukaitis (Ed.): INTER-ACADEMIA 2018, LNNS 53, pp. 1–9, 2019.

essential to be able to accurately predict solar radiation using proper techniques,
even in the absence of adequate data.
Several data mining techniques have been employed in business and medical sciences
[10], and in recent times the focus has been on exploring approaches to determining
patterns in data sets that can be used for description and prediction. Data mining
is considered an inductive machine learning (ML) technique, where past data
are utilized for training and learning the model of interest. This learning is represented
by determining the relationships among the variables and extracting meaningful
patterns. The objective of data mining is to use these meaningful patterns for the purpose
of accurate prediction [11, 17]. Artificial neural networks (ANNs) and support vector
machines (SVMs), two well-known data mining techniques, have been successfully
used to estimate global solar radiation. For example, Mubiru and Banda [12] used ANN
technique for estimation of monthly mean daily global solar irradiation at several
locations in Uganda. Jiang [13] proposed an ANN model to estimate monthly mean
daily global solar radiation in different cities of China. The evaluation of their model
shows better precision than the empirical models examined in the paper. Najafi et al.
[14] developed a coupled ANN algorithm to predict daily solar radiation in a number of
cities in Iran. It was found that the proposed algorithm achieves a better performance
than the Ångström-Prescott model. Mathioulakis et al. [15] applied an advanced ANN
technique to daily prediction. In their work, a number of different sets of
input parameters were used. They further proposed ANN as an effective method to
predict solar radiation for global estimation. Azeez [16] studied monthly
prediction using maximum ambient temperature, sunshine duration, and relative
humidity as the required input parameters. In addition, Mosavi et al. [17]
reviewed similar methods of prediction. In another study, Chen et al. [18] evaluated the
usage of SVMs for predicting the monthly mean based on the site’s minimum and
maximum temperature employing different functions of SVM with promising results.
Furthermore, Chen et al. [19] proposed a number of duration-based SVM algorithms
which showed superior results. Mosavi and Varkonyi [20] also utilized SVMs to
predict solar radiation considering the ambient temperature. Chen and Li [21] assessed
the performance of 20 SVM models for estimation and reported that SVM-based models
could achieve better accuracy than ANN models.
Guermoui et al. [22] evaluated the utility of two support vector regression
(SVR) models, based on the radial basis function and the polynomial basis function, for
prediction of monthly mean daily global solar radiation. Their findings indicated that SVR
based on the polynomial basis function has better accuracy than SVR based on the
radial basis function [23, 27, 28]. A number of authors have also attempted to achieve
better accuracy in estimating solar radiation using hybrid approaches. For example,
Wu et al. [24] integrated the time delay neural network (TDNN) with autoregressive
and moving average (ARMA) algorithm to predict hourly solar radiation. The hybrid
model provides a higher capability compared to either TDNN model or the ARMA
model alone. Similarly, Moeini et al. [25] proposed a hybrid approach of fuzzy and
hidden Markov models to effectively predict the solar irradiation. Their results
demonstrated that the predictions of the proposed model are close to the training data
set. Halabi et al. [29] developed a hybrid approach by integrating simulated annealing
(SA) and genetic programming (GP). The results of their sensitivity analysis showed
that the suggested model provides accurate predictions. Guermoui et al. [22] compared
the precision of a hybrid SVM model with ANN and GP. As an alternative, we propose
a new concept to estimate global solar radiation on a horizontal surface, using a cluster-
based approach (CBA). Our CBA utilizes both ANN and SVM approaches to accu-
rately estimate daily global solar radiation, and this new approach is hereafter referred
to as CBA-ANN-SVM. This hybrid approach enjoys the benefits offered by both ANN
and SVM as well as those of the clustering technique. Clustering analysis partitions the
global solar radiation data into clusters. This allows us to maximize the
homogeneity of data within the clusters as well as the heterogeneity
between the clusters. To test the validity of the proposed method, we use measured data
over a period of 10 years, including different meteorological variables and the horizontal
radiation, from the Kerman region in Iran. We then compare the performance of
the proposed CBA-ANN-SVM method against that of the ANN and SVM techniques.

2 Description of Data Collection

The city of Kerman, the capital of Kerman province in Iran, is used as the case
study in this paper. The studied region lies between 25°55′N and 32°N and
between 53°26′E and 59°29′E. The measurement site is in the sunniest part of the
region, at 30°29′N and 57°06′E, with an elevation of 1,756 m above sea level. The region
has a dry and moderate climate. According to the long-term measured data, the monthly
average air temperature varies from 4.6 °C to 26.8 °C and the yearly average is
15.9 °C. The monthly average relative humidity varies between 19% and 53% with an
annual average of 32%. The data set comprises 10 years of daily samples of
the horizontal global solar radiation (H), sunshine duration (n), and maximum and
minimum air temperature (Tmax and Tmin) for the period of December 1994 to January
2005. To filter the data and reduce abnormalities and inconsistencies
in the values, the daily clearness index (Kt) was used: we computed Kt and
omitted the values outside the range
0.015 < Kt < 1 [26, 30]. Kt is defined as the ratio of the horizontal global solar radiation
(H) to the extraterrestrial radiation on a horizontal surface (Ho). To model the horizontal
global solar radiation with the proposed method, n, Tmax, Tmin, Ho and the maximum
possible sunshine duration (N) are considered as inputs. The values of N
and Ho were computed from equations rather than measured.
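The paper does not state which equations were used for N and Ho. As a rough illustration only, the sketch below assumes the widely used textbook relations (Cooper's declination formula, the sunset hour angle, and a solar constant of 1367 W/m^2) and then applies the 0.015 < Kt < 1 filter described above. The function names and the sample records are hypothetical.

```python
import math

G_SC = 1367.0  # solar constant in W/m^2 (assumed value, not given in the paper)


def declination_rad(day_of_year):
    """Solar declination in radians (Cooper's approximation, assumed here)."""
    return math.radians(23.45) * math.sin(2.0 * math.pi * (284 + day_of_year) / 365.0)


def sunset_hour_angle_rad(lat_deg, day_of_year):
    """Sunset hour angle in radians; the argument is clamped for polar day/night."""
    x = -math.tan(math.radians(lat_deg)) * math.tan(declination_rad(day_of_year))
    return math.acos(max(-1.0, min(1.0, x)))


def max_sunshine_hours(lat_deg, day_of_year):
    """Maximum possible sunshine duration N in hours."""
    return (24.0 / math.pi) * sunset_hour_angle_rad(lat_deg, day_of_year)


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation Ho on a horizontal surface, in MJ/m^2."""
    phi = math.radians(lat_deg)
    delta = declination_rad(day_of_year)
    ws = sunset_hour_angle_rad(lat_deg, day_of_year)
    e0 = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    geometry = (math.cos(phi) * math.cos(delta) * math.sin(ws)
                + ws * math.sin(phi) * math.sin(delta))
    return (24.0 * 3600.0 / math.pi) * G_SC * e0 * geometry / 1e6


def filter_by_clearness(records, lat_deg, low=0.015, high=1.0):
    """Keep (day_of_year, H) records whose clearness index Kt = H / Ho is in range."""
    kept = []
    for day, h in records:
        kt = h / extraterrestrial_radiation(lat_deg, day)
        if low < kt < high:
            kept.append((day, h, kt))
    return kept


KERMAN_LAT = 30.0 + 29.0 / 60.0  # 30°29'N, from the site description
# Hypothetical daily records: (day of year, measured H in MJ/m^2)
sample = [(81, 20.0), (81, 40.0), (81, 0.1)]
clean = filter_by_clearness(sample, KERMAN_LAT)
```

In this toy input only the first record survives: the second exceeds the extraterrestrial value (Kt > 1) and the third falls below the lower bound.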

Table 1. Pearson correlation coefficient between the global solar radiation and input variables.

Input variable                       n        N        Tmin     Tmax     Ho
Pearson correlation coefficient (H)  0.716**  0.825**  0.646**  0.764**  0.822**

** Correlation is significant at the 0.01 level (2-tailed).

To identify the influence of the considered parameters on the accurate prediction of global
solar radiation, the Pearson correlation coefficients between the dependent parameter
(output) and the independent parameters (inputs) were calculated using SPSS software.
Table 1 presents the resulting Pearson correlation coefficients. According to
Table 1, all considered inputs have favorable correlations with global
solar radiation. The highest correlation is obtained for the maximum possible
sunshine hours (N), while the lowest is obtained for the minimum air temperature
(Tmin). Since the scatter plot is one of the most effective graphical methods for
revealing the correlation, pattern or trend between two parameters, scatter plots are
used to illustrate the correlations between global solar radiation and the considered
input parameters. The scatter plots between H and the inputs n, N,
Tmin, Tmax and Ho are shown in Fig. 1(a–d), respectively.
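The authors computed the coefficients in SPSS. For readers who want to reproduce the same statistic elsewhere, a minimal sketch of the Pearson correlation coefficient in plain Python follows; the function name and the sample values are hypothetical and are not taken from the Kerman data set.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centered cross-products and squares
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical example: sunshine duration n (hours) against H (MJ/m^2)
n_hours = [4.0, 6.5, 8.0, 9.5, 11.0]
h_values = [10.2, 14.8, 17.9, 21.5, 24.0]
r = pearson_r(n_hours, h_values)
```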

Fig. 1. Scatter plots of horizontal global solar radiation and the considered input parameters

3 Modeling

In order to build the models, the Clementine software version 12.0 has been utilized.
Three different methods including the SVM, ANN and the hybrid cluster based method
that uses ANN and SVM (CBA-ANN-SVM) have been developed and used for this
research work. In the following, all developed models are explained, and then the best
model with least estimation error is determined.

3.1 Implemented Model Using SVM Approach

SVM is a well-known ML approach. It is capable of performing
favorably even when the data samples are limited or non-linear, when the
dataset is high-dimensional, or when local minima exist. SVM is also capable of high
generalization. Figure 2 illustrates the implemented model based upon the SVM
approach. For modeling, the data sets are first imported into Clementine. A Source
node, named “Imported Data”, reads the data from an external source (the dataset
that we have preprocessed) into Clementine. A “Partition” node is used to split the
data into separate subsets or samples for the training and evaluation stages of model
building. For this study, 50% of the data were used for training and
50% for testing. The Partition node has a “random seed”
option, which controls whether a different sample (another subset of
data records) is generated each time the node is executed. The “Type” node tells the
Modeling node (the “SVM” node) which fields are predictor fields and which are predicted fields.
This node also describes the data type (string, integer, real, date, time, or timestamp) of a
given field. The “SVM” node is a Modeling node. This sequence of operations is known as
a data stream. When the stream is executed and the model is built, the model nugget
(“SVM-Energy”) is created and added to the Models palette in the upper right corner of
the application window. In Clementine, to see the modeling
result, the model nugget must be added to the stream and attached to the
“Type” node, at the same point as the Modeling node. The “Analysis” node helps to
determine whether the model is acceptably accurate. Building the SVM model requires
a trade-off between maximizing the margin and minimizing the learning error.
Clementine provides a regularization parameter “c”, which is used to regulate this
trade-off. Increasing c leads to higher classification accuracy (or reduced regression error)
on the training data, but it may also cause overfitting. In this study, three different kernel
functions, namely linear, polynomial, and sigmoid, were tested. After building each model,
its performance in estimating global solar radiation was evaluated by calculating the mean
absolute percentage error (MAPE) and standard deviation (SD). The MAPE is obtained by:

\[
\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{H^{i}_{esti} - H^{i}_{meas}}{H^{i}_{meas}} \right| \qquad (1)
\]

where H^i_esti and H^i_meas are the ith predicted and measured global solar radiation values,
respectively, and N here represents the total number of data samples. To develop the
final SVM model with the lowest MAPE, a polynomial kernel function with an adjustment
parameter of 8 and a gamma parameter of 2.5 was used. Table 2 shows the attained
MAPE and SD values for prediction of global solar radiation employing the proposed
SVM model. The significance of each considered input for predicting global solar
radiation with the proposed SVM is shown in Fig. 2. According to Fig. 2,
Tmin has little importance in the estimation of global solar radiation
using the SVM model, while the highest importance belongs to the maximum possible
sunshine duration N.
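As a small illustration of the error measure in Eq. (1), the sketch below computes the MAPE in plain Python. The paper does not define how SD was computed; the `residual_sd` helper assumes, purely for illustration, the sample standard deviation of the residuals (estimated minus measured). All names and sample values are hypothetical.

```python
import statistics


def mape(estimated, measured, as_percent=False):
    """Mean absolute percentage error following Eq. (1)."""
    if len(estimated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - m) / abs(m) for e, m in zip(estimated, measured))
    value = total / len(measured)
    return 100.0 * value if as_percent else value


def residual_sd(estimated, measured):
    """Sample standard deviation of the residuals (an assumed definition of SD)."""
    residuals = [e - m for e, m in zip(estimated, measured)]
    return statistics.stdev(residuals)


# Hypothetical estimated and measured H values (MJ/m^2)
h_esti = [110.0, 90.0]
h_meas = [100.0, 100.0]
error = mape(h_esti, h_meas)          # each sample is off by 10%
spread = residual_sd(h_esti, h_meas)
```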

3.2 Implemented Model Using ANN Approach

The second method employed to predict global solar radiation was developed on the basis
of the ANN technique. The implemented ANN model is also shown in
Fig. 2. As with the SVM method, the data sets are first imported into
Clementine. The “Partition” node is used to divide the data into two subsets for the training
and evaluation stages of model building. After building the model, the “Analysis” node
is used to determine whether there is any overfitting. In the supervised
ANN, every single learning phase is called a cycle. These cycles continue until the
network’s weights become stable. The parameter “Persistence” is set to 400 in this
model, which means that if the error remains constant for 400 cycles, the
model is considered stable. Global solar radiation modeling was
conducted for various settings, and the MAPE and SD values were subsequently computed.
The final and best model was built with one input layer, two hidden layers and one output layer.
The MAPE and SD values achieved using the best ANN model are presented
in Table 2. It is observed that n is the most relevant input, whereas Tmin and Tmax,
whose influences on the estimation are close to each other, have the least significance.
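The “Persistence” stopping rule described above can be sketched as a patience counter: training stops once the error has not improved for a given number of consecutive cycles. This is a generic illustration of the idea and not Clementine's internal implementation; `run_cycle` is a hypothetical callback that performs one training cycle and returns the current error.

```python
def train_until_stable(run_cycle, persistence=400, max_cycles=100_000):
    """Run training cycles until the error stops improving.

    run_cycle: hypothetical callable that trains for one cycle and returns the error.
    persistence: number of consecutive cycles without improvement before stopping.
    Returns (number of cycles run, best error seen).
    """
    best_error = float("inf")
    cycles_without_improvement = 0
    cycles_run = 0
    for cycles_run in range(1, max_cycles + 1):
        error = run_cycle()
        if error < best_error:
            # The error improved, so the persistence counter restarts
            best_error = error
            cycles_without_improvement = 0
        else:
            cycles_without_improvement += 1
            if cycles_without_improvement >= persistence:
                break
    return cycles_run, best_error
```

With persistence set to 400, as in the ANN model above, a run whose error last improved at cycle k would stop at cycle k + 400.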
3.3 Implemented Model Based on Clustering (CBA-ANN-SVM)

Another model developed in this research work is based upon clustering. The goal is to
verify the strength of clustering for global solar radiation estimation. The architecture
of the hybrid cluster-based model is as follows: Step 1: clustering; Step 2: modeling for
each cluster. One of the important points regarding clustering is determining the
number of clusters. The TwoStep algorithm has the advantage that the number of clusters
can be specified manually or calculated automatically by the algorithm, so there is no
need for an initial choice of the
number of clusters. In addition, the algorithm is not sensitive to outliers, although
in this study the outliers were omitted from the data sets by the solar data cleaning process.
Thus, the TwoStep algorithm has been utilized in this study. To analyze the rules governing
the clusters, the decision tree and the C5.0 algorithm have been used, as presented in Fig. 2.

Fig. 2. Model implementation using SVM, ANN and the 2-step algorithm for clustering.

The clustering was performed considering different sets of variables:
(1) H and month, (2) H, month number and n, and (3) H, month and number
of days. In all three cases, the data sets were grouped into 12 clusters based on the
month number. Thus, the month number is the influential variable in clustering.
The rules governing the clusters were obtained from the analysis conducted using the
C5.0 algorithm. Thus, in the first step, based on the month variable and using
an unsupervised learning method, the inputs are clustered and divided into a series of
subsets that have similar features (homogeneous groups). In the next step, the
estimations are conducted separately in each cluster using one of the supervised learning
techniques, ANN or SVM. Figure 2 offers a graphical representation
of the data and the distribution of the fields (H and month number) between the clusters.
It shows that the significance of both variables is equal to 1, which indicates the high
importance of these two variables in clustering. After clustering, modeling was performed
separately on each cluster. For each cluster, the SVM and ANN methods were
used. The data sets were divided into two subsets for training and testing by the Partition
node. To obtain the final error of the models, the results of the clusters were combined.
Figure 2 illustrates the implemented hybrid cluster-based model. Table 2 presents the
utilized models as well as the MAPE and SD values obtained for the hybrid cluster-based
approach.
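The two-step structure (cluster first, then fit one model per cluster and pool the errors) can be sketched in plain Python as follows. This is only an illustration under strong assumptions: records are grouped directly by month instead of running the TwoStep algorithm, and a simple least-squares ratio model H ≈ k · Ho stands in for the ANN and SVM models built in Clementine. All function names and the data layout (month, Ho, H) are hypothetical.

```python
import random
from collections import defaultdict


def split_half(rows, seed):
    """Seeded 50/50 random split into training and testing subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def fit_ratio_model(train):
    """Stand-in regressor: least-squares fit of H = k * Ho through the origin."""
    k = sum(ho * h for _, ho, h in train) / sum(ho * ho for _, ho, _ in train)
    return lambda ho: k * ho


def mape(pairs):
    """MAPE over (estimated, measured) pairs."""
    return sum(abs(est - meas) / meas for est, meas in pairs) / len(pairs)


def cluster_based_mape(rows, seed=0):
    """Step 1: group rows by month. Step 2: fit and test one model per cluster."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row[0]].append(row)  # row = (month, Ho, H)
    pairs = []
    for month in sorted(clusters):
        train, test = split_half(clusters[month], seed)
        model = fit_ratio_model(train)
        pairs.extend((model(ho), h) for _, ho, h in test)
    # Errors from all clusters are pooled into a single overall MAPE
    return mape(pairs)


def single_model_mape(rows, seed=0):
    """Baseline: one model fitted on all months together."""
    train, test = split_half(rows, seed)
    model = fit_ratio_model(train)
    return mape([(model(ho), h) for _, ho, h in test])


# Synthetic data in which the H/Ho ratio depends on the month
rows = [(m, 20.0 + j, (0.5 + 0.02 * m) * (20.0 + j))
        for m in range(1, 13) for j in range(10)]
clustered_error = cluster_based_mape(rows)
baseline_error = single_model_mape(rows)
```

On this synthetic data the per-month models fit almost exactly, while the single model cannot capture the month-dependent ratio. The example shows the mechanism only and says nothing about the size of the gain on real measurements.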
Table 2. The obtained MAPE and SD for the SVM, ANN and CBA-ANN-SVM models.

Model          MAPE (%)   SD
SVM            1.565      2.806
ANN            1.603      2.735
CBA-ANN-SVM    1.342      2.256

3.4 Performance Comparisons

Table 2 compares the performance of all three models based
on the obtained MAPE and SD values.
In the hybrid cluster-based approach (CBA-ANN-SVM), the month number was found to be
an important factor in clustering. During data clustering, the data
are assigned to the target cluster based upon the month number. Afterwards, to
estimate the horizontal global solar radiation, the proposed model uses the target
cluster corresponding to the month number. The results in Table 2 verify
the benefits of using the cluster-based method to predict
global solar radiation. As the lowest error values are achieved by the hybrid CBA-ANN-SVM
model, this model is introduced as the superior one for estimation of global solar
radiation.

4 Conclusions

In this study, a Cluster-Based Approach (CBA) was introduced to estimate daily global
solar radiation on a horizontal surface. For this aim, the clustering paradigm along with
ANN and SVM techniques was utilized in our proposed hybrid approach (CBA-ANN-SVM).
To demonstrate the practicality of CBA-ANN-SVM, we evaluated the approach
using 10 years of measured data from an Iranian city located in a sunny part of the
country. The measured sunshine hours, the calculated maximum possible
sunshine hours, maximum and minimum air temperatures, and extraterrestrial solar
radiation were used as inputs for the prediction of global solar radiation. Clustering
was performed to categorize the global solar radiation data into clusters. It was
found that the month number is a significant parameter in clustering. Accordingly,
the clustering was performed according to the month of the year, so that the data sets
were grouped into 12 clusters based on the month. This allowed us to maximize
the homogeneity of data within the clusters and the heterogeneity between the clusters.
Our evaluation of the CBA-ANN-SVM approach indicated that this approach resulted
in a higher accuracy compared to using the ANN and SVM techniques alone. For example, the
MAPE using our approach is 1.342%, as compared to 1.603% and 1.565% using ANN
and SVM, respectively.

Acknowledgment. This work has partially been sponsored by the Hungarian National Scientific
Fund under contract OTKA 129374 and the Research & Development Operational Program for
the project “Modernization and Improvement of Technical Infrastructure for Research and
Development of J. Selye University in the Fields of Nanotechnology and Intelligent Space”,
ITMS 26210120042, co-funded by the European Regional Development Fund. Dr. Mosavi
contributed to this research during the tenure of an ERCIM Alain Bensoussan Fellowship
Programme. The support and research infrastructure of the Institute of Advanced Studies
Koszeg, iASK, is acknowledged.

References

1. Hernandez, R.: Environmental impacts of utility-scale solar energy. Renew. Sustain. Energy
Rev. 29, 766–779 (2014)
2. Hosseini, E.: A review on green energy potentials in Iran. Renew. Sustain. Energy Rev. 27,
533–545 (2013)
3. Torabi, M., et al.: A Hybrid Clustering and Classification Technique for Forecasting Short-
Term Energy Consumption, Environmental Progress & Sustainable Energy. Wiley, Hoboken
4. Mekhilef, S.: A review on solar energy use in industries. Renew. Sustain. Energy Rev. 15,
1777–1790 (2011)
5. Imani, M.H.: Strategic behavior of retailers for risk reduction and profit increment via
distributed generators and demand response programs. Energies 11(6), 1–24 (2018)
6. Rusen, S.: Estimation of daily global solar irradiation by coupling ground measurements of
bright sunshine hours to satellite imagery. Energy 58, 417–425 (2013)
7. Darvishzadeh, A.: Modeling the strain impact on refractive index and optical transmission
rate. Physica B: Condens. Matter 543, 14–17 (2018)
8. Ulgen, K., Hepbasli, A.: Diffuse solar radiation estimation models for Turkey’s big cities.
Energy Convers. Manag. 50, 149–156 (2009)
9. Karakoti, I., Pande, B., Pandey, K.: Evaluation of different diffuse radiation models for
Indian stations. Renew. Sustain. Energy Rev. 15, 2378–2384 (2011)
10. Mosavi, A.: The large scale system of multiple criteria decision making. Large Scale
Complex Syst. Theory Appl. 9(1), 354–359 (2010)
11. Vargas, R., Mosavi, A., Ruiz, L.: Deep learning: a review. In: Advances in Intelligent
Systems and Computing (2017)
12. Mubiru, J.: Estimation of monthly average daily global solar irradiation using artificial neural
networks. Sol. Energy 82, 181–187 (2008)
13. Jiang, Y.: Computation of monthly mean daily global solar. Energy 34, 1276–1283 (2009)
14. Najafi, B., et al.: An intelligent artificial neural network-response surface methodology
method. Energies 11(4), 860 (2018)
15. Mathioulakis, E.: Artificial neural networks for the performance prediction of heat pump hot
water heaters. Int. J. Sustain. Energ. 37(2), 173–192 (2018)
16. Azeez, A.: Artificial neural network estimation of global solar. Appl. Sci. Res. 3(2), 586–595
17. Mosavi, A., et al.: Predicting the future using web knowledge: state of the art survey. In:
Advances in Intelligent Systems and Computing, vol 660. Springer, Heidelberg (2018)
18. Chen, L.: Estimation of monthly solar radiation from measured temperatures using support
vector machines-a case study. Renew. Energy 36, 413–420 (2011)
19. Chen, J.L.: Assessing the potential of support vector machine for estimating daily solar
radiation using sunshine duration. Energy Convers. Manag. 75, 311–318 (2013)
20. Mosavi, A., Varkonyi-Koczy, A.R.: Integration of machine learning and optimization for
robot learning. In: Advances in Intelligent Systems and Computing. Springer, Heidelberg
21. Chen, J.L., Li, G.S.: Evaluation of support vector machine for estimation of solar radiation
from measured meteorological variables. Theor. Appl. Climatol. 115, 627–638 (2014)
22. Guermoui, M.: Support vector regression methodology for estimating global solar radiation
in Algeria. Eur. Phys. J. Plus 133(1), 22 (2018)
23. Keshtegar, B.: Comparison of four heuristic regression techniques in solar radiation. Renew.
Sustain. Energy Rev. 81, 330–341 (2018)
24. Wu, J., Chan, C.K.: Prediction of hourly solar radiation using a novel hybrid model of
ARMA and TDNN. Sol. Energy 85, 808–817 (2011)
25. Moeini, I., et al.: Modeling the time-dependent characteristics of perovskite solar cells. Sol.
Energy 170, 969–973 (2018)
26. Mosavi, A., et al.: Industrial applications of big data: state of the art survey. Adv. Intell. Syst.
Comput. 660, 225–232 (2017)
27. Mosavi, A., et al.: Review on the usage of the multiobjective optimization package of
modeFrontier in the energy. In: Advances in Intelligent Systems and Computing, pp. 217–
224 (2017)
28. Mosavi, A., et al.: Reviewing the novel machine learning tools for materials design. In:
Advances in Intelligent Systems and Computing, pp. 50–58 (2017)
29. Halabi, L.M.: Performance evaluation of hybrid adaptive neuro-fuzzy inference system
models for predicting monthly global solar radiation. Appl. Energy 213, 247–261 (2018)
30. Moeini, I., et al.: Modeling the detection efficiency in photodetectors with temperature-
dependent mobility and carrier lifetime. In: Superlattices and Microstructures (2018)