Aug 12, 2018


Annamaria Varkonyi-Koczy3,5, and Vajda Istvan3

1 Hormozgan Regional Electric Co, Bandarabbas, Iran

2 Institute of Advanced Studies Koszeg, iASK, Kőszeg, Hungary

3 Institute of Automation, Kando Kalman, Faculty of Electrical Engineering, Obuda University, Budapest 1431, Hungary
amir.mosavi@kvk.uni-obuda.hu

4 Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
amir.mosavi@ntnu.no

5 Department of Mathematics and Informatics, J. Selye University, Komarno, Slovakia

Abstract. In this paper, we present a Cluster-Based Approach (CBA) that utilizes the support vector machine (SVM) and an artificial neural network (ANN) to estimate and predict the daily horizontal global solar radiation. In the proposed CBA-ANN-SVM approach, we first conduct a clustering analysis and divide the global solar radiation data into clusters according to the calendar months. Our approach aims at maximizing the homogeneity of data within the clusters and the heterogeneity between the clusters. The proposed CBA-ANN-SVM approach is validated, and its precision is compared with that of the ANN and SVM techniques. The mean absolute percentage error (MAPE) of the proposed approach was lower than those of ANN and SVM.

Keywords: Support vector machine (SVM) · Machine learning · Artificial neural networks (ANN)

1 Introduction

Renewable energy systems aim at satisfying the ever-increasing energy demand in a sustainable manner by reducing greenhouse gas emissions and the risk of climate change [1, 2]. Among the renewable energies, solar is generally considered one of the most promising sources, partly due to its availability [3, 4]. As a consequence, we are seeing an increase in solar energy technologies. However, maximizing the utilization and efficiency of solar energy remains a difficult task, partly due to challenges in collecting and accurately analyzing solar radiation data. Nevertheless, solar energy projects can benefit greatly from reliable solar radiation information. In fact, the global solar radiation is a highly relevant parameter in the monitoring, simulation, prediction, and sizing of solar energy technologies [5–9]. Thus, it is essential to be able to accurately predict the solar radiation using proper techniques, even in the absence of adequate data.

G. Laukaitis (Ed.): INTER-ACADEMIA 2018, LNNS 53, pp. 1–9, 2019.
https://doi.org/10.1007/978-3-319-99834-3_35

M. Torabi et al.

Several data mining techniques have been employed in business and the medical sciences [10], and in recent times the focus has been on exploring approaches for determining patterns in data sets that can be used for description and prediction. Data mining is considered an inductive machine learning (ML) technique, where a past data set is utilized for training and learning the model of interest. This learning is represented by determining the relationships among the variables and extracting meaningful patterns. The objective of data mining is to use these meaningful patterns for accurate prediction [11, 17]. Artificial neural networks (ANNs) and support vector machines (SVMs), two well-known data mining techniques, have been successfully used to estimate global solar radiation. For example, Mubiru and Banda [12] used the ANN technique to estimate the monthly mean daily global solar irradiation at several locations in Uganda. Jiang [13] proposed an ANN model to estimate the monthly mean daily global solar radiation in different cities of China. The evaluation of this model showed better precision than the empirical models examined in that paper. Najafi et al. [14] developed a coupled ANN algorithm to predict daily solar radiation in a number of cities in Iran. The proposed algorithm was found to perform better than the Ångström-Prescott model. Mathioulakis et al. [15] applied an advanced ANN technique to daily prediction, using a number of different sets of input parameters. They further proposed ANN as an effective method for predicting the solar radiation for a global estimation. Azeez [16] studied monthly prediction using the maximum ambient temperature, sunshine duration, and relative humidity as the required input parameters. In addition, Mosavi et al. [17] reviewed similar methods of prediction. In another study, Chen et al. [18] evaluated the use of SVMs for predicting the monthly mean based on the site’s minimum and maximum temperature, employing different SVM functions with promising results. Furthermore, Chen et al. [19] proposed a number of duration-based SVM algorithms, which showed superior results. Mosavi and Varkonyi [20] also utilized SVMs to predict solar radiation considering the ambient temperature. Chen and Li [21] assessed the performance of 20 SVM models for estimation and reported that SVM-based models could achieve better accuracy than ANN models.

Guermoui et al. [22] evaluated the utility of two support vector regression (SVR) models, based on the radial basis function and the polynomial basis function, for the prediction of monthly mean daily global solar radiation. Their findings indicated that SVR based on the polynomial basis function has better accuracy than SVR based on the radial basis function [23, 27, 28]. A number of authors have also attempted to achieve better accuracy in estimating solar radiation using hybrid approaches. For example, Wu et al. [24] integrated the time delay neural network (TDNN) with the autoregressive and moving average (ARMA) algorithm to predict hourly solar radiation. The hybrid model provided a higher capability than either the TDNN model or the ARMA model alone. Similarly, Moeini et al. [25] proposed a hybrid approach of fuzzy and hidden Markov models to effectively predict the solar irradiation. Their results demonstrated that the predictions of the proposed model are close to the training data set. Halabi et al. [29] developed a hybrid approach by integrating simulated annealing (SA) and genetic programming (GP). The results of their sensitivity analysis showed that the suggested model provides accurate predictions. Guermoui et al. [22] compared the precision of a hybrid SVM model with ANN and GP. As an alternative, we propose a new concept to estimate global solar radiation on a horizontal surface, using a cluster-based approach (CBA). Our CBA utilizes both ANN and SVM approaches to accurately estimate daily global solar radiation, and this new approach is hereafter referred to as CBA-ANN-SVM. This hybrid approach enjoys the benefits offered by both ANN and SVM as well as those of the clustering technique. Clustering analysis classifies the global solar radiation data into various clusters. This allows us to maximize the homogeneity of data within the clusters as well as the heterogeneity between the clusters. To test the validity of the proposed method, we use measured data over a period of 10 years, including different meteorological variables and the horizontal radiation, from the Kerman region in Iran. We then compare the performance of the proposed CBA-ANN-SVM method against that of the ANN and SVM techniques.

A Hybrid Machine Learning Approach for Daily Prediction

The city of Kerman, the capital of Kerman province in Iran, is used as the case study in this paper. The study area lies between 25°55′N and 32°N and between 53°26′E and 59°29′E. The site itself is in the sunniest part of the region, at 30°29′N and 57°06′E and at an elevation of 1,756 m above sea level. The region has a dry and moderate climate. According to long-term measured data, the monthly average air temperature varies from 4.6 °C to 26.8 °C, and the yearly average is 15.9 °C. The monthly average relative humidity varies between 19% and 53%, with an annual average of 32%. The data set includes 10 years of daily sampled data, consisting of the horizontal global solar radiation (H), sunshine duration (n), and maximum and minimum air temperature (Tmax and Tmin), for the period from December 1994 to January 2005. In this study, the daily clearness index (Kt) was used to filter the data sets and reduce abnormalities and inconsistencies in the values. For this purpose, we computed Kt and omitted the values outside the range 0.015 < Kt < 1 [26, 30]. Kt is defined as the ratio of the horizontal global solar radiation (H) to the extraterrestrial radiation on a horizontal surface (Ho). To model the horizontal global solar radiation with the proposed method, the parameters n, Tmax, Tmin, Ho, and the maximum possible sunshine duration (N) are considered as inputs. The values of N and Ho were computed from the corresponding equations.
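The equations used for N and Ho are not reproduced in the text. As a minimal sketch, assuming the standard textbook daily formulas (Cooper's approximation for the solar declination, a solar constant of 1367 W/m², and Ho expressed in MJ/m² per day), the computation of N and Ho and the clearness-index filter could look as follows. The function names, the constant, and the units are illustrative assumptions and not the authors' implementation.

```python
import math

GSC = 1367.0  # assumed solar constant in W/m^2 (a commonly used value)


def declination_rad(day_of_year):
    """Solar declination in radians (Cooper's approximation)."""
    return math.radians(23.45) * math.sin(2 * math.pi * (284 + day_of_year) / 365)


def sunset_hour_angle_rad(lat_deg, day_of_year):
    """Sunset hour angle in radians, clamped for polar day and night."""
    x = -math.tan(math.radians(lat_deg)) * math.tan(declination_rad(day_of_year))
    return math.acos(max(-1.0, min(1.0, x)))


def max_sunshine_hours(lat_deg, day_of_year):
    """Maximum possible sunshine duration N in hours: (2/15) * ws, ws in degrees."""
    return 2.0 / 15.0 * math.degrees(sunset_hour_angle_rad(lat_deg, day_of_year))


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation Ho on a horizontal surface, MJ/m^2/day."""
    phi = math.radians(lat_deg)
    delta = declination_rad(day_of_year)
    ws = sunset_hour_angle_rad(lat_deg, day_of_year)
    eccentricity = 1 + 0.033 * math.cos(2 * math.pi * day_of_year / 365)
    ho_joules = (24 * 3600 / math.pi) * GSC * eccentricity * (
        math.cos(phi) * math.cos(delta) * math.sin(ws)
        + ws * math.sin(phi) * math.sin(delta)
    )
    return ho_joules / 1e6


def filter_by_clearness(records, lat_deg, kt_min=0.015, kt_max=1.0):
    """Keep (day_of_year, H) records whose Kt = H / Ho lies in (kt_min, kt_max)."""
    kept = []
    for day_of_year, h in records:
        kt = h / extraterrestrial_radiation(lat_deg, day_of_year)
        if kt_min < kt < kt_max:
            kept.append((day_of_year, h, kt))
    return kept


if __name__ == "__main__":
    lat = 30 + 29 / 60  # 30°29'N, the site latitude given in the text
    sample = [(172, 30.0), (172, 0.1), (172, 50.0)]  # made-up H values in MJ/m^2
    print(filter_by_clearness(sample, lat))
```

In this made-up sample only the first record survives: the second gives a Kt below 0.015 and the third a Kt above 1, which is physically implausible for a daily total.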

Table 1. Pearson correlation coefficient between the global solar radiation and input variables.

                                    n**     N**     Tmin**  Tmax**  Ho**
Pearson correlation coefficient  H  0.716   0.825   0.646   0.764   0.822

**. Correlation is significant at the 0.01 level (2-tailed).

The Pearson correlation coefficients between the dependent parameter (output), the global solar radiation, and the independent parameters (inputs) were calculated using SPSS software. From Table 1, it is noticed that all considered inputs have favorable correlations with global solar radiation. The highest correlation is achieved for the maximum possible sunshine hours (N), while the lowest is obtained for the minimum air temperature (Tmin). Since the scatter plot is one of the most effective graphical methods for revealing the correlation, pattern, or trend between two parameters, scatter plots of global solar radiation against the considered input parameters are also depicted. The scatter plots between H and the inputs n, N, Tmin, Tmax and Ho are shown in Fig. 1(a–d), respectively.

Fig. 1. Scatter plots of horizontal global solar radiation and the considered input parameters
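The correlations in Table 1 were obtained with SPSS. For readers who want to reproduce this kind of screening on their own data, the following is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name and the sample values are hypothetical, and the significance test reported by SPSS is not included.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up daily values: sunshine duration n (h) and radiation H (MJ/m^2).
    n_hours = [6.0, 8.5, 9.0, 10.5, 11.0]
    h_values = [14.0, 18.5, 19.0, 23.0, 24.5]
    print(round(pearson_r(n_hours, h_values), 3))
```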

3 Modeling

To build the models, Clementine software version 12.0 was used. Three methods were developed for this research work: SVM, ANN, and the hybrid cluster-based method that uses ANN and SVM (CBA-ANN-SVM). In the following, all developed models are explained, and the model with the lowest estimation error is then determined.

SVM is one of the newer and well-known ML approaches. It can perform favorably even when the data samples are limited or non-linear, when the data set is high-dimensional, or when local minima exist. SVM is also capable of high generalization. Figure 2 illustrates the implemented model based on the SVM approach. For modeling, the data sets are first brought into Clementine. A Source node, named “Imported Data”, reads the data from an external source (the data set that we preprocessed) into Clementine. A “Partition” node is used to split the data into separate subsets, or samples, for the training and evaluation stages of model building. For this study, 50% of the data were used for training and 50% for testing. The Partition node has a “random seed” option, which controls whether the same or a different sample of data records is generated each time the node is executed. Through the “Type” node, we tell the Modeling node (the “SVM” node) which fields are predictor fields and which are predicted fields. This node also describes the data type (string, integer, real, date, time, or timestamp) of a given field. The “SVM” node is a Modeling node. This sequence of operations is known as a data stream. When the stream is executed and the model is built, the model nugget (“SVM-Energy”) is created and added to the Models palette in the upper right corner of the application window. In Clementine, to see the modeling result we have to add the model nugget to the stream and attach it to the “Type” node, at the same point as the Modeling node. The “Analysis” node helps to determine whether the model is acceptably accurate. Building the SVM model requires a trade-off between maximizing the margin and minimizing the learning error. Clementine provides a regularization parameter, “c”, which is used to regulate this trade-off. Increasing c leads to higher classification accuracy (or reduced regression error) on the training data, but it may also cause overfitting. In this study, three different kernel functions, namely linear, polynomial, and sigmoid, were tested. After building each model, its performance in estimating global solar radiation was evaluated by calculating the mean absolute percentage error (MAPE) and the standard deviation (SD). The MAPE is obtained by:

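\[
\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{H^{i}_{\mathrm{esti}} - H^{i}_{\mathrm{meas}}}{H^{i}_{\mathrm{meas}}}\right| \times 100 \tag{1}
\]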

where H_esti^i and H_meas^i are the ith predicted and measured global solar radiation values, respectively, and N represents the total number of data samples (not to be confused with the maximum possible sunshine duration, which is also denoted N). To develop the final SVM model with the lowest MAPE, a polynomial function with an adjustment parameter of 8 and a gamma parameter of 2.5 was used. Table 2 shows the MAPE and SD values attained for the prediction of global solar radiation with the proposed SVM model. The significance of each considered input element for predicting global solar radiation with the proposed SVM is shown in Fig. 2. According to Fig. 2, Tmin has little importance in the estimation of global solar radiation using the SVM model, while the highest importance belongs to N.
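The two error measures can be sketched in a few lines of Python. The MAPE follows Eq. (1). The text does not state how SD is computed, so the sketch assumes it is the sample standard deviation of the individual absolute percentage errors; that choice, the function names, and the sample values are assumptions made for illustration only.

```python
import statistics


def absolute_percentage_errors(measured, estimated):
    """Per-sample |(H_esti - H_meas) / H_meas| * 100; measured values must be non-zero."""
    if len(measured) != len(estimated) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    return [abs((e - m) / m) * 100.0 for m, e in zip(measured, estimated)]


def mape(measured, estimated):
    """Mean absolute percentage error, as in Eq. (1)."""
    return statistics.fmean(absolute_percentage_errors(measured, estimated))


def sd_of_errors(measured, estimated):
    """Sample standard deviation of the absolute percentage errors (assumed definition)."""
    return statistics.stdev(absolute_percentage_errors(measured, estimated))


if __name__ == "__main__":
    h_meas = [10.0, 20.0, 40.0]  # made-up measured values
    h_esti = [11.0, 18.0, 40.0]  # made-up predictions
    print(mape(h_meas, h_esti), sd_of_errors(h_meas, h_esti))
```

Because each error is divided by the measured value, days with H close to zero would dominate the result, which is one practical reason to filter the data with the clearness index first.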

The second method employed to predict global solar radiation is developed on the basis of the ANN technique. The implemented ANN-based model is also shown in Fig. 2. As with the SVM method, the data sets are first brought into Clementine. The “Partition” node is used to divide the data into two subsets for the training and evaluation stages of model building. After building the model, the “Analysis” node is used to determine whether there is any overfitting. In a supervised ANN, every single learning phase is called a cycle. These cycles continue until the network’s weights become stable. The parameter “Persistence” is set to 400 in this model, which means that if the error remains constant for 400 cycles, the model is considered stable. Global solar radiation modeling was conducted for various settings, and the MAPE and SD values were computed for each. The final and best model was built with one input layer, two hidden layers, and one output layer. The MAPE and SD values achieved with the best ANN model are presented in Table 2. It is observed that n is the most relevant element, whereas Tmin and Tmax, whose influences on the estimation are close to each other, have the least significance.
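The persistence rule described above amounts to stopping the training loop once the error has not improved for a fixed number of consecutive cycles. The following sketch illustrates that stopping logic only; it is not Clementine's implementation. The `step` callback, the tolerance, and the cycle limit are hypothetical names and values introduced for the example.

```python
def train_with_persistence(step, persistence=400, max_cycles=100_000, tol=1e-9):
    """Run training cycles until the error stops improving.

    step: callable that performs one training cycle and returns the current error.
    persistence: number of consecutive cycles without improvement before stopping.
    Returns (best_error, cycles_run).
    """
    best_error = float("inf")
    stale_cycles = 0
    cycles_run = 0
    for cycles_run in range(1, max_cycles + 1):
        error = step()
        if error < best_error - tol:
            best_error = error   # improvement: reset the persistence counter
            stale_cycles = 0
        else:
            stale_cycles += 1    # no improvement in this cycle
            if stale_cycles >= persistence:
                break
    return best_error, cycles_run


if __name__ == "__main__":
    # Made-up error curve: improves for 10 cycles, then stays flat.
    curve = iter([1.0 / k for k in range(1, 11)] + [0.1] * 1000)
    print(train_with_persistence(lambda: next(curve), persistence=400))
```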


Approach)

Another model developed in this research work is based upon clustering. The goal is to verify the strength of clustering for global solar radiation estimation. The architecture of the hybrid cluster-based model is as follows: Step 1, clustering; Step 2, modeling for each cluster. An important point regarding clustering is determining the number of clusters. The two-step algorithm has the advantage that the number of clusters can be specified manually, but it can also calculate the number of clusters automatically, so no initial choice of the number of clusters is needed. In addition, the algorithm is not sensitive to outliers, although in this study the outliers were omitted from the data sets by the solar data cleaning process. Thus, the two-step algorithm was utilized in this study. To analyze the rules governing the clusters, the decision tree and the C5.0 algorithm were used, as presented in Fig. 2.

Fig. 2. Model implementation using SVM, ANN and the 2-step algorithm for clustering.

The clustering was performed considering different sets of variables: (1) H and month, (2) H, month number, and n, and (3) H, month, and number of days. In all three cases, the data were grouped into 12 clusters corresponding to the month number. Thus, the month number is the influential variable in clustering. The rules governing the clusters were obtained from the analysis conducted with the C5.0 algorithm. Thus, in the first step, based on the month variable and using the unsupervised learning method, the inputs are clustered and divided into a series of subsets with similar features (homogeneous groups). In the next step, the estimations are conducted separately in each cluster using one of the supervised learning techniques, ANN or SVM. Figure 2 offers a graphical representation of the data and the distribution of the fields (H and month number) between the clusters. It shows that the significance of both variables is equal to 1, which indicates their high importance in clustering. After clustering, modeling was performed separately on each cluster. For each cluster, the SVM and ANN methods were used. The data sets were divided into two subsets for training and testing by the Partition node. To obtain the final error of the models, the results of the clusters were combined. Figure 2 illustrates the implemented model based on the hybrid cluster-based method. Table 2 presents the utilized models as well as the MAPE and SD values obtained for the hybrid cluster-based approach.
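The two-step procedure (group by month, then model each group separately and pool the errors) can be sketched as follows. To keep the example self-contained, two deliberately simple regressors, a constant mean and a one-variable least-squares line, stand in for the ANN and SVM models; they are not the models used in the study. The per-cluster choice by lowest test error, the single input feature, and all names are assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_half(rows, seed):
    """Seeded 50/50 random split into training and testing subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def fit_mean(train):
    """Stand-in model 1: always predict the mean H of the training rows."""
    mean_h = sum(h for _, h in train) / len(train)
    return lambda x: mean_h


def fit_line(train):
    """Stand-in model 2: least-squares line H = a + b * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_h = sum(h for _, h in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    if sxx == 0:
        return fit_mean(train)
    b = sum((x - mean_x) * (h - mean_h) for x, h in train) / sxx
    a = mean_h - b * mean_x
    return lambda x: a + b * x


def cluster_based_mape(records, fitters, seed=0):
    """records: iterable of (month, x, H). Returns (pooled MAPE, chosen model per month)."""
    clusters = defaultdict(list)
    for month, x, h in records:          # Step 1: one cluster per calendar month
        clusters[month].append((x, h))
    pooled_errors, chosen = [], {}
    for month, rows in sorted(clusters.items()):
        if len(rows) < 4:
            raise ValueError(f"cluster {month} is too small to split and fit")
        train, test = split_half(rows, seed)
        best_name, best_errors = None, None
        for name, fit in fitters.items():  # Step 2: fit each candidate on this cluster
            model = fit(train)
            errors = [abs((model(x) - h) / h) * 100.0 for x, h in test]
            if best_errors is None or sum(errors) < sum(best_errors):
                best_name, best_errors = name, errors
        chosen[month] = best_name
        pooled_errors.extend(best_errors)  # combine the cluster results
    return sum(pooled_errors) / len(pooled_errors), chosen


if __name__ == "__main__":
    # Synthetic data: within each month H depends linearly on one input x.
    data = [(m, x, 10.0 * m + 2.0 * x) for m in range(1, 13) for x in range(1, 21)]
    print(cluster_based_mape(data, {"mean": fit_mean, "line": fit_line}))
```

Selecting the model for each cluster by its test error gives an optimistic estimate; holding out a separate validation subset for the selection would be a cleaner design.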


Table 2. The obtained MAPE and SD for the SVM, ANN and CBA-ANN-SVM models.

Model          MAPE (%)   SD
SVM            1.565      2.806
ANN            1.603      2.735
CBA-ANN-SVM    1.342      2.256

Table 2 compares the performance of all three models based on the obtained MAPE and SD values. In the hybrid cluster-based approach (CBA-ANN-SVM), the month number is found to be an important factor in clustering. During data clustering, the records are assigned to the target cluster based on the month number. Afterwards, to estimate the horizontal global solar radiation, the proposed model uses the target cluster corresponding to the month number. The results in Table 2 verify the benefit of using the cluster-based method to predict the global solar radiation. As the lowest error values are achieved by the hybrid CBA-ANN-SVM model, this model is identified as the superior one for the estimation of global solar radiation.
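To put the differences in Table 2 into proportion, the relative reduction of each error measure can be computed directly from the reported values. The short sketch below does only this arithmetic; the helper name is hypothetical and no new results are introduced.

```python
# (MAPE, SD) pairs as reported in Table 2.
TABLE_2 = {
    "SVM": (1.565, 2.806),
    "ANN": (1.603, 2.735),
    "CBA-ANN-SVM": (1.342, 2.256),
}


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100.0


if __name__ == "__main__":
    hybrid_mape, hybrid_sd = TABLE_2["CBA-ANN-SVM"]
    for name in ("SVM", "ANN"):
        base_mape, base_sd = TABLE_2[name]
        print(
            f"vs {name}: MAPE lower by {relative_reduction(base_mape, hybrid_mape):.1f}%, "
            f"SD lower by {relative_reduction(base_sd, hybrid_sd):.1f}%"
        )
```

With the reported values, the MAPE of the hybrid model is roughly 14% lower than that of SVM and roughly 16% lower than that of ANN.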

4 Conclusions

In this study, a Cluster-Based Approach (CBA) was introduced to estimate daily global solar radiation on a horizontal surface. For this purpose, the clustering paradigm was combined with the ANN and SVM techniques in our proposed hybrid approach (CBA-ANN-SVM). To demonstrate the practicality of CBA-ANN-SVM, we evaluated the approach using 10 years of measured data from an Iranian city located in a sunny part of the country. The measured sunshine hours, the calculated maximum possible sunshine hours, the maximum and minimum air temperatures, and the extraterrestrial solar radiation were used as inputs for the prediction of the global solar radiation. Clustering was performed to categorize the global solar radiation data into clusters. It was found that the month number is a significant parameter in clustering. Accordingly, the clustering was performed according to the month of the year, so that the data were grouped into 12 clusters based on the month. This allowed us to maximize the homogeneity of data within the clusters and the heterogeneity between the clusters. Our evaluation indicated that the CBA-ANN-SVM approach achieved a higher accuracy than the ANN and SVM techniques used alone. For example, the MAPE of our approach is 1.342%, compared to 1.603% and 1.565% for ANN and SVM, respectively.

Acknowledgment. This work has partially been sponsored by the Hungarian National Scientific Fund under contract OTKA 129374 and the Research & Development Operational Program for the project “Modernization and Improvement of Technical Infrastructure for Research and Development of J. Selye University in the Fields of Nanotechnology and Intelligent Space”, ITMS 26210120042, co-funded by the European Regional Development Fund. Dr. Mosavi contributed to this research during the tenure of an ERCIM Alain Bensoussan Fellowship Programme. The support and research infrastructure of the Institute of Advanced Studies Koszeg, iASK, is acknowledged.

References

1. Hernandez, R.: Environmental impacts of utility-scale solar energy. Renew. Sustain. Energy Rev. 29, 766–779 (2014)

2. Hosseini, E.: A review on green energy potentials in Iran. Renew. Sustain. Energy Rev. 27, 533–545 (2013)

3. Torabi, M., et al.: A Hybrid Clustering and Classification Technique for Forecasting Short-Term Energy Consumption, Environmental Progress & Sustainable Energy. Wiley, Hoboken (2018)

4. Mekhilef, S.: A review on solar energy use in industries. Renew. Sustain. Energy Rev. 15, 1777–1790 (2011)

5. Imani, M.H.: Strategic behavior of retailers for risk reduction and profit increment via distributed generators and demand response programs. Energies 11(6), 1–24 (2018)

6. Rusen, S.: Estimation of daily global solar irradiation by coupling ground measurements of bright sunshine hours to satellite imagery. Energy 58, 417–425 (2013)

7. Darvishzadeh, A.: Modeling the strain impact on refractive index and optical transmission rate. Physica B: Condens. Matter 543, 14–17 (2018)

8. Ulgen, K., Hepbasli, A.: Diffuse solar radiation estimation models for Turkey’s big cities. Energy Convers. Manag. 50, 149–156 (2009)

9. Karakoti, I., Pande, B., Pandey, K.: Evaluation of different diffuse radiation models for Indian stations. Renew. Sustain. Energy Rev. 15, 2378–2384 (2011)

10. Mosavi, A.: The large scale system of multiple criteria decision making. Large Scale Complex Syst. Theory Appl. 9(1), 354–359 (2010)

11. Vargas, R., Mosavi, A., Ruiz, L.: Deep learning: a review. In: Advances in Intelligent Systems and Computing (2017)

12. Mubiru, J.: Estimation of monthly average daily global solar irradiation using artificial neural networks. Sol. Energy 82, 181–187 (2008)

13. Jiang, Y.: Computation of monthly mean daily global solar. Energy 34, 1276–1283 (2009)

14. Najafi, B., et al.: An intelligent artificial neural network-response surface methodology method. Energies 11(4), 860 (2018)

15. Mathioulakis, E.: Artificial neural networks for the performance prediction of heat pump hot water heaters. Int. J. Sustain. Energ. 37(2), 173–192 (2018)

16. Azeez, A.: Artificial neural network estimation of global solar. Appl. Sci. Res. 3(2), 586–595 (2011)

17. Mosavi, A., et al.: Predicting the future using web knowledge: state of the art survey. In: Advances in Intelligent Systems and Computing, vol 660. Springer, Heidelberg (2018)

18. Chen, L.: Estimation of monthly solar radiation from measured temperatures using support vector machines-a case study. Renew. Energy 36, 413–420 (2011)

19. Chen, J.L.: Assessing the potential of support vector machine for estimating daily solar radiation using sunshine duration. Energy Convers. Manag. 75, 311–318 (2013)

20. Mosavi, A., Varkonyi-Koczy, A.R.: Integration of machine learning and optimization for robot learning. In: Advances in Intelligent Systems and Computing. Springer, Heidelberg (2017)

21. Chen, J.L., Li, G.S.: Evaluation of support vector machine for estimation of solar radiation from measured meteorological variables. Theor. Appl. Climatol. 115, 627–638 (2014)

22. Guermoui, M.: Support vector regression methodology for estimating global solar radiation in Algeria. Eur. Phys. J. Plus 133(1), 22 (2018)

23. Keshtegar, B.: Comparison of four heuristic regression techniques in solar radiation. Renew. Sustain. Energy Rev. 81, 330–341 (2018)

24. Wu, J., Chan, C.K.: Prediction of hourly solar radiation using a novel hybrid model of ARMA and TDNN. Sol. Energy 85, 808–817 (2011)

25. Moeini, I., et al.: Modeling the time-dependent characteristics of perovskite solar cells. Sol. Energy 170, 969–973 (2018)

26. Mosavi, A., et al.: Industrial applications of big data: state of the art survey. Adv. Intell. Syst. Comput. 660, 225–232 (2017)

27. Mosavi, A., et al.: Review on the usage of the multiobjective optimization package of modeFrontier in the energy. In: Advances in Intelligent Systems and Computing, pp. 217–224 (2017)

28. Mosavi, A., et al.: Reviewing the novel machine learning tools for materials design. In: Advances in Intelligent Systems and Computing, pp. 50–58 (2017)

29. Halabi, L.M.: Performance evaluation of hybrid adaptive neuro-fuzzy inference system models for predicting monthly global solar radiation. Appl. Energy 213, 247–261 (2018)

30. Moeini, I., et al.: Modeling the detection efficiency in photodetectors with temperature-dependent mobility and carrier lifetime. In: Superlattices and Microstructures (2018)
