Abstract-A country's growth is strongly measured by the quality of its education system. The education sector across the globe has witnessed a sea change in its functioning. Today it is recognized as an industry and, like any other industry, it faces challenges; the major challenges of higher education are the decrease in students' success rate and students leaving a course without completing it. An early prediction of a student's failure may help the management provide timely counseling as well as coaching to increase the success rate and student retention. We use different classification techniques to build a performance prediction model based on students' social integration, academic integration, and various emotional skills, which have not been considered so far. Two algorithms, J48 (an implementation of C4.5) and Random Tree, have been applied to the records of MCA students of colleges affiliated to Guru Gobind Singh Indraprastha University to predict third semester performance. Random Tree is found to be more accurate in predicting performance than the J48 algorithm.

Keywords- classification, data mining, prediction

I. INTRODUCTION

Preliminary education adds to a nation's literacy rate, but higher education has a direct impact on the work force provided to industry and hence directly affects the economy.

Many institutions of higher learning have been set up across India. However, the quality of education is judged by the success rate of students and by the extent to which an institute is capable of retaining its students. Predicting students' performance can help identify the students who are at risk of failure, so that the management can provide timely help and take essential steps to coach these students to improve their performance.

Data mining techniques have been applied to predict the academic performance of students based on their socio-economic condition and previous academic performance. This paper explores the link between students' emotional skills and their academic performance, using emotional skills along with socio-economic and previous academic performance parameters to predict academic performance with data mining techniques. Emotional skills such as assertion, leadership and stress management are obtained using the standard Emotional Skills Assessment Process (ESAP).

Data mining tasks can be either descriptive or predictive. Descriptive data mining uses techniques such as association rule mining and clustering to find patterns hidden in large data sets and to help in intelligent decision making. Predictive data mining constructs models using rule sets, decision trees, neural nets, support vectors, etc. to predict the class of a new data set.

The objective of this paper is to predict the third semester performance of MCA students. The rationale behind considering the third semester for prediction is the observation that most students who drop out of the course do so after the first year, and that students normally take a year to get integrated into an institute's academic environment. Two decision tree algorithms, J48 and Random Tree, have been used to build the model. The main contribution of this paper is the model comparison, along with finding the impact of various attributes on students' performance.

The remainder of this paper is organized as follows. Section II discusses previous work, followed by the experimental setting in Section III. Section IV presents the results, and conclusions are discussed in Section V.

II. LITERATURE SURVEY

The most cited literature surveys in educational data mining are by Romero and Ventura [4], Baker and Yacef [16], and Romero and Ventura [5], which identify performance prediction as one of the emerging fields of educational data mining. Paris,
neighbor, OneR and JRip. J48 has been found to be the most suitable of all the classifiers.

Kabra and Bichkar [15] experimented with 346 first year students of an engineering college, collecting their demographic data (category, gender, etc.), past performance data (SSC or 10th marks, HSC or 10+2 exam marks, etc.), address and contact number to predict whether a student will PASS, FAIL or get promoted (when he fails in 3 theory and 2 practical subjects). The J48 algorithm in WEKA produced a prediction model with an accuracy of 60.46%. The most important attribute in predicting students' performance was found to be HSCCET. Social attributes such as category, parents' occupation and living location, and other attributes such as gender and medium at secondary level, were found to be less relevant.

Student dropout due to failure at the Open Polytechnic of New Zealand has been explored by Kovačić [18]. Enrollment data of 435 students of an Information Systems course were collected, consisting of socio-demographic variables (age, gender, ethnicity, education, work status, and disability) and study environment variables (course programme and course block). The final label had two categories: PASS (those who completed the course) and FAIL (those who did not complete it). Feature selection indicated that the most important attributes for prediction are ethnicity, course programme and course block.
III. EXPERIMENTAL SETTING

The major objective of the proposed methodology is to build a classification model that classifies a student's third semester performance as BAVG (<60%), AVG (60% to less than 70%), ABVG (70% to less than 79%) or EXCL (>=80%). The classifiers have been built following the Standard Process for Data Mining, which includes business understanding, data understanding, data preparation, modeling and finally the application of data mining techniques, which is classification in the present study.

A. Data Understanding

The data of MCA students from various institutions affiliated to GGSIP University was collected through a structured questionnaire. A sample of 250 students was collected, with 25 attributes covering academic integration, social integration and emotional skills, as shown in Table I.

B. Data-Preprocessing

The collected data was saved as Excel spreadsheets. The cleaning process required eliminating data with missing values, correcting inconsistent data, identifying outliers, and removing duplicate data. Of the total of 250 instances in the raw data, 215 instances remained after the data cleaning process.

C. Modeling

The open source data mining tool Waikato Environment for Knowledge Analysis (WEKA) has been used for classification. WEKA provides built-in algorithms that can be applied to any data set.

D. Classification

Tree-based methods classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of a particular instance. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values for this attribute [12]. J48 is a class for generating a pruned or unpruned C4.5 decision tree, while Random Tree constructs a tree that considers K randomly chosen attributes at each node, without pruning. We have used cross-validation for testing, as it has been shown to be more suitable for limited datasets and gives the best estimate of error [8].
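As a rough illustration of the cross-validation procedure mentioned above, the following minimal sketch splits 215 instance indices (the size of the cleaned data set) into 10 folds, each used once for testing while the rest are used for training. It is a plain, unstratified split written for illustration only and is not WEKA's own implementation; the function names `k_fold_indices` and `cross_validation_splits` and the `seed` parameter are assumptions introduced here.

```python
import random


def k_fold_indices(n_instances, k=10, seed=1):
    """Split range(n_instances) into k shuffled test folds of near-equal size."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one instance.
    return [indices[i::k] for i in range(k)]


def cross_validation_splits(n_instances, k=10, seed=1):
    """Yield one (train_indices, test_indices) pair per fold."""
    folds = k_fold_indices(n_instances, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 215 is the number of instances left after cleaning (Section III.B).
splits = list(cross_validation_splits(215, k=10))
fold_sizes = sorted(len(test) for _, test in splits)
print(fold_sizes)
```

With 215 instances and 10 folds, five folds hold 22 instances and five hold 21, so every instance is tested exactly once and each model is trained on roughly 90% of the data.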
TABLE I. Attributes Description
IV. RESULT AND DISCUSSION

J48 and Random Tree were applied to the data set using 10-fold cross-validation. The summary and the rules obtained by J48 are listed in Fig. 1 and Table II, while the summary and rules of Random Tree are shown in Fig. 2 and Table III. The performance of the algorithms is evaluated on the basis of recall, precision and true positive (TP) rate. Precision is defined as the number of correct positive predictions over the total number of positive predictions, and recall is defined as the number of correct positive predictions over the total number of positive cases. A high precision indicates that the algorithm returns more relevant results than irrelevant ones, and a high recall means that the algorithm returns most of the relevant results. The performance comparison of J48 and Random Tree is shown in Table IV.
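The precision and recall definitions above can be sketched for a multi-class problem by treating each class in turn as the positive class. The confusion matrix below is made-up toy data for illustration, not the paper's results; the function name `per_class_metrics` is likewise an assumption. Note that the TP rate of a class is the same quantity as its recall.

```python
def per_class_metrics(confusion, labels):
    """Return {label: (precision, recall)} from a square confusion matrix.

    confusion[i][j] is the number of instances whose actual class is
    labels[i] and whose predicted class is labels[j].
    """
    metrics = {}
    for k, label in enumerate(labels):
        true_positives = confusion[k][k]
        predicted_positive = sum(row[k] for row in confusion)  # column sum
        actual_positive = sum(confusion[k])                    # row sum
        precision = true_positives / predicted_positive if predicted_positive else 0.0
        recall = true_positives / actual_positive if actual_positive else 0.0
        metrics[label] = (precision, recall)
    return metrics


labels = ["BAVG", "AVG", "ABVG", "EXCL"]
# Hypothetical toy counts, NOT taken from the paper's tables.
toy_confusion = [
    [8, 2, 0, 0],
    [1, 7, 2, 0],
    [0, 1, 9, 0],
    [0, 0, 1, 4],
]
metrics = per_class_metrics(toy_confusion, labels)
for label, (precision, recall) in metrics.items():
    print(f"{label}: precision={precision:.3f} recall={recall:.3f}")
```

In this toy matrix every instance predicted as EXCL really is EXCL (precision 1.0), but one of the five actual EXCL instances is missed (recall 0.8), which shows how the two measures can diverge.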
Figure 2. Random Tree Result Summary
It is evident from the rules derived from J48 and Random Tree that:
• The result of the second semester is the key influencer of the third semester result. This is expected, as the programming subjects of the second semester form the foundation of the programming subjects of the third semester.
• Consistently good academic performance is clearly a good indication of good performance in the third semester too.
• Out of all the emotional attributes, leadership and drive of the students have been found to affect the performance.
• Socio-economic conditions have only a marginal effect on performance.

The performance of both algorithms is satisfactory; however, a higher overall accuracy (94.418%) was attained by the Random Tree implementation, compared to J48 with 88.372% accuracy. Also, the True Positive Rate, Precision and Recall measures of Random Tree are higher than those of J48 and in line with the corresponding accuracy.
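To put the reported accuracies in terms of instance counts, the short sketch below converts each percentage into a number of correctly classified instances. It assumes the accuracies were computed over the 215 instances that remained after data cleaning, which the text suggests but does not state explicitly; the helper name `correct_count` is introduced here for illustration.

```python
TOTAL_INSTANCES = 215  # assumption: cleaned data set size from Section III.B


def correct_count(accuracy_percent, total=TOTAL_INSTANCES):
    """Nearest whole number of correct predictions for a given accuracy."""
    return round(accuracy_percent / 100 * total)


reported = {"Random Tree": 94.418, "J48": 88.372}
counts = {name: correct_count(acc) for name, acc in reported.items()}

for name, n_correct in counts.items():
    implied = 100 * n_correct / TOTAL_INSTANCES
    print(f"{name}: {n_correct}/{TOTAL_INSTANCES} correct ({implied:.4f}%)")

print("difference in correct instances:", counts["Random Tree"] - counts["J48"])
```

Under that assumption, the two accuracies correspond to 203 and 190 correct classifications respectively, so the gap between the algorithms amounts to 13 instances.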
REFERENCES

[1] B. Sen, E. Uçar and D. Delen, Predicting and Analyzing Secondary Education Placement-Test Scores: A Data Mining Approach, Expert Systems with Applications, Volume 39, Issue 10, 2012.

[2] B.K. Bhardwaj and S. Paul, Mining Educational Data to Analyze Students Performance, International Journal of Advanced Computer Science and Applications, Vol. 2, No. 6, 2011.

[3] B. M. Bidgoli, D. Koshy, G. Kortemeyer, W.F. Punch, Predicting Student Performance: An Application of Data Mining Methods with an Educational Web Based System, 33rd ASEE/IEEE Frontiers in Education Conference, 2004.

[4] C. Romero and S. Ventura, Educational Data Mining: A Survey from 1995 to 2005, Expert Systems with Applications, no. 33, pp. 135-146, 2007.

[5] C. Romero and S. Ventura, Educational Data Mining: A Review of the State of the Art, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 40, No. 6, 2010.

[6] D. Kabakchieva, Predicting Student Performance by Using Data Mining Methods for Classification, Cybernetics and Information Technologies, Volume 13, 2013.

[7] E. Osmanbegovic and M. Suljic, Data Mining Approach for Prediction of Student Performance, Economic Review - Journal of Economics & Business, Vol. 10, Issue 1, 2012.

[8] I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques. San Francisco: Morgan Kaufmann, 2 ed., 2005.

[9] I.H.M. Paris, L.S. Affecndy and N. Musthafa, Improving Performance Prediction Using Voting Technique in Data Mining, World Academy of Science, Engineering and Technology, Vol. 38, 2010.

[10] M. Ramaswami and R. Bhaskaran, A CHAID Based Performance Prediction Model in Educational Data Mining, International Journal of Computer Science, Vol. 7, Issue 1, No. 1, 2010.

[11] M. Wook, Y.H. Yahaya, N. Wahab, M.R.M. Isa, N.F. Awang and H.Y. Seong, Predicting NDUM Student's Academic Performance Using Data Mining Techniques, Paper presented at International Conference of Computer and Electrical Engineering, ICCEE, December 28-30, 2009.

[12] T. Mitchell, Machine Learning. McGraw Hill, New York, 1997.

[13] N. S. Shah, Predicting Factors that Affect Students Academic Performance By Using Data Mining, Pakistan Business Review, January 2012.

[14] P. Cheewaprakobkit, Study of Factor Analysis Affecting Achievements of Undergraduate, Paper presented at International Multi Conference of Engineers and Computer Scientists, IMECS, Hong Kong, HK, March 13-15, 2013.

[15] R.R. Kabra and R.R. Bichkar, Performance Prediction of Engineering Students Using Decision Trees, International Journal of Computer Applications, Volume 36, No. 11, 2011.

[16] R.S.J.D. Baker and K. Yacef, The State of Educational Data Mining in 2009: A Review and Future Visions, Journal of Educational Data Mining, Vol. 1, No. 1, 2009.

[17] T. Nghe, J. Paul, Aneek and Peter Heddawy, A Comparative Analysis of Techniques for Predicting Academic Performance, Paper presented at 37th ASEE/IEEE Frontiers in Education Conference - Global Engineering: Knowledge Without Borders, Opportunities Without Passports, Milwaukee, WI, October 10-13, 2007.

[18] Z. J. Kovačić, Early Prediction of Student Success: Mining Students Enrolment Data, Paper presented at Proceedings of Informing Science & IT Education Conference (InSITE), Cassino, Italy, June 19-24, 2010.