
European Journal of Scientific Research
ISSN 1450-216X Vol.43 No.1 (2010), pp.24-29
EuroJournals Publishing, Inc. 2010
http://www.eurojournals.com/ejsr.htm

Data Mining Model for Higher Education System


Shaeela Ayesha
Corresponding Author
Department of Computer Science, University of Agriculture, Faisalabad, Pakistan
E-mail: ayesha007_uaf@yahoo.com

Tasleem Mustafa
Department of Computer Science, University of Agriculture, Faisalabad, Pakistan

Ahsan Raza Sattar
Department of Computer Science, University of Agriculture, Faisalabad, Pakistan

M. Inayat Khan
Department of Statistics & Mathematics, University of Agriculture, Faisalabad, Pakistan

Abstract
Data mining is used to extract meaningful information and to uncover significant relationships among variables stored in large data sets or data warehouses. In this paper, a data mining technique named k-means clustering is applied to analyze students' learning behavior and to discover knowledge that comes from the educational environment. Student evaluation factors such as class quizzes, mid-term and final exams, and assignments are studied. It is recommended that all this correlated information be conveyed to the class teacher before the final exam is conducted. This study will help teachers to reduce the drop-out ratio to a significant level and improve the performance of students.

Keywords: Data Mining, Educational Data Mining, Clustering, k-means, Database.

1. Introduction
Data mining is the process of extracting previously unknown, valid, potentially useful and hidden patterns from large data sets (Connolly, 1999). The amount of data stored in educational databases is increasing rapidly. To benefit from such large volumes of data and to find hidden relationships between variables, different data mining techniques have been developed and used (Han and Kamber, 2006). Clustering and decision trees are among the most widely used techniques for prediction. The main goal of clustering is to partition students into homogeneous groups according to their characteristics and abilities (Kifaya, 2009). These applications can help both instructors and students to enhance the quality of education. This study aims to analyze how different factors affect a student's learning behavior and performance during his or her academic career, using k-means and decision trees in an educational institute. Decision tree analysis is a popular data mining technique that can be used to explain the interdependencies among different variables such as attendance ratio and grade ratio.


Clustering is one of the basic techniques often used in analyzing data sets. This study makes use of cluster analysis to segment students into groups according to their characteristics. The remainder of the paper is organized as follows: Section 2 describes related work in educational data mining. Section 3 gives a general description of data mining and its use in higher education. Section 4 presents the proposed model, the data used, the stages of the data mining process, and the results of applying k-means clustering to the educational data set. Finally, Section 5 concludes the paper with an outlook for future work.

2. Related Work
Data mining is an emerging methodology used in the educational field to enhance our understanding of the learning process; it focuses on identifying, extracting and evaluating variables related to the learning process of students (Alaa el-Halees, 2009). According to Kifaya (2009), k-means clustering is a widely used method that is simple and easy to understand. Cluster analysis describes the similarity between different cases by calculating the distance between them, and the cases are divided into clusters according to their similarity. Galit (2007) gave a case study that uses student data to analyze learning behavior, predict results and warn students at risk before their final exams. Han and Kamber (2006) explained that k-means is a well-known clustering algorithm that tends to uncover relations among variables already present in a data set. Erdogan and Timor (2005) used educational data mining to identify and enhance the educational process, which can improve decision making. Finally, Henrik (2001) concluded that clustering was effective in finding hidden relationships and associations between different categories of students.

3. Data Mining
Data mining techniques operate on large volumes of data to discover hidden patterns and relationships that are helpful in decision making (Connolly, 1999). Data mining software allows users to analyze data from different dimensions, categorize it and summarize the relationships identified during the mining process (Han and Kamber, 2006). Different data mining techniques are used in various fields such as medicine, statistical analysis, engineering, education, banking, marketing and sales (Maclennan, 2005). Cluster analysis is used to segment a large set of data into subsets called clusters. Each cluster is a collection of data objects that are similar to one another but dissimilar to objects in other clusters (Behrouz et al., 2003; Dongsong, 2004).

3.1. Data Mining in Higher Education System
Education is an essential element for the betterment and progress of a country. It enables the people of a country to become civilized and well mannered. Data mining in an educational environment is called educational data mining; it is concerned with developing new methods to discover knowledge from educational databases (Galit, 2007; Erdogan and Timor, 2005) in order to analyze students' trends and behaviors toward education (Alaa el-Halees, 2009). A lack of deep and sufficient knowledge in a higher education system may prevent its management from achieving quality objectives; data mining methodology can help bridge these knowledge gaps.

4. Proposed Model
In a university, the overall performance of a student is determined by internal as well as external assessment. Internal assessment is based on a student's assignment marks, class quizzes, lab work, attendance, previous semester grade and involvement in extracurricular activities. External assessment is based on the marks scored in the final exam. The proposed model predicts the fail and pass ratio of students based on class performance, and the system also informs students about their class attendance ratio. The proposed model also deals with the entrance ratio of students in a particular department and the exit ratio after successful completion of the degree. The model was developed using DMX queries available in Visual Studio 2005.

If prev-sem-grade=high, class-quiz=good, assignment=complete, practical-work=good, mid-term=good and attendance=regular then final-grade=good
If prev-sem-grade=average, class-quiz=good, assignment=incomplete, practical-work=good, mid-term=average and attendance=regular then final-grade=average
If prev-sem-grade=low, class-quiz=average, assignment=incomplete, practical-work=poor, mid-term=low and attendance=irregular then final-grade=low

The proposed model identifies weak students before the final exam so that teachers can take appropriate steps at the right time to improve their performance and save them from failure. It deals with both kinds of assessment, especially internal assessment, and checks the performance of students at different levels before the final exam in order to predict those whose performance is low.

4.1. Application
In this study, data gathered from university students was analyzed using a data mining technique, namely k-means clustering. To apply this technique, the following steps were performed in sequence:

4.1.2. Data Set
The data set used in this study was obtained from the Department of Computer Science, University of Agriculture, Faisalabad, in 2008-09. Initially, 120 students were enrolled in the degree.

4.1.3. Database
The database management system used in this study was Microsoft SQL Server 2005. It was chosen because it is a relational database management system that was compatible and efficient for this work, and because the data had been maintained in this database prior to the study.

4.1.4. Application Software
The programming environment used to build the data mining model was Visual Studio 2005. It was suitable for developing the mining model and was compatible with SQL Server 2005, in which the data was stored.

4.2. Data Mining Process
The data mining process consists of the following steps:

4.2.1. Preparations
In this step, data stored in different tables was joined into a single table. After the joining process, errors were removed.

4.2.2. Data Selection and Transformation
In this step, we determined the fields used for analysis. Data in yes/no form was transformed into 1/0 form.
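The preparation and transformation steps can be sketched as follows. This is only a minimal illustration in Python, not the SQL Server 2005 workflow used in the study; the table layouts, the field names (student_id, quiz, assignment_complete) and the sample rows are hypothetical.

```python
# Hypothetical rows standing in for two source tables keyed by student_id.
quiz_table = [
    {"student_id": 1, "quiz": "good"},
    {"student_id": 2, "quiz": "average"},
]
assignment_table = [
    {"student_id": 1, "assignment_complete": "yes"},
    {"student_id": 2, "assignment_complete": "no"},
]


def join_tables(left, right, key):
    """Inner-join two lists of dicts on a shared key (preparation step)."""
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


def yes_no_to_binary(rows, fields):
    """Return copies of rows with yes/no values in `fields` mapped to 1/0."""
    mapping = {"yes": 1, "no": 0}
    converted = []
    for row in rows:
        new_row = dict(row)
        for field in fields:
            new_row[field] = mapping[str(row[field]).strip().lower()]
        converted.append(new_row)
    return converted


single_table = join_tables(quiz_table, assignment_table, "student_id")
prepared = yes_no_to_binary(single_table, ["assignment_complete"])
```

A value other than yes or no raises a KeyError here, which is one simple way to surface the kind of errors that the preparation step is meant to remove.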

Data Mining Model for Higher Education System

27

4.2.3. Implementation of Mining Model
In this step, the k-means clustering algorithm was applied to the processed data to extract valuable information. K-means is one of the oldest and most widely used clustering algorithms; it was developed by MacQueen in 1967 (Erdogan and Timor, 2005).
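To make the clustering step concrete, the following is a minimal sketch of k-means in plain Python (3.8 or later, for math.dist). It is not the clustering implementation shipped with SQL Server 2005 that the study relied on. The (attendance, GPA) records are hypothetical, and using the first k points as starting centroids is an assumption made only to keep the run deterministic.

```python
import math


def kmeans(points, k, max_iter=100):
    """Basic k-means on a list of equal-length numeric tuples.

    Assumption: the first k points serve as initial centroids so that the
    result is reproducible; practical tools usually choose them at random.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep as is
        if new_centroids == centroids:
            break  # centroids stopped moving, so the labels are final
        centroids = new_centroids
    return labels, centroids


# Hypothetical (attendance %, GPA) records, not taken from the study's data.
records = [(95, 3.8), (60, 2.5), (20, 0.5), (90, 3.5), (55, 2.0), (30, 1.0)]
# Scale both fields to [0, 1] so attendance does not dominate the distance.
scaled = [(attendance / 100, gpa / 4) for attendance, gpa in records]

labels, centroids = kmeans(scaled, k=3)
```

Scaling matters because attendance runs from 0 to 100 while GPA runs from 0 to 4; without it, the Euclidean distance would be driven almost entirely by attendance.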

4.3. Results and Discussion
The model produced the following results.


Graph 1: Relationship between GPA and attendance ratio.
[Plot titled "Student Record": x-axis Attendance (0 to 100), y-axis GPA (0 to 4).]

Data Arrangement in Tables
We grouped the students by their final grades in several ways, three of which are:
(1) Assign one label per possible grade, so that the number of labels equals the number of possible grades.
(2) Group the students into three classes: High, Medium and Low.
(3) Categorize the students with one of two class labels: Passed for a GPA above 2.0 and Failed for a GPA less than or equal to 2.0.
Table 1
Class   GPA   No. of Students   Percentage
1       0.0   3                 2.50
2       0.5   1                 0.83
3       1.0   7                 5.83
4       1.5   4                 3.33
5       2.0   5                 4.17
6       2.5   35                29.17
7       3.0   40                33.33
8       3.5   16                13.33
9       4.0   9                 7.50
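The Percentage column of Table 1 follows directly from the student counts. The short Python check below recomputes it; the variable names are illustrative, while the GPA classes and counts are copied from the table.

```python
# GPA classes and student counts copied from Table 1.
gpa_classes = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
student_counts = [3, 1, 7, 4, 5, 35, 40, 16, 9]

total = sum(student_counts)  # 120 students in all
percentages = [round(100 * n / total, 2) for n in student_counts]

for gpa, n, pct in zip(gpa_classes, student_counts, percentages):
    print(f"GPA {gpa:.1f}: {n:3d} students, {pct:5.2f}%")
```

The total of 120 agrees with the enrollment stated in the data set description.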

Graph 2: Number and percentage of students by GPA.
[Chart: classes 1 to 9 on one axis, values from 0.0 to 40.0 on the other; series: Percentage, Student #, GPA.]

Table 2
Class    GPA               No. of Students   Percentage
High     >= 3.5            27                22.5
Medium   2.0 < GPA < 3.5   62                51.67
Low      <= 2.0            31                25.83

Graph 3: Percentage of students with high, medium and low GPA.
[Chart titled "Percentage of Student's GPA" with categories High, Medium and Low.]

Table 3
Class    GPA      No. of Students   Percentage
Passed   > 2.0    89                74.17
Failed   <= 2.0   31                25.83
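The labeling schemes behind Tables 2 and 3 can be written as two small functions. This is a sketch under the thresholds given in those tables, reading the Medium range as 2.0 < GPA < 3.5; the function names and the sample GPA values are hypothetical.

```python
def three_class_label(gpa):
    """Label a GPA as High, Medium or Low using the Table 2 thresholds."""
    if gpa >= 3.5:
        return "High"
    if gpa > 2.0:
        return "Medium"
    return "Low"


def pass_fail_label(gpa):
    """Label a GPA as Passed (above 2.0) or Failed (2.0 or below), as in Table 3."""
    return "Passed" if gpa > 2.0 else "Failed"


# Hypothetical GPA values, used only to exercise the boundaries.
sample_gpas = [3.8, 3.5, 2.6, 2.0, 1.2]
three_class = [three_class_label(g) for g in sample_gpas]
two_class = [pass_fail_label(g) for g in sample_gpas]
```

Under these rules every High or Medium student is also Passed and every Low student is Failed, which matches the counts in the two tables (27 + 62 = 89 passed, 31 failed).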

5. Conclusion and Future Work


In this study, we applied the data mining process to a student database, using the k-means clustering algorithm to predict students' learning activities. We hope that the information generated by the data mining technique will be helpful for instructors as well as for students. This work may improve students' performance and reduce the failure ratio by enabling appropriate steps to be taken at the right time to improve the quality of education. In future work, we hope to refine our technique to obtain more valuable and accurate outputs that are useful for instructors in improving students' learning outcomes. Different software may be utilized, and additional factors will be considered.


References
[1] Alaa el-Halees (2009) Mining Students Data to Analyze e-Learning Behavior: A Case Study.
[2] Behrouz et al. (2003) Predicting Student Performance: An Application of Data Mining Methods With The Educational Web-Based System Lon-CAPA. 2003 IEEE, Boulder, CO.
[3] Connolly, T., Begg, C. and Strachan, A. (1999) Database Systems: A Practical Approach to Design, Implementation, and Management (3rd Ed.). Harlow: Addison-Wesley. 687
[4] Erdogan and Timor (2005) A data mining application in a student database. Journal of Aeronautic and Space Technologies, July 2005, Volume 2, Number 2 (53-57).
[5] Galit et al. (2007) Examining online learning processes based on log files analysis: a case study. Research, Reflection and Innovations in Integrating ICT in Education.
[6] Henrik (2001) Clustering as a Data Mining Method in a Web-based System for Thoracic Surgery. 2001.
[7] Han, J. and Kamber, M. (2006) Data Mining: Concepts and Techniques, 2nd edition. The Morgan Kaufmann Series in Data Management Systems, Jim Gray, Series Editor.
[8] Kifaya (2009) Mining student evaluation using associative classification and clustering. Communications of the IBIMA, vol. 11, ISSN 1943-7765.
[9] ZhaoHui and Maclennan, J. (2005) Data Mining with SQL Server 2005. Wiley Publishing, Inc.
