College Data Mining

1.
Briefly discuss how the Lerner College of Business could use data mining in each of the
following situations:
a. When deciding which undergrads to admit to the College of Business as internal transfers
from other Colleges:
A data mining tool can help the Lerner College of Business select the best candidates from the
internal-transfer applications in its admissions database. It can also predict, with a certain
accuracy, whether a candidate will graduate.
Classification: On a training data set, a rule induction algorithm learns to separate the
applicants according to attributes such as undergraduate status, transfer status, same or
different major, current college, cumulative grade point average, etc. Once the model produces
relevant classifications on the training data set, it can be applied to a validation data
set.
Another technique that can be used to separate the best candidates from the rest is a
decision tree built from if-then statements. A sample decision tree is shown at the end of this
answer. This technique is more useful than neural networks or rule induction algorithms when
the data labels or variables are finite and hierarchical.
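The if-then structure of such a tree can be sketched in a few lines of Python. This is only an illustration: the field names, the 3.0 GPA threshold, the order of the checks, and the outcome strings are assumptions loosely based on the sample tree, not the College's actual admission rules.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    # Hypothetical fields that mirror the attributes named in the text.
    is_undergraduate: bool
    is_internal_transfer: bool
    gpa: float
    has_prerequisites: bool
    current_school_accredited: bool


def screen(applicant: Applicant, min_gpa: float = 3.0) -> str:
    """Walk the if-then rules from the root of the tree down to a leaf."""
    if not applicant.is_undergraduate:
        return "out of scope: graduate applicant"
    if not applicant.is_internal_transfer:
        return "out of scope: new applicant"
    if applicant.gpa < min_gpa:
        return "reject: GPA below threshold"
    if not applicant.has_prerequisites:
        return "reject: missing prerequisite courses"
    if not applicant.current_school_accredited:
        return "reject: current school not accredited"
    return "forward to Admissions Office"


candidate = Applicant(True, True, 3.4, True, True)
print(screen(candidate))
```

Each `if` corresponds to one split in the tree, so the rules stay readable for admissions staff, which is the main appeal of a decision tree over a less transparent model.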
Clustering: This can be a useful technique if the data labels are unknown. Algorithms like k-means
clustering or TwoStep can be used to group the data points in a large data set according to
similarity. With the TwoStep clustering algorithm, we can differentiate between the transfers and
other applicants. The result can then be confirmed with the k-means algorithm.
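As a rough sketch of the k-means step only (TwoStep is not shown), the following pure-Python version groups applicants by two numeric features. The features (cumulative GPA and GPA in business prerequisite courses), the sample values, and the choice of starting centroids are all made up for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its members."""
    assignments = []
    for _ in range(iterations):
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return assignments, centroids


# Hypothetical applicants: (cumulative GPA, GPA in business prerequisite courses)
points = [(3.8, 3.6), (3.6, 3.9), (3.7, 3.5), (2.1, 1.0), (2.4, 1.2), (2.0, 0.8)]
assignments, centroids = kmeans(points, [points[0], points[3]])
print(assignments)
```

In practice the features would be standardized first, and the number of clusters would be chosen by comparing several values of k rather than fixed at two.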
Prediction: The National Student Clearinghouse now allows community colleges and
universities to match their data. This means that data miners and decision makers can now
compare the academic behavior of a student at a community college, if s/he is applying to Lerner,
especially if s/he is transferring from another major, to predict what the transfer outcome
might be: dropout, speeder, or laggard. The transfers cluster can be split further
into speeders, who complete their degrees quickly because of their privileged socio-economic
backgrounds; laggards, who take their time in completing it; and dropouts, who
will never complete the course. Other variables to compare are student demographics,
courses taken, units accumulated, and financial aid. Supervised data mining through
neural networks (Neural Net) and rule induction algorithms (C5.0 or C&RT), run simultaneously, can
then give the tool a prediction accuracy of anywhere between 72 and 80%.
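To make the idea of training a rule-based model and measuring its prediction accuracy concrete, here is a minimal sketch. It uses 1R, a much simpler rule induction method than C5.0 or C&RT, and no neural network. The attributes, records, and outcomes are invented toy data, so the resulting score says nothing about real students.

```python
from collections import Counter, defaultdict


def one_r(rows, target):
    """1R rule induction: keep the single attribute whose
    value -> majority-class rules make the fewest training errors."""
    best = None
    for attribute in rows[0]:
        if attribute == target:
            continue
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attribute]][row[target]] += 1
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rules[row[attribute]] != row[target] for row in rows)
        if best is None or errors < best[2]:
            best = (attribute, rules, errors)
    return best[0], best[1]


def accuracy(attribute, rules, rows, target):
    """Share of rows whose outcome the rules predict correctly."""
    hits = sum(rules.get(row[attribute]) == row[target] for row in rows)
    return hits / len(rows)


def record(aid, load, outcome):
    # Hypothetical attributes; real data would hold many more variables.
    return {"financial_aid": aid, "course_load": load, "outcome": outcome}


training = [
    record("no", "full", "speeder"), record("no", "full", "speeder"),
    record("yes", "part", "laggard"), record("yes", "part", "laggard"),
    record("yes", "full", "speeder"), record("no", "part", "dropout"),
    record("yes", "part", "laggard"), record("no", "part", "dropout"),
]
validation = [
    record("no", "full", "speeder"), record("yes", "part", "laggard"),
    record("no", "part", "dropout"), record("yes", "full", "speeder"),
]

attribute, rules = one_r(training, "outcome")
score = accuracy(attribute, rules, validation, "outcome")
print(attribute, rules, score)
```

The accuracy is computed on records the model did not see during training, which is the kind of figure the quoted accuracy range refers to.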
[Figure: sample decision tree for internal-transfer applicants. Each split is answered Yes or No.
Nodes from root to leaf: Applicants; Undergraduate or Graduate; Transfer or New applicant;
GPA >= 3.0 or GPA <= 3.0; Pre-requisite or equivalent courses, or Other courses; Required
accreditation of current school, or Not from accredited current school; Admissions Office.]

b. When scheduling classes for MBA students in the part-time program, i.e., which classes to
offer each semester and which night to offer them on.

Data mining techniques similar to those explained earlier can be used in scheduling classes for
part-time MBA students at Lerner. These techniques can also help decide where each class
should be placed in the curriculum and how the classes should be distributed across the week.
Pattern recognition can help identify hidden patterns in which courses a part-time MBA
student takes per semester and how frequently the classes for each course are scheduled per
week at other universities. This can help formulate generic rules for the credits required of
part-time students.
Association rules derived from the course results of the above analysis can be
compared with the availability of the lecturers, which can help in forming the academic
calendar accordingly. Another rule can help determine which students go on vacation
frequently and which don't. This can give the college an insight into which semester the core
courses should be placed in and how electives should be scheduled in semesters where the
probability of vacations taken by lecturers and students is higher.
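Association rules are usually scored by their support and confidence. The sketch below computes both for a rule of the form "students who take course A also take course B in the same semester". The course names and enrollment records are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent,
    the fraction that also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical data: the set of courses one student took in one semester.
semesters = [
    {"Accounting", "Finance"},
    {"Accounting", "Finance", "Marketing"},
    {"Accounting", "Marketing"},
    {"Finance", "Operations"},
]

pair_support = support(semesters, {"Accounting", "Finance"})
rule_confidence = confidence(semesters, {"Accounting"}, {"Finance"})
print(pair_support, rule_confidence)
```

A pair of courses with high support and confidence is a candidate for being offered in the same semester on different nights, so that students can take both.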
A MapReduce implementation, though a big data tool, can help the university understand the
traffic conditions on different weekdays around the university area. This can further provide
insight as to which days the university can schedule its classes.
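The MapReduce pattern itself can be illustrated in a single process. In this sketch the map step emits (weekday, delay) pairs, the shuffle step groups them by weekday, and the reduce step averages each group. The traffic observations are invented, and a real deployment would run these phases on a cluster framework rather than in one Python script.

```python
from collections import defaultdict


def map_phase(observations):
    """Emit one (key, value) pair per traffic observation."""
    for weekday, delay_minutes in observations:
        yield weekday, delay_minutes


def shuffle(pairs):
    """Group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each group to its average delay."""
    return {key: sum(values) / len(values) for key, values in grouped.items()}


# Hypothetical observations: (weekday, minutes of delay near campus)
observations = [("Mon", 10), ("Mon", 20), ("Tue", 5), ("Wed", 30), ("Tue", 15)]
average_delay = reduce_phase(shuffle(map_phase(observations)))
quietest = min(average_delay, key=average_delay.get)
print(average_delay, quietest)
```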
Classification: Part-time students can be classified on the basis of attributes like
pre-requisite courses taken, professional profile, majors selected, credits completed, years of
experience, etc. This separates out those part-time students who may be interested in or
eligible for the courses that are available on those nights when the lecturers are free and
when the students too can come to the college.
Clustering: Skills that are beneficial for the completion of each course can be clustered
together. Association rules can be applied to these clusters to identify relationships among
different skills and their commonality among different courses. The clustered and most popular
skills can be grouped into a pre-requisite course, which can be placed at the beginning of the
academic year for the MBA student.
On the basis of historical data, sequential relationships can be determined to establish which
classrooms will be available each night and their respective capacities. A prediction algorithm
(C4.5) can be used to estimate the number of applicants who will enroll in the classes
scheduled each semester.
2. Accenture is an international consulting firm. Go to the URL below and listen to the mp3
audio file entitled Analytics Panel during the Tribeca Film Festival. Then list the three most
important points that you feel it makes.
This talk at the Tribeca Film Festival (2009) revolves around three themes: Analytics, Decisions,
and Execution.
Analytics: Each definition suggested by the experts in this talk holds true. Analytics is a
statistically rigorous technique applied to data, but you need to define the right attributes,
ask the right questions, and prioritize and weigh each attribute correctly, so that you can
make the right decisions based on the analysis and then execute them in a timely manner.
It is not only about technology, but about people and processes as well. It needs to be
ingrained in the business processes and supported by heuristics. The companies that use
analytics are innovative in a rigorous way. They do it consistently. It is not a one-off gimmick.
Decisions: Decisions can be about anything small or big. They can be about predicting injuries
or optimizing expenditure or even which movie will be the biggest hit of the year or who is
statistically best to star in it!
You cannot expect analytics to give you complete (100 percent) data. The point is how well you
can decide on the basis of data as compared to deciding without it. The aim is not to get the
best solution ever, but one better than any competitor's. Studies show that managers can make
better decisions with the help of data. The intent is to strike a balance between art and
science. If you don't ask the right questions at the right time, analytics cannot help you. This
is where knowledge and experience come into the picture.
Execution: The last major point this group makes is that just making the right decisions based on
facts is not enough. It is imperative to make sure that these decisions are executed and
implemented through to the end. There should be parameters, metrics, and ways to measure the
effectiveness of the experiment being conducted. That is how they can check whether the decision
is yielding the desired results. If not, they would have to tweak their questions, redo the
analytics, and re-run the implementation until the result is nearer to the desired goal. This
has to be a consistent effort too. Starting bottom-up may give analytics an edge to develop
over time: it can be easier and cheaper to start, can deliver some early wins, and can have
cleaner performance metrics to measure against.
