
Assignment

CAP-617
Data Warehousing & Data Mining
Submitted to: Respected Neha Ma'am
Submitted by: Ankur Singh
Roll no: RD1802B07

Homework-1
Course Code: CAP617T Data Warehousing and Data Mining
Date of Allocation: 23-08-12
Date of Submission: 08-09-12

1. How can we use data mining in an organization? Take the example of Lovely Professional University, and also give the alternative terms that are used for data mining.

Ans 1

Data mining is the extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data. It relies on programs that detect patterns and rules in the data, and strong patterns can be used to make non-trivial predictions on new data.

Data mining is ready for application in the business and scientific community because it is supported by three technologies that are now sufficiently mature:
o Massive data collection
o Powerful multiprocessor computers
o Data mining algorithms

Data mining is a process of collecting, cleaning, searching and analyzing data from various database sources for the purpose of evaluation. Lovely Professional University (LPU) also applies data mining to analyze and visualize student data. The applications used by LPU are as follows:

1. Topper of a school
2. Highest placement, and from which school
3. Analysis and visualization of data
4. Providing feedback for supporting instructors
5. Recommendations for students
6. Student modelling
7. Detecting undesirable student behaviours
8. Grouping students
9. Social network analysis
10. Developing concept maps
11. Best all-rounder
12. Calculation of CGPA

2. Develop a decision tree based on the information provided below: The men's basketball coach wants to look through the student records and produce a list of all full-time male students who:
o are at least 6 feet 5 inches (77 inches) tall
o weigh at least 180 pounds
o have played at least 2 national level tournaments
o have the capacity to play 180 hours in a week

Ans 2
Height
   < 77 in: Rejected
   >= 77 in: Weight
      < 180 lb: Rejected
      >= 180 lb: National Level Tournaments
         < 2: Rejected
         >= 2: Stamina
            < 180 h: Rejected
            >= 180 h: Selected
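The tree above can be sketched as a short Python function. This is only an illustrative sketch: the `Student` class, its field names and the sample records are assumptions made for the example, the records are assumed to be already filtered to full-time male students, and each threshold is read as "at least", as the question states.

```python
from dataclasses import dataclass


@dataclass
class Student:
    name: str
    height_in: float           # height in inches
    weight_lb: float           # weight in pounds
    national_tournaments: int  # national level tournaments played
    weekly_hours: float        # hours the student can play per week


def evaluate(student: Student) -> str:
    """Walk the tree from the root; the first failed test rejects the student."""
    if student.height_in < 77:
        return "Rejected"
    if student.weight_lb < 180:
        return "Rejected"
    if student.national_tournaments < 2:
        return "Rejected"
    if student.weekly_hours < 180:
        return "Rejected"
    return "Selected"


# Hypothetical records, invented for illustration only.
students = [
    Student("A", 78, 185, 3, 180),
    Student("B", 76, 200, 4, 180),  # fails the height test
    Student("C", 80, 175, 2, 180),  # fails the weight test
]
results = {s.name: evaluate(s) for s in students}
print(results)
```

Each `if` corresponds to one internal node of the tree, and each `return` corresponds to a leaf.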

3. Suppose I have to prepare a report on how an organization or a company is gaining popularity in all the states and countries in which it has its own branches. After collecting information from all of them, I will prepare my project report. Name and explain the techniques available for data mining, and also explain which is the best possible technique for this task, and why.

Ans 3

DATA MINING TASKS:

o Classification: infers the defining characteristics of a certain group (such as customers who have been lost to competitors).
o Clustering: identifies groups of items that share a particular characteristic. (Clustering differs from classification in that no predefining characteristic is given in clustering.)
o Association: identifies relationships between events that occur at one time (such as the contents of a shopping basket).
o Sequencing: similar to association, except that the relationship exists over a period of time (such as repeat visits to a supermarket or use of a financial planning product).
o Forecasting: estimates future values based on patterns within large sets of data (such as demand forecasting).
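As a small illustration of the association task, the sketch below computes two commonly used measures for a rule such as "bread implies milk": support and confidence. These measures are not named in the answer above and are brought in as an assumption, and the baskets are invented for the example.

```python
# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset: set) -> float:
    """Fraction of baskets that contain every item in the itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent: set, consequent: set) -> float:
    """Among baskets containing the antecedent, the share that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)


print(support({"bread", "milk"}))
print(confidence({"bread"}, {"milk"}))
```

A rule with high support and high confidence describes items that are frequently bought together at one time, which is the shopping basket case mentioned above.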

DATA MINING TECHNIQUES:

Statistics
o Point Estimation: calculating a single value from given sample data using statistical techniques such as mean, median and mode.
o Data Summarization: creating a histogram is one of the best techniques to summarize the data.

Bayesian Techniques
By using Bayes' theorem we can classify the data items available in a data bank.

Testing a Hypothesis
The technique of applying sample data to a hypothesis in order to prove or disprove the hypothesis.

Regression
Modelling the relationship between variables by applying a mathematical formula to the data.

Machine Learning
Building a computer system capable of acquiring data to generate useful information. The concept is to make a computer system behave like a human being who learns from experience and analyzes the observations made. Machine learning makes it possible to discover new and interesting structures in a set of data that were previously unknown.

Decision Trees
Each branch represents a classification question, while the leaves represent the partitions of the classified information. Generally suitable for tasks related to classification and clustering.

Meta Learning
Combining the predictions made by multiple data mining models and analyzing those predictions to formulate a new, previously unknown prediction. The predictions made by different data mining techniques can be used as input to a meta learner.
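The two statistical techniques listed first, point estimation and data summarization, can be sketched with the Python standard library. The ratings below are a made-up sample used only for illustration, and the text histogram is one possible way of summarizing them.

```python
import statistics
from collections import Counter

# Hypothetical sample of survey ratings (1 to 5), invented for illustration.
ratings = [4, 5, 3, 4, 2, 4, 5, 3, 4, 1]

# Point estimation: a single value that describes the whole sample.
mean = statistics.mean(ratings)
median = statistics.median(ratings)
mode = statistics.mode(ratings)
print("mean:", mean, "median:", median, "mode:", mode)

# Data summarization: a simple text histogram of how often each rating occurs.
counts = Counter(ratings)
for value in sorted(counts):
    print(value, "#" * counts[value])
```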

In my opinion, my report is best summarized by statistical analysis. Because I am conducting a survey, I will be generating a statistical report, so that I can verify my results for anyone who asks.

4. List the comparison and differences between OLAP and OLTP, and also discuss how they differ from data mining.

Ans 4

OLTP                                                      OLAP
Used for transaction processing                           Used for query processing
Holds current data                                        Holds historical data
Represents operator/clerical view                         Represents managerial view
Normalized design for efficient transaction processing    De-normalized design for query processing
Application driven                                        Analysis driven
Stores all data                                           Stores relevant data
Few indexes                                               Many indexes (to improve performance)
Relatively smaller database                               Large database size
Volatile data                                             Non-volatile data
Many joins                                                Few joins
Many concurrent users                                     Relatively few concurrent users

OLAP is a design paradigm, a way to seek information out of the physical data store. OLAP is all about summation. It aggregates information from multiple systems and stores it in a multi-dimensional format. This could be a star schema, a snowflake schema or a hybrid schema. Data mining leverages information from inside and outside the organization to help answer business questions. It involves ratios and algorithms such as decision trees, nearest neighbour classification and neural networks, along with clustering of data.

OLAP provides summary data and generates rich calculations. For example, OLAP answers questions like "How do sales of mutual funds in North America for this quarter compare with sales a year ago? What can we predict for sales next quarter? What is the trend as measured by percent change?" Data mining discovers hidden patterns in data. Data mining operates at a detail level instead of a summary level. Data mining answers questions like "Who is likely to buy a mutual fund in the next six months, and what are the characteristics of these likely buyers?"
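The OLAP-style question about comparing this quarter with the same quarter a year ago can be sketched as an aggregate query followed by a percent change calculation. The table name, column names and sales figures below are assumptions made for illustration, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, year INTEGER, quarter INTEGER, amount REAL)"
)

# Hypothetical detail-level rows, invented for illustration only.
rows = [
    ("North America", 2011, 3, 100.0),
    ("North America", 2011, 3, 150.0),
    ("North America", 2012, 3, 120.0),
    ("North America", 2012, 3, 180.0),
    ("Europe", 2012, 3, 90.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Summary level: total sales per year for one region and one quarter.
totals = dict(
    conn.execute(
        "SELECT year, SUM(amount) FROM sales "
        "WHERE region = ? AND quarter = ? GROUP BY year",
        ("North America", 3),
    )
)

# Trend measured as percent change against the same quarter a year ago.
percent_change = (totals[2012] - totals[2011]) / totals[2011] * 100
print(totals, percent_change)
conn.close()
```

The query works on summaries of the rows, which is the OLAP side of the comparison. Data mining would instead work on the individual rows to look for patterns that nobody has asked about explicitly.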

5. Discuss the steps involved in the data mining process by taking the example of an organization, let's say AIRTEL. Discuss how the data mining processing steps are taken care of by this company.

Ans 5

Statistical analysis: Statistical analysis refers to a collection of methods used to process large amounts of data and report overall trends. It is particularly useful when dealing with noisy data, and it provides ways to report objectively how unusual an event is, based on historical data.

Advantages
o Identify and select potential sample members.
o Contact sampled individuals and collect data from those who are hard to reach (or reluctant to respond).
o Evaluate and test questions.
o Select the mode for posing questions and collecting responses.
o Train and supervise interviewers (if they are involved).
o Check data files for accuracy and internal consistency.
o Adjust survey estimates to correct for identified errors.

Example: This analysis summarizes how many customers of the customer care service are satisfied, according to a survey or a set of questionnaires.

Example 2

Thus, in my opinion, the statistical technique is the best for analyzing a firm like Airtel.
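Two of the ideas in this answer, summarizing survey satisfaction and reporting how unusual an event is compared with historical data, can be sketched as follows. All numbers are invented for illustration and are not Airtel data. The z-score is one common way to measure how unusual a value is and is used here as an assumption, since the answer does not name a specific measure.

```python
import statistics

# Hypothetical answers to "Are you satisfied with customer care?"
responses = ["yes", "yes", "no", "yes", "no", "yes", "yes", "yes"]
satisfied_share = responses.count("yes") / len(responses)
print("share of satisfied customers:", satisfied_share)

# Hypothetical daily complaint counts from earlier days, and today's count.
history = [20, 22, 19, 21, 18, 20]
today = 30

# z-score: how many standard deviations today's count lies from the historical mean.
mean = statistics.mean(history)
stdev = statistics.stdev(history)
z_score = (today - mean) / stdev
print("z-score of today's complaints:", round(z_score, 2))
```

A large absolute z-score suggests that the day is unusual compared with the historical data and may deserve a closer look.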

6. In a university there are more than 30,000 students, and I want to fetch only those students who are academic toppers, or toppers in sports or any other co-curricular activity. Which model of data mining will I follow to complete this task, and why?

Ans 6

Decision tree: Each branch represents a classification question, while the leaves represent the partitions of the classified information. Generally suitable for tasks related to classification and clustering.

Decision Tree Advantages
1. Easy to understand
2. Map nicely to a set of business rules
3. Applied to real problems
4. Make no prior assumptions about the data
5. Able to process both numerical and categorical data

Likewise, in my earlier example I have given a tree that finds a student according to some conditions:

Height
   < 77 in: Rejected
   >= 77 in: Weight
      < 180 lb: Rejected
      >= 180 lb: National Level Tournaments
         < 2: Rejected
         >= 2: Stamina
            < 180 h: Rejected
            >= 180 h: Selected
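For the topper question itself, the same idea of asking one classification question after another can be sketched as below. This is a hand-written chain of questions in the spirit of a decision tree, not a tree learned from data. The student records, the category names and the scoring scales are all assumptions made for illustration.

```python
# Hypothetical student records, invented for illustration only.
students = [
    {"name": "A", "academics": 9.1, "sports": 40, "cocurricular": 55},
    {"name": "B", "academics": 8.2, "sports": 95, "cocurricular": 60},
    {"name": "C", "academics": 7.5, "sports": 50, "cocurricular": 88},
    {"name": "D", "academics": 8.0, "sports": 45, "cocurricular": 70},
]
CATEGORIES = ["academics", "sports", "cocurricular"]

# Best score in each category across all students.
best = {c: max(s[c] for s in students) for c in CATEGORIES}


def classify(student: dict) -> str:
    """Ask one question per category; the first 'yes' decides the leaf."""
    for category in CATEGORIES:
        if student[category] == best[category]:
            return f"Topper ({category})"
    return "Not a topper"


# Keep only the students who reach a topper leaf.
labels = {s["name"]: classify(s) for s in students}
toppers = {name: label for name, label in labels.items() if label != "Not a topper"}
print(toppers)
```

With more than 30,000 students the records would come from the university database rather than a literal list, but the questions asked of each record stay the same.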