
# Amity Campus Uttar Pradesh India 201303

ASSIGNMENTS
PROGRAM: BSc IT SEMESTER-VI

Subject Name: Data Warehousing and Mining
Study Country:
Roll Number (Reg. No.):
Student Name:

INSTRUCTIONS

a) Students are required to submit all three assignment sets.

| ASSIGNMENT | DETAILS | MARKS |
|---|---|---|
| Assignment A | Five Subjective Questions | 10 |
| Assignment B | Three Subjective Questions + Case Study | 10 |
| Assignment C | Objective or One-Line Questions | 10 |

b) The total weightage given to these assignments is 30%, or 30 marks.
c) All assignments are to be completed as typed documents in Word/PDF.
d) All questions are required to be attempted.
e) All three assignments are to be completed by the due dates and submitted for evaluation by Amity University.
f) Students must attach a scanned signature to the form.

Signature : Date :

_________________________________ _________________________________

## Data Warehousing and Mining

Assignment A
Q1. Discuss the various types of concept hierarchies, providing two examples of each type.

Q2. Illustrate the typical requirements of clustering in data mining.

Q3. State the evaluation criteria that are essential for classification and prediction methods.

Q4. What is meant by data reduction? Discuss any two data reduction strategies for obtaining a reduced representation of the data.

Q5. Differentiate between STAR and SNOWFLAKE schemas.

Q6. State the salient differences between a data query and a knowledge query.
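As an illustration of the kind of strategy Q4 refers to, the following Python sketch shows two numerosity-reduction techniques: an equal-width histogram, which replaces raw values by bucket ranges and counts, and simple random sampling without replacement. The function names and the sample `prices` list are hypothetical and are not part of the assignment; this is a minimal sketch, not a reference implementation.

```python
import random
from collections import Counter


def equal_width_histogram(values, num_buckets):
    """Numerosity reduction: replace raw values by (bucket range, count) pairs."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so a single bucket holds everything
        return [((lo, hi), len(values))]
    width = (hi - lo) / num_buckets
    counts = Counter()
    for v in values:
        # The maximum value would land one past the last bucket, so clamp it
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return [((lo + i * width, lo + (i + 1) * width), counts[i])
            for i in range(num_buckets)]


def simple_random_sample(values, k, seed=None):
    """Numerosity reduction: simple random sample of k tuples without replacement."""
    return random.Random(seed).sample(values, k)


# Hypothetical sample data
prices = [1, 5, 5, 8, 10, 12, 14, 15, 18, 20, 25, 30]
for (start, end), count in equal_width_histogram(prices, 3):
    print(f"{start:.1f} to {end:.1f}: {count}")
print(simple_random_sample(prices, 4, seed=1))
```

The histogram stores only three (range, count) pairs instead of twelve values, at the cost of losing the exact values inside each bucket; the sample keeps real values but only a fraction of them.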

Assignment B
Q1. Case Study: Suppose that a data warehouse consists of four dimensions, date, viewer, cinema hall and movie, and two measures, count and charge, where charge is the ticket fee that the viewer pays for watching the movie on a given date. Viewers can be children below 5, children above 5, adults, or seniors, with each category having its own charge rate.
i) Draw a star schema diagram for the data warehouse.
ii) Starting with the base cuboid [date, viewer, cinema hall, movie], what specific OLAP operations should one perform in order to list the total charge paid by adults at the cinema hall Paradise in 2004?

Q2. Give an example to show that items in a strong association rule may actually be negatively correlated.

Q3. What are Bayesian classifiers? Explain the theorem on which Bayesian classification is based.

Q4. Explain the application of data mining in CRM in healthcare. How can data mining algorithms be implemented in CRM?
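To make the case-study setup concrete, the following Python sketch models the star schema as one fact table holding a foreign key into each of the four dimension tables plus the measures count and charge, and then applies one possible sequence of OLAP operations: roll up date to year, dice on year, viewer category and hall, and roll up movie to "all". All sample rows, surrogate keys, attribute names such as `category`, and the second hall "Regal" are hypothetical; a real warehouse would run these operations in an OLAP engine or SQL rather than in Python loops.

```python
from collections import defaultdict
from typing import NamedTuple


# Dimension tables (hypothetical attributes, keyed by surrogate keys below)
class DateDim(NamedTuple):
    day: int
    month: int
    year: int

class ViewerDim(NamedTuple):
    category: str  # "child_below_5", "child_above_5", "adult" or "senior"

class CinemaHallDim(NamedTuple):
    name: str

class MovieDim(NamedTuple):
    title: str

# Fact table: one foreign key per dimension plus the measures count and charge
class TicketFact(NamedTuple):
    date_key: int
    viewer_key: int
    hall_key: int
    movie_key: int
    count: int
    charge: float


dates = {1: DateDim(14, 2, 2004), 2: DateDim(3, 7, 2004), 3: DateDim(9, 1, 2005)}
viewers = {1: ViewerDim("adult"), 2: ViewerDim("senior")}
halls = {1: CinemaHallDim("Paradise"), 2: CinemaHallDim("Regal")}
movies = {1: MovieDim("Movie X"), 2: MovieDim("Movie Y")}

facts = [  # base cuboid [date, viewer, cinema hall, movie], sample rows
    TicketFact(1, 1, 1, 1, 2, 400.0),  # adults, Paradise, 2004
    TicketFact(2, 1, 1, 2, 1, 200.0),  # adult, Paradise, 2004, other movie
    TicketFact(3, 1, 1, 1, 1, 250.0),  # adult, Paradise, 2005
    TicketFact(1, 2, 1, 1, 1, 120.0),  # senior, Paradise, 2004
    TicketFact(2, 1, 2, 1, 3, 600.0),  # adults, Regal, 2004
]


def roll_up_date_to_year(facts):
    """Roll-up on date (day -> year): total charge per (year, viewer, hall, movie)."""
    cube = defaultdict(float)
    for f in facts:
        cube[(dates[f.date_key].year, f.viewer_key, f.hall_key, f.movie_key)] += f.charge
    return cube


def dice(cube, year, category, hall_name):
    """Dice: keep only cells with the chosen year, viewer category and cinema hall."""
    return {
        key: charge
        for key, charge in cube.items()
        if key[0] == year
        and viewers[key[1]].category == category
        and halls[key[2]].name == hall_name
    }


def roll_up_movie_to_all(cube):
    """Roll-up on movie to 'all': sum the remaining cells into one total."""
    return sum(cube.values())


total = roll_up_movie_to_all(dice(roll_up_date_to_year(facts), 2004, "adult", "Paradise"))
print(f"Total charge paid by adults at Paradise in 2004: {total}")
```

The dice step filters on three dimensions at once; it could equally be written as three successive slices, and the result would be the same.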

Assignment C
1) Which of the following statements correctly describe a Dimension table in Dimensional Modeling?
   1: Dimension tables contain fields that describe the facts.
   2: Dimension tables do not contain numeric fields.
   3: Dimension tables are typically larger than fact tables.
   4: Dimension tables do not need system-generated keys.
   5: Dimension tables usually have fewer fields than fact tables.

2) How are dimensions in a Multi-Dimensional Database related?
   1: Hierarchically.
   2: Through foreign keys.
   3: Through a hierarchy and foreign keys.
   4: Through a network.
   5: Through an inverse list.

3) What is a primary risk of a 'phased' implementation?
   1: Previous implementations may need to be reworked.
   2: The project may lose momentum.
   3: Business Analysts will find problems in the data sooner.
   4: Executives will lose focus.
   5: The project budget may be exceeded.

4) How do highly distributed source systems impact the Data Warehouse or Data Mart project?
   1: The source data exists in multiple environments.
   2: The location of the source systems has minimal impact on the Data Warehouse or Data Mart implementation.
   3: The timing and coordination of software development, extraction, and data updates are more complex.
   4: Large volumes of data must be moved between locations.
   5: Additional network and data communication hardware will be needed.

5) OLAP tool (as described above)?
   1: Drill down to another level of detail.
   2: Display the top 10 items that meet specific selection criteria.
   3: Trend analysis.
   4: Calculate a rolling average on a set of data.
   5: Display a report based on specific selection criteria.

6) In a Data Mart Only architecture, what will the Data Mart Development Team(s) encounter?
   1: There is little or no data redundancy across all of the Data Mart databases.
   2: Issues such as inconsistent definitions and dirty data in extracting data from multiple source systems will be addressed several times.
   3: Database design will be easier than expected because Data Mart databases support only a single user.
   4: It is easy to consolidate the Data Marts to create a Data Warehouse.
   5: It is easy to develop the data extraction system due to the use of the warehouse as a single data source.

7) What is the primary responsibility of the 'project sponsor' during a Data Warehouse project?
   1: To manage the day-to-day project activity.
   2: To review and approve all decisions concerning the project.
   3: To approve and monitor the project budget.
   4: To ensure cooperation and support from all 'involved' departments.
   5: To communicate project status to higher management and the board of directors.

8) What are Metadata?
   1: Data used only by the IS organization.
   2: Information that describes and defines the organization's data.
   3: Definitions of data elements.
   4: Any business data occurring in large volumes.
   5: Summarized data.

9) How can the managers of a department best understand the cost of their use of the Data Warehouse?
   1: A percentage of the business department's budget should be directed to the maintenance and enhancement of the Data Warehouse.
   2: Institute a charge-back system of computer costs for access to the Data Warehouse.
   3: Develop a training program for department management.
   4: Provide executive management with computer utilization reports that show what percentage of utilization is due to the Data Warehouse.
   5: Business managers should participate in the acquisition process for computer hardware and software.

10) Which of the following is NOT a consequence of the creation of independent Data Marts?
    1: Potentially different answers to a single business question if the question is asked of more than one Data Mart.
    2: Increase in data redundancy due to duplication of data between the Data Marts.
    3: Consistent definitions of the data in the Data Marts.
    4: Creation of multiple application systems that have duplicate processing due to the duplication of data between the Data Marts.
    5: Increased costs of hardware as the databases in the Data Marts grow.

11) What is meant by artificial intelligence when it is applied to data cleansing and transformation tools?
    1: The tool can perform highly complex mathematical and statistical calculations to create derived data elements.
    2: The tool can accomplish highly complex code translations when data comes from multiple source systems.
    3: The tool can determine through heuristics the changes needed for a set of dirty data and then make the changes.
    4: The tool can perform highly complex summarizations across multiple databases.
    5: The tool can identify data that appears to be inconsistent between multiple source systems and provide reporting to assist in the clean-up of the source system data.

12) Which of the following classes of corporations can gain the most insights from their legacy data?
    1: A corporation that wants to determine the attitude of its customers towards the corporation.
    2: A corporation that offers new products and services.
    3: A new corporation.
    4: A corporation that has existed for a long time.
    5: A corporation that is constantly introducing new and different products and services.

13) Which of the following is NOT found in an Entity Relationship Model?
    1: A definition for each Entity and Data Element.
    2: Entity Relationship Diagram.
    3: Entity and Data Element Names.
    4: Fact and Dimension Tables.
    5: Business Rules associated with the entities, entity relationships, and the data elements.

14) What is Data Mining?
    1: The capability to drill down into an organization's data once a question has been raised.
    2: The setting up of queries to alert management when certain criteria are met.
    3: The process of performing trend analysis on the financial data of an organization.
    4: The automated process of discovering patterns and relationships in an organization's data.
    5: A class of tools that support the manual process of identifying patterns in large databases.

15) What does implementing a Data Warehouse or Data Mart help reduce?
    1: The data gathering effort for data analysis.
    2: Hardware costs.
    3: User requests for custom reports.
    4: Costs when management downsizes the organization.
    5: All of the above.

16) Profitability Analysis is one of the most common applications of data warehousing. Why is Profitability Analysis in data warehousing more difficult than usually expected?
    1: Almost every manager in an organization wants to get profitability reports.
    2: Revenue data cannot be tracked accurately.
    3: Expense data is often tracked at a higher level of detail than revenue data.
    4: Revenue data is difficult to collect and organize.
    5: Transaction grain data is required to properly compute profitability figures.

17) An operational system is which of the following?
    A. A system that is used to run the business in real time and is based on historical data.
    B. A system that is used to run the business in real time and is based on current data.
    C. A system that is used to support decision making and is based on current data.
    D. A system that is used to support decision making and is based on historical data.

18) A data warehouse is which of the following?
    A. Can be updated by end users.
    B. Contains numerous naming conventions and formats.
    C. Organized around important subject areas.
    D. Contains only current data.

19) The load and index is which of the following?
    A. A process to reject data from the data warehouse and to create the necessary indexes.
    B. A process to load the data in the data warehouse and to create the necessary indexes