Data Warehousing and Mining: Unit: Introduction and Datawarehousing

DATA WAREHOUSING AND MINING
I. UNIT: INTRODUCTION AND DATA WAREHOUSING
2 MARKS QUESTIONS WITH ANSWERS:
Q1. Define data mining.
ANS: Data mining, also known as Knowledge Discovery from Data (KDD), is the process of extracting or mining knowledge from large amounts of data stored in databases, data warehouses, and other information repositories.
Q2. Define data warehouse.
ANS: A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, that usually resides at a single site and facilitates management decision making.
Features:
a. Subject oriented.
b. Integrated.
c. Time variant.
d. Non-volatile.
Q3. Define data cleaning and data integration.
ANS: Data cleaning is a data preprocessing technique that removes noise and corrects inconsistencies in the data. Data integration merges data from multiple sources into a coherent data store, such as a data warehouse or a data cube.
Q4. Define data transformation.
ANS: Data are transformed or consolidated into forms appropriate for mining, for example by performing summary or aggregation operations. Normalization may also be applied; for example, it may improve the accuracy and efficiency of mining algorithms that involve distance measurements.
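
As an illustration of the normalization mentioned in Q4, here is a minimal sketch of min-max normalization, one common way to rescale attributes before distance-based mining. The function name `min_max_normalize` and the sample attribute values are assumptions made for this example; they do not come from the notes.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they fall in [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        # All values are identical: map them to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    span = old_max - old_min
    return [new_min + (v - old_min) / span * (new_max - new_min) for v in values]


# Hypothetical attribute values on very different scales.
incomes = [12000, 35000, 58000, 98000]
ages = [23, 35, 47, 61]

norm_incomes = min_max_normalize(incomes)
norm_ages = min_max_normalize(ages)

# After rescaling, both attributes lie in [0, 1], so neither one
# dominates a distance computation simply because of its units.
print(norm_incomes)
print(norm_ages)
```

Without this step, a distance measure over (income, age) pairs would be driven almost entirely by income, because its raw values are thousands of times larger.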
Q5. What is the difference between a spatial and a temporal database?
ANS:
TEMPORAL DATABASE
a. Stores relational data that include time-related attributes.
b. Not represented in raster format.
c. Example: inventory control and stock exchange.
SPATIAL DATABASE
a. Contains spatial-related information.
b. Can be represented in raster format (or in vector format).
c. Example: geographic databases, computer-aided design, VLSI, and satellite images.
Q6. Explain text and multimedia databases.
ANS: Text database: A database that contains word descriptions for objects. It may be highly unstructured (such as pages on the World Wide Web) or semi-structured (such as e-mail messages and HTML/XML web pages).
Multimedia database: Stores image, audio, and video data. Applications include picture content-based retrieval, voice-mail systems, the WWW, and speech-based user interfaces. Specialized storage and search techniques are required and must be integrated with data mining methods.
Q7. Give any 3 mining functionalities.
ANS: The data mining functionalities are listed below; any three may be given:
a. Concept/class description: characterization and discrimination.
b. Association analysis.
c. Classification and prediction.
d. Cluster analysis.
e. Outlier analysis.
f. Evolution analysis.
Q8. Define data characterization and data discrimination.
ANS: Data characterization is a summarization of the general features or characteristics of a target class of data.
Data discrimination is a comparison of the general features of a target class of data with the general features of objects from one or a set of contrasting classes.
Q9. What are the different types of association rule?
ANS: The types of association rule are:
a. Single-dimensional association rule: an association rule that contains a single attribute or predicate.
b. Multidimensional association rule: an association rule that contains more than one attribute or predicate.
Q10. Define classification and prediction.
ANS: Classification is the process of finding a model that describes and distinguishes data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown. It predicts discrete, unordered labels. Prediction models continuous-valued functions. It is used to predict missing or unavailable numerical data values rather than class labels.
Q11. Define cluster analysis and outlier.
ANS: Cluster analysis is the process of grouping data into classes or clusters so as to maximize intraclass similarity and minimize interclass similarity. Outlier: A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers.
Q12. Give the major issues in data mining.
ANS: The major issues in data mining are:
a. Mining methodology and user interaction issues.
b. Performance issues.
c. Diversity of database types.
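
To make the distinction in Q9 concrete, the sketch below classifies a rule by counting the distinct predicates it uses. The representation of a rule as lists of (predicate, value) pairs, the function name `rule_dimensionality`, and the two sample rules are assumptions chosen for illustration only.

```python
def rule_dimensionality(antecedent, consequent):
    """Classify a rule by the number of distinct predicates it uses."""
    predicates = {predicate for predicate, _ in antecedent + consequent}
    if len(predicates) == 1:
        return "single-dimensional"
    return "multidimensional"


# buys(X, "computer") => buys(X, "software")
# Only the predicate "buys" appears, although it occurs twice.
rule_a = ([("buys", "computer")], [("buys", "software")])

# age(X, "20..29") AND income(X, "40K..49K") => buys(X, "laptop")
# Three different predicates appear: age, income, buys.
rule_b = ([("age", "20..29"), ("income", "40K..49K")], [("buys", "laptop")])

print(rule_dimensionality(*rule_a))  # single-dimensional
print(rule_dimensionality(*rule_b))  # multidimensional
```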
Q13. What is the difference between an operational database system and a data warehouse?
ANS:
OPERATIONAL DATABASE SYSTEM
a. Online transaction processing.
b. Customer oriented.
c. ER data model.
d. Current day-to-day operations.
e. Read as well as write.
DATA WAREHOUSE
a. Data analysis and decision making.
b. Market oriented.
c. Star, snowflake, and fact constellation models.
d. Large amounts of historical data.
e. Mostly read-only access.
Q14. What is the difference between OLTP and OLAP systems?
ANS: OLTP is an online transaction processing system; OLAP is an online analytical processing system.
FEATURE          OLTP SYSTEM                 OLAP SYSTEM
Characteristic   Operational processing      Information processing
Orientation      Transaction oriented        Subject/market oriented analysis
DB design        ER data model               Star, snowflake, and fact constellation
Function         Day-to-day operations       Long-term processing
Access           Read as well as write       Mostly read-only
DB size          100 MB to GB                100 GB to TB
Q15. Give the types of data warehouse model.
ANS: The types are:
a. Star model.
b. Snowflake model.
c. Fact constellation model.
Q16. Give the types of OLAP operator.
ANS: The types of OLAP operator are roll up, drill down, slice and dice, and pivot.
Others: drill across and drill through.
Q17. Give the syntax for cube and dimension definition.
ANS: Syntax for cube definition:
    define cube <cube_name> [<dimension_list>] : <measure_list>
Syntax for dimension definition:
    define dimension <dimension_name> as (<attribute_or_dimension_list>)
Q18. Give the important categories of measures in a data warehouse.
ANS: The important categories of measures in a data warehouse are:
a. Distributive.
b. Holistic.
c. Algebraic.
Q19. Define concept hierarchy.
ANS: A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts. Eg: street < city < state < country.
Q20. Define roll up.
ANS: Roll up, also known as drill up, performs aggregation on a data cube either by climbing up a concept hierarchy for a dimension or by dimension reduction. It moves upward along the concept hierarchy. Eg: Location: street < city < state < country.
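
The roll-up described in Q19 and Q20 can be sketched as repeated aggregation along the hierarchy street < city < state < country. The sales figures, place names, and the helper `roll_up` below are hypothetical and serve only to show the mechanics.

```python
from collections import defaultdict

# Hypothetical street-level facts: (street, sales amount).
sales = [("Main St", 100), ("Oak Ave", 250), ("Lake Rd", 75), ("Hill St", 300)]

# One mapping per step of the concept hierarchy (all values are made up).
street_to_city = {
    "Main St": "Chicago",
    "Oak Ave": "Chicago",
    "Lake Rd": "Springfield",
    "Hill St": "Seattle",
}
city_to_state = {"Chicago": "Illinois", "Springfield": "Illinois", "Seattle": "Washington"}
state_to_country = {"Illinois": "USA", "Washington": "USA"}


def roll_up(facts, mapping):
    """Aggregate (member, amount) facts one level up the concept hierarchy."""
    totals = defaultdict(int)
    for member, amount in facts:
        totals[mapping[member]] += amount
    return sorted(totals.items())


by_city = roll_up(sales, street_to_city)
by_state = roll_up(by_city, city_to_state)
by_country = roll_up(by_state, state_to_country)

print(by_city)     # [('Chicago', 350), ('Seattle', 300), ('Springfield', 75)]
print(by_state)    # [('Illinois', 425), ('Washington', 300)]
print(by_country)  # [('USA', 725)]
```

Each call climbs one level, so the totals become less detailed at every step. Summing works here because sum is a distributive measure: totals of totals equal the overall total.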
Q21. Define drill down.
ANS: Drill down is the reverse of roll up and navigates from less detailed data to more detailed data. It can be realized by stepping down a concept hierarchy for a dimension or by introducing additional dimensions. Eg: Time: day < month < quarter < year.
Q22. Define slice and dice.
ANS: Slice performs a selection on one dimension of the given cube, resulting in a subcube. Dice defines a subcube by performing a selection on two or more dimensions.
Q23. Define pivot.
ANS: Pivot, also known as rotate, is a visualization operation that rotates the data axes in view in order to provide an alternative presentation of the data. Eg: rotating the axes of a 2-D slice or a 3-D cube.
Q24. Define the starnet query model.
ANS: The querying of multidimensional databases can be based on a starnet model. It consists of radial lines emanating from a central point, where each line represents a concept hierarchy for a dimension. Each abstraction level in a hierarchy is called a footprint. Eg:
location: street, city, state, country.
time: day, month, quarter, year.
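
The slice, dice, and pivot operations from Q22 and Q23 can be sketched on a tiny cube stored as a dictionary. The cube contents, the dimension names in `DIMS`, and the helper functions are all assumptions for this example, not part of any OLAP product.

```python
# Hypothetical cube: cells keyed by (time, location, item) with a sales measure.
cube = {
    ("Q1", "Chicago", "phone"): 10,
    ("Q1", "Chicago", "laptop"): 4,
    ("Q1", "Seattle", "phone"): 7,
    ("Q2", "Chicago", "phone"): 12,
    ("Q2", "Seattle", "laptop"): 5,
    ("Q2", "Seattle", "phone"): 9,
}
DIMS = ("time", "location", "item")


def select(cells, **criteria):
    """Keep cells whose coordinate is in the allowed set for every named dimension."""
    def keep(key):
        return all(key[DIMS.index(dim)] in allowed for dim, allowed in criteria.items())
    return {key: value for key, value in cells.items() if keep(key)}


def slice_cube(cells, dim, value):
    """Slice: selection on exactly one dimension."""
    return select(cells, **{dim: {value}})


def dice(cells, **criteria):
    """Dice: selection on two or more dimensions."""
    if len(criteria) < 2:
        raise ValueError("dice needs selections on at least two dimensions")
    return select(cells, **criteria)


def pivot(view_2d):
    """Pivot: swap the two axes of a 2-D view keyed by (row, column)."""
    return {(col, row): value for (row, col), value in view_2d.items()}


q1 = slice_cube(cube, "time", "Q1")                           # 3 cells remain
chicago_phones = dice(cube, location={"Chicago"}, item={"phone"})

# 2-D view of the Q1 slice (location x item), then rotated to (item x location).
view = {(loc, item): v for (_, loc, item), v in q1.items()}
rotated = pivot(view)

print(q1)
print(chicago_phones)
print(rotated)
```

Note that pivot changes only the presentation: the rotated view holds the same values as the original view, with rows and columns exchanged.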
Q25. Define datamart.
ANS: A datamart contains a subset of corporate-wide data that is of value to a specific group of users. It is typically implemented on servers running UNIX/Linux or Windows. Datamarts are categorized as dependent or independent.
Q26. Give the types of OLAP server.
ANS: Implementations of a warehouse server for OLAP processing include:
a. Relational OLAP (ROLAP) server.
b. Multidimensional OLAP (MOLAP) server.
c. Hybrid OLAP (HOLAP) server.
Q27. Give the formula for the total number of cuboids in an n-dimensional data cube.
ANS: Total number of cuboids = $\prod_{i=1}^{n} (L_i + 1)$, where $L_i$ is the number of levels associated with dimension $i$.
Q28. Give the types of materialization.
ANS: The types of materialization, given a base cuboid, are:
a. No materialization.
b. Partial materialization.
c. Full materialization.
Q29. Define partial and full materialization.
ANS: Full materialization precomputes all of the cuboids. The resulting lattice of computed cuboids is called the full cube. It requires a huge amount of memory. Partial materialization selectively computes a proper subset of the whole set of possible cuboids, that is, only the part of the cube that satisfies a user-specified criterion.
Q30. What are the advantages of bitmap indexing in OLAP?
ANS: Advantages:
1. It is especially useful for low-cardinality domains, where comparison, join, and aggregation operations reduce to bit arithmetic, which reduces processing time.
2. Reduction in space.
3. Searching is easier.
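
As a worked instance of the cuboid formula in Q27, the sketch below multiplies (L_i + 1) over all dimensions. The extra 1 per dimension accounts for the fully generalized level in which that dimension is aggregated away. The function name `total_cuboids` and the level counts are assumptions for illustration.

```python
from math import prod


def total_cuboids(levels_per_dimension):
    """Return the product of (L_i + 1) over all dimensions."""
    return prod(levels + 1 for levels in levels_per_dimension)


# Hypothetical cube: time has 4 levels (day, month, quarter, year),
# location has 4 levels (street, city, state, country), item has 1 level.
print(total_cuboids([4, 4, 1]))  # (4+1) * (4+1) * (1+1) = 50

# With no hierarchies (one level per dimension) the count is 2^n.
print(total_cuboids([1, 1, 1]))  # 2 * 2 * 2 = 8
```

The count grows multiplicatively with every added dimension or level, which is why full materialization becomes expensive so quickly.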
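
The bitmap indexing advantages in Q30 can be illustrated with a minimal sketch in which each distinct attribute value gets one bit vector, stored here as a Python integer. The sample rows and the helper names `build_bitmap_index` and `matching_rows` are hypothetical.

```python
# Hypothetical base table; the position in the list is the row identifier.
rows = [
    {"item": "phone", "city": "Chicago"},
    {"item": "laptop", "city": "Chicago"},
    {"item": "phone", "city": "Seattle"},
    {"item": "phone", "city": "Chicago"},
    {"item": "laptop", "city": "Seattle"},
]


def build_bitmap_index(table, attribute):
    """Map each distinct value to an integer whose bit i is set if row i has that value."""
    index = {}
    for position, row in enumerate(table):
        value = row[attribute]
        index[value] = index.get(value, 0) | (1 << position)
    return index


def matching_rows(bitmap, row_count):
    """List the row identifiers whose bit is set in the bitmap."""
    return [i for i in range(row_count) if (bitmap >> i) & 1]


item_index = build_bitmap_index(rows, "item")
city_index = build_bitmap_index(rows, "city")

# "item = phone AND city = Chicago" becomes a single bitwise AND.
hits = item_index["phone"] & city_index["Chicago"]
print(matching_rows(hits, len(rows)))  # [0, 3]
```

With only two distinct values per attribute, each index needs just two short bit vectors, which is the low-cardinality case where the technique pays off most.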
Q31. What are the advantages of join indexing in OLAP?
ANS: Advantages:
1. It registers the joinable rows of two or more relations from a relational database.
2. Cross-table search is possible because it maintains the relationship between primary and foreign keys.
3. It reduces the overall cost of OLAP join operations.
Q32. Define metadata.
ANS: Metadata are the data that define warehouse objects. It is data about data. A metadata repository provides details regarding the warehouse structure, data history, and the algorithms used for summarization and mapping.
Q33. What is the difference between data mining and data warehousing?
ANS: Data mining is the process of extracting and analyzing data in a data warehouse or datamart. Data warehousing is the aggregation of data from operational systems to support data mining or business intelligence applications.
