17. What are ROLAP, MOLAP and HOLAP? How do they differ from OLAP?
Ans. ROLAP, MOLAP and HOLAP are specialized forms of OLAP (Online Analytical
Processing). They differ mainly in how and where the data is stored.
ROLAP stands for Relational OLAP. Users see their data organized in
cubes with dimensions, but the data is actually stored in a relational database
(RDBMS) such as Oracle. The RDBMS stores data at a fine-grained level and
aggregates it at query time, so response times are usually slower.
MOLAP stands for Multidimensional OLAP. Users see their data
organized in cubes with dimensions, and the data is also stored in a multidimensional database (MDBMS) such as Oracle Express Server. In a MOLAP
system, many queries can be answered directly from precomputed aggregates, so
performance is usually fast. HOLAP stands for Hybrid OLAP; it combines the
relational and multidimensional approaches.
Seagate Software's Holos is an example of a HOLAP environment. A HOLAP
system serves queries on aggregated data as well as queries on detailed data.
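The storage difference between the three approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `facts` table, the sample rows and the function names are hypothetical, SQLite stands in for the RDBMS, and a plain dictionary stands in for a multidimensional store. ROLAP aggregates detail rows at query time, MOLAP precomputes every aggregate up front, and HOLAP answers aggregate queries from the precomputed cube while fetching detail rows from the relational side.

```python
import sqlite3
from collections import defaultdict

# Hypothetical fact rows: (region, product, sales)
FACTS = [
    ("East", "Tea", 10), ("East", "Coffee", 5),
    ("West", "Tea", 7), ("West", "Coffee", 12), ("West", "Tea", 3),
]


def rolap_total(conn, region):
    # ROLAP: detail rows live in the RDBMS and are aggregated at query time
    row = conn.execute(
        "SELECT SUM(sales) FROM facts WHERE region = ?", (region,)
    ).fetchone()
    return row[0]


def build_molap_cube(facts):
    # MOLAP: precompute aggregates keyed by (region, product);
    # "*" means "all members" of that dimension
    cube = defaultdict(int)
    for region, product, sales in facts:
        for r in (region, "*"):
            for p in (product, "*"):
                cube[(r, p)] += sales
    return dict(cube)


def holap_query(cube, conn, region, product=None, detail=False):
    # HOLAP: aggregates come from the cube, detail rows from the RDBMS
    if not detail:
        return cube.get((region, product or "*"), 0)
    return conn.execute(
        "SELECT product, sales FROM facts WHERE region = ? ORDER BY rowid",
        (region,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (region TEXT, product TEXT, sales INTEGER)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", FACTS)
cube = build_molap_cube(FACTS)

print(rolap_total(conn, "West"))                      # 22
print(cube[("West", "Tea")])                          # 10
print(holap_query(cube, conn, "East", detail=True))   # [('Tea', 10), ('Coffee', 5)]
```

The trade-off shows up directly: the cube answers aggregate lookups without scanning rows but must be rebuilt when the facts change, while the relational query always reflects the current detail data at the cost of aggregating on every request.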
18. What are the major Data Mining Processes?
Ans.
1. Decision Trees and Rules
2. Nonlinear Regression and Classification Methods
3. Example-based Methods
4. Probabilistic Graphical Dependency Models
5. Relational Learning Models
19. Identify at least three of the main data mining methods.
Ans.
1. Regression analysis. Regression models are the mainstay of predictive
analytics. The linear regression model analyzes the relationship between the
response or dependent variable and a set of independent or predictor
variables. That relationship is expressed as an equation that predicts the
response variable as a linear function of the parameters.
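A minimal sketch of fitting a linear regression model by ordinary least squares, written in plain Python so it needs no third-party libraries. It solves the normal equations (X^T X) beta = X^T y with a small Gaussian-elimination helper; the function names and the toy data (generated exactly from y = 2 + 3*x1 - 1*x2) are hypothetical and chosen only so the recovered coefficients are easy to check.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear_regression(rows, y):
    """Ordinary least squares: solve the normal equations (X^T X) beta = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept term
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, row):
    # The prediction is a linear function of the parameters beta
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Hypothetical toy data generated exactly from y = 2 + 3*x1 - 1*x2
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 0)]
y = [3, 7, 6, 11, 17]
beta = fit_linear_regression(rows, y)
print([round(b, 6) for b in beta])  # approximately [2.0, 3.0, -1.0]
```

For real data sets, a numerical library's least-squares routine is the better choice, since forming X^T X explicitly can lose precision when predictors are nearly collinear.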
2. Choice modeling. Choice modeling is a general-purpose tool for making
probabilistic predictions about decision-making behavior. These predictions
let an organization target its marketing efforts at the customers who have
the highest probability of purchase.
Choice models are used to identify the most important factors driving
customer choices. Typically, the choice model enables a firm to compute an
individual's likelihood of purchase, or other behavioral response, based on
variables that the firm has in its database, such as geo-demographics, past
purchase behavior for similar products, attitudes, or psychographics.
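One common choice model is the binary logit, which models P(purchase) = 1 / (1 + exp(-(w0 + w . x))); the text does not name a specific model, so treat this as an assumed example. The sketch below fits it by batch gradient ascent on the log-likelihood and then scores customers. The single predictor ("past purchases of similar products"), the toy data and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def purchase_probability(w, x):
    # w[0] is the intercept, w[1:] are the coefficients for the variables in x
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_binary_logit(X, y, lr=0.1, epochs=5000):
    """Fit a binary logit model by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - purchase_probability(w, xi)  # observed minus predicted
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Hypothetical toy data: past purchases of similar products (one predictor)
# and whether the customer bought the new product (1) or not (0).
X = [[0], [1], [1], [2], [2], [3], [3], [4]]
y = [0, 0, 1, 0, 1, 0, 1, 1]
w = fit_binary_logit(X, y)
for past in range(5):
    print(past, round(purchase_probability(w, [past]), 3))
```

Ranking customers by the fitted probability is what lets a firm aim a campaign at the most likely buyers; a production model would use many more variables and a dedicated statistics library.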
3. Rule induction. Rule induction involves developing formal rules that are
extracted from a set of observations. The rules extracted may represent a
scientific model of the data or local patterns in the data.
One major rule-induction paradigm is the association rule. Association rules
describe interesting relationships between variables in large databases. In
data mining they are used to discover regularities between products, such as
items that are frequently bought together.
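A simplified sketch of association-rule mining, restricted to rules between single items (the first two levels of an Apriori-style search). Support is the fraction of transactions containing an itemset, and the confidence of a rule A -> B is support({A, B}) / support({A}). The basket data, thresholds and function name are hypothetical.

```python
from itertools import combinations


def association_rules(transactions, min_support=0.3, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    n = len(transactions)

    def support(itemset):
        # fraction of transactions that contain every item in itemset
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({i for t in transactions for i in t})
    frequent = [i for i in items if support({i}) >= min_support]
    rules = []
    for a, b in combinations(frequent, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue  # a rule needs its whole itemset to be frequent
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical market-basket data
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
for lhs, rhs, sup, conf in association_rules(baskets):
    print(f"{lhs} -> {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

Here butter -> bread has confidence 1.0 because every basket with butter also contains bread, while bread -> butter is dropped because only half of the bread baskets contain butter.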
4. Network/Link Analysis. This is another technique for associating like
records. Link analysis is a subset of network analysis. It explores
relationships and associations among many objects of different types that
are not apparent from isolated pieces of information.
It is commonly used for fraud detection and by law enforcement. You may be
familiar with link analysis, since several Web-search ranking algorithms, such
as PageRank, use the technique.
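PageRank is one well-known link-analysis algorithm used for Web-search ranking. A minimal power-iteration sketch is shown below, assuming the usual damping factor of 0.85 and spreading the rank of pages with no outgoing links evenly over all pages. The four-page link graph is hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power iteration: a page is important if important pages link to it."""
    nodes = sorted(set(graph) | {v for outs in graph.values() for v in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # rank held by pages without outgoing links is spread over all pages
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            outs = graph.get(v, [])
            for w in outs:
                new[w] += damping * rank[v] / len(outs)  # split rank over links
        rank = new
    return rank


# Hypothetical link graph: page -> pages it links to
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

Page C ranks highest because three pages link to it, whereas D, which nothing links to, keeps only the baseline share (1 - 0.85) / 4.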
20. What are some of the methods for cluster analysis?
Ans. There are a number of different methods that can be used to carry out a
cluster analysis; these methods can be classified as follows:
Hierarchical methods
Agglomerative methods, in which each subject starts in its own separate
cluster. The two closest (most similar) clusters are then combined, and this
is repeated until all subjects are in one cluster. At the end, the optimum
number of clusters is chosen from among all the cluster solutions.
Divisive methods, in which all subjects start in the same cluster and the
above strategy is applied in reverse until every subject is in a separate
cluster.
Agglomerative methods are used more often than divisive methods, so this
handout will concentrate on the former rather than the latter.
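A minimal sketch of agglomerative hierarchical clustering: every point starts in its own cluster, and the two closest clusters are merged repeatedly until `k` clusters remain. "Closest" is measured here with single linkage (distance between the nearest members), which is one of several linkage choices and is an assumption, not something the text specifies. The points and function names are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations


def single_linkage(a, b):
    # distance between the closest pair of members of clusters a and b
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points, k=1):
    """Merge the two closest clusters until k remain; also return merge distances."""
    clusters = [[p] for p in points]  # every subject starts in its own cluster
    merges = []
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append(single_linkage(clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical one-dimensional data
points = [(1,), (2,), (9,), (10,), (20,)]
_, merge_distances = agglomerative(points, k=1)
print(merge_distances)  # [1.0, 1.0, 7.0, 10.0]
clusters, _ = agglomerative(points, k=2)
print(clusters)
```

Running with `k=1` records the distance at every merge, which is one way to pick the number of clusters from all the solutions: a common heuristic (not taken from the text) is to stop before the largest jump in merge distance, here the jump from 1.0 to 7.0, which leaves three clusters.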
Non-hierarchical methods (often known as k-means clustering methods)