data warehouse

A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in
support of management's decision-making process.

The goal of a data warehouse is to take all the data that an organization has, put it all under one roof,
and organize it in such a way that people can query it in an ad hoc fashion and glean insight.
To accomplish this, you have to use specialized techniques and tools. You also have to be very
good at pulling data from all over the place, harmonizing/standardizing it, and loading it into your data
warehouse (a process usually called extract, transform, load, or ETL).
Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about
your company's sales data, you can build a warehouse that concentrates on sales. Using this
warehouse, you can answer questions like "Who was our best customer for this item last year?"
This ability to define a data warehouse by subject matter (sales, in this case) makes the data
warehouse subject oriented.
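To make the sales example concrete, here is a minimal Python sketch that answers "Who was our best customer for this item last year?" over a handful of in-memory sales records. The record layout, the sample data, and the `best_customer` function are hypothetical; a real warehouse would run an equivalent aggregate query against its sales tables.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records from a sales-oriented warehouse:
# (customer, item, sale_date, amount)
SALES = [
    ("Acme", "widget", date(2023, 3, 14), 1200.0),
    ("Globex", "widget", date(2023, 7, 2), 900.0),
    ("Acme", "gadget", date(2023, 8, 9), 300.0),
    ("Globex", "widget", date(2023, 11, 20), 650.0),
    ("Initech", "widget", date(2022, 12, 30), 5000.0),
]


def best_customer(sales, item, year):
    """Return the customer with the highest total spend on `item` in `year`."""
    totals = defaultdict(float)
    for customer, sold_item, sale_date, amount in sales:
        if sold_item == item and sale_date.year == year:
            totals[customer] += amount
    if not totals:
        return None  # nobody bought this item in that year
    return max(totals, key=totals.get)


print(best_customer(SALES, "widget", 2023))
```

With this data, `best_customer(SALES, "widget", 2023)` returns `"Globex"` (1,550 versus Acme's 1,200); Initech's larger purchase is excluded because it falls in 2022.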
Integrated
Integration is closely related to subject orientation. Data warehouses must put data from
disparate sources into a consistent format. They must resolve such problems as naming
conflicts and inconsistencies among units of measure. When they achieve this, they are said to be
integrated.
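The sketch below shows integration at its smallest scale: two hypothetical source systems that disagree on a field name (`cust_name` vs `customer`) and on units (pounds vs kilograms) are mapped onto one consistent format. The field names, the `integrate` function, and the sample rows are assumptions for illustration; the conversion uses the approximate factor of 2.20462 lb per kg.

```python
# Hypothetical source rows: the CRM system calls the field "cust_name" and
# records weight in pounds; the ERP system uses "customer" and kilograms.
CRM_ROWS = [{"cust_name": " Acme ", "weight_lb": 22.0462}]
ERP_ROWS = [{"customer": "Globex", "weight_kg": 5.0}]

LB_PER_KG = 2.20462  # approximate conversion factor


def from_crm(row):
    # Resolve the naming conflict and convert pounds to kilograms.
    return {"customer_name": row["cust_name"].strip(),
            "weight_kg": round(row["weight_lb"] / LB_PER_KG, 3)}


def from_erp(row):
    # Already in kilograms; only the field name needs mapping.
    return {"customer_name": row["customer"].strip(),
            "weight_kg": round(row["weight_kg"], 3)}


def integrate(crm_rows, erp_rows):
    """Map every source row onto one consistent warehouse format."""
    return [from_crm(r) for r in crm_rows] + [from_erp(r) for r in erp_rows]


for row in integrate(CRM_ROWS, ERP_ROWS):
    print(row)
```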
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This
is logical because the purpose of a warehouse is to enable you to analyze what has occurred.
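One way to enforce nonvolatility at the database level is sketched below, assuming SQLite (through Python's built-in `sqlite3` module) and a hypothetical `sales_fact` table: triggers reject `UPDATE` and `DELETE`, so a correction has to be recorded as a new, offsetting row. Real warehouses more often enforce this through their load processes and permissions, so treat this as an illustration of the idea rather than a recommended design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (
    sale_id   INTEGER PRIMARY KEY,
    sale_date TEXT NOT NULL,
    amount    REAL NOT NULL
);

-- Reject any attempt to change or remove rows once they are loaded.
CREATE TRIGGER sales_fact_no_update BEFORE UPDATE ON sales_fact
BEGIN
    SELECT RAISE(ABORT, 'sales_fact is nonvolatile: updates are not allowed');
END;

CREATE TRIGGER sales_fact_no_delete BEFORE DELETE ON sales_fact
BEGIN
    SELECT RAISE(ABORT, 'sales_fact is nonvolatile: deletes are not allowed');
END;
""")


def attempt(sql):
    """Run a statement; return False if the triggers reject it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError as exc:
        print("rejected:", exc)
        return False


attempt("INSERT INTO sales_fact (sale_date, amount) VALUES ('2023-01-15', 100.0)")
attempt("UPDATE sales_fact SET amount = 0 WHERE sale_id = 1")  # rejected by trigger
# A correction is loaded as a new, offsetting row instead of an update.
attempt("INSERT INTO sales_fact (sale_date, amount) VALUES ('2023-01-20', -100.0)")
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM sales_fact").fetchone())
```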
Time Variant
To discover trends in business, analysts need large amounts of historical data. This is very
much in contrast to online transaction processing (OLTP) systems, where
performance requirements demand that historical data be moved to an archive. A data
warehouse's focus on change over time is what is meant by the term time variant.
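Because history is retained, trend questions become simple aggregations over time. The sketch below, using hypothetical daily sales and hypothetical helper functions (`monthly_totals`, `month_over_month`), rolls the history up by month and computes the percentage change between consecutive months present in the data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical history of daily sales kept in the warehouse: (day, amount)
HISTORY = [
    (date(2023, 1, 5), 100.0),
    (date(2023, 1, 20), 150.0),
    (date(2023, 2, 3), 300.0),
    (date(2023, 3, 12), 240.0),
]


def monthly_totals(history):
    """Sum amounts per (year, month), returned in chronological order."""
    totals = defaultdict(float)
    for day, amount in history:
        totals[(day.year, day.month)] += amount
    return dict(sorted(totals.items()))


def month_over_month(totals):
    """Percentage change from each month to the next month present in `totals`."""
    months = list(totals)
    return {
        months[i]: round(
            100 * (totals[months[i]] - totals[months[i - 1]]) / totals[months[i - 1]], 1
        )
        for i in range(1, len(months))
    }


totals = monthly_totals(HISTORY)
print(totals)
print(month_over_month(totals))
```

With the sample data the monthly totals are 250, 300, and 240, so the changes are +20.0% for February and -20.0% for March.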

data mining
Data mining is sorting through data to identify patterns and establish relationships.
Data mining parameters (the kinds of patterns being sought) include:

Association - looking for patterns where one event is connected to another event
Sequence or path analysis - looking for patterns where one event leads to another later event
Classification - looking for patterns that assign data to predefined categories (may result in a change in
the way the data is organized, but that's OK)
Clustering - finding and visually documenting groups of facts not previously known
Forecasting - discovering patterns in data that can lead to reasonable predictions about the
future (This area of data mining is known as predictive analytics.)
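Association is the easiest of these to show in a few lines. The sketch below counts how often pairs of items appear together in hypothetical market-basket transactions and keeps rules such as "butter → milk" whose support (share of all baskets containing both items) and confidence (share of baskets with the first item that also contain the second) meet assumed thresholds. It is a brute-force pair count for illustration, not a full implementation of an algorithm such as Apriori; the data, thresholds, and `association_rules` function are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def association_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (a, b, support, confidence) for single-item rules a -> b."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        for a, b in ((x, y), (y, x)):
            confidence = count / item_counts[a]
            if confidence >= min_confidence:
                rules.append((a, b, round(support, 2), round(confidence, 2)))
    return sorted(rules)


for rule in association_rules(BASKETS):
    print(rule)
```

With these baskets, `butter → milk` has confidence 1.0 because every basket with butter also contains milk, while any rule involving beer is dropped for low support.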
Data mining techniques are used in many research areas, including mathematics, cybernetics, genetics
and marketing. Web mining, a type of data mining used in customer relationship management (CRM),
takes advantage of the huge amount of information gathered by a Web site to look for patterns in user
behavior.

data mart
A data mart is a repository of data that is designed to serve a particular community of knowledge
workers.
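A data mart is often carved out of the warehouse for one audience. The sketch below assumes SQLite and hypothetical table, view, and region names: a view exposes only the EMEA region's sales, pre-aggregated by month, for a hypothetical EMEA sales team. Many data marts are physical tables or separate databases rather than views; a view is simply the shortest way to show the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_sales (
    sale_date TEXT, region TEXT, product TEXT, amount REAL
);
INSERT INTO warehouse_sales VALUES
    ('2023-01-05', 'EMEA', 'widget', 100.0),
    ('2023-01-06', 'APAC', 'widget', 250.0),
    ('2023-02-10', 'EMEA', 'gadget', 75.0);

-- Data mart for the EMEA sales team: only the rows and columns that
-- community needs, pre-aggregated by month.
CREATE VIEW emea_sales_mart AS
SELECT substr(sale_date, 1, 7) AS sale_month, product, SUM(amount) AS total
FROM warehouse_sales
WHERE region = 'EMEA'
GROUP BY substr(sale_date, 1, 7), product;
""")

for row in conn.execute("SELECT * FROM emea_sales_mart ORDER BY sale_month"):
    print(row)
```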

dimension
A dimension is a collection of reference information about a measurable event. In this context, events
are known as "facts." Dimensions categorize and describe data warehouse facts and measures in ways
that support meaningful answers to business questions. They form the very core of dimensional
modeling.
Dimensions are used in data warehouse star and snowflake schemas, OLAP cubes, and business
intelligence (BI) and business analytics (BA) applications. The following types of dimensions can be used to
meet specific data warehousing needs:
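junk dimensions - a collection of miscellaneous attributes (typically flags and indicators) that are unrelated to
any particular dimension.
degenerate dimensions - data that is dimensional in nature but stored in a fact table (for example, an invoice
or order number).
role-playing dimensions - a single dimension that can play different roles in a fact table depending on
the context (for example, one date dimension serving as both order date and ship date).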
conformed dimensions - a dimension that has exactly the same meaning and content when being
referred to from different fact tables.
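Role-playing and degenerate dimensions can be seen together in a small SQLite schema, sketched here with hypothetical table names and data: one `date_dim` table is joined twice under different aliases (order date and ship date), and the order number lives directly in the fact table with no dimension table of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (
    date_key   INTEGER PRIMARY KEY,  -- e.g. 20230105
    full_date  TEXT,
    month_name TEXT
);
INSERT INTO date_dim VALUES
    (20230105, '2023-01-05', 'January'),
    (20230210, '2023-02-10', 'February');

-- One physical date dimension referenced twice by the fact table.
CREATE TABLE order_fact (
    order_number   TEXT,  -- degenerate dimension: no dimension table of its own
    order_date_key INTEGER REFERENCES date_dim(date_key),
    ship_date_key  INTEGER REFERENCES date_dim(date_key),
    amount         REAL
);
INSERT INTO order_fact VALUES ('SO-1001', 20230105, 20230210, 500.0);
""")

query = """
SELECT f.order_number, od.month_name AS order_month, sd.month_name AS ship_month
FROM order_fact AS f
JOIN date_dim AS od ON od.date_key = f.order_date_key  -- role: order date
JOIN date_dim AS sd ON sd.date_key = f.ship_date_key   -- role: ship date
"""
print(conn.execute(query).fetchall())
```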

conformed dimension
A conformed dimension is a dimension that has the same meaning to every fact with which it relates.
Conformed dimensions allow facts and measures to be categorized and described in the same way
across multiple facts and/or data marts, ensuring consistent reporting across the enterprise.

A conformed dimension can exist as a single dimension table that relates to multiple fact tables within
the same data warehouse, or as identical dimension tables in separate data marts. Date is a common
conformed dimension because its attributes (day, week, month, quarter, year, etc.) have the same
meaning when joined to any fact table. A conformed product dimension with product name,
description, SKU, and other common attributes could exist in multiple data marts, each containing data
for one store in a chain.
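Conformed dimensions are what make it safe to "drill across" fact tables. In the sketch below (SQLite, hypothetical tables and data), a sales fact and an inventory fact share one product dimension; each fact is aggregated separately by product name, and the two result sets are then joined on that conformed attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conformed product dimension shared by two fact tables.
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, sku TEXT, product_name TEXT);
INSERT INTO product_dim VALUES (1, 'W-01', 'widget'), (2, 'G-01', 'gadget');

CREATE TABLE sales_fact (product_key INTEGER REFERENCES product_dim, units_sold INTEGER);
INSERT INTO sales_fact VALUES (1, 10), (1, 5), (2, 3);

CREATE TABLE inventory_fact (product_key INTEGER REFERENCES product_dim, units_on_hand INTEGER);
INSERT INTO inventory_fact VALUES (1, 40), (2, 12);
""")

# Drill across: aggregate each fact table on its own, then join the
# results on the shared (conformed) dimension attribute.
query = """
WITH s AS (
    SELECT p.product_name, SUM(f.units_sold) AS sold
    FROM sales_fact f JOIN product_dim p USING (product_key)
    GROUP BY p.product_name
),
i AS (
    SELECT p.product_name, SUM(f.units_on_hand) AS on_hand
    FROM inventory_fact f JOIN product_dim p USING (product_key)
    GROUP BY p.product_name
)
SELECT s.product_name, s.sold, i.on_hand
FROM s JOIN i ON i.product_name = s.product_name
ORDER BY s.product_name
"""
print(conn.execute(query).fetchall())
```

Because both facts roll up through the same `product_dim`, "widget" means the same thing in both result sets, so placing 15 units sold next to 40 on hand is a valid comparison.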

star schema
A star schema is the simplest form of a dimensional model, in which data is organized into facts and
dimensions. A fact is an event that is counted or measured, such as a sale or login. A dimension
contains reference information about the fact, such as date, product, or customer. A star schema is
diagrammed by surrounding each fact with its associated dimensions. The resulting diagram resembles a
star.
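A minimal star schema, sketched in SQLite with hypothetical tables and sample rows: a `sales_fact` table holds the measures (quantity, amount) plus a foreign key to each of three dimensions (date, product, customer). The query is a typical star join that filters and groups by dimension attributes and sums a fact measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: reference information about each sale.
CREATE TABLE date_dim     (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE product_dim  (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_name TEXT, city TEXT);

-- Fact table at the center of the star: one row per sale, a foreign key
-- to each dimension, and the numeric measures.
CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_dim,
    product_key  INTEGER REFERENCES product_dim,
    customer_key INTEGER REFERENCES customer_dim,
    quantity     INTEGER,
    amount       REAL
);

INSERT INTO date_dim VALUES (20230101, 2023, 1), (20230201, 2023, 2);
INSERT INTO product_dim VALUES (1, 'widget', 'hardware'), (2, 'manual', 'books');
INSERT INTO customer_dim VALUES (1, 'Acme', 'Boston'), (2, 'Globex', 'Denver');
INSERT INTO sales_fact VALUES
    (20230101, 1, 1, 2, 20.0),
    (20230101, 2, 2, 1, 15.0),
    (20230201, 1, 2, 5, 50.0);
""")

# Star join: constrain and group by dimension attributes, aggregate measures.
query = """
SELECT d.month, p.category, SUM(f.amount) AS revenue
FROM sales_fact f
JOIN date_dim d    ON d.date_key = f.date_key
JOIN product_dim p ON p.product_key = f.product_key
WHERE d.year = 2023
GROUP BY d.month, p.category
ORDER BY d.month, p.category
"""
for row in conn.execute(query):
    print(row)
```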

interview questions
-Describe the advantages of the Corporate Information Factory (CIF) architecture versus the bus architecture
with conformed dimensions. Which would fit best in our environment, given [some parameters they give you], and why?
-Describe snowflaking.
-Describe factless fact tables.
-Draw a star schema of our business.
-Describe common optimization techniques applied at the data model level.
-How do you handle data rejects in a warehouse architecture?
-Describe common techniques for loading from the staging area to the warehouse when you only have
a small load window.
-How do you load type 1 dimensions?
-How do you load type 2 dimensions, and how would you load one given our [insert business
particularity]?
-How would you model unbalanced hierarchies?
-How would you model cyclic relations?
-What major elements would you include in an audit model?
-How would you implement traceability?
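For the type 1 and type 2 loading questions, here is a minimal in-memory sketch of one common approach. It assumes a hypothetical customer dimension in which `email` is a type 1 attribute (overwritten, no history) and `city` is a type 2 attribute (a change expires the current row and adds a new version with a new surrogate key). The `CustomerRow` class, the `load_customer` function, the `valid_to = None` convention for the current row, and the `len(dim) + 1` surrogate-key counter are all assumptions; overwriting the type 1 attribute on every version is one convention among several, and a production load would typically do this with set-based SQL against the dimension table.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerRow:
    surrogate_key: int
    customer_id: str          # natural (business) key from the source system
    city: str                 # type 2 attribute: a change creates a new version
    email: str                # type 1 attribute: overwritten, no history kept
    valid_from: date
    valid_to: Optional[date]  # None marks the current version
    is_current: bool = True


def load_customer(dim, source, load_date):
    """Apply one source record (customer_id, city, email) to the dimension list."""
    current = next(
        (r for r in dim if r.customer_id == source["customer_id"] and r.is_current),
        None,
    )
    if current is None:
        # Unknown customer: insert its first version.
        dim.append(CustomerRow(len(dim) + 1, source["customer_id"], source["city"],
                               source["email"], load_date, None))
        return

    # Type 1: overwrite the attribute on every version of this customer.
    for r in dim:
        if r.customer_id == source["customer_id"]:
            r.email = source["email"]

    # Type 2: if the tracked attribute changed, expire the current version
    # and append a new one with a new surrogate key.
    if current.city != source["city"]:
        current.valid_to = load_date
        current.is_current = False
        dim.append(CustomerRow(len(dim) + 1, source["customer_id"], source["city"],
                               source["email"], load_date, None))


dim = []
load_customer(dim, {"customer_id": "C1", "city": "Boston", "email": "a@example.com"}, date(2023, 1, 1))
load_customer(dim, {"customer_id": "C1", "city": "Boston", "email": "b@example.com"}, date(2023, 3, 1))
load_customer(dim, {"customer_id": "C1", "city": "Denver", "email": "b@example.com"}, date(2023, 6, 1))
for row in dim:
    print(row)
```

After the three loads the dimension holds two versions of customer C1: the Boston row, expired on 2023-06-01, and the current Denver row. Both carry the overwritten email.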
