
ASSIGNMENT #2

DATA WAREHOUSE AND
DATA MINING

SUBMITTED TO:
MR. PAWAN KUMAR

SUBMITTED BY:
RAJESH RANA
REG. NO. 7010070005
BCA (Hons.)-MCA
Sec. D37D2

LOVELY PROFESSIONAL
UNIVERSITY
Part A

1. Differentiate between the relational data model and the multidimensional data model.

Ans.

1. Relational models can be very complex, with hundreds of tables joined through long chains of relationships among them. Multidimensional models are very simple: each dimension table has a direct relationship with the fact table.

2. Normalized relational data modeling is quite flexible. Multidimensional modeling has a rigid structure.

3. One of the goals of relational modeling is to conform to the rules of normalization; in a normalized database each data value is stored only once. Multidimensional models are radically denormalized; the dimension tables have a high number of repeated values in their fields.

4. Standard relational models are optimized for On-Line Transaction Processing (OLTP), which needs the ability to update data efficiently; this is provided by a normalized database in which each value is stored only once. Multidimensional models are optimized for On-Line Analytical Processing (OLAP), which needs the ability to retrieve data efficiently; efficient retrieval requires a minimum number of joins, and the simple structure of a multidimensional model places each dimension table only a single join away from the fact table.

5. SQL queries are required to manipulate relational data. MDX queries are required to manipulate multidimensional data.

6. Tables are the units of relational data storage. Cubes are the units of multidimensional data storage.

7. Table fields of a particular data type store the actual data. Dimensions and measures store the actual data.

8. Table size is measured in number of records. Cube size is measured in number of cell sets; in this sense, an MS Excel Pivot Table can accommodate 2,040,000 cell sets (8,000 * 255). Considering the summary data to be reported this is reasonably large, although a Pivot Table can report at most 8,000 different items.
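To make points 1, 4 and 5 above concrete, here is a rough SQL sketch (the retail table and column names are assumptions for illustration, not part of the assignment): the same question, total sales by product category, needs a chain of joins in a normalized relational schema but only a single join per dimension in a star schema.

    -- Normalized (relational/OLTP-style) schema: the category sits several joins away.
    SELECT c.category_name, SUM(ol.quantity * ol.unit_price) AS total_sales
    FROM order_lines ol
    JOIN orders o         ON ol.order_id = o.order_id
    JOIN products p       ON ol.product_id = p.product_id
    JOIN subcategories sc ON p.subcategory_id = sc.subcategory_id
    JOIN categories c     ON sc.category_id = c.category_id
    GROUP BY c.category_name;

    -- Star schema: the denormalized product dimension stores the category directly,
    -- so the dimension table is a single join away from the fact table.
    SELECT dp.category, SUM(f.sales_amount) AS total_sales
    FROM fact_sales f
    JOIN dim_product dp ON f.product_key = dp.product_key
    GROUP BY dp.category;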

2. What is the significance of Star Schema?

Ans. In the star schema design, a single object (the fact table) sits in the
middle and is radially connected to other surrounding objects (dimension
lookup tables) like a star. Each dimension is represented as a single table.
The primary key in each dimension table is related to a foreign key in the
fact table.

[Figure: Star schema. A central fact table is joined to the Store, Time, Product and Customer dimension tables.]

All measures in the fact table are related to all the dimensions to which the fact
table is related. In other words, they all have the same level of granularity.

A star schema can be simple or complex. A simple star consists of one fact
table; a complex star can have more than one fact table.

Let's look at an example. Assume our data warehouse keeps store sales data,
and the different dimensions are time, store, product, and customer. In this case,
the figure above represents our star schema. The lines between two tables
indicate that there is a primary key / foreign key relationship between the two
tables. Note that the different dimensions are not related to one another.
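A minimal SQL sketch of this example follows; the table and column names are illustrative assumptions, and the grain of the fact table is taken to be one row per product, store, customer and day.

    -- Dimension lookup tables: one table per dimension.
    CREATE TABLE dim_time     (time_key INT PRIMARY KEY, calendar_date DATE, month_name VARCHAR(9), calendar_year INT);
    CREATE TABLE dim_store    (store_key INT PRIMARY KEY, store_name VARCHAR(50), city VARCHAR(50), region VARCHAR(50));
    CREATE TABLE dim_product  (product_key INT PRIMARY KEY, product_name VARCHAR(100), category VARCHAR(50));
    CREATE TABLE dim_customer (customer_key INT PRIMARY KEY, customer_name VARCHAR(100), segment VARCHAR(30));

    -- The fact table sits in the middle: each of its foreign keys points to the
    -- primary key of one surrounding dimension table, and all measures share one grain.
    CREATE TABLE fact_sales (
        time_key     INT REFERENCES dim_time(time_key),
        store_key    INT REFERENCES dim_store(store_key),
        product_key  INT REFERENCES dim_product(product_key),
        customer_key INT REFERENCES dim_customer(customer_key),
        sales_amount DECIMAL(12,2),
        units_sold   INT
    );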

3. What is the difference between metadata and data dictionary?


Ans. Metadata is structured data which describes the characteristics of a
resource. It shares many similar characteristics to the cataloguing that takes
place in libraries, museums and archives. The term "meta" derives from the
Greek word denoting a nature of a higher order or more fundamental kind. A
metadata record consists of a number of pre-defined elements representing
specific attributes of a resource, and each element can have one or more values.
Below is an example of a simple metadata record:

Element name Value

Title Web catalogue

Creator Dagnija McAuliffe

Publisher University of Queensland Library

Identifier http://www.library.uq.edu.au/iad/mainmenu.html

Format Text/html

Relation Library Web site

Each metadata schema will usually have the following characteristics:


 a limited number of elements
 the name of each element
 the meaning of each element

Typically, the semantics are descriptive of the contents, location, physical
attributes, type (e.g. text or image, map or model) and form (e.g. print copy,
electronic file). Key metadata elements supporting access to published
documents include the originator of a work, its title, when and where it was
published and the subject areas it covers. Where the information is issued in
analog form, such as print material, additional metadata is provided to assist
in the location of the information, e.g. call numbers used in libraries. The
resource community may also define some logical grouping of the elements
or leave it to the encoding scheme. For example, Dublin Core may provide
the core to which extensions may be added.

Some of the most popular metadata schemas include:


 Dublin Core
 AACR2 (Anglo-American Cataloging Rules)
 GILS (Government Information Locator Service)
 EAD (Encoded Archives Description)
 IMS (IMS Global Learning Consortium)
 AGLS (Australian Government Locator Service)

While the syntax is not strictly part of the metadata schema, the data will be
unusable unless the encoding scheme understands the semantics of the
metadata schema. The encoding allows the metadata to be processed by a
computer program. Important schemes include:
 HTML (Hyper-Text Markup Language)
 SGML (Standard Generalised Markup Language)
 XML (eXtensible Markup Language)
 RDF (Resource Description Framework)
 MARC (MAchine Readable Cataloging)
 MIME (Multipurpose Internet Mail Extensions)

Metadata may be deployed in a number of ways:

 Embedding the metadata in the Web page by the creator or their agent, using META tags in the HTML coding of the page
 As a separate HTML document linked to the resource it describes
 In a database linked to the resource; the records may either have been directly created within the database or extracted from another source, such as Web pages

The simplest method is for Web page creators to add the metadata as part of
creating the page. Creating metadata directly in a database and linking it to
the resource is growing in popularity as an activity independent of the
creation of the resources themselves. Increasingly, it is being created by an
agent or third party, particularly to develop subject-based gateways.

A data dictionary is a term for the information that describes the data that will be
held in a database, that is, the metadata content.

The data dictionary is a component of a well-documented database. It allows
database users, including administrators and others who interface with the
system, to identify the expected data in each table and column of the database,
even without accessing the database itself.
Minimum Requirements of a Data Dictionary

A data dictionary should contain certain information in order to increase its
usefulness, including:

 Table Name
 Field Name
 Description of expected data in the field (e.g. Customer Address)
 Field Type (e.g. int, varchar, datetime)
 Field Length
 Nullable or Not Nullable
 Default Value
 Constraints that apply to the field

Some fields may not be necessary for every entry in a data dictionary. For
example, the root description of a table doesn't require field names and types,
but could still provide a description of what the table holds.
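As a minimal sketch, much of this listing can be generated straight from the system catalogue, assuming a database that exposes the standard INFORMATION_SCHEMA views (for example MySQL or SQL Server); the schema name below is hypothetical.

    -- Basic data dictionary listing from the standard INFORMATION_SCHEMA views.
    SELECT
        TABLE_NAME,                 -- Table Name
        COLUMN_NAME,                -- Field Name
        DATA_TYPE,                  -- Field Type (int, varchar, datetime, ...)
        CHARACTER_MAXIMUM_LENGTH,   -- Field Length (NULL for non-character types)
        IS_NULLABLE,                -- Nullable or Not Nullable
        COLUMN_DEFAULT              -- Default Value
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'sales_dw' -- hypothetical schema name
    ORDER BY TABLE_NAME, ORDINAL_POSITION;

Field descriptions and business constraints are not held in the catalogue, so they still have to be documented by hand.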
Part B

4. Differentiate between OLAP and OLTP?

Ans. OLTP (On-line Transaction Processing) is characterized by a large
number of short on-line transactions (INSERT, UPDATE, DELETE). The
main emphasis for OLTP systems is very fast query processing, maintaining
data integrity in multi-access environments, and effectiveness measured by
the number of transactions per second. An OLTP database holds detailed and
current data, and the schema used to store transactional data is the entity
model (usually 3NF).

OLAP (On-line Analytical Processing) is characterized by a relatively low
volume of transactions. Queries are often very complex and involve
aggregations. For OLAP systems, response time is the effectiveness
measure. OLAP applications are widely used by data mining techniques. An
OLAP database holds aggregated, historical data, stored in multi-
dimensional schemas (usually a star schema).
The following comparison summarizes the major differences between OLTP and
OLAP system design.

OLTP System (Online Transaction Processing; the operational system) versus OLAP System (Online Analytical Processing; the data warehouse):

Source of data: OLTP systems hold operational data and are the original source of the data. OLAP data is consolidated from the various OLTP databases.

Purpose of data: OLTP exists to control and run fundamental business tasks. OLAP exists to help with planning, problem solving, and decision support.

What the data reveals: OLTP gives a snapshot of ongoing business processes. OLAP gives multi-dimensional views of various kinds of business activities.

Inserts and updates: in OLTP, short and fast inserts and updates are initiated by end users. In OLAP, periodic long-running batch jobs refresh the data.

Queries: OLTP queries are relatively standardized and simple, returning relatively few records. OLAP queries are often complex and involve aggregations.

Processing speed: OLTP is typically very fast. OLAP speed depends on the amount of data involved; batch data refreshes and complex queries may take many hours, and query speed can be improved by creating indexes.

Space requirements: OLTP storage can be relatively small if historical data is archived. OLAP storage is larger, due to the existence of aggregation structures and history data, and requires more indexes than OLTP.

Database design: OLTP databases are highly normalized, with many tables. OLAP databases are typically de-normalized, with fewer tables, and use star and/or snowflake schemas.

Backup and recovery: OLTP data must be backed up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability. For OLAP, instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
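The contrast in workloads can be sketched in SQL; this is a hypothetical example reusing the illustrative star schema from Part A, not a prescribed implementation.

    -- OLTP-style work: a short transaction initiated by an end user,
    -- recording one order and updating the current stock level.
    START TRANSACTION;
    INSERT INTO orders (order_id, customer_id, product_id, qty, order_date)
    VALUES (90125, 17, 342, 2, CURRENT_DATE);
    UPDATE stock SET on_hand = on_hand - 2 WHERE product_id = 342;
    COMMIT;

    -- OLAP-style work: a complex aggregation over historical data
    -- in a star schema, scanning many rows of the fact table.
    SELECT t.calendar_year, s.region, p.category,
           SUM(f.sales_amount) AS total_sales
    FROM fact_sales f
    JOIN dim_time    t ON f.time_key    = t.time_key
    JOIN dim_store   s ON f.store_key   = s.store_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY t.calendar_year, s.region, p.category
    ORDER BY t.calendar_year, s.region, p.category;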
5. Taking a suitable example organization, identify various possible
drivers for favoring a data mart.

Ans. Each data mart can contain different combinations of tables, columns and
rows from the enterprise data warehouse. For example, a business unit or
user group that doesn't require a lot of historical data might only need
transactions from the current calendar year in the database. The Personnel
Department might need to see all details about employees, whereas data such as
"salary" or "home address" might not be appropriate for a data mart that
focuses on Sales.
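A hypothetical sketch of such a Sales data mart, populated as a filtered subset of the enterprise warehouse; the schema, table and column names are illustrative, and the CREATE TABLE ... AS SELECT syntax is assumed to be available (as it is in many databases).

    -- Current-year transactions only, copied from the enterprise warehouse.
    CREATE TABLE sales_mart.fact_sales AS
    SELECT f.time_key, f.store_key, f.product_key, f.customer_key, f.sales_amount
    FROM edw.fact_sales f
    JOIN edw.dim_time t ON f.time_key = t.time_key
    WHERE t.calendar_year = 2010;   -- illustrative "current" year

    -- Employee dimension for the Sales mart, excluding sensitive columns
    -- such as salary and home address, which stay in the Personnel mart.
    CREATE TABLE sales_mart.dim_employee AS
    SELECT employee_key, employee_name, job_title, store_key
    FROM edw.dim_employee;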

1. Possible drivers for favoring a data mart: Data marts rarely grow up
to become an enterprise data warehouse, regardless of the data volume
reached. Many firms can benefit from data mart consolidation, though it
is not an unconditionally positive move for all. Enterprises are driven to
consider the advantages of consolidating diverse data marts into an
enterprise data warehouse for a variety of reasons. The list of drivers
includes business, organizational, operational and technological factors.
2. Business: Mergers and acquisitions continue apace. The consolidation of
product and customer dimensions enables cross-selling and up-selling in
customer relationship management (CRM), as well as substituting
information for inventory in product and supply chain applications.
3. Organizational: The requirement for intelligent information integration
at an enterprise level is exemplified by business intelligence demands
such as knowing the lifetime value of a customer. The lifetime value of a
customer cannot be known without aggregating a lifetime of transactions
across all the customer's touch points. That requires the organization to
move beyond departmental data mart silos to gain an enterprise
perspective.

4. Operational: The need for operational efficiencies in the IT utility is
seemingly unending and urgent. Centralization provides for greater
efficiencies and reduced coordination costs in managing the decision-
support infrastructure. Data center consolidation invites the consolidation
of the data mart applications that are supported by the servers. The
benefits of data center consolidation are mirrored by data mart
consolidation.
5. Technological: The benefits of resource sharing and management are
more widely available than ever before thanks to the build-out of storage
technology infrastructure in the form of storage area networks (SANs)
and network attached storage (NAS). As indicated, data marts rarely grow
up to become data warehouses, regardless of the volume of data involved.
However, after any data mart or warehouse reaches either a terabyte of
raw data or 1,000 data structures, organizations face a storage technology
problem to manage all the space allocations efficiently. Centralized and
policy-based storage resource management (SRM) is an important
dimension of any consolidation solution, and it is best undertaken using
data center resources and professional practices and staff.

6. Illustrate the significance of the Bus Matrix by taking an example of your
choice.

Ans. The bus matrix's purpose is one of high abstraction and visionary planning
at the Data Warehouse architectural level. By dictating coherency in the
development and implementation of the overall Data Warehouse, the Bus
Architecture approach enables a vision of broader enterprise integration and
consistency while at the same time dividing the problem into more manageable
parts, all in a technology- and software-independent manner.

The bus matrix and architecture build upon the concept of conformed
dimensions: a structure of common dimensions that ideally can be used across
the enterprise by all business processes related to the DW and the corresponding
fact tables from which they derive their context. According to Kimball and
Margy Ross's article "Differences of Opinion", the enterprise data warehouse
built on the bus architecture "identifies and enforces the relationship between
business process metrics (facts) and descriptive attributes (dimensions)".

Significance: In short, the bus matrix lets the warehouse be planned and built
one business process (and data mart) at a time, while the conformed dimensions
it is built on guarantee that those parts integrate into a single, consistent
enterprise view.
The concept of a bus is well known in the language of Information Technology,
and it is what reflects the conformed dimension concept in the Data Warehouse:
it creates the skeletal structure where all parts of a system connect, ensuring
interoperability and consistency of data while at the same time allowing for
future expansion. This makes the conformed dimensions act as the integration
"glue", creating a robust backbone of the enterprise Data Warehouse.

The Data Warehouse Bus:

 Is dimensional;
 Contains transaction and summary data;
 Includes data marts, which have single subject or fact tables; and
 Can consist of multiple data marts in a single database.
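To make the conformed-dimension idea concrete, here is a small hypothetical sketch (the table and column names are illustrative, repeating the dimensions from the Part A star schema sketch): two fact tables from different business processes share the very same dimension tables, so their results can be compared across the bus.

    -- Conformed dimensions shared by every business process on the bus.
    CREATE TABLE dim_time     (time_key INT PRIMARY KEY, calendar_date DATE, calendar_year INT);
    CREATE TABLE dim_customer (customer_key INT PRIMARY KEY, customer_name VARCHAR(100), segment VARCHAR(30));

    -- Sales process: one row per order line.
    CREATE TABLE fact_sales (
        time_key     INT REFERENCES dim_time(time_key),
        customer_key INT REFERENCES dim_customer(customer_key),
        sales_amount DECIMAL(12,2)
    );

    -- Shipments process: one row per shipment, reusing the SAME dimension tables,
    -- so sales and shipments can be analysed together by time and customer.
    CREATE TABLE fact_shipments (
        time_key      INT REFERENCES dim_time(time_key),
        customer_key  INT REFERENCES dim_customer(customer_key),
        shipping_cost DECIMAL(12,2)
    );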
