Summary: This article describes the main operational aspects required to establish
the proper environment to support data warehouse applications and provides a
comparison of the DB2 LUW and Oracle architectures.
[Figure: Data warehouse architecture. Operational databases feed the enterprise
RDBMS data warehouse (DB2, Oracle); data marts and an OLAP server (Cognos) serve
end-user applications: reports, analysis, dashboards, extracts, data mining
(Cognos, SPSS), and OLAP analysis dashboards (future).]
The DB2 files can be split into DMS/SMS containers (data files in Oracle),
initialization files (the DBM CFG and DB CFG files; init.ora in Oracle), transaction
log files, and diagnostic/audit files such as db2diag.log (the alert log in Oracle).
DBM CFG is used to configure and tune the DB2 server at the instance level, while
DB CFG is specific to each database.
DB2 does not maintain dynamic performance views; instead, it uses
commands to retrieve the information from the system directory, such as LIST
DATABASE DIRECTORY, LIST TABLESPACES, and LIST APPLICATIONS.
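As a sketch, these catalog and configuration commands can be issued from the DB2 command line processor; the database name DWHDB below is a hypothetical placeholder:

```shell
# Instance-level configuration (DBM CFG)
db2 GET DBM CFG
# Connect to a database (DWHDB is a placeholder name)
db2 CONNECT TO DWHDB
# Database-level configuration (DB CFG)
db2 GET DB CFG FOR DWHDB
# Catalogued databases, tablespace state, and connected applications
db2 LIST DATABASE DIRECTORY
db2 LIST TABLESPACES SHOW DETAIL
db2 LIST APPLICATIONS
```

These commands require a configured DB2 instance; the equivalent information in Oracle comes from the dynamic performance (V$) views.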
The Oracle architecture consists of three areas (Fig 3): the Oracle instance,
the files area, and the client area.
The Oracle instance is the essential component assuring the functionality
of the Oracle DBMS. The Oracle instance is composed of memory
areas, background processes, and user processes.
The memory area can be split into the SGA (System Global Area) and the PGA
(Program Global Area).
The SGA (System Global Area) has the following structure:
- Shared Pool Area:
  - the library cache holds the SQL statement text, the parsed SQL
    statement, and the execution plan;
  - the data dictionary cache (technical metadata area) contains the
    definitions of the database structures, analysis structures, and security
    structures; this area is managed with an LRU (Least Recently
    Used) algorithm;
- Database Buffer Cache, which contains the blocks read from the
  database segments. It operates on an LRU list and a write list. The write
  list contains the modified blocks that have not yet been saved to the
  database. The LRU list contains free blocks that can be reused, recently
  accessed blocks, and modified blocks that have not yet been moved to the
  write list. A requested block is read either directly from memory (cache
  hit) or from disk (cache miss), the latter requiring an additional I/O
  operation;
- Redo Log Buffer, a circular area that holds information about the
  changes made to database objects, stored as redo entries. The LGWR
  background process writes these buffers to the redo log files in near real
  time. If archive log mode is enabled, the archiver processes (ARCn) copy
  the filled redo log files to the archive area;
- Large Pool, a very important area for data warehousing because it holds
  the buffers needed for parallel processing. This area does not use the LRU
  algorithm.
The Program Global Area is a memory region reserved for each user process
that connects to the database; it contains control information for a single
server process or a single background process. Unlike the SGA, which is a
shared area, the PGA is used by a single process. This area is allocated when
the process is created. In dedicated server mode, a dedicated server process is
allocated for every user process. The PGA consists of:
- Session memory, which contains session variables (e.g., logon
  information);
- Private SQL area, which contains data such as bind information and
  runtime memory structures. Each session that issues a SQL
  statement has a private SQL area, consisting of:
  o the persistent area, which contains, for example, bind
    information; it is freed only when the cursor is closed;
  o the run-time area, used for complex operations such as sorting,
    hash joins, and table creation; it is freed at the end of
    execution.
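Assuming access to an Oracle instance with the appropriate privileges, the sizes of these SGA and PGA components can be inspected through the standard dynamic performance views (a sketch, not tied to a specific release):

```sql
-- Memory allocated per SGA component (shared pool, buffer cache, large pool, ...)
SELECT pool, name, bytes
  FROM v$sgastat
 ORDER BY bytes DESC;

-- Aggregate PGA statistics across all sessions
SELECT name, value
  FROM v$pgastat;
```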
Database Optimization
The DW database must be configured to take advantage of all the data
warehousing features offered by the RDBMS.
The physical granularity of the main database should be as fine as
possible, meaning different tablespaces with different characteristics for
partitioned/non-partitioned tables: dimension tables, fact tables, aggregated
tables, etc. There must also be a consistent file structure, with data files
separated from program files, log files, etc.
The Staging database can be configured similarly; the main difference is
that substantially less memory needs to be allocated to the staging server, as
it will have fewer concurrent connections and the size of the database is
substantially smaller.
The methodology for calculating the number of CPUs is based on the
expected maximum number of concurrent queries and ETL procedures. The
procedure is to multiply the maximum number of concurrent users by the
maximum number of concurrent reports per user that the end-user business
intelligence tool permits.
The size of the memory is based on the size of the tables, the types of sorts,
the number and types of users, the ETL transformations, etc.
The calculation will be based on the assumptions made for two types of operations:
-Queries
-Batch (ETL jobs)
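The CPU sizing rule above (concurrent users times reports per user, plus concurrent ETL jobs) can be sketched as simple arithmetic; all figures below are illustrative assumptions, not recommendations:

```python
# Rough CPU sizing sketch. Every input figure here is a hypothetical example.
def concurrent_operations(max_users: int, max_reports_per_user: int,
                          concurrent_etl_jobs: int) -> int:
    """Expected maximum number of concurrent queries and ETL procedures."""
    return max_users * max_reports_per_user + concurrent_etl_jobs

def cpus_needed(operations: int, operations_per_cpu: int) -> int:
    """Ceiling division: CPUs required to cover the concurrent operations."""
    return -(-operations // operations_per_cpu)

ops = concurrent_operations(max_users=50, max_reports_per_user=2,
                            concurrent_etl_jobs=8)
print(ops)                  # 50 * 2 + 8 = 108 concurrent operations
print(cpus_needed(ops, 8))  # at 8 operations per CPU -> 14 CPUs
```

The operations-per-CPU figure is workload-dependent and would normally come from benchmarking, not from a formula.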
The compression feature must be used. In Oracle, database
compression can be applied at the tablespace level; in DB2, at the table level.
Data stored in relational databases keeps growing as a result of business
requirements for more information. A big portion of the cost of keeping large
amounts of data lies in the disk systems and in the resources used to manage
that data. Both databases address this cost by compressing the data stored in
relational tables, with virtually no negative impact on query time against that
data, thereby enabling substantial cost savings. Compression works by
eliminating duplicate values in a database block. Compressed data stored in a
database block (a.k.a. disk page) is self-contained: all the information needed
to recreate the uncompressed data in a block is available within that block. The
typical compression ratio for large data-warehouse tables ranges from 2:1 to
4:1. The data warehouse should use compression especially for partitioned fact
tables and partitioned results tables.
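As an illustration of the two levels mentioned above (all object and file names are hypothetical), compression is declared in the DDL of each engine:

```sql
-- Oracle: set a compression default at the tablespace level
CREATE TABLESPACE dw_fact_ts
    DATAFILE 'dw_fact01.dbf' SIZE 10G
    DEFAULT COMPRESS;

-- DB2: enable row compression on an existing table, then rebuild it
ALTER TABLE dw.fact_sales COMPRESS YES;
REORG TABLE dw.fact_sales;
```

In DB2 the REORG step rebuilds the compression dictionary so that existing rows are actually compressed.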
RAID5 provides some protection against disk crashes without being as
expensive as RAID1+0. It does this by calculating parity bits that can be used
to rebuild one failed disk out of a group (for example, one of four), as opposed
to mirroring, which stores everything twice and allows data to be retrieved if
one disk of a mirrored pair crashes.
However, the calculation of the parity bits slows down write performance
by roughly half compared to RAID1+0.
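The capacity side of this trade-off can be made concrete with a small calculation; the disk counts and sizes below are arbitrary examples:

```python
# Usable capacity of a disk array per RAID level.
# RAID5 dedicates one disk's worth of capacity to parity per group;
# RAID1+0 mirrors everything, so half the raw capacity is usable.
def raid5_usable(n_disks: int, disk_tb: float) -> float:
    """Usable TB for a RAID5 group of n_disks disks of disk_tb TB each."""
    return (n_disks - 1) * disk_tb

def raid10_usable(n_disks: int, disk_tb: float) -> float:
    """Usable TB for a RAID1+0 array (mirrored pairs)."""
    return n_disks / 2 * disk_tb

print(raid5_usable(4, 1.0))   # 3.0 TB usable out of 4 TB raw
print(raid10_usable(4, 1.0))  # 2.0 TB usable out of 4 TB raw
```

The extra usable capacity of RAID5 is what it trades against the write penalty described above.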
Files that are critical for performance and/or resilience to failure (e.g.,
database redo logs/circular logs, for protection and performance; database temp
files/system temporary tablespace containers, for performance) should be stored
on RAID1+0 disks.
One RAID1+0 logical volume should be used for the Oracle/DB2 logs and
temporary files. A second RAID1+0 logical volume will be used for DataStage
temporary files.
In a shared disk database, database files are logically shared among the
nodes of a loosely coupled system with each instance having access to all data. The
shared disk access is accomplished either through direct hardware connectivity or by
using an operating system abstraction layer that provides a single view of all the
devices on all the nodes.
Security
IT Security Standards will be followed during the definition of the AAA
(Access, Authentication and Authorisation) mechanism, which shall be used to
guarantee data security. A full user access, report level, and data level security
approach must be detailed in the Security Plan. The Security Plan describes:
- who is allowed access to the instance and/or database;
- where and how a user's password will be verified;
- the authority level that a user is granted;
- the commands that a user is allowed to run;
- the data that a user is allowed to read and/or alter;
- the database objects a user is allowed to create, alter, and/or drop.
Logical data-level authentication is mainly managed by the front-end tool
(the end-user reporting tool).
The Cognos security model supports the distribution of security administration.
Because objects in Content Manager, such as folders and groups, can be secured
separately, security administration can be assigned to individuals in separate
functional areas of the organization.
The following authorization services are set up in the Cognos Server (reporting
layer). Permissions are related to the users, groups, and roles defined in
third-party authentication providers. Permissions define access rights to
objects, such as directories, folders, and other content, for each user, group,
or role. Permissions also define the activities that can be performed with
these objects.
Cognos authorization assigns permissions to:
- groups and roles created in the Cognos namespace in the Content Manager;
- entire namespaces, users, groups, and roles created in third-party
authentication providers.
The users accessing the Data Warehouse Environment should be split based
on main activities and types of environment on the database:
- DBA users for administrative reasons
- Application users for running ETL processes
- Reporting users
Both the DB2 and Oracle engines can assign specific tablespaces and privileges
on database objects to data warehouse users. In DB2, a database user must first
be created at the operating system level.
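A minimal sketch of how these three user classes might be separated with object and tablespace privileges (all user, schema, and tablespace names are hypothetical):

```sql
-- Reporting users: read-only access to the presentation tables
GRANT SELECT ON dw.fact_sales TO rpt_user;

-- ETL application user: may load and transform the data
GRANT SELECT, INSERT, UPDATE, DELETE ON dw.fact_sales TO etl_user;

-- DB2 only: allow the ETL user to create objects in a given tablespace
GRANT USE OF TABLESPACE dw_stage_ts TO USER etl_user;
```

DBA users would instead hold the engine's administrative authorities (e.g., DBADM in DB2, the DBA role in Oracle) rather than per-object grants.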
RESOURCES