Summary: This article describes the main operational aspects required to establish
the proper environment to support data warehouse applications and provides a
comparison of the DB2 LUW and Oracle architectures.
[Figure: Data warehouse architecture. Operational databases feed the enterprise
RDBMS data warehouse (DB2, Oracle); data marts and an OLAP server (Cognos) serve
end-user applications: reports, analysis, dashboards, extracts, data mining
(Cognos, SPSS), and OLAP analysis dashboards (future).]
The DB2 files can be split into DMS/SMS containers (data files in Oracle),
initialization files (the DBM CFG and DB CFG files; init.ora in Oracle), transaction
log files, and diagnostic/audit files such as db2diag.log (the alert log in Oracle).
DBM CFG is used to configure and tune the DB2 server at the instance level, while
DB CFG is specific to each database.
DB2 does not maintain dynamic performance views; instead, it uses
commands to retrieve the information from the system directory, such as LIST
DATABASE DIRECTORY, LIST TABLESPACES, and LIST APPLICATIONS.
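As a sketch, these catalog and configuration commands can be issued from the DB2 command line processor; the database name DWHDB below is a hypothetical placeholder:

```shell
# Instance-level configuration (DBM CFG)
db2 GET DBM CFG
# Connect to a database (DWHDB is a placeholder name)
db2 CONNECT TO DWHDB
# Database-level configuration (DB CFG)
db2 GET DB CFG FOR DWHDB
# Catalogued databases, tablespace state, and connected applications
db2 LIST DATABASE DIRECTORY
db2 LIST TABLESPACES SHOW DETAIL
db2 LIST APPLICATIONS
```

These commands require a configured DB2 instance; the equivalent information in Oracle comes from the dynamic performance (V$) views.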
The Oracle architecture consists of three areas (Fig 3): the Oracle instance,
the files area, and the client area.
The Oracle instance is the essential component assuring the functionality
of the Oracle DBMS. The Oracle instance is composed of memory
areas, background processes, and user processes.
The memory area can be split into the SGA (System Global Area) and the PGA
(Program Global Area).
The SGA (System Global Area) has the following structure:
- Shared Pool Area:
  - the library cache holds the SQL statement text, the parsed SQL
    statement, and the execution plan;
  - the data dictionary cache (technical metadata area) contains the
    definitions of the database structures, analysis structures, and security
    structures; this area is managed with an LRU (Least Recently
    Used) algorithm;
- Database Buffer Cache, which contains the blocks read from the
  database segments. It operates on an LRU list and a write list. The write
  list contains the modified blocks that have not yet been saved to the
  database. The LRU list contains free blocks that can be reused, recently
  accessed blocks, and modified blocks that have not yet been moved to the
  write list. A requested block is read either directly from memory (cache
  hit) or from disk (cache miss), the latter requiring an additional I/O
  operation;
- Redo Log Buffer, a circular area that holds information about the
  changes made to database objects, stored as redo entries. The LGWR
  background process writes these buffers to the redo log files in near real
  time. If archive log mode is enabled, the archiver processes (ARCn) copy
  the filled redo log files to the archive area;
- Large Pool, a very important area for data warehousing because it holds
  the buffers needed for parallel processing. This area does not use the LRU
  algorithm.
The Program Global Area is a memory region reserved for each user process
that connects to the database; it contains control information for a single
server process or a single background process. Unlike the SGA, which is a
shared area, the PGA is used by a single process. This area is allocated when
the process is created. In dedicated server mode, a dedicated server process is
allocated for every user process. The PGA consists of:
- Session memory, which contains session variables (e.g., logon
  information);
- Private SQL area, which contains data such as bind information and
  runtime memory structures. Each session that issues a SQL
  statement has a private SQL area, consisting of:
  o the persistent area, which contains, for example, bind
    information; it is freed only when the cursor is closed;
  o the run-time area, used for complex operations such as sorting,
    hash joins, and table creation; it is freed at the end of
    execution.
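Assuming access to an Oracle instance with the appropriate privileges, the sizes of these SGA and PGA components can be inspected through the standard dynamic performance views (a sketch, not tied to a specific release):

```sql
-- Memory allocated per SGA component (shared pool, buffer cache, large pool, ...)
SELECT pool, name, bytes
  FROM v$sgastat
 ORDER BY bytes DESC;

-- Aggregate PGA statistics across all sessions
SELECT name, value
  FROM v$pgastat;
```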
Database Optimization
The DW database must be configured to take advantage of all the data
warehousing features offered by the RDBMS.
The physical granularity of the main database should be as fine as
possible, meaning different tablespaces with different characteristics for
partitioned/non-partitioned tables: dimension tables, fact tables, aggregated
tables, etc. There must also be a consistent file structure, with data files
separated from program files, log files, etc.
The Staging database can be configured similarly; the main difference is
that substantially less memory needs to be allocated to the staging server, as
it will have fewer concurrent connections and the size of the database is
substantially smaller.
The methodology for calculating the number of CPUs is based on the
expected maximum number of concurrent queries and ETL procedures. The
procedure is to multiply the maximum number of concurrent users by the
maximum number of concurrent reports per user that the end-user business
intelligence tool permits.
The size of the memory is based on the size of the tables, the types of sorts,
the number and types of users, the ETL transformations, etc.
The calculation will be based on the assumptions made for two types of operations:
-Queries
-Batch (ETL jobs)
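The CPU sizing rule above (concurrent users times reports per user, plus concurrent ETL jobs) can be sketched as simple arithmetic; all figures below are illustrative assumptions, not recommendations:

```python
# Rough CPU sizing sketch. Every input figure here is a hypothetical example.
def concurrent_operations(max_users: int, max_reports_per_user: int,
                          concurrent_etl_jobs: int) -> int:
    """Expected maximum number of concurrent queries and ETL procedures."""
    return max_users * max_reports_per_user + concurrent_etl_jobs

def cpus_needed(operations: int, operations_per_cpu: int) -> int:
    """Ceiling division: CPUs required to cover the concurrent operations."""
    return -(-operations // operations_per_cpu)

ops = concurrent_operations(max_users=50, max_reports_per_user=2,
                            concurrent_etl_jobs=8)
print(ops)                  # 50 * 2 + 8 = 108 concurrent operations
print(cpus_needed(ops, 8))  # at 8 operations per CPU -> 14 CPUs
```

The operations-per-CPU figure is workload-dependent and would normally come from benchmarking, not from a formula.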
The compression feature must be used. In Oracle, database
compression can be applied at the tablespace level; in DB2, at the table level.
Data stored in relational databases keeps growing as a result of business
requirements for more information. A big portion of the cost of keeping large
amounts of data lies in the disk systems and in the resources used to manage
that data. Both databases address this cost by compressing the data stored in
relational tables, with virtually no negative impact on query time against that
data, thereby enabling substantial cost savings. Compression works by
eliminating duplicate values in a database block. Compressed data stored in a
database block (a.k.a. disk page) is self-contained: all the information needed
to recreate the uncompressed data in a block is available within that block. The
typical compression ratio for large data-warehouse tables ranges from 2:1 to
4:1. The data warehouse should use compression especially for partitioned fact
tables and partitioned results tables.
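As an illustration of the two levels mentioned above (all object and file names are hypothetical), compression is declared in the DDL of each engine:

```sql
-- Oracle: set a compression default at the tablespace level
CREATE TABLESPACE dw_fact_ts
    DATAFILE 'dw_fact01.dbf' SIZE 10G
    DEFAULT COMPRESS;

-- DB2: enable row compression on an existing table, then rebuild it
ALTER TABLE dw.fact_sales COMPRESS YES;
REORG TABLE dw.fact_sales;
```

In DB2 the REORG step rebuilds the compression dictionary so that existing rows are actually compressed.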
RAID5 provides some protection against disk crashes without being as
expensive as RAID1+0. It does this by calculating parity bits that can be used
to rebuild one failed disk out of a group (for example, one of four), as opposed
to mirroring, which stores everything twice and allows data to be retrieved if
one disk of a mirrored pair crashes.
However, the calculation of the parity bits slows down write performance
by roughly half compared to RAID1+0.
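The capacity side of this trade-off can be made concrete with a small calculation; the disk counts and sizes below are arbitrary examples:

```python
# Usable capacity of a disk array per RAID level.
# RAID5 dedicates one disk's worth of capacity to parity per group;
# RAID1+0 mirrors everything, so half the raw capacity is usable.
def raid5_usable(n_disks: int, disk_tb: float) -> float:
    """Usable TB for a RAID5 group of n_disks disks of disk_tb TB each."""
    return (n_disks - 1) * disk_tb

def raid10_usable(n_disks: int, disk_tb: float) -> float:
    """Usable TB for a RAID1+0 array (mirrored pairs)."""
    return n_disks / 2 * disk_tb

print(raid5_usable(4, 1.0))   # 3.0 TB usable out of 4 TB raw
print(raid10_usable(4, 1.0))  # 2.0 TB usable out of 4 TB raw
```

The extra usable capacity of RAID5 is what it trades against the write penalty described above.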
Files that are critical for performance and/or resilience to failure (e.g.,
database redo logs/circular logs, for protection and performance; database temp
files/system temporary tablespace containers, for performance) should be stored
on RAID1+0 disks.
One RAID1+0 logical volume should be used for the Oracle/DB2 logs and
temporary files. A second RAID1+0 logical volume will be used for DataStage
temporary files.
In a shared disk database, database files are logically shared among the
nodes of a loosely coupled system with each instance having access to all data. The
shared disk access is accomplished either through direct hardware connectivity or by
using an operating system abstraction layer that provides a single view of all the
devices on all the nodes.
Security
IT Security Standards will be followed during the definition of the AAA
(Access, Authentication and Authorisation) mechanism, which shall be used to
guarantee data security. A full user access, report level, and data level security
approach must be detailed in the Security Plan. The Security Plan describes:
- who is allowed access to the instance and/or database;
- where and how a user's password will be verified;
- the authority level that a user is granted;
- the commands that a user is allowed to run;
- the data that a user is allowed to read and/or alter;
- the database objects a user is allowed to create, alter, and/or drop.
Logical data-level authentication is mainly managed by the front-end tool
(the end-user reporting tool).
The Cognos security model supports the distribution of security administration.
Because objects in Content Manager, such as folders and groups, can be secured
separately, security administration can be assigned to individuals in separate
functional areas of the organization.
The following authorization services are set up in the Cognos Server (reporting
layer). Permissions are related to the users, groups, and roles defined in
third-party authentication providers. Permissions define access rights to
objects, such as directories, folders, and other content, for each user, group,
or role. Permissions also define the activities that can be performed with
these objects.
Cognos authorization assigns permissions to:
- groups and roles created in the Cognos namespace in the Content Manager;
- entire namespaces, users, groups, and roles created in third-party
authentication providers.
The users accessing the Data Warehouse Environment should be split based
on main activities and types of environment on the database:
- DBA users for administrative reasons
- Application users for running ETL processes
- Reporting users
Both the DB2 and Oracle engines can assign specific tablespaces and privileges
on database objects to data warehouse users. In DB2, a database user must first
be created at the operating system level.
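A minimal sketch of how these three user classes might be separated with object and tablespace privileges (all user, schema, and tablespace names are hypothetical):

```sql
-- Reporting users: read-only access to the presentation tables
GRANT SELECT ON dw.fact_sales TO rpt_user;

-- ETL application user: may load and transform the data
GRANT SELECT, INSERT, UPDATE, DELETE ON dw.fact_sales TO etl_user;

-- DB2 only: allow the ETL user to create objects in a given tablespace
GRANT USE OF TABLESPACE dw_stage_ts TO USER etl_user;
```

DBA users would instead hold the engine's administrative authorities (e.g., DBADM in DB2, the DBA role in Oracle) rather than per-object grants.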
RESOURCES