IBM Star Schemas

Star schemas
A star schema consists of fact tables and dimension tables. Fact tables contain the quantitative or factual data about a business, that is, the information being queried. This information is often numerical, additive measurements and can consist of many columns and millions or billions of rows. Dimension tables are usually smaller and hold descriptive data that reflects the dimensions, or attributes, of a business. SQL queries then use joins between fact and dimension tables and constraints on the data to return selected information.
Fact and dimension tables differ from each other only in their use within a schema. Their physical structure and the SQL syntax used to create the tables are the same. In a complex schema, a given table can act as a fact table under some conditions and as a dimension table under others. The way in which a table is referred to in a query determines whether it behaves as a fact table or a dimension table.
Even though they are physically the same type of table, it is important to understand the difference between fact and dimension tables from a logical point of view. To demonstrate the difference, consider how analysts look at business performance:
A salesperson analyzes revenue by customer, product, market, and time period.
A financial analyst tracks actuals and budgets by line item, product, and time period.
A marketing person reviews shipments by product, market, and time period.
The facts (what is being analyzed in each case) are revenue, actuals and budgets, and shipments. These items belong in fact tables. The business dimensions (the "by" items) are customer, product, market, time period, and line item. These items belong in dimension tables.
For example, a fact table in a sales database, implemented with a star schema, might contain the sales revenue for the products of the company from each customer in each geographic market over a period of time.
The dimension tables in this database define the customers, products, markets, and time periods used in the fact table. A well-designed schema provides dimension tables that allow a user to browse a database to become familiar with the information in it and then to write queries with constraints so that only the information that satisfies those constraints is returned from the database.
Performance of star schemas
Performance is an important consideration of any schema, particularly with a decision-support system in which you routinely query large amounts of data. IBM Red Brick Warehouse supports all schema designs. However, star schemas tend to perform the best in decision-support applications. For more information on the performance implications of star schemas, see the Query Performance Guide.
Terminology
The terms fact table and dimension table represent the roles these objects play in the logical schema. In terms of the physical database, a fact table is a referencing table. That is, it has foreign key references to other tables. A dimension table is a referenced table. That is, it has a primary key that is a foreign key reference from one or more tables.
Simple star schemas
Any table that references or is referenced by another table must have a primary key, which is a column or group of columns whose contents uniquely identify each row. In a simple star schema, the primary key for the fact table consists of one or more foreign keys. A foreign key is a column or group of columns in one table whose values are defined by the primary key in another table. In IBM Red Brick Warehouse, you can use these foreign keys and the primary keys in the tables that they reference to build STAR indexes, which improve data retrieval performance.
When a database is created, the SQL statements used to create the tables must designate the columns that are to form the primary and foreign keys.
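As a rough illustration of designating keys at table-creation time, the following sketch declares two dimension tables and one fact table whose primary key is made up of its foreign keys. It runs against SQLite through Python's sqlite3 module, which is only a stand-in here: it is not IBM Red Brick Warehouse, and Red Brick DDL details may differ. The table names (dim_a, dim_b, fact) and column names are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the warehouse engine; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE dim_a (a_id INTEGER PRIMARY KEY, a_attr TEXT);
CREATE TABLE dim_b (b_id INTEGER PRIMARY KEY, b_attr TEXT);
CREATE TABLE fact (
    a_id    INTEGER NOT NULL REFERENCES dim_a (a_id),  -- foreign key
    b_id    INTEGER NOT NULL REFERENCES dim_b (b_id),  -- foreign key
    measure REAL,                                      -- data column
    PRIMARY KEY (a_id, b_id)                           -- key built from the foreign keys
);
INSERT INTO dim_a VALUES (1, 'a');
INSERT INTO dim_b VALUES (10, 'x');
INSERT INTO fact  VALUES (1, 10, 99.5);
""")


def try_insert(row):
    """Return True if the fact row is accepted, False if a key constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO fact VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_key_accepted = try_insert((1, 10, 5.0))  # repeats an existing (a_id, b_id)
unknown_dim_accepted = try_insert((2, 10, 5.0))    # a_id = 2 is not defined in dim_a
fact_rows = conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0]
```

Both attempted inserts are rejected: the first violates the primary key, and the second refers to a dimension value that does not exist.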
The following figure illustrates the relationship of the fact and dimension tables within a simple star schema with a single fact table and three dimension tables. The fact table has a primary key composed of three foreign keys, Key1, Key2, and Key3, each of which is the primary key in a dimension table. Nonkey columns in a fact table are referred to as data columns. In a dimension table, they are referred to as attributes.
Figure 11. Simple Star Schema
http://publib.boulder.ibm.com/infocenter/rbhelp/v6r3/topic/com.ibm.redbrick.doc6.3/wag/...
13/10/2008
In the figures used to illustrate schemas:
The items listed within the box under each table name indicate columns in the table.
Primary key columns are labeled in bold type.
Foreign key columns are labeled in italic type.
Columns that are part of the primary key and are also foreign keys are labeled in bold italic type.
Foreign key relationships are indicated by lines connecting tables.
Although the primary key value must be unique in each row of a dimension table, that value can occur multiple times in the foreign key in the fact table, which is a many-to-one relationship.
The following figure illustrates a sales database designed as a simple star schema. In the fact table Sales, the primary key is composed of three foreign keys, Product_id, Period_id, and Market_id, each of which references a primary key in a dimension table.
Figure 12. Sales Database
Many-to-one relationships exist between the foreign keys in the fact table and the primary keys they reference in the dimension tables. For example, the Product table defines the products. Each row in the table represents a distinct product and has a unique product identifier. That product identifier can occur multiple times in the Sales table representing sales of that product during each period and in each market.
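The many-to-one relationship and a typical constrained join can be sketched as follows. The table names and key columns (Sales, Product, Period, Market, Product_id, Period_id, Market_id) come from the text above; the nonkey columns (Product_name, Year, Month, City, Dollars) and all data values are assumptions made up for illustration. SQLite, through Python's sqlite3 module, is used as a stand-in engine rather than IBM Red Brick Warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Nonkey columns and sample rows are hypothetical.
CREATE TABLE Product (Product_id INTEGER PRIMARY KEY, Product_name TEXT);
CREATE TABLE Period  (Period_id  INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER);
CREATE TABLE Market  (Market_id  INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Sales (
    Product_id INTEGER REFERENCES Product (Product_id),
    Period_id  INTEGER REFERENCES Period (Period_id),
    Market_id  INTEGER REFERENCES Market (Market_id),
    Dollars    REAL,
    PRIMARY KEY (Product_id, Period_id, Market_id)
);
INSERT INTO Product VALUES (1, 'Tea'), (2, 'Coffee');
INSERT INTO Period  VALUES (1, 2003, 1), (2, 2003, 2);
INSERT INTO Market  VALUES (1, 'Boston'), (2, 'Denver');
INSERT INTO Sales VALUES
    (1, 1, 1, 100.0), (1, 2, 1, 150.0), (1, 1, 2, 80.0),
    (2, 1, 1, 200.0), (2, 2, 2, 120.0);
""")

# One Product row, many Sales rows: the many-to-one relationship.
tea_sales_rows = conn.execute(
    "SELECT COUNT(*) FROM Sales WHERE Product_id = 1"
).fetchone()[0]

# Join the fact table to two dimensions and constrain on a dimension attribute.
boston_by_product = conn.execute("""
    SELECT p.Product_name, SUM(s.Dollars)
    FROM Sales s
    JOIN Product p ON p.Product_id = s.Product_id
    JOIN Market  m ON m.Market_id  = s.Market_id
    WHERE m.City = 'Boston'
    GROUP BY p.Product_name
    ORDER BY p.Product_name
""").fetchall()
```

The constraint is written against the dimension attribute (City), while the measure being summed (Dollars) comes from the fact table.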
Multiple fact tables
A star schema can contain multiple fact tables. In some cases, multiple fact tables exist because they contain unrelated facts; for example, invoices and sales. In other cases, they exist because they improve performance. For example, multiple fact tables are often used to hold various levels of aggregated (summary) data, particularly when the amount of aggregation is large; for example, daily sales, monthly sales, and yearly sales.
The following figure illustrates the Sales database with an additional fact table for sales from the previous year.
Figure 13. Sales database with additional dimension
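A minimal sketch of keeping a second, aggregated fact table alongside a detailed one is shown below. The tables Sales_Daily and Sales_Monthly, their columns, and the data are hypothetical and are not taken from the figure; SQLite is used as a stand-in engine through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical detail and summary fact tables.
CREATE TABLE Sales_Daily (
    Product_id INTEGER, Sale_date TEXT, Dollars REAL,
    PRIMARY KEY (Product_id, Sale_date)
);
CREATE TABLE Sales_Monthly (
    Product_id INTEGER, Sale_month TEXT, Dollars REAL,
    PRIMARY KEY (Product_id, Sale_month)
);
INSERT INTO Sales_Daily VALUES
    (1, '2003-01-05', 10.0), (1, '2003-01-20', 15.0),
    (1, '2003-02-01', 7.0),  (2, '2003-01-09', 30.0);

-- Roll the daily rows up to one row per product and month (YYYY-MM).
INSERT INTO Sales_Monthly
    SELECT Product_id, substr(Sale_date, 1, 7), SUM(Dollars)
    FROM Sales_Daily
    GROUP BY Product_id, substr(Sale_date, 1, 7);
""")

monthly = conn.execute(
    "SELECT Product_id, Sale_month, Dollars FROM Sales_Monthly "
    "ORDER BY Product_id, Sale_month"
).fetchall()
```

A monthly report can then read the small summary table instead of aggregating the detail rows on every query, which is the performance motivation described above.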
Another use of a referencing table is to define a many-to-many relationship between some dimensions of the business. This type of table is often known as a cross-reference or associative table. For example, in the Sales database, each product belongs to one or more groups, and each group contains multiple products, a many-to-many relationship that is modeled by establishing a referencing table that defines the possible combinations of products and groups.
Figure 14. Sales database with cross-reference table
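The cross-reference approach might look like the following sketch. The names Prod_Group and Product_Group_Xref, the nonkey columns, and the data are assumptions for illustration (the figure's actual table names are not reproduced here), and SQLite stands in for the warehouse engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical names; the cross-reference table holds one row per valid pairing.
CREATE TABLE Product    (Product_id INTEGER PRIMARY KEY, Product_name TEXT);
CREATE TABLE Prod_Group (Group_id   INTEGER PRIMARY KEY, Group_name TEXT);
CREATE TABLE Product_Group_Xref (
    Product_id INTEGER REFERENCES Product (Product_id),
    Group_id   INTEGER REFERENCES Prod_Group (Group_id),
    PRIMARY KEY (Product_id, Group_id)
);
INSERT INTO Product    VALUES (1, 'Tea'), (2, 'Coffee');
INSERT INTO Prod_Group VALUES (10, 'Hot drinks'), (20, 'Gifts');
INSERT INTO Product_Group_Xref VALUES (1, 10), (2, 10), (1, 20);
""")

# One product in several groups ...
groups_of_tea = [row[0] for row in conn.execute("""
    SELECT g.Group_name
    FROM Product_Group_Xref x
    JOIN Prod_Group g ON g.Group_id = x.Group_id
    WHERE x.Product_id = 1
    ORDER BY g.Group_name
""")]

# ... and one group containing several products.
products_in_hot_drinks = [row[0] for row in conn.execute("""
    SELECT p.Product_name
    FROM Product_Group_Xref x
    JOIN Product p ON p.Product_id = x.Product_id
    WHERE x.Group_id = 10
    ORDER BY p.Product_name
""")]
```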
Multicolumn foreign key
Another way to define a many-to-many relationship is to have a dimension table with a multicolumn primary key that is a foreign key reference from a fact table. For example, in the Sales database, each product belongs to one or more groups, and each group contains multiple products, a many-to-many relationship. This relationship is modeled by defining a multicolumn foreign key in the Sales_Current table that references the Product table, as in the following example.
Figure 15. Sales database with multicolumn foreign key
In the preceding figure, the Product_id and Group_id columns are the two-column primary key of the Product table and are a two-column foreign key reference from the Sales_Current table.
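A trimmed-down sketch of such a two-column foreign key follows. Product, Sales_Current, Product_id, and Group_id come from the text; Period_id, the nonkey columns, and the data are assumptions, and the other dimensions of the Sales database are left out for brevity. SQLite is used as a stand-in engine, so the DDL may differ from what IBM Red Brick Warehouse expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Product (
    Product_id INTEGER, Group_id INTEGER, Product_name TEXT,
    PRIMARY KEY (Product_id, Group_id)            -- two-column primary key
);
CREATE TABLE Sales_Current (
    Product_id INTEGER, Group_id INTEGER,
    Period_id  INTEGER,                           -- hypothetical extra key column
    Dollars    REAL,                              -- hypothetical data column
    PRIMARY KEY (Product_id, Group_id, Period_id),
    FOREIGN KEY (Product_id, Group_id)            -- two-column foreign key
        REFERENCES Product (Product_id, Group_id)
);
INSERT INTO Product VALUES (1, 10, 'Tea'), (1, 20, 'Tea'), (2, 10, 'Coffee');
INSERT INTO Sales_Current VALUES (1, 10, 1, 50.0), (1, 20, 1, 5.0);
""")

# Product 2 exists and group 20 exists, but the pair (2, 20) is not a Product row,
# so the two-column foreign key rejects this sale.
try:
    with conn:
        conn.execute("INSERT INTO Sales_Current VALUES (2, 20, 1, 9.0)")
    undefined_pair_rejected = False
except sqlite3.IntegrityError:
    undefined_pair_rejected = True

sales_rows = conn.execute("SELECT COUNT(*) FROM Sales_Current").fetchone()[0]
```

The check applies to the combination of the two columns, not to each column separately.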
Outboard tables
Dimension tables can also contain one or more foreign keys that reference the primary key in another dimension table. The referenced dimension tables are sometimes referred to as outboard, outrigger, or secondary dimension tables. The following figure includes two outboard tables, District and Region, which define the ID codes used in the Market table.
Figure 16. Sales database with outboard tables
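The outboard arrangement described above can be sketched as follows. Market, District, and Region are named in the text; the column names (District_id, Region_id, City, and so on), the reduced Sales table, and the data are assumptions for illustration. SQLite stands in for the warehouse engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Outboard tables: referenced by a dimension table, not by the fact table.
CREATE TABLE District (District_id INTEGER PRIMARY KEY, District_name TEXT);
CREATE TABLE Region   (Region_id   INTEGER PRIMARY KEY, Region_name TEXT);
CREATE TABLE Market (
    Market_id   INTEGER PRIMARY KEY,
    City        TEXT,
    District_id INTEGER REFERENCES District (District_id),
    Region_id   INTEGER REFERENCES Region (Region_id)
);
-- Reduced fact table (hypothetical): only the Market and Period keys are kept.
CREATE TABLE Sales (
    Market_id INTEGER REFERENCES Market (Market_id),
    Period_id INTEGER,
    Dollars   REAL,
    PRIMARY KEY (Market_id, Period_id)
);
INSERT INTO District VALUES (1, 'Northeast'), (2, 'Mountain');
INSERT INTO Region   VALUES (1, 'East'), (2, 'West');
INSERT INTO Market   VALUES (1, 'Boston', 1, 1), (2, 'Hartford', 1, 1), (3, 'Denver', 2, 2);
INSERT INTO Sales    VALUES (1, 1, 100.0), (2, 1, 40.0), (3, 1, 70.0), (3, 2, 30.0);
""")

# Reaching the outboard table takes two joins: Sales -> Market -> Region.
sales_by_region = conn.execute("""
    SELECT r.Region_name, SUM(s.Dollars)
    FROM Sales s
    JOIN Market m ON m.Market_id = s.Market_id
    JOIN Region r ON r.Region_id = m.Region_id
    GROUP BY r.Region_name
    ORDER BY r.Region_name
""").fetchall()
```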
In the preceding figure, the Market table, because it is both a referencing and referenced table, can behave as a fact (referencing) or dimension (referenced) table, depending on how it is used in a query.
Multistar schemas
In a simple star schema, the primary key in the fact table is formed by concatenating the foreign key columns. In some applications, however, the concatenated foreign keys might not provide a unique identifier for each row in the fact table. These applications require a multistar schema.
In a multistar schema, the fact table has both a set of foreign keys, which reference dimension tables, and a primary key, which consists of one or more columns that provide a unique identifier for each row. The primary key and the foreign keys are not identical in a multistar schema. This fact distinguishes a multistar schema from a single-star schema.
The following figure illustrates the relationship of the fact and dimension tables within a multistar schema. In the fact table, the foreign keys are Fkey1, Fkey2, and Fkey3, each of which is the primary key in a dimension table. Unlike the simple star schema, these columns do not form the primary key in the fact table. Instead, the two columns Key1 and Key2, which do not reference any dimension tables, and Fkey1, which does reference a dimension table, are concatenated to form the primary key. The primary key can consist of any combination of foreign key and other columns in a multistar schema.
Figure 17. Relationship of fact and dimension tables in multistar schema
The following figure illustrates a retail sales database designed as a multistar schema with two outboard tables. The fact table Transact records daily sales in a rolling seven-day database. The primary key for the fact table consists of three columns: Date, Receipt, and Line_item. These columns together provide the unique identifier for each row. The foreign keys are the columns for Store_id and SKU_id, which reference the Store and SKU (stock-keeping unit) dimension tables. Two outboard tables, Class and Subclass, are referenced by the SKU dimension table.
Figure 18. Multistar schema with two outboard tables
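A minimal sketch of the Transact fact table is shown below. The table names and the key columns (Date, Receipt, Line_item, Store_id, SKU_id) come from the description above; the nonkey columns and the data are assumptions, and the Class and Subclass outboard tables are omitted for brevity. SQLite stands in for the warehouse engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Store (Store_id INTEGER PRIMARY KEY, Store_name TEXT);
CREATE TABLE SKU   (SKU_id   INTEGER PRIMARY KEY, Item_name TEXT);
CREATE TABLE Transact (
    Date      TEXT,
    Receipt   INTEGER,
    Line_item INTEGER,
    Store_id  INTEGER REFERENCES Store (Store_id),  -- foreign key, not part of the primary key
    SKU_id    INTEGER REFERENCES SKU (SKU_id),      -- foreign key, not part of the primary key
    Dollars   REAL,                                 -- hypothetical data column
    PRIMARY KEY (Date, Receipt, Line_item)
);
INSERT INTO Store VALUES (1, 'Downtown');
INSERT INTO SKU   VALUES (500, 'Umbrella');
-- Same store, same item, same day, but two different receipts.
INSERT INTO Transact VALUES
    ('2003-03-01', 1001, 1, 1, 500, 12.0),
    ('2003-03-01', 1002, 1, 1, 500, 12.0);
""")

# The foreign key values repeat across rows, which the schema allows.
rows_sharing_foreign_keys = conn.execute("""
    SELECT COUNT(*) FROM Transact
    WHERE Store_id = 1 AND SKU_id = 500 AND Date = '2003-03-01'
""").fetchone()[0]

# The primary key (Date, Receipt, Line_item) must still be unique.
try:
    with conn:
        conn.execute("INSERT INTO Transact VALUES ('2003-03-01', 1001, 1, 1, 500, 3.0)")
    duplicate_primary_key_rejected = False
except sqlite3.IntegrityError:
    duplicate_primary_key_rejected = True
```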
In this database schema, analysts can query the transaction table to obtain information on sales of each item, sales by store or region, sales by date, or other interesting information.
In a multistar schema, unlike a simple star schema, the same value for the concatenated foreign key in the fact table can occur in multiple rows, so the concatenated foreign key no longer uniquely identifies each row. For example, in this case the same store (Store_id) might have multiple sales of the same item (SKU_id) on the same day (Date). Instead, row identification is based on the primary key. Each row is uniquely identified by Date, Receipt, and Line_item.
Views
In some databases, schema design can be simplified by the use of views, which effectively create a virtual table by selecting a combination of rows and columns from an existing table or combination of tables. For example, a view that selects employee names and telephone extensions from an employee database produces a company phone list but does not include confidential information such as addresses and salaries. A view that selects transactions that occur within a given time period avoids the need to constrain queries to that time period.
Views are useful for a wide variety of purposes, including the following:
Increasing security
Simplifying complex tables to give users a view of only what they need
Simplifying query constraints
Simplifying administrative tasks, such as granting table authorizations
Hiding administrative changes from users: the database schema changes design, but the view to the user remains the same
A view is created with a CREATE VIEW statement. Additionally, you can create precomputed views so that queries are automatically rewritten to access the appropriate aggregate table. For information on precomputed views and automatic query rewriting, see the IBM Red Brick Vista User's Guide.
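As a minimal sketch of an ordinary (not precomputed) view, the phone-list case might look like this. The Employee table, its columns, the view name Phone_List, and the data are all hypothetical, and SQLite stands in for the warehouse engine, so this does not show Red Brick's precomputed views or query rewriting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical employee table with confidential columns.
CREATE TABLE Employee (
    Emp_id INTEGER PRIMARY KEY, Name TEXT, Extension TEXT, Address TEXT, Salary REAL
);
INSERT INTO Employee VALUES
    (1, 'Ana', '4101', '1 Main St', 50000),
    (2, 'Raj', '4102', '2 Oak Ave', 60000);

-- The view exposes only the two columns needed for a phone list.
CREATE VIEW Phone_List AS
    SELECT Name, Extension FROM Employee;
""")

cursor = conn.execute("SELECT * FROM Phone_List ORDER BY Name")
view_columns = [col[0] for col in cursor.description]
phone_list = cursor.fetchall()
```

Users who are given access only to the view never see the Address or Salary columns.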
Copyright IBM Corporation, 2003, 2004. Last updated: March, 2004
