DB Design

Physical DB Design

Stages of DB Design:
1. Requirement Analysis
2. Logical Modeling
   a. Conceptual Modeling: user views in ER diagrams
   b. View Integration: user views as schema tables
3. Activity Modeling: volume, usage, frequency and integrity analysis (constraints)
4. Physical Modeling: convert into suitable hardware and software configurations

Project phases:
- PI: Project Initiation
- Initial Training and Research
- Project Analysis
- Logical DB Design: conceptual modeling and view integration
- Activity Modeling: volume, usage, frequency, integrity
- Physical Modeling: software and hardware
- Application Development and Testing
- Production Release

Physical DB Design concepts:
- BIM (Business Information Model): shows the major entities and their relationships; also called BM (Business Model).
- LDM (Logical Data Model): should be in 3NF; BIM + relational tables + minor entities (master tables) + PKs and FKs + constraints + attributes.
- ELDM (Extended Logical Data Model): LDM + demographics + frequencies.
- PDM (Physical Data Model): index selection, denormalization.

LDM Review: shows the row-and-column tables that represent the real-world business enterprise.

Customer Service DB: ER Diagram
Customer Service Logical Model
PK = Primary Key
FK = Foreign Key
SA = System Assigned
UA = User Assigned
NN = No NULLs
ND = No Duplicates
NC = No Changes
Rules for Identifying Entities

An entity must pass these tests:
- It is within the scope of the project.
- It can be clearly and uniquely defined.
- Each instance can be uniquely differentiated from all other instances.
- Usually, there is other information that describes it.

The entity name must be:
- Unique within the model
- A singular noun, possibly modified by adjectives
Normal Forms
Normalization is a set of rules and a methodology for making sure that the attributes in a design are carried in the correct entity to map accurately to reality, eliminate data redundancy and minimize update anomalies. Stated simply: One Fact, One Place! 1NF, 2NF and 3NF are progressively more refined and apply to non-key attributes regarding their dependency on PK attributes. 4NF and 5NF apply to dependencies between or among PK attributes. For most models, normalizing to 3NF meets the business requirements.
First Normal Form (1NF)
Attributes must not repeat within a table: no repeating groups. In the violation example, six columns represent the sales amount.

Second Normal Form (2NF)
An attribute must relate to the entire Primary Key, not just a portion of it. Tables with a single-column Primary Key (entities) are always in 2NF, provided they are in 1NF. In the violation example, the ORDER DATE column describes only the ORDER NUMBER portion of the Primary Key.
2NF violation example (table ORDER PART):

ORDER NUMBER   PART NUMBER   ORDER DATE   QUANTITY
PK (composite: columns 1 and 2)
FK             FK
100            1234          2003-07-15   200
100            2537          2003-07-15   100
Here ORDER DATE depends only on ORDER NUMBER, a portion of the composite PK, instead of on the entire PK.

Third Normal Form (3NF)
Attributes must relate to the Primary Key and not to each other. Cover up the PK and any ND columns; the remaining columns must not describe each other. In the violation example, the JOB DESCRIPTION column describes only the JOB CODE column, not the EMPLOYEE NUMBER (Primary Key) column.
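The 2NF and 3NF tests above both ask whether some columns functionally determine another. A minimal Python sketch of that check on sample rows follows; `determines` is a hypothetical helper, and sample data can only disprove a dependency, so a True result is a hint to investigate rather than proof. The ORDER PART rows come from the violation example; the employee rows and job descriptions are made up for illustration.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]

def determines(rows: List[Row], lhs: Sequence[str], rhs: str) -> bool:
    """True if, in these sample rows, the columns in `lhs` functionally
    determine `rhs` (each lhs value combination maps to exactly one rhs value)."""
    seen: Dict[tuple, object] = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False  # same lhs value, different rhs value: no dependency
        seen[key] = row[rhs]
    return True

# The 2NF violation example: PK is (order_number, part_number).
order_part = [
    {"order_number": 100, "part_number": 1234, "order_date": "2003-07-15", "quantity": 200},
    {"order_number": 100, "part_number": 2537, "order_date": "2003-07-15", "quantity": 100},
]

# Hypothetical 3NF example: PK is employee_number; descriptions are made up.
employee = [
    {"employee_number": 1, "job_code": 2147, "job_description": "Analyst"},
    {"employee_number": 2, "job_code": 2147, "job_description": "Analyst"},
    {"employee_number": 3, "job_code": 3100, "job_description": "Engineer"},
]

print(determines(order_part, ["order_number"], "order_date"))  # True: partial dependency (2NF violation)
print(determines(order_part, ["order_number"], "quantity"))    # False: QUANTITY needs the whole PK
print(determines(employee, ["job_code"], "job_description"))   # True: non-key -> non-key (3NF violation)
```

The fix in both cases is the usual decomposition: move ORDER DATE to an ORDER table keyed by ORDER NUMBER, and JOB DESCRIPTION to a JOB table keyed by JOB CODE.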
Denormalizations
Whenever possible, keep the normalized tables from the Logical Model as the authoritative source and add separate denormalized tables to the database. Common denormalization techniques:
- Derived data
- Repeating groups
- Pre-joins
- Summary tables
- Temporary tables
Types of denormalization: Logical (modifications to the Logical Model), Physical (the techniques listed above, such as derived data) and Virtual (join indexes, hash indexes).
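As a small illustration of keeping the normalized table authoritative while adding a derived summary table beside it, the sketch below uses Python's built-in sqlite3 module as a stand-in for Teradata. The `item_sales_summary` table and the sample rows are hypothetical; `daily_sales` mirrors the table used in the EXPLAIN section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized base table: the authoritative source of detail rows.
cur.execute("""CREATE TABLE daily_sales (
    item_id    INTEGER NOT NULL,
    sales_date TEXT    NOT NULL,
    sales      REAL    NOT NULL,
    PRIMARY KEY (item_id, sales_date))""")
cur.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", [
    (5010, "2003-01-01", 100.00),
    (5010, "2003-01-02", 150.00),
    (5020, "2003-01-01", 75.50),
])

# Denormalized summary table: derived data, rebuilt from the base table
# whenever the base table changes (it is never updated on its own).
cur.execute("""CREATE TABLE item_sales_summary AS
    SELECT item_id, SUM(sales) AS total_sales, COUNT(*) AS days_sold
    FROM daily_sales
    GROUP BY item_id""")

summary = dict(cur.execute(
    "SELECT item_id, total_sales FROM item_sales_summary ORDER BY item_id").fetchall())
print(summary)
```

Queries that only need totals read the small summary table; anything else, including rebuilding the summary, still goes back to the normalized table.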
The Primary Key is the logical reference for the Logical Data Model. The Primary Index is the physical access mechanism for the Physical Data Model. They may be but will not always be the same.
Foreign Key example: FOREIGN KEY (col2) REFERENCES parent (col4)
List Inputs required for DB Design
Outputs and Objectives:
Difference between Logical, Extended and Physical:
Hashing
The Hashing Algorithm acts like a mathematical blender. It takes up to 16 columns of mixed data as input and generates a single 32-bit binary value called a Row Hash. Different PI values rarely produce the same Row Hash; when this does occur, the values are known as Hash Synonyms or Hash Collisions.

Note: upper- and lower-case values hash to the same value. For example, Jones and JONES generate the same hash value.

The Hashing Algorithm creates a fixed-length value from an input string of any length.
- Input to the algorithm is the Primary Index (PI) value of a row.
- The output from the algorithm is the Row Hash, a 32-bit binary value that:
  - is the logical storage location of the row,
  - is used to identify the AMP of the row,
  - together with the Table ID, is used to locate the cylinder and data block,
  - is used for distribution, placement, and retrieval of the row.
- Row Hash uniqueness depends directly on PI uniqueness.
- Good data distribution depends directly on Row Hash uniqueness.
- The algorithm produces random, but consistent, Row Hashes: the same PI value and data type combination always hashes identically.
- Rows with the same Row Hash always go to the same AMP.
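A minimal Python sketch of the path PI value -> Row Hash -> hash bucket -> AMP. Teradata's real hashing algorithm is not reproduced: `zlib.crc32` is swapped in purely as a stand-in, and `hash_amp` uses a hypothetical bucket-modulo-AMP-count map, whereas a real system looks the bucket up in its hash map. The grounded piece is `hash_bucket`: in the HASHROW/HASHBUCKET sample output in these notes, the bucket number is always the top 20 bits of the Row Hash (for example, 0xE3D27EFC >> 12 = 933159).

```python
import zlib

BUCKET_BITS = 20  # the bucket numbers in these notes equal the top 20 bits of the Row Hash

def toy_row_hash(pi_value: str) -> int:
    """Stand-in for the Teradata hashing algorithm, which is NOT reproduced here.
    It only mimics behaviours described in these notes: case is ignored, and
    trailing (but not leading) spaces are ignored."""
    normalized = pi_value.rstrip(" ").upper()
    return zlib.crc32(normalized.encode("latin-1"))  # unsigned 32-bit value

def hash_bucket(row_hash: int) -> int:
    """Hash bucket number: the high-order BUCKET_BITS bits of the 32-bit Row Hash."""
    return row_hash >> (32 - BUCKET_BITS)

def hash_amp(bucket: int, num_amps: int) -> int:
    """Hypothetical hash map (bucket modulo AMP count); a real system looks the
    bucket up in its hash map instead."""
    return bucket % num_amps

# Bucket numbers for Row Hashes taken from the sample output in these notes.
print(hash_bucket(0xE3D27EFC))  # 933159  (HASHROW('TEST'))
print(hash_bucket(0xF66DE2DC))  # 1009374 (HASHROW('Teradata'))

# Same value and type -> same hash -> same bucket -> same AMP, every time.
for value in ["Jones", "JONES", "Teradata", "Teradata ", " Teradata"]:
    row_hash = toy_row_hash(value)
    bucket = hash_bucket(row_hash)
    print(repr(value), f"{row_hash:08X}", bucket, hash_amp(bucket, num_amps=2))
```

The design point the sketch shows is that placement is a pure function of the PI value: no lookup of existing rows is needed to decide where a row lives or where to find it.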
Hash Related Expressions

Database Design Requirement Gathering:
- Base Requirements on Business Needs
- Write a Project Mission Statement
- Write Project Mission Objectives
- Hold a Project Kickoff Meeting
- Explain to the participants that the approach you are using is probably new to them and that the systems resulting from the analysis might differ significantly from the legacy system, both in capability and in operation.
- Make it clear that the intent of the project is not to recreate the legacy system or to automate current manual procedures.
- Emphasize that one of the fundamental purposes of the analysis is to eliminate outmoded policies and procedures.
- Emphasize that a great deal of their time will be required during this phase of the design process.
- Ensure that the subject community understands that when necessary interview subjects are not available, development is delayed.
- Provide basic instruction on the methods you will use in the logical design process. Ensure that the participants have at least some familiarity with all of the following concepts:
  - Entities
  - Attributes
  - Associations
  - Tables
  - Primary keys
  - Foreign keys
  - Business rules and integrity constraints
General Procedure
- Capture all data collection screens.
- Capture all reports.
- With the captured information presentation media in hand, conduct interviews again.
- Drill down by following up with detail-oriented questions aimed at capturing
- Begin to capture entity information by isolating and notating each job-related entity (noun) in the response set.
Identify New Data and Reports Required by Users

Teradata has created several different industry-specific logical data model (LDM) frameworks, including LDMs for the following industries:
- Communications
- Financial Services
- Healthcare
- Insurance
- Manufacturing
- Media
- Retail
- Transportation and Logistics
- Travel Industry
Definitions of Commonly Used Relational Terms

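- Alternate key: a candidate key that is not chosen as the primary key.
- Attribute: a named property of an entity or relation; it becomes a column.
- Body: the set of tuples of a relation. Each SQL relation must have a body.
- Candidate key: any key (see Key) that could be chosen as the primary key.
- Composite key: a key made up of more than one attribute.
- Domain: the set of valid values an attribute can take.
- Field: the physical-design term for a single attribute value stored in a row.
- FK (Foreign Key): one or more attributes whose values must match a key value in a referenced (parent) relation.
- Heading: each attribute of a relation must have a heading.
- Instance: a tuple drawn from the complete set of tuples for a relation.
- Intelligent key: an overloaded simple key that encodes more than one fact.
- Key: an attribute set that uniquely identifies each tuple in a relation.
- Natural key: a key that occurs naturally in the business data; for example, a common identifier of employees in a corporation is a unique employee number.
- PK (Primary Key): the candidate key chosen to identify each tuple of a relation; it cannot be NULL.
- Surrogate key: an artificial simple key used to identify individual entities when there is no natural key (for example, Identity Columns).
- Tuple: the term used in set theory and logical design. Row is the equivalent term used in physical design.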
Hashing Examples
Hashing is the mechanism by which the Teradata DBS uses the Primary Index to distribute rows of data. The examples that follow use this PERSONNEL.employee table:
CREATE SET TABLE PERSONNEL.employee, FALLBACK,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     (
      EmpNo SMALLINT FORMAT '9(5)' CHECK ( (EmpNo >= 10001) AND (EmpNo <= 32001) ) NOT NULL,
      Name VARCHAR(12) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      DeptNo SMALLINT FORMAT '999' CHECK ( (DeptNo >= 100) AND (DeptNo <= 900) ),
      JobTitle VARCHAR(12) CHARACTER SET LATIN NOT CASESPECIFIC,
      Salary DECIMAL(8,2) FORMAT 'ZZZ,ZZ9.99' CHECK ( (Salary >= 1.00) AND (Salary <= 999000.00) ),
      YrsExp BYTEINT FORMAT 'Z9' CHECK ( (YrsExp >= -99) AND (YrsExp <= 99) ),
      DOB DATE FORMAT 'MMMbDDbYYYY' NOT NULL,
      Sex CHAR(1) CHARACTER SET LATIN UPPERCASE NOT CASESPECIFIC NOT NULL,
      Race CHAR(1) CHARACTER SET LATIN UPPERCASE NOT CASESPECIFIC,
      MStat CHAR(1) CHARACTER SET LATIN UPPERCASE NOT CASESPECIFIC,
      EdLev BYTEINT FORMAT 'Z9' CHECK ( (EdLev >= 0) AND (EdLev <= 22) ) NOT NULL,
      HCap BYTEINT FORMAT 'Z9' CHECK ( (HCap >= -99) AND (HCap <= 99) ))
UNIQUE PRIMARY INDEX ( EmpNo )
INDEX ( Name )
INDEX ( DeptNo )
INDEX ( JobTitle );
- HASHROW: returns the row hash value for a given value or values.
- HASHBUCKET: returns the hash bucket (the grouping) for a specific row hash value.
- HASHAMP: returns the AMP that is associated with a hash bucket.
- HASHBAKAMP: returns the fallback AMP that is associated with a hash bucket.
SELECT HASHROW('TEST');

HASHROW('TEST')
---------------
E3D27EFC

SELECT HASHBUCKET(HASHROW('TEST'));

HASHBUCKET(HASHROW('TEST'))
---------------------------
933159
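HASHAMP takes a hash bucket number as its argument: an integer from 0 to 65535 on systems with 16-bit hash buckets, or 0 to 1048575 on systems with 20-bit hash buckets. The bucket numbers in these examples (up to 1034149) are the first 20 bits of the Row Hash, so they come from a 20-bit system. In the next example, HASHAMP shows which AMP would hold each employee row if EmpNo, Name or DeptNo were the primary index: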
SELECT HASHAMP(HASHBUCKET(HASHROW(EmpNo)))
      ,HASHAMP(HASHBUCKET(HASHROW(Name)))
      ,HASHAMP(HASHBUCKET(HASHROW(DeptNo)))
FROM employee;

*** Query completed. 21 rows found. 3 columns returned.

HASHAMP(HASHBUCKET(HASHROW(EmpNo)))  HASHAMP(HASHBUCKET(HASHROW(Name)))  HASHAMP(HASHBUCKET(HASHROW(DeptNo)))
0                                    0                                   1
1                                    0                                   1
0                                    1                                   1
1                                    0                                   1
0                                    0                                   0
1                                    1                                   0
0                                    1                                   1
1                                    0                                   0
0                                    0                                   1
...

SELECT HASHROW ('Teradata') AS "Hash Value"
      ,HASHBUCKET (HASHROW ('Teradata')) AS "Bucket Num"
      ,HASHAMP (HASHBUCKET (HASHROW ('Teradata'))) AS "AMP Num"
      ,HASHBAKAMP (HASHBUCKET (HASHROW ('Teradata'))) AS "AMP Fallback Num";

Hash Value  Bucket Num  AMP Num  AMP Fallback Num
----------  ----------  -------  ----------------
F66DE2DC    1009374     1        0

How string values hash (leading vs. trailing spaces):

SELECT HASHROW ('Teradata') AS "Hash Value 1"
      ,HASHROW ('Teradata ') AS "Hash Value 2"
      ,HASHROW (' Teradata') AS "Hash Value 3";

Hash Value 1  Hash Value 2  Hash Value 3
------------  ------------  ------------
F66DE2DC      F66DE2DC      53F30AB4

Values 1 and 2 (trailing space) are the same, but value 3 (leading space) is different.

INTEGER and DATE hash values:
The hashing algorithm will hash the same numeric value in different data types to the same value.
SELECT c1_int, HASHROW (c1_int), c2_date, HASHROW (c2_date) FROM tableE;

*** Query completed. 4 rows found. 4 columns returned.

c1_int    HASHROW(c1_int)  c2_date   HASHROW(c2_date)
--------  ---------------  --------  ----------------
20130213  C79DD536         13/02/13  3E940C65
?         00000000         ?         00000000
1010601   1213C458         01/06/01  1213C458
100       8C49CDAB         13/02/13  3E940C65

(? = NULL; a NULL hashes to 00000000.)
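The INTEGER/DATE rows above follow from how Teradata stores a DATE: as the integer (year - 1900) * 10000 + month * 100 + day. A small Python sketch of that encoding (the function name is illustrative) shows why 1010601 and DATE 2001-06-01 hash identically, while 20130213 and DATE 2013-02-13 do not:

```python
from datetime import date

def teradata_date_int(d: date) -> int:
    """Integer form of a Teradata DATE: (year - 1900) * 10000 + month * 100 + day."""
    return (d.year - 1900) * 10000 + d.month * 100 + d.day

print(teradata_date_int(date(2001, 6, 1)))   # 1010601: equals c1_int in row 3, so the hashes match
print(teradata_date_int(date(2013, 2, 13)))  # 1130213: not 20130213, so the hashes differ
```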
The Hashing Algorithm hashes the same numeric value in different numeric data types to the same hash value.
CREATE SET TABLE PERSONNEL.tablea, FALLBACK,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     (
      c1_bint BYTEINT,
      c2_sint SMALLINT,
      c3_int INTEGER,
      c4_dec DECIMAL(8,0),
      c5_dec2 DECIMAL(8,2),
      c6_float FLOAT,
      c7_char CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC)
UNIQUE PRIMARY INDEX ( c1_bint, c2_sint );

INSERT INTO tableA VALUES (5, 5, 5, 5, 5, 5, '5');
SELECT HASHROW (c1_bint) AS "Hash Byteint"
      ,HASHROW (c2_sint) AS "Hash Smallint"
      ,HASHROW (c3_int) AS "Hash Integer"
      ,HASHROW (c4_dec) AS "Hash Dec80"
      ,HASHROW (c5_dec2) AS "Hash Dec82"
      ,HASHROW (c6_float) AS "Hash Float"
      ,HASHROW (c7_char) AS "Hash Char"
FROM tableA;

Hash Byteint   609D1715
Hash Smallint  609D1715
Hash Integer   609D1715
Hash Dec80     609D1715
Hash Dec82     BD810459
Hash Float     E40FE360
Hash Char      334EC17C
BYTEINT, SMALLINT, INTEGER and DECIMAL(8,0) produce the same hash for the value 5; DECIMAL(8,2), FLOAT and CHAR each produce a different hash.

Row hash, bucket and AMP for each EmpNo in the employee table:

EmpNo  Hash Value  Bucket Num  AMP Num  AMP Fallback Num
-----  ----------  ----------  -------  ----------------
10003  3B4031A9    242691      0        1
10011  09092365    37010       1        0
10001  47CAF53A    294063      0        1
10009  1593E6F6    88382       1        0
10020  5088B358    329867      0        1
10007  2222AA87    139818      1        0
10018  5D1776E9    381303      0        1
10022  43F9EFC7    278430      1        0
10016  69A63A7A    432739      0        1
10012  82BFC19C    535548      1        0
10014  7630FE0B    484111      0        1
10010  8F4E852D    586984      1        0
10002  C1859371    792665      0        1
10008  9BDD48BE    638420      1        0
10021  CA43518F    828469      0        1
10006  A86C0C4F    689856      1        0
10019  D6D21520    879905      0        1
10004  B4F6CFE0    741228      1        0
10017  E35CD8B1    931277      0        1
10015  EFEB9C42    982713      0        1
10013  FC7A5FD3    1034149     0        1

How many buckets and how many AMPs for EmpNo? Unique values give unique hashes and therefore unique buckets, but many buckets map to the same AMP.

How many rows land on each AMP:

AMP Num  cnt
-------  ---
0        12
1        9
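To check the distribution numerically, the Python sketch below re-enters the EmpNo listing above and confirms three things: each Bucket Num equals the Row Hash shifted right by 12 bits (its top 20 bits), 21 unique EmpNo values give 21 distinct buckets, and the AMP counts are 12 and 9. The skew formula, (1 - average / maximum rows per AMP) x 100, is one common convention assumed here; these notes do not define it.

```python
from collections import Counter

# (EmpNo, Row Hash, Bucket Num, AMP Num), copied from the EmpNo listing above.
ROWS = [
    (10003, 0x3B4031A9, 242691, 0), (10011, 0x09092365, 37010, 1),
    (10001, 0x47CAF53A, 294063, 0), (10009, 0x1593E6F6, 88382, 1),
    (10020, 0x5088B358, 329867, 0), (10007, 0x2222AA87, 139818, 1),
    (10018, 0x5D1776E9, 381303, 0), (10022, 0x43F9EFC7, 278430, 1),
    (10016, 0x69A63A7A, 432739, 0), (10012, 0x82BFC19C, 535548, 1),
    (10014, 0x7630FE0B, 484111, 0), (10010, 0x8F4E852D, 586984, 1),
    (10002, 0xC1859371, 792665, 0), (10008, 0x9BDD48BE, 638420, 1),
    (10021, 0xCA43518F, 828469, 0), (10006, 0xA86C0C4F, 689856, 1),
    (10019, 0xD6D21520, 879905, 0), (10004, 0xB4F6CFE0, 741228, 1),
    (10017, 0xE35CD8B1, 931277, 0), (10015, 0xEFEB9C42, 982713, 0),
    (10013, 0xFC7A5FD3, 1034149, 0),
]

def bucket_of(row_hash: int) -> int:
    """Top 20 bits of the 32-bit Row Hash."""
    return row_hash >> 12

print(all(bucket_of(h) == b for _, h, b, _ in ROWS))  # True for every row in the listing
print(len({b for _, _, b, _ in ROWS}))                # 21 distinct buckets

rows_per_amp = Counter(amp for _, _, _, amp in ROWS)
average = len(ROWS) / len(rows_per_amp)
# Assumed skew convention: how far the busiest AMP is above the average.
skew_pct = (1 - average / max(rows_per_amp.values())) * 100

print(dict(rows_per_amp))   # {0: 12, 1: 9}
print(round(skew_pct, 1))   # 12.5
```

Even with a unique PI, a small table can show noticeable skew simply because 21 rows cannot split evenly across buckets mapped to two AMPs.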
Handling a multi-column index: The hashing algorithm uses multiplication and addition as commutative operators for handling a multi-column index. A numeric value of 0 hashes the same as a NULL. A character data type with a value of all spaces also hashes the same as a NULL. A character value of 0 hashes to a value different than the hash of a NULL. Upper and lower case characters hash the same.
The Hash Maps are stored in GDO (Globally Distributed Object) files on each SMP and are loaded into the PDE memory space when the PDE software is started, usually as part of the UNIX startup process.

ROW RETRIEVAL VIA PI:
EXPLAIN SELECT * FROM employee WHERE empno='10013';

Explanation
-----------
1) First, we do a single-AMP RETRIEVE step from PERSONNEL.employee by way of the unique primary index "PERSONNEL.employee.EmpNo = 10013" with no residual conditions. The estimated time for this step is 0.01 seconds.
-> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.01 seconds.

Now, how does this work internally?
The SELECT statement retrieves the row or rows where the PI is equal to a particular column value (here, EmpNo = 10013). The parser (PE) always puts out a three-part message composed of the Table ID, Row Hash and Primary Index value. The 48-bit Table ID is looked up in the Data Dictionary, the 32-bit Row Hash value is generated by the Hashing Algorithm, and the Primary Index value (10013) comes from the SQL request.
The Communications Layer Interface uses the DSW (Destination Selection Word, the first 16 bits of the Row Hash) to determine which AMP to interrupt and pass the message to. (The bucket numbers in the examples above are the first 20 bits of the Row Hash, which indicates a system with 20-bit hash buckets.) The AMP uses the Table ID and Row Hash to identify and locate the proper data block, then uses the Row Hash and PI value to locate the specific row(s). The PI value is required to distinguish between Hash Synonyms.

DBC.Next is a Data Dictionary table that consists of a single row with 7 columns:

Column           Value / Data Type
RowNum           1, CHAR(1)
DatabaseID       numeric, 4-byte fixed
TableID          numeric, 4-byte fixed
ProcsRowLock     numeric, 4-byte fixed
EventNum         numeric, 4-byte fixed
LogonSequenceNo  numeric, 4-byte fixed
TempTableID      numeric, 4-byte fixed
Names and Object IDs

The Table ID is a 48-bit number supplied by the parser. It has two major components: a 32-bit main-table ID and a 16-bit subtable ID. The Table ID plus the Row ID makes every row in the system unique.

The Row Hash alone is not sufficient to identify a specific row in a table. Because it is based on a Primary Index value, multiple rows can have the same Row Hash, due either to Hash Synonyms or to NUPI duplicates.
The Row ID makes every row within a table uniquely identifiable. It consists of the Row Hash plus a Uniqueness Value. The Uniqueness Value is a 32-bit numeric value, designed to identify specific rows within a single Row Hash value.
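A minimal Python sketch of how a uniqueness value turns a shared Row Hash into distinct Row IDs. Assumptions, not taken from these notes: uniqueness values start at 1 and each new row with an existing Row Hash takes the next value; `assign_row_ids` is a hypothetical helper, and the input hashes are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def assign_row_ids(row_hashes: List[int]) -> List[Tuple[int, int]]:
    """Pair each Row Hash with a uniqueness value so that every Row ID differs.
    Rows sharing a Row Hash (hash synonyms or NUPI duplicates) get 1, 2, 3, ...
    (starting at 1 is an assumption of this sketch)."""
    last_uniq: Dict[int, int] = defaultdict(int)  # highest uniqueness value used per hash
    row_ids = []
    for row_hash in row_hashes:
        last_uniq[row_hash] += 1
        row_ids.append((row_hash, last_uniq[row_hash]))
    return row_ids

# Hypothetical inserts: the third row repeats the first row's hash (e.g. a NUPI duplicate).
ids = assign_row_ids([0x3B4031A9, 0x09092365, 0x3B4031A9])
print([(f"{h:08X}", u) for h, u in ids])
# [('3B4031A9', 1), ('09092365', 1), ('3B4031A9', 2)]
```

This is also why a PI retrieval must compare the PI value itself: the Row Hash narrows the search to one hash value, and the PI value then tells synonyms apart.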
The PE sends the request (Table ID + Row Hash + PI value) to an AMP via the Message Passing Layer (PDE and BYNET). The AMP then searches: Master Index -> Cylinder Index -> Data Block.
Default primary index rules (when no PRIMARY INDEX clause is given):
- If a PK is specified, the PK becomes the UPI.
- Else, the first column specified as UNIQUE becomes the UPI.
- Else, the first column specified becomes a NUPI.
A PK or UNIQUE column that is not chosen as the PI is implemented as a unique index (UI below).

create table table_1 (col1 integer not null primary key, col2 integer);
   UPI: col1
create table table_2 (col1 integer not null primary key, col2 integer) unique primary index (col2);
   UPI: col2; UI: col1
create table table_3 (col1 integer not null UNIQUE, col2 integer);
   UPI: col1
create table table_4 (col1 integer not null UNIQUE, col2 integer not null primary key);
   UPI: col2; UI: col1
create table table_5 (col1 integer, col2 integer not null);
   NUPI: col1
create table table_6 (col1 integer not null, col2 integer not null unique) primary index (col1);
   NUPI: col1; UI: col2

TABLE SIZE: SQL to view data distribution:

SELECT Vproc
      ,CAST (TableName AS CHAR(20))
      ,CurrentPerm
      ,PeakPerm
FROM DBC.TableSize
WHERE DatabaseName = 'PERSONNEL'
ORDER BY TableName, Vproc;
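The default primary index rules listed before the table-size query can be written as a small decision function. This Python sketch models only those rules as written in these notes, for single-column constraints as in the table_1 to table_6 examples; `Column` and `default_primary_index` are hypothetical names, and composite keys or release-specific defaults are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Column:
    name: str
    primary_key: bool = False  # column-level PRIMARY KEY
    unique: bool = False       # column-level UNIQUE

def default_primary_index(columns: List[Column],
                          explicit_pi: Optional[Tuple[str, bool]] = None) -> Tuple[str, str]:
    """Return (column, 'UPI' or 'NUPI') following the rules in these notes.
    explicit_pi = (column, is_unique) when CREATE TABLE names a PI itself."""
    if explicit_pi is not None:
        name, is_unique = explicit_pi
        return name, "UPI" if is_unique else "NUPI"
    for col in columns:              # rule 1: the PK becomes the UPI
        if col.primary_key:
            return col.name, "UPI"
    for col in columns:              # rule 2: else the first UNIQUE column
        if col.unique:
            return col.name, "UPI"
    return columns[0].name, "NUPI"   # rule 3: else the first column, non-unique

table_4 = [Column("col1", unique=True), Column("col2", primary_key=True)]
table_5 = [Column("col1"), Column("col2")]
print(default_primary_index(table_4))  # ('col2', 'UPI')
print(default_primary_index(table_5))  # ('col1', 'NUPI')
```

Note that the PK check runs before the UNIQUE check over all columns, which is why table_4 gets col2 as its UPI even though col1 appears first.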
Column Distribution Demographics for a PI Candidate

Column demographics for a PI candidate can be measured in four ways: Distinct Values, Maximum Rows per Value, Maximum Rows NULL and Typical Rows per Value.
# of Distinct Values for a column:
SELECT COUNT(DISTINCT(column_name)) FROM tablename;
Max Rows per Value for all values in a column:

SELECT column_name, COUNT(*) FROM tablename GROUP BY 1 ORDER BY 2 DESC;
Max Rows per Value for 5 most frequent values:

SELECT t_colvalue, t_count FROM (SELECT column_name, COUNT(*) FROM tablename GROUP BY 1) t_table (t_colvalue, t_count) QUALIFY RANK (t_count) <= 5;
Average Rows per Value for a column (mean value):

SELECT COUNT(*) / COUNT(DISTINCT(col_name)) FROM tablename;
Typical Rows per Value for a column (median value):

SELECT t_count AS "Typical Rows per Value"
FROM (SELECT col_name, COUNT(*) FROM tablename GROUP BY 1) t_table (t_colvalue, t_count),
     (SELECT COUNT(DISTINCT(col_name)) FROM tablename) t_table2 (t_distinct_count)
QUALIFY ROW_NUMBER() OVER (ORDER BY t_count) = (t_distinct_count + 1) / 2;
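The same demographics can be prototyped in Python before running the SQL against a large table. This sketch is an illustration with hypothetical sample values (None stands for NULL); it uses true division for the average, whereas the SQL version with integer counts truncates, and it takes the lower median for Typical Rows per Value.

```python
from collections import Counter
from statistics import median_low
from typing import Dict, Hashable, List, Optional

def pi_demographics(values: List[Optional[Hashable]]) -> Dict[str, object]:
    """Demographics for a PI candidate column; None stands for NULL."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)  # rows per distinct value
    if not counts:
        return {"distinct_values": 0, "max_rows_null": len(values)}
    return {
        "distinct_values": len(counts),                       # COUNT(DISTINCT col)
        "max_rows_per_value": max(counts.values()),           # top of GROUP BY ... ORDER BY 2 DESC
        "top_5_values": counts.most_common(5),                # the 5 most frequent values
        "max_rows_null": len(values) - len(non_null),         # rows where col IS NULL
        "average_rows_per_value": len(values) / len(counts),  # COUNT(*) / COUNT(DISTINCT col)
        "typical_rows_per_value": median_low(list(counts.values())),  # median rows per value
    }

sample = ["A", "A", "A", "B", "B", "C", None, None]  # hypothetical column values
print(pi_demographics(sample))
```

A good PI candidate shows many distinct values, a maximum rows per value close to the typical value, and few NULLs, since all NULLs hash to the same AMP.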
PPI

What is a Partitioned Primary Index (PPI)?
A new indexing mechanism in Teradata V2R5.0 for use in physical database design. Data rows can be grouped into partitions at the AMP level. Partitioning adds a third level to Teradata's data distribution and access capabilities.

What advantages does a PPI provide?
It increases the available options for improving the performance of certain types of queries. Only the rows of the qualified partitions in a query need to be accessed, which avoids full table scans.

How is a PPI created and managed?
A PPI is easy to create and manage. The CREATE TABLE statement contains new options to create partitions, and the ALTER TABLE statement contains new options to alter partitioning. As always, data is distributed among AMPs and automatically placed within partitions.
Partitioned Primary Indexes (PPI)
A new term is associated with PPI tables: the Row Key, which uniquely identifies each row in a table defined with a PPI. The Row Key is a combination of the Partition #, Row Hash, and Uniqueness ID. (Observe this in the EXPLAIN plan.)
If a table is defined with a PPI, its rows are partitioned into up to 65,535 partitions (on each AMP).

V2R5 with PPI: Primary Index Access
PPI considerations: only the main table can be partitioned. This excludes join indexes, hash indexes, global temporary tables (GTT), volatile tables and secondary indexes. However, it merely means that the PARTITION BY clause is not available on a CREATE JOIN INDEX or CREATE HASH INDEX statement. A PPI table can have a partitioned index.
Why Define a PPI?

Increase query efficiency by avoiding full table scans without the overhead and maintenance costs of secondary indexes (for example, NUSI subtable permanent space and maintenance overhead). The potential PI access disadvantage occurs only when the partitioning column is not part of the PI.

Advantages/Disadvantages of PPI
Advantages:
- Queries that specify a restrictive condition on the partitioning column avoid full table scans.
- Range queries can be executed on tables without secondary indexes.
- The more partitions there are, the greater the potential benefit.

Potential disadvantages:
- A PI access disadvantage occurs only when the partitioning column is not part of the PI. A query specifying a PI value, but no value for the partitioning column, must look in each partition for that value.
- A direct join disadvantage occurs when joining a PPI table to an NPPI table with the same PI (join disadvantages are discussed in another module). If one of the tables is partitioned, the rows won't be ordered the same, and the task, in effect, becomes a set of sub-joins, one for each partition of the PPI table. The disadvantage is proportional to the number of partitions: fewer partitions are better than more.
How to Define a PPI

CREATE TABLE ... PARTITION BY <partitioning-expression>

Two functions, CASE_N and RANGE_N, are provided to simplify the creation of common partitioning schemes.

Simple Partitioning Example 1
CREATE TABLE simp_partition_ex1
  (Employee_Number INTEGER NOT NULL
  ,Location_Number INTEGER
  ,Dept_Number INTEGER
  ,Emp_Mgr_Number INTEGER
  ,Job_Code INTEGER
  ,Last_Name CHAR(20)
  ,First_Name VARCHAR(20)
  ,Salary_Amount DECIMAL(10,2))
PRIMARY INDEX (Employee_Number)
PARTITION BY Location_Number;

*** Table has been created.
Partitioning directly on a column like this works when Location_Number is defined as an integer with values between 1 and 65,535. Let's test that partition limit:
INSERT INTO simp_partition_ex1 VALUES (100, 65535, 10, 120, 12, 'MATANGI', 'RAVI CHAND', 12000);
*** Insert completed. One row added.

INSERT INTO simp_partition_ex1 VALUES (100, 65536, 10, 120, 12, 'MATANGI', 'RAVI CHAND', 12000);
*** Failure 5728 Partitioning violation for table personnel.simp_partition_ex1.

Likewise, an error is returned if Location_Number <= 0.
Empty partitions use no disk space.

EXPLAIN SELECT * FROM simp_partition_ex1 WHERE location_number = 65535;

Explanation
-----------
1) First, we lock a distinct PERSONNEL."pseudo table" for read on a RowHash to prevent global deadlock for PERSONNEL.simp_partition_ex1.
2) Next, we lock PERSONNEL.simp_partition_ex1 for read.
3) We do an all-AMPs RETRIEVE step from a single partition of PERSONNEL.simp_partition_ex1 with a condition of ("PERSONNEL.simp_partition_ex1.Location_Number = 65535") with a residual condition of ("PERSONNEL.simp_partition_ex1.Location_Number = 65535") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row (128 bytes). The estimated time for this step is 0.03 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.

After collecting statistics:

COLLECT STATISTICS ON personnel.simp_partition_ex1 COLUMN location_number;
*** Update completed. One row changed.

step 3 becomes:

3) We do an all-AMPs RETRIEVE step from a single partition of PERSONNEL.simp_partition_ex1 with a condition of ("PERSONNEL.simp_partition_ex1.Location_Number = 65535") with a residual condition of ("PERSONNEL.simp_partition_ex1.Location_Number = 65535") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 2 rows (256 bytes). The estimated time for this step is 0.03 seconds.

UPI access is faster than PPI access.
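Partitioning with CASE_N and RANGE_N
- CASE_N: evaluates a list of conditions and returns the number of the first condition that is true.
- RANGE_N: maps an expression into one of a list of specified ranges and returns the number of that range.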
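Special partitions: NO CASE, NO RANGE, and UNKNOWN. NO CASE (for CASE_N) and NO RANGE (for RANGE_N) catch rows that match none of the conditions or ranges; UNKNOWN catches rows whose partitioning expression evaluates to unknown (for example, NULL).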

CREATE TABLE Sales
  ( store_id INTEGER NOT NULL,
    item_id INTEGER NOT NULL,
    sales_date DATE FORMAT 'YYYY-MM-DD',
    total_revenue DECIMAL(9,2),
    total_sold INTEGER,
    note VARCHAR(256))
UNIQUE PRIMARY INDEX (store_id, item_id, sales_date)
PARTITION BY RANGE_N (sales_date BETWEEN DATE '2003-01-01'
                                 AND DATE '2003-12-31'
                                 EACH INTERVAL '1' DAY);
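To make the RANGE_N mapping for the Sales table concrete, here is a Python sketch of how a single BETWEEN ... EACH INTERVAL 'n' DAY range assigns partition numbers. It is an illustration, not Teradata's implementation: `range_n` is a hypothetical function, ranges are numbered from 1, and out-of-range or NULL dates are reported as the NO RANGE and UNKNOWN partitions (without such partitions, the insert is rejected with a partitioning violation, as in the earlier example).

```python
from datetime import date
from typing import Optional, Union

def range_n(d: Optional[date], start: date, end: date, each_days: int) -> Union[int, str]:
    """Partition number for
    RANGE_N(d BETWEEN DATE start AND DATE end EACH INTERVAL 'each_days' DAY).
    Ranges are numbered from 1; NULL and out-of-range dates are reported as
    the special UNKNOWN / NO RANGE partitions."""
    if d is None:
        return "UNKNOWN"
    if d < start or d > end:
        return "NO RANGE"
    return (d - start).days // each_days + 1

start, end = date(2003, 1, 1), date(2003, 12, 31)
print(range_n(date(2003, 1, 1), start, end, 1))    # 1: the first daily partition
print(range_n(date(2003, 12, 31), start, end, 1))  # 365: the last daily partition
print(range_n(date(2004, 1, 1), start, end, 1))    # NO RANGE
print(range_n(date(2003, 1, 8), start, end, 7))    # 2: second weekly partition
```

A query with WHERE sales_date BETWEEN two dates only needs the partitions whose numbers fall between range_n of those two dates, which is how partition elimination avoids a full table scan.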
CREATE SET TABLE PERSONNEL.Sales_History_ByWeek, FALLBACK,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     ( store_id INTEGER NOT NULL,
       item_id INTEGER NOT NULL,
       sales_date DATE FORMAT 'YYYY-MM-DD',
       total_revenue DECIMAL(9,2),
       total_sold INTEGER,
       note VARCHAR(256) CHARACTER SET LATIN NOT CASESPECIFIC)
UNIQUE PRIMARY INDEX ( store_id, item_id, sales_date )
PARTITION BY RANGE_N(sales_date BETWEEN DATE '2000-01-01' AND DATE '2000-12-31' EACH INTERVAL '7' DAY,
                                        DATE '2001-01-01' AND DATE '2001-12-31' EACH INTERVAL '7' DAY,
                                        DATE '2002-01-01' AND DATE '2002-12-31' EACH INTERVAL '7' DAY);
To remove partitioning:

ALTER TABLE <table_name> MODIFY PRIMARY INDEX NOT PARTITIONED;

alter table PERSONNEL.Sales_History_ByWeek modify primary index not partitioned;
*** Table has been modified.

The resulting table definition:

CREATE SET TABLE PERSONNEL.Sales_History_ByWeek, FALLBACK,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     ( store_id INTEGER NOT NULL,
       item_id INTEGER NOT NULL,
       sales_date DATE FORMAT 'YYYY-MM-DD',
       total_revenue DECIMAL(9,2),
       total_sold INTEGER,
       note VARCHAR(256) CHARACTER SET LATIN NOT CASESPECIFIC)
UNIQUE PRIMARY INDEX ( store_id, item_id, sales_date );
To set the name of the primary index:

ALTER TABLE Table_A MODIFY PRIMARY INDEX TabA_PI;

Secondary Index:
Secondary Indexes are generally defined to provide faster set selection. The Teradata RDBMS allows up to 32 SIs per table.
Secondary Index Subtables

In effect, a USI retrieval is like two PI retrievals, and it is typically a two-AMP operation:
1. Master Index -> Cylinder Index -> Index Block (USI subtable)
2. Master Index -> Cylinder Index -> Data Block (base table)

NUSIs behave differently: NUSIs are implemented on an AMP-local basis.
Dual NUSI Access

Each column is a separate NUSI:
CREATE INDEX (department_number) ON Employee; CREATE INDEX (job_code) ON Employee;
AND with Equality Conditions:

SELECT last_name, first_name
FROM Employee
WHERE department_number = 500 AND job_code = 2147;
Optimizer options with AND:
- Use one of the two indexes if it is strongly selective.
- If the two indexes together are strongly selective, optionally do a bit-map intersection.
- If both indexes are weakly selective, separately and together, do a full table scan (FTS).
NUSI Bit Mapping is a process that determines common Row IDs between multiple NUSI values by intersection:
- It is faster than copying, sorting, and comparing the Row ID lists.
- It dramatically reduces the number of base table I/Os.
- All NUSI conditions must be linked by the AND operator.
- It requires at least 2 NUSI equality conditions.
- The Optimizer is much more likely to consider bit mapping if you COLLECT STATISTICS.
- Use EXPLAIN to see if bit mapping is being used.
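The intersection behind NUSI bit mapping can be sketched in Python with integers used as bitmaps. This is a simplified, single-AMP illustration: the real bit-map layout is internal to Teradata, the helper names are hypothetical, and the Row IDs below are made up for the department_number = 500 AND job_code = 2147 example.

```python
from typing import Dict, List, Tuple

RowId = Tuple[int, int]  # (row hash, uniqueness value)

def to_bitmap(row_ids: List[RowId], position: Dict[RowId, int]) -> int:
    """Encode a Row ID list as an integer bitmap (one bit per Row ID)."""
    bits = 0
    for rid in row_ids:
        bits |= 1 << position[rid]
    return bits

def nusi_bitmap_and(*row_id_lists: List[RowId]) -> List[RowId]:
    """Intersect the Row ID lists of several ANDed NUSI equality conditions."""
    universe = sorted(set().union(*row_id_lists))
    position = {rid: i for i, rid in enumerate(universe)}
    result = ~0  # start with every bit set
    for row_ids in row_id_lists:
        result &= to_bitmap(row_ids, position)
    return [rid for rid in universe if (result >> position[rid]) & 1]

dept_500 = [(0x1A, 1), (0x2B, 1), (0x3C, 1)]  # hypothetical Row IDs: department_number = 500
job_2147 = [(0x2B, 1), (0x3C, 1), (0x4D, 1)]  # hypothetical Row IDs: job_code = 2147
print(nusi_bitmap_and(dept_500, job_2147))    # [(43, 1), (60, 1)]
```

Only the Row IDs that survive the AND are used to read base-table rows, which is where the reduction in base table I/O comes from.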
Value-Ordered NUSIs
A Value-Ordered NUSI is limited to a single-column numeric (4-byte) value. Some benefits of using value-ordered NUSIs:
- Index subtable rows are sorted (sequenced) by data value rather than by hash value.
- The Optimizer can search only a portion of the index subtable for a given range of values.
- They can provide major performance advantages for range queries.
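Why value ordering helps range queries can be shown with a Python sketch: keep index entries sorted by data value and use binary search to touch only the matching slice. This is an analogy, not Teradata's subtable format; `ValueOrderedIndex` and the sample Row IDs and dates are hypothetical.

```python
import bisect
from datetime import date
from typing import List, Tuple

RowId = Tuple[int, int]  # (row hash, uniqueness value)

class ValueOrderedIndex:
    """Index entries kept in data-value order, like a value-ordered NUSI subtable."""

    def __init__(self, rows: List[Tuple[RowId, date]]):
        entries = sorted((value, rid) for rid, value in rows)
        self.values = [value for value, _ in entries]
        self.row_ids = [rid for _, rid in entries]

    def between(self, low: date, high: date) -> List[RowId]:
        """Row IDs for low <= value <= high, touching only the matching slice."""
        lo = bisect.bisect_left(self.values, low)
        hi = bisect.bisect_right(self.values, high)
        return self.row_ids[lo:hi]

idx = ValueOrderedIndex([
    ((0x10, 1), date(2003, 1, 3)),
    ((0x20, 1), date(2003, 1, 1)),
    ((0x30, 1), date(2003, 1, 2)),
    ((0x40, 1), date(2003, 1, 5)),
])
print(idx.between(date(2003, 1, 2), date(2003, 1, 3)))  # [(48, 1), (16, 1)]
```

With a hash-ordered NUSI, the same range condition would have to examine every index entry, because nearby dates land in unrelated hash positions.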
Example of creating a Value-ordered NUSI by using the ORDER BY VALUES option.

CREATE INDEX (sales_date) ORDER BY VALUES (sales_date) ON Daily_Sales;
A secondary index is limited to 16 columns before V2R5, or 64 columns with V2R5.
Covering Indexes
Covering will be considered for any table in the query that references only columns defined in a given NUSI. These columns can be specified anywhere in the query, including:
- the SELECT list
- the WHERE clause
- aggregate functions
- GROUP BY expressions
The presence of a WHERE condition on each indexed column is not a prerequisite for using the index to cover the query. A table may have up to 32 secondary indexes. The CREATE INDEX statement has a special ALL option, which is required if these columns will potentially be used for covering.
CREATE JOIN INDEX OrdCustIdx AS
SELECT (c_custkey, c_name), (o_status, o_date, o_comment)
FROM Orders LEFT JOIN Customer ON o_custkey = c_custkey
ORDER BY c_custkey
PRIMARY INDEX (c_name);

CREATE INDEX idx_name_stat ALL (c_name, o_status) ON OrdCustIdx;
NUSI vs. Full Table Scan (FTS)

The Optimizer chooses a full table scan instead of the NUSI when:
- Rows per value is greater than data blocks per AMP.
- There are no COLLECTed STATISTICS on the NUSI.
- The index is too weakly selective. The Optimizer determines this by using COLLECTed STATISTICS.
Table Level Attributes

CREATE MULTISET TABLE Table_1, FALLBACK,
     DATABLOCKSIZE = 16384 BYTES,
     FREESPACE = 10 PERCENT,
     CHECKSUM = NONE
     (column1 INTEGER,
      column2 CHAR(5));

- SET: don't allow duplicate rows.
- MULTISET: allow duplicate rows (ANSI).
- DATABLOCKSIZE: maximum multi-row block size for the table, given as:
  - BYTES: rounded to the nearest sector (512 bytes)
  - KILOBYTES (or KBYTES): increments of 1024
  - MINIMUM DATABLOCKSIZE (7168)
  - MAXIMUM DATABLOCKSIZE (130,560)
  - IMMEDIATE: may be used to immediately re-block the data
- FREESPACE: percent of free space to keep on each cylinder during load operations (0 to 75%).
- CHECKSUM = DEFAULT | NONE | LOW | MEDIUM | HIGH | ALL: disk I/O integrity check (a V2R5.1 feature).
Column Level Constraints

- PRIMARY KEY
- UNIQUE
- CHECK
- REFERENCES
Identity Column Overview

Identity Columns may be used to:
- Guarantee row uniqueness in a table
- Guarantee even row distribution for a table
- Optimize and simplify the initial port from other databases that use generated keys
DBC generated unique primary index
Implicit uniqueness is guaranteed only for GENERATED ALWAYS + NO CYCLE Identity Columns. The valid range of values is 1 to 1000000.
Unresolved Reference Constraints
Unresolved reference constraints arise when:
- A table with a Foreign Key is created before the table with the Parent Key.
- A table with a Foreign Key is restored and the Parent Key table does not exist or hasn't been restored.
To drop inconsistent references:

ALTER TABLE <child_tablename> DROP INCONSISTENT REFERENCES;

Foreign keys defined WITH NO CHECK OPTION (not enforced by the system) let each table reference the other:

CREATE SET TABLE PD.Employee, FALLBACK
  ( Employee_Number INTEGER NOT NULL,
    Dept_Number INTEGER,
    Emp_Mgr_Number INTEGER,
    Job_Code INTEGER,
    Last_Name CHAR(20),
    First_Name VARCHAR(20),
    Salary_Amount DECIMAL(10,2),
    FOREIGN KEY ( Dept_Number ) REFERENCES WITH NO CHECK OPTION PD.Department ( Dept_Number ))
UNIQUE PRIMARY INDEX ( Employee_Number );

CREATE SET TABLE PD.Department, FALLBACK
  ( Dept_Number INTEGER NOT NULL,
    Dept_Name CHAR(20) NOT NULL,
    Dept_Mgr_Number INTEGER,
    Budget_Amount DECIMAL(10,2),
    FOREIGN KEY ( Dept_Mgr_Number ) REFERENCES WITH NO CHECK OPTION PD.Employee ( Employee_Number ))
UNIQUE PRIMARY INDEX ( Dept_Number )
UNIQUE INDEX ( Dept_Name );
EXPLAIN PLAN
CREATE TABLE daily_sales
    ( item_id INTEGER NOT NULL,
      sales_date DATE FORMAT 'yyyy-mm-dd',
      sales DECIMAL(9,2) )
PRIMARY INDEX (item_id);
Queries:
EXPLAIN SELECT * FROM daily_sales WHERE item_id = 5010;
Result: a single-AMP RETRIEVE, because the query supplies a value for the PI column. The estimate is reported with high confidence if statistics are available.
EXPLAIN SELECT * FROM daily_sales ORDER BY 1;
Result: an all-AMPs RETRIEVE by way of an all-rows scan, with a row estimate (high confidence, xxxxx rows) and an estimated total time.
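The difference between the two plans comes from how rows are placed: each row goes to an AMP chosen by hashing its Primary Index value. This Python sketch uses `zlib.crc32` and a modulo over a hypothetical four-AMP system as a stand-in for Teradata's real row-hash and hash-map mechanism, which it does not reproduce. It only shows why an equality condition on the PI reaches one AMP while any other retrieval has every AMP scan its rows. Function names and data are hypothetical.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size


def amp_for(pi_value) -> int:
    """Map a PI value to an AMP. crc32 is a stand-in, not Teradata's row hash."""
    return zlib.crc32(str(pi_value).encode()) % NUM_AMPS


def distribute(rows, pi_column):
    """Place each row on the AMP chosen by hashing its PI value."""
    amps = {amp: [] for amp in range(NUM_AMPS)}
    for row in rows:
        amps[amp_for(row[pi_column])].append(row)
    return amps


def retrieve(amps, pi_column, column, value):
    """Return (AMPs touched, matching rows) for WHERE column = value."""
    if column == pi_column:
        # Equality on the PI: hash the value and read a single AMP.
        target = amp_for(value)
        return {target}, [r for r in amps[target] if r[column] == value]
    # No PI value supplied: every AMP scans all of its rows.
    hits = [r for rows in amps.values() for r in rows if r[column] == value]
    return set(amps), hits


# Hypothetical daily_sales rows keyed by item_id.
rows = [{"item_id": i, "sales_date": "2024-01-0%d" % (i % 9 + 1)} for i in range(5000, 5100)]
amps = distribute(rows, "item_id")
print(retrieve(amps, "item_id", "item_id", 5010)[0])             # a single AMP
print(retrieve(amps, "item_id", "sales_date", "2024-01-01")[0])  # all AMPs
```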
EXPLAIN Terminology
Most EXPLAIN text is easy to understand. The following additional definitions may help:
... (Last Use)

A spool file is no longer needed and will be released when this step completes.
... with no residual conditions

All applicable conditions have been applied to the rows.
... END TRANSACTION

Transaction locks are released, and changes are committed.
... eliminating duplicate rows ...

Duplicate rows can exist only in spool files, not in SET tables; this step performs a DISTINCT operation.
... by way of the sort key in spool field1

Field1 is created to allow a tag sort.
... we do an ABORT test

Caused by an ABORT or ROLLBACK statement.
... by way of a traversal of index #n extracting row ids only

A spool file is built containing the Row IDs found in a secondary index (index #n)
... we do a SMS (set manipulation step)

Combining rows using a UNION, MINUS, or INTERSECT operator.
... we do a BMSMS (bit map set manipulation step)

Doing a NUSI Bit Map operation.
... which is redistributed by hash code to all AMPs.

Redistributing data in preparation for a join.
... which is duplicated on all AMPs.

Duplicating data from the smaller table (in terms of SPOOL) in preparation for a join.
... (group_AMPs)
A V2R5 feature: a subset of AMPs is used instead of all AMPs.
... ("NOT (table_name.column_name IS NULL)")

A V2R5 feature: the optimizer recognizes that the column being joined to is NOT NULL or is covered by referential integrity.
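When reading a long EXPLAIN, it can help to flag the phrases defined above automatically. This Python sketch (hypothetical names) does a case-insensitive substring search for each glossary phrase and returns the matches in order of first appearance. It is a reading aid, not an EXPLAIN parser, and it reports only the first occurrence of each phrase.

```python
# Glossary phrases and short meanings, taken from the terminology list above.
EXPLAIN_GLOSSARY = {
    "(Last Use)": "Spool file is released when this step completes.",
    "with no residual conditions": "All applicable conditions have been applied to the rows.",
    "END TRANSACTION": "Transaction locks are released and changes are committed.",
    "eliminating duplicate rows": "Duplicate rows removed from spool (DISTINCT).",
    "by way of the sort key in spool field1": "Field1 is created to allow a tag sort.",
    "we do an ABORT test": "Caused by an ABORT or ROLLBACK statement.",
    "extracting row ids only": "Spool holds Row IDs taken from a secondary index.",
    "we do a SMS": "Set manipulation step: UNION, MINUS or INTERSECT.",
    "we do a BMSMS": "Bit map set manipulation step (NUSI bit map).",
    "redistributed by hash code to all AMPs": "Rows redistributed in preparation for a join.",
    "duplicated on all AMPs": "Smaller spool copied to every AMP for a join.",
    "(group_AMPs)": "Only a subset of AMPs is used.",
}


def annotate(explain_text: str):
    """Return (phrase, meaning) pairs for glossary phrases, in order of appearance."""
    lowered = explain_text.lower()
    found = []
    for phrase, meaning in EXPLAIN_GLOSSARY.items():
        position = lowered.find(phrase.lower())  # first occurrence only
        if position != -1:
            found.append((position, phrase, meaning))
    return [(phrase, meaning) for _, phrase, meaning in sorted(found)]
```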