Вы находитесь на странице: 1из 10

Normalization & Denormalization 1

Normalization is the process of efficiently organizing data in a database. There are two goals of
the normalization process: eliminate redundant data (for example, storing the same data in more
than one table) and ensure data dependencies make sense (only storing related data in a
table). Both of these are worthy goals as they reduce the amount of space a database consumes
and ensure that data is logically stored.

First normal form (1NF) sets the very basic rules for an organized database:
1. Eliminate duplicative columns from the same table.
2. Create separate tables for each group of related data and identify each row with a unique
column or set of columns (the primary key).

Second normal form (2NF) further addresses the concept of removing duplicative data:
1 Meet all the requirements of the first normal form.
2 Remove subsets of data that apply to multiple rows of a table and place them in separate
tables.
3 Create relationships between these new tables and their predecessors through the use of
foreign keys.

Third normal form (3NF) goes one large step further:


1 Meet all the requirements of the second normal form.
2 Remove columns that are not dependent upon the primary key.

Finally, fourth normal form (4NF) has one additional requirement:


1 Meet all the requirements of the third normal form.
2 A relation is in 4NF if it has no multi-valued Dependencies

Q A clustered table will store its rows physically on disk in order by a specified column or columns
The difference is that, Clustered index is unique for any given table and we can have only one
clustered index on a table. The leaf level of a clustered index is the actual data and the data is
resorted in case of clustered index.Whereas in case of non-clustered index the leaf level is
actually a pointer to the data in rows so we can have as many non-clustered indexes as we can
on the db. The clustering index forces table rows to be stored in ascending order by the indexed
columns. There can be only one clustering sequence per table (because physically the data can
be stored in only one sequence).

Use clustered indexes for the following situations:

• Join columns, to optimize SQL joins where multiple rows match for one or both
tables participating in the join
• Foreign key columns because they are frequently involved in joins and the DBMS
accesses foreign key values during declarative referential integrity checking
• Predicates in a WHERE clause
• Range columns
• Columns that do not change often (reduces physically reclustering)
• Columns that are frequently grouped or sorted in SQL statements

Q Triggers are basically used to implement business rules.Triggers is also similar to stored
procedures. The difference is that it can be activated when data is added or edited or deleted
from a table in a database.
Normalization & Denormalization 2

Q Delete command removes the rows from a table based on the condition that we provide with
a WHERE clause. Truncate will actually remove all the rows from a table and there will be
no data in the table after we run the truncate command.
Delete Command require Log file updation for each row of deleting process. But the Truncate
command not.So, the Truncate Command is so faster than Delete Command.

Q A transaction is a sequence of sql Operations(commands), work as single atomic unit of


work. To be qualify as "Transaction" , this sequence of operations must satisfy 4 properties ,
which is knwon as ACID test.
A(Atomicity):-The sequence of operations must be atomic, either all or no operations are
performed.
C(Consistency):- When completed, the sequence of operations must leave data in consistent
mode. All the defined relations/constraints must me Maintained.
I(Isolation): A Transaction must be isolated from all other transactions. A transaction sees the
data before the operations are performed , or after all the operations has performed, it can't see
the data in between.
D(Durability): All operations must be permanently placed on the system. Even in the event of
system failure , all the operations must be exhibit.
Q data integrity
A constraint is a property assigned to a column or the set of columns in a table that prevents
certain types of inconsistent data values from being placed in the column(s). Constraints are used
to enforce the data integrity. This ensures the accuracy and reliability of the data in
the database. The following categories of the data integrity exist:

Data integrity (DI) is an important feature in SQL Server. When used properly, it ensures that
data is accurate, correct, and valid. It also acts as a trap for otherwise undetectable bugs within
your applications. However, DI remains one of the most neglected SQL Server features.

1 Entity Integrity
2 Domain Integrity
3 Referential integrity
4 User-Defined Integrity

Entity Integrity ensures that there are no duplicate rows in a table.


Domain Integrity enforces valid entries for a given column by restricting the type, the format, or
the range of possible values.
Referential integrity ensures that rows cannot be deleted, which are used by other records (for
example, corresponding data values between tables will be vital).
User-Defined Integrity enforces some specific business rules that do not fall into entity, domain,
or referential integrity categories.

Each of these categories of the data integrity can be enforced by the appropriate
constraints. Microsoft SQL Server supports the following constraints:

PRIMARY KEY
UNIQUE
FOREIGN KEY
CHECK
NOT NULL

A PRIMARY KEY constraint is a unique identifier for a row within a database table. Every table
should have a primary key constraint to uniquely identify each row and only one primary key
constraint can be created for each table. The primary key constraints are used to enforce entity
integrity.
Normalization & Denormalization 3

A UNIQUE constraint enforces the uniqueness of the values in a set of columns, so no duplicate
values are entered. The unique key constraints are used to enforce entity integrity as the primary
key constraints.

A FOREIGN KEY constraint prevents any actions that would destroy link between tables with the
corresponding data values. A foreign key in one table points to a primary key in another table.
Foreign keys prevent actions that would leave rows with foreign key values when there are no
primary keys with that value. The foreign key constraints are used to enforce referential integrity.

A CHECK constraint is used to limit the values that can be placed in a column. The check
constraints are used to enforce domain integrity.

A NOT NULL constraint enforces that the column will not accept null values. The not null
constraints are used to enforce domain integrity, as the check constraints.

Denormalization
Denormalization is the art of introducing database design mechanisms that enhance
performance. Successful denormalization depends upon a solid database design normalized to at
least the third normal form. Only then can you begin a process called responsible
denormalization. The only reason to denormalize a database is to improve performance.
In this section, we will discuss techniques for denormalization and when they should be used.
The techniques are

• Creating redundant data


• Converting views to tables
• Using derived (summary) columns and tables
• Using contrived (identity) columns
• Partitioning data
o Subsets/vertical partitioning
o Horizontal partitioning
o Server partitioning (multiple servers)

Creating Redundant Data


Repeating a column from one table in another table can help avoid a join. For example, because
an invoice is for a customer, you could repeat the customer name in the invoice table so that
when it is time to print the invoice you do not have to join to the customer table to retrieve the
name. However, you also need a link to the customer table to get the billing address. Therefore, it
makes no sense to duplicate the customer name when you already need to join to the customer
table to get other data. A good example of redundant data is to carry the customer name in a
table that contains time-related customer activity; for example, annual or monthly summaries of
activity.

You can keep redundant data up-to-date with a trigger so that when the base table changes, the
tables carrying the redundant data also change.

Converting Views to Tables


This technique creates redundant data on a grand scale. The idea is that you create a table from
a view that joins multiple tables. You need complex triggers to keep this table synchronized with
the base tables — whenever any of the base tables change, the corresponding data in the new
table must also change.

An example of this technique is a vendor tracking system that contains several tables, as shown
in Figure 16.6, an entity-relationship diagram. A vendor can have multiple addresses, one of
which is marked “billing”; a vendor can have multiple contacts, one marked “primary”; and each
Normalization & Denormalization 4

contact can have many phone numbers, one marked “Main.”

The following Select statement creates a view that joins all the tables.

SELECT V.VendorID,
V.Name,
VA.Address1,
VA.Address2,
VA.City,
VA.State,
VA.PostalCode,
VA.Country,
VC.FirstName,
VC.LastName,
CP.Phone
FROM Vendor V,
VendorAddress VA,
VendorContact VC,
ContactPhone CP
WHERE V.VendorID = VA.VendorID
AND VA.AddressType = 'Billing'
AND V.VendorID = VC.VendorID
AND VC.ContactType = 'Primary'
AND VC.ContactID = CP.ContactID
AND CP.PhoneType = 'Main'
The above view is often too slow for quick lookup and retrieval. Converting that view to a table
with the same columns and datatypes and using a trigger to keep it up-to-date could give you
improved performance. More important, creating a table from a view gives developers a choice of
retrieving data from either the base tables or the redundant table.

The drawback to this technique is the performance penalty of keeping the table up-to-date. You
would use it only if the retrieval speed outweighed the cost of updating the table.

Using Derived (Summary) Columns and Tables


The two main types of derived columns are calculated and summary. Calculated columns are
usually extensions to the row containing the base field. For example, an invoice detail record has
QtyOrdered and ItemCost fields. To show the cost multiplied by the quantity, you could either
calculate it each time it is needed (for example, in screens, reports, or stored procedures) or you
could calculate the value each time you save the record and keep the value in the database.
However, in this particular example you would not use this form of denormalization because
calculated fields whose base data is in the same row don’t take much time to calculate, and
storing the calculation isn’t worth the extra storage cost.

Summary columns, on the other hand, are examples of denormalization that can have substantial
performance gains. Good examples include adding MonthToDate, QtrToDate, and YearToDate
columns in the Account table of a general ledger system. Each of these columns summarizes of
data in many different records. By adding these columns, you avoid the long calculation needed
for the summary and get the same data from one read instead. Of course, this option is not very
flexible, because if you need a total from a certain date range, you still have to go back to the
base data to summarize the detail records.

A poor example of using summary fields is carrying an invoice subtotal on the header record
when your orders average five lines — for example, in a high-priced, low-volume inventory
system, such as a major appliance store. It doesn’t take much time or effort for SQL Server to add
five rows. But what if you have a low-priced, high-turnover inventory, such as a grocery store, with
hundreds of lines per order? Then it would make more sense to carry an InvoiceSubtotal field in
Normalization & Denormalization 5

the header record.

Summarizing data by time period (day, month, year, etc.) or across product lines and saving the
summaries in a table or tables is a good strategy to shortcut the reporting process. This technique
is sometimes called a data warehouse. It can be expensive to keep up-to-date, because the data
collection process must also update the summary tables. One strategy to bypass the cost of the
data collection process is to replicate the transaction data to another database on another server
where the data warehouse is periodically updated.

Using Contrived (Identity) Columns


Sometimes called counter columns and identity columns, contrived columns are columns to which
the database assigns the next available number. Contrived columns are the primary keys in these
tables, but they are not necessarily the only unique indexes. They are usually a shortcut for wide,
multi-column keys. The contrived values are carried in other tables as foreign key values and link
to the original table.

Data Partitioning
You can partition data vertically or horizontally, depending on whether you want to split columns
or rows. You can also partition data across servers. We consider each type of partitioning below.

Subsets/Vertical Partitioning
Also called subsets, vertical partitions split columns that are not used often into new tables.
Subsets are zero- or one-to-one relationships, meaning that for each row in the main table, one
record (at most) exists in the subset table. The advantage to vertical partitioning is that the main
table has narrower records, which results in more rows per page. When you do need data from
the subset table, a simple join can retrieve the record.

Horizontal Partitioning
Horizontal partitioning, also known as row partitioning, is a more difficult type of partitioning to
program and manage. The table is split so that rows with certain values are kept in different
tables. For example, the data for each branch office could be kept in a separate table. A common
example in data warehousing — historical transactions — keeps each month of data in a
separate table.

The difficulty in horizontal partitioning comes in returning rows from each table as one result set.
You use a Union statement to create this result set, as demonstrated below.

SELECT * FROM JanuaryTable


UNION
SELECT * FROM MarchTable
SQL 6.5 has a major new feature that allows Unions in views. Most systems that use horizontal
partitioning build flexible SQL statements on the front-end, which do not take advantage of stored
procedures.

Server Partitioning (Using Multiple Servers)


An enhancement in version 6.5 gives you even more incentive to distribute your data and spread
the load among multiple servers. Now that you can return result sets from a remote procedure
and replicate to ODBC databases, programming and maintaining distributed databases is getting
easier. Mixing OLTP and data warehousing has rarely been successful. It is best to have your
data collection systems on one server and reporting database(s) on another.

The keys to designing an architecture involving multiple servers are a thorough knowledge of
replication and of returning results from remote procedures. For a complete discussion of
replication, see Chapter 11.
Normalization & Denormalization 6

Here is a simple example of returning a result set from a remote procedure and inserting it into a
table.

INSERT MyTable (Field1, Field2, Field3)


EXECUTE OtherServer.OtherDatabase.Owner.prProcedure @iParm1,
@sParm2
The remote procedure must return a result set with the same number of fields and the same data
types. In the example above, three fields must be returned. If they are not exactly the same data
types, they must be able to be implicitly converted by SQL Server. As you can see from the
example, you can pass parameters to the remote procedures. You can also insert the results into
an existing temporary table by placing the # symbol in front of the table name, as in #MyTable.

SUMMARY

Performance tuning can be a never-ending series of adjustments in which you alleviate one
bottleneck only to find another immediately. If your server is performing poorly, you need to
answer two basic questions: “Do I add more memory, disk space, or CPU?” and “Do I have
enough capacity to handle the anticipated growth of the company?” This book gives you enough
information to make informed decisions. Although more can be gained from tuning hardware than
ever before, the real gains come from tuning your queries and making good use of indexes, so
don’t stop reading here.
Introduction
Transaction Isolation Levels
Lock types
Locking optimizer hints
Deadlocks
View locks (sp_lock)
Literature

Introduction

In this article, I want to tell you about SQL Server 7.0/2000 Transaction Isolation Levels, what
kinds of Transaction Isolation Levels exist, and how you can set the appropriate Transaction
Isolation Level, about Lock types and Locking optimizer hints, about deadlocks, and about how
you can view locks by using the sp_lock stored procedure.

Transaction Isolation Levels

There are four isolation levels:

READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SERIALIZABLE

Microsoft SQL Server supports all of these Transaction Isolation Levels and can separate
REPEATABLE READ and SERIALIZABLE.

Let me to describe each isolation level.

READ UNCOMMITTED
Normalization & Denormalization 7

When it's used, SQL Server not issue shared locks while reading data. So, you can read an
uncommitted transaction that might get rolled back later. This isolation level is also called dirty
read. This is the lowest isolation level. It ensures only that a physically corrupt data will not be
read.

READ COMMITTED

This is the default isolation level in SQL Server. When it's used, SQL Server will use shared
locks while reading data. It ensures that a physically corrupt data will not be read and will never
read data that another application has changed and not yet committed, but it not ensures that the
data will not be changed before the end of the transaction.

REPEATABLE READ

When it's used, the dirty reads and nonrepeatable reads cannot occur. It means that locks will be
placed on all data that is used in a query, and another transactions cannot update the data.

This is the definition of nonrepeatable read from SQL Server Books Online:

nonrepeatable read
When a transaction reads the same row more than one time, and between the
two (or more) reads, a separate transaction modifies that row. Because the
row was modified between reads within the same transaction, each read
produces different values, which introduces inconsistency.

SERIALIZABLE

Most restrictive isolation level. When it's used, the phantom values cannot occur. It prevents other
users from updating or inserting rows into the data set until the transaction will be completed.

This is the definition of phantom from SQL Server Books Online:

phantom
Phantom behavior occurs when a transaction attempts to select a row that
does not exist and a second transaction inserts the row before the first
transaction finishes. If the row is inserted, the row appears as a phantom
to the first transaction, inconsistently appearing and disappearing.
You can set the appropriate isolation level for an entire SQL Server session by using the SET
TRANSACTION ISOLATION LEVEL statement.
This is the syntax from SQL Server Books Online:

SET TRANSACTION ISOLATION LEVEL


{
READ COMMITTED
| READ UNCOMMITTED
| REPEATABLE READ
| SERIALIZABLE
}
You can use the DBCC USEROPTIONS statement to determine the Transaction Isolation Level
currently set. This command returns the set options that are active for the current connection.
This is the example:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED


GO
Normalization & Denormalization 8

DBCC USEROPTIONS
GO

Lock types

There are three main types of locks that SQL Server 7.0/2000 uses:

Shared locks
Update locks
Exclusive locks

Shared locks are used for operations that do not change or update data, such as a SELECT
statement.

Update locks are used when SQL Server intends to modify a page, and later promotes the
update page lock to an exclusive page lock before actually making the changes.

Exclusive locks are used for the data modification operations, such as UPDATE, INSERT, or
DELETE.

Shared locks are compatible with other Shared locks or Update locks.

Update locks are compatible with Shared locks only.

Exclusive locks are not compatible with other lock types.

Let me to describe it on the real example. There are four processes, which attempt to lock the
same page of the same table. These processes start one after another, so Process1 is the first
process, Process2 is the second process and so on.

Process1 : SELECT
Process2 : SELECT
Process3 : UPDATE
Process4 : SELECT

Process1 sets the Shared lock on the page, because there are no another locks on this page.
Process2 sets the Shared lock on the page, because Shared locks are compatible with other
Shared locks.
Process3 wants to modify data and wants to set Exclusive lock, but it cannot make it before
Process1 and Process2 will be finished, because Exclusive lock is not compatible with other lock
types. So, Process3 sets Update lock.
Process4 cannot set Shared lock on the page before Process3 will be finished. So, there is no
Lock starvation. Lock starvation occurs when read transactions can monopolize a table or
page, forcing a write transaction to wait indefinitely. So, Process4 waits before Process3 will be
finished.
After Process1 and Process2 were finished, Process3 transfer Update lock into Exclusive lock to
modify data. After Process3 was finished, Process4 sets the Shared lock on the page to select
data.

Locking optimizer hints

SQL Server 7.0/2000 supports the following Locking optimizer hints:

NOLOCK
HOLDLOCK
Normalization & Denormalization 9

UPDLOCK
TABLOCK
PAGLOCK
TABLOCKX
READCOMMITTED
READUNCOMMITTED
REPEATABLEREAD
SERIALIZABLE
READPAST
ROWLOCK

NOLOCK is also known as "dirty reads". This option directs SQL Server not to issue shared locks
and not to honor exclusive locks. So, if this option is specified, it is possible to read an
uncommitted transaction. This results in higher concurrency and in lower consistency.

HOLDLOCK directs SQL Server to hold a shared lock until completion of the transaction in which
HOLDLOCK is used. You cannot use HOLDLOCK in a SELECT statement that includes the FOR
BROWSE option. HOLDLOCK is equivalent to SERIALIZABLE.

UPDLOCK instructs SQL Server to use update locks instead of shared locks while reading a
table and holds them until the end of the command or transaction.

TABLOCK takes a shared lock on the table that is held until the end of the command. If you also
specify HOLDLOCK, the lock is held until the end of the transaction.

PAGLOCK is used by default. Directs SQL Server to use shared page locks.

TABLOCKX takes an exclusive lock on the table that is held until the end of the command or
transaction.

READCOMMITTED
Perform a scan with the same locking semantics as a transaction running at the READ
COMMITTED isolation level. By default, SQL Server operates at this isolation level.

READUNCOMMITTED
Equivalent to NOLOCK.

REPEATABLEREAD
Perform a scan with the same locking semantics as a transaction running at the REPEATABLE
READ isolation level.

SERIALIZABLE
Perform a scan with the same locking semantics as a transaction running at the SERIALIZABLE
isolation level. Equivalent to HOLDLOCK.

READPAST
Skip locked rows. This option causes a transaction to skip over rows locked by other transactions
that would ordinarily appear in the result set, rather than block the transaction waiting for the
other transactions to release their locks on these rows. The READPAST lock hint applies only to
transactions operating at READ COMMITTED isolation and will read only past row-level locks.
Applies only to the SELECT statement.
You can only specify the READPAST lock in the READ COMMITTED or REPEATABLE READ
isolation levels.

ROWLOCK
Use row-level locks rather than use the coarser-grained page- and table-level locks.
Normalization & Denormalization 10

You can specify one of these locking options in a SELECT statement.


This is the example:

SELECT au_fname FROM pubs..authors (holdlock)

Deadlocks

Deadlock occurs when two users have locks on separate objects and each user wants a lock on
the other's object. For example, User1 has a lock on object "A" and wants a lock on object "B"
and User2 has a lock on object "B" and wants a lock on object "A". In this case, SQL Server ends
a deadlock by choosing the user, who will be a deadlock victim. After that, SQL Server rolls back
the breaking user's transaction, sends message number 1205 to notify the user's application
about breaking, and then allows the nonbreaking user's process to continue.

You can decide which connection will be the candidate for deadlock victim by using SET
DEADLOCK_PRIORITY. In other case, SQL Server selects the deadlock victim by choosing the
process that completes the circular chain of locks.

So, in a multiuser situation, your application should check the error 1205 to indicate that the
transaction was rolled back, and if it's so, restart the transaction.

Note. To reduce the chance of a deadlock, you should minimize the size of transactions and
transaction times.

View locks (sp_lock)

Sometimes you need a reference to information about locks. Microsoft recommends using the
sp_lock system stored procedure to report locks information. This very useful procedure returns
the information about SQL Server process ID, which lock the data, about locked database, about
locked table ID, about locked page and about type of locking (locktype column).

This is the example of using the sp_lock system stored procedure:

spid locktype table_id page dbname


------ ----------------------------------- ----------- ----------- ---------------
11 Sh_intent 688005482 0 master
11 Ex_extent 0 336 tempdb
The information, returned by sp_lock system stored procedure needs in some clarification,
because it's difficult to understand database name, object name and index name by their ID
numbers.

Check the link below if you need to get user name, host name, database name, index name
object name and object owner instead of their ID numbers:
Detailed locking view: sp_lock2