Summary
Prerequisites
Normalization
Types of Attributes
• Simple
– Each entity has a single atomic value for the attribute. For example, SSN or
Gender.
• Composite
– The attribute may be composed of several components. For example:
– Address(Apt#, House#, Street, City, State, ZipCode, Country), or
– Name(FirstName, MiddleName, LastName).
– Composition may form a hierarchy where some components are themselves
composite
• Multi-valued
– An entity may have multiple values for that attribute. For example, Color of a
CAR or PreviousDegrees of a STUDENT.
– Denoted as {Color} or {PreviousDegrees}.
Types of Attributes (Continued)
• In general, Composite and multi-valued attributes can be nested
to any number of levels, although this is rare.
– For example, Previous Degrees of a STUDENT is a composite multi-valued
attribute denoted by {PREVIOUS_DEGREES (College, Year, Degree, Field)}
– Multiple PreviousDegrees values can exist
– Each has four subcomponent attributes: College, Year, Degree, Field.
[Figure: Student entity with the simple attribute STUDENT_ID.]
Key Attributes
• An attribute of an entity for which each entity must have a unique value is
called a key attribute of the entity.
– For example, STUDENT_ID of STUDENT.
• An entity may have more than one key attribute. One is chosen as the primary key; the others are secondary/candidate keys.
– The CAR entity may have two keys:
– Car_Id
– Number together with City
[Figure: CAR entity with attributes Car_Id, Number, City, Brand, Year, and the multi-valued attribute {Color}.]
Key Attributes (Continued)
• Primary key
– One or more attributes (columns) chosen to uniquely identify a row in a
relation/table.
• Foreign key
– A primary key from one table that is placed into another table (used to
establish relationships and link tables together).
• COURSE (Name, Course_code, Credit_hours, Department): Course_code is the primary key.
• SECTION (Section_id, Course_code, Semester, Year, Instructor): Course_code is a foreign key referencing COURSE.
• Two attributes together are used to uniquely identify a row in the SECTION table.
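The primary key / foreign key link between COURSE and SECTION can be sketched with Python's built-in sqlite3 module. This is only an illustration: the sample rows are invented, and the composite key (Section_id, Course_code) for SECTION is an assumption, since the slide does not say which two attributes form its key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE COURSE (
        Course_code  TEXT PRIMARY KEY,
        Name         TEXT,
        Credit_hours INTEGER,
        Department   TEXT
    )""")
conn.execute("""
    CREATE TABLE SECTION (
        Section_id  INTEGER,
        Course_code TEXT REFERENCES COURSE(Course_code),
        Semester    TEXT,
        Year        INTEGER,
        Instructor  TEXT,
        PRIMARY KEY (Section_id, Course_code)  -- assumed composite key
    )""")

# Hypothetical sample data.
conn.execute("INSERT INTO COURSE VALUES ('CS101', 'Databases', 3, 'CS')")
conn.execute("INSERT INTO SECTION VALUES (1, 'CS101', 'Fall', 2024, 'Smith')")


def try_insert_orphan():
    """Try to add a section whose Course_code has no matching COURSE row."""
    try:
        conn.execute("INSERT INTO SECTION VALUES (2, 'XX999', 'Fall', 2024, 'Jones')")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


orphan_result = try_insert_orphan()
section_count = conn.execute("SELECT COUNT(*) FROM SECTION").fetchone()[0]
print(orphan_result, section_count)
```

The foreign key is what stops a SECTION row from pointing at a course that does not exist.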
File systems and motivations
• Consider the following table:
– Data inconsistency
– Duplicate entries
• An update anomaly is a data inconsistency that results from data redundancy and a partial update. For example, each employee in a company has a department associated with them as well as the student group they participate in. If A. Bruchs’ department is recorded in error, it must be corrected in at least two rows or the database will hold inconsistent data. If the user performing the update does not realize the data is stored redundantly, the update will not be done properly.
• A deletion anomaly is the unintended loss of data due to the deletion of other data. For example, if the student group Beta Alpha Psi disbanded and was deleted from the previous table, J. Longfellow and the Accounting department would cease to exist in the database. This results in database inconsistencies and is an example of how combining information that does not really belong together into one table can cause problems.
• An insertion anomaly is the inability to add data to the database due to the absence of other data. For example, assume Student_Group is defined so that null values are not allowed. If a new employee is hired but not immediately assigned to a Student_Group, then this employee cannot be entered into the database. This results in database inconsistencies due to omission.
• Update, deletion, and insertion anomalies are undesirable in any database. Normalization is the process used to avoid them.
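The update and deletion anomalies can be reproduced on a tiny in-memory table. The rows below are hypothetical (the original table is not reproduced in these notes); only the names A. Bruchs, J. Longfellow, Accounting, and Beta Alpha Psi come from the text.

```python
# One flat table mixing employee, department, and student-group facts.
rows = [
    {"Employee": "A. Bruchs", "Department": "CIS", "Student_Group": "Group A"},
    {"Employee": "A. Bruchs", "Department": "CIS", "Student_Group": "Group B"},
    {"Employee": "J. Longfellow", "Department": "Accounting",
     "Student_Group": "Beta Alpha Psi"},
]


def departments_of(table, employee):
    """Return every department value stored for one employee."""
    return {r["Department"] for r in table if r["Employee"] == employee}


# Update anomaly: only one of the two redundant rows is corrected.
partially_updated = [dict(r) for r in rows]
partially_updated[0]["Department"] = "Marketing"
conflicting_departments = departments_of(partially_updated, "A. Bruchs")

# Deletion anomaly: removing the group also removes the only row that
# recorded J. Longfellow and the Accounting department.
after_delete = [r for r in rows if r["Student_Group"] != "Beta Alpha Psi"]
longfellow_still_known = any(r["Employee"] == "J. Longfellow" for r in after_delete)

print(conflicting_departments, longfellow_still_known)
```

After the partial update, the table holds two different departments for the same employee, which is exactly the inconsistency normalization is meant to prevent.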
Multi-valued & composite attributes are your worst enemy
Normalization (1NF)
First Normal Form (1NF)
• The only attribute values permitted by 1NF are single atomic (or indivisible) values.
• Remove multivalued attributes, composite attributes, and their combinations from the original
table/relation.
• Create a new table/relation that contains the removed attributes and then link that table with
the original one using the primary key of the original table as a foreign key
Original table (not in 1NF):
Number   City   Brand  Year  Color
ABC 123  Cairo  KIA    2017  Black, Silver
ABC 123  Giza   FORD   2016  White, Black, Grey
XYZ 456  Alex   BMW    2015  Gold

To bring this table to first normal form, we split the table into two tables and now we have the resulting tables (Cars Table and Cars_Colors Table).

Cars Table:
Number   City   Brand  Year
ABC 123  Cairo  KIA    2017
ABC 123  Giza   FORD   2016
XYZ 456  Alex   BMW    2015

Cars_Colors Table:
Number   City   Color
ABC 123  Cairo  Black
ABC 123  Cairo  Silver
ABC 123  Giza   White
ABC 123  Giza   Black
ABC 123  Giza   Grey
XYZ 456  Alex   Gold
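The split into Cars and Cars_Colors can be sketched as a small function. It assumes, as the example data suggests, that (Number, City) identifies a car and is therefore the key carried into the new table as a foreign key.

```python
# Each car row carries a multi-valued Color attribute (a list), so it is not in 1NF.
cars_unnormalized = [
    ("ABC 123", "Cairo", "KIA", 2017, ["Black", "Silver"]),
    ("ABC 123", "Giza", "FORD", 2016, ["White", "Black", "Grey"]),
    ("XYZ 456", "Alex", "BMW", 2015, ["Gold"]),
]


def to_first_normal_form(rows):
    """Move the multi-valued Color attribute into its own table."""
    cars, cars_colors = [], []
    for number, city, brand, year, colors in rows:
        cars.append((number, city, brand, year))
        for color in colors:
            # (number, city) is the assumed primary key of Cars,
            # repeated here as a foreign key.
            cars_colors.append((number, city, color))
    return cars, cars_colors


cars, cars_colors = to_first_normal_form(cars_unnormalized)
for row in cars_colors:
    print(row)
```

Every value in both result tables is now atomic, at the cost of repeating the key once per color.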
First Normal Form (Example 1)
[Figure: Patient entity with simple key P_no, attributes DOB and Name, composite attribute Medical History (Disease, Treatment), and multi-valued attribute {Checks}.]
Normalization (1NF)
• 𝑅2 = (P_no, Checks)
First Normal Form (Example 2)
[Figure: Patient entity with attributes P_no, DOB, Name, and Medical History, whose components Disease and Treatment are multi-valued.]
𝑅0 = (P_no, Name, DOB, Medical History ({Disease}, {Treatment}))
Normalization (1NF)
[Figure: Patient entity with attributes P_no, DOB, Name, and the composite multi-valued attribute {Checks (Name, Date, Result)}.]
𝑅0 = (P_no, Name, DOB, Medical History ({Disease}, {Treatment}), {Checks (Name, Date, Result)})
Normalization (1NF)
Normalization (2NF)
Second Normal Form (1NF++ ➔ 2NF)
• Functional Dependency
– A relationship between attributes in which one attribute (or group of attributes) determines
the value of another attribute in the same table.
• Determinant
– An attribute that determines the values of other attributes
– All primary keys are determinants
– If A → B then A is the determinant and B is functionally dependent on A
• Full functional dependency
– In a table, if attribute B is functionally dependent on A, but is not functionally dependent on a proper subset of A, then B is fully functionally dependent on A.
– Hence, in a 2NF table, no non-key attribute can depend on a proper subset of the primary key.
– If the primary key is not a composite key, all non-key attributes are always fully functionally dependent on it. A table that is in first normal form and has a single-attribute primary key is therefore automatically in second normal form.
• A relation is in the second normal form if and only if:
– It is in the first normal form (1NF)
– Every non-key attribute is fully functionally dependent on the primary key
– That is, every non-key attribute must be defined by the entire key, not by only part of the key (no partial
dependencies)
Second Normal Form (Example 1)
[Figure: relation with a composite primary key of two attributes, {Student_Id, Course_Id}, and its non-key attributes.]
• Course_Id → Course_Name
– Course_Id is a part of the primary key {Student_Id, Course_Id}.
– Course_Name is therefore partially dependent on the primary key: this non-key attribute can be
determined using only part of the key (Course_Id).
Second Normal Form (Example 1 Continued)
• This relation has a composite primary key (one that contains more than one attribute): {Student_Id, Course_Id}.
• Since Course_Name depends on Course_Id, which is only part of the primary key, this table does
not satisfy the second normal form conditions.
• To bring this relation to second normal form, we break it into two relations:
– 𝑅1 = STUDENT_GRADE (Student_Id, Course_Id, Grade)
– 𝑅2 = COURSE (Course_Id, Course_Name)
• This removes the partial functional dependency that we initially had. Now, in the
relation COURSE, the non-key attribute Course_Name is fully dependent on the primary key of that table,
which is Course_Id. In the relation STUDENT_GRADE, all the non-key attributes are likewise fully functionally
dependent on the primary key of that table.
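A minimal sketch of how a partial dependency can be detected on sample data and then removed by projection. The rows are invented for illustration, and a dependency that holds in a sample is only evidence, not proof, that it holds for the relation in general.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key must match every later one.
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, t)) for t in sorted(unique)]


# Hypothetical rows with primary key {Student_Id, Course_Id}.
enrollment = [
    {"Student_Id": 1, "Course_Id": "C1", "Course_Name": "Databases", "Grade": "A"},
    {"Student_Id": 2, "Course_Id": "C1", "Course_Name": "Databases", "Grade": "B"},
    {"Student_Id": 1, "Course_Id": "C2", "Course_Name": "Networks", "Grade": "B"},
]

# Course_Name is determined by part of the key: a partial dependency.
partial_dependency = holds(enrollment, ["Course_Id"], ["Course_Name"])
# Grade is not determined by either half of the key on its own.
grade_needs_whole_key = (not holds(enrollment, ["Course_Id"], ["Grade"])
                         and not holds(enrollment, ["Student_Id"], ["Grade"]))

# 2NF decomposition by projection.
student_grade = project(enrollment, ["Student_Id", "Course_Id", "Grade"])
course = project(enrollment, ["Course_Id", "Course_Name"])
print(partial_dependency, grade_needs_whole_key, course)
```

After the projection, each course name is stored once per course instead of once per enrolled student.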
Cut your coat according to your cloth
Normalization (3NF)
Third Normal Form (2NF++ ➔ 3NF)
• Transitive functional dependency:
– A relationship between attributes in which attribute B is functionally dependent
on attribute A (attribute A determines attribute B), and attribute C is functionally
dependent on attribute B (attribute B determines attribute C). In this case,
attribute C is transitively dependent on attribute A via attribute B. (A → B → C)
– Key attribute A determines non-key attribute B and that non-key attribute B
can be used to determine another non-key attribute C
• Evaluator_Id → Evaluator_Name
– In the STUDENT_GRADE relation above, {Student_Id, Course_Id} determines Evaluator_Id,
and Evaluator_Id determines Evaluator_Name.
– Evaluator_Id is a non-key attribute that determines another non-key attribute
Evaluator_Name. Therefore, we have a transitive functional dependency, and this
structure does not satisfy third normal form conditions.
Third Normal Form (Example 1 Continued)
• To bring this relation to third normal form, we break it into two relations:
– 𝑅1 = STUDENT_GRADE (Student_Id, Course_Id, Grade, Evaluator_Id)
– 𝑅2 = (Evaluator_Id, Evaluator_Name)
• Now all non-key attributes are fully functionally dependent only on the primary key.
– In STUDENT_GRADE, both Grade and Evaluator_Id are only dependent on {Student_Id, Course_Id}.
– In 𝑅2, Evaluator_Name is only dependent on Evaluator_Id.
Normalization (3NF)
Normalization
The data depends on the key [1NF], the whole key [2NF] and nothing but the key [3NF]
NIN      Contract No  Hours  Employee Name  Company ID  Company Location
616681B  SC1025       72     P. White       SC115       Belfast
• Given the PK {NIN, ContractNo}, each non-primary-key attribute needs to be checked for
full functional dependency on the PK.
• Employee Name is functionally dependent on NIN alone (NIN → Employee Name). Because NIN
is only part of the PK, Employee Name is not fully functionally dependent on the PK, and therefore the
table is NOT in 2NF.
• The previous tables are left the way they are because they are
already in 3NF.
• The final tables normalised to 3NF are:
– Staff Details (NIN, EmployeeName)
– StaffContract (NIN, ContractNo, hours)
– Contracts (ContractNo, CompanyID)
– Company (CompanyID, CompanyLocation)
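As a sanity check, the four 3NF tables can be joined back together to recover the original row, which suggests that no information was lost in the decomposition. This sketch uses sqlite3; the table names are written without spaces and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StaffDetails (NIN TEXT PRIMARY KEY, EmployeeName TEXT);
CREATE TABLE Company (CompanyID TEXT PRIMARY KEY, CompanyLocation TEXT);
CREATE TABLE Contracts (
    ContractNo TEXT PRIMARY KEY,
    CompanyID  TEXT REFERENCES Company(CompanyID)
);
CREATE TABLE StaffContract (
    NIN        TEXT REFERENCES StaffDetails(NIN),
    ContractNo TEXT REFERENCES Contracts(ContractNo),
    Hours      INTEGER,
    PRIMARY KEY (NIN, ContractNo)
);
INSERT INTO StaffDetails VALUES ('616681B', 'P. White');
INSERT INTO Company VALUES ('SC115', 'Belfast');
INSERT INTO Contracts VALUES ('SC1025', 'SC115');
INSERT INTO StaffContract VALUES ('616681B', 'SC1025', 72);
""")

# Rebuild the unnormalized row by following the foreign keys.
rebuilt = conn.execute("""
    SELECT sc.NIN, sc.ContractNo, sc.Hours,
           sd.EmployeeName, c.CompanyID, co.CompanyLocation
    FROM StaffContract sc
    JOIN StaffDetails sd ON sd.NIN = sc.NIN
    JOIN Contracts c     ON c.ContractNo = sc.ContractNo
    JOIN Company co      ON co.CompanyID = c.CompanyID
""").fetchone()
print(rebuilt)
```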
Transactions
Introduction
• Concurrency:
– Interleaved processing:
– Concurrent execution of processes is interleaved in a single CPU
– Parallel processing:
– Processes are concurrently executed in multiple CPUs
• A Transaction:
– Logical unit (set of operations) of database processing that includes one or
more access operations (read for retrieval; write for insert or update; delete).
• Transaction boundaries:
– Begin and End transaction.
– Why do we need boundaries?
– An application program may contain several transactions separated by the Begin
and End transaction boundaries.
Introduction (Continued)
• Desirable Properties of Transactions (ACID)
• Reading a database item X involves the following steps:
– Find the address of the disk block that contains item X.
– Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory
buffer).
– Copy item X from the buffer to the program variable named X.
[Figure: item X in a disk block, in a buffer block in main memory, and in program stack memory.]
• The Lost Update Problem
– This occurs when two transactions that access the same database items have their
operations interleaved in a way that makes the value of some database item incorrect.
• The Temporary Update (or Dirty Read) Problem
– This occurs when one transaction updates a database item and then the transaction fails for
some reason.
– The updated item is accessed by another transaction before it is changed back to its original
value.
• The Incorrect Summary Problem
– If one transaction is calculating an aggregate summary function on a number of records
while other transactions are updating some of these records, the aggregate function may
calculate some values before they are updated and others after they are updated.
That’s why we need a concurrency control of some sort to solve/avoid these problems.
The Lost Update Problem
The temporary update problem
The incorrect summary problem
Why recovery is needed? (What causes a transaction to fail)
• A computer failure (system crash)
– A hardware or software error occurs in the computer system during transaction execution. If
the hardware crashes, the contents of the computer’s internal memory may be lost.
• A transaction or system error
– Some operation in the transaction may cause it to fail, such as integer overflow or division by zero.
– Erroneous parameter values or a logical programming error may also cause failure.
– In addition, the user may interrupt the transaction during its execution.
• Local errors or exception conditions detected by the transaction
– Certain conditions necessitate cancellation of the transaction. For example,
data for the transaction may not be found.
– A condition, such as insufficient account balance in a banking
database, may cause a transaction, such as a fund withdrawal from that
account, to be canceled.
– A programmed abort in the transaction causes it to fail.
• Concurrency control enforcement
– The concurrency control method may decide to abort the transaction, to be restarted later,
because it violates serializability or because several transactions are in a state of deadlock.
• Disk failure
– Some disk blocks may lose their data because of a read or write malfunction or
a disk read/write head crash.
– This may happen during a read or a write operation of the transaction.
• Physical problems and catastrophes
– This refers to an endless list of problems that includes power or air-conditioning failure, fire,
theft, sabotage, overwriting disks or tapes by mistake, and mounting of a wrong tape by the
operator.
Transaction Concepts
• A transaction is an atomic unit of work that is either
completed in its entirety or not done at all.
– For recovery purposes, the system needs to keep track of
when the transaction starts, terminates, and commits or
aborts.
• Transaction states:
– Active state
– Partially committed state
– Committed state
– Failed state
– Terminated State
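The five states can be read as a small state machine. The sketch below uses the usual textbook transitions between them; the event names (`end_transaction`, `commit`, `abort`, `terminate`) are labels chosen here for illustration and are not given in these notes.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("active", "end_transaction"): "partially committed",
    ("active", "abort"): "failed",
    ("partially committed", "commit"): "committed",
    ("partially committed", "abort"): "failed",
    ("committed", "terminate"): "terminated",
    ("failed", "terminate"): "terminated",
}


def run(events, state="active"):
    """Follow a sequence of events from the active state and return the final state."""
    for event in events:
        if (state, event) not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not allowed in state {state!r}")
        state = TRANSITIONS[(state, event)]
    return state


successful = run(["end_transaction", "commit", "terminate"])
failed_late = run(["end_transaction", "abort"])
print(successful, failed_late)
```

Note that a transaction cannot jump from active straight to committed: it must pass through the partially committed state, where it can still fail.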
Transaction Concepts (Continued)
• The System Log or Journal: keeps track of all transaction operations
that affect the values of database items.
– This information may be needed to permit recovery from transaction failures.
• Result equivalent: Two schedules are called result equivalent if they produce the same final state of the database.
• Conflict equivalent: Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the
same in both schedules.
• Note: Being serializable (executed concurrently) is not the same as being serial (executed consecutively)
• Being serializable implies that the schedule is a correct schedule.
• It will leave the database in a consistent state.
• The interleaving is appropriate and will result in a state as if the transactions were serially executed,
yet will achieve efficiency due to concurrent execution.
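Conflict equivalence can be checked mechanically. The sketch below represents a schedule as a list of (transaction, operation, item) tuples and treats two operations as conflicting when they belong to different transactions, touch the same item, and at least one is a write. It assumes each transaction performs a given operation on a given item at most once; the transactions T1 and T2 are illustrative.

```python
def conflicts(a, b):
    """Different transactions, same item, at least one write."""
    return a[0] != b[0] and a[2] == b[2] and "w" in (a[1], b[1])


def conflict_pairs(schedule):
    """All ordered pairs (earlier, later) of conflicting operations."""
    return {(a, b)
            for i, a in enumerate(schedule)
            for b in schedule[i + 1:]
            if conflicts(a, b)}


def conflict_equivalent(s1, s2):
    """Same operations, and every conflicting pair appears in the same order."""
    return sorted(s1) == sorted(s2) and conflict_pairs(s1) == conflict_pairs(s2)


T1 = [("T1", "r", "X"), ("T1", "w", "X"), ("T1", "r", "Y"), ("T1", "w", "Y")]
T2 = [("T2", "r", "X"), ("T2", "w", "X")]

serial_t1_t2 = T1 + T2
serial_t2_t1 = T2 + T1
# T2 runs between T1's work on X and T1's work on Y.
interleaved_ok = T1[:2] + T2 + T1[2:]
# T2 reads X before T1 writes it, then overwrites T1's write.
interleaved_bad = [T1[0], T2[0], T1[1], T1[2], T2[1], T1[3]]

ok_is_serializable = conflict_equivalent(interleaved_ok, serial_t1_t2)
bad_is_serializable = (conflict_equivalent(interleaved_bad, serial_t1_t2)
                       or conflict_equivalent(interleaved_bad, serial_t2_t1))
print(ok_is_serializable, bad_is_serializable)
```

The first interleaving is serializable without being serial, which is the distinction the note above draws.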
Transaction schedule (Continued)
• Assume that the initial values of database items
are X = 90 and Y = 90 and that N = 3 and M = 2.
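With these initial values, a serial schedule and a lost-update interleaving can be compared by simulation. The two transactions are not spelled out in these notes, so the sketch assumes the common textbook pair: T1 moves N from X to Y, and T2 adds M to X.

```python
N, M = 3, 2

# Change each transaction applies to an item between reading and writing it
# (assumed: T1 transfers N from X to Y, T2 adds M to X).
DELTA = {("T1", "X"): -N, ("T1", "Y"): +N, ("T2", "X"): +M}


def run_schedule(schedule, initial):
    """Execute read/write steps in order, each transaction using local copies."""
    db = dict(initial)
    local = {}
    for txn, op, item in schedule:
        if op == "r":
            # Read the current database value and compute the new local value.
            local[(txn, item)] = db[item] + DELTA[(txn, item)]
        else:
            # Write the locally computed value back to the database.
            db[item] = local[(txn, item)]
    return db


initial = {"X": 90, "Y": 90}

serial = [("T1", "r", "X"), ("T1", "w", "X"), ("T1", "r", "Y"), ("T1", "w", "Y"),
          ("T2", "r", "X"), ("T2", "w", "X")]
# T2 reads X before T1 has written it, then overwrites T1's result.
lost_update = [("T1", "r", "X"), ("T2", "r", "X"), ("T1", "w", "X"),
               ("T1", "r", "Y"), ("T2", "w", "X"), ("T1", "w", "Y")]

serial_result = run_schedule(serial, initial)
lost_result = run_schedule(lost_update, initial)
print(serial_result, lost_result)
```

In the interleaved run, T1's subtraction from X is overwritten, so X ends up 3 higher than in the serial run.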
[Figure: client/server architectures, from fat clients (file server) through thinner clients (two-tier database server) to the thinnest clients (three-tier), where the client does little processing.]
Client/Server Architectures (Continued)
• File Server Architecture (FAT)
– All processing is done at the PC that requested the data
– Entire files are transferred from the server to the client for processing
• Two-Tier Database Server Architecture (Thinner)
– Client is responsible for I/O processing logic and some business rules logic
– DB server performs all data storage and access processing
– DBMS is only on DB server
– Processing logic could be at client, server, or both
• Three-tier Architecture (Thinnest)
– Includes another layer (Application server) in addition to the client and database server layers (Two-Tier)
– Client is responsible for I/O processing (Browser)
– DB server performs all data storage and access processing (DBMS)
– Application server is responsible for business rules (Web Server)
• Remote Procedure Calls (RPC): Client makes calls to procedures running on remote computers (synchronous and asynchronous)
• Message-Oriented Middleware (MOM): Asynchronous calls between the client and server via message queues
• Object Request Broker (ORB): Object-oriented management of communications between clients and servers
• Database Middleware
– JDBC (Java Database Connectivity): Special Java classes that allow Java applications/applets to connect to databases
– Note: In a two-tier architecture, the application program interface is provided by ODBC (Open Database Connectivity)
XML: Extensible Markup Language
Introduction
• Unstructured data
– Limited indication of the type of data in a document that contains
information embedded within it
– Example: the information stored in a file
• Structured data
– Represented in a strict format
– Example: information stored in databases
• Semi-structured data
– Has a certain structure
– Not all information collected will have identical structure
– Example: could be the information stored in an XML document
Introduction (Continued)
• XML: Markup language that enables you to create your own custom tags.
– Contains self-describing data
– Mainly used for structuring/storing data and exchanging it across the web.
– Can be used to provide more information about the structure and meaning of the data in
Web pages, rather than just specifying how the Web pages are formatted for display on the
screen, as HTML does.
• Elements (tags) and attributes (which provide additional information that describes elements)
– Main structuring concepts used to construct an XML document
• Complex elements: Constructed from other elements hierarchically
• XML tag names: Describe the meaning of the data elements in the document
• Two XML query language standards
– XPath
– XQuery
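A small sketch of querying a data-centric document with path expressions. Python's standard `xml.etree.ElementTree` module supports only a limited subset of XPath (and no XQuery), so this only gives the flavor of the idea; the document content is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a complex element <projects> built from simpler elements.
doc = """<projects>
  <project id="1"><name>ProductX</name><location>Bellaire</location></project>
  <project id="2"><name>ProductY</name><location>Sugarland</location></project>
  <project id="3"><name>ProductZ</name><location>Houston</location></project>
</projects>"""

root = ET.fromstring(doc)

# Path expression: every <name> element that is a child of a <project>.
names = [e.text for e in root.findall("./project/name")]

# Predicate: projects that have a <location> child whose text is 'Houston'.
houston_ids = [p.get("id") for p in root.findall("./project[location='Houston']")]

print(names, houston_ids)
```

The tag names (`project`, `name`, `location`) carry the meaning of the data, and the `id` attribute adds information about each element.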
Main types of XML documents
• Data-centric XML documents:
– These documents have many small data items that follow a specific structure, and hence may be extracted from a
structured database. They are formatted as XML documents in order to exchange them or display them over the
Web.
• Well-formed XML
– Has an XML declaration, which indicates the version of XML being used as well as any other relevant attributes
– Every element must have a matching pair of start and end tags, nested within the start and end tags of its parent element
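Well-formedness can be tested by handing the text to an XML parser and seeing whether it is accepted. The sketch below uses Python's standard `xml.etree.ElementTree`; the sample strings are invented. Note that this parser also accepts documents that omit the XML declaration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Declaration present, tags properly paired and nested.
good = "<?xml version='1.0'?><project><name>ProductX</name></project>"
# End tags in the wrong order: <name> is not closed inside its parent.
badly_nested = "<project><name>ProductX</project></name>"
# Start tag with no matching end tag.
unclosed = "<project><name>ProductX</name>"

results = [is_well_formed(s) for s in (good, badly_nested, unclosed)]
print(results)
```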
General Notes
Notes on Storage-Indexing
• Unordered Files (also called heap or pile files)
– New records are inserted at the end of the file.
– A linear search through the file records is necessary to search for a record.
– This requires reading and searching half the file blocks on average, and is hence quite
expensive. Record insertion is quite efficient, though.
• Spanned Records: Refers to records that exceed the size of one or more blocks and hence span a number of blocks.
• A dense index has an index entry for every search key value (and hence every record) in the data file.
• A sparse (or nondense) index, on the other hand, has index entries for only some of the search values.
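The difference between the two index types can be sketched over a small file of sorted keys. The block size and key values are made up, and the sparse index here keeps one entry per block (its first key), which assumes the data file is ordered on the search key.

```python
from bisect import bisect_right

BLOCK_SIZE = 3
keys = [5, 8, 12, 17, 21, 30, 34, 41, 50]  # ordered data file (hypothetical)
blocks = [keys[i:i + BLOCK_SIZE] for i in range(0, len(keys), BLOCK_SIZE)]

# Dense index: one entry per search key value, pointing at its block.
dense_index = {key: b for b, block in enumerate(blocks) for key in block}

# Sparse index: one entry per block, holding the first key in that block.
sparse_index = [block[0] for block in blocks]


def lookup_dense(key):
    """Return the block number holding key, or None."""
    return dense_index.get(key)


def lookup_sparse(key):
    """Find the last block whose first key is <= key, then search inside it."""
    b = bisect_right(sparse_index, key) - 1
    if b < 0 or key not in blocks[b]:
        return None
    return b


print(len(dense_index), len(sparse_index), lookup_sparse(21), lookup_sparse(20))
```

The sparse index is smaller, but a lookup must still read the candidate block to confirm the key is there.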
Notes on Storage-Indexing (Continued)
Notes on Database Security
• Threats to databases (CIA)
– Loss of confidentiality
– Loss of integrity: if unauthorized changes are made to the data by either intentional or accidental acts
– Loss of availability
• To protect databases against these types of threats some countermeasures can be implemented such as:
– Access control (The security mechanism of a DBMS must include provisions for restricting access to the database as a whole
which is handled by creating user accounts and passwords to control login process by the DBMS.)
– Inference control
– Encryption
• A statistical database is a database used for providing statistical information or summaries based on
various criteria.
– The countermeasures to the statistical database security problem are called inference control measures.
Notes on Database Security (Continued)
• The database administrator (DBA) is the central authority for managing a database system.
• The DBA is responsible for the overall security of the database system
• The system log keeps track of all operations on the database that are applied by a certain user
throughout each login session.
• A database log that is used mainly for security purposes is sometimes called an audit trail.
Notes on Database Security (Continued)
• The account level:
– At this level, the DBA specifies the particular privileges that each account holds independently of the relations
in the database.
– This includes the CREATE SCHEMA, CREATE TABLE, CREATE VIEW, ALTER, DROP,
MODIFY, and SELECT privileges.
• The owner (the account that was used when the relation was created in the first place) of a
relation is given all privileges on that relation.
• The owner account holder can pass privileges on any of the owned relations to other users by
granting privileges to their accounts.
• Suppose that Account_A1 wants to give to Account_A3 a limited capability to SELECT from the
EMPLOYEE relation and wants to allow Account_A3 to be able to propagate the privilege.
– The limitation is to retrieve only the NAME, BDATE, and ADDRESS attributes and only for the records with DNO > 5. Account_A1 can express this limitation by creating a view, A3EMPLOYEE, over the EMPLOYEE relation.
– After the view is created, Account_A1 can grant SELECT on the view A3EMPLOYEE to A3 as follows:
– ` GRANT SELECT ON A3EMPLOYEE TO Account_A3 WITH GRANT OPTION; `
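The view itself is not shown above, so here is one plausible definition, run through sqlite3 to show what Account_A3 would be able to see. The EMPLOYEE columns beyond those named in the text and the sample rows are assumptions. SQLite has no user accounts or GRANT statement, so only the view's filtering is demonstrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (
    NAME TEXT, SSN TEXT PRIMARY KEY, BDATE TEXT,
    ADDRESS TEXT, SALARY INTEGER, DNO INTEGER
);
-- Hypothetical rows.
INSERT INTO EMPLOYEE VALUES
    ('Smith',  '111', '1965-01-09', 'Houston', 30000, 5),
    ('Wong',   '222', '1955-12-08', 'Houston', 40000, 6),
    ('Zelaya', '333', '1968-01-19', 'Spring',  25000, 7);

-- Assumed view: only three attributes, only records with DNO > 5.
CREATE VIEW A3EMPLOYEE AS
    SELECT NAME, BDATE, ADDRESS FROM EMPLOYEE WHERE DNO > 5;
""")

cursor = conn.execute("SELECT * FROM A3EMPLOYEE ORDER BY NAME")
visible_columns = [d[0] for d in cursor.description]
visible_rows = cursor.fetchall()
print(visible_columns, visible_rows)
```

Neither SSN nor SALARY is reachable through the view, and the DNO = 5 employee is filtered out.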
Notes on SQL
• Assertions example:
– “The salary of an employee must not be greater than the salary of the
manager of the department that the employee works for’’
– ` CREATE ASSERTION SALARY_CONSTRAINT CHECK (NOT EXISTS (SELECT * FROM
EMPLOYEE E, EMPLOYEE M, DEPARTMENT D WHERE E.SALARY > M.SALARY AND
E.DNO = D.NUMBER AND D.MGRSSN=M.SSN)) `
• Triggers example:
– A trigger to compare an employee’s salary to that of his/her supervisor during insert
or update operations:
– ` CREATE TRIGGER INFORM_SUPERVISOR BEFORE INSERT OR UPDATE OF SALARY,
SUPERVISOR_SSN ON EMPLOYEE FOR EACH ROW WHEN (NEW.SALARY > (SELECT
SALARY FROM EMPLOYEE WHERE SSN=NEW.SUPERVISOR_SSN))
INFORM_SUPERVISOR (NEW.SUPERVISOR_SSN,NEW.SSN); `
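A runnable approximation of this trigger in SQLite, driven from Python. It differs from the statement above in several ways: SQLite needs one trigger per event, so only the INSERT case is shown; the trigger runs AFTER the insert; and, since SQLite has no stored procedures, the call to INFORM_SUPERVISOR is replaced by an insert into a hypothetical SUPERVISOR_NOTICE table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, SALARY INTEGER, SUPERVISOR_SSN TEXT);
-- Hypothetical table standing in for the INFORM_SUPERVISOR procedure.
CREATE TABLE SUPERVISOR_NOTICE (SUPERVISOR_SSN TEXT, EMPLOYEE_SSN TEXT);

CREATE TRIGGER INFORM_SUPERVISOR_ON_INSERT
AFTER INSERT ON EMPLOYEE
WHEN NEW.SALARY > (SELECT SALARY FROM EMPLOYEE WHERE SSN = NEW.SUPERVISOR_SSN)
BEGIN
    INSERT INTO SUPERVISOR_NOTICE VALUES (NEW.SUPERVISOR_SSN, NEW.SSN);
END;
""")

conn.execute("INSERT INTO EMPLOYEE VALUES ('100', 50000, NULL)")   # supervisor
conn.execute("INSERT INTO EMPLOYEE VALUES ('200', 60000, '100')")  # earns more
conn.execute("INSERT INTO EMPLOYEE VALUES ('300', 40000, '100')")  # earns less

notices = conn.execute("SELECT * FROM SUPERVISOR_NOTICE").fetchall()
print(notices)
```

Only the employee who earns more than the supervisor produces a notice; an employee with no supervisor never fires the trigger because the comparison with NULL is not true.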
Notes on SQL (Continued)
• Views example:
– Specify a different WORKS_ON table
– ` CREATE VIEW WORKS_ON_NEW AS SELECT FNAME, LNAME, PNAME, HOURS FROM EMPLOYEE,
PROJECT, WORKS_ON WHERE SSN = ESSN AND PNO = PNUMBER; `
• Views defined using groups and aggregate functions are not updateable
• Views defined on multiple tables using joins are generally not updateable