
Database II

Summary
Prerequisites

Normalization
Types of Attributes
• Simple
– Each entity has a single atomic value for the attribute. For example, SSN or
Gender.

• Composite
– The attribute may be composed of several components. For example:
– Address(Apt#, House#, Street, City, State, ZipCode, Country), or
– Name(FirstName, MiddleName, LastName).
– Composition may form a hierarchy where some components are themselves
composite

• Multi-valued
– An entity may have multiple values for that attribute. For example, Color of a
CAR or PreviousDegrees of a STUDENT.
– Denoted as {Color} or {PreviousDegrees}.
Types of Attributes (Continued)
• In general, Composite and multi-valued attributes can be nested
to any number of levels, although this is rare.
– For example, Previous Degrees of a STUDENT is a composite multi-valued
attribute denoted by {PREVIOUS_DEGREES (College, Year, Degree, Field)}
– Multiple PreviousDegrees values can exist
– Each value has four subcomponent attributes: College, Year, Degree, Field

[ER diagram: the STUDENT entity with the simple key attribute STUDENT_ID and the composite multi-valued attribute {PREVIOUS_DEGREES (College, Year, Degree, Field)}]
Key Attributes
• An attribute of an entity for which each entity must have a unique value is
called a key attribute of the entity.
– For example, the ID of a STUDENT (STUDENT_ID).

• A key attribute may be composite.

• An entity may have more than one key attribute. One is chosen as the primary key; the others are secondary (candidate) keys.
– The CAR entity may have two keys:
– Car_Id (a simple key)
– License_Plate_Number (a composite key consisting of Number and City)

[ER diagram: the CAR entity with the key attributes Car_Id and License_Plate_Number (Number, City), the attributes Brand and Year, and the multi-valued attribute {Color}]
Key Attributes (Continued)
• Primary key
– One or more attributes (columns) chosen to uniquely identify a row in a relation/table.

• Foreign key
– A primary key from one table that is placed into another table (used to establish relationships and link tables together).
COURSE Table (primary key: Course_code):
Name   Course_code   Credit_hours   Department

SECTION Table (foreign key: Course_code, referencing COURSE):
Section_id   Course_code   Semester   Year   Instructor

Two attributes are used to uniquely identify a row in the SECTION Table; a SQL sketch of both tables follows.
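As a rough illustration only, the two tables above could be declared in SQL as follows. The column types, and the choice of (Section_id, Course_code) as the composite primary key of SECTION, are assumptions made for this sketch rather than something stated on the slide.

CREATE TABLE COURSE (
    Course_code   VARCHAR(10) PRIMARY KEY,
    Name          VARCHAR(100),
    Credit_hours  INT,
    Department    VARCHAR(50)
);

CREATE TABLE SECTION (
    Section_id    INT,
    Course_code   VARCHAR(10),
    Semester      VARCHAR(10),
    Year          INT,               -- note: YEAR may need quoting in some SQL dialects
    Instructor    VARCHAR(100),
    PRIMARY KEY (Section_id, Course_code),                       -- two attributes identify a row
    FOREIGN KEY (Course_code) REFERENCES COURSE (Course_code)    -- links SECTION back to COURSE
);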
File systems and motivations
• Consider the following table:

Employee_ID   Name            Department   Student_Group
123           J. Longfellow   Accounting   Beta Alpha Psi
234           B. Rech         Marketing    Marketing Club
234           B. Rech         Marketing    Management Club
456           A. Bruchs       CIS          Technology Org.
456           A. Bruchs       CIS          Beta Alpha Psi

File systems and motivations (Continued)
• Data stored in a file in an unstructured way.

• File/Unnormalized table Problems:


– Data redundancy

– Data inconsistency

– Duplicate entries

• An update anomaly is a data inconsistency that results from data redundancy and a partial update. For example, each employee in a company has a department associated with them as well as the student group they participate in. If A. Bruchs’ department is recorded incorrectly, it must be updated in at least two rows or there will be inconsistent data in the database. If the user performing the update does not realize the data is stored redundantly, the update will not be done properly.

• A deletion anomaly is the unintended loss of data due to deletion of other data. For example, if the student group Beta Alpha Psi
disbanded and was deleted from the previous table, J. Longfellow and the Accounting department would cease to exist. This results in
database inconsistencies and is an example of how combining information that does not really belong together into one table can cause
problems.

• An insertion anomaly is the inability to add data to the database due to absence of other data. For example, assume Student_Group is
defined so that null values are not allowed. If a new employee is hired but not immediately assigned to a Student_Group then this
employee could not be entered into the database. This results in database inconsistencies due to omission.

• Update, deletion, and insertion anomalies are very undesirable in any database. Anomalies are avoided by the process of normalization.
Multi-valued & composite attributes are your worst enemy

Normalization (1NF)
First Normal Form (1NF)
• The only attribute values permitted by 1NF are single atomic (or indivisible) values.

• Remove multivalued attributes, composite attributes, and their combinations from the original
table/relation.

• Create a new table/relation that contains the removed attributes and then link that table with
the original one using the primary key of the original table as a foreign key

• Can you determine the primary key of the new table(s)?


This Cars table is not in first normal form because the Color column can contain multiple values; for example, the first row includes the values “Black” and “Silver”.

Unnormalized table:
Number    City    Brand   Year   Color
ABC 123   Cairo   KIA     2017   Black, Silver
ABC 123   Giza    FORD    2016   White, Black, Grey
XYZ 456   Alex    BMW     2015   Gold

To bring this table to first normal form, we split it into two tables, giving the following result (see the SQL sketch below):

Cars Table:
Number    City    Brand   Year
ABC 123   Cairo   KIA     2017
ABC 123   Giza    FORD    2016
XYZ 456   Alex    BMW     2015

Cars_Colors Table:
Number    City    Color
ABC 123   Cairo   Black
ABC 123   Cairo   Silver
ABC 123   Giza    White
ABC 123   Giza    Black
ABC 123   Giza    Grey
XYZ 456   Alex    Gold
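A minimal SQL sketch of this 1NF split, assuming that (Number, City) identifies a car; the column types are illustrative only.

CREATE TABLE Cars (
    Number  VARCHAR(10),
    City    VARCHAR(30),
    Brand   VARCHAR(30),
    Year    INT,
    PRIMARY KEY (Number, City)                    -- assumed key of the original table
);

CREATE TABLE Cars_Colors (
    Number  VARCHAR(10),
    City    VARCHAR(30),
    Color   VARCHAR(20),
    PRIMARY KEY (Number, City, Color),            -- each car/colour combination stored once
    FOREIGN KEY (Number, City) REFERENCES Cars (Number, City)   -- key of Cars reused as a foreign key
);

This also suggests one answer to the question above: a reasonable primary key for the new table is the primary key of the original table plus the moved attribute.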
First Normal Form (Example 1)
[ER diagram: the Patient entity with the simple key P_no, the attributes Name and DOB, the composite attribute Medical History (Disease, Treatment), and the multi-valued attribute {Checks}]
𝑅0 = (P_no, Name, DOB, Medical History (Disease, Treatment), {Checks})

Normalization (1NF)

• 𝑅1 =(P_no, Name, DOB, Disease, Treatment)

• 𝑅2 = (P_no, Checks)
First Normal Form (Example 2)
[ER diagram: the Patient entity with the simple key P_no, the attributes Name and DOB, and the composite attribute Medical History whose components {Disease} and {Treatment} are multi-valued]
𝑅0 = (P_no, Name, DOB, Medical History ({Disease}, {Treatment}))

Normalization (1NF)

• 𝑅1 =(P_no, Name, DOB)

• 𝑅2 = (P_no, Disease, Treatment)


First Normal Form (Example 3)
[ER diagram: the Patient entity with the simple key P_no, the attributes Name and DOB, the composite attribute Medical History with multi-valued components {Disease} and {Treatment}, and the composite multi-valued attribute {Checks (Name, Date, Result)}]

𝑅0 = (P_no, Name, DOB, Medical History ({Disease}, {Treatment}), {Checks (Name, Date, Result)})

Normalization (1NF)

• 𝑅1 =(P_no, Name, DOB)


• 𝑅2 = (P_no, Disease, Treatment)
• 𝑅3 = (P_no, Name, Date, Result)
United we stand, divided we fall

Normalization (2NF)
Second Normal Form (1NF++ ➔ 2NF)
• Functional Dependency
– A relationship between attributes in which one attribute (or group of attributes) determines
the value of another attribute in the same table.
• Determinant
– An attribute that determines the values of other attributes
– All primary keys are determinants
– If A → B, then A is the determinant and B is functionally dependent on A

Note: In a table, if attribute B is functionally dependent on A but is not functionally dependent on any proper subset of A, then B is considered fully functionally dependent on A. Hence, in a 2NF table, no non-key attribute can depend on a proper subset of the primary key. If the primary key is not a composite key, all non-key attributes are always fully functionally dependent on the primary key; a table that is in first normal form and has a single-attribute primary key is therefore automatically in second normal form.
• A relation is in the second normal form if and only if:
– It is in the first normal form (1NF)
– Every non-key attribute is fully functionally dependent on the primary key
– Every non-key attribute must be defined by the entire key, not by only part of the key (No partial
dependencies)
Second Normal Form (Example 1)
Relation/table with a composite primary key of two attributes; the remaining attributes are non-key attributes:

• STUDENT_GRADE (Student_Id, Course_Id, Course_Name, Grade)


• {Student_Id, Course_Id} → {Course_Name, Grade}
– {Course_Name, Grade} is functionally dependent on {Student_Id, Course_Id}

• Course_Id → Course_Name
– Course_Id is a part of the primary key {Student_Id, Course_Id}.
– Course_Name is therefore partially dependent on the primary key: this non-key attribute can be determined using only part of the key (Course_Id).
Second Normal Form (Example 1 Continued)
• This relation has a composite (contains more than one attribute) primary key {Student_Id, Course_Id}.

• The non-key attributes are {Course_Name, Grade}.

• Since Course_Name depends on Course_Id, which is only part of the primary key, this table does not satisfy the second normal form conditions.

• To bring this relation to second normal form, we break the relation into two relations, and now we have the
following:
– 𝑅1 = STUDENT_GRADE (Student_Id, Course_Id, Grade)

– 𝑅2 = COURSE (Course_Id, Course_Name)

• What we have done is remove the partial functional dependency that we initially had. Now, in the relation COURSE, the non-key attribute Course_Name is fully dependent on the primary key of that table, which is Course_Id; and in the relation STUDENT_GRADE, all non-key attributes are fully functionally dependent on the primary key of that table too (see the SQL sketch below).
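For illustration, the two 2NF relations could be declared as below; the column types are assumptions, not part of the example.

CREATE TABLE COURSE (
    Course_Id    VARCHAR(10) PRIMARY KEY,
    Course_Name  VARCHAR(100)                 -- now fully dependent on Course_Id
);

CREATE TABLE STUDENT_GRADE (
    Student_Id  VARCHAR(10),
    Course_Id   VARCHAR(10),
    Grade       CHAR(2),                      -- depends on the whole composite key
    PRIMARY KEY (Student_Id, Course_Id),
    FOREIGN KEY (Course_Id) REFERENCES COURSE (Course_Id)
);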
Cut your coat according to your cloth

Normalization (3NF)
Third Normal Form (2NF++ ➔ 3NF)
• transitive functional dependency:
– A relationship between attributes in which attribute B is functionally dependent
on attribute A (attribute A determines attribute B), and attribute C is functionally
dependent on attribute B (attribute B determines attribute C). In this case,
attribute C is transitively dependent on attribute A via attribute B. (A → B → C)
– Key attribute A determines non-key attribute B and that non-key attribute B
can be used to determine another non-key attribute C

• A relation/table is in third normal form if it satisfies the following conditions:
– It is in second normal form (obviously!)
– There is no transitive functional dependency
Third Normal Form (Example 1)
Non-key attributes: Grade, Evaluator_Id, Evaluator_Name
• STUDENT_GRADE (Student_Id, Course_Id, Grade, Evaluator_Id, Evaluator_Name)

• {Student_Id, Course_Id} → {Grade, Evaluator_Id, Evaluator_Name}


– {Grade, Evaluator_Id, Evaluator_Name} is functionally dependent on {Student_Id, Course_Id}

• Evaluator_Id → Evaluator_Name
– In the STUDENT_GRADE relation above, {Student_Id, Course_Id} determines Evaluator_Id,
and Evaluator_Id determines Evaluator_Name.
– Evaluator_Id is a non-key attribute that determines another non-key attribute
Evaluator_Name. Therefore, we have a transitive functional dependency, and this
structure does not satisfy third normal form conditions.
Third Normal Form (Example 1 Continued)

• To bring this relation to third normal form, we break the relation into two relations, and now
we have the following:
– 𝑅1 = STUDENT_GRADE (Student_Id, Course_Id, Grade, Evaluator_Id)

– 𝑅2 = EVALUATOR (Evaluator_Id, Evaluator_Name)

• Now all non-key attributes are fully functionally dependent only on the primary key.
– In STUDENT_GRADE, both Grade and Evaluator_Id depend only on {Student_Id, Course_Id}.

– In EVALUATOR, Evaluator_Name depends only on Evaluator_Id.


Third Normal Form (Example 2)

• 𝑅0 = (E_Id, Course_Id, Date_Completed, E_Name, Dept_Id, Dept_Name, Salary) 1NF


Normalization (2NF)

• 𝑅1 = (E_Id, Course_Id, Date_Completed)

• 𝑅2 = (E_Id, E_Name, Dept_Id, Dept_Name, Salary)

Normalization (3NF)

• 𝑅1 = (E_Id, Course_Id, Date_Completed)

• 𝑅2 = (E_Id, E_Name, Dept_Id, Salary)

• 𝑅2.1 = (Dept_Id, Dept_Name)
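A rough SQL sketch of the final 3NF design for this example; the table names (DEPARTMENT, EMPLOYEE, EMPLOYEE_COURSE) and column types are assumed for illustration, while the keys follow the decomposition above.

CREATE TABLE DEPARTMENT (            -- R2.1
    Dept_Id    VARCHAR(10) PRIMARY KEY,
    Dept_Name  VARCHAR(50)
);

CREATE TABLE EMPLOYEE (              -- R2
    E_Id     VARCHAR(10) PRIMARY KEY,
    E_Name   VARCHAR(100),
    Dept_Id  VARCHAR(10),
    Salary   DECIMAL(10,2),
    FOREIGN KEY (Dept_Id) REFERENCES DEPARTMENT (Dept_Id)   -- removes the transitive dependency
);

CREATE TABLE EMPLOYEE_COURSE (       -- R1
    E_Id            VARCHAR(10),
    Course_Id       VARCHAR(10),
    Date_Completed  DATE,
    PRIMARY KEY (E_Id, Course_Id),
    FOREIGN KEY (E_Id) REFERENCES EMPLOYEE (E_Id)
);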


Case Study

Normalization
The data depends on the key [1NF], the whole key [2NF] and nothing but the key [3NF]
NIN       Contract No   Hours   Employee Name   Company ID   Company Location
616681B   SC1025        72      P. White        SC115        Belfast
674315A   SC1025        48      R. Press        SC115        Belfast
323113B   SC1026        24      P. Smith        SC23         Bangor
616681B   SC1026        24      P. White        SC23         Bangor

• An engineering consultancy firm supplies temporary specialized staff to bigger companies in the country to work on their projects for a certain amount of time. The table above lists the time spent by each of the company’s employees at other companies to carry out projects. The National Insurance Number (NIN) is unique for every member of staff.
– Explain which normal form this table is in
– Find the primary key for this relation and explain your choice.
– Normalise the table to 2NF
– Normalise the table(s) to 3NF
Case Study (Continued)
• The table is at least in 1NF. (To check for 2NF, the primary key needs to be defined; the candidate attributes are NIN, Contract No and Company ID.) We can choose {NIN, ContractNo} as the primary key because these two are the minimum attributes required to uniquely identify each row in the table, and they also better reflect the objective of the company, which is to keep a record of the staff supplied under a contract to another company.

• Given the PK {NIN, ContractNo}, each non-primary-key attribute needs to be checked for full functional dependency on the PK:

• Employee Name is fully functionally dependent on NIN (NIN → Employee Name). But since NIN is only part of the PK, Employee Name is not fully functionally dependent on the PK; therefore the relation is NOT in 2NF.

• Staff Contract (NIN, ContractNo, hours, EmployeeName, CompanyID, CompanyLocation)


Case Study (Continued)
• To normalise the table to 2NF, we need to remove the partial dependencies from the table and place them in new tables.
• StaffContract (NIN, ContractNo, hours)
• Staff Details (NIN, EmployeeName)
• ContractDetails (ContractNo, CompanyID, CompanyLocation)
Case Study (Continued)
• To normalise to 3NF, the transitive dependency (ContractNo → CompanyID → CompanyLocation) needs to be put in a new table with a copy of the determinant:
– The ContractDetails table is divided in two more tables:
– Contracts (ContractNo, CompanyID)
– Company (CompanyID, CompanyLocation)

• The previous tables are left the way they are because they are already in 3NF
• The final tables normalised to 3NF are:
– Staff Details (NIN, EmployeeName)
– StaffContract (NIN, ContractNo, hours)
– Contracts (ContractNo, CompanyID)
– Company (CompanyID, CompanyLocation)
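For illustration, the four final 3NF tables might be declared as follows; spaces in the table names are dropped and the column types are assumptions for this sketch.

CREATE TABLE StaffDetails (
    NIN           VARCHAR(10) PRIMARY KEY,
    EmployeeName  VARCHAR(100)
);

CREATE TABLE Company (
    CompanyID        VARCHAR(10) PRIMARY KEY,
    CompanyLocation  VARCHAR(50)
);

CREATE TABLE Contracts (
    ContractNo  VARCHAR(10) PRIMARY KEY,
    CompanyID   VARCHAR(10),
    FOREIGN KEY (CompanyID) REFERENCES Company (CompanyID)
);

CREATE TABLE StaffContract (
    NIN         VARCHAR(10),
    ContractNo  VARCHAR(10),
    Hours       INT,
    PRIMARY KEY (NIN, ContractNo),
    FOREIGN KEY (NIN) REFERENCES StaffDetails (NIN),
    FOREIGN KEY (ContractNo) REFERENCES Contracts (ContractNo)
);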
Transactions
Introduction
• Concurrency:
– Interleaved processing:
– Concurrent execution of processes is interleaved in a single CPU

– Parallel processing:
– Processes are concurrently executed in multiple CPUs

• A Transaction:
– Logical unit (set of operations) of database processing that includes one or
more access operations (read -retrieval, write - insert or update, delete).

• Transaction boundaries:
– Begin and End transaction.
– Why do we need boundaries?
– An application program may contain several transactions separated by the Begin
and End transaction boundaries.
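In SQL, the boundaries are marked explicitly with statements such as START TRANSACTION and COMMIT or ROLLBACK. A minimal sketch of one transaction follows; the ACCOUNT table, its columns, and the amounts are hypothetical.

START TRANSACTION;                                                -- begin transaction boundary

UPDATE ACCOUNT SET Balance = Balance - 100 WHERE Acc_No = 'A1';  -- write operation
UPDATE ACCOUNT SET Balance = Balance + 100 WHERE Acc_No = 'A2';  -- write operation

COMMIT;                                                           -- end transaction boundary
-- If anything goes wrong before COMMIT, ROLLBACK undoes both updates (atomicity).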
Introduction (Continued)
• Desirable Properties of Transactions (ACID)

Atomicity: A transaction is an atomic unit of processing; it is either performed in its entirety or not performed at all.

Consistency preservation: A correct execution of the transaction must take the database from one consistent state to another.

Isolation: A transaction should not make its updates visible to other transactions until it is committed. This solves the temporary update problem and makes cascading rollbacks of transactions unnecessary.

Durability or permanency: Once a transaction changes the database and the changes are committed, these changes must never be lost because of subsequent failure.
Introduction (Continued)
• Basic operations are read and write
– read_item(X): Reads a database item named X
into a program variable. To simplify our notation,
we assume that the program variable is also
named X.
– write_item(X): Writes the value of program
variable X into the database item named X.
Introduction (Continued)
• The basic unit of data transfer from the disk to the computer's main memory is one block. In general, a data item (what is read or written) will be the field of some record in the database, although it may be a larger unit such as a record or even a whole block.

• read_item(X) command includes the following steps:
– Find the address of the disk block that contains item X.
– Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
– Copy item X from the buffer to the program variable named X.

• write_item(X) command includes the following steps:


– Find the address of the disk block that contains item X.
– Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory
buffer).
– Copy item X from the program variable named X into its correct location in the buffer.
– Store the updated block from the buffer back to disk (either immediately or at some later point in time).

[Figure: item X is copied between a disk block, a buffer (block) in main memory, and the program variable X in program stack memory]


Concurrency Problems
The Lost Update: This occurs when two transactions that access the same database items have their operations interleaved in a way that makes the value of some database item incorrect.

The Temporary Update (or Dirty Read): This occurs when one transaction updates a database item and then the transaction fails for some reason. The updated item is accessed by another transaction before it is changed back to its original value.

The Incorrect Summary Problem: If one transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records, the aggregate function may calculate some values before they are updated and others after they are updated.

That’s why we need a concurrency control of some sort to solve/avoid these problems.
The Lost Update Problem
The temporary update problem
The incorrect summary problem
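The original slides illustrate these three problems with interleaved schedules that are not reproduced here. Purely as a sketch of the lost update problem, assuming a hypothetical FLIGHT table whose Seats value for flight 'X' is initially 90:

-- Session 1 (T1):
START TRANSACTION;
SELECT Seats FROM FLIGHT WHERE Flight_Id = 'X';           -- T1 reads 90

-- Session 2 (T2), interleaved:
START TRANSACTION;
SELECT Seats FROM FLIGHT WHERE Flight_Id = 'X';           -- T2 also reads 90

-- Session 1 (T1) computes 90 - 3 in the application and writes it back:
UPDATE FLIGHT SET Seats = 87 WHERE Flight_Id = 'X';
COMMIT;

-- Session 2 (T2) computes 90 + 2 from its stale read and writes it back:
UPDATE FLIGHT SET Seats = 92 WHERE Flight_Id = 'X';
COMMIT;   -- T1's update is lost; the correct final value would have been 89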
Why recovery is needed? (What causes a transaction to fail)
A computer failure (system crash): A hardware or software error occurs in the computer system during transaction execution. If the hardware crashes, the contents of the computer’s internal memory may be lost.

A transaction or system error: Some operation in the transaction may cause it to fail, such as integer overflow, division by zero, erroneous parameter values, or a logical programming error. In addition, the user may interrupt the transaction during its execution.

Local errors or exception conditions detected by the transaction: Certain conditions necessitate cancellation of the transaction. For example, data for the transaction may not be found. A condition, such as insufficient account balance in a banking database, may cause a transaction, such as a fund withdrawal from that account, to be canceled. A programmed abort in the transaction also causes it to fail.

Concurrency control enforcement: The concurrency control method may decide to abort the transaction, to be restarted later, because it violates serializability or because several transactions are in a state of deadlock.

Disk failure: Some disk blocks may lose their data because of a read or write malfunction or a disk read/write head crash. This may happen during a read or a write operation of the transaction.

Physical problems and catastrophes: This refers to an endless list of problems that includes power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and mounting of a wrong tape by the operator.
Transaction Concepts
• A transaction is an atomic unit of work that is either
completed in its entirety or not done at all.
– For recovery purposes, the system needs to keep track of
when the transaction starts, terminates, and commits or
aborts.

• Transaction states:
– Active state
– Partially committed state
– Committed state
– Failed state
– Terminated State
Transaction Concepts (Continued)
• The System Log or Journal: keeps track of all transaction operations
that affect the values of database items.
– This information may be needed to permit recovery from transaction failures.

• If the system crashes, we can recover to a consistent database state


by examining the log.
– Because the log contains a record of every write operation that changes the
value of some database item, it is possible to undo the effect of these write
operations of a transaction T by tracing backward through the log and
resetting all items changed by a write operation of T to their old_values.
– We can also redo the effect of the write operations of a transaction T by
tracing forward through the log and setting all items changed by a write
operation of T (that did not get done permanently) to their new_values.
Transaction Concepts (Continued)
• Commit Point:
– A transaction T reaches its commit point when all its operations that
access the database have been executed successfully and the
effect of all the transaction operations on the database has been
recorded in the log.
– Beyond the commit point, the transaction is said to be committed,
and its effect is assumed to be permanently recorded in the
database.
– The transaction then writes an entry [commit,T] into the log.

• Roll Back of transactions:


– Needed for transactions that have a [start_transaction,T] entry into the
log but no commit entry [commit,T] into the log.

• Force writing a log (before committing a transaction):


– Before a transaction reaches its commit point, any portion of the log
that has not been written to the disk yet must now be written to the
disk.
Transaction schedule
• Transaction schedule or history:
– When transactions are executing concurrently in an
interleaved fashion, the order of execution of operations
from the various transactions forms what is known as a
transaction schedule (or history).

• A schedule (or history) S of n transactions 𝑇1 , 𝑇2 , …, 𝑇𝑛 :


– It is an ordering of the operations of the transactions subject to the constraint that, for each transaction 𝑇𝑖 that participates in S, the operations of 𝑇𝑖 in S must appear in the same order in which they occur in 𝑇𝑖.
– Note, however, that operations from other transactions 𝑇𝑗
can be interleaved with the operations of 𝑇𝑖 in S.
Transaction schedule (Continued)
• Characterizing Schedules based on Recoverability:
Recoverable: One where no transaction needs to be rolled back. A schedule S is recoverable if no transaction T in S commits until all transactions T’ that have written an item that T reads have committed.

Cascadeless: One where every transaction reads only items that were written by committed transactions.

Requiring cascaded rollback: A schedule in which uncommitted transactions that read an item from a failed transaction must be rolled back.

Strict: A schedule in which a transaction can neither read nor write an item X until the last transaction that wrote X has committed.
Transaction schedule (Continued)
• Two operations in a schedule are said to conflict if they satisfy all
three of the following conditions:
– They belong to different transactions.
– They access the same item 𝒙.
– At least one of the operations is a write_item(𝒙).

• For example, in a schedule S, the operations 𝑟1(𝑥) and 𝑤2(𝑥) conflict, as do 𝑟2(𝑥) and 𝑤1(𝑥), and 𝑤1(𝑥) and 𝑤2(𝑥).

• However, the operations 𝑟1(𝑥) and 𝑟2(𝑥) do not conflict, since they are both read operations.

• The operations 𝑤2 (𝑥) and 𝑤1 (𝑦) do not conflict because they


operate on distinct data items 𝑥 and 𝑦.
Transaction schedule (Continued)
• Characterizing Schedules Based on Serializability:
Serial: A schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule; otherwise, the schedule is called a nonserial schedule.

Serializable: A schedule S is serializable if it is equivalent to some serial schedule of the same n transactions.

Result equivalent: Two schedules are called result equivalent if they produce the same final state of the database.

Conflict equivalent: Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules.

• Note: Being serializable (executed concurrently) is not the same as being serial (executed consecutively)
• Being serializable implies that the schedule is a correct schedule.
• It will leave the database in a consistent state.
• The interleaving is appropriate and will result in a state as if the transactions were serially executed,
yet will achieve efficiency due to concurrent execution.
Transaction schedule (Continued)
• Assume that the initial values of database items
are X = 90 and Y = 90 and that N = 3 and M = 2.

• After consecutively executing transactions 𝑇1


and 𝑇2 , we would expect the database values
to be X = 89 and Y = 93.

• Schedule C gives the results X = 92 and Y = 93,


in which the X value is erroneous. (Schedule C is
not a serializable Schedule)

• Schedule D gives the correct results; however, it executes transactions 𝑇1 and 𝑇2 concurrently. Hence, schedule D is a serializable schedule.
Client/Server Architectures
• File Server Architecture – FAT client: the client does extensive processing
• Two-Tier Database Server Architecture – thinner clients
• Three-Tier Architecture – thin/thinnest clients: the client does little processing
Client/Server Architectures (Continued)
File Server Architecture (FAT)
– Features: All processing is done at the PC that requested the data; entire files are transferred from the server to the client for processing.
– Disadvantages: Huge amount of data transfer on the network; each client must contain the full DBMS; heavy resource demand on clients.
– Advantages: N/A

Two-Tier Database Server Architecture (Thinner)
– Features: The client is responsible for I/O processing logic and some business rules logic; the DB server performs all data storage and access processing; the DBMS is only on the DB server; processing logic could be at the client, the server, or both.
– Disadvantages: N/A
– Advantages: Clients do not have to be as powerful; greatly reduces data traffic on the network; improved data integrity since it is all processed centrally.

Three-Tier Architecture (Thinnest)
– Features: Includes another layer (the application server) in addition to the two-tier client and database server layers; the client is responsible for I/O processing (browser); the DB server performs all data storage and access processing (DBMS); the application server is responsible for business rules (Web server); processing logic will be at the application server or Web server.
– Disadvantages: N/A
– Advantages: Scalability; technological flexibility; long-term cost reduction; better match of systems to business needs; improved customer service; competitive advantage; reduced risk.
Client/Server Architectures (Case Study 1)
• Consider course registration system where the students register a
set of courses at the beginning of each semester. The registration
process is done according to the business rules.
• We have three facilities:
– Faculty X (Public university), where the number of students exceeds 10,000
– Faculty Y (Private university), where the number of students does not exceed 1,000
– Faculty Z (Newly established university), where there is no estimated number of students

• Propose the suitable architecture for each university with


justification.
Client/Server Architectures (Case Study 1 Continued)
Faculty X (Public University) Three-Tier architecture (Thin Clients)
Faculty Y (Private university) Three-Tier architecture (Thin Clients)
Faculty Z (New established university) Three-Tier architecture (Thin Clients)

• Since some clients of such a system might be small devices like mobile phones:
– We cannot risk putting so much processing burden on them.
– The size of the data does not matter in this case.
Client/Server Architectures (Case Study 2)
• If you were designing a Web-based system to make airline reservations
and to sell airline tickets, which DBMS Architecture would you choose?
Why? Why would the other architectures not be a good choice?
– Three-Tier Client/Server Architecture for Web Application is the best choice.
The Client consists of Web User Interface. The Web Server contains the
application logic which includes all the rules and regulations related to the
reservation process and the issue of tickets; the Database Server contains the
DBMS.
– Two-Tier Client/Server Architecture would work if the business logic can reside on a server other than the DBMS server. In general, if the business logic were on the DBMS server, it would put an excessive burden on that server. If the business logic were to reside on the Web client, it would burden the communication network as well as a possibly thin client.
Middleware
• Software that allows an application to interoperate
with other software
• No need for programmer/user to understand
internal processing
• Accomplished via Application Program Interface
(API)

The “glue” that holds client/server applications together


Middleware (Continued)
Type Definition

Remote Procedure Calls (RPC) Client makes calls to procedures running on remote computers (synchronous and asynchronous)

Message-Oriented Middleware (MOM) Asynchronous calls between the client and server via message queues

Publish/Subscribe Push technology → server sends information to client when available

Object Request Broker (ORB) Object-oriented management of communications between clients and servers

SQL-oriented Data Access Middleware between applications and database servers

Database Middleware

ODBC–Open Database Connectivity Most DB vendors support this

OLE-DB Microsoft enhancement of ODBC

JDBC–Java Database Connectivity Special Java classes that allow Java applications/applets to connect to databases

Note: Application program interface in two tier architecture is provided by the ODBC–Open Database Connectivity
XML: Extensible
Markup Language
Introduction
• Unstructured data
– There is limited indication of the type of data in a document that contains information embedded within it
– Example: could be the information stored in a file

• Structured data
– Represented in a strict format
– Example: information stored in databases

• Semi-structured data
– Has a certain structure
– Not all information collected will have identical structure
– Example: could be the information stored in an XML document
Introduction (Continued)
• XML: Markup Language that enables you to create your own custom tags.
– Contain Self-describing data

– Mainly used for structuring/storing data and exchanging it across the Web.

– Can be used to provide more information about the structure and meaning of the data in the
Web pages rather than just specifying how the Web pages are formatted for display on the
screen like HTML.

• Elements (tags) and attributes (attributes provide additional information that describes elements)
– These are the main structuring concepts used to construct an XML document
• Complex elements: constructed from other elements hierarchically
• Simple elements: contain data values (text, numbers and symbols)
• XML tag names: describe the meaning of the data elements in the document
• Two XML query language standards: XPath and XQuery
Main types of XML documents
• Data-centric XML documents:
– These documents have many small data items that follow a specific structure, and hence may be extracted from a
structured database. They are formatted as XML documents in order to exchange them or display them over the
Web.

• Document-centric XML documents:


– These are documents with large amounts of text, such as news articles or books. There are few or no structured data elements in these documents.

• Hybrid XML documents:


– These documents may have parts that contains structured data and other parts that are mostly textual or
unstructured.

• Schema-less XML documents


– Do not follow a predefined schema of element names and corresponding tree structure

• Well-formed XML
– Has an XML declaration indicating the version of XML being used, as well as any other relevant attributes
– Every element must have a matching pair of start and end tags, nested within the start and end tags of its parent element
General Notes
Notes on Storage-Indexing
• Unordered Files (also called heap or pile files)
– New records are inserted at the end of the file.
– A linear search through the file records is necessary to search for a record.
– This requires reading and searching half the file blocks on the average, and is hence quite
expensive. Records insertion is quite efficient though.

• Ordered Files (also called sequential files)


– File records are kept sorted by the values of an ordering field.
– Insertion is expensive (records must be inserted in the correct order).
– A binary search can be used to search for a record on its ordering field value.
– This requires reading and searching log₂(b) of the file blocks on average, where b is the number of file blocks, an improvement over linear search.
Notes on Storage-Indexing (Continued)
• Block: Unit of transfer from hard disk to main memory. (It’s a whole package: you cannot transfer only the records you want from a block; you have to transfer the whole block.)

• Blocking: Refers to storing a number of records in one block on the disk.

• Blocking factor (bfr): refers to the number of records per block.

• Spanned Records: Refers to records that exceed the size of one or more blocks and hence span a number of blocks.

• File records can be unspanned or spanned

– Unspanned: no record can span two blocks

– Spanned: a record can be stored in more than one block

• Indexes can be characterized as dense or sparse

– A dense index has an index entry for every search key value (and hence every record) in the data file.

– A sparse (or nondense) index, on the other hand, has index entries for only some of the search values
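As a small worked illustration of these definitions (the block, record, and file sizes below are assumed numbers, not taken from the slides), the standard formulas for unspanned records are:

\[ \mathit{bfr} = \left\lfloor \frac{B}{R} \right\rfloor \qquad\text{and}\qquad b = \left\lceil \frac{r}{\mathit{bfr}} \right\rceil \]

For example, with block size B = 512 bytes and record size R = 100 bytes, bfr = ⌊512/100⌋ = 5 records per block, and a file of r = 1000 records needs b = ⌈1000/5⌉ = 200 blocks. A binary search on the ordering field then reads about log₂(200) ≈ 8 blocks, versus roughly 100 blocks on average for a linear search.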
Notes on Database Security
• Threats to databases (CIA)
– Loss of confidentiality
– Loss of integrity: if unauthorized changes are made to the data by either intentional or accidental acts
– Loss of availability

• To protect databases against these types of threats some countermeasures can be implemented such as:
– Access control (The security mechanism of a DBMS must include provisions for restricting access to the database as a whole
which is handled by creating user accounts and passwords to control login process by the DBMS.)
– Inference control
– Encryption

• Two types of database security mechanisms:


– Discretionary security mechanisms
– Mandatory security mechanisms

• A statistical database is a database used for providing statistical information or summaries based on
various criteria.
– The countermeasures to the statistical database security problem are called inference control measures.
Notes on Database Security (Continued)
• The database administrator (DBA) is the central authority for managing a database system.

• The DBA’s responsibilities include


– Granting/revoking privileges to users who need to use the system
– Classifying users and data in accordance with the policy of the organization

• The DBA is responsible for the overall security of the database system

• System log keeps track of all operations on the database that are applied by a certain user
throughout each login session.

• If any tampering with the database is suspected, a database audit is performed


– A database audit consists of reviewing the log to examine all accesses and operations applied to the
database during a certain time period.

• A database log that is used mainly for security purposes is sometimes called an audit trail
Notes on Database Security (Continued)
• The account level:
– At this level, the DBA specifies the particular privileges that each account holds independently of the relations
in the database.
– This includes the CREATE SCHEMA, CREATE TABLE, CREATE VIEW, ALTER, DROP, MODIFY, or SELECT privilege

• The relation level (or table level):


– At this level, the DBA can control the privilege to access each individual relation or view in the database.

• The owner (the account that was used when the relation was created in the first place) of a
relation is given all privileges on that relation.

• The owner account holder can pass privileges on any of the owned relations to other users by granting privileges to their accounts.

• Whenever the owner A of a relation R grants a privilege on R to another account B, privilege


can be given to B with or without the GRANT OPTION.
– If the GRANT OPTION is given, this means that B can also grant that privilege on R to other accounts.
Notes on Database Security (Continued)
• Suppose that Account_A1 wants to allow Account_A4 to update only the SALARY attribute of
EMPLOYEE.
– Account_A1 can issue:
– ` GRANT UPDATE ON EMPLOYEE (SALARY) TO Account_A4; `
– The UPDATE or INSERT privilege can specify particular attributes that may be updated or inserted in a relation.
– Other privileges (SELECT, DELETE) are not attribute specific.

• Suppose that Account_A1 wants to give to Account_A3 a limited capability to SELECT from the
EMPLOYEE relation and wants to allow Account_A3 to be able to propagate the privilege.
– The limitation is to retrieve only the NAME, BDATE, and ADDRESS attributes and only for the records with DNO > 5.

• Account_A1 then creates the view:


– ` CREATE VIEW A3EMPLOYEE AS SELECT NAME, BDATE, ADDRESS FROM EMPLOYEE WHERE DNO > 5; `

– After the view is created, Account_A1 can grant SELECT on the view A3EMPLOYEE to A3 as follows:
– ` GRANT SELECT ON A3EMPLOYEE TO Account_A3 WITH GRANT OPTION; `
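Privileges granted in this way can later be withdrawn; a minimal sketch of revoking the privilege granted above (exact cascading behaviour and syntax vary between systems):

REVOKE SELECT ON A3EMPLOYEE FROM Account_A3;
-- If Account_A3 had used its GRANT OPTION to pass SELECT on A3EMPLOYEE to other
-- accounts, the revocation is typically propagated (cascaded) to those accounts as well.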
Notes on SQL
• Assertions example:
– “The salary of an employee must not be greater than the salary of the
manager of the department that the employee works for’’
– ` CREATE ASSERTION SALARY_CONSTRAINT CHECK (NOT EXISTS (SELECT * FROM
EMPLOYEE E, EMPLOYEE M, DEPARTMENT D WHERE E.SALARY > M.SALARY AND
E.DNO = D.NUMBER AND D.MGRSSN=M.SSN)) `

• Triggers example:
– A trigger to compare an employee’s salary to his/her supervisor during insert
or update operations:
– ` CREATE TRIGGER INFORM_SUPERVISOR BEFORE INSERT OR UPDATE OF SALARY,
SUPERVISOR_SSN ON EMPLOYEE FOR EACH ROW WHEN (NEW.SALARY > (SELECT
SALARY FROM EMPLOYEE WHERE SSN=NEW.SUPERVISOR_SSN))
INFORM_SUPERVISOR (NEW.SUPERVISOR_SSN,NEW.SSN); `
Notes on SQL (Continued)
• Views example:
– Specify a different WORKS_ON table
– ` CREATE VIEW WORKS_ON_NEW AS SELECT FNAME, LNAME, PNAME, HOURS FROM EMPLOYEE, PROJECT, WORKS_ON WHERE SSN = ESSN AND PNO = PNUMBER; `

• View materialization: Involves physically creating and keeping a temporary table


(Convert the view which is a virtual table into an actual one)

• Views defined using groups and aggregate functions are not updateable

• Views defined on multiple tables using joins are generally not updateable

• Update on a single view without aggregate operations:


– Update may map to an update on the underlying base table

• Views involving joins:


– An update may map to an update on the underlying base relations (Not always possible)
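A minimal sketch of the first case (an update through a single-table view with no aggregates), reusing the EMPLOYEE attributes from the earlier examples; the view name and the values are hypothetical:

CREATE VIEW DEPT5_EMPS AS
    SELECT SSN, FNAME, LNAME, SALARY
    FROM EMPLOYEE
    WHERE DNO = 5;

-- The update on the view maps directly to an update on the base table:
UPDATE DEPT5_EMPS
SET SALARY = SALARY * 1.10
WHERE SSN = '123456789';
-- equivalent to: UPDATE EMPLOYEE SET SALARY = SALARY * 1.10
--                WHERE SSN = '123456789' AND DNO = 5;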
