Isolation (computer science)

In database systems, isolation is the property that changes made by one operation are
not visible to other concurrent operations on the system until that operation completes.
It is one of the ACID (Atomicity, Consistency, Isolation, Durability) properties.

The isolation property is the most often relaxed ACID property in a DBMS (Database
Management System). This is because to maintain the highest level of isolation a DBMS
must acquire locks on data, which may result in a loss of concurrency, or else implement
multiversion concurrency control, which may require additional application logic to
function correctly.
Most DBMSs offer a number of transaction isolation levels which control the degree of
locking which occurs when selecting data. For many database applications the majority
of database transactions can be constructed in such a way as to not require high isolation
levels, thus reducing the locking overhead for the system. The programmer must
carefully analyse database access code to ensure that any relaxation of isolation does not
cause difficult-to-find software bugs. Conversely, at higher isolation levels the possibility
of deadlock is increased, which also requires careful analysis and programming
techniques to avoid.
The isolation levels defined by the ANSI/ISO SQL standard are:

SERIALIZABLE

This isolation level specifies that all transactions occur in a completely isolated fashion;
i.e., as if all transactions in the system had executed serially, one after the other. The
DBMS may execute two or more transactions at the same time only if the illusion of
serial execution can be maintained. At this isolation level, phantom reads cannot occur.
With a lock-based concurrency control DBMS implementation, serializability requires
that range locks are acquired when a query uses a ranged WHERE clause. When using
non-lock concurrency control, no lock is acquired; however, if the system detects a
concurrent transaction in progress which would violate the serializability illusion, it must
force that transaction to roll back, and the application will have to restart the transaction.

REPEATABLE READ

No data record retrieved by a SELECT statement can be changed by another transaction;
however, if the SELECT statement contains any ranged WHERE clauses, phantom reads may
occur. At this isolation level the transaction acquires read locks on all retrieved data,
but does not acquire range locks.

READ COMMITTED

Data records retrieved by a query are not prevented from modification by some other
transaction. Non-repeatable reads may occur, meaning data retrieved in a SELECT
statement may be modified by some other transaction when it commits. In this isolation
level, read locks are acquired on selected data but they are released immediately whereas
write locks are released at the end of the transaction.

READ UNCOMMITTED

In this isolation level, dirty reads are allowed. One transaction may see uncommitted
changes made by some other transaction.
The default isolation level of different DBMSs varies quite widely. Most databases which
feature transactions allow the user to set any isolation level. Some DBMSs also require
additional syntax when performing a SELECT statement which is to acquire locks.

However, the definitions above have been criticised in the paper A Critique of ANSI
SQL Isolation Levels as being ambiguous, and as not accurately reflecting the isolation
provided by many databases:
    This paper shows a number of weaknesses in the anomaly approach to defining
    isolation levels. The three ANSI phenomena are ambiguous. Even their broadest
    interpretations do not exclude anomalous behavior. This leads to some
    counter-intuitive results. In particular, lock-based isolation levels have
    different characteristics than their ANSI equivalents. This is disconcerting
    because commercial database systems typically use locking. Additionally, the
    ANSI phenomena do not distinguish among several isolation levels popular in
    commercial systems.

Example Queries

In these examples two transactions take place. In the first transaction, Query 1 is
performed; then Query 2 is performed in the second transaction and that transaction is
committed; finally, Query 1 is performed again in the first transaction.
The queries use the following data table.
users

id   name   age
1    Joe    20
2    Jill   25

Phantom Reads

A phantom read occurs when, in the course of a transaction, two identical queries are
executed, and the collection of rows returned by the second query is different from the
first. This can occur when range locks are not acquired on performing a SELECT.
/* Transaction 1 */             | /* Transaction 2 */
                                |
/* Query 1 */                   |
SELECT * FROM users             |
WHERE age BETWEEN 10 AND 30;    |
                                |
                                | /* Query 2 */
                                | INSERT INTO users VALUES (3, 'Bob', 27);
                                | COMMIT;
                                |
/* Query 1 */                   |
SELECT * FROM users             |
WHERE age BETWEEN 10 AND 30;    |

Note that transaction 1 executed the same query twice. If the highest level of isolation
were maintained, the same set of rows should be returned both times, and indeed that is
what is mandated to occur in a database operating at the SQL SERIALIZABLE isolation
level. However, at the lesser isolation levels, a different set of rows may be returned the
second time.
In the SERIALIZABLE isolation mode, Query 1 would result in all records with age in
the range 10 to 30 being locked, thus Query 2 would block until the first transaction was
committed. In REPEATABLE READ mode, the range would not be locked, allowing the
record to be inserted and the second execution of Query 1 to return the new row in its
results.
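The difference between the two modes can be sketched with a toy lock table. This is a minimal illustration in Python, not how any particular DBMS implements locking; the class `LockManager` and its method names are hypothetical, and "blocking" is reduced to a yes/no check.

```python
class LockManager:
    """Toy lock table: row locks name individual ids, range locks cover an age interval."""

    def __init__(self):
        self.row_locks = set()    # (owner, row_id) pairs
        self.range_locks = []     # (owner, low, high) triples

    def lock_rows(self, owner, row_ids):
        self.row_locks.update((owner, r) for r in row_ids)

    def lock_range(self, owner, low, high):
        self.range_locks.append((owner, low, high))

    def insert_allowed(self, owner, age):
        """An insert must wait if another owner holds a range lock covering the new age."""
        return not any(o != owner and low <= age <= high
                       for o, low, high in self.range_locks)


users = {1: ("Joe", 20), 2: ("Jill", 25)}
matching = [i for i, (_, age) in users.items() if 10 <= age <= 30]

# REPEATABLE READ (lock-based): Transaction 1 locks only the rows it retrieved.
repeatable_read = LockManager()
repeatable_read.lock_rows("T1", matching)

# SERIALIZABLE (lock-based): Transaction 1 also locks the range in its WHERE clause.
serializable = LockManager()
serializable.lock_rows("T1", matching)
serializable.lock_range("T1", 10, 30)

rr_insert_ok = repeatable_read.insert_allowed("T2", 27)   # insert proceeds: phantom possible
ser_insert_ok = serializable.insert_allowed("T2", 27)     # insert must wait for T1
print(rr_insert_ok, ser_insert_ok)
```

Row locks alone say nothing about rows that do not exist yet, which is why the insert of Bob goes through in the first case.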

Non-repeatable Reads

In a lock-based concurrency control method, non-repeatable reads may occur when read
locks are not acquired when performing a SELECT. Under multiversion concurrency
control, non-repeatable reads may occur when the requirement that a transaction affected
by a commit conflict must roll back is relaxed.
/* Transaction 1 */                       | /* Transaction 2 */
                                          |
/* Query 1 */                             |
SELECT * FROM users WHERE id = 1;         |
                                          | /* Query 2 */
                                          | UPDATE users SET age = 21 WHERE id = 1;
                                          | COMMIT; /* in MVCC, or lock-based READ COMMITTED */
                                          |
/* Query 1 */                             |
SELECT * FROM users WHERE id = 1;         |
COMMIT; /* lock-based REPEATABLE READ */  |

In this example, Transaction 2 commits successfully, which means that its changes to the
row with id 1 should become visible. However, Transaction 1 has already seen a different
value for age in that row. At the SERIALIZABLE and REPEATABLE READ isolation
levels, the DBMS must return the old value. At READ COMMITTED and READ
UNCOMMITTED, the DBMS may return the updated value; this is a non-repeatable
read.
There are two basic strategies used to prevent non-repeatable reads. The first is to delay
the execution of Transaction 2 until Transaction 1 has committed or rolled back. This
method is used when locking is used, and produces the serial schedule T1, T2. A serial
schedule does not exhibit non-repeatable reads.
In the other strategy, which is used in multiversion concurrency control, Transaction 2 is
permitted to commit first, which provides for better concurrency. However, Transaction
1, which commenced prior to Transaction 2, must continue to operate on a past version of
the database, namely a snapshot taken at the moment it started. When Transaction 1
eventually tries to commit, the DBMS checks whether the result of committing
Transaction 1 would be equivalent to the schedule T1, T2. If it is, then Transaction 1 can
succeed. If it cannot be seen to be equivalent, however, Transaction 1 must roll back with
a serialization failure.
Using a lock-based concurrency control method, at the REPEATABLE READ isolation
mode, the row with ID = 1 would be locked, thus blocking Query 2 until the first
transaction was committed or rolled back. In READ COMMITTED mode the second
time Query 1 was executed the age would have changed.
Under MVCC, at the SERIALIZABLE isolation level, both SELECT queries see a
snapshot of the database taken at the start of Transaction 1. Therefore, they return the
same data. However, if Transaction 1 were then to attempt to UPDATE that row as well,
a serialization failure would occur and Transaction 1 would be forced to roll back.
At the READ COMMITTED isolation level, each query sees a snapshot of the database
taken at the start of each query. Therefore, they each see different data for the updated
row. No serialization failure is possible in this mode (because no promise of
serializability is made) and Transaction 1 will not have to be retried.
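The two snapshot rules described above can be sketched with a toy versioned store. This is only an illustration in Python under simplifying assumptions: the class `VersionedStore` is hypothetical, every commit copies the whole table, and conflict detection at commit time is left out.

```python
class VersionedStore:
    """Toy multiversion store: every commit appends a new version of the table."""

    def __init__(self, rows):
        self.versions = [dict(rows)]   # versions[i] is the committed state after i commits

    def commit(self, changes):
        new_version = dict(self.versions[-1])
        new_version.update(changes)
        self.versions.append(new_version)

    def latest(self):
        return len(self.versions) - 1

    def read(self, key, version):
        return self.versions[version][key]


store = VersionedStore({1: ("Joe", 20), 2: ("Jill", 25)})

# Transaction 1 starts and runs Query 1.
t1_snapshot = store.latest()
first_read = store.read(1, t1_snapshot)

# Transaction 2 runs Query 2 and commits.
store.commit({1: ("Joe", 21)})

# Transaction 1 runs Query 1 again.
serializable_reread = store.read(1, t1_snapshot)         # snapshot fixed at transaction start
read_committed_reread = store.read(1, store.latest())    # fresh snapshot for each query

print(first_read, serializable_reread, read_committed_reread)
```

With the snapshot pinned at transaction start the second read repeats the first; with a per-query snapshot it picks up the committed update.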

READ UNCOMMITTED (Dirty Read)

A dirty read occurs when a transaction reads data from a row that has been modified by
another transaction, but not yet committed.
Dirty reads work similarly to non-repeatable reads; however, the second transaction would
not need to be committed for the first query to return a different result. The only thing
prevented in the READ UNCOMMITTED mode is that updates will not appear in the
results out of order; that is, earlier updates will always appear in a result set before later
updates.[verification needed]

Transactions are motivated by two of the properties of DBMS's discussed way back in
our first lecture:
• Multi-user database access
• Safe from system crashes

Multi-user database access

Most database systems run as servers where either:


1. multiple clients are simultaneously operating on the same database, or
2. one or more middle-tier application servers are maintaining multiple concurrent
connections to the database

Problems created by concurrency => need concurrency control

Example 1: Attribute-level inconsistency

Client 1 - UPDATE Student
           SET address = CONCAT(address, :zip) WHERE ID = 123

Client 2 - UPDATE Student
           SET address = CONCAT(address, :phone) WHERE ID = 123

For each client, DBMS reads address value, updates it, and writes it back. Possible
outcomes without concurrency control: both changes are applied, or only one is (the
other client's write is lost).
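The two outcomes of Example 1 can be replayed deterministically. A minimal sketch in Python, with the DBMS replaced by a plain dictionary; the function `run`, the step names "read" and "write", and the sample address, zip and phone values are all made up for illustration.

```python
def run(schedule):
    """Replay a schedule of (client, step) pairs against one address value."""
    db = {"address": "12 Main St"}                    # stand-in for Student row 123
    suffix = {"c1": " 94305", "c2": " 555-0100"}      # hypothetical :zip and :phone values
    local = {}                                        # each client's private copy of the value
    for client, step in schedule:
        if step == "read":
            local[client] = db["address"]
        else:
            # "write": store the value computed from this client's earlier read
            db["address"] = local[client] + suffix[client]
    return db["address"]


# Client 1 finishes before Client 2 starts: both changes survive.
serial = run([("c1", "read"), ("c1", "write"), ("c2", "read"), ("c2", "write")])

# Both clients read before either writes: Client 1's change is overwritten.
interleaved = run([("c1", "read"), ("c2", "read"), ("c1", "write"), ("c2", "write")])

print(serial)
print(interleaved)
```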

Example 2: Relation-level inconsistency

Client 1 - UPDATE Apply SET decision = 'Y'
           WHERE location = 'SB'
           AND ID IN (SELECT ID FROM Student WHERE GPA > 3.2)

Client 2 - UPDATE Student
           SET GPA = 1.2 * GPA
           WHERE HSname = 'Paly'

Possible outcomes without concurrency control: some Paly students get into SB on scaled
GPA, others don't.

Example 3: Multiple-statement level inconsistency

Client 1 - INSERT INTO Archive (SELECT * FROM Apply WHERE decision = 'N')
           DELETE FROM Apply WHERE decision = 'N'

Client 2 - SELECT COUNT(*) FROM Apply
           SELECT COUNT(*) FROM Archive

Overall goal:
• Want to be able to execute a sequence of SQL statements so they at least appear
  to be running in isolation.
• But also want to enable concurrency whenever possible.

Question: Why not just execute everything in sequence? What system, database, or
application features give us inherent concurrency that we want to exploit?

Safe from system crashes

• Bulk-loading the database, system crashes in the middle - what now?
• Example 3 Client 1:
      INSERT INTO Archive (SELECT * FROM Apply WHERE decision = 'N')
      DELETE FROM Apply WHERE decision = 'N'
  System crashes in the middle - what now?
• Performed lots of update operations and DBMS is buffering the database in
  memory for efficiency. System crashes - what now?

Need crash recovery


Solution: Transactions

A transaction is a sequence of one or more SQL operations treated as a unit.

• Transactions appear to run in isolation.
• If the system crashes, each transaction's changes are reflected in the persistent
  database either entirely or not at all.

SQL standard (and Oracle):

• Transaction begins automatically when first SQL command is issued.
• Transaction ends (and new one begins) when "COMMIT" command is issued or
  session ends.
• Alternative "AUTO COMMIT" mode turns each statement into a transaction.

Transaction properties

Transactions obey the ACID properties: Atomicity, Consistency, Isolation, Durability

(1) Isolation

Isolation obtained through serializability: operations within transactions may be
interleaved but execution must be equivalent to some sequential (serial) order.
Question: How is this guarantee achieved?
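One standard way to test "equivalent to some serial order" is conflict serializability: draw an edge from transaction A to transaction B whenever an operation of A conflicts with a later operation of B, and accept the schedule if the graph has no cycle. The sketch below, in Python, is a checker only and does not show how a DBMS enforces the property; the function name and the (txn, op, item) encoding are assumptions made for illustration. Conflict serializability is a sufficient condition, so some serializable schedules are rejected by it.

```python
def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) tuples with op either 'r' or 'w'."""
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            # Two operations conflict if they come from different transactions,
            # touch the same item, and at least one of them is a write.
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))

    # The precedence graph is acyclic iff we can keep removing nodes
    # that have no incoming edge from a node still in the graph.
    nodes = {t for t, _, _ in schedule}
    while nodes:
        sources = {n for n in nodes
                   if not any(b == n and a in nodes for a, b in edges)}
        if not sources:
            return False    # every remaining node has an incoming edge: a cycle
        nodes -= sources
    return True


# Example 1 as read/write steps on one item.
serial = [("T1", "r", "addr"), ("T1", "w", "addr"),
          ("T2", "r", "addr"), ("T2", "w", "addr")]
lost_update = [("T1", "r", "addr"), ("T2", "r", "addr"),
               ("T1", "w", "addr"), ("T2", "w", "addr")]

serial_ok = conflict_serializable(serial)
lost_update_ok = conflict_serializable(lost_update)
print(serial_ok, lost_update_ok)
```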

Solves Examples 1,2,3 above


Example 4 (variant on 3):

Client 1 - INSERT INTO Archive (SELECT * FROM Apply WHERE decision = 'N')
DELETE FROM Apply WHERE decision = 'N'

Client 2 - UPDATE Apply SET decision = 'U' WHERE Campus = 'Irvine'


Serialization order can make a big difference. This is the application's problem to solve,
not the DBMS's.

(2) Durability

If system crashes after transaction commits, all effects of transaction remain in database.
Question: Seems obvious, but all DBMS's manipulate the data in memory, so how is this
guarantee achieved?

(3) Atomicity

Each transaction's operations are executed all-or-nothing, never left "half done."
• E.g., If system crashes before transaction commits, no effects of transaction
  remain in database - transaction can start over when system comes back up.
• E.g., If error or exception occurs during a transaction, partial effects of the
  transaction are undone.
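The second case can be tried out with Python's built-in sqlite3 module. This is a small sketch rather than a recipe: the two-column Apply and Archive tables, the helper names, and the simulated failure are assumptions for illustration. It relies on the documented behaviour that using a sqlite3 connection as a context manager commits on success and rolls back when an exception escapes the block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Apply (ID INTEGER, decision TEXT)")
conn.execute("CREATE TABLE Archive (ID INTEGER, decision TEXT)")
conn.executemany("INSERT INTO Apply VALUES (?, ?)", [(1, "N"), (2, "Y"), (3, "N")])
conn.commit()


def archive_rejected(conn, fail_midway=False):
    """Client 1 of Example 3, run as a single transaction."""
    with conn:   # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO Archive SELECT * FROM Apply WHERE decision = 'N'")
        if fail_midway:
            raise RuntimeError("simulated failure between the two statements")
        conn.execute("DELETE FROM Apply WHERE decision = 'N'")


def counts(conn):
    apply_n = conn.execute("SELECT COUNT(*) FROM Apply").fetchone()[0]
    archive_n = conn.execute("SELECT COUNT(*) FROM Archive").fetchone()[0]
    return apply_n, archive_n


try:
    archive_rejected(conn, fail_midway=True)
except RuntimeError:
    pass
after_failure = counts(conn)    # the half-done INSERT was undone

archive_rejected(conn)
after_success = counts(conn)    # both statements took effect together

print(after_failure, after_success)
```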

Question: How is this guarantee achieved?

"Transaction rollback" = "transaction abort"

• Undoes partial effects of a transaction
• May be system-initiated or client-initiated

A robust application wraps every transaction in an exception handler for system-initiated rollback.
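One common shape for such a wrapper is a retry loop. The sketch below is in Python and uses no real database driver: `SerializationFailure`, `run_transaction` and `flaky_body` are hypothetical names, and a real application would catch whatever error its driver raises on a system-initiated rollback.

```python
class SerializationFailure(Exception):
    """Hypothetical stand-in for the error raised on a system-initiated rollback."""


def run_transaction(body, max_attempts=3):
    """Run body(attempt); retry from the start if the system rolled it back."""
    for attempt in range(1, max_attempts + 1):
        try:
            return body(attempt)
        except SerializationFailure:
            if attempt == max_attempts:
                raise    # give up and let the caller decide what to do


attempts_seen = []


def flaky_body(attempt):
    """Pretend the DBMS aborts the first two attempts."""
    attempts_seen.append(attempt)
    if attempt < 3:
        raise SerializationFailure("rolled back by the system")
    return "committed"


result = run_transaction(flaky_body)
print(result, attempts_seen)
```

The transaction body has to be safe to run again from the beginning, since a rollback discards everything it did in the database.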
Client-initiated rollback:

BEGIN TRANSACTION;
<get input from user>
SQL commands based on input
<confirm results with user>
IF input = confirm-OK THEN COMMIT; ELSE ROLLBACK;
Note: Rollback only undoes database changes, not other changes (e.g., program
variables) or side-effects (e.g., printing to screen, delivering cash).
Question: No self-respecting database programmer would write the above transaction.
Why?
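The note about rollback and side-effects can be checked with Python's built-in sqlite3 module. A minimal sketch; the table contents and the `deleted_log` list are made up for illustration, and it relies on sqlite3's default behaviour of opening a transaction implicitly before a DELETE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Apply (ID INTEGER, decision TEXT)")
conn.executemany("INSERT INTO Apply VALUES (?, ?)", [(1, "N"), (2, "Y")])
conn.commit()

deleted_log = []    # program state that lives outside the database

conn.execute("DELETE FROM Apply WHERE decision = 'N'")    # starts a transaction implicitly
deleted_log.append("deleted rejected applications")        # side-effect in the program
conn.rollback()                                            # client-initiated rollback

remaining = conn.execute("SELECT COUNT(*) FROM Apply").fetchone()[0]
print(remaining, deleted_log)
```

The deleted row is back after the rollback, but the entry appended to `deleted_log` is still there.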

(4) Consistency
Not really a property, more a good application of the other properties.
Idea: Assume all constraints are true at the start of every transaction. Clients are to
guarantee, under this assumption and isolation, that all constraints are still true at the end
of every transaction. (Similar to program invariants)

Read-only transactions

Can tell system a transaction will not perform writes, system will optimize accordingly.
"SET TRANSACTION READ ONLY"
Many, many transactions and applications fall into this category.
Question: If there are five read-only transactions and no other transactions, what does
the system need to do to guarantee serializability?

Weaker properties

There's a lot of overhead and concurrency reduction to guaranteeing the ACID properties.
Sometimes full isolation (i.e., full serializability) is not required.
Three weaker isolation levels:
• repeatable read
• read committed
• read uncommitted

Note: An isolation level is in the eye of the beholder. Specifically, the reads performed
by a transaction must adhere to its own isolation level.

Dirty reads

A data item is "dirty" if it has been written by an uncommitted transaction.

Example 5:

Client 1 - BEGIN TRANSACTION;
           ...
           UPDATE Student SET GPA = .99 * GPA
           ...
           COMMIT;

Client 2 - BEGIN TRANSACTION;
           ...
           SELECT AVG(GPA) FROM Student
           ...
           COMMIT;

Client 2 may only care about approximate average - dirty reads okay. Use:
"SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED"
Note: Isolation level of Client 1 transaction is irrelevant unless another transaction
updates GPA.
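To see why the dirty average is only approximate, suppose Client 1 has so far updated one student's row and has not committed. A toy sketch in Python; the student names, the GPA values, and the split into `committed` and `uncommitted` dictionaries are assumptions for illustration and do not reflect how a DBMS stores pending writes.

```python
committed = {"alice": 4.0, "bob": 3.0}     # last committed GPAs
uncommitted = {"alice": 4.0 * 0.99}        # Client 1's pending write, not yet committed


def avg_gpa(level):
    """Average GPA as seen by Client 2 at the given isolation level."""
    if level == "READ UNCOMMITTED":
        view = {**committed, **uncommitted}    # dirty data is visible
    else:
        view = committed                       # only committed data is visible
    return sum(view.values()) / len(view)


dirty_avg = avg_gpa("READ UNCOMMITTED")
clean_avg = avg_gpa("READ COMMITTED")
print(dirty_avg, clean_avg)
```

The dirty result mixes one scaled GPA with one unscaled GPA, a state that never exists in the committed database.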
Committed reads

"SET TRANSACTION ISOLATION LEVEL READ COMMITTED"

• Cannot read uncommitted writes
• But still doesn't guarantee global serializability even if all other transactions have
  isolation level SERIALIZABLE.

Modified Example 5:

Client 1 - BEGIN TRANSACTION; // serializable
           ...
           UPDATE Student SET GPA = .99 * GPA
           ...
           COMMIT;

Client 2 - SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
           BEGIN TRANSACTION;
           ...
           SELECT AVG(GPA) FROM Student // executes before Client 1
           ...
           SELECT MAX(GPA) FROM Student // executes after Client 1
           ...
           COMMIT;

Repeatable read

"SET TRANSACTION ISOLATION LEVEL REPEATABLE READ"


Modified Example 5: get same GPAs both times
=> But still doesn't guarantee global serializability!

Example 6:

Client 1 - BEGIN TRANSACTION; // serializable
           ...
           UPDATE Student SET GPA = .99 * GPA
           UPDATE Student SET SAT = 1.01 * SAT
           ...
           COMMIT;

Client 2 - SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
           BEGIN TRANSACTION;
           ...
           SELECT AVG(GPA) FROM Student // executes before Client 1
           ...
           SELECT AVG(SAT) FROM Student // executes after Client 1
           ...
           COMMIT;

The following example is more realistic for repeatable read but not globally serializable,
and is based on the fact that repeatable read does not apply to inserted tuples.

Example 7:

Client 1 - BEGIN TRANSACTION; // serializable
           ...
           INSERT INTO Student <100 new students>
           ...
           COMMIT;

Client 2 - SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
           BEGIN TRANSACTION;
           ...
           SELECT AVG(GPA) FROM Student // executes before Client 1
           ...
           SELECT MAX(GPA) FROM Student // executes after Client 1
           ...
           COMMIT;

New inserted tuples are called phantoms.

Question: What isolation level do you think Oracle supports as a default?

Summary

Standard default: transactions are serializable


Weaker isolation levels increase performance by eliminating overhead and increasing
concurrency. From weakest to strongest and the read behaviors they permit:

isolation level     dirty reads   nonrepeatable reads   phantoms

read uncommitted    Y             Y                     Y
read committed      N             Y                     Y
repeatable read     N             N                     Y
serializable        N             N                     N
Remember that the isolation level is in the eye of the beholding transaction: For true
global serializability, every transaction must have isolation level SERIALIZABLE.
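The table can also be read as a lookup: given the read behaviors a transaction cannot tolerate, pick the weakest level that rules them all out. A small sketch in Python; the names `LEVELS` and `weakest_level_forbidding` are made up for illustration, and the data is only the table above, not any vendor's actual behavior.

```python
# Weakest to strongest, each with the read behaviors it permits (from the table).
LEVELS = [
    ("read uncommitted", {"dirty read", "nonrepeatable read", "phantom"}),
    ("read committed", {"nonrepeatable read", "phantom"}),
    ("repeatable read", {"phantom"}),
    ("serializable", set()),
]


def weakest_level_forbidding(anomalies):
    """Return the weakest isolation level that permits none of the given anomalies."""
    unwanted = set(anomalies)
    for name, permitted in LEVELS:
        if not permitted & unwanted:
            return name
    # Not reached with the table above: serializable permits nothing.


no_dirty = weakest_level_forbidding({"dirty read"})
no_phantoms = weakest_level_forbidding({"phantom"})
anything_goes = weakest_level_forbidding(set())
print(no_dirty, no_phantoms, anything_goes)
```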
