In database systems, isolation is the property that changes made by one operation are not
visible to other concurrent operations until that operation completes. It is one of
the ACID (Atomicity, Consistency, Isolation, Durability) properties.
Contents
1 Isolation Levels
o 1.1 SERIALIZABLE
o 1.2 REPEATABLE READ
o 1.3 READ COMMITTED
o 1.4 READ UNCOMMITTED
2 Example Queries
o 2.1 Phantom Reads
o 2.2 Non-repeatable Reads
o 2.3 READ UNCOMMITTED (Dirty Read)
3 See also
Isolation Levels
The isolation property is the most often relaxed ACID property in a DBMS (Database
Management System). This is because to maintain the highest level of isolation a DBMS
must acquire locks on data, which may result in a loss of concurrency, or else implement
multiversion concurrency control, which may require additional application logic to
function correctly.
Most DBMSs offer a number of transaction isolation levels which control the degree of
locking which occurs when selecting data. For many database applications the majority
of database transactions can be constructed in such a way as to not require high isolation
levels, thus reducing the locking overhead for the system. The programmer must
carefully analyse database access code to ensure that any relaxation of isolation does not
cause difficult-to-find software bugs. Conversely, at higher isolation levels the possibility
of deadlock is increased, which also requires careful analysis and programming
techniques to avoid.
The isolation levels defined by the ANSI/ISO SQL standard are:
SERIALIZABLE
This isolation level specifies that all transactions occur in a completely isolated fashion;
i.e., as if all transactions in the system had executed serially, one after the other. The
DBMS may execute two or more transactions at the same time only if the illusion of
serial execution can be maintained. At this isolation level, phantom reads cannot occur.
With a lock-based concurrency control DBMS implementation, serializability requires
that range locks are acquired when a query uses a ranged WHERE clause. When using
non-lock-based concurrency control, no lock is acquired; however, if the system detects a
concurrent transaction in progress which would violate the serializability illusion, it must
force that transaction to roll back, and the application will have to restart the transaction.
REPEATABLE READ
Data records retrieved by a SELECT statement cannot be changed by other transactions
until the reading transaction ends; however, if the SELECT statement contains any ranged
WHERE clauses, phantom reads may occur. At this isolation level the transaction acquires
read locks on all retrieved data and holds them until it ends, but does not acquire range locks.
READ COMMITTED
Data records retrieved by a query are not protected from modification by other
transactions. Non-repeatable reads may occur: data retrieved by a SELECT statement
may be modified by another transaction once that transaction commits. At this isolation
level, read locks are acquired on selected data but released immediately, whereas
write locks are held until the end of the transaction.
READ UNCOMMITTED
At this isolation level, dirty reads are allowed: one transaction may see uncommitted
changes made by another transaction.
The default isolation level of different DBMSs varies quite widely. Most databases which
feature transactions allow the user to set any isolation level. Some DBMSs also require
additional syntax when performing a SELECT statement which is to acquire locks.
However, the definitions above have been criticised in the paper A Critique of ANSI
SQL Isolation Levels as being ambiguous, and as not accurately reflecting the isolation
provided by many databases:
This paper shows a number of weaknesses in the anomaly approach to defining
isolation levels. The three ANSI phenomena are ambiguous. Even their broadest
interpretations do not exclude anomalous behavior. This leads to some counter-
intuitive results. In particular, lock-based isolation levels have different
characteristics than their ANSI equivalents. This is disconcerting because
commercial database systems typically use locking. Additionally, the ANSI
phenomena do not distinguish among several isolation levels popular in
commercial systems.
Example Queries
In these examples two transactions take place. Query 1 is performed in the first
transaction; Query 2 is then performed in the second transaction, which commits;
finally, Query 1 is performed again in the first transaction.
The queries use the following data table.
users
id  name  age
1   Joe   20
2   Jill  25
Phantom Reads
A phantom read occurs when, in the course of a transaction, two identical queries are
executed, and the collection of rows returned by the second query is different from the
first. This can occur when range locks are not acquired on performing a SELECT.
/* Transaction 1 */ | /* Transaction 2 */
|
/* Query 1 */ |
SELECT * FROM users |
WHERE age BETWEEN 10 AND 30; |
|
|
| /* Query 2 */
| INSERT INTO users VALUES ( 3, 'Bob', 27 );
| COMMIT;
|
/* Query 1 */ |
SELECT * FROM users |
WHERE age BETWEEN 10 AND 30; |
Note that transaction 1 executed the same query twice. If the highest level of isolation
were maintained, the same set of rows should be returned both times, and indeed that is
what is mandated to occur in a database operating at the SQL SERIALIZABLE isolation
level. However, at the lesser isolation levels, a different set of rows may be returned the
second time.
In the SERIALIZABLE isolation mode, Query 1 would result in all records with age in
the range 10 to 30 being locked, thus Query 2 would block until the first transaction was
committed. In REPEATABLE READ mode, the range would not be locked, allowing the
record to be inserted and the second execution of Query 1 to return the new row in its
results.
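The blocking behaviour described for the lock-based SERIALIZABLE case can be observed with a rough sketch using Python's standard sqlite3 module. This is only a stand-in: SQLite in its default rollback-journal mode locks the whole database file rather than a key range, so it is coarser than the range locks discussed above. The function name run_lock_demo and the scratch file path are hypothetical.

```python
import os
import sqlite3
import tempfile

QUERY_1 = "SELECT * FROM users WHERE age BETWEEN 10 AND 30"
QUERY_2 = "INSERT INTO users VALUES (3, 'Bob', 27)"


def run_lock_demo():
    """Return (rows in 1st read, insert blocked?, rows in 2nd read, rows after T1 commits)."""
    path = os.path.join(tempfile.mkdtemp(), "users.db")  # hypothetical scratch file

    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    # timeout=0: a blocked statement raises at once instead of waiting.
    t1 = sqlite3.connect(path, isolation_level=None, timeout=0)
    t2 = sqlite3.connect(path, isolation_level=None, timeout=0)
    t1.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    t1.execute("INSERT INTO users VALUES (1, 'Joe', 20), (2, 'Jill', 25)")

    t1.execute("BEGIN")
    first = t1.execute(QUERY_1).fetchall()  # Transaction 1 now holds a read lock

    try:
        t2.execute(QUERY_2)  # autocommit, so it has to commit immediately
        blocked = False
    except sqlite3.OperationalError:  # SQLite reports "database is locked"
        blocked = True

    second = t1.execute(QUERY_1).fetchall()  # same rows: no phantom
    t1.execute("COMMIT")  # releases the read lock

    t2.execute(QUERY_2)  # the retried insert now succeeds
    after = t1.execute(QUERY_1).fetchall()

    t1.close()
    t2.close()
    return len(first), blocked, len(second), len(after)
```

Calling run_lock_demo() reports the row counts seen by Transaction 1 and whether Query 2 was refused while Transaction 1 was still open. A real lock-based DBMS would normally make Query 2 wait rather than fail; the zero timeout here is only a way to make the block visible.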
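Non-repeatable Reads
A non-repeatable read occurs when, in the course of a transaction, a row is retrieved
twice and the values within the row differ between reads.
In a lock-based concurrency control method, non-repeatable reads may occur when read
locks are not acquired when performing a SELECT. Under multiversion concurrency
control, non-repeatable reads may occur when the requirement that a transaction affected
by a commit conflict must roll back is relaxed.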
/* Transaction 1 */                      | /* Transaction 2 */
                                         |
/* Query 1 */                            |
SELECT * FROM users WHERE id = 1;        |
                                         |
                                         | /* Query 2 */
                                         | UPDATE users SET age = 21 WHERE id = 1;
                                         | COMMIT; /* in MVCC, or lock-based READ COMMITTED */
                                         |
/* Query 1 */                            |
SELECT * FROM users WHERE id = 1;        |
COMMIT; /* lock-based REPEATABLE READ */ |
In this example, Transaction 2 commits successfully, which means that its changes to the
row with id 1 should become visible. However, Transaction 1 has already seen a different
value for age in that row. At the SERIALIZABLE and REPEATABLE READ isolation
levels, the DBMS must return the old value. At READ COMMITTED and READ
UNCOMMITTED, the DBMS may return the updated value; this is a non-repeatable
read.
There are two basic strategies used to prevent non-repeatable reads. The first is to delay
the execution of Transaction 2 until Transaction 1 has committed or rolled back. This
method is used when locking is used, and produces the serial schedule T1, T2. A serial
schedule does not exhibit non-repeatable reads.
In the other strategy, which is used in multiversion concurrency control, Transaction 2 is
permitted to commit first, which provides for better concurrency. However, Transaction
1, which commenced prior to Transaction 2, must continue to operate on a past version of
the database: a snapshot taken at the moment it started. When Transaction 1 eventually
tries to commit, the DBMS checks whether the result of committing Transaction 1 would
be equivalent to the schedule T1, T2. If it is, then Transaction 1 can succeed. If it cannot
be seen to be equivalent, however, Transaction 1 must roll back with a serialization
failure.
Using a lock-based concurrency control method, at the REPEATABLE READ isolation
mode, the row with ID = 1 would be locked, thus blocking Query 2 until the first
transaction was committed or rolled back. In READ COMMITTED mode the second
time Query 1 was executed the age would have changed.
Under MVCC, at the SERIALIZABLE isolation level, both SELECT queries see a
snapshot of the database taken at the start of Transaction 1. Therefore, they return the
same data. However, if Transaction 1 were then to attempt to UPDATE that row as well,
a serialization failure would occur and Transaction 1 would be forced to roll back.
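The snapshot behaviour just described can be approximated with Python's sqlite3 module by switching SQLite to write-ahead-log (WAL) mode, where a reader keeps the snapshot from its first read while a writer commits alongside it. This is a sketch under that assumption, not a model of any particular MVCC DBMS: SQLite reports the conflict as a generic "database is locked" error rather than a named serialization failure. The name run_snapshot_demo and the scratch file are hypothetical.

```python
import os
import sqlite3
import tempfile

QUERY_1 = "SELECT age FROM users WHERE id = 1"


def run_snapshot_demo():
    """Return (1st read, 2nd read, T1 write rejected?, read after T1 ends)."""
    path = os.path.join(tempfile.mkdtemp(), "users.db")  # hypothetical scratch file

    t1 = sqlite3.connect(path, isolation_level=None, timeout=0)
    t1.execute("PRAGMA journal_mode=WAL").fetchall()
    t1.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    t1.execute("INSERT INTO users VALUES (1, 'Joe', 20), (2, 'Jill', 25)")
    t2 = sqlite3.connect(path, isolation_level=None)

    t1.execute("BEGIN")
    first = t1.execute(QUERY_1).fetchall()[0][0]  # snapshot is fixed here

    # Transaction 2 (autocommit) updates the row and commits at once.
    t2.execute("UPDATE users SET age = 21 WHERE id = 1")

    second = t1.execute(QUERY_1).fetchall()[0][0]  # still the old value

    try:
        # Transaction 1 now tries to write on top of its stale snapshot.
        t1.execute("UPDATE users SET age = 22 WHERE id = 1")
        rejected = False
    except sqlite3.OperationalError:
        rejected = True
    if t1.in_transaction:
        t1.execute("ROLLBACK")  # the application would restart from here

    third = t1.execute(QUERY_1).fetchall()[0][0]  # a fresh read sees T2's commit

    t1.close()
    t2.close()
    return first, second, rejected, third
```

The two reads inside Transaction 1 return the same age even though Transaction 2 committed between them, and the later write attempt is refused, which mirrors the retry obligation described above.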
At the READ COMMITTED isolation level, each query sees a snapshot of the database
taken at the start of each query. Therefore, they each see different data for the updated
row. No serialization failure is possible in this mode (because no promise of
serializability is made) and Transaction 1 will not have to be retried.
READ UNCOMMITTED (Dirty Read)
A dirty read occurs when a transaction reads data from a row that has been modified by
another transaction but not yet committed.
Dirty reads work similarly to non-repeatable reads; however, the second transaction does
not need to commit for the first transaction's query to return a different result. The only
thing prevented in READ UNCOMMITTED mode is updates appearing out of order in the
results; that is, earlier updates always appear in a result set before later updates.
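A dirty read can be reproduced in a small sketch with Python's sqlite3 module, assuming the SQLite build supports shared-cache mode and its read_uncommitted pragma (both are SQLite-specific features, not part of the SQL standard). The database name isolation_demo and the function name run_dirty_read_demo are hypothetical.

```python
import sqlite3

QUERY_1 = "SELECT age FROM users WHERE id = 1"


def run_dirty_read_demo():
    """Return (age before, age seen while T2 is uncommitted, age after T2 rolls back)."""
    # Two connections sharing one in-memory database and one page cache.
    uri = "file:isolation_demo?mode=memory&cache=shared"
    t1 = sqlite3.connect(uri, uri=True, isolation_level=None)
    t2 = sqlite3.connect(uri, uri=True, isolation_level=None)

    t1.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    t1.execute("INSERT INTO users VALUES (1, 'Joe', 20), (2, 'Jill', 25)")

    # Ask SQLite to let this connection read without taking table read locks.
    t1.execute("PRAGMA read_uncommitted = 1")
    before = t1.execute(QUERY_1).fetchall()[0][0]

    t2.execute("BEGIN")
    t2.execute("UPDATE users SET age = 21 WHERE id = 1")  # not committed

    dirty = t1.execute(QUERY_1).fetchall()[0][0]  # sees the uncommitted change

    t2.execute("ROLLBACK")  # the change never officially existed
    after = t1.execute(QUERY_1).fetchall()[0][0]

    t1.close()
    t2.close()
    return before, dirty, after
```

The middle value is the dirty read: Transaction 1 observes an age that Transaction 2 later rolls back, so it acted on data that was never committed.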
Transactions are motivated by two of the properties of DBMSs discussed way back in
our first lecture:
Multi-user database access
Safe from system crashes
For each client, DBMS reads address value, updates it, and writes it back. Possible
outcomes without concurrency control: one change or both.
Possible outcomes without concurrency control: some Paly students get into SB on scaled
GPA, others don't.
Client 1 - INSERT INTO Archive (SELECT * FROM Apply WHERE decision = 'N');
           DELETE FROM Apply WHERE decision = 'N';
Overall goal:
Want to be able to execute a sequence of SQL statements so they at least appear
to be running in isolation.
But also want to enable concurrency whenever possible.
Question: Why not just execute everything in sequence? What system, database, or
application features give us inherent concurrency that we want to exploit?
Transaction properties
(1) Isolation
Client 1 - INSERT INTO Archive (SELECT * FROM Apply WHERE decision = 'N');
           DELETE FROM Apply WHERE decision = 'N';
(2) Durability
If system crashes after transaction commits, all effects of transaction remain in database.
Question: Seems obvious, but all DBMSs manipulate the data in memory, so how is this
guarantee achieved?
(3) Atomicity
Each transaction's operations are executed all-or-nothing, never left "half done."
E.g., If system crashes before transaction commits, no effects of transaction
remain in database - transaction can start over when system comes back up.
E.g., If error or exception occurs during a transaction, partial effects of the
transaction are undone.
A robust application wraps every transaction in an exception handler to deal with system-initiated rollback.
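One way such a wrapper might look, sketched with Python's sqlite3 module and the Archive/Apply statements from the isolation example. The table layout (columns id and decision), the helper names archive_rejected and make_db, and the trigger used to simulate a mid-transaction failure are all assumptions made for illustration.

```python
import sqlite3


def archive_rejected(conn):
    """Move rejected applications into Archive as one all-or-nothing unit.

    Returns True if the transaction committed, False if it was rolled back.
    """
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO Archive SELECT * FROM Apply WHERE decision = 'N'")
        conn.execute("DELETE FROM Apply WHERE decision = 'N'")
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        # Undo partial effects (rows already copied into Archive), unless
        # the system has already rolled the transaction back itself.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        return False


def make_db(fail_delete=False):
    """Build a small in-memory database; the schema is hypothetical."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE Apply (id INTEGER, decision TEXT)")
    conn.execute("CREATE TABLE Archive (id INTEGER, decision TEXT)")
    conn.execute("INSERT INTO Apply VALUES (1, 'Y'), (2, 'N'), (3, 'N')")
    if fail_delete:
        # Simulate an error that strikes after the INSERT has succeeded.
        conn.execute(
            "CREATE TRIGGER no_delete BEFORE DELETE ON Apply "
            "BEGIN SELECT RAISE(ABORT, 'simulated failure'); END"
        )
    return conn


def count(conn, table):
    return conn.execute("SELECT COUNT(*) FROM " + table).fetchall()[0][0]
```

With make_db(fail_delete=True) the INSERT succeeds and the DELETE fails, yet Archive ends up empty afterwards: the handler's rollback removes the half-finished copy. A production wrapper would typically also decide whether to retry the transaction.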
Client-initiated rollback:
BEGIN TRANSACTION;
<get input from user>
SQL commands based on input
<confirm results with user>
IF input = confirm-OK THEN COMMIT; ELSE ROLLBACK;
Note: Rollback only undoes database changes, not other changes (e.g., program
variables) or side-effects (e.g., printing to screen, delivering cash).
Question: No self-respecting database programmer would write the above transaction.
Why?
(4) Consistency
Not really a property, more a good application of the other properties.
Idea: Assume all constraints are true at the start of every transaction. Clients are to
guarantee, under this assumption and isolation, that all constraints are still true at the end
of every transaction. (Similar to program invariants)
Read-only transactions
Can tell system a transaction will not perform writes, system will optimize accordingly.
"SET TRANSACTION READ ONLY"
Many, many transactions and applications fall into this category.
Question: If there are five read-only transactions and no other transactions, what does
the system need to do to guarantee serializability?
Weaker properties
Guaranteeing the ACID properties incurs a lot of overhead and reduces concurrency.
Sometimes full isolation (i.e., full serializability) is not required.
Three weaker isolation levels:
repeatable read
read committed
read uncommitted
Note: An isolation level is in the eye of the beholder. Specifically, the reads performed
by a transaction must adhere to its own isolation level.
Dirty reads
Example 5:
Modified Example 5:
Repeatable read
Example 6:
Example 7:
Summary
(Y = may occur, N = cannot occur)
                   dirty reads   non-repeatable reads   phantoms
read uncommitted   Y             Y                      Y
read committed     N             Y                      Y
repeatable read    N             N                      Y
serializable       N             N                      N
Remember that the isolation level is in the eye of the beholding transaction: For true
global serializability, every transaction must have isolation level SERIALIZABLE.
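The summary table can also be written down as a small lookup structure, which makes it easy to ask for the weakest level that rules out a given set of phenomena. This sketch assumes the three table columns are, in order, dirty reads, non-repeatable reads and phantoms; the names ALLOWED, LEVELS_WEAKEST_FIRST and weakest_level_preventing are hypothetical.

```python
# True means the phenomenon may occur at that level (a "Y" in the table).
ALLOWED = {
    "read uncommitted": {"dirty read": True, "non-repeatable read": True, "phantom": True},
    "read committed": {"dirty read": False, "non-repeatable read": True, "phantom": True},
    "repeatable read": {"dirty read": False, "non-repeatable read": False, "phantom": True},
    "serializable": {"dirty read": False, "non-repeatable read": False, "phantom": False},
}

LEVELS_WEAKEST_FIRST = [
    "read uncommitted",
    "read committed",
    "repeatable read",
    "serializable",
]


def weakest_level_preventing(*phenomena):
    """Return the weakest isolation level at which none of `phenomena` can occur."""
    for level in LEVELS_WEAKEST_FIRST:
        if not any(ALLOWED[level][p] for p in phenomena):
            return level
    # Unreachable for the table above: serializable forbids everything.
    raise ValueError("no level prevents " + ", ".join(phenomena))
```

For example, weakest_level_preventing("dirty read", "non-repeatable read") walks the levels from weakest to strongest and stops at the first row whose entries for both phenomena are N.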