Unit IV notes continued

Database Recovery

Database recovery is a service that should be provided by the DBMS to ensure that the database is reliable and remains in a consistent state in the presence of failures.

The Need for Recovery
The storage of data generally includes four different types of media with an increasing degree of reliability: main memory, magnetic disk, magnetic tape, and optical disk. Main memory is volatile storage that usually does not survive system crashes. Magnetic disks provide online non-volatile storage.

There are many different types of failure that can affect database processing, each of which has to be dealt with in a different manner. Some failures affect main memory only, while others involve non-volatile (secondary) storage. Among the causes of failure are:
· system crashes due to hardware or software errors, resulting in loss of main memory;
· media failures, such as head crashes or unreadable media, resulting in the loss of parts of secondary storage;
· application software errors, such as logical errors in the program that is accessing the database, which cause one or more transactions to fail;
· natural physical disasters, such as fires, floods, earthquakes, or power failures;
· carelessness or unintentional destruction of data or facilities by operators or users;
· sabotage, or intentional corruption or destruction of data, hardware, or software facilities.
Recovery Facilities

A DBMS should provide the following facilities to assist with recovery:

1. A backup mechanism
2. Log file
3. Checkpoint
4. Recovery manager
1) Backup mechanism

The DBMS should provide a mechanism to allow backup copies of the database and the log file to be made at regular intervals without necessarily having to stop the system first. The backup copy of the database can be used in the event that the database has been damaged or destroyed. A backup can be a complete copy of the entire database or an incremental backup, consisting only of the modifications made since the last complete or incremental backup. Typically, the backup is stored on offline storage, such as magnetic tape.
2) Log file

To keep track of database transactions, the DBMS maintains a special file called a log (or journal) that contains information about all updates to the database. The log may contain the following data (a minimal sketch of such records follows the list):

Transaction records, containing:
– transaction identifier;
– type of log record (transaction start, insert, update, delete, abort, commit);
– identifier of the data item affected by the database action (insert, delete, and update operations);
– before-image of the data item, that is, its value before the change (update and delete operations only);
– after-image of the data item, that is, its value after the change (insert and update operations only);
– log management information, such as a pointer to the previous and next log records for that transaction (all operations).
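As an illustration only, here is a minimal sketch of how such log records might be represented; the class and field names are hypothetical, chosen to mirror the list above, and do not correspond to any particular DBMS.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    """One entry in a hypothetical transaction log."""
    transaction_id: str                  # transaction identifier
    record_type: str                     # 'start', 'insert', 'update', 'delete', 'abort', 'commit'
    data_item: Optional[str] = None      # identifier of the affected data item
    before_image: Optional[str] = None   # value before the change (update/delete only)
    after_image: Optional[str] = None    # value after the change (insert/update only)
    prev_record: Optional[int] = None    # pointer to the previous log record for this transaction
    next_record: Optional[int] = None    # pointer to the next log record for this transaction

# Example entries for a single update performed by transaction T1:
log = [
    LogRecord("T1", "start"),
    LogRecord("T1", "update", data_item="balx", before_image="100", after_image="150"),
    LogRecord("T1", "commit"),
]
```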
Owing to the importance of the transaction log file in the recovery process, the log may be duplexed or triplexed (that is, two or three separate copies are maintained) so that if one copy is damaged, another can be used. In the past, log files were stored on magnetic tape because tape was more reliable and cheaper than magnetic disk. However, nowadays DBMSs are expected to be able to recover quickly from minor failures. This requires that the log file be stored online on a fast direct-access storage device.
3) Checkpointing

The information in the log file is used to recover from a database failure. One difficulty with this scheme is that when a failure occurs we may not know how far back in the log to search, and we may end up redoing transactions that have already been safely written to the database. To limit the amount of searching and subsequent processing that we need to carry out on the log file, we can use a technique called checkpointing.

Checkpoints are scheduled at predetermined intervals and involve the following operations (a short sketch follows the list):

· writing all log records in main memory to secondary storage;
· writing the modified blocks in the database buffers to secondary storage;
· writing a checkpoint record to the log file. This record contains the identifiers of all transactions that are active at the time of the checkpoint.
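A minimal sketch of these three steps, assuming the log buffer, database buffers, stable log, and disk are plain in-memory Python structures (all names here are hypothetical):

```python
def take_checkpoint(log_buffer, db_buffers, active_transactions, stable_log, disk):
    """Hypothetical checkpoint routine; all parameters are simple in-memory structures."""
    # 1. Write all log records currently in main memory to secondary storage.
    stable_log.extend(log_buffer)
    log_buffer.clear()

    # 2. Write the modified (dirty) blocks in the database buffers to secondary storage.
    for block_id, block in db_buffers.items():
        if block["dirty"]:
            disk[block_id] = block["data"]
            block["dirty"] = False

    # 3. Write a checkpoint record containing the identifiers of all active transactions.
    stable_log.append({"type": "checkpoint", "active": sorted(active_transactions)})

# Example: flush one dirty block and record that T2 is still active at the checkpoint.
stable_log, disk = [], {}
take_checkpoint(
    log_buffer=[{"tid": "T1", "type": "commit"}],
    db_buffers={7: {"dirty": True, "data": "balx=150"}},
    active_transactions={"T2"},
    stable_log=stable_log,
    disk=disk,
)
print(stable_log)  # [{'tid': 'T1', 'type': 'commit'}, {'type': 'checkpoint', 'active': ['T2']}]
print(disk)        # {7: 'balx=150'}
```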
4) Recovery manager

The recovery manager allows the system to restore the database to a consistent state following a failure.
Recovery Techniques

· If the database has been extensively damaged, for example a disk head crash has occurred and destroyed the database, then it is necessary to restore the last backup copy of the database and reapply the updates of committed transactions using the log file. This assumes that the log file has not been damaged as well. The log file can be stored on a disk separate from the main database files, which reduces the risk of both the database files and the log file being damaged at the same time.

· If the database has not been physically damaged but has become inconsistent, for example because the system crashed while transactions were executing, then it is necessary to undo the changes that caused the inconsistency. Here we do not need to use the backup copy of the database, but can restore the database to a consistent state using the before- and after-images held in the log file.
1) Recovery techniques using deferred update

Using the deferred update recovery protocol, updates are not written to the database until after a transaction has reached its commit point. If a transaction fails before it reaches this point, it will not have modified the database, and so no undoing of changes is necessary. We use the log file to protect against system failures in the following way (a sketch follows the list):

→ When a transaction starts, write a transaction start record to the log.
→ When any write operation is performed, write a log record containing all the log data specified previously (excluding the before-image of the update).
→ When a transaction is about to commit, write a transaction commit log record, write all the log records for the transaction to disk, and then commit the transaction. Use the log records to perform the actual updates to the database.
→ If a transaction aborts, ignore the log records for the transaction and do not perform the writes.
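A minimal sketch of deferred-update (redo-only) recovery under these rules; the log is modelled as a hypothetical list of dicts, and only committed transactions are redone because uncommitted transactions never touched the database:

```python
def recover_deferred(log, database):
    """Hypothetical redo-only recovery for the deferred update protocol.

    `log` is a list of dicts such as
    {"tid": "T1", "type": "start"|"write"|"commit", "item": ..., "after": ...}.
    """
    committed = {rec["tid"] for rec in log if rec["type"] == "commit"}

    for rec in log:
        if rec["type"] == "write" and rec["tid"] in committed:
            # Redo: write the after-image of the updated item to the database.
            database[rec["item"]] = rec["after"]
    return database

# Example: T1 committed, T2 did not, so only T1's update is applied.
log = [
    {"tid": "T1", "type": "start"},
    {"tid": "T1", "type": "write", "item": "balx", "after": 150},
    {"tid": "T1", "type": "commit"},
    {"tid": "T2", "type": "start"},
    {"tid": "T2", "type": "write", "item": "baly", "after": 999},
]
print(recover_deferred(log, {"balx": 100, "baly": 50}))  # {'balx': 150, 'baly': 50}
```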
2) Recovery techniques using immediate update

Using the immediate update recovery protocol, updates are applied to the database as they occur, without waiting to reach the commit point. As well as having to redo the updates of committed transactions following a failure, it may now be necessary to undo the effects of transactions that had not committed at the time of failure. In this case, we use the log file to protect against system failures in the following way:

→ When a transaction starts, write a transaction start record to the log.
→ When a write operation is performed, write a record containing the necessary data to the log file.
→ Once the log record is written, write the update to the database buffers.
→ The updates to the database itself are written when the buffers are next flushed to secondary storage.
→ When the transaction commits, write a transaction commit record to the log.
If the system fails, recovery involves using the log to undo or redo transactions (a sketch follows the list):

· For any transaction for which both a transaction start and a transaction commit record appear in the log, we redo the transaction, using the log records to write the after-images of the updated fields.
· For any transaction for which the log contains a transaction start record but not a transaction commit record, we need to undo that transaction. This time the log records are used to write the before-images of the affected fields.
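A minimal sketch of undo/redo recovery for the immediate update protocol, again over a hypothetical list-of-dicts log that now carries before-images as well:

```python
def recover_immediate(log, database):
    """Hypothetical undo/redo recovery for the immediate update protocol.

    Log records are dicts like
    {"tid": ..., "type": "start"|"write"|"commit", "item": ..., "before": ..., "after": ...}.
    """
    started = {rec["tid"] for rec in log if rec["type"] == "start"}
    committed = {rec["tid"] for rec in log if rec["type"] == "commit"}

    # Redo committed transactions in forward log order (write after-images).
    for rec in log:
        if rec["type"] == "write" and rec["tid"] in committed:
            database[rec["item"]] = rec["after"]

    # Undo uncommitted transactions in reverse log order (write before-images).
    for rec in reversed(log):
        if rec["type"] == "write" and rec["tid"] in (started - committed):
            database[rec["item"]] = rec["before"]
    return database

# Example: T1 committed (redo balx), T2 did not commit (undo baly).
log = [
    {"tid": "T1", "type": "start"},
    {"tid": "T1", "type": "write", "item": "balx", "before": 100, "after": 150},
    {"tid": "T1", "type": "commit"},
    {"tid": "T2", "type": "start"},
    {"tid": "T2", "type": "write", "item": "baly", "before": 50, "after": 999},
]
print(recover_immediate(log, {"balx": 100, "baly": 999}))  # {'balx': 150, 'baly': 50}
```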
3) Shadow paging

An alternative to the log-based recovery schemes described above is shadow paging. This scheme maintains two page tables during the life of a transaction: a current page table and a shadow page table. When the transaction starts, the two page tables are the same. The shadow page table is never changed thereafter, and is used to restore the database in the event of a system failure. During the transaction, the current page table is used to record all updates to the database. When the transaction completes, the current page table becomes the shadow page table. Shadow paging has several advantages over the log-based schemes: the overhead of maintaining the log file is eliminated, and recovery is significantly faster since there is no need for undo or redo operations.
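A toy sketch of the shadow paging idea; pages and page tables are ordinary Python dicts, and commit simply replaces the shadow table with the current one. A real implementation would also write updated pages to fresh disk blocks, so treat this purely as an illustration of the two-table mechanism.

```python
class ShadowPagingStore:
    """Toy shadow-paging store: a page table maps page numbers to page contents."""

    def __init__(self, pages):
        self.shadow_table = dict(pages)   # never changed during a transaction
        self.current_table = dict(pages)  # records all updates of the transaction

    def write(self, page_no, new_contents):
        # Updates go through the current page table only.
        self.current_table[page_no] = new_contents

    def read(self, page_no):
        return self.current_table[page_no]

    def commit(self):
        # On commit, the current page table becomes the shadow page table.
        self.shadow_table = dict(self.current_table)

    def abort_or_crash(self):
        # On failure, the shadow page table restores the pre-transaction state.
        self.current_table = dict(self.shadow_table)

store = ShadowPagingStore({1: "balx=100", 2: "baly=50"})
store.write(1, "balx=150")
store.abort_or_crash()
print(store.read(1))  # "balx=100": the shadow table undid the uncommitted update
```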
Advanced Transaction Models

Nested Transaction Model

The nested transaction model was introduced by Moss (1981). In this model, the complete transaction forms a tree, or hierarchy, of subtransactions. There is a top-level transaction that can have a number of child transactions; each child transaction can also have nested transactions. In Moss's original proposal, only the leaf-level subtransactions (the subtransactions at the lowest level of nesting) are allowed to perform the database operations.

For example, consider a reservation transaction (T1) that consists of booking flights (T2), a hotel (T5), and a hire car (T6). The flight booking is itself split into two subtransactions: one to book a flight from London to Paris (T3) and a second to book a connecting flight from Paris to New York (T4). Transactions have to commit from the bottom upwards. Thus, T3 and T4 must commit before parent transaction T2, and T2 must commit before parent T1.
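A minimal sketch of this reservation example as a tree of subtransactions, just to make the bottom-up commit rule concrete; the class and method names are hypothetical:

```python
class NestedTransaction:
    """Toy nested transaction: a parent may commit only after all its children."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.committed = False

    def commit(self):
        # Commit bottom-up: all subtransactions must commit before the parent.
        for child in self.children:
            if not child.committed:
                child.commit()
        self.committed = True
        print(f"{self.name} committed")

# Reservation T1 = flights (T2, split into T3 and T4), hotel (T5), hire car (T6).
t1 = NestedTransaction("T1", [
    NestedTransaction("T2", [NestedTransaction("T3"), NestedTransaction("T4")]),
    NestedTransaction("T5"),
    NestedTransaction("T6"),
])
t1.commit()  # prints T3, T4, T2, T5, T6, then T1
```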
The main advantages of the nested transaction model are its support for:

· Modularity: a transaction can be decomposed into a number of subtransactions for the purposes of concurrency and recovery.
· A finer level of granularity for concurrency control and recovery: this occurs at the level of the subtransaction rather than the transaction.
· Intra-transaction parallelism: subtransactions can execute concurrently.
· Intra-transaction recovery: uncommitted subtransactions can be aborted and rolled back without any side-effects on other subtransactions.
Emulating nested transactions using savepoints

One of the objectives of the nested transaction model is to provide a unit of recovery at a finer level of granularity than the transaction. During the execution of a transaction, the user can establish a savepoint, for example using a SAVE WORK statement. This generates an identifier that the user can subsequently use to roll the transaction back to, for example using a ROLLBACK WORK <savepoint_identifier> statement.
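As an illustration, SQLite (used here through Python's sqlite3 module) exposes the same idea with SAVEPOINT and ROLLBACK TO, rather than the SAVE WORK / ROLLBACK WORK syntax mentioned above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly
cur = conn.cursor()

cur.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
cur.execute("BEGIN")
cur.execute("INSERT INTO account VALUES ('x', 100)")

cur.execute("SAVEPOINT before_risky_update")     # establish a savepoint
cur.execute("UPDATE account SET balance = 999 WHERE name = 'x'")
cur.execute("ROLLBACK TO before_risky_update")   # undo back to the savepoint only

cur.execute("COMMIT")
print(cur.execute("SELECT balance FROM account WHERE name = 'x'").fetchone())  # (100,)
```

Note that rolling back to the savepoint undoes only the work done after it; the earlier insert within the same transaction survives and is committed.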
Serializability and Recoverability

The objective of a concurrency control protocol is to schedule transactions in such a way as to avoid any interference between them. One obvious solution is to allow only one transaction to execute at a time: one transaction is committed before the next transaction is allowed to begin. However, the aim of a multi-user DBMS is also to maximize the degree of concurrency or parallelism in the system, so that transactions that can execute without interfering with one another can run in parallel.

A schedule S is a sequence of the operations of a set of concurrent transactions that preserves the order of the operations within each transaction; that is, for each transaction Ti in schedule S, the operations of Ti must appear in S in the same order in which they occur in Ti. In a serial schedule, the transactions are performed in serial order. For example, if we have two transactions T1 and T2, serial order would be T1 followed by T2, or T2 followed by T1.
Types of serializability

1) Conflict serializability

In conflict serializability, the ordering of read and write operations is important:

-- If two transactions only read a data item, they do not conflict and order is not important.
-- If two transactions either read or write completely separate data items, they do not conflict and order is not important.
-- If one transaction writes a data item and another either reads or writes the same data item, the order of execution is important.

Consider, for example, a schedule containing operations from two concurrently executing transactions T7 and T8. The write(balx) instruction of T7 conflicts with the read(balx) instruction of T8. However, the write(balx) instruction of T8 does not conflict with the read(baly) instruction of T7, because the two instructions access different data items.
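A minimal sketch of this conflict rule: two operations conflict only if they come from different transactions, access the same data item, and at least one of them is a write. The tuple format used here is hypothetical.

```python
def conflicts(op1, op2):
    """Each operation is a (transaction, action, data_item) tuple, e.g. ('T7', 'write', 'balx')."""
    t1, a1, item1 = op1
    t2, a2, item2 = op2
    return (t1 != t2                      # operations of the same transaction are not in conflict
            and item1 == item2            # they must access the same data item
            and "write" in (a1, a2))      # and at least one of them must be a write

print(conflicts(("T7", "write", "balx"), ("T8", "read", "balx")))  # True
print(conflicts(("T8", "write", "balx"), ("T7", "read", "baly")))  # False: different data items
```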
2) Non-conflict serializability

If we also change the order of non-conflicting operations, we produce the equivalent serial schedule S3. Here T7 and T8 are non-conflicting transactions, and the resulting schedule is a serial schedule.
3) View serializability

Two schedules S1 and S2 consisting of the same operations from n transactions T1, T2, . . . , Tn are view equivalent if the following three conditions hold (a sketch follows the list):

· For each data item x, if transaction Ti reads the initial value of x in schedule S1, then transaction Ti must also read the initial value of x in schedule S2.
· For each read operation on data item x by transaction Ti in schedule S1, if the value read by Ti has been written by transaction Tj, then transaction Ti must also read the value of x produced by transaction Tj in schedule S2.
· For each data item x, if the last write operation on x was performed by transaction Ti in schedule S1, then the same transaction must perform the final write on data item x in schedule S2.
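A minimal sketch that checks these three conditions for a pair of schedules, where each schedule is a hypothetical list of (transaction, action, data item) tuples in execution order:

```python
def reads_from(schedule):
    """For each read, record which transaction's write it reads from (None = initial value)."""
    last_writer = {}   # data item -> transaction that last wrote it
    reads = []         # (reader, item, writer or None), in the order the reads occur
    final_writes = {}  # data item -> transaction performing the final write
    for txn, action, item in schedule:
        if action == "read":
            reads.append((txn, item, last_writer.get(item)))
        elif action == "write":
            last_writer[item] = txn
            final_writes[item] = txn
    return reads, final_writes

def view_equivalent(s1, s2):
    """Check the three view-equivalence conditions over two schedules."""
    reads1, finals1 = reads_from(s1)
    reads2, finals2 = reads_from(s2)
    # Conditions 1 and 2: every read sees the same source (initial value or same writer).
    # Condition 3: the final write on each data item is performed by the same transaction.
    return sorted(reads1) == sorted(reads2) and finals1 == finals2

s1 = [("T1", "read", "x"), ("T2", "write", "x"), ("T1", "write", "x")]
s2 = [("T2", "write", "x"), ("T1", "read", "x"), ("T1", "write", "x")]
print(view_equivalent(s1, s2))  # False: T1 reads the initial value in s1 but T2's value in s2
```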
Recoverability

Serializability identifies schedules that maintain the consistency of the database, assuming that none of the transactions in the schedule fails. An alternative perspective examines the recoverability of transactions within a schedule. If a transaction fails, the atomicity property requires that we undo the effects of the transaction. In addition, the durability property states that once a transaction commits, its changes cannot be undone.