Traditional File-Based
Systems
·
Understanding the problems inherent in file-based
systems may prevent us from repeating these problems in database systems.
- · If you wish to convert a file-based system to a database system, understanding how the file system works will be extremely useful, if not essential.
File-based
system :
A collection of application programs that perform services for the end-users such as the production of
reports. Each program defines and manages its own data.
File-based
systems were an early attempt to computerize the manual filing system that we
are all familiar with.
·
For
example, in an organization a manual file is set up to hold all external and
internal correspondence relating to a product, task, client, or employee.
Typically, there are many such files, and for safety they are labeled and
stored in one or more cabinets.
·
For
security, the cabinets may have locks or may be located in secure areas of the
building. In our own home, we probably have some sort of filing system which
contains receipts, guarantees, invoices, bank statements, and such like.
·
When we need to look something up, we go to
the filing system and search through the system starting from the first entry
until we find what we want.
·
Alternatively,
we may have an indexing system that helps locate what we want more quickly. For
example, we may have divisions in the filing system or separate folders for
different types of item that are in some way logically related.
·
The
manual filing system works well while the number of items to be stored is
small.
·
It
even works quite adequately when there are large numbers of items and we have
only to store and retrieve them.
·
However,
the manual filing system breaks down when we have to cross-reference or process
the information in the files.
For example,
a typical real estate agent’s office might have a separate file for each
property for sale or rent, each potential buyer and renter,
and each member of staff. Consider the effort that would be required to
answer the following questions:
·
What
three-bedroom properties do you have for sale with a garden and garage?
·
What
flats do you have for rent within three miles of the city center?
·
What
is the average rent for a two-bedroom flat?
·
What
is the total annual salary bill for staff?
·
How
does last month’s turnover compare with the projected figure for this month?
·
What
is the expected monthly turnover for the next financial year?
Staff
details: Staff(staffNo, fName, lName,
position, sex, dateOfBirth, salary, branchNo)
StaffSalary(staffNo, fName, lName, sex, salary, branchNo)
Limitations (drawbacks) of the File-Based Approach
·
Separation and isolation of data
When data is isolated in separate files, it is more difficult
to access data that should be available. For example, if we want to produce a list
of all houses that match the requirements of clients, we first need to create a
temporary file of those clients who have ‘house’ as the preferred type.
·
Duplication of data
Uncontrolled duplication of data is undesirable for several
reasons, including:
a. Duplication
is wasteful. It costs time and money to enter the data more than once.
b. It takes up
additional storage space, again with associated costs. Often, the duplication of
data can be avoided by sharing data files.
c. Perhaps
more importantly, duplication can lead to loss of data integrity; in other
words, the data is no longer consistent.
·
Data dependence
Changes to an existing structure are difficult to make.
For example,
increasing the size of the address field from 40 to 41 characters sounds like a
simple change, but it requires the creation of a one-off program (that is, a
program that is run only once and can then be discarded) that converts the file
to the new format.
a. open the
original file for reading;
b. open a
temporary file with the new structure;
c. Read a
record from the original file, and write it to the temporary file. Repeat this
step for all records in the original file;
d. delete the
original file;
e. rename the
temporary file
·
Incompatible file formats
The
structures are dependent on the application programming language. For example,
the structure of a file generated by a COBOL program may be different from the
structure of a file generated by a ‘C’ program. The direct incompatibility of
such files makes them difficult to process jointly. This can be time-consuming and expensive.
·
Fixed queries/proliferation of application
programs
From the end-user’s point of view, file-based systems proved
to be a great improvement over manual systems.
a. There was
no provision for security or integrity;
b. Recovery,
in the event of a hardware or software failure, was limited or non-existent;
c. Access to
the files was restricted to one user at a time – there was no provision for
shared access by staff in the same department.
DATABASE APPROACH
Database:
A shared collection of logically related data, and a
description of this data, designed to meet the information needs of an
organization.
·
The database is a single, possibly large repository of
data that can be used simultaneously by many departments and users.
·
Instead of files with redundant data, all data items
are integrated with a minimum amount of duplication. The database is no longer
owned by one department but is a shared corporate resource.
·
The database holds not only the organization’s
operational data but also a description of this data. For this reason, a
database is also defined as a self-describing
collection of integrated records. The description of the data is known as the system catalog (or data dictionary or
metadata – the ‘data about data’). It is the self-describing nature of a
database that provides program–data independence.
One advantage of this approach, known as data abstraction, If new data structures are added or existing
structures are modified then the application programs are unaffected, provided
they do not directly depend upon what has been modified. For example, if we add
a new field to a record or create a new file, existing applications are
unaffected. However, if we remove a field from a file that an application
program uses, then that application program is affected by this change and must
be modified accordingly.
‘logically
related’- in the definition means When we analyze the information needs
of an organization, we attempt to identify entities,
attributes, and relationships.
An entity is a
distinct object (a person, place, thing, concept, or event) in the organization.
An attribute is a property
that describes some aspect of the object that we wish to record.
relationship- is an
association between entities.
The Database Management System (DBMS)
DBMS: A software system that enables users to
define, create, maintain, and control access to the database.
The DBMS is the software that
interacts with the users’ application programs and the database. Typically, a
DBMS provides the following facilities:
·
It
allows users to define the database, usually through a Data Definition Language (DDL). The DDL allows users to specify the
data types and structures and the constraints on the data to be stored in the
database.
·
It
allows users to insert, update, delete, and retrieve data from the database,
usually through a Data Manipulation
Language (DML).it provides an inquiry facility to this data called query language. The most common query
language is the Structured Query
Language(SQL).
·
It
provides controlled access to the database. For example, it may provide:
o
a
security system, which prevents unauthorized users accessing the database;
o
an
integrity system, which maintains the consistency of stored data;
o
a
concurrency control system, which allows shared access of the database
o
a
recovery control system, which restores the database to a previous consistent
state following a hardware or software failure;
o
a
user-accessible catalog, which contains descriptions of the data in the
database.
Components of the DBMS
Environment


Hardware
The DBMS and the applications require hardware to run. The
hardware can range from a single personal computer, to a single mainframe, to a
network of computers.
The particular hardware depends on the organization’s requirements and
the DBMS used. Some DBMSs run only on particular hardware or operating systems,
while others run on a wide variety of hardware and operating systems. A DBMS
requires a minimum amount of main memory and disk space to run.
This is called a
client–server architecture: the backend
is the server and the frontends are the clients.
Software
The software
component comprises the DBMS software itself and the application programs,
together with the operating system, including network software if the DBMS is
being used over a network.
Typically, application
programs are written in a third-generation programming language (3GL), such as
‘C’, C++, Java, Visual Basic, COBOL, Fortran, Ada, or Pascal, or using a
fourth-generation language (4GL), such as SQL.
The use of fourth-generation tools can improve productivity
significantly and produce programs that are easier to maintain.
Data
Perhaps the most important component of the DBMS environment,
certainly from the end-users’ point of view, is the data. We observe that the
data acts as a bridge between the machine components and the human components.
The database contains both the operational data and the
metadata, the ‘data about data’. The
structure of the database is called the schema.
Procedures
Procedures
refer to the instructions and rules that govern the design and use of the
database. The users of the system and the staff that manage the database
require documented procedures on how to use or run the system.
These may
consist of instructions on how to:
·
log
on to the DBMS;
·
use
a particular DBMS facility or application program;
·
start
and stop the DBMS;
·
make
backup copies of the database;
·
handle
hardware or software failures.
·
change
the structure of a table, reorganize the database across multiple disks,
improve performance, or archive data to secondary storage.
People
We can
identify four distinct types of people that participate in the DBMS
environment: data and database administrators, database designers, application
developers, and the end-users.
Roles in the Database Environment
We can identify four distinct types of people that
participate in the DBMS environment: data and database administrators, database
designers, application developers, and the end-users.
·
Data and Database Administrators
Data and database administration are the roles generally
associated with the management and control of a DBMS and its data. The Data
Administrator (DA) is responsible for the management of the data resource
including database planning, development and maintenance of standards, policies
and procedures, and conceptual/logical database design.
The Database
Administrator (DBA) is responsible for the physical realization of the
database, including physical database design and implementation, security and
integrity control, maintenance of the operational system, and ensuring
satisfactory performance of the applications for users.
·
Database Designers
In large database design projects, we can distinguish between
two types of designer: logical database designers and physical database
designers. The logical database designer is concerned with identifying the data
(that is, the entities and attributes), the relationships between the data, and
the constraints on the data that is to be stored in the database. Examples:-
Ø a member of
staff cannot manage more than 100 properties for rent or sale at the same time;
Ø a member of
staff cannot handle the sale or rent of his or her own property;
we split the work of the logical database designer into two
stages:
Ø conceptual
database design, which is independent of implementation details such as the
target DBMS, application programs, programming languages, or any other physical
considerations;
Ø logical
database design, which targets a specific data model, such as relational,
network, hierarchical, or object-oriented.
The physical database designer decides how the logical
database design is to be physically realized. This involves:
Ø mapping the
logical database design into a set of tables and integrity constraints;
Ø selecting
specific storage structures and access methods for the data to achieve good
performance;
Ø designing
any security measures required on the data.
·
Application Developers
The application programs that provide the required functionality
for the end-users must be implemented. This is the responsibility of the
application developers. Typically, the application developers work from a
specification produced by systems analysts. Each program contains statements
that request the DBMS to perform some operation on the database. This includes
retrieving data, inserting, updating, and deleting data. The programs may be
written in a third-generation programming language or a fourth-generation
language
·
End-Users
o Naïve users are typically unaware of the DBMS.
They access the database through specially written application programs that
attempt to make the operations as simple as possible. They invoke database
operations by entering simple commands or choosing options from a menu. This
means that they do not need to know anything about the database or the DBMS.
For example, the checkout assistant at the local supermarket uses a bar code
reader to find out the price of the item.
o Sophisticated users: At the other end of the
spectrum, the sophisticated end-user is familiar with the structure of the
database and the facilities offered by the DBMS. Some sophisticated end-users
may even write application programs for their own use.
Advantages and Disadvantages of DBMS
The database management system has promising potential
advantages. Unfortunately, there are also disadvantages.
Advantages
The advantages of database management systems are listed
below
1)
Control of
data redundancy
Traditional file-based systems waste space by storing the
same information in more than one file. In contrast, the database approach
attempts to eliminate the redundancy by integrating the files so that multiple
copies of the same data are not stored. However, the database approach does not
eliminate redundancy entirely, but controls the amount of redundancy inherent
in the database.
2)
Data
consistency
By
eliminating or controlling redundancy, we reduce the risk of inconsistencies
occurring. If a data item is stored only once in the database, any update to
its value has to be performed only once and the new value is available
immediately to all users. If a data item is stored more than once and the
system is aware of this, the system can ensure that all copies of the item are kept consistent.
3)
More
information from the same amount of data
With the integration of the operational data, it may be
possible for the organization to derive additional information from the same
data.
4)
Sharing of
data
Typically, files are owned by the people or departments that
use them. On the other hand, the database belongs to the entire organization
and can be shared by all authorized users. In this way, more users share more
of the data.
5)
Improved
data integrity
Database integrity refers to the validity and consistency of
stored data. Integrity is usually expressed in terms of constraints, which are
consistency rules that the database is not permitted to violate. Constraints
may apply to data items within a single record or they may apply to
relationships between records.
6)
Improved
security
Database security is the protection of the database from
unauthorized users. Without suitable security measures, integration makes the
data more vulnerable than file-based systems. However, integration allows the
DBA to define, and the DBMS to enforce, database security. This may take the
form of user names and passwords to identify people authorized to use the
database. The access that an authorized user is allowed on the data may be
restricted by the operation type (retrieval, insert, update, delete).
7)
Enforcement
of standards
Again, integration allows the DBA to define and enforce the
necessary standards. These may include departmental, organizational, national,
or international standards for such things as data formats to facilitate
exchange of data between systems, naming conventions, documentation standards,
update procedures, and access rules.
8)
Economy of
scale
Combining all the organization’s operational data into one
database, and creating a set of applications that work on this one source of
data, can result in cost savings. In this case, the budget that would normally
be allocated to each department for the development and maintenance of its
file-based system can be combined, possibly resulting in a lower total cost,
leading to an economy of scale.
9)
Balance of conflicting
requirements
Each user or department has needs that may be in conflict
with the needs of other users. Since the database is under the control of the
DBA, the DBA can make decisions about the design and operational use of the
database that provide the best use of resources for the organization as a
whole. These decisions will provide optimal performance.
10)
Improved
data accessibility and responsiveness
The data is directly accessible to the end-users. Many DBMSs
provide query languages or report writers that allow users to ask ad hoc
questions and to obtain the required information almost immediately at their
terminal, without requiring a programmer to write some software
11)
Increased
productivity
Many DBMSs also provide a fourth-generation environment
consisting of tools to simplify the development of database applications. This
results in increased programmer productivity and reduced development time.
12)
Improved
maintenance through data independence
DBMS separates the data descriptions from the applications,
thereby making applications immune to changes in the data descriptions. This is
known as data independence
13)
Increased
concurrency
In some file-based systems, if two or more users are
allowed to access the same file simultaneously, it is possible that the
accesses will interfere with each other, resulting in loss of information or
even loss of integrity. Many DBMSs manage concurrent database access and ensure
such problems cannot occur.
14) Improved backup and recovery services
Many file-based systems place the responsibility on the user
to provide measures to protect the data from failures to the computer system or
application program. This may involve taking a nightly backup of the data. In
the event of a failure during the next day, the backup is restored and the work
that has taken place since this backup is lost and has to be re-entered.
Disadvantages
1.
Complexity
Database systems interface with many different technologies
and have a significant impact on a company’s resources and culture. The changes
introduced by the adoption of a database system must be properly managed to
ensure that they help advance the company’s objectives. Given the fact that
database systems hold crucial company data that are accessed from multiple
sources, security issues must be assessed constantly.
2.
Size
The complexity and breadth of functionality makes the DBMS an
extremely large piece of software, occupying many megabytes of disk space and
requiring substantial amounts of memory to run efficiently.
3.
Cost of
DBMSs
The cost of DBMSs varies significantly, depending on the
environment and functionality provided. For example, a single-user DBMS for a
personal computer may only cost US$100. However, a large mainframe multi-user
DBMS servicing hundreds of users can be extremely expensive, perhaps US$100,000
or even US$1,000,000. There is also the recurrent annual maintenance cost,
which is typically a percentage of the list price.
4.
Additional
hardware costs
The disk storage requirements for the DBMS and the database
may necessitate the purchase of additional storage space. Furthermore, to
achieve the required performance, it may be necessary to purchase a larger
machine, perhaps even a machine dedicated to running the DBMS.
5.
Cost of
conversion
the cost of the DBMS and extra hardware may be insignificant compared
with the cost of converting existing applications to run on the new DBMS and hardware.
This cost also includes the cost of training staff to use these new systems,
and possibly the employment of specialist staff to help with the conversion and
running of the system.
6.
Performance
DBMS is written to be more general, to cater for many
applications rather than just one. The effect is that some applications may not
run as fast as they used to.
7.
Higher
impact of a failure
The centralization of resources increases the vulnerability
of the system. Since all users and applications rely on the availability of the
DBMS, the failure of certain components can bring operations to a halt.
Comments
Post a Comment