4. Database Management Technology
Data: Data are values of qualitative or quantitative variables belonging to a set of items. In computing (or data processing), data are represented in a structure, often tabular (rows and columns), a tree (a set of nodes with parent-child relationships) or a graph (a set of interconnected nodes). Data are typically the results of measurements and can be visualized using graphs or images. As an abstract concept, data can be viewed as the lowest level of abstraction, from which information and then knowledge are derived. Raw data, i.e. unprocessed data, is a collection of numbers and characters. "Raw" is a relative term: data processing commonly occurs in stages, and the "processed data" from one stage may be considered the "raw data" of the next. Field data is raw data collected in an uncontrolled in situ environment. Experimental data is data generated within the context of a scientific investigation by observation and recording.
Information: Information, in its most restricted technical sense, is a sequence of symbols that can be interpreted as a message. Information can be recorded as signs or transmitted as signals. Information is any kind of event that affects the state of a dynamic system that can interpret the information. Conceptually, information is the message being conveyed. Therefore, in a general sense, information is "knowledge communicated or received concerning a particular fact or circumstance"; put another way, information is an answer to a question. Information cannot be predicted, and it resolves uncertainty. The uncertainty of an event is measured by its probability of occurrence and is inversely related to it: the more uncertain an event, the more information is required to resolve the uncertainty of that event. The amount of information is measured in bits.
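The link between probability and bits can be made concrete with a small calculation. The sketch below assumes Shannon's self-information measure, log2(1/p), which the text does not name explicitly; the function name is hypothetical.

```python
import math


def self_information_bits(probability: float) -> float:
    """Information content, in bits, of an event with the given probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the range (0, 1]")
    # Assumed measure: Shannon self-information, log2(1 / p).
    return math.log2(1.0 / probability)


# A fair coin flip (p = 0.5) carries 1 bit; a 1-in-8 event carries 3 bits;
# a certain event (p = 1) carries no information at all.
for p in (0.5, 0.125, 1.0):
    print(p, self_information_bits(p))
```

The less probable the event, the larger the result, which matches the statement that more uncertain events require more information to resolve.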
Data vs Information
Meaning
Data: Data is raw, unorganized facts that need to be processed. Data can be something simple and seemingly random and useless until it is organized.
Information: When data is processed, organized, structured or presented in a given context so as to make it useful, it is called information.
Example
Data: Each student's test score is one piece of data.
Information: The class' average score or the school's average score is the information that can be concluded from the given data.
Definition
Data: Latin 'datum' meaning "that which is given". Data was the plural form of datum singular (M150 adopts the general use of data as singular. Not everyone agrees.)
Information: Information is interpreted data.
Data Hierarchy
Data hierarchy refers to the systematic organization of data, often in a hierarchical form. Data organization involves fields, records, files and so on. A data field holds a single fact or attribute of an entity. Consider a date field, e.g. "September 19, 2004". This can be treated as a single date field (e.g. birthdate), or as 3 fields, namely month, day of month and year. A record is a collection of related fields. An Employee record may contain one or more name fields, address fields, a birthdate field and so on.
A file is a collection of related records. If there are 100 employees, then each employee would have a record (e.g. called Employee Personal Details record) and the collection of 100 such records would constitute a file (in this case, called Employee Personal Details file). Files are integrated into a database. This is done using a Database Management System. If there are other facets of employee data that we wish to capture, then other files such as Employee Training History file and Employee Work History file could be created as well. An illustration of the above description is shown in this diagram below.
The following terms are given for clarity, with reference to the example in the diagram above.
Data field label = Employee Name or EMP_NAME
Data field value = Jeffrey Tan
The above description is a view of data as understood by a user, e.g. a person working in the Human Resources Department.
The above structure can be seen in the hierarchical model, which is one way to organize data in a database. [3]
In terms of data storage, data fields are made of bytes and these in turn are made up of bits.
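The field, record and file levels described above can be sketched in a few lines of Python. This is only an illustration: the class name, the address and birthdate values, and the second employee are hypothetical; only the EMP_NAME label and the value "Jeffrey Tan" come from the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EmployeeRecord:
    """A record: a collection of related fields about one employee."""
    emp_name: str     # field label EMP_NAME from the text
    address: str      # hypothetical field value below
    birthdate: date   # one date field; could also be split into month, day, year


# A file: a collection of related records (here, two hypothetical employees).
employee_personal_details_file = [
    EmployeeRecord("Jeffrey Tan", "1 Example Road", date(1980, 1, 1)),
    EmployeeRecord("Jane Doe", "2 Example Road", date(1985, 6, 15)),
]

# At the storage level, a field value is a sequence of bytes of 8 bits each.
name_bytes = employee_personal_details_file[0].emp_name.encode("utf-8")
print(len(name_bytes), "bytes =", len(name_bytes) * 8, "bits")
```

A database would then integrate this file with others, such as a training history file, under a Database Management System.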
Database
A database is an organized collection of data. The data are typically organized to model relevant aspects of reality in a way that supports processes requiring this information. For example, modeling the availability of rooms in hotels in a way that supports finding a hotel with vacancies. Database management systems (DBMSs) are specially designed applications that interact with the user, other applications, and the database itself to capture and analyze data. A general-purpose database management system (DBMS) is a software system designed to allow the definition, creation, querying, update, and administration of databases. Well-known DBMSs include MySQL, PostgreSQL, SQLite, Microsoft SQL Server, Microsoft Access and Oracle. A database is not generally portable across different DBMSs, but different DBMSs can interoperate by using standards such as SQL and ODBC or JDBC to allow a single application to work with more than one database.
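As a rough illustration of the hotel example, the sketch below uses SQLite through Python's standard sqlite3 module. The table name, column names and sample rows are assumptions made for the example, not part of the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Definition and creation: one hypothetical table modeling room availability.
conn.execute("CREATE TABLE room (hotel TEXT, room_no INTEGER, is_vacant INTEGER)")
conn.executemany(
    "INSERT INTO room VALUES (?, ?, ?)",
    [("Hotel A", 101, 0), ("Hotel A", 102, 1), ("Hotel B", 201, 0)],
)

# Querying: which hotels have at least one vacant room?
hotels_with_vacancies = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT hotel FROM room WHERE is_vacant = 1 ORDER BY hotel"
    )
]
print(hotels_with_vacancies)
```

The same SQL statements would need little or no change on another DBMS, which is the kind of interoperability that standards such as SQL are meant to provide.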
File Organization & Access Method
File Access Method
The way in which information/data can be retrieved. There are two methods of file access:
1. Direct Access
2. Sequential Access
Direct Access
With this access method, the information/data stored on a device can be accessed randomly and immediately, irrespective of the order in which it was stored. Retrieving data with this access method is quicker than with sequential access. It is also known as the random access method. Examples: hard disk, flash memory.
Sequential Access
With this access method, the information/data stored on a device is accessed in the exact order in which it was stored. Sequential access is seen in older storage devices such as magnetic tape.
File Organization Method
The way in which data/information is stored so that file access can be as easy and quick as possible. There are three main ways of organizing a file:
1. Sequential
2. Index-Sequential
3. Random
Sequential file organization
All records are stored in some order (ascending, descending, alphabetical). The order is based on a field in the record. For example, in a file holding records of employee ID, date of birth and address, the employee ID is used as the key and the records are stored in ascending or descending order of it. Can be used with both direct and sequential access.
Index-Sequential organization
The records are stored in some order, but there is a second file, called the index file, that indicates exactly where certain key points are located. It cannot be used with the sequential access method.
Random file organization
The records are stored in no particular order, but each record has its own specific position (address) on the disk. With this method no time is wasted searching for a record; instead, the system jumps to the exact position and accesses the data/information. Can only be used with the direct access method.
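The difference between the two access methods, and the role of an index, can be sketched with fixed-length records. This is a simplified illustration: the record layout, the function names and the sample employees are hypothetical, an in-memory buffer stands in for a disk file, and the index is a plain dictionary rather than a separate index file.

```python
import io
import struct

RECORD = struct.Struct("<I20s")  # employee ID (4 bytes) + name (20 bytes)


def make_file(employees):
    """Write fixed-length records in ascending employee-ID order."""
    buf = io.BytesIO()
    for emp_id, name in sorted(employees):
        buf.write(RECORD.pack(emp_id, name.encode("utf-8")))
    return buf


def read_sequential(buf, wanted_id):
    """Sequential access: read records in stored order until the key matches."""
    buf.seek(0)
    reads = 0
    while chunk := buf.read(RECORD.size):
        reads += 1
        emp_id, raw = RECORD.unpack(chunk)
        if emp_id == wanted_id:
            return raw.rstrip(b"\0").decode("utf-8"), reads
    return None, reads


def read_direct(buf, position):
    """Direct access: jump straight to the record at a known position."""
    buf.seek(position * RECORD.size)
    _, raw = RECORD.unpack(buf.read(RECORD.size))
    return raw.rstrip(b"\0").decode("utf-8")


def build_index(buf):
    """Index: map each key to the position of its record in the file."""
    buf.seek(0)
    index = {}
    position = 0
    while chunk := buf.read(RECORD.size):
        index[RECORD.unpack(chunk)[0]] = position
        position += 1
    return index


data_file = make_file([(30, "Carol"), (10, "Alice"), (20, "Bob")])
index = build_index(data_file)
print(read_sequential(data_file, 30))     # name plus the number of records read
print(read_direct(data_file, index[30]))  # one seek, one read
```

The sequential lookup has to pass over every earlier record, while the direct lookup goes to the record's position in one step once the index has supplied it.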
Comparison of Traditional File-Based Approach and Database Approach
At the outset, you should understand the rationale for replacing the traditional file-based system with the database system.
File-based System
File-based systems were an early attempt to computerize the manual filing system. A file-based system is a collection of application programs that perform services for end users. Each program defines and manages its own data. However, five types of problem occur with the file-based approach:
Separation and isolation of data
When data is isolated in separate files, it is more difficult to access data that should be available. The application programmer must synchronize the processing of two or more files to ensure that the correct data is extracted.
Duplication of data
The decentralized file-based approach leads to uncontrolled duplication of data. Uncontrolled duplication of data is undesirable because:
i. Duplication is wasteful
ii. Duplication can lead to loss of data integrity
Data dependence
In a file-based system, the physical structure and storage of the data files and records are defined in the application program code. This characteristic is known as program-data dependence. Changing an existing structure is rather difficult and requires the program to be modified. Such maintenance activities are time-consuming and subject to error.
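Program-data dependence can be illustrated with a program whose record layout is hard-coded. In this sketch the two layouts, the function name and the sample record are hypothetical; the point is only that a change to the file structure breaks a program written for the old structure.

```python
import struct

# The record layout is hard-coded in the program: 4-byte ID, 10-byte name.
OLD_LAYOUT = struct.Struct("<I10s")
# Suppose the file structure later changes so that names take 20 bytes.
NEW_LAYOUT = struct.Struct("<I20s")


def read_name(raw: bytes, layout: struct.Struct) -> str:
    """Decode one record using the layout the program was written for."""
    _, name = layout.unpack(raw)
    return name.rstrip(b"\0").decode("utf-8")


new_record = NEW_LAYOUT.pack(7, b"Jeffrey Tan")

try:
    read_name(new_record, OLD_LAYOUT)  # old program, new file structure
    old_program_works = True
except struct.error:
    old_program_works = False

print(old_program_works)                  # the old program must be modified
print(read_name(new_record, NEW_LAYOUT))  # works once the layout is updated
```

Every program that reads the file would need the same modification, which is why such maintenance is time-consuming and error-prone.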
Incompatible file formats
The structure of a file depends on the application programming language. A file structure provided by one programming language, such as the direct file or indexed-sequential file available in COBOL, may differ from the structure generated by another programming language such as C. This direct incompatibility makes the files difficult to process jointly.
Fixed queries / proliferation of application programs
File-based systems are very dependent upon the application programmer. Any required queries or reports have to be written by the application programmer. Normally, only fixed-format queries or reports can be supported, and no facility for ad-hoc queries is offered. File-based systems also put tremendous pressure on data processing staff, with users complaining about programs that are inadequate or inefficient in meeting their demands. Documentation may be limited and maintenance of the system is difficult. Provision for security, integrity and recovery is very limited.
Database Approach
To overcome the limitations of the file-based approach, the concept of the database and the Database Management System (DBMS) emerged in the 1960s.
Advantages
The database approach offers a number of advantages in application systems, including:
Control of data redundancy
The database approach attempts to eliminate redundancy by integrating the files. Although the database approach does not eliminate redundancy entirely, it controls the amount of redundancy inherent in the database.
Data consistency
By eliminating or controlling redundancy, the database approach reduces the risk of inconsistencies occurring. It ensures that all copies of the data are kept consistent.
More information from the same amount of data
With the integration of operational data in the database approach, it may be possible to derive additional information from the same data.
Sharing of data
The database belongs to the entire organization and can be shared by all authorized users.
Improved data integrity
Database integrity refers to the validity and consistency of stored data. Integrity is usually expressed in terms of constraints, which are consistency rules that the database is not permitted to violate.
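Integrity constraints of this kind can be declared when a table is defined, after which the DBMS rejects any row that violates them. The sketch below uses SQLite through Python's sqlite3 module; the employee table, its columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,        -- must be unique
        emp_name TEXT NOT NULL,              -- must be present
        salary   REAL CHECK (salary >= 0)    -- must not be negative
    )
    """
)
conn.execute("INSERT INTO employee VALUES (1, 'Jeffrey Tan', 3000)")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


rows = [
    (2, "Jane Doe", 2500),   # valid row
    (3, None, 2500),         # violates NOT NULL
    (4, "John Roe", -1),     # violates CHECK
    (1, "Duplicate", 100),   # violates PRIMARY KEY
]
results = [try_insert(row) for row in rows]
print(results)
```

Because the rules live in the database rather than in each application program, every program that writes to the table is held to the same constraints.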
Improved security
The database approach protects the data from unauthorized users. Protection may take the form of user names and passwords that identify the type of user and their access rights for operations including retrieval, insertion, updating and deletion.
Enforcement of standards
The integration of the database enforces the necessary standards, including data formats, naming conventions, documentation standards, update procedures and access rules.
Economy of scale
Cost savings can be obtained by combining all of the organization's operational data into one database, with applications working on one source of data.
Balance of conflicting requirements
By having a structural design in the database, conflicts between users or departments can be resolved. Decisions will be based on the best use of resources for the organization as a whole rather than for an individual entity.
Improved data accessibility and responsiveness
Because the database approach integrates the data, data access can cross departmental boundaries. This feature provides more functionality and better services to the users.
Increased productivity
The database approach provides all the low-level file-handling routines. The provision of these functions allows the programmer to concentrate more on the specific functionality required by the users. The fourth-generation environment provided by the database can simplify database application development.
Improved maintenance
The database approach provides data independence. Because a change to the data structure in the database will not affect the application program, it simplifies database application maintenance.
Increased concurrency
A database can manage concurrent data access effectively. It ensures that users do not interfere with one another in ways that would result in loss of information or loss of integrity.
Improved backup and recovery services
A modern database management system provides facilities to minimize the amount of processing that is lost following a failure by using the transaction approach.
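The transaction approach mentioned above treats a group of updates as all-or-nothing, so a failure partway through leaves no partial change behind. The sketch below simulates such a failure with SQLite; the account table, the transfer function and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(amount, fail_midway=False):
    """Move money from A to B as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'A'",
                (amount,),
            )
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'B'",
                (amount,),
            )
    except RuntimeError:
        pass  # the partial update has already been rolled back


def balances():
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))


transfer(30, fail_midway=True)
after_failure = balances()   # unchanged: the debit from A was undone
transfer(30)
after_success = balances()   # both updates applied together
print(after_failure, after_success)
```

After the simulated failure the balances are exactly what they were before the transfer started, so no processing has to be untangled by hand.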
Disadvantages
In spite of the large number of advantages of the database approach, it is not without challenges. The disadvantages include the following:
Complexity
A database management system is an extremely complex piece of software. All parties must be familiar with its functionality in order to take full advantage of it. Therefore, training for the administrators, designers and users is required.
Size
The database management system consumes a substantial amount of main memory as well as a large amount of disk space in order to run efficiently.
Cost of DBMS
A multi-user database management system may be very expensive. Even after installation, there is a high recurrent annual maintenance cost for the software.
Cost of conversion
When moving from a file-based system to a database system, the company incurs additional expenses for hardware acquisition and training.
Performance
As the database approach caters for many applications rather than exclusively for a particular one, some applications may not run as fast as before.
Higher impact of a failure
The database approach increases the vulnerability of the system because of its centralization. As all users and applications rely on the availability of the database, the failure of any component can bring operations to a halt and seriously affect services to the customer.
Entity Relationship
In software engineering, an entity-relationship model (ER model) is a data model for describing a database in an abstract way. In the case of a relational database, which stores data in tables, some of the data in these tables point to data in other tables. For instance, your entry in the database could point to several entries, one for each of the phone numbers that are yours. The ER model would say that you are an entity, and each phone number is an entity, and the relationship between you and the phone numbers is 'has a phone number'. Diagrams created to design these entities and relationships are called entity-relationship diagrams or ER diagrams.
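The phone number example can be sketched as two relational tables, where each row in one table points to a row in the other. The table names, column names and phone numbers below are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Two entity types, each stored as a table.
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE phone_number (
        number    TEXT PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(person_id)
    )
    """
)
conn.execute("INSERT INTO person VALUES (1, 'You')")
conn.executemany(
    "INSERT INTO phone_number VALUES (?, ?)",
    [("555-0100", 1), ("555-0101", 1)],
)

# Follow the 'has a phone number' relationship from the person to the numbers.
numbers = [
    row[0]
    for row in conn.execute(
        "SELECT p.number FROM person AS e "
        "JOIN phone_number AS p ON p.person_id = e.person_id "
        "WHERE e.name = 'You' ORDER BY p.number"
    )
]
print(numbers)
```

Here the person_id column in phone_number is what "points to" the person table; the ER model describes the same link abstractly as a relationship between two entities.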
Using the three schema approach to software engineering, there are three levels of ER models that may be developed.
Conceptual data model
This is the highest level ER model in that it contains the least granular detail but establishes the overall scope of what is to be included within the model set. The conceptual ER model normally defines master reference data entities that are commonly used by the organization. Developing an enterprise-wide conceptual ER model is useful to support documenting the data architecture for an organization. A conceptual ER model may be used as the foundation for one or more logical data models (see below). The purpose of the conceptual ER model is then to establish structural metadata commonality for the master data entities between the set of logical ER models. The conceptual data model may be used to form commonality relationships between ER models as a basis for data model integration.
Logical data model
A logical ER model does not require a conceptual ER model, especially if the scope of the logical ER model includes only the development of a distinct information system. The logical ER model contains more detail than the conceptual ER model. In addition to master data entities, operational and transactional data entities are now defined. The details of each data entity are developed and the entity relationships between these data entities are established. The logical ER model is however developed independent of technology into which it will be implemented.
Physical model
One or more physical ER models may be developed from each logical ER model. The physical ER model is normally developed to be instantiated as a database. Therefore, each physical ER model must contain enough detail to produce a database and each physical ER model is technology dependent since each database management system is somewhat different.
The physical model is normally forward engineered to instantiate the structural metadata into a database management system as relational database objects such as database tables, database indexes such as unique key indexes, and database constraints such as a foreign key constraint or a commonality constraint. The ER model is also normally used to design modifications to the relational database objects and to maintain the structural metadata of the database. The first stage of information system design uses these models during the requirements analysis to describe information needs or the type of information that is to be stored in a database. The data modeling technique can be used to describe any ontology (i.e. an overview and classifications of used terms and their relationships) for a certain area of interest. In the case of the design of an information system that is based on a database, the conceptual data model is, at a later stage (usually called logical design), mapped to a logical data model, such as the relational model; this in turn is mapped to a physical model during physical design. Note that sometimes, both of these phases are referred to as "physical design". It is also used in database management systems.
Entity-relationship modeling
The building blocks: entities, relationships, and attributes
An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of a domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. [4]
An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. Although the term entity is the one most commonly used, following Chen we should really distinguish between an entity and an entity-type. An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for this term. Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem. A relationship captures how entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem.
The model's linguistic aspect described above is utilized in the declarative database query language ERROL, which mimics natural language constructs. ERROL's semantics and implementation are based on reshaped relational algebra (RRA), a relational algebra which is adapted to the entity-relationship model and captures its linguistic aspect. Entities and relationships can both have attributes. Examples: an employee entity might have a Social Security Number (SSN) attribute; the proved relationship may have a date attribute. Every entity (unless it is a weak entity) must have a minimal set of uniquely identifying attributes, which is called the entity's primary key. Entity-relationship diagrams don't show single entities or single instances of relations. Rather, they show entity sets and relationship sets. Example: a particular song is an entity. The collection of all songs in a database is an entity set. The eaten relationship between a child and her lunch is a single relationship. The set of all such child-lunch relationships in a database is a relationship set. In other words, a relationship set corresponds to a relation in mathematics, while a relationship corresponds to a member of the relation. Certain cardinality constraints on relationship sets may be indicated as well.
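The correspondence between relationship sets and mathematical relations can be shown directly with Python sets. The children, lunches and the particular cardinality constraint checked here are hypothetical examples chosen for the sketch.

```python
from itertools import product

# Entity sets (hypothetical instances).
children = {"Ann", "Ben"}
lunches = {"sandwich", "soup", "salad"}

# Relationship set: each member is one (child, lunch) 'eaten' relationship.
eaten = {("Ann", "sandwich"), ("Ben", "soup")}

# A relation in the mathematical sense is a subset of the Cartesian product
# of the entity sets it links.
is_relation = eaten <= set(product(children, lunches))

# One possible cardinality constraint: each child has eaten at most one lunch.
lunch_counts = {}
for child, _ in eaten:
    lunch_counts[child] = lunch_counts.get(child, 0) + 1
at_most_one_lunch = all(count <= 1 for count in lunch_counts.values())

print(is_relation, at_most_one_lunch)
```

Each tuple in eaten is a single relationship, while the set as a whole is the relationship set that an ER diagram would draw between the two entity sets.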