
Data Resource Management

Database Management
In all information systems, data resources must be organized and structured in a logical manner so that they can be accessed easily, processed efficiently, retrieved quickly and managed effectively. A character is the most basic element of data that can be observed and manipulated. A field, or data item, consists of a grouping of related characters, for example, a name. It represents an attribute (a characteristic or quality) of some entity (an object, person, place or event). For example, an employee's salary is an attribute that describes an employee. Fields are organized in a logical order, for example, last_name, first_name and so on. A record represents a collection of attributes that describe an instance of an entity. An example is a person's payroll record. Variable-length records contain a variable number of fields and field lengths. Normally, the first field in a record stores a unique identifier for the record and is called the primary key. A student ID can serve as a primary key as long as no two students share it. If no natural unique identifier exists, the designer can assign a sequential number to each record.

File and database


A group of related records is a data file (also called a table or flat file). Strictly speaking, a flat file should consist only of data and delimiters. More broadly, the term refers to any data that exists in a single file in the form of rows and columns, with no relationships between records and fields other than the table structure. An employee file contains the records of the employees of a firm. Files are classified by application (e.g. payroll or inventory), by type of data (e.g. document or graphical image), or by permanence (e.g. payroll master file or payroll transaction file). An obsolete transaction or master file is backed up as a history file. A database is an integrated collection of logically related data elements. Data stored in a database are independent of the application programs that use them and of the type of storage devices on which they are stored. Databases contain data elements that describe entities and the relationships among entities. An electric utility database, for example, consists of entities (customers, meters, bills, payments and meter readings) and relationships (bills are sent to customers, customers make payments, and customers use meters). The billing and payment processing applications depend on this database.

Database Structures
All the pictures, videos, songs, messages, chats, icons, email addresses and other items on popular social networking websites are stored as fields, records, files or objects in large databases. Data are stored so that they are easy to access, can be shared by their respective owners and are protected from unauthorized access or use. Database Management System (DBMS) packages are designed to use a logical data structure to give end users quick, easy access to the information stored in databases. Early mainframe DBMS packages used a hierarchical structure, in which the relationships between records form a tree-like structure. There is one root record and multiple subordinate levels, and all relationships are one-to-many. Any data element can be accessed by moving progressively down from the root and along the branches of the tree until the desired record is located. The network structure is more complex and is still used by some mainframe DBMS packages. It allows many-to-many relationships. For example, department records can be related to more than one employee record, and employee records can be related to more than one project record.

Hierarchical, Network structures


[Figure: two diagrams. Hierarchical Structure, with nodes Department; Project A, Project B; Employee 1, Employee 2, Employee 3. Network Structure, with nodes Department A, Department B; Employee 1, Employee 2; Project A, Project B.]
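
The hierarchical access path described above, moving down from the root along the branches until the desired record is found, can be sketched in Python. This is only an illustration, not how any particular mainframe DBMS is implemented. The `Node` class, the `find_path` function and the placement of employees under projects are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One record in a hierarchical (tree-like) structure."""
    name: str
    children: List["Node"] = field(default_factory=list)


def find_path(root: Node, target: str) -> Optional[List[str]]:
    """Walk down from the root and return the path to the target record."""
    if root.name == target:
        return [root.name]
    for child in root.children:
        sub_path = find_path(child, target)
        if sub_path is not None:
            return [root.name] + sub_path
    return None  # the record is not in this subtree


# Hypothetical tree: one root record with one-to-many relationships below it.
department = Node("Department", [
    Node("Project A", [Node("Employee 1"), Node("Employee 2")]),
    Node("Project B", [Node("Employee 3")]),
])

path = find_path(department, "Employee 3")
print(path)
```

Because every record has exactly one parent, there is exactly one path to it. A network structure would need a graph instead of a tree, since a record may then have several owners.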

Relational model
The relational model is the most widely used. All data elements are stored in simple two-dimensional tables, or relations. Tables are flat files in which each row represents a record and each column represents a field. A relational DBMS can specify data attributes for many files simultaneously and can relate data elements in one file to those in one or more other files. For example, a manager may retrieve an employee's name and salary from the employee table, together with the department name from the department table. Three basic operations can be performed on a relational database to create useful data sets. The select operation creates a subset of records, for example, all employees in an employee database who have spent at least 2 years with the firm and make more than Rs. 3 lakhs per year. The join operation combines two or more tables temporarily so that a user can see relevant data from all of them. The project operation creates a subset of the columns contained in the temporary tables created by the select and join operations. Large mainframe relational databases include Oracle 10g from Oracle and DB2 from IBM. A popular midrange database application is SQL Server from Microsoft. A common database for the PC is Microsoft Access.
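
The three operations can be tried out with Python's built-in `sqlite3` module. The sketch below is only an illustration. The table layout, the `years` column and all names and figures in the rows are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (deptno TEXT PRIMARY KEY, dname TEXT);
    CREATE TABLE employee (
        empno TEXT PRIMARY KEY, ename TEXT, esalary INTEGER,
        years INTEGER, deptno TEXT REFERENCES department(deptno)
    );
    INSERT INTO department VALUES ('A', 'Sales'), ('B', 'Finance');
    INSERT INTO employee VALUES
        ('E1', 'Asha',  350000, 3, 'A'),
        ('E2', 'Ravi',  250000, 5, 'A'),
        ('E3', 'Meena', 400000, 1, 'B');
""")

# Select: keep only the rows (records) that satisfy a condition.
selected = conn.execute(
    "SELECT * FROM employee WHERE years >= 2 AND esalary > 300000"
).fetchall()

# Join: combine rows of two tables on the shared deptno column.
joined = conn.execute(
    "SELECT * FROM employee JOIN department USING (deptno)"
).fetchall()

# Project: keep only some columns (fields) of the joined result.
projected = conn.execute(
    "SELECT ename, esalary, dname FROM employee "
    "JOIN department USING (deptno) ORDER BY empno"
).fetchall()

print(selected)
print(projected)
```

In SQL, select corresponds to the WHERE clause, join to the JOIN clause, and project to the column list after SELECT.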

Relational structure
Department Table

Deptno    Dname    Dloc    Dmgr
Dept A
Dept B
Dept C

Employee Table

Empno    Ename    Etitle    Esalary    Deptno
Emp 1                                  Dept A
Emp 2                                  Dept A
Emp 3                                  Dept B
Emp 4                                  Dept B
Emp 5                                  Dept C
Emp 6                                  Dept B

Multidimensional and Object oriented models


The multidimensional model is a variation of the relational model. It uses cubes of data and cubes within cubes. Each side of a cube is a dimension of the data. For example, a single cell may contain the total sales for a product in a region for a specific sales channel in a month. This structure is the most popular one for online analytical processing (OLAP) applications, in which fast answers to complex business queries are expected. The object-oriented model is considered a key technology for multimedia Web applications. An object consists of data values that describe the attributes of an entity, plus the operations that can be performed on the data. This encapsulation capability allows the object-oriented model to handle complex data types (graphics, pictures, voice and text) more easily than other structures. The model also supports inheritance: new objects can be created automatically by replicating some or all of the characteristics of one or more parent objects. For example, savings account and loan account objects can both inherit the common attributes and operations of a parent bank account object. Object-oriented database management systems (OODBMS) have become popular in computer-aided design (CAD) and other applications. For example, designers can create product design objects, store them, and replicate or modify them to create new product designs.
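
The idea of a cell addressed by product, region, sales channel and month can be sketched with a plain Python dictionary. This is a toy illustration, not the storage format of any OLAP product. The sales figures and the `roll_up` helper are assumptions made for the example.

```python
from collections import defaultdict

# Each fact is (product, region, channel, month, sales amount); values are made up.
facts = [
    ("TV", "East", "Retail", "January", 120),
    ("TV", "East", "Retail", "January", 80),
    ("TV", "West", "Online", "January", 50),
    ("VCR", "East", "Retail", "February", 30),
]

# Build the cube: one cell per combination of dimension values.
cube = defaultdict(int)
for product, region, channel, month, amount in facts:
    cube[(product, region, channel, month)] += amount


def roll_up(cube, product=None, region=None, channel=None, month=None):
    """Sum every cell that matches the given dimensions; None means 'all'."""
    wanted = (product, region, channel, month)
    return sum(
        value for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )


cell = cube[("TV", "East", "Retail", "January")]  # one cell of the cube
tv_total = roll_up(cube, product="TV")            # all TV sales
east_total = roll_up(cube, region="East")         # all sales in the East
print(cell, tv_total, east_total)
```

Leaving a dimension unspecified aggregates across it, which is the kind of question OLAP tools are expected to answer quickly.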

Data warehouse organization


Major relational DBMS vendors have added object-oriented modules to their relational software. Examples include multimedia object extensions to DB2 and object-based cartridges for Oracle. Database pioneer Michael Stonebraker was an architect of the Ingres relational database. A row-based system like Ingres is great for executing transactions, but a column-oriented system is a more natural fit for data warehouses, he says. SQL Server, Sybase and Teradata all have rows as their central design point. By this argument, most data warehouses could run up to 50 times faster in a column database. Columns cut across transactions and store one element of information for each transaction, whereas a row may hold 20 to 200 different elements. A column database can therefore load just the sales data for a month into system memory and calculate an average quickly. Because a column contains similar information from each transaction, it is possible to derive a compression scheme for its data type and apply it throughout the column. This makes storage and retrieval faster and reduces the amount of disk space required.
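
The difference between row and column layouts, and why a column compresses well, can be sketched in a few lines of Python. This is an illustration under simple assumptions. The transactions are invented, and run-length encoding is just one possible compression scheme, chosen here because it is easy to show.

```python
from itertools import groupby
from statistics import mean

# Row-oriented layout: one tuple per transaction (hypothetical data).
rows = [
    ("T1", "East", 100),
    ("T2", "East", 300),
    ("T3", "East", 200),
    ("T4", "West", 400),
]

# Column-oriented layout: one list per field, cutting across transactions.
columns = {
    "id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# An average needs only one column, not every field of every row.
average_amount = mean(columns["amount"])


def run_length_encode(values):
    """Compress runs of equal neighbouring values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


encoded_region = run_length_encode(columns["region"])
print(average_amount, encoded_region)
```

Values within one column share a type and often repeat, so a single scheme can be applied to the whole column.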

Multidimensional database
[Figure: four views of the same multidimensional sales data, arranged along different dimensions. Dimension values shown: products (Camera, TV, VCR, Audio); regions (East, West, South, Total); measures (Sales, Margin); scenarios (Actual, Budget, Forecast, Variance); time periods (January, February, March, Qtr 1).]

Object-oriented structure
Bank Account Object (parent)
Attributes: Customer, Balance, Interest
Operations: Deposit, Withdraw, Get owner

Loan Account Object (inherits from Bank Account Object)
Attributes: Credit line, Monthly statement
Operations: Calculate interest owed, Print monthly statement

Savings Account Object (inherits from Bank Account Object)
Attributes: Number of withdrawals, Quarterly statement
Operations: Calculate interest paid, Print quarterly statement
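
The bank account objects above map naturally onto classes. The following Python sketch shows encapsulation (data plus operations) and inheritance. It is an illustration only: the method bodies, the simple interest formulas and the sample customer are assumptions, and the statement-printing operations are left out for brevity.

```python
class BankAccount:
    """Parent object: common attributes and operations."""

    def __init__(self, customer, balance=0.0, interest=0.0):
        self.customer = customer
        self.balance = balance
        self.interest = interest  # annual rate, e.g. 0.05 for 5% (assumed)

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

    def get_owner(self):
        return self.customer


class SavingsAccount(BankAccount):
    """Inherits from BankAccount and adds its own attribute and operation."""

    def __init__(self, customer, balance=0.0, interest=0.0):
        super().__init__(customer, balance, interest)
        self.number_of_withdrawals = 0

    def withdraw(self, amount):
        super().withdraw(amount)  # reuse the parent's operation
        self.number_of_withdrawals += 1

    def calculate_interest_paid(self):
        return self.balance * self.interest


class LoanAccount(BankAccount):
    """Inherits from BankAccount and adds a credit line."""

    def __init__(self, customer, credit_line, interest=0.0):
        super().__init__(customer, balance=0.0, interest=interest)
        self.credit_line = credit_line

    def calculate_interest_owed(self, amount_borrowed):
        return amount_borrowed * self.interest


savings = SavingsAccount("Asha", balance=1000.0, interest=0.05)
savings.deposit(500.0)   # inherited unchanged from BankAccount
savings.withdraw(300.0)  # extended in SavingsAccount
print(savings.get_owner(), savings.balance, savings.number_of_withdrawals)
```

Neither child class redefines `deposit` or `get_owner`; both obtain them from the parent object.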

Database development
DBMS packages such as Microsoft Access or Lotus Approach allow end users to develop databases easily. Large organizations place control of enterprise database development in the hands of database administrators (DBAs) and other database specialists. This improves the security and integrity of organizational databases. Database developers use a data definition language (DDL) to develop and specify the data contents, relationships and structure of each database, and to modify them when necessary. This information is cataloged and stored in a database of data definitions and specifications called a data dictionary or metadata repository. The data dictionary contains metadata (data about data). It holds the names and descriptions of all types of data records and their relationships; the requirements for end users' access and for the use of application programs; and information on database maintenance and security. An active data dictionary would not allow a data entry program to use a non-standard definition of a customer record, nor to enter a customer name that exceeds the defined size of that data element.
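
A small sketch with Python's built-in `sqlite3` module shows a DDL statement, the metadata the DBMS keeps about it, and a size rule being enforced. SQLite's internal catalog only stands in for a data dictionary here. The `customer` table and the 30-character limit are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure and constraints of a customer record.
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL CHECK (length(name) <= 30)
    )
""")

# Metadata (data about data): SQLite stores the definition in sqlite_master.
stored_definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'customer'"
).fetchone()[0]
column_names = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]

# A name within the defined size is accepted.
conn.execute("INSERT INTO customer (name) VALUES (?)", ("Asha Rao",))

# A name that exceeds the defined size is rejected by the DBMS itself.
try:
    conn.execute("INSERT INTO customer (name) VALUES (?)", ("x" * 31,))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(column_names, rejected)
```

The rule lives in the database definition rather than in each data entry program, so every program that writes customer records is held to the same standard.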

Data planning and database design


Database administrators and designers work with corporate and end-user management to develop an enterprise model that defines the basic business processes of the enterprise. Entity relationship diagrams (ERDs) model the relationships among the many entities involved in business processes. End users and database designers can use database management or business modeling software to help them develop ERD models. Such models answer questions like: Can a supplier provide more than one type of product to us? Data models serve as logical design frameworks, called the schema and the subschema. A schema is the overall logical view of the relationships among the data elements in a database. A subschema is the logical view of the data relationships needed to support specific end-user application programs. Physical database design takes an internal view of the data: it describes how data are physically stored and accessed on the storage devices of a computer system.
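
One common way to approximate the schema and subschema distinction in a relational DBMS is a table definition plus a view. The `sqlite3` sketch below is an illustration under that assumption. The `employee` table, the `staff_directory` view and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Schema: the overall logical view of the data.
    CREATE TABLE employee (
        empno INTEGER PRIMARY KEY, ename TEXT, esalary INTEGER, deptno TEXT
    );
    INSERT INTO employee VALUES (1, 'Asha', 350000, 'A'), (2, 'Ravi', 250000, 'B');

    -- Subschema: only what a hypothetical staff-directory application needs.
    CREATE VIEW staff_directory AS SELECT empno, ename, deptno FROM employee;
""")

cursor = conn.execute("SELECT * FROM staff_directory ORDER BY empno")
directory_columns = [d[0] for d in cursor.description]
directory_rows = cursor.fetchall()
print(directory_columns, directory_rows)
```

The directory application never sees the salary column, although that column is part of the overall schema.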

Entity Relationship Diagram


[Figure: entity relationship diagram. Entities: Supplier, Product, Purchase order, Purchase order item, Product Stock, Warehouse. Relationships: Supplies, Ordered on, Contains, Stocked as, Holds.]

Types of databases
Operational databases store the detailed data needed to support the business processes and operations of a company. They are also called subject area databases, transaction databases and production databases. For example, a human resource database would include data identifying each employee and his or her time worked, compensation, benefits, performance appraisals, and training and development status. Many organizations replicate copies or parts of databases to servers at different sites. These distributed databases can reside on network servers on the World Wide Web, or on corporate intranets or extranets. They may be copies of operational or analytical databases, hypermedia or discussion databases, or any other type of database. Replication improves database performance at work sites. A company with many branch operations may distribute its data so that each branch is the location of its own branch database. Distribution also protects data: if all data reside in a single physical location, a catastrophe such as a fire or damage to the data media can result in data loss. Keeping the distributed copies consistent and current is the major challenge. Replication uses a special software application that looks at each distributed database and finds the changes made to it. Once the changes are identified, the replication process makes all the distributed databases look the same by applying the proper changes to each. This takes time and computer resources, depending on the number and size of the distributed databases.
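
The replication process described above (find the changes in each copy, then apply them everywhere) can be sketched in Python. This is a deliberately simplified illustration, not the algorithm of any replication product. The version numbers used to decide which change wins, the `replicate` function and the branch data are all assumptions.

```python
def replicate(databases):
    """Make every copy hold the newest version of every record.

    Each database maps a record key to a (version, value) pair. Version
    numbers are an assumption of this sketch; they decide which change wins.
    """
    # Step 1: look at each distributed database and find the newest records.
    newest = {}
    for db in databases:
        for key, (version, value) in db.items():
            if key not in newest or version > newest[key][0]:
                newest[key] = (version, value)

    # Step 2: apply the proper changes to each database.
    changes_applied = 0
    for db in databases:
        for key, record in newest.items():
            if db.get(key) != record:
                db[key] = record
                changes_applied += 1
    return changes_applied


branch_a = {"cust-1": (2, "New Delhi"), "cust-2": (1, "Pune")}
branch_b = {"cust-1": (1, "Delhi"), "cust-3": (1, "Chennai")}

applied = replicate([branch_a, branch_b])
print(applied, branch_a == branch_b)
```

Both steps visit every record of every copy, which is why the cost grows with the number and size of the distributed databases.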

Data warehouses

Duplication, by contrast, identifies one database as the master and copies it to the other sites at a prescribed time after hours. External databases are available for a fee from commercial online services, and with or without charge from many sources on the World Wide Web. A hypermedia database stores hyperlinked pages of multimedia (text, graphic and photographic images, video clips and audio segments). A Web browser on a client PC connects to a Web network server. This server runs Web server software to access and transfer the Web pages you request. Web page content may be described in HTML or XML. A data warehouse stores data extracted from multiple databases. It is a central source of data that have been cleaned, transformed and cataloged so that managers and business professionals can use them for data mining, online analytical processing, and other forms of business analysis, market research and decision support. Data warehouses may be subdivided into data marts, which focus on specific aspects of a company, such as a department or a business process. Metadata (data that define the data in the warehouse) are stored in a metadata repository and cataloged by a metadata directory. Unlike data in operational databases, data in a warehouse are static: once gathered and stored, they do not change. This stability allows queries to search the data for complex patterns or historical trends.

Data mining
In data mining, the data in a warehouse are analyzed to reveal hidden correlations, patterns and trends in historical business activity. Data mining software uses advanced pattern recognition algorithms, together with various mathematical and statistical techniques, to sift through terabytes of data. Companies use data mining to (1) perform market-basket analysis to identify new product bundles, (2) find the root causes of quality or manufacturing problems, and (3) prevent customer attrition and acquire new customers.

[Figure: the data mining process. Databases, Selection, Target data, Data transformation, Data Warehouse, Data mining, Patterns, Interpret/Evaluate, Business Knowledge.]
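
Market-basket analysis, the first use of data mining listed above, can be illustrated by counting which products are bought together. The Python sketch below shows the idea at toy scale. The baskets are invented, and real data mining software uses far more sophisticated algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: the set of products in each customer's basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
]

# Count how often each pair of products appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1


def support(pair):
    """Fraction of all baskets that contain both products of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(baskets)


# The most frequent pairs are candidates for new product bundles.
max_count = max(pair_counts.values())
top_pairs = sorted(pair for pair, count in pair_counts.items() if count == max_count)
print(top_pairs, support(("butter", "bread")))
```

Pairs with high support are the "patterns" that an analyst then interprets and evaluates before acting on them.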

File processing

In earlier systems, each business application was designed to use one or more data files containing specific data records. Such file processing systems had the following problems. Independent data files included a lot of duplicated data: the same data (such as a customer's name and address) were recorded and stored in many files. This data redundancy required file maintenance programs to ensure that each file was properly updated. Having data in independent files also made it difficult to provide end users with information for ad hoc requests that required access to data stored in many files; special programs had to be written to retrieve data from each independent file. The organization of files, their physical location on storage hardware and the application software used to access them all depended on one another, so changes in the format and structure of the data and records in a file required program maintenance. Different users and applications could define data elements such as stock number and customer address differently, and this lack of standards caused inconsistency problems in data access. Finally, the integrity (accuracy and completeness) of the data was suspect because there was no control over their use and maintenance by authorized end users.

Database Management System (DBMS)


The database management approach consolidates data records, formerly held in separate files, into databases that can be accessed by many different application programs. A DBMS serves as the software interface between users and databases, which gives users easy access to the data. Users can make direct, ad hoc queries without using application programs. The DBMS controls how databases are created, interrogated and maintained to provide information to end users. Examples for the PC are Microsoft Access, Lotus Approach and Corel Paradox. Popular mainframe and server versions are IBM's DB2 Universal Database, Oracle 10g by Oracle Corp. and MySQL, an open-source DBMS. Database development involves defining and organizing the content, relationships and structure of the data needed to build a database. Database application development involves using a DBMS to develop prototypes of the queries, forms, reports and Web pages for a proposed business application. Database maintenance involves using transaction processing systems and other tools to add, delete, update and correct the data in a database. Database interrogation means asking for information from a database by using a query language or a report generator. Structured Query Language (SQL) is the international standard query language for DBMS packages.

Structured Query Language (SQL)


The basic form of an SQL query is:

SELECT ... FROM ... WHERE ...

After SELECT, you list the data fields to retrieve. After FROM, you list the files or tables from which the data must be retrieved. After WHERE, you specify the conditions that limit the search to the relevant data records. For example, the question "Which customers had no orders last month?" can be asked as follows:

SELECT [Customers].[Company Name], [Customers].[Contact Name]
FROM [Customers]
WHERE NOT EXISTS (SELECT [Ship Name] FROM [Orders] WHERE Month([Order Date])=1 AND Year([Order Date])=2004 AND [Customers].[Customer ID]=[Orders].[Customer ID])

Boolean logic consists of three logical operators: AND, OR and NOT. Suppose you want information about cats from the Internet. A search on felines may also return pages about dogs, which you do not need, and you do not want the Broadway musical titled Cats either. A suitable search expression is:

(cats OR felines) AND NOT (dogs OR Broadway)

Data manipulation language (DML) statements perform data handling without the need for a conventional programming language. Database tuning tools monitor and improve database performance.
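
The "customers with no orders" query can be run against a small in-memory database with Python's built-in `sqlite3` module. This sketch is an adaptation, not the original query. The table and column names are simplified, the sample rows are invented, and SQLite's `strftime` function replaces the Month and Year functions used in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, company_name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT
    );
    INSERT INTO customers VALUES (1, 'Alpha Traders'), (2, 'Beta Stores');
    INSERT INTO orders VALUES (10, 1, '2004-01-15'), (11, 2, '2003-12-20');
""")

# Customers for whom no order exists in January 2004.
no_orders = conn.execute("""
    SELECT c.company_name
    FROM customers AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM orders AS o
        WHERE o.customer_id = c.customer_id
          AND strftime('%Y-%m', o.order_date) = '2004-01'
    )
""").fetchall()

print(no_orders)
```

The inner SELECT is evaluated once per customer. NOT EXISTS keeps a customer only when that inner query returns no rows.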

Case: Amazon, eBay and Google


Amazon: During 2002-05, about 65,000 people and companies signed up to use Amazon's free Web services. Nearly one-third of them tinkered with software tools that help Amazon's 800,000 or so active sellers. eBay: About 1,000 applications have emerged from 15,000 registered developers. The most popular applications help sellers automate the process of listing items on eBay or displaying them on other sites. Google: The company parcels out some of its search-results data and recently unlocked access to its desktop and paid-search products.
