A DBMS is a set of software programs that controls the organization, storage, management, and retrieval of data in a
database. DBMSs are categorized according to their data structures or types. The DBMS accepts requests for data from
an application program and instructs the operating system to transfer the appropriate data. The queries and responses
must be submitted and received according to a format that conforms to one or more applicable protocols. When a
DBMS is used, information systems can be changed more easily as the organization's information requirements change.
New categories of data can be added to the database without disruption to the existing system. The DBMS also manages the data and the information related to the data.
Database servers are dedicated computers that hold the actual databases and run only the DBMS and related software.
Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable
storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in
large volume transaction processing environments. DBMSs are found at the heart of most database applications.
DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs
typically rely on a standard operating system to provide these functions.
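The interaction described above, in which an application program hands a request to the DBMS and the DBMS takes care of storage and retrieval, can be sketched minimally with SQLite, an embedded DBMS available in Python's standard library (the table and data here are made up for illustration):

```python
import sqlite3

# The application program submits SQL requests; the DBMS handles
# organization, storage, and retrieval on its behalf.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ann", "Sales"), ("Raj", "IT")])

# A new category of data can be added without disrupting existing rows.
conn.execute("ALTER TABLE employees ADD COLUMN phone TEXT")

rows = conn.execute("SELECT name FROM employees WHERE dept = 'IT'").fetchall()
print(rows)  # [('Raj',)]
```

The application never touches the database files directly; it only describes *what* data it wants, which is what lets the underlying structures change without breaking the program.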
Optimizer Operations
A SQL statement can be executed in many different ways, such as full table scans, index
scans, nested loops, and hash joins. The query optimizer determines the most efficient way to
execute a SQL statement after considering many factors related to the objects referenced and
the conditions specified in the query. This determination is an important step in the processing
of any SQL statement and can greatly affect execution time.
Note:
The optimizer might not make the same decisions from one version of Oracle Database to the
next. In recent versions, the optimizer might make different decisions, because better
information is available.
The output from the optimizer is an execution plan that describes an optimum method of execution. The plan shows the combination of steps Oracle Database uses to execute a SQL statement. Each step either retrieves rows of data physically from the database or prepares them in some way for the user issuing the statement.
For any SQL statement processed by Oracle, the optimizer performs the operations listed in
Table 11-1.
Table 11-1 Optimizer Operations

• Statement transformation: For complex statements involving, for example, correlated subqueries or views, the optimizer might transform the original statement into an equivalent join statement.
• Choice of optimizer goals: The optimizer determines the goal of optimization. See "Choosing an Optimizer Goal".
• Choice of access paths: For each table accessed by the statement, the optimizer chooses one or more of the available access paths to obtain table data. See "Understanding Access Paths for the Query Optimizer".
• Choice of join orders: For a join statement that joins more than two tables, the optimizer chooses which pair of tables is joined first, and then which table is joined to the result, and so on. See "How the Query Optimizer Chooses Execution Plans for Joins".
Query Plan
A query plan (or query execution plan) is an ordered set of steps used to access or modify information in a SQL
relational database management system. This is a specific case of the relational model concept of access plans.
Since SQL is declarative, there are typically a large number of alternative ways to execute a given query, with
widely varying performance. When a query is submitted to the database, the query optimizer evaluates some of the
different, correct possible plans for executing the query and returns what it considers the best alternative. Because
query optimizers are imperfect, database users and administrators sometimes need to manually examine and tune
the plans produced by the optimizer to get better performance.
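Most databases let you inspect the plan the optimizer chose. As a small illustration (using SQLite from Python rather than a commercial optimizer; the table and index names are made up), EXPLAIN QUERY PLAN shows how the chosen access path changes once an index becomes available:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, branch TEXT, balance REAL)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
    # the detail column describes each step the optimizer chose.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT balance FROM accounts WHERE branch = 'Port Chester'"

before = plan(query)   # with no index, a full table scan
conn.execute("CREATE INDEX idx_branch ON accounts (branch)")
after = plan(query)    # now the optimizer picks the index

print(before)
print(after)
```

This is exactly the kind of before/after comparison administrators make when tuning: the query text is unchanged, yet the plan (and hence the cost) differs.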
Implementation
Implementation is the carrying out, execution, or practice of a plan, a method, or any design for doing something.
As such, implementation is the action that must follow any preliminary thinking in order for something to actually
happen. In an information technology context, implementation encompasses all the processes involved in getting
new software or hardware operating properly in its environment, including installation, configuration, running,
testing, and making necessary changes. The word deployment is sometimes used to mean the same thing.
Equivalent expressions
We often want to replace a complicated expression with a simpler one that means the same thing. For example, the
expression x + 4 + 2 obviously means the same thing as x + 6, since 4 + 2 = 6. More interestingly, the expression x +
x + 4 means the same thing as 2x + 4, because 2x is x + x when you think of multiplication as repeated addition. (Which
of these is simpler depends on your point of view, but usually 2x + 4 is more convenient in Algebra.)
Two algebraic expressions are equivalent if they always lead to the same result when you evaluate them, no matter
what values you substitute for the variables. For example, if you substitute x := 3 in x + x + 4, then you get 3 + 3 + 4,
which works out to 10; and if you substitute it in 2x + 4, then you get 2(3) + 4, which also works out to 10. There's
nothing special about 3 here; the same thing would happen no matter what value we used, so x + x + 4 is equivalent to
2x + 4. (That's really what I meant when I said that they mean the same thing.)
When I say that you get the same result, this includes the possibility that the result is undefined. For example, 1/x + 1/x
is equivalent to 2/x; even when you substitute x := 0, they both come out the same (in this case, undefined). In contrast,
x²/x is not equivalent to x; they usually come out the same, but they are different when x := 0. (Then x²/x is undefined, but x is 0.) To deal with this situation, there is a sort of trick you can play, forcing the second expression to be undefined in certain cases. Just add the words ‘for x ≠ 0’ at the end of the expression to make a new expression; then the new expression is undefined unless x ≠ 0. (You can put any other condition you like in place of x ≠ 0, whatever is appropriate in a given situation.) So x²/x is equivalent to x for x ≠ 0.
To symbolise equivalent expressions, people often simply use an equals sign. For example, they might say ‘x + x + 4 =
2x + 4’. The idea is that this is a statement that is always true, no matter what x is. However, it isn't really correct to
write ‘1/x + 1/x = 2/x’ to indicate an equivalence of expressions, because this statement is not correct when x := 0. So
instead, I will use the symbol ‘≡’, which you can read ‘is equivalent to’ (instead of ‘is equal to’ for ‘=’). So I'll say, for
example,
• x + x + 4 ≡ 2x + 4,
• 1/x + 1/x ≡ 2/x, and
• x²/x ≡ x for x ≠ 0.
The textbook, however, just uses ‘=’ for everything, so you can too, if you want.
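The three equivalences above can be spot-checked numerically (a sketch, not a proof: evaluating at sample points can refute an equivalence but never establish it for all values). Treating a division by zero as the result "undefined" mirrors the convention used above:

```python
def equivalent_at(f, g, x):
    """Compare two expressions at one value, treating an exception
    (e.g. division by zero) as the result 'undefined'."""
    def run(h):
        try:
            return h(x)
        except ZeroDivisionError:
            return "undefined"
    return run(f) == run(g)

# x + x + 4 vs 2x + 4: agree at every value we try.
same = all(equivalent_at(lambda x: x + x + 4, lambda x: 2*x + 4, x)
           for x in range(-5, 6))
print(same)  # True

# 1/x + 1/x vs 2/x: agree even at x = 0, where both are undefined.
print(equivalent_at(lambda x: 1/x + 1/x, lambda x: 2/x, 0))  # True

# x²/x vs x: disagree at x = 0 (undefined vs 0), so NOT equivalent.
print(equivalent_at(lambda x: x**2 / x, lambda x: x, 0))     # False
```

The last check is precisely why the qualifier ‘for x ≠ 0’ is needed before x²/x ≡ x can be asserted.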
Selection Operation
1. Consider the query to find the assets and branch-names of all banks who have depositors living in Port Chester. In relational algebra, this is
   Π branch-name, assets (σ customer-city = "Port Chester" (CUSTOMER ⋈ DEPOSIT ⋈ BRANCH))
   o This expression constructs a huge relation, CUSTOMER ⋈ DEPOSIT ⋈ BRANCH, of which we are only interested in a few tuples.
2. It is advantageous to perform the selection as early as possible. Since it involves only attributes of CUSTOMER, we can rewrite the query as
   Π branch-name, assets (σ customer-city = "Port Chester" (CUSTOMER) ⋈ DEPOSIT ⋈ BRANCH)
3. It is also advantageous to apply projections early. We can eliminate several attributes from the intermediate schemes; the only ones we need to retain are those that
   o appear in the result of the query, or
   o are needed in subsequent operations.
4. Note that there is no advantage in doing an early project on a relation before it is needed for some other operation:
   o We would access every block for the relation to remove attributes.
   o Then we access every block of the reduced-size relation when it is actually needed.
   o We do more work in total, rather than less!
5. Consider the order of the joins:
   o We could compute DEPOSIT ⋈ BRANCH first and then join the result with the selected part of CUSTOMER.
   o However, DEPOSIT ⋈ BRANCH is likely to be a large relation as it contains one tuple for every account.
   o The other part, σ customer-city = "Port Chester" (CUSTOMER), is likely to be small.
   o So, if we compute σ customer-city = "Port Chester" (CUSTOMER) ⋈ DEPOSIT first, we get a temporary relation with one tuple for each account held by a resident of Port Chester.
   o This temporary relation is much smaller than DEPOSIT ⋈ BRANCH.
6. Natural join is commutative and associative:
   o Thus we could rewrite our relational algebra expression as
     (σ customer-city = "Port Chester" (CUSTOMER) ⋈ BRANCH) ⋈ DEPOSIT
   o But there are no common attributes between CUSTOMER and BRANCH, so this is a Cartesian product. Lots of tuples!
   o If a user entered this expression, we would want to use the associativity and commutativity of natural join to transform it into the more efficient expression derived above (join with DEPOSIT first, then with BRANCH).
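The equivalence that licenses pushing the selection inward can be demonstrated with a toy natural-join evaluator (the bank relations below are hypothetical sample data in the spirit of the example; only the attribute names follow the text):

```python
def natural_join(r, s):
    """Natural join of two relations (lists of dicts): combine every
    pair of tuples that agree on all common attribute names."""
    out = []
    for a in r:
        for b in s:
            common = set(a) & set(b)
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})
    return out

CUSTOMER = [{"customer-name": "Hayes", "customer-city": "Port Chester"},
            {"customer-name": "Jones", "customer-city": "Harrison"}]
DEPOSIT  = [{"customer-name": "Hayes", "branch-name": "Brighton"},
            {"customer-name": "Jones", "branch-name": "Redwood"}]
BRANCH   = [{"branch-name": "Brighton", "assets": 7100000},
            {"branch-name": "Redwood", "assets": 2100000}]

select = lambda r: [t for t in r if t["customer-city"] == "Port Chester"]

# Selection last: build the full join, then filter.
late = select(natural_join(natural_join(CUSTOMER, DEPOSIT), BRANCH))
# Selection first: filter CUSTOMER, then join the (smaller) result.
early = natural_join(natural_join(select(CUSTOMER), DEPOSIT), BRANCH)

print(late == early)                      # True: the expressions agree
print([t["branch-name"] for t in early])  # ['Brighton']
```

Both orders produce the same answer, but the early-selection version never materializes tuples for customers outside Port Chester, which is the whole point of the rewriting rules above.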
The advantage to this approach is that it is quick, easy, and simple to understand. Putting in a few records, however, quickly reveals why this kind of flat-file database is now reserved for non-programmers creating simple projects such as an inventory of their music: even with just a few records, a great deal of data is duplicated in every record.
This duplication can make the database very hard to maintain and subject to corruption. To prevent this, databases are
‘normalized’ – a complex subject that boils down to eliminating the duplication by factoring out common elements into
separate tables and then reaching into those tables by using unique values (foreign keys). Thus, without bogging
ourselves down in db theory, we can quickly redesign this flat database into three tables:
In this normalized database, each entry has only the information that is unique to the particular BlogEntry:
Title
URL
Date Created
Date Modified
Short Description
The entry also has the ID of the Blogger who wrote the entry and the ID of the Blog that the entry belongs to.
A second table holds the information about each Blog, and a third table holds the information about each Blogger.
Thus, a given Blogger's first and last name, alias (email address at Microsoft.com) and phone number are entered only ONCE for each blogger, and referenced in each entry by ID (known as a foreign key).
The diagram also shows that the two foreign-key relationships are named
FK_Blog_Entries_Blogs
FK_BlogEntries_Bloggers
These simple names indicate that there is a foreign key relationship from BlogEntries to Blogs and another from
BlogEntries to Bloggers. The data is now much cleaner:
Notice that the BlogEntries table now does not duplicate the first and last name and phone number for each Blogger,
but rather just refers to that Blogger's ID. For example, entry #3 was written by Blogger 4 (see ID in the middle image) and was placed in Blog 3 (see ID in the top image).
Creating A Join Query (T-SQL)
You can easily write a query that recreates the complete set of information by “joining” the tables:
The query matches each entry's Blogger value to the ID in the Bloggers table, and likewise joins the entry's Blog value to the ID in the Blogs table, giving us the ability to create a virtual flat table without the duplicated values.
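The original screenshots of the T-SQL are not reproduced here, but a join of the shape described can be sketched as follows (run against SQLite via Python for portability; the sample names and blog titles are invented, while the IDs match the entry #3 / Blogger 4 / Blog 3 example in the text):

```python
import sqlite3

# Normalized blog schema from the text, with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Blogs (ID INTEGER PRIMARY KEY, BlogName TEXT);
CREATE TABLE Bloggers (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE BlogEntries (
    ID INTEGER PRIMARY KEY,
    Title TEXT,
    Blog INTEGER REFERENCES Blogs (ID),       -- FK_BlogEntries_Blogs
    Blogger INTEGER REFERENCES Bloggers (ID)  -- FK_BlogEntries_Bloggers
);
INSERT INTO Blogs VALUES (3, 'Data Access');
INSERT INTO Bloggers VALUES (4, 'Ann', 'Smith');
INSERT INTO BlogEntries VALUES (3, 'Normalization', 3, 4);
""")

# Joining the tables recreates the flat view without storing duplicates.
row = conn.execute("""
    SELECT e.Title, b.FirstName, b.LastName, g.BlogName
    FROM BlogEntries AS e
    JOIN Bloggers AS b ON e.Blogger = b.ID
    JOIN Blogs AS g ON e.Blog = g.ID
""").fetchone()
print(row)  # ('Normalization', 'Ann', 'Smith', 'Data Access')
```

The blogger's name and the blog's title are stored once each, yet the join result reads exactly like the original flat-file record.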
Q.5 Describe the Structural Semantic Data Model (SSM) with relevant examples.
ANS.
Modelling Complex and Multimedia Data
13
Data modelling addresses a need in information system analysis and design to develop a model of the information requirements as well as a set of viable database structure proposals.
A fundamental tool used in this process is the data model, which is used both for specification of the information
requirements at the user level and for specification of the data structure for the database. During implementation of a
database, the data model guides construction of the schema or data catalog which contains the metadata that describe
the DB structure and data semantics that are used to support database implementation and data retrieval.
Data modelling, using a specific data model type, and as a unique activity during information system design, is
commonly attributed to Charles Bachman (1969) who presented the Data Structure Diagram as one of the first, widely
used data models for network database design. Several alternative data model types were proposed shortly thereafter, the best known of which are the relational model (Codd, 1970) and the entity-relationship model (Chen, 1976).
The relational model was quickly criticized for being 'flat' in the sense that all information is represented as a set of
tables with atomic cell values. The definition of well-formed relational models requires that complex attribute types
(hierarchic, composite, multi-valued, and derived) be converted to atomic attributes and that relations be normalized.
Inter-entity (inter-relation) relationships are difficult to visualize in the resulting set of relations, making control of the
completeness and correctness of the model difficult. The relational model maps easily to the physical characteristics of
electronic storage media, and as such, is a good tool for design of the physical database.
The entity-relationship approach to modelling, proposed by Chen (1976), had two primary objectives: first to visualize
inter-entity relationships and second to separate the DB design process into two phases:
1. Record, in an ER model, the entities and inter-entity relationships required "by the enterprise", i.e. by the
owner/user of the information system or application. This phase and its resulting model should be independent
of the DBMS tool that is to be used for realizing the DB.
2. Translate the ER model to the data model supported by the DBMS to be used for implementation.
This two-phase design supports modification at the physical level without requiring changes to the enterprise or user
view of the DB content.
Chen's ER model also quickly came under criticism, particularly for its lack of ability to model classification structures. In 1977, Smith & Smith presented a method for modelling generalization and aggregation hierarchies that underlie the many extended/enhanced entity-relationship (EER) model types proposed and in use today.
Q.6 Explain:
A) Data Dredging B) Data Mining Techniques
Data Dredging
Definition: Data dredging, sometimes referred to as "data fishing", is a data mining practice in which large volumes of data are analyzed seeking any possible relationships between the data. The traditional scientific method, in contrast, begins with a
hypothesis and follows with an examination of the data. Sometimes conducted for unethical purposes, data dredging
hypothesis and follows with an examination of the data. Sometimes conducted for unethical purposes, data dredging
often circumvents traditional data mining techniques and may lead to premature conclusions. Data dredging is
sometimes described as "seeking more information from a data set than it actually contains."
Data dredging sometimes results in relationships between variables announced as significant when, in fact, the data
require more study before such an association can legitimately be determined. Many variables may be related through
chance alone; others may be related through some unknown factor. To make a valid assessment of the relationship
between any two variables, further study is required in which isolated variables are contrasted with a control group.
Data dredging is sometimes used to present an unexamined concurrence of variables as if they led to a valid conclusion,
prior to any such study.
Although data dredging is often used improperly, it can be a useful means of finding surprising relationships that might
not otherwise have been discovered. However, because the concurrence of variables does not constitute information
about their relationship (which could, after all, be merely coincidental), further analysis is required to yield any useful
conclusions.
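Why a concurrence of variables proves so little can be shown with a small simulation (a sketch using only the standard library; the column count and sample size are arbitrary choices): among many columns of pure noise, some pair will look strongly correlated by chance alone.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# 40 completely unrelated variables, 20 observations each.
columns = [[random.random() for _ in range(20)] for _ in range(40)]

# Dredge: test every pair and report the strongest correlation found.
best = max(abs(pearson(a, b))
           for i, a in enumerate(columns)
           for b in columns[i + 1:])
print(round(best, 2))  # a strong-looking correlation, from pure noise
```

Testing hundreds of pairs all but guarantees an "impressive" coefficient somewhere, which is exactly why a relationship found by dredging needs confirmation on fresh data before it counts as a conclusion.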
Data Mining Techniques
Data mining is sorting through data to identify patterns and establish relationships. Common techniques include:
• Association - looking for patterns where one event is connected to another event
• Sequence or path analysis - looking for patterns where one event leads to another later event
• Classification - looking for new patterns (May result in a change in the way the data is organized but that's ok)
• Clustering - finding and visually documenting groups of facts not previously known
• Forecasting - discovering patterns in data that can lead to reasonable predictions about the future (This area of
data mining is known as predictive analytics.)
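The first technique in the list, association, can be illustrated with a toy co-occurrence count (the "shopping basket" transactions below are invented sample data; real systems use algorithms such as Apriori over far larger inputs):

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactions: which items were bought together.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "chips", "bread"},
]

# Count every pair of items that appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Pairs seen in at least 2 of the 5 baskets count as associations here.
frequent = {pair for pair, n in pair_counts.items() if n >= 2}
print(sorted(frequent))
# [('beer', 'chips'), ('bread', 'butter'), ('bread', 'milk')]
```

Even this tiny example surfaces the "one event is connected to another" patterns the bullet describes, such as beer and chips being bought together.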
Data mining techniques are used in many research areas, including mathematics, cybernetics, genetics and marketing.
Web mining, a type of data mining used in customer relationship management (CRM), takes advantage of the huge
amount of information gathered by a Web site to look for patterns in user behavior.