Users
Course # 36838
Version 14.0.0
Student Guide
Trademarks
The following names are registered names or trademarks and are used throughout this manual.
The product or products described in this book are licensed products of Teradata Corporation or
its affiliates.
Other product and company names mentioned herein may be the trademarks of their respective
owners.
The materials included in this book are a licensed product of Teradata Corporation.
Module 9 - Subqueries
Subquery Introduction
Basic Subquery Concepts
Relating Concepts and Subqueries
Adding Conditions
Nesting Subqueries
Multiple Column Matching
NULL and NOT IN Subquery
Module 9: Summary
Module 9: Review Questions
Module 9: Lab Exercise
Module 13 - Aggregation
The Aggregate Functions
Aggregate Functionality
COUNT(*)
Getting Department Sums
Aggregating Groups
Using GROUP BY
The HAVING Clause
WHERE Clause Explain
HAVING on Non-Aggregates
Aggregation and Joins
Correlated Subqueries and Aggregation
A Complex Example
COUNT DISTINCT
Module 13: Summary
Module 13: Review Questions
Module 13: Lab Exercise
Module 14 - CASE
CASE
Valued Form (Projection List)
Valued Form and Null
Searched Form
Searched Form (Complex Example)
CASE and Aggregation
NULLIF Function
NULLIF for Division
COALESCE Function
COALESCE and Multiple Arguments
NULLIF and COALESCE Aggregation Quiz
Module 14: Summary
Module 14: Review Questions
Module 14: Lab Exercise
Module 20 - RANK
Ranking Values
QUALIFY With no Tied Values
QUALIFY With Tied Ending Values
Qualifying Without Rank Projection
Bottom Values by ASC Rank
Bottom Values by DESC Rank
RANK and PARTITION
ROW_NUMBER
ROW_NUMBER vs. RANK
ROW_NUMBER and PARTITION
ROW_NUMBER and RESET WHEN
Finding Median Values
Module 20: Summary
Module 20: Review Questions
Module 20: Lab Exercise
Module 23 - Views
What is a View?
Creating and Using Views
Replacing a View via SQL Assistant
Using Views to Rename Columns
Join View
Joining Views
Using View to Format for SQL Assistant
Views with Aggregates
Aggregates and HAVING
Views and TOP N
Restrictions on Views
Advantages and Suggestions
Module 23: Review Questions
Module 23: Lab Exercise
Module 24 - Derived Tables and Volatile Tables
Temporary Table Choices
Another Derived Table Syntax Form
Volatile Table Syntax
Volatile Table Restrictions
HELP and SHOW (Volatile) TABLE
ON COMMIT DELETE ROWS (Implicit Transactions)
ON COMMIT PRESERVE ROWS (Implicit Transactions)
ON COMMIT DELETE ROWS (Explicit Transactions)
ON COMMIT PRESERVE ROWS (Explicit Transactions)
Limitations
Using INSERT-SELECT
Inserting a Single Row
UPDATE
Updating with Joins
DELETE
Deleting with Joins
Module 24: Summary
Module 24: Review Questions
Module 24: Lab Exercise
Database Concepts
A physical model, on the other hand, is a way of defining how these logical representations
might be represented to a “Relational Database Management System” (RDBMS), or, more
simply, a “Database”. In a physical model:
• Entities may be represented as tables. (e.g. an entity name may become a table name)
• Attributes may be represented as columns. (e.g. the attribute “employee number” may
become “Employee_Number”)
• Domains are represented as data types. (e.g. employee number may be stored as an
INTEGER)
• An actual employee can be represented as a row in the table.
• Relational Databases are founded on Set Theory and based on the Logical Model.
• A Relational Database consists of a collection of logically related tables.
• A table is a two dimensional representation of data consisting of columns and
rows.
Table: EMPLOYEE (each heading names a column)
          MANAGER
EMPLOYEE  EMPLOYEE  DEPARTMENT  JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    NUMBER    NUMBER      CODE    NAME      NAME     DATE    DATE    AMOUNT
PK        FK        FK          FK
1006      1019      301         312101  Stein     John     861015  631015  3945000
1008      1019      301         312102  Kanieski  Carol    870201  680517  3925000
1005      0801      403         431100  Ryan      Loretta  861015  650910  4120000
1004      1003      401         412101  Johnson   Darlene  861015  560423  4630000
1007                            432101  Villegas  Arnando  870102  470131  5970000
1003      0801      401         411100  Trader    James    860731  570619  4785000
SQL joins are based upon relationships among tables.
These physical relationships may or may not be based upon PK and FK definitions.
These objects must be defined to exist somewhere on the database, so they are created “inside,”
or said to be “owned” by, a database. Since many of these objects require space, the
“repository” called a “database” is defined to have an amount of space at create time. Space is a
concept deferred for a different time and place.
Table: EMPLOYEE
          MANAGER
EMPLOYEE  EMPLOYEE  DEPARTMENT  JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    NUMBER    NUMBER      CODE    NAME      NAME     DATE    DATE    AMOUNT
PK        FK        FK          FK
1006      1019      301         312101  Stein     John     861015  631015  3945000
1008      1019      301         312102  Kanieski  Carol    870201  680517  3925000
1005      0801      403         431100  Ryan      Loretta  861015  650910  4120000
1004      1003      401         412101  Johnson   Darlene  861015  560423  4630000
1007                            432101  Villegas  Arnando  870102  470131  5970000
1003      0801      401         411100  Trader    James    860731  570619  4785000
View: EMPLOYEE
          MANAGER
EMPLOYEE  EMPLOYEE  DEPARTMENT  JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    NUMBER    NUMBER      CODE    NAME      NAME     DATE    DATE    AMOUNT
1006      1019      301         312101  Stein     John     861015  631015  3945000
Without a database qualifier, Teradata relies on a default setting.
• Physical databases contain accessible objects that are based upon the
logical model.
True or False:
Table: EMPLOYEE
      MGR
EMP   EMP   DEPT  JOB     LAST      FIRST    HIRE    BIRTH   SAL
NUM   NUM   NUM   CODE    NAME      NAME     DATE    DATE    AMT
PK    FK    FK    FK
1006  1019  301   312101  Stein     John     761015  531015  2945000
1008  1019  301   312102  Kanieski  Carol    770201  580517  2925000
1005  0801  403   431100  Ryan      Loretta  761015  550910  3120000
1004  1003  401   412101  Johnson   Darlene  761015  460423  3630000
1007  1005  403   432101  Villegas  Arnando  770102  370131  4970000
1003  0801  401   411100  Trader    James    760731  470619  3785000
The Teradata Database offers two different transaction protocols for submitting SQL.
Understanding transaction protocols can be very important, and the topic is discussed more fully
in the class for “Application Design and Development”. It is not appropriate material for an
introductory course on SQL.
Note: In SQL Assistant, the double-quotes do not appear in the Explorer Tree.
These are the rules used for naming objects in the Teradata database.
• Account_Table
• Financials_2001_DB
• Sales_$_Column
• #_of_Years_Column
Teradata SQL is an ANSI-compliant product. Teradata has its own extensions to the language,
as do most vendors. Teradata SQL is fully certified at the SQL92 Entry level, with some
Intermediate, some Full and some SQL-99 Core features also implemented.
• Is ODBC-compliant.
(Can be used to access any ODBC-compliant Database.)
• Saves information about previous query result sets.
• Permits retrieval of previously used queries.
• May be used to import or export data.
(Not covered by this course.)
• Allows many options for tuning how it works.
Prior to logging on to the database, a “Data Source” must have been defined. One method for
setting up a data source is the following, although many other ways exist.
The example on the facing page assumes that a data source has already been defined. In this
case, all that is required is to double-click the data source, or click it and choose “Ok”, then type
in a password.
1. Explorer Tree:
A GUI into Teradata’s Data Dictionary. This window does not appear until after a
successful logon to Teradata, and then only if it was left open when the previous session
logged off.
This window can be opened through either a toolbar button or the “View” pull-down menu.
2. Query Window:
The window from which one submits SQL requests to Teradata. By right-clicking this
window, one can perform many operations, such as changing the type and size of fonts or
inserting a SQL template for certain SQL operations.
3. History Window:
Provides a history of previous SQL requests sent to Teradata. By clicking on a request
text stored in history, one can retrieve that request back into the Query Window for
resubmission.
The number of requests that can be kept is determined under the “TOOLS > OPTIONS >
History” menu.
One may also find it handy to double-click the “Notes” box and enter a note for personal
reference.
4. Results Window:
Where results are returned. There are a great many ways one may tailor the viewing of
results. Many can be accomplished through SQL, and many more through SQL
Assistant itself.
For detailed information on SQL Assistant, refer to the user’s guide for
“Teradata SQL Assistant for Microsoft Windows” (B035-2430-067A)
SQL Assistant is a Windows-based GUI to Teradata’s data dictionary.
- Right-clicking on areas
- Double-clicking on fields
- Clicking on drop-downs
- Click and drag from explorer tree to query window.
Screen labels: Explorer Tree Window, Results Window.
On the following page we begin by showing how one can use either the tool’s Explorer Tree or
SQL to retrieve names (table names, view names, etc.) for the objects residing in a database or
user. Recall that both Databases and Users are repositories for tables and other objects; the
difference is that a user can also log on. To better understand this difference, compare the
minimum syntax needed to create each:
To create a database:
CREATE DATABASE abc AS PERM = 1000000;
To create a user:
CREATE USER abc AS PERM = 1000000, PASSWORD = lucky;
Note that a password is required for a user and not allowed for a database. Also note that each
may be assigned an amount of “permanent” space for the objects they may own.
While the Explorer Tree provides names of objects, it lists them by category, whereas the
HELP DATABASE command lists them in alphabetical order regardless of category.
Note that, in the HELP command, DATABASE and USER are interchangeable. Both return the
same information. Since it is SQL, the result appears in the Results Window.
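As a minimal sketch of the HELP command described above, using the Employee_Sales lab database named elsewhere in this guide, the two requests below should return the same list of objects:

```sql
-- List the objects owned by the Employee_Sales lab database.
HELP DATABASE Employee_Sales;

-- USER is interchangeable with DATABASE here and returns the same information.
HELP USER Employee_Sales;
```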
Explorer Tree information on Employee_Sales.
The Explorer Tree requires only pointing and clicking – as is expected from a GUI tool. Also,
more information can be viewed by either moving the cursor arrow onto a column, or by
dragging the edge of the Explorer Tree window to the right.
The “HELP TABLE” command returns information about the columns of a table owned by a database or user.
This query gives Teradata the “databasename.tablename” to identify the object.
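A short sketch of such a request, assuming the Employee table in the Employee_Sales lab database:

```sql
-- databasename.tablename identifies the object regardless of the current default database.
HELP TABLE Employee_Sales.Employee;
```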
Explorer Tree information on table Employee.
SQL Request
Result of HELP TABLE command on table Employee in the query window.
In some examples the database name is specified in the SQL request; in other cases it is not.
The reason for this has to do with something called your “default” database. Your default is shown
in SQL Assistant along a banner near the top portion of the tool. If the object you are
referencing exists in this database, there is no need to reference (qualify) it in your SQL. If it
does not exist in this database, then the database must be named in the SQL request.
Default databases are discussed on the next slide.
A default database gives users the option of not having to continually specify the same
(long) database name in their SQL queries.
Users have the ability to change this default database value via SQL.
To avoid always having to qualify objects, you can set a single default database; objects owned
by that database can then be referenced without the database name. You may have only one
default set at any time, and this default may be set using SQL at any time during your session.
Once set, the new default remains until it is changed again, or until you log off. Each time you
log on to Teradata your default database will be the same. This default need not be your own
user name; it can be set to something other than your user name when your user is created.
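A brief sketch of setting the default with SQL, using the DATABASE statement and the lab objects that appear in this module's exercises:

```sql
-- Make Employee_Sales the default database for the rest of the session.
DATABASE Employee_Sales;

-- The table name no longer needs a database qualifier...
HELP TABLE Employee;

-- ...although the fully qualified form still works.
HELP TABLE Employee_Sales.Employee;
```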
The default database for this user will be their own user name (“username” is the default). It
could have been set to another database if this syntax were used to create “abc”. This would be
your default database if your user was created from our standardized script. Or it may have been
changed by your DBA.
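The slide's own syntax is not reproduced in this text. As a hedged sketch, a default other than the user name can be assigned with the DEFAULT DATABASE option; the choice of Employee_Sales and Emp_Views as targets here is only an assumption for illustration:

```sql
-- Variation of the earlier CREATE USER request: DEFAULT DATABASE names the
-- database that is searched when an object is not qualified.
CREATE USER abc AS
  PERM = 1000000,
  PASSWORD = lucky,
  DEFAULT DATABASE = Employee_Sales;

-- A default assigned at creation time can be changed later, for example by a DBA.
MODIFY USER abc AS DEFAULT DATABASE = Emp_Views;
```

A default changed with MODIFY USER normally applies from the next logon, whereas the DATABASE statement changes it for the current session only.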
Default database is DLM.
Changes the default database to Employee_Sales.
After issuing the database default request shown, which database will be checked for the following?
HELP TABLE Employee;
Clicking on “SQL Statement” in History will recall it to the Query Window later.
A default database can be typed in from the tool manually with each logon, or it can be set
more permanently (though not irrevocably) by entering it under the TOOLS > DEFINE DATA SOURCE
path, so that you don’t need to keep typing it in for each logon.
TOOLS > DEFINE ODBC DATA SOURCE > SYSTEM DSN > CONFIGURE
Note how the result of the SHOW command appears in the Results Window, whereas the
“right-click” and “show definition” method displays its result in the Query Window
(not shown). Except for the location of the result, they display the same information. The nice
thing about the Explorer Tree method is that one can easily make changes and create another,
different, table from an existing table without having to copy from the Results Window to the
Query Window.
The definition shown includes any syntax that was acquired by default during the initial create
rather than by explicitly typing it. For example, references to the following were likely not
included during the initial creation of the table.
• SET (do not store duplicate rows in this table, versus MULTISET, which allows duplicate rows)
• (Permanent) Journaling references
• Fallback
• Character Set Latin
• Not Case Specific
• Checksum
Each of these settings has a default option, and the values shown were likely obtained from
those defaults.
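One way to see these defaults for yourself is sketched below. Demo_Defaults and its columns are hypothetical names, and the request assumes your default database has space available for a small table:

```sql
-- Hypothetical table created with minimal syntax: no SET/MULTISET, journaling,
-- FALLBACK, character set, or case-specificity options are typed.
CREATE TABLE Demo_Defaults
  (id   INTEGER
  ,name CHAR(20));

-- The stored definition spells out the options that were filled in by default.
SHOW TABLE Demo_Defaults;

-- Clean up the illustration table.
DROP TABLE Demo_Defaults;
```

The exact defaults reported can vary with the session mode and system settings, so no particular output is shown here.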
Right-Click
Show Definition
For a complete list of objects referenced by the SHOW command you can issue the
following:
One method for retrieving information about your current session is to reference the
session variables below. (These are often referred to as “Built-in Functions”)
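The slide's list is not reproduced in this text. As a small sketch, the request below selects a few built-in functions that Teradata provides; the particular selection is an assumption made for illustration:

```sql
-- Each keyword is a built-in function that returns a value for the current session.
SELECT USER          -- the logged-on user name
      ,DATABASE      -- the current default database
      ,SESSION       -- the session number
      ,CURRENT_DATE; -- today's date
```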
For a complete list of values, feel free to issue this command during a lab. Much of the
information displayed from using this statement is described in other courses such as Advanced
SQL, Physical Implementation, and Application Design and Development.
Another method for listing session information is to issue the request shown below.
Note that the scroll bar indicates more information is available for viewing.
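One request that lists session information is HELP SESSION. Whether it is the request pictured on the slide is an assumption:

```sql
-- Returns attributes of the current session, such as the user name,
-- the current default database, and the transaction mode.
HELP SESSION;
```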
Parameterized Queries
Queries may contain Named Parameters, which makes it easy to reuse a query because all that
changes are the data values (for example, in a Where clause). Named Parameters function like
variables. Enter the value for a named parameter once. If it is used in multiple places within the
query that same value will be used in all places.
Note: The values entered for named parameters will be saved to the Notes column of History
for future reference. Named Parameters are indicated by a “?” immediately followed by a name.
The name can consist of alphanumeric characters plus the “_” symbol. When a parameterized
query is executed, a prompt appears for each parameter before the query is submitted.
For example, if the following query is submitted, a prompt appears to enter a value for
NameStart:
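The original query is not reproduced in this text. A hedged sketch of a query of this kind follows; the table and the column names last_name and first_name are assumptions, as is the premise that the value typed at the prompt is substituted as text in place of ?NameStart:

```sql
-- SQL Assistant prompts for NameStart before submitting the request.
-- Entering S at the prompt would search for last names beginning with S.
SELECT last_name
      ,first_name
FROM Employee_Sales.Employee
WHERE last_name LIKE '?NameStart%';
```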
TOOLS > OPTIONS > QUERY
Note: If you want to retain special formatting when pasting information from other applications,
use Shift+Insert instead of Ctrl+V or the Paste tool button. If you use Ctrl+V to paste text from
applications such as Microsoft Office, the special formatting from those applications will be
changed as follows:
• Formatted text from Microsoft Excel will be pasted as unformatted data
• Where text is preceded by bullets from programs such as Microsoft Word, each
paragraph will be pasted prefixed with a period in place of the bullet
• Numbered paragraphs will be pasted prefixed with the number
Note: Some keywords will cause a line break and possibly cause the new line to be indented. If
a keyword is found to already be the first word on a line and it is already prefixed by a tab
character, then its indentation level will not change.
Indentation
When you press the Enter key, the new line will automatically indent to the same level as the line
above. If you highlight one or more lines in the query and press the Tab key, those lines are
indented one level. If you press Shift-Tab, the highlighted lines are unindented by one level.
This indentation of lines will only apply if the selected text includes a line feed character. For
example, you must either select at least part of two lines, or if selecting only one line, then the
cursor must be at the beginning of the next line. (Note that this is always the case when you use
the margin to select a line.) If no line end is included in the selected text, or no text is selected,
then a tab character will simply be inserted.
TOOLS > OPTIONS > CODE EDITOR
TOOLS > OPTIONS > ANSWERSET
To show the history filter options, right-click on a query in the History window and choose Filter.
Note: The operator box accepts only applicable operators for the filter function.
To filter the History table
1. Select the History window.
2. Right-click in the History window and select Filter.
3. Set the history filter as needed.
Data Source: Filter by data source name. Enter a data source name, optionally containing
wildcard characters. Check Use current Data Source to filter by the current data source only.
Note: The Use current Data Source filter option is used only when the Allow connection to
multiple data sources option is not checked.
User Name: Shows only those rows for a specific User Name.
Statement Type: Shows only those rows in which the query contains the specified statement
type. For example, Select or Create Table.
Statement Count: Shows only those rows in which the query contains this many statements (use
operator <, > or =).
Row Count: Shows only those rows in which the query affected this many rows (use operator <,
> or =).
Elapsed Time: Shows only those rows in which the elapsed time matches the time entered (use
operator <, > or =, and specify the time as hh:mm:ss).
Show successful queries only: Check this box to filter for successful queries only. Queries with
errors are ignored.
TOOLS > OPTIONS > HISTORY
01 Click-Drag
02 Click-Ctrl-Drag
03 Ctrl-Ctrl-Ctrl-Drag
04 Click-Shift-Click-Drag
The Employee_Sales database contains all tables referenced by the lab exercises.
Database containing
tables used in the labs.
The Emp_Views database contains all views referenced by the lab exercises.
Database containing
views used in the labs.
True or False:
1. Log on to SQL Assistant and review the options in the “TOOLS” pull-down menu.
Next, see if you can successfully perform the “List Columns” option. Choose a
database and a table name based on what you have learned about the lab
environment.
Note the options for “Explorer Tree” and “History”. Click on each of these and
note the indentations to show whether they are turned on or not.
3. Try each of the following in the order shown, and note if it fails by looking at the
bottom-left portion of the utility screen. For those that fail, “double-click” the
“Notes” field for the failed request in the “History Window”.
HELP DATABASE;
HELP DATABASE yourusername;
DATABASE yourusername;
SHOW TABLE Employee;
DATABASE Employee_Sales;
SHOW TABLE Employee;
SHOW TABLE Emp_VIEWS.Employee;
SHOW VIEW Emp_VIEWS.Employee;
4. Practice dragging and dropping columns from the “Explorer Tree” window onto
the “Query Window”.
The database is a server that parses and processes requests from a client or host. Requests are
made up of one or more statements. A statement is a single SQL construct such as a SELECT,
so multiple DML statements can be bundled into a single request. A “request” is also often
referred to as a “query” or as a “statement”; however, referring to a request as a statement is
correct only if the request contains a single statement.
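As a rough sketch of the distinction, and assuming BTEQ's convention that a semicolon placed at the start of a line continues the current request, the two statements below would be sent to the database as one request:

```sql
-- One request containing two statements: the leading semicolon on the
-- second line keeps the request open until the final semicolon.
SELECT department_name FROM Employee_Sales.department
;SELECT USER;
```

Had each statement ended with its own semicolon at the end of its line, BTEQ would have submitted two single-statement requests instead.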
In the previous module we were introduced to various SQL commands. Those were HELP,
SHOW, and SELECT (for session variables). In this module we shall look at the more common
uses for SELECT.
DCL is SQL that deals with controlling privileges and is taught in the Teradata Administration
class.
DDL refers to SQL that affects objects. Such SQL places “write” locks on the Teradata
dictionary and, as such, has the potential to block queries from parsing. DDL should
“normally” be performed during off hours.
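For orientation only, one request of each kind is sketched below. The user abc is the sample user from the earlier CREATE USER request, Demo_Defaults is a hypothetical table name, and none of these requests is part of the labs:

```sql
-- DCL: controls privileges.
GRANT SELECT ON Employee_Sales.department TO abc;

-- DDL: creates or changes an object, and so writes to the data dictionary.
CREATE TABLE Demo_Defaults (id INTEGER);

-- DML: works with the data held in existing objects.
SELECT department_name FROM Employee_Sales.department;
```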
It is important to note that, throughout this text, ALL RESULT SETS ARE GENERATED BY
USING BTEQ AND NOT SQL ASSISTANT!
The facing page discusses how the alignment of the columns can tell you whether the data for
that column is character (left justified) or numeric (right justified).
With the top query it is not clear in which database the table resides. Only the author knows the
answer to this question. The bottom query is clear about the databases involved, but it
obviously takes more effort to type.
The order of the result seems arbitrary. This can, and will, be rectified on a later page.
If copied and pasted correctly, the number of title dashes shown should match with the definition
of the column in the table. (i.e. a 30 character width)
To obtain a list of all valid department names, two possible methods are:
SELECT Employee_Sales.Department.Department_Name
FROM Employee_Sales.Department;
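The unqualified form relies on the default database. A sketch of that method, assuming Employee_Sales is first made the default:

```sql
-- Set the default once for the session...
DATABASE Employee_Sales;

-- ...then neither the table nor the column needs a database qualifier.
SELECT Department_Name
FROM Department;
```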
Read how results are formatted for this class on the left-hand page!
Numeric data is right justified and character data is left justified and, unless the column name is
longer than the display width of the data type (as is the case for some of these columns), you can
count the number of title dashes to determine the column width. For instance, an INTEGER is a
whole number between -2,147,483,648 and 2,147,483,647, so, including the sign, it takes 11
characters to display its values (10 digits plus the sign equals 11 characters).
Nulls are shown, but are not discussed until the next module. So hold on to that thought!
This is a “shortcut” for displaying all column values for all rows in the department table.
Numeric column values are right justified in the field space, along with their column
headings, to signify numeric data.
Character column values are left justified in the field space, along with their column
headings, to signify character data.
The two values showing a “?” both represent NULLs (“?” is an invalid numeric value).
Note that, as a new name, it becomes the new heading by default. Aliases are names we can
reference elsewhere in the same query. By stating that they are optional we mean that the
following would work as well.
Another example of using double-quotes is shown. Notice how it appears as a heading for the
result set.
You can rename, or “alias”, a projected column name using the optional “AS”
keyword. An alias is the assignment of a new name. It renames the column
for the current query only.
Show all column values for all rows of the department table, renaming the
columns to something shorter.
SELECT department_number AS "Dept Nbr"
,department_name AS DeptName
,budget_amount AS Budget
,manager_employee_number AS Mgr#
FROM Employee_Sales.department;
As new names, aliases now become the names for the column headings. Note the heading "Dept Nbr".
Dept Nbr DeptName                             Budget        Mgr#
-------- ------------------------------ ------------ -----------
     403 education                         932000.00        1005
     600 None                                      ?        1099
     402 software support                  308000.00        1011
     201 technical operations              293800.00        1025
     100 president                         400000.00         801
     302 product planning                  226000.00        1016
     301 research and development          465600.00        1019
     501 marketing sales                   308000.00        1017
     401 customer support                  982300.00        1003
As far as commas are concerned, there need not be a space between a comma and a column
name. For instance, the following, with no spaces around the commas, would be perfectly
acceptable.
SELECT department_number,department_name,budget_amount,
manager_employee_number
FROM Employee_Sales.department;
Based on our discussion from the previous page, can you determine what is
happening with this query and its result?
SELECT department_number,
department_name
budget_amount,
manager_employee_number
FROM Employee_Sales.department;
Note how it was acceptable for us to reference the alias name in the ORDER BY clause.
Show all column values for all rows in department and ordered by department name.
• ORDER BY 3, 1;
Ordering by column position. Ordering by column 1 (department number, implied
ascending) within column 3 (manager number, implied ascending)
• ORDER BY 3 DESC, 1;
A combination of orderings by column position. Order by
column 1 (manager, implied ascending) within column 3 (department, explicit
descending).
• ORDER BY department_name;
Although the column is not projected, the result is ordered by department name (implied ascending).
Ordering by a column (or columns) not projected may be useful under certain
circumstances.
There are many different techniques that may be used for ordering result rows.
Discuss what each option shown is attempting to do and if it is valid or not.
Given that there are nine columns in Department, what about these two?
Valid SELECT * FROM department ORDER BY 2;
Invalid SELECT * FROM department ORDER BY 10;
Notice that character literals must be enclosed in single quotes so that the database does not
mistake them for object references. Numeric values are not enclosed in single quotes because,
well, they are numeric.
With SQL you can project literal values as well as column values.
In the examples on the facing page, the top one shows how to apply a condition to a numeric
column (using no single quotes), while the bottom example shows how to apply a condition on a
character column (single quotes required). If a character literal is written without
single quotes, the database treats the literal as an object name.
It is a one column table that stores many of the ASCII character values. It is a “MULTISET”
table to show that, as a single column table, the upper and lower case letters for the same value
are actually the same value and could not be stored as a SET table due to causing a duplicate row
violation. Tables created in Teradata mode are, by default, SET tables and not MULTISET as
shown here. Note how column “c1” is defined as NOT CASESPECIFIC to show no case
sensitivity.
It is a one column table that stores many of the ASCII character values. It is a “SET” table to
show that, as a single column table, the upper and lower case letters for the same value are
actually different rows and not duplicates. Tables created in ANSI mode are, by default,
MULTISET tables, and not SET tables as shown here. Note how column “c1” is defined as
CASESPECIFIC to show case sensitivity.
Comparison                ANSI Standard   Teradata Extension
Equal                     =               EQ
Less Than or Equal To     <=              LE
The facing page discusses aggregate processing to obtain a distinct list. This is a strategy that
will be discussed later.
The query discussed will fail because there should be a space between the WHERE keyword and
the column named “department_number”.
SELECT last_name
,first_name
,hire_date
,salary_amount
FROM employee
WHERE department_number = 401
ORDER BY last_name;
select last_name,first_name,hire_date
salary_amount from employee
wheredepartment_number = 401 order by last_name;
True or False:
1. Select all columns for all departments from the department table.
2. Request a report of employee last and first names and salary for all of
manager 1019's employees. Order the report in last name ascending
sequence.
3. Project a distinct list of job codes which have been assigned to people
and are greater than 510000 and sort the result descending.
4. What are the first names of people with a last name of “Brown”?
5. How many people have been assigned job codes greater than or equal
to 510001?
(since aggregation has not been taught yet you will have to manually count
them? Or can SQL Assistant tell you?)
Logical Operators
The concept of NULL is one of the most important in all of SQL and, although it can get tricky
(and difficult) to deal with, a good SQL course would be remiss if it didn’t go into depth on NULL and
continue with it throughout the class. It will definitely add a level of complexity to your SQL
education.
The comparison operators that we will be studying in this module are shown below.
The symbols are ANSI standard.
The abbreviations are Teradata extensions.
Comparison Operators
(alternative abbreviations included where applicable):
=   (EQ)            Equal
<>  (NE)            Not equal
>   (GT)            Greater than
<   (LT)            Less than
>=  (GE)            Greater than or equal to
<=  (LE)            Less than or equal to
BETWEEN . . . AND   Inclusive range
[NOT] IN            Test against predefined set
IS [NOT] NULL       Test for nulls
As you will soon find out, when linking constraints together with AND, all conditions must
evaluate true or no column values for the row involved will be projected. In other words, for our
example at the bottom of the facing page, we will only project rows if the last name for the
employee equals ‘Brown’ and their first name equals ‘Alan’ (no case sensitivity).
Contrast the AND with the OR, below:
WHERE
Last_Name = 'brown' AND Dept# = 401
OR
First_Name = 'Mary'
The result shows that “Mary Smith” doesn’t satisfy either of the AND’ed conditions, but she does
satisfy the (only) condition that the first name is “Mary”, while “Alan Brown” satisfies both AND’ed
conditions: his last name is “Brown” and he works in department 401.
We will continue this thread and add more examples to show how conditions can be combined
into increasingly complex queries. But first we take small steps.
Retrieve employees with last name “Brown” who work in department 401, OR with first
name “Mary”. (“AND” is evaluated first)
SELECT *
FROM Employee
WHERE Last_Name = 'brown'
AND Dept# = 401
OR First_Name = 'mary';
last_name  first_name       dept#
---------- ---------- -----------
Smith      Mary               900
Brown      Alan               401
If we re-arrange the parentheses to match the query at the bottom of the page, we see that
the query now answers an entirely different business question. We still have only 2 conditions,
but now they are linked with AND:
WHERE
(Last_Name = 'brown')
AND
(Dept# = 401 OR First_Name = 'mary')
Notice that Mary Smith no longer qualifies since her last name is not “Brown”.
Alan Brown still qualifies because his last name is “Brown” (first condition) AND he works in
department 401 (one of the OR’ed conditions satisfies the second half of this AND’ed condition).
The use of the parentheses in this query illustrates the order of evaluation of “AND”
and “OR” for the query on the previous page.
Employee Table
Last Name  First Name Dept#
Brown      Alan       401
Brown      Allen      801
Brown      Cary       567
Smith      Mary       900
Jones      Jimmy      401
Retrieve employees with last name “Brown” who work in department 401, or whose first name is “Mary”.
SELECT *
FROM Employee
WHERE (Last_Name = 'brown'
AND Dept# = 401)
OR (First_Name = 'mary');
last_name  first_name       dept#
---------- ---------- -----------
Smith      Mary               900
Brown      Alan               401
SELECT *
FROM Employee
WHERE (Last_Name = 'brown')
AND (Dept# = 401
OR First_Name = 'mary');
last_name  first_name       dept#
---------- ---------- -----------
Brown      Alan               401
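The precedence rule above can be checked with a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for Teradata (an assumption made only so the example runs anywhere), names the column dept instead of Dept#, and uses COLLATE NOCASE to approximate the case-insensitive comparisons described in the text. The five rows are the ones shown in the Employee table.

```python
import sqlite3

# In-memory SQLite database standing in for Teradata (assumption).
conn = sqlite3.connect(":memory:")
# COLLATE NOCASE approximates the case-insensitive comparisons in the text.
conn.execute(
    "CREATE TABLE Employee ("
    "last_name TEXT COLLATE NOCASE, "
    "first_name TEXT COLLATE NOCASE, "
    "dept INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("Brown", "Alan", 401), ("Brown", "Allen", 801), ("Brown", "Cary", 567),
     ("Smith", "Mary", 900), ("Jones", "Jimmy", 401)],
)

def names(where):
    """Return sorted (last_name, first_name) pairs for a WHERE clause."""
    sql = "SELECT last_name, first_name FROM Employee WHERE " + where
    return sorted(conn.execute(sql).fetchall())

# No parentheses: AND is evaluated before OR.
no_parens = names("last_name = 'brown' AND dept = 401 OR first_name = 'mary'")
# Parentheses that only spell out the default order.
and_first = names("(last_name = 'brown' AND dept = 401) OR (first_name = 'mary')")
# Parentheses that change the order, and therefore the question being asked.
or_first = names("(last_name = 'brown') AND (dept = 401 OR first_name = 'mary')")

print(no_parens == and_first)
print(or_first)
```

The first two WHERE clauses return the same two rows; the third returns only Alan Brown.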
In our example we see how the database rewrites the query to an equivalent one where each
condition is equality based and linked with OR. To get the optimizer plan you simply prefix the
query with the keyword “EXPLAIN”.
EXPLAIN
SELECT *
FROM Employee
WHERE First_Name IN ('alan', 'allen');
Another way to do this through SQL Assistant is to press the F6 key. Just make sure to highlight
the query if it is among many so as not to get an EXPLAIN for the whole of them.
One of the nice things about doing an EXPLAIN is that it not only shows you the plan of how it
is going to perform your query to get the result, but it also may show how it re-wrote it in doing
so.
Retrieve employees whose first names are in the following list (as a set).
SELECT *
FROM Employee
WHERE First_Name IN ('alan', 'allen');
Submitting the request by prefixing it with the EXPLAIN keyword will display the
optimizer rewrite of this query (see left hand page).
. . . .
3) We do an all-AMPs RETRIEVE step from DLM.Employee by way of an
all-rows scan with a condition of ("(DLM.Employee.first_name =
'allen') OR (DLM.Employee.first_name = 'alan')") into Spool 1
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with no confidence to be 7 rows (595 bytes).
The estimated time for this step is 0.02 seconds.
. . . .
Rewrite Equivalent:
WHERE
First_Name = 'allen'
OR
First_Name = 'alan';
Let’s compare IN
Written as -
First_Name = 'alan'
OR First_Name = 'allen'
In this example the only rows satisfying at least one of these conditions are the two shown.
All others evaluate false or unknown and are not returned.
to NOT IN
Written as -
First_Name <> 'alan'
AND First_Name <> 'allen'
In this example the rows satisfying both of these conditions are all rows but the two
shown. For these two, either 'alan' or 'allen' makes one of these conditions false and, as such,
causes the entire condition to be false, because when one of many AND’ed conditions is
false (regardless of unknown conditions involving null), the entire AND’ed condition is false.
Retrieve employees whose first names are NOT IN the following list (as a set).
SELECT *
FROM Employee
WHERE First_Name NOT IN ('alan', 'allen');
Submitting the request by prefixing it with the EXPLAIN keyword will yield the
optimizer rewrite of this query.
. . . .
3) We do an all-AMPs RETRIEVE step from DLM.Employee by way
of an all-rows scan with a condition of ("(DLM.Employee.first_name
<> 'alan') AND (DLM.Employee.first_name <> 'allen')") into Spool 1
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with no confidence to be 20 rows (1,700
bytes). The estimated time for this step is 0.02 seconds.
. . . .
Rewrite Equivalent:
WHERE
First_Name <> 'allen'
AND
First_Name <> 'alan';
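As a rough check of the two rewrites, the sketch below compares each list form with its expanded form. It runs against SQLite through Python's sqlite3 module rather than Teradata (an assumption), and COLLATE NOCASE stands in for the case-insensitive comparison used in the guide. The first names are those of the five-row Employee table used earlier.

```python
import sqlite3

# SQLite stands in for Teradata here (assumption); only first names are needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (first_name TEXT COLLATE NOCASE)")
conn.executemany(
    "INSERT INTO Employee VALUES (?)",
    [("Alan",), ("Allen",), ("Cary",), ("Mary",), ("Jimmy",)],
)

def first_names(where):
    """Return the sorted first names that satisfy a WHERE clause."""
    sql = "SELECT first_name FROM Employee WHERE " + where
    return sorted(row[0] for row in conn.execute(sql))

# IN is shorthand for equality conditions linked with OR.
in_list = first_names("first_name IN ('alan', 'allen')")
or_chain = first_names("first_name = 'alan' OR first_name = 'allen'")

# NOT IN is shorthand for inequality conditions linked with AND.
not_in_list = first_names("first_name NOT IN ('alan', 'allen')")
and_chain = first_names("first_name <> 'alan' AND first_name <> 'allen'")

print(in_list, not_in_list)
```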
As can be seen in the EXPLAIN of the query, BETWEEN is inclusive. By making it inclusive,
it becomes easier to control the ranges that are required. And since the optimizer rewrites it by
replacing BETWEEN with its “<= / >=” equivalent, it makes no difference how you may write
it, they will perform the same.
SELECT *
FROM Employee
WHERE Dept# BETWEEN 401 and 567;
Employee Table
Last Name  First Name Dept#
Brown      Alan       401
Brown      Allen      801
Brown      ?          567
Smith      Mary       ?
Jones      Jimmy      401
last_name  first_name       dept#
---------- ---------- -----------
Jones      Jimmy              401
Brown      Alan               401
Brown      ?                  567
Explanation
---------------------------------------------------------------------------------------------------------------------------------------
. . . .
3) We do an all-AMPs RETRIEVE step from customer_service.Employee by way of an all-rows
scan with a condition of ("(customer_service.Employee.dept# <= 567) AND
(customer_service.Employee.dept# >= 401)") into Spool 1 (group_amps), which is built
locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 4 rows (340
bytes). The estimated time for this step is 0.02 seconds.
. . . .
In our example, no department number can be greater than or equal to 567, and at the same time
less than or equal to 401. This does not fail! It is considered a success with no rows satisfying
the conditional expression.
Contrast this query with the one following it:
SELECT *
FROM Employee
WHERE Dept# BETWEEN 567 AND 401;
For a row to be projected, the net result of all WHERE
conditions must be TRUE.
This condition is deemed un-satisfiable by the
optimizer.
Employee Table
Last Name  First Name Dept#
Brown      Alan       401
Brown      Allen      801
Brown      ?          567
Smith      Mary       ?
Jones      Jimmy      401
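The two BETWEEN queries can be tried side by side in the following sketch. SQLite (through Python's sqlite3 module) is used in place of Teradata, which is an assumption; the behaviour shown is the standard one, where BETWEEN low AND high means value >= low AND value <= high. The column is named dept instead of Dept#, and None plays the role of the nulls shown as "?".

```python
import sqlite3

# SQLite stands in for Teradata (assumption); None represents NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (last_name TEXT, first_name TEXT, dept INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("Brown", "Alan", 401), ("Brown", "Allen", 801), ("Brown", None, 567),
     ("Smith", "Mary", None), ("Jones", "Jimmy", 401)],
)

def rows(where):
    """Return matching rows ordered by department, then last name."""
    sql = "SELECT * FROM Employee WHERE " + where + " ORDER BY dept, last_name"
    return conn.execute(sql).fetchall()

# Inclusive range: both 401 and 567 qualify.
in_range = rows("dept BETWEEN 401 AND 567")
# Reversed bounds: nothing can be >= 567 and <= 401, so no rows and no error.
reversed_bounds = rows("dept BETWEEN 567 AND 401")

print(in_range)
print(reversed_bounds)
```

Mary Smith, whose department is null, is absent from both results because a comparison against null never evaluates true.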
Next we shall see what happens when we integrate other conditions into this conditional
expression.
The EXPLAIN of the previous query illustrates how the optimizer is aware of the
unsatisfiable condition. Only a quick PI access is attempted with an
unsatisfiable condition.
Explanation
----------------------------------------------------------------------------------------------------------------------------------------------
1) First, we do a single-AMP RETRIEVE step from DLM.Employee by way of the primary index
"DLM.Employee.last_name = _LATIN '010000001800000001E0'XC" with unsatisfiable conditions
into Spool 1 (one-amp), which is built locally on that AMP. The size of Spool 1 is estimated with
high confidence to be 1 row (52 bytes). The estimated time for this step is 0.00 seconds.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total
estimated time is 0.00 seconds..
On the facing page, the query on the left side can be rewritten by applying parentheses as shown
by the query on the right side in a way that doesn’t change the query but, instead, hopefully
makes it easier to understand. You could use parentheses to change the order. For instance, you
could do the following to force the last OR’ed condition to be included into the AND’ed
condition. By doing this, however, you create an entirely different query with a different result
set.
Evaluation Precedence:
1. Parentheses evaluated first
2. NOT operators
3. AND operators
4. OR operators
5. Operators of equal precedence evaluated from left to right
• Operators =, <>, <=, >=, <, >, IN, NOT IN and BETWEEN can be used as
qualifiers.
• All conditions found in the WHERE clause must be satisfied for results
to be projected.
• IN, and NOT IN lists can be used as a short cut to replace long
“AND/OR” linked conditions.
True or False:
• Describe what NULLs are and how they affect query results.
• Incorporate NULL syntax into queries correctly.
• Identify issues introduced by adding NULL into WHERE conditions.
Null is a tough concept to describe. Some say it is a missing value. In truth, null is truly a
concept. It is an idea of how to deal with “unknown-ness”. Something that is unknown is
inconvenient, to say the least. By saying the value is missing we almost imply that we forgot to
assign the value, which could be true for null as well, however null represents an unknown.
All are good questions that must be dealt with, but first they must be understood.
By default, in SQL Assistant and in BTEQ, null is displayed as a question mark (?). You can
change this in the respective tool to whatever you like, but remember, what you choose could be
mistaken for that value instead of an unknown. (After all, how do you display an unknown with
a known quantity or entity?)
Note: The phrase “NULL value” could be considered incorrect by some since a null is
unknown. However, we shall refer to a “NULL value” from time to time for purposes of
discussion.
You may not use it with an inequality since it can never evaluate “true or false”.
Lastly, there is a subtle, but significant, difference between asking if a column value IS NULL
versus comparing a column to a null via an operator. For instance, if we could write WHERE Col1
= NULL, it would evaluate unknown, whereas writing WHERE Col1 IS NULL will evaluate true
if the value for Col1 is null.
We cannot write a condition like WHERE C1 = NULL since it can never evaluate TRUE.
But we can write WHERE C1 IS NULL
This evaluates TRUE only where the value for C1 is unknown – this can be determined.
In relational databases, to say that a row wasn’t projected because the condition evaluated false is
incorrect. We say that it didn’t project because it didn’t evaluate true. This is a significant
statement! It will cause many a quandary with respect to writing bug-free SQL.
In SQL, the database only projects column values for those rows that contain data
that satisfy all of the WHERE CONDITIONS.
If columns for a row are not projected, it is not because the conditions evaluated
FALSE, but instead because they did not evaluate TRUE.
Observe:
SELECT *
FROM Employee
WHERE Dept# < 500
OR First_Name IS NULL;
last_name  first_name       dept#
---------- ---------- -----------
Jones      Jimmy              401
Brown      Alan               401
Brown      ?                  567
SELECT *
FROM Employee
WHERE Dept# = NULL;
*** Failure 3731 The user must use IS NULL or IS NOT NULL to test for NULL values.
SELECT *
FROM Employee
WHERE First_Name <> NULL;
*** Failure 3731 The user must use IS NULL or IS NOT NULL to test for NULL values.
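The "did not evaluate TRUE" rule can be modelled in a few lines of plain Python. This is only an illustrative sketch of three-valued logic, not how the database implements it: None stands for unknown, and the helper names (sql_or, sql_lt, is_null, projected) are made up for this example.

```python
def sql_and(a, b):
    """Three-valued AND: False wins, then unknown (None), else True."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def sql_or(a, b):
    """Three-valued OR: True wins, then unknown (None), else False."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def sql_lt(x, y):
    """x < y, which is unknown if either side is null."""
    return None if x is None or y is None else x < y

def is_null(x):
    """IS NULL always answers True or False, never unknown."""
    return x is None

def projected(condition):
    """A row is returned only when the whole condition is TRUE."""
    return condition is True

# WHERE Dept# < 500 OR First_Name IS NULL, applied to rows from the text.
smith_mary = sql_or(sql_lt(None, 500), is_null("Mary"))   # dept is null
brown_null = sql_or(sql_lt(567, 500), is_null(None))      # first name is null
jones_jimmy = sql_or(sql_lt(401, 500), is_null("Jimmy"))

print(projected(smith_mary), projected(brown_null), projected(jones_jimmy))
```

Mary Smith's condition evaluates to unknown rather than false, yet she is still not projected, which matches the result set shown above.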
For the conditional expression on the facing page, it is important to note that the EXPLAIN of
the request shows how a null comparison, via “=”, can be interjected into a request, even though
we are unable to do so with SQL directly. Here, the values for the column First_Name include a
null, so we have effectively introduced something we cannot type in via the syntax, namely
WHERE First_Name = NULL (NULL being introduced from the table itself). For instance:
First_Name = 'alan'
Becomes –
NULL = 'alan' (or 'alan' = NULL if this looks more recognizable to you)
No values for rows having a null first name can ever be projected since null, in a conditional
expression, can never evaluate true.
Retrieve employees whose first names are in the following list (as a set).
EXPLAIN
SELECT *
FROM Employee
WHERE First_Name
IN ('alan', 'allen');
. . . .
3) We do an all-AMPs RETRIEVE step from DLM.Employee by way of an
all-rows scan with a condition of ("(DLM.Employee.first_name =
'allen') OR (DLM.Employee.first_name = 'alan')") into Spool 1
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with no confidence to be 7 rows (595 bytes).
The estimated time for this step is 0.02 seconds.
. . . .
The EXPLAIN of this query shows an alternate method for writing this.
. . . .
3) We do an all-AMPs RETRIEVE step from DLM.Employee by way
of an all-rows scan with a condition of ("(DLM.Employee.first_name
<> 'alan') AND (DLM.Employee.first_name <> 'allen')") into Spool 1
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with no confidence to be 20 rows (1,700
bytes). The estimated time for this step is 0.02 seconds.
. . . .
Since the value for either of these could be null, this syntax could actually end up comparing a
null to a value or to another null. The database does not know if this could happen until it does.
That’s a whole different issue than this:
What happens, then, is that we will never return the row for Mary Smith since the condition “null
= null” can never be true.
The EXPLAIN shows how the database correctly treats the NULL literal as a comparison.
Explanation
-------------------------------------------------------------------------------------------------
. . . .
3) We do an all-AMPs RETRIEVE step from DLM.Employee by way
of an all-rows scan with a condition of (
"(DLM.Employee.dept# = 403) OR
((DLM.Employee.dept# = 401) OR
(DLM.Employee.dept# = NULL ))")
into Spool 1 (group_amps), which is built locally on the AMPs.
The size of Spool 1 is estimated with no confidence to be 4
Note the difference between this EXPLAIN and that for the
query on the previous page.
SELECT *
FROM Employee
WHERE Dept# IS NULL
OR Dept# IN (401, 403);
Explanation
---------------------------------------------------------------------------------------------------------------------
. . . .
3) We do an all-AMPs RETRIEVE step from DLM.Employee by way of an
all-rows scan with a condition of (
"(DLM.Employee.department_number = 403) OR
((DLM.Employee.department_number = 401) OR
(DLM.Employee.department_number IS NULL ))") Note: “IS NULL”
into Spool 1 (group_amps), which is built locally on the AMPs.
The size of Spool 1 is estimated with no confidence to be 9 rows (1,323 bytes).
Let’s examine the conditional expression to see why Allen Brown, in department 801, was not
returned by reviewing the logic in the EXPLAIN.
WHERE Dept# <> NULL -- 801 <> NULL evaluates unknown and
AND Dept# <> 401 -- not true, the next 2 conditions are
AND Dept# <> 403 -- rendered as being irrelevant.
For Allen Brown, 801 <> NULL evaluates as unknown and not true. Since all conditions are
linked with AND, all of them must evaluate as true for Allen Brown to be returned. The same
logic applies to each and every row. The result: no rows found.
To resolve this issue, simply remove NULL from the list. There is no reason for it to be there
anyway, is there? So why discuss it at all? Wait until we get to the module on Subqueries later
in the course!
Partial Explanation
---------------------------------------------------------------------------------------------------------------------------------------------
3) We do an all-AMPs RETRIEVE step from DLM.Employee by way of an all-rows scan with a
condition of ("(DLM.Employee.dept# <> NULL) AND ((DLM.Employee.dept# <> 401) AND
DLM.Employee.dept# <> 403 ))") into Spool 1 (group_amps), which is built locally on the AMPs.
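The effect of a NULL inside a NOT IN list can be reproduced with the sketch below. It runs on SQLite through Python's sqlite3 module as a stand-in for Teradata (an assumption), with the column named dept instead of Dept#. The department values are those of the Employee table used in this module, with None for the null department.

```python
import sqlite3

# SQLite stands in for Teradata (assumption); None represents NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (first_name TEXT, dept INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Alan", 401), ("Allen", 801), (None, 567), ("Mary", None), ("Jimmy", 401)],
)

def depts(where):
    """Return the sorted department numbers of rows satisfying a WHERE clause."""
    sql = "SELECT dept FROM Employee WHERE " + where + " ORDER BY dept"
    return [row[0] for row in conn.execute(sql)]

# dept <> NULL is unknown for every row, so the AND'ed chain is never true.
with_null = depts("dept NOT IN (401, 403, NULL)")
# Removing NULL from the list restores the expected rows.
without_null = depts("dept NOT IN (401, 403)")

print(with_null)
print(without_null)
```

Even without NULL in the list, the row whose department is null is not returned, because its comparisons evaluate unknown rather than true.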
SELECT *
FROM Employee
WHERE First_Name NOT IN ('alan', 'allen');
SELECT *
FROM Employee
WHERE First_Name <> 'alan'
AND First_Name <> 'allen';
SELECT *
FROM Employee
WHERE NOT (First_Name = 'alan'
OR First_Name = 'allen');
All three return the explain shown below.
Explanation
---------------------------------------------------------------------------
. . .
3) We do an all-AMPs RETRIEVE step from DLM.Employee by way of an
all-rows scan with a condition of ("(DLM.Employee.first_name <>
'allen ') AND (DLM.Employee.first_name <> 'alan ')") into Spool 1
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with no confidence to be 10 rows (520 bytes).
The estimated time for this step is 0.02 seconds.
. . .
True or False:
• Describe what NULLs are and how they affect query results.
• Incorporate NULL syntax into queries correctly.
• Identify issues introduced by adding NULL into WHERE conditions.
2. Using an IN list, display employees with any of the following job codes:
412101, 412109, NULL.
4. List employees with unassigned job codes that have salaries between
30K and 40K.
Character data is automatically translated between the client and the database. Its form-of-use
is determined by the client character set or session character set. The form of character data internal to
Teradata Database is determined by the server character set attribute of the column.
Fixed character columns will always occupy the number of characters defined by the data type
(usually padded with spaces), whereas variable character columns will contain only the number
of characters for the stored value. Space is a valid character and will count as one of the
characters whether trailing, leading, or anywhere in the value.
1. 'abc' = 'ABC' T
2. '190' = '190 ' T
3. 'abc ' = ' abc ' F
4. ' 190' = '190 ' F
5. ' ' = ' ' T
6. '' = ' ' T
7. '' = null F
(In items 6 and 7, '' is two single quotes, i.e. a zero-length string.)
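The comparison results listed above are consistent with a simple model: pad the shorter value with trailing spaces, then compare without regard to case. The Python sketch below is only that model, under the assumption of a NOT CASESPECIFIC comparison; the function name is invented for this example and it is not a description of the database internals.

```python
def teradata_style_equal(left, right):
    """Model of a blank-padded, case-insensitive character comparison.

    Returns None (unknown) when either operand is null.
    """
    if left is None or right is None:
        return None
    width = max(len(left), len(right))
    # Trailing spaces are added to the shorter value; leading spaces are kept.
    return left.ljust(width).lower() == right.ljust(width).lower()

print(teradata_style_equal("abc", "ABC"))      # case is ignored
print(teradata_style_equal("190", "190 "))     # trailing space is padding
print(teradata_style_equal(" 190", "190 "))    # leading space is significant
print(teradata_style_equal("", " "))           # zero-length vs. a space
print(teradata_style_equal("", None))          # comparison with null
```

The last comparison comes back as unknown rather than false; either way it is not true, so no row would be projected on it.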
Although there is no real need to memorize the ranges of these different types, knowing what
they are will help with understanding certain types of data type conversions beyond this module.
We will discuss some conversions later in this module.
BIGINT
ANSI Standard
8 bytes of storage
Range: ± 9,223,372,036,854,775,807
Characters needed to display: 20
The amount of storage required by each depends, again, on the size (number of digits) of the
decimal. The page shows how one can relate the number of digits for the decimal to the number
of digits for an integer to help determine storage requirements. If you disregard the decimal, and
consider only the value for ‘m’, the number of bytes of storage required will be the same as for
the corresponding integer that can completely store any decimal value for that decimal type after
removing the decimal point.
FLOAT
Number in the range of 2 x 10^-307 to 2 x 10^+308
Can be used to represent a very large number, but with only 15 digits of
precision.
Example: 4.35400000000000E-001
= 4.35400000000000 x 10^-1
= 4.354 x 0.1
= 0.4354
BLOBs (Binary Large Objects) and CLOBs (Character Large Objects) are each a different form
of LOB (Large Object)
These are:
• Never translated by the Teradata Database
• Handled as if they were n-byte, unsigned binary integers
• Suitable for digitized image information (BLOB)
BYTE(n)
VARBYTE(n)
Where “n” = Number of bytes between 1 and 64,000.
(These two are Teradata extensions to the ANSI syntax)
BLOB(n[K|M|G])
(Binary Large OBject)
Fixed length up to 2GB in size.
Examples:
BLOB(3200)
BLOB(32K)
BLOB(32M)
BLOB(2G)
CREATE TABLE Data_Types
(Last_Name CHAR(20),
First_Name VARCHAR(20),
Thesis CLOB(2M),
Years_Employed BYTEINT,
Employee_Age SMALLINT,
Employee_Number INT,
Population BIGINT,
Salary_Amount DEC(18,2),
Bigger_Than_BIGINT FLOAT,
Img001 BYTE(32000),
Img002 VARBYTE(64000),
Img100 BLOB(300M));
For column C2 (below), the data type is DATE, which is also stored as an integer. The database
will perform DATE arithmetic on this column. It happens that the integer number 991231
(earlier paragraph) also represents a valid date (1999-12-31), however the integer representation
for January 1st of 2000 (The day after 1999-12-31 being 2000-01-01) is 1000101 and not 991232.
In other words, the year goes from 99 to 100.
One way to determine if an integer represents a valid date is to add nineteen million (19,000,000) to it using
normal (integer) arithmetic. If the resulting number looks like a date in the format of
YYYYMMDD then it represents that date. For instance, 991231 + 19000000 = 19991231, which reads as 1999-12-31.
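The integer form of a date described above can be sketched in Python as follows. The two function names are invented for this illustration; the arithmetic is the "year minus 1900, then month, then day" layout implied by the values 991231 and 1000101 in the text.

```python
from datetime import date

def date_to_teradata_int(d):
    """Encode a date as (year - 1900) * 10000 + month * 100 + day."""
    return (d.year - 1900) * 10000 + d.month * 100 + d.day

def teradata_int_to_date(n):
    """Add nineteen million and read the result as YYYYMMDD.

    Raises ValueError when the digits do not form a valid calendar date.
    """
    yyyymmdd = n + 19_000_000
    year, rest = divmod(yyyymmdd, 10000)
    month, day = divmod(rest, 100)
    return date(year, month, day)

print(date_to_teradata_int(date(1999, 12, 31)))  # last day of 1999
print(date_to_teradata_int(date(2000, 1, 1)))    # the year part goes from 99 to 100
print(teradata_int_to_date(991231))
```

Passing 991232 to teradata_int_to_date raises ValueError, since 19991232 is not a calendar date.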
When specifying a literal date, one should get into the habit of prefixing the character date
(format of 'YYYY-MM-DD') with the DATE keyword, like this: DATE '2001-01-01'.
DATE
• Is stored internally as data type INTEGER (4 bytes of storage).
• Supports full date intelligent arithmetic.
It also supports an Oracle-compatible form of ARRAY type called VARRAY. However, unlike
the Oracle VARRAY data type, the Teradata VARRAY type can be defined in multiple
dimensions.
The array data type is a user-defined type (UDT) with a fixed number of defined elements. It has
the following characteristics:
• An array data type is defined by the CREATE TYPE statement, like other UDTs
• It can be used as:
◦ A column of a table
◦ Parameter to a UDF/UDM/XSP/SP
◦ Local variable inside a SP
• All elements within the array have the same data type
• All elements default to an un-initialized state unless the DEFAULT NULL clause is
specified at creation time
• An array can be formed of a single dimension (1-D) or multiple dimensions(n-D)
• Overall size of an array is limited to ~64 KB of storage
The Teradata ARRAY data type is a user-defined type (UDT). This differs from the ANSI
standard which does not consider an ARRAY data type to be a UDT.
• One-Dimensional Array
• Multi-Dimensional Array
Use the NUMBER data type when migrating from Oracle or another database that uses the
NUMBER data type. Teradata’s NUMBER data type is similar to Oracle’s NUMBER data type,
with minor exceptions. DB2 also has support for Oracle’s NUMBER data type, for compatibility
reasons, but is stored in a fixed length as opposed to variable length in Oracle and Teradata.
5 * 4 = 20
2 ** 3 = 8
8 / 4 = 2
9 / 4 = 2 (throw away the remainder and keep the quotient)
9 MOD 4 = 1 (throw away the quotient and keep the remainder)
8.00 / 3 = 2.67 (decimal division)
In a projection.
SELECT 5 + 2; 7
SELECT 8.40 / 4.20; 2.00
SELECT 10 - 7 * 8; -46
SELECT 2**3 / 2 + 4; 8.00000000000000E 000
(exponentiation causes float)
SELECT Salary_Amount * 1.25; Give a 25% raise?
SELECT Budget_Amount - 1e2; Subtract 100 from the budget amount
SELECT 1e6 - 100000; 9.00000000000000E 005
SELECT 240378 MOD 100; 78
In a conditional Expression.
SELECT * FROM Employee WHERE Salary_Amount * 1.25 > 200000;
SELECT Job_Code FROM Job WHERE Job_Code MOD 1000 = 125;
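A few of the arithmetic results above can be reproduced in Python as a sanity check. This is a sketch of the arithmetic only, not of Teradata itself: Python's // and % are used for integer division and MOD, which matches the text for the non-negative operands shown here, and the decimal module stands in for DECIMAL division.

```python
from decimal import Decimal, ROUND_HALF_UP

# Integer division keeps the quotient; MOD keeps the remainder.
quotient = 9 // 4
remainder = 9 % 4

# Multiplication is evaluated before subtraction: 10 - (7 * 8).
precedence = 10 - 7 * 8

# Exponentiation, written ** in both SQL and Python.
power = 2 ** 3

# MOD 100 isolates the last two digits of a number.
last_two_digits = 240378 % 100

# Decimal division, rounded to two decimal places as in 8.00 / 3.
decimal_division = (Decimal("8.00") / Decimal(3)).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)

print(quotient, remainder, precedence, power, last_two_digits, decimal_division)
```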
Note how the CAST syntax forms a single construct while the extended form is actually two
constructs. These can be made into a single construct as shown, which may sometimes be
required:
Also, the Teradata extended form may produce a result differing from the CAST. For instance,
casting 12 as a 3 character field using the extended form is shown below.
SELECT 12 (CHAR(3));
12
---
  1
Note that the heading is left justified and so is the result. The result contains 3 characters, the
first two of which are spaces. To explain what happened one must first understand that the
value 12 is a BYTEINT. A BYTEINT displays in 4 characters (a sign plus up to three digits, such as
-123), right justified, so 12 is converted to '  12'. Starting
from the left we take the first 3 characters of that conversion, which are then left justified into
the character space. Thus the '  1'.
In contrast, the ANSI CAST actually removes (or trims) the spaces in front of the number before
acquiring the characters.
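The difference between the two conversions can be imitated with string formatting in Python. This is a model of the behaviour described in the text, not of the database itself; both function names and the display_width parameter are invented for the illustration, and the width of 4 for a BYTEINT is taken from the explanation above.

```python
def teradata_style_to_char(value, display_width, n):
    """Right justify the number in its display width, then keep the first n characters."""
    right_justified = f"{value:>{display_width}}"
    return right_justified[:n].ljust(n)

def ansi_cast_to_char(value, n):
    """Drop the leading spaces first, then keep the first n characters, padded on the right."""
    return str(value)[:n].ljust(n)

# 12 is a BYTEINT, which displays in 4 characters.
extended = teradata_style_to_char(12, 4, 3)
ansi = ansi_cast_to_char(12, 3)

print(repr(extended))
print(repr(ansi))
```

The extended form loses the digit 2 because the leading spaces use up the available characters, while the CAST form keeps both digits.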
** The note at the bottom of the facing page is answered by saying that any conversion that
results in a truncation while in ANSI mode, will fail. The exception is the Teradata extended
form of the CAST. This will truncate in either mode.
SEL CAST(1.77 AS CHAR(2)) this will truncate the decimal to “1.” left justified as
a two character field
** Read the left-hand page to learn how truncation using CAST works in ANSI mode.
Rounding works the same whether using the CAST or the Teradata extended method explained
on the earlier left-hand page. So the following are equivalent to the right-hand page.
One important thing to know and understand is how rounding takes place inside
your system.
There are 2 methods for rounding, each method is determined by a setting set by
your Teradata administrator.
Method 1:
SEL CAST(1.34999 AS DEC(2,1)); < 5, round down 1.3
SEL CAST(1.35000 AS DEC(2,1)); >= 5, round up 1.4
SEL CAST(1.35001 AS DEC(2,1)); >= 5, round up 1.4
Method 2:
SEL CAST(1.34999 AS DEC(2,1)); < 5, round down 1.3
SEL CAST(1.35000 AS DEC(2,1)); equal to 5, round to the even number 1.4
SEL CAST(1.45000 AS DEC(2,1)); equal to 5, round to the even number 1.4
SEL CAST(1.35001 AS DEC(2,1)); > 5, round up 1.4
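The two rounding methods correspond to what Python's decimal module calls ROUND_HALF_UP and ROUND_HALF_EVEN. Mapping Method 1 and Method 2 onto those two modes is an assumption made for this sketch, based on the results listed above; the helper name cast_dec_2_1 is invented.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

def cast_dec_2_1(text, rounding):
    """Round a decimal literal to one decimal place using the given mode."""
    return Decimal(text).quantize(Decimal("0.1"), rounding=rounding)

values = ["1.34999", "1.35000", "1.45000", "1.35001"]

# Method 1: a trailing 5 or more always rounds up.
method_1 = [cast_dec_2_1(v, ROUND_HALF_UP) for v in values]
# Method 2: an exact half rounds to the nearest even digit.
method_2 = [cast_dec_2_1(v, ROUND_HALF_EVEN) for v in values]

print(method_1)
print(method_2)
```

The two methods differ only for 1.45000, which becomes 1.5 under the first mode and 1.4 under the second.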
The actual method of conversion is the Teradata extended form discussed on the previous left-
hand page. In other words, numbers are right justified into the appropriate character space. So
an integer number would, by default, become an 11 character field with the value right justified
into the field. Examples of these will follow on the next page.
Two consecutive pipe characters (or vertical bars) are interpreted by the database
as a request to perform a concatenation of fields.
Examples:
SELECT Last_Name || First_Name FROM Employee;
SELECT Last_Name || ', ' || First_Name FROM Employee;
SELECT First_Name||' '||Last_Name||' is '||(DATE - Birthdate) / 365 FROM Employee;
SELECT 123||'ABC'||456;
For the previous examples, the resulting single field results will all have a data type
of VARCHAR.
SELECT Last_Name || ', ' || First_Name FROM Employee WHERE Last_Name = 'Brown';
SELECT First_Name || ', ' || Last_Name FROM Employee WHERE Last_Name = 'Brown';
((first_name||', ')||last_name)
----------------------------------------------------
Allen, Brown
Alan, Brown
CAST automatically “trims” the “leading” and “trailing” spaces after it casts the number
to VARCHAR.
SELECT CAST(123 AS VARCHAR(4))||'ABC'||CAST(4567 AS VARCHAR(4)) AS Concat;
Concat
-----------
123ABC4567
Although the format specification may not be longer than 30 characters, it may be used to format
a wide field. For instance, the following format specification may not include another ‘x’ since
there are 30 of them already specified.
FORMAT 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
You can reformat the appearance of how a value is displayed by using FORMAT.
(FORMAT is a Teradata extension to the ANSI Standard.)
Although there are many different formatting options, other than those used for
dates (later) the ones we will show are:
“-” Dash
“%” Percent sign
“$” Dollar sign
“Z” Leading zero suppress (“Z” or “z”)
“9” Show leading zeroes
“X” Character display (“X” or “x”)
“.” Decimal or period
“,” Comma
“B” Blank or space (“B” or “b”)
“/” Slash or division sign
“:” Colon
The facing page shows how we format the field and move it into the spool as a character value.
What SQL Assistant “sees” is a character field, not knowing or caring that it holds a reformatted
value.
Before providing examples with results, let us look at the different styles used for
formatting result fields.
These are the formatting styles available for use when using SQL Assistant.
The various methods for formatting any field are too numerous to
mention.
'abcdefg' 'abcdefg'
--------- -----------------------------------------------------------------
abcdefg abcdefg (255 character space)
1234567.89 1234567.89
Left justified in a
--------------- ---------------
$ 1,234,567.89 $1,234,567.89 12 character field.
The following are available for date formatting. Lower case is allowed.
Y4 Four digit year (use also 'YYYY')
YY Two digit year
M4 Full name of month (use also 'MMMM')
M3 Three character abbreviation (use also 'MMM')
MM Two digit month
E4 Full day of week name (use also 'EEEE')
E3 Abbreviated day of week name (use also 'EEE')
D3 Three digit day of year (use also 'DDD')
DD Two digit day of month
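The date formatting codes above can be combined with the “B” (blank) and “,” (comma) formatting characters. The following is a minimal sketch, assuming the nested CAST style so that the formatted value is delivered as character data. The literal date and the column aliases are chosen only for illustration.

```sql
-- Format one date three ways; the outer CAST turns each formatted
-- value into a character string so any client displays it as formatted.
SELECT CAST(CAST(DATE '2011-11-19' AS FORMAT 'YYYY-MM-DD') AS CHAR(10))       AS iso_style,
       CAST(CAST(DATE '2011-11-19' AS FORMAT 'MMMBDD,BY4') AS VARCHAR(20))    AS short_month,
       CAST(CAST(DATE '2011-11-19' AS FORMAT 'E4,BM4BDD,BY4') AS VARCHAR(40)) AS long_style;
```

The VARCHAR lengths are generous guesses; size them to the longest month and day names you expect.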
• Case sensitivity has an effect on how SQL works with character data.
• You can change the data type for any expression by using the CAST function.
• Many date formatting options exist to tailor how dates can be displayed.
True or False:
1. The FLOAT data type has more precision than does a decimal data type.
False – float only has 15 digits of precision.
2. Character data types cannot be converted to numeric data types.
False
3. FORMAT 'd2' is a valid formatting option.
False – for 2-digit day formatting “dd” must be used.
4. The expression 'a   ' = 'A          ' evaluates true. (The first literal has 3 trailing spaces, the second has 10.)
True
5. You can use the CAST function to change a data type or to format results.
True
6. The comma “,” is a valid formatting character.
True
7. The formatting character “9” may be used to display leading or trailing
zeroes.
False – it can only display leading zeroes.
1. Find and list employees' first and last names for employees whose
last name begins with either “R”, “S” or “T”. (Do this without regard to
case sensitivity.)
2. Write a request that will show the salary amount for the people
identified in #1 if they were given a 10% increase in salary that gave
them a salary > 50K.
3. Project new employee job codes (from the Employee table) for all those
job codes ending in 101, increasing them by 100. Include last names,
job codes, and department numbers to help verify results.
first_name last_name
------------ --------------------
Peter Rabbit
Larry Ratzlaff
Frank Rogers
Nora Rogers
Irene Runyon
Loretta Ryan
Michael Short
John Stein
After completing this module, you should be able to discuss the features
and usage of the following functions, including: ANSI vs. Teradata
Extension; Character Vs. Numeric; Supporting Syntax.
Functions are SQL constructs that perform certain and varied operations on an
argument list or expression containing:
• Single columns or groups of columns
• Literals
• Expressions involving computations
• Results of other functions (nested functions)
Some functions may be fully compliant and yet provide extended capabilities
supporting functionality beyond those defined by the ANSI Standard.
The LOWER function result is in the same character set as the input argument. The only
exception to this is when the input is the Kanji1 character set. With Kanji1, the LOWER
function returns the result in the Latin character set, as previously. The Kanji1 character set is
deprecated and should no longer be used anywhere.
UPPER ( expression )
Returns “expression” as uppercase.
LOWER ( expression )
Returns “expression” as lowercase.
last_name
--------------------
Trader
Transaction processing in Teradata mode is not case specific (not case sensitive).
SELECT First_Name FROM Employee
WHERE First_Name = 'ALAN';
first_name
------------------------------
Alan
You can then make the query not case sensitive (NOT CASESPECIFIC) by
performing a “case-blind” test by using either UPPER or LOWER.
SELECT Last_Name FROM Employee
WHERE UPPER(Last_Name) = 'TRADER';
last_name
--------------------
Trader
CHARACTER_LENGTH ( expression )
Characters(last_name)
Last Name is defined as CHARACTER(20)
---------------------
20 This means that every value contains 20 characters.
first_name Characters(first_name)
------------------------------ ----------------------
I.B. 4
CHARACTER_LENGTH ( expression )
SELECT first_name,
CHARACTER_LENGTH(first_name)
FROM employee;
first_name Characters(first_name)
-------------- ----------------------
Irene 5
Robert 6
Nora 4
Alan 4
John 4
Charles 7
Paulene 7
Larry 5
Carol 5
John 4
James 5
Edward 6
Frank 5
James 5
Allen 5
Michael 7
Ron 3
Peter 5
I.B. 4
Jim 3
William 7
Darlene 7
Domingus 8
Arnando 7
Albert 6
Loretta 7
SELECT last_name,
CHARACTER_LENGTH(last_name)
FROM employee
WHERE department_number IN (401, 501);
last_name Characters(last_name)
------------- ---------------------
Runyon 20
Rogers 20
Brown 20
Phillips 20
Johnson 20
Ratzlaff 20
Rabbit 20
Trader 20
Wilson 20
Hoover 20
Machado 20
These two requests return the same result.
SELECT TRIM(TRAILING ' ' FROM 'abc ') || 'XYZ';
(Trim(TRAILING ' ' FROM 'abc ')||'XYZ')
-----------------------------------------------
abcXYZ
For instance:
• BIGINT left-justified into a variable character 20 space (recall that BIGINT is
plus-or-minus 9,223,372,036,854,775,807; including the sign, this requires 20 characters).
When using TRIM on numeric fields, the database performs an implicit CAST to
character prior to doing the trim.
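As a brief sketch of the TRIM options discussed here, the request below trims both sides of a character literal, trims a specific leading character, and trims numeric literals that are implicitly cast to character first. The literals and aliases are illustrative only.

```sql
SELECT TRIM(BOTH FROM '   abc   ') || 'XYZ'   AS both_sides,       -- spaces removed on both ends
       TRIM(LEADING '0' FROM '000123')        AS no_leading_zeros, -- only leading zeroes removed
       TRIM(-777) || TRIM(-888) || TRIM(-999) AS ConcatData;       -- numbers cast to character, then trimmed
```

When neither LEADING nor TRAILING is given, TRIM works on both ends of the string.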
ConcatData
Six character field ----------------
-777-888-999 Both fields trimmed.
right justified.
The issue with using CAST is knowing how big to make the character length if it
were a column instead of a literal.
Returns an INTEGER value representing the numeric position of the first argument in
the second argument. POSITION is ANSI standard.
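A short sketch of POSITION against the literal 'Contact' follows. The aliases are illustrative; a search string that is not found returns zero.

```sql
SELECT POSITION('t'   IN 'Contact') AS first_t,    -- 4 (first occurrence only)
       POSITION('act' IN 'Contact') AS act_start,  -- 5
       POSITION('x'   IN 'Contact') AS not_found;  -- 0
```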
“When a numeric value gets trimmed it gets left-justified into a variable character space that
fits the data type. Prior to the conversion, the numeric value was right justified into the
character space, but trimming it for leading and trailing spaces has the effect of shifting it into
the space left-justified.”
Read the left-hand page for a more thorough explanation of the second example.
For instance:
SELECT TYPE(SUBSTRING('contact' FROM 8 FOR 6)) AS testcol;
SELECT TYPE(SUBSTRING('contact' FROM 8 FOR 0)) AS testcol;
testcol
---------------------------------------
VARCHAR(0) CHARACTER SET UNICODE
                        C   o   n   t   a   c   t
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
-5  -4  -3  -2  -1  0   +1  +2  +3  +4  +5  +6  +7  +8  +9  +10
Following are some examples of the SUBSTRING function used on the word “Contact”.
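The position line above can be read against a few ANSI SUBSTRING calls. This is a minimal sketch with illustrative aliases; positions before +1 count toward the FOR length but hold no characters.

```sql
SELECT SUBSTRING('Contact' FROM 1 FOR 3)  AS s1,  -- 'Con'
       SUBSTRING('Contact' FROM 4)        AS s2,  -- 'tact' (no FOR: through the end)
       SUBSTRING('Contact' FROM 0 FOR 3)  AS s3,  -- 'Co'   (positions 0, +1, +2)
       SUBSTRING('Contact' FROM -1 FOR 4) AS s4;  -- 'Co'   (positions -1, 0, +1, +2)
```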
34567 is an integer cast, right-justified, into an 11-character field (including the sign).
Beginning with the 3rd character, it then takes the first 5 characters.
_ _ _ _ _ _ 3 4 5 6 7
Unlike POSITION, LIKE does not return a numeric starting position. It simply searches to find
an occurrence of the value inside the string column or expression.
Of the two wild cards, “%” designates that any number of characters may precede (or follow) it.
The “_” is positional and means that the character for that particular position may be anything.
LIKE may only be used on character strings; however, you may nest a conversion of the data
type to character, which LIKE may then search.
LIKE is a logical operator (recall BETWEEN, IN, NOT IN etc.) and not a function.
It is included in this module due to its complexity and its similarity to POSITION.
Searches for a character string pattern within another character string or character
string expression. “LIKE” is ANSI compliant.
LIKE has reference to two (2) wild cards: “%” (percent) and “_” (underscore).
last_name
-------------------- Any number of characters preceding.
Villegas
Wilson Any number of characters following.
Phillips
In addition to the examples shown, observe the following queries and their results, where the
positions of the “o” and the “s” are switched:
last_name
--------------------
Short
Wilson
Johnson
last_name
--------------------
Rogers
Rogers
Hopkins
Johnson
Morrissey
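The queries that produced the two result sets above are not reproduced in this text. As an inference only, the rows shown are consistent with patterns in which “s” and “o” may be separated by any number of characters, evaluated without regard to case (Teradata mode):

```sql
-- An "s" somewhere before an "o" (consistent with Short, Wilson, Johnson)
SELECT Last_Name FROM Employee WHERE Last_Name LIKE '%s%o%';

-- An "o" somewhere before an "s" (consistent with Rogers, Hopkins, Johnson, Morrissey)
SELECT Last_Name FROM Employee WHERE Last_Name LIKE '%o%s%';
```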
last_name
--------------------
Brown
Brown
The value “Brown” is followed by 15 spaces.
last_name
--------------------
Brown
Brown
The following would work (the pattern includes the 15 trailing spaces).
WHERE Last_Name LIKE '%wn               ';
In our example, we are looking for last names in which the character “r” occurs as the second
character. If we wanted to find names in which the character “n” is the second from the last
character, the following syntax would be needed. This would likely be true even if the
last name were a variable character column, since a space could occur as its last character as well. In
that case, where the TRIM is left off and a variable character column has a value like 'Brown '
(where there is a space after ‘Brown’), the following last name would be included in the result
set.
SEL Last_name
FROM employee
WHERE last_name LIKE '%n_';
last_name
----------
Brown
Find employees having last names with “r” as the second character.
SELECT Last_Name FROM Employee WHERE Last_Name LIKE '_r%';
last_name
--------------------
Brown
Brown
Crane
Trader
Trainer
Find employees having last names with “w” as the second from the
last character.
SELECT Last_Name
FROM Employee
WHERE TRIM(Last_Name) LIKE '%w_';
last_name
--------------------
Brown
Brown
• Until an instance of the ESCAPE character occurs, characters in the pattern are
interpreted at face value.
• When an ESCAPE character immediately follows another ESCAPE character, the two
character sequence is treated as though it were a single instance of the ESCAPE
character, considered as a normal character.
• When an underscore metacharacter immediately follows an ESCAPE character, the
sequence is treated as a single underscore character (not a wildcard character).
• When a percent metacharacter immediately follows an ESCAPE character, the sequence
is treated as a single percent character (not a wildcard character).
• When an ESCAPE character is not immediately followed by an underscore
metacharacter, a percent metacharacter, or another instance of itself, the scan stops and an
error is reported.
To search for a character in a string that is a LIKE wild card you use the ESCAPE
clause.
Find table names having an underscore as the second character in their names.
SELECT TableName
FROM DBC.Tables
WHERE TableName LIKE '_x_%' ESCAPE 'x';
TableName
------------------------------
T_CST_WT_COMMANDS
T_CST_CTRL_INPUT
T_CST_WT_OVERALL
When using this function, values like “Brown” and “brown” are no longer equal since “B” and
“b” have a different case.
{ expression } ( { CASESPECIFIC | CS } )
SELECT Last_Name FROM Employee
WHERE POSITION(('Ra' (cs)) IN Last_Name) > 0;
last_name
--------------------
Ratzlaff
Rabbit
The result of the EXTRACT function is an integer value that represents that portion you are
extracting.
Yr Mth Dy
----------- ----------- -----------
2011 11 19
Hr Mn Scd
----------- ----------- -----------
10 20 30
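The requests behind the two results above are not reproduced here. A sketch that would return the same values, assuming date and time literals in place of whatever columns the original used:

```sql
SELECT EXTRACT(YEAR  FROM DATE '2011-11-19') AS Yr,   -- 2011
       EXTRACT(MONTH FROM DATE '2011-11-19') AS Mth,  -- 11
       EXTRACT(DAY   FROM DATE '2011-11-19') AS Dy;   -- 19

SELECT EXTRACT(HOUR   FROM TIME '10:20:30') AS Hr,    -- 10
       EXTRACT(MINUTE FROM TIME '10:20:30') AS Mn,    -- 20
       EXTRACT(SECOND FROM TIME '10:20:30') AS Scd;   -- 30
```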
Since this can be used to add months to a date, it is also capable of adding years as well. The
second example for using the function illustrates this approach.
ADD_MONTHS(date-expression, integer-expression)
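A minimal sketch of ADD_MONTHS with a literal date follows. Multiplying by 12 adds whole years, and a negative integer moves backward. The aliases are illustrative.

```sql
SELECT ADD_MONTHS(DATE '2011-11-19', 2)      AS two_months_later,   -- 2012-01-19
       ADD_MONTHS(DATE '2011-11-19', 12 * 5) AS five_years_later,   -- 2016-11-19
       ADD_MONTHS(DATE '2011-11-19', -1)     AS one_month_earlier;  -- 2011-10-19
```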
There are three Teradata system-defined business calendars that you can set for your session:
• Teradata
• ISO
• COMPATIBLE
All three calendars are based on the de facto international standard, the Gregorian calendar.
The Gregorian calendar has 365 days in most years and 366 days in a leap year. The calendars
support January 1, 1900, to December 31, 2100.
The default session calendar is Teradata. Each calendar defaults to all business days. You can
change that pattern using a Macro.
SET SESSION calendar = Teradata; SET SESSION calendar = ISO; SET SESSION calendar = Compatible;
[Figure: a sample month calendar grid (Sunday through Saturday) shown under each of the three SET SESSION statements.]
• TERADATA Calendar:
◦ The first full week of the year starts on Sunday
◦ The days of the year before Sunday belong to Week 0. For example, if the year starts
on January 1, 2004 (a Thursday), then Week 0 is from January 1 to January 3. Week
1 begins on Sunday, January 4.
• ISO Calendar:
◦ This calendar follows the ISO and European standard.
◦ The week begins on Monday
◦ The first week of the year is the first week that has at least 4 days. If a week has
fewer than 4 days, it belongs to the last week of the previous year. (week 52)
◦ There are no partial weeks. For example, if the year starts on January 1, 2008 (a
Tuesday) and the week start is Monday, week 1 of 2008 is from December 31, 2007,
to January 6, 2008.
• COMPATIBLE Calendar:
◦ This calendar is Oracle-compatible.
◦ It specifies that the first full week of a year begins on January 1, regardless of what
day of the week that is
◦ There can be partial weeks with 1 day (for most years) or 2 days (for leap years) at
the end of the year.
◦ The day the week begins can change from year to year. For example, if January 1,
2011, is a Saturday, the first week of the year is from Saturday, January 1, 2011,
through Friday, January 7, 2011.
Find the week number for a given date using the Teradata, Compatible,
and ISO calendars via the new calendar UDFs:
SELECT weeknumber_of_year(DATE '2011-01-01');
weeknumber_of_year(2011-01-01,'TERADATA')
-----------------------------------------
0
SET SESSION calendar = COMPATIBLE;
SELECT weeknumber_of_year(DATE '2011-01-01');
weeknumber_of_year(2011-01-01,'COMPATIBLE')
-----------------------------------------
1
SET SESSION calendar = ISO;
SELECT weeknumber_of_year(DATE '2011-01-01');
weeknumber_of_year(2011-01-01,'ISO')
-----------------------------------------
52
[Figure: a January 2011 calendar grid accompanies each result; under COMPATIBLE, January 1 is marked “Week 1 of Current year”.]
SYNTAX:
[ TD_SYSFNLIB. ] Function Name ( expression [ , { calendar_name | NULL } ] )
• Expressions:
an expression that results in a DATE, TIMESTAMP, or TIMESTAMP WITH TIME
ZONE value.
• calendar_name:
an optional business calendar name. The only possible values are Teradata, ISO, and
COMPATIBLE. This argument must be a character literal and cannot be a table column
or expression. If a calendar is not named, Teradata uses the calendar for the session.
• NULL:
an optional argument. If NULL is specified, the calendar that is set for the session is used.
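As a sketch of the optional calendar_name argument, one request can ask for the week number under all three calendars without changing the session calendar. The aliases are illustrative. For January 1, 2011, this material reports 0 for Teradata, 1 for COMPATIBLE, and 52 for ISO.

```sql
SELECT TD_SYSFNLIB.weeknumber_of_year(DATE '2011-01-01', 'TERADATA')   AS wk_teradata,
       TD_SYSFNLIB.weeknumber_of_year(DATE '2011-01-01', 'COMPATIBLE') AS wk_compatible,
       TD_SYSFNLIB.weeknumber_of_year(DATE '2011-01-01', 'ISO')        AS wk_iso;
```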
Function   Description   Return type
WeekNumber_of_calendar   Returns the number of weeks from the beginning of the calendar to the specified date.   Integer
WeekNumber_of_month   Returns the number of weeks from the beginning of the month to the specified date.   Integer
DayOccurrence_of_month   Returns the nth occurrence of the weekday in the month for the specified date.   Integer
Function   Description   Return type
MonthNumber_of_calendar   Returns the number of months from the beginning of the calendar to the specified date.   Integer
QuarterNumber_of_year   Returns the number of quarters from the beginning of the year to the specified date.   Integer
QuarterNumber_of_calendar   Returns the number of quarters from the beginning of the calendar to the specified date.   Integer
WeekNumber_of_quarter   Returns the number of weeks from the beginning of the quarter to the specified date.   Integer
True or False:
1. From the Employee table, display the last name first name for
employees 1013, 1018, and 1024. Concatenate the columns so that you
see them as “last, first”.
2. Repeat #1. Replace your WHERE Clause, using LIKE to only list
employees who have an "LL" combination in their last name.
FullName
------------------------
Ratzlaff, L.
Villegas, A.
Phillips, C.
Set Operators
All employee
Result 1 Result 2
Last Name last names, Last Name
First Name first names, First Name
Emp Number emp#, Emp Number
Dept Number dept# Dept Number
Set operators are supported in many contexts. You can use set operators within the
following operations:
• Simple queries
• Derived tables (not covered in this class)
• Subqueries
• Insert/Select clauses (later module)
• View definitions (later module)
SELECT statements connected by set operators can include all of the normal clause options for
SELECT except the WITH clause.
The SQL set operators manipulate the result sets of two or more queries
by combining the results of each individual query into a single result set.
Set operators deal with actual sets of data (i.e. sets of rows).
Whereas join and subquery results are based upon IN, NOT IN, or equality of
one or more columns, set operators act upon rows of data as a set.
If the data type for a column in the first projection is SMALLINT, and that for the
corresponding column in a subsequent projection is INTEGER, then any value in the corresponding
(longer) column that does not fit the SMALLINT type will result in a “numeric overflow”.
The following query is how one could write the SQL request on the facing page without using a
UNION.
SELECT Last_Name,
Department_Number AS Dept#,
Salary_Amount
FROM Employee
WHERE Department_Number = 401
OR Salary_Amount BETWEEN 35000 AND 38000
ORDER BY 1;
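For comparison, the UNION form that the OR query above stands in for can be sketched by reusing the two SELECT statements from the INTERSECT example in this module. This is an assumed reconstruction, since the facing page itself is not reproduced here.

```sql
SELECT Last_Name,
       Department_Number AS Dept#,
       Salary_Amount
FROM Employee
WHERE Department_Number = 401
UNION
SELECT Last_Name,
       Department_Number,
       Salary_Amount
FROM Employee
WHERE Salary_Amount BETWEEN 35000 AND 38000
ORDER BY 1;   -- set operators are ordered by positional number
```

UNION removes duplicate rows unless the ALL option is specified, so an employee who satisfies both conditions appears once.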
c1 c2 c3 c1 c2 c3
a b c a b d
b c null c d e
c d e a b c
d e f b c null
SELECT Last_Name,
Department_Number AS Dept#,
Salary_Amount
FROM Employee
WHERE Department_Number = 401
AND Salary_Amount BETWEEN 35000 AND 38000
ORDER BY 2, 1;
Returns result rows that appear in all answer sets generated by the individual
intersected SELECT statements.
SELECT Last_Name,
Department_Number AS Dept#,
Salary_Amount
FROM Employee
WHERE Department_Number = 401
INTERSECT
SELECT Last_Name,
Department_Number,
Salary_Amount
FROM Employee
WHERE Salary_Amount BETWEEN 35000 AND 38000
ORDER BY 2, 1;
Result of the first SELECT:
last_name Dept# salary_amount
----------- ----- -------------
Machado 401 ?
Rogers 401 46000.00
Brown 401 43100.00
Phillips 401 24500.00
Johnson 401 36300.00
Trader 401 37850.00
Hoover 401 25525.00
Result of the second SELECT:
last_name dept# salary_amount
----------- ----- -------------
Hopkins 403 37900.00
Trader 401 37850.00
Johnson 401 36300.00
Result (common rows):
last_name Dept# salary_amount
-------------------- ----------- -------------
Johnson 401 36300.00
Trader 401 37850.00
c1 c2 c3 c1 c2 c3
a b c a b d
b c null c d e
c d e a b c
d e f b c null
The facing page has lines drawn through the rows that are being omitted due to either the left
table rows matching rows from the right table, or (for “Short”) a row that wasn’t involved at all.
[Figure: Venn diagram of Result 1 and Result 2. The marked area is the result of Result 1 EXCEPT Result 2.]
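A sketch of EXCEPT follows, reusing the two SELECT statements from the INTERSECT example rather than the query on the facing page (which is not reproduced here). Going by the intermediate result sets shown for that example, the rows for Johnson and Trader appear in both sets and would therefore be removed from the department 401 rows.

```sql
SELECT Last_Name,
       Department_Number AS Dept#,
       Salary_Amount
FROM Employee
WHERE Department_Number = 401
EXCEPT
SELECT Last_Name,
       Department_Number,
       Salary_Amount
FROM Employee
WHERE Salary_Amount BETWEEN 35000 AND 38000
ORDER BY 1;
```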
True or False:
1. The INTERSECT operator returns the same result as does the MINUS
operator, however MINUS is a Teradata extension.
False – MINUS and EXCEPT are equivalent (though MINUS is an extension)
2. The following is valid for a set operator ORDER BY Last_Name
False – you must order by a positional number
3. The ALL option may potentially return more rows than if not using it.
True
4. Set operators may cause truncation among corresponding columns of
result sets.
True
5. An INTERSECT is just another way of returning an inner result.
False
6. If all three different set operators are referenced in a query, the UNION is
performed first.
False – the INTERSECT is first
7. “SELECT *” is a valid projection in a set operation.
True – as long as the numbers of columns projected among projections
remains constant
2. Use UNION to combine employees who earn more than $10,000.00 with
those who work in departments 301 or 401. Alias last name to LNM
and first name to FNM.
4. Change #1 to find those who satisfy both the department and salary
conditions.
Subqueries
Recall how a list of values can be passed into a query as a set of WHERE values.
You can replace the “IN LIST” with a subquery that will generate the set.
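A minimal sketch of that replacement: the first request passes a literal IN list, and the second lets a subquery generate the list. The department numbers in the literal list are illustrative.

```sql
-- Literal IN list
SELECT *
FROM Employee
WHERE Department_Number IN (401, 403);

-- The same kind of test, with the list generated by a subquery
SELECT *
FROM Employee
WHERE Department_Number IN
   (SELECT Department_Number
    FROM Department);
```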
• Employees having department numbers that are not in the department table. (outer)
• Employees having department numbers that are in the department table. (inner)
• Departments that have people assigned to them (inner)
• Departments in which no one works (outer)
Note that items 2 and 3 (above) actually refer to the same set. But it is important to note that
their business focus is quite different.
Notice how the concepts of “inner” and “outer” relate to the Venn-Diagram. The answers to the
questions posed are:
Both of these areas are “outer” areas and these result sets are referred to as outer-results. Area 2
represents (conceptually) the inner-results, or IN results. This is the area where department
numbers from both tables match.
In the following, note the use of the terms “Inner” and “Outer”.
Only rows from the Outer Table may be projected.
The Inner Query is used to generate a distinct list (In List) of values.
[Figure: Venn diagram of the Dept and Emp sets with areas labeled 1, 2, and 3.]
Into which area of the diagram would this result set fall?
SELECT *
FROM Employee
WHERE Department_Number NOT IN
(SELECT Department_Number
FROM Department);
Into which area of the diagram would this result set fall?
SELECT *
FROM Department
WHERE Department_Number NOT IN
(SELECT Department_Number
FROM Employee);
On the facing page we illustrate how one would interpret a business question from a subquery.
To do so, begin from the lowest level and move outward to the outer-most query. The bottom
example illustrates how two separate subqueries, AND’ed together, might be interpreted as two
separate conditions. If they were OR’ed together, then the business question would be “People
who are managers or work in support departments.” This is not an example of nested
subqueries, however, which will be discussed next.
You can add conditions to the outer and inner queries like this.
SELECT *                                  -- Employees with job code 412101
FROM Employee                             -- who work in support departments
WHERE Job_Code = 412101
AND Department_Number IN
(SELECT Department_Number                 -- Support departments
FROM Department
WHERE Department_Name LIKE '%Support%');
The previous example was not nested because each subquery was a separate condition, and
not part of another subquery.
To interpret the business question of the query start from the lowest-level
query and work upward.
SELECT *                                  -- Employee information on Sales department managers
FROM Employee                             -- who are not assigned to a customer.
WHERE Department_Number IN
(SELECT Department_Number                 -- Sales department managers who have no assigned customers.
FROM Department
WHERE Department_Name LIKE '%Sales%'
AND Manager_Employee_Number NOT IN
(SELECT Sales_Employee_Number             -- Sales people having customers
FROM Customer) );
It should go without saying that one cannot match 2 values with 3 values, but let’s say it anyway.
That is to say, equal numbers of values may only match equal numbers of values. Observe.
SELECT *
FROM Employee
WHERE (Employee_Number,
Department_Number,
Manager_Employee_Number)
IN (SELECT Department_Number,
Manager_Employee_Number
FROM Department);
You can also match multiple outer columns to multiple inner columns.
To find employees who work in departments that they manage, begin by finding
all department managers from the department table.
SELECT Department_Number,                 -- Departments and their managers
Manager_Employee_Number
FROM Department;
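Building on that inner query, the complete request might be sketched as follows. Two outer columns are matched, in order, against the two inner columns, so an employee qualifies only when both the department number and the employee number match a department and its manager.

```sql
SELECT *
FROM Employee
WHERE (Department_Number,
       Employee_Number)
   IN (SELECT Department_Number,
              Manager_Employee_Number
       FROM Department);
```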
Explanation
---------------------------------------------------------
. . . .
3) We do an all-AMPs RETRIEVE step from DLM.Employee by
way of an all-rows scan with a condition of
("(DLM.Employee.dept# <> NULL) AND
((DLM.Employee.dept# <> 401) AND DLM.Employee.dept# <>
403 ))") into Spool 1 (group_amps), which is built
locally on the AMPs.
. . . .
Recall that all AND conditions must evaluate true, and that, for this WHERE condition, the
condition EMPLOYEE.dept# <> NULL can never evaluate true, so no rows are returned.
In the middle example of the facing page we replaced the NOT IN list with a subquery. Since
the subquery may, in fact, return a null department number, the query could potentially result in no
rows being returned as well.
The example at the bottom of the page suggests adding a condition to the subquery that avoids
introducing a null into the resulting NOT IN list, thus avoiding the no rows returned result.
Recall from an earlier module that the following query will return zero rows.
SELECT *
FROM Department
WHERE Department_Number NOT IN (401, 403, null);
Consider the following query where the inner query returns a null department.
How many rows will be returned?
What would be a good condition to add to the inner query?
SELECT *
FROM Department
WHERE Department_Number NOT IN
(SELECT Department_Number FROM Employee);
Answer:
SELECT *
FROM Department
WHERE Department_Number NOT IN
(SELECT Department_Number FROM Employee
WHERE Department_Number IS NOT NULL);
True or False:
Extra tough
3. Write a nested subquery that finds employees whose managers are
department managers that are not managers in the employee table.
Inner Join
In our example, we might want to look up the department name for any number of employees,
since employee information does not include department information other than the
department number. The department number is stored in the employee table
because it is what we need to get information for the department. The actual syntax for an inner
join is (as you will soon see) fairly straightforward.
• Inner joins are similar to subqueries in that an inner join returns an inner result.
Subqueries, however, can return outer result sets as well
• Where subqueries are limited to projecting only from the outer table, inner joins can
project columns from any joined table.
Note the differences between the syntax used for a subquery and that for the join.
The join condition must evaluate “True” in order to project column values.
The SELECT *, in the case of the join, will project all columns from both tables for
comparisons that evaluate “True.”
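The two forms being contrasted can be sketched side by side as follows. This is an illustration using the Employee and Department tables from this module, not necessarily the exact pair of requests on the slide.

```sql
-- Subquery form: only Employee columns can be projected
SELECT *
FROM Employee
WHERE Department_Number IN
   (SELECT Department_Number
    FROM Department);

-- Join form: SELECT * projects the columns of both tables
SELECT *
FROM Employee e INNER JOIN Department d
  ON e.Department_Number = d.Department_Number;
```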
Subquery:
1. You can only project columns from the outer table.
2. A distinct list guarantees a one-to-many relationship between the inner and outer table.
3. Can return an inner result (using IN) or an outer result (using NOT IN).
Join:
1. You can project columns from any table.
2. Does not guarantee a one-to-many relationship between the tables.
3. Can only return an inner result.
When required:
• When the name is ambiguous. (It occurs in more than one table.)
Aliasing is a way to provide another, more user-friendly, name for a table, much as it is used in
aliasing columns. Aliasing a table is optional (as it is for columns), but it is used in nearly all
queries involving joins. It is almost always recommended that one use the alias as a qualifier
whenever referencing a column, even though it is optional to do so. The reason for this is to easily
identify from which table the column is being projected. An example of typical usage follows.
SELECT emp.Last_Name,
emp.First_Name,
emp.Department_Number,
dept.Manager_Employee_Number
FROM Employee emp , Department AS dept
WHERE emp.Department_Number = dept.Department_Number;
Just as you can alias column names, you may also alias table names.
Without double-quotes, aliases:
• May not contain non-standard characters.
• May not contain key-words.
SELECT employee.Last_Name,
First_Name,                               -- Qualification not required.
Employee.Department_Number,
d.Manager_Employee_Number
FROM Employee, Department AS d
WHERE Employee.Department_Number = d.Department_Number;
SELECT e.Last_Name,
First_Name,
e.Department_Number,                      -- Qualification required.
d.Manager_Employee_Number
FROM Employee e, Department d
WHERE e.Department_Number = d.Department_Number;
The style at the top of the page is often referred to as the “implicit” form (Inner Join is not stated
so it is implied) while the style at the bottom is referred to as the “explicit” form (Inner Join is
stated). Another term some may use for the top form is the “comma” form. When using the
explicit form, the INNER keyword is optional.
Also notice that the “ON” clause references the join condition. The WHERE clause is used to
reference conditions that are “residual” to the join. The “ON” clause is mandatory. Join
conditions when using the “implicit” form are not mandatory. We shall discuss this later in the
module.
ANSI 88 (Implicit Form)
SELECT e.Last_Name,
e.First_Name,
e.Department_Number,
d.Manager_Employee_Number
FROM Employee e, Department d
WHERE e.Department_Number = d.Department_Number
AND e.Last_Name = 'Brown';
ANSI 92 (Explicit Form), equivalent results
SELECT e.Last_Name,
e.First_Name,
e.Department_Number,
d.Manager_Employee_Number
FROM Employee AS e INNER JOIN Department AS d
ON e.Department_Number = d.Department_Number
WHERE e.Last_Name = 'Brown';
Query result:
Jones Sales Manager
Explicit form:
Although not entirely obvious, the second and third forms must be precisely written so that each
join condition references the immediately preceding tables or a syntax error will result. Note
these examples, which place the “ON” (join) conditions improperly. Both of these yield the
failure shown below them.
The business concern on the facing page could be stated something like this.
“Provide the job description and department name for all accountants working in departments
having budgets over $50,000.00”
Note in the example below that the key word INNER is optional. Also note that the
number of join conditions is the number of tables minus 1 and that best practice,
whether aliasing or not, is to always qualify, whether required or not, to match columns
to tables.
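The business concern stated earlier (job description and department name for accountants in departments with budgets over $50,000.00) could be sketched as a three-table join with two join conditions. The Job table and the Description and Budget_Amount columns are assumed names for illustration; they do not appear elsewhere in this text.

```sql
SELECT j.Description,          -- assumed column name
       d.Department_Name
FROM Employee AS e
JOIN Department AS d           -- INNER is optional
  ON e.Department_Number = d.Department_Number
JOIN Job AS j                  -- assumed table name
  ON e.Job_Code = j.Job_Code
WHERE j.Description LIKE '%Accountant%'
  AND d.Budget_Amount > 50000; -- assumed column name
```

Each ON clause references the table just joined and one already present, and every column is qualified.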
Note that the relationship between an employee’s name and the name of their manager is
between separate rows within the very same table. In order to display such information it is
necessary to join the table to itself as shown. Such a join is referred to as a self-join. Self-joins
can be somewhat of a challenge for even experienced SQL coders. It is mainly the join condition
that poses such a challenge. At least one version of the table must be aliased or an error occurs as
shown below. The many things that are wrong with the following query should become obvious.
SELECT Last_Name,
First_Name,
Last_Name,
First_Name
FROM Employee JOIN Employee
ON Manager_Employee_Number = Manager_Employee_Number;
*** Failure 3868 A table or view without alias appears more than once in FROM clause.
Display the last name and first names of employees along with the last name and
first names of their managers for those working in departments 201 and 301.
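A self-join that answers this request could be sketched as follows. The sketch uses Python's built-in sqlite3 module as a stand-in for Teradata so that it can run anywhere. The sample rows, the aliases emp and mgr, and the assumption that a manager is found by matching Manager_Employee_Number to the manager's own Employee_Number are illustrative and are not taken from the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employee (
    Employee_Number INTEGER PRIMARY KEY,
    Manager_Employee_Number INTEGER,
    Department_Number INTEGER,
    Last_Name TEXT,
    First_Name TEXT)""")
# Hypothetical sample rows, invented for this sketch.
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", [
    (1, None, 100, "Stone", "Ada"),
    (2, 1, 201, "Reed", "Ben"),
    (3, 1, 301, "Park", "Cy"),
    (4, 2, 201, "Lane", "Di"),
    (5, 3, 402, "Hart", "Ed"),
])

# Both copies of Employee are aliased, and every column is qualified,
# so the join condition can say which copy plays which role.
rows = conn.execute("""
    SELECT emp.Last_Name, emp.First_Name, mgr.Last_Name, mgr.First_Name
    FROM Employee AS emp JOIN Employee AS mgr
      ON emp.Manager_Employee_Number = mgr.Employee_Number
    WHERE emp.Department_Number IN (201, 301)
    ORDER BY emp.Employee_Number
""").fetchall()

for row in rows:
    print(row)
```

The join condition compares a column from the employee copy with a different column from the manager copy, which is exactly what the failing query above cannot express.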
The facing page illustrates what happens when the join relationship is many-to-many. In such a
case, unintended result rows appear in the final result set. In the example, let’s assume that
employee 100 works only in department 30 (not department 55). Because the join is on manager
number, the result set shows that person working in both department 30 and department 55,
which does not reflect the real circumstance. Other employees (employees 400, 500, and 600) share
the same fate. It would be difficult, if not impossible, to view the result set and know who truly
works in which department.
Employee
Emp#  Dept#  Mgr#
100   30     300
200   10     400
400   55     300
500   30     300
600   95     500
Department
Dept#  Mgr#
20     100
30     300
55     300
90     500
95     500
Result
Emp#  Dept#
100   30
100   55
400   30
400   55
500   30
500   55
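The many-to-many effect can be reproduced with the rows from the figure. This is a sketch only: it runs on SQLite through Python's sqlite3 module rather than on Teradata, and the spelled-out column names replace the figure's Emp#, Dept# and Mgr# headings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Employee_Number INTEGER, Department_Number INTEGER,
                       Manager_Employee_Number INTEGER);
CREATE TABLE Department (Department_Number INTEGER,
                         Manager_Employee_Number INTEGER);
INSERT INTO Employee VALUES (100, 30, 300), (200, 10, 400), (400, 55, 300),
                            (500, 30, 300), (600, 95, 500);
INSERT INTO Department VALUES (20, 100), (30, 300), (55, 300),
                              (90, 500), (95, 500);
""")

# Join on manager number: many employees share a manager and a manager
# runs many departments, so the relationship is many-to-many.
many_to_many = conn.execute("""
    SELECT e.Employee_Number, d.Department_Number
    FROM Employee e JOIN Department d
      ON e.Manager_Employee_Number = d.Manager_Employee_Number
    ORDER BY 1, 2
""").fetchall()

# Join on department number: Department_Number is unique in Department,
# so each employee matches at most one department row.
by_department = conn.execute("""
    SELECT e.Employee_Number, d.Department_Number
    FROM Employee e JOIN Department d
      ON e.Department_Number = d.Department_Number
    ORDER BY 1, 2
""").fetchall()

print(many_to_many)
print(by_department)
```

Joining on the manager number pairs employee 100 with departments 30 and 55, while joining on the department number pairs each employee only with the department actually recorded on the employee row.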
Subquery form:
SELECT Employee_Number,
First_Name
FROM Employee
WHERE Department_Number IN
(SELECT Department_Number FROM Department);
Note that you may only rewrite a join as a subquery if you are only projecting
columns from one table!
SELECT Last_Name,
First_Name
FROM Employee
WHERE Manager_Employee_Number NOT IN
(SELECT Manager_Employee_Number FROM Department);
The NOT IN subquery would have no issue with obtaining the result intended here.
Find employees whose managers are not department managers.
SELECT Employee_Number,
First_Name
FROM Employee e JOIN Department d
ON e.Manager_Employee_Number <> d.Manager_Employee_Number;
17 rows total
As noted on the facing page, since no join condition exists, the database invents one for us,
whether we are pleased with it or not. The condition of “WHERE 1=1” always evaluates true.
Thus, you can read the row for employee “Smith” as “Project the employee number and last
name of this employee for each row in the department table where 1=1 is true”. The result is
to project these column values (from the “Smith” row) for each department row. The same thing
happens all over again for each employee row. As a different example, the following query
would return the result shown.
SELECT e.Employee_Number,
e.Last_Name,
d.Department_Number,
d.Manager_Employee_Number
FROM Employee e CROSS JOIN
Department d;
Employee
Emp#  Last_Name
100   Smith
200   Jones
400   Adams
Department
Dept#  Mgr#
20     100
30     300
55     300
Result (project the column values where 1=1 is true)
Emp#  Last_Name
100   Smith
100   Smith
100   Smith
200   Jones
200   Jones
200   Jones
400   Adams
400   Adams
400   Adams
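The row multiplication described above can be checked with a small sketch. It runs on SQLite through Python's sqlite3 module as a stand-in for Teradata, and it uses the three employee rows and three department rows from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Employee_Number INTEGER, Last_Name TEXT);
CREATE TABLE Department (Department_Number INTEGER,
                         Manager_Employee_Number INTEGER);
INSERT INTO Employee VALUES (100, 'Smith'), (200, 'Jones'), (400, 'Adams');
INSERT INTO Department VALUES (20, 100), (30, 300), (55, 300);
""")

# Explicit cross join: every employee row is paired with every department row.
cross = conn.execute("""
    SELECT e.Employee_Number, e.Last_Name,
           d.Department_Number, d.Manager_Employee_Number
    FROM Employee e CROSS JOIN Department d
    ORDER BY 1, 3
""").fetchall()

# Implicit form with no join condition: the same cross join, by omission.
implicit = conn.execute("""
    SELECT e.Employee_Number, e.Last_Name,
           d.Department_Number, d.Manager_Employee_Number
    FROM Employee e, Department d
    ORDER BY 1, 3
""").fetchall()

print(len(cross), "rows")
```

Three employees times three departments gives nine rows, and forgetting the join condition in the implicit form produces the same nine rows without any warning.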
SELECT Last_Name
WHERE Employee.Department_Number =
Department.Department_Number;
On the facing page, the join condition references a table called “Department” which is not
referenced in a FROM clause, just like the previous example on this page. Also, a table called
“Dept” (in a “FROM” clause alias) has no join condition. The optimizer will interpret this as a
cross join between (likely) Dept and Department, the result of which will then be joined to
Employee (in the immediate example). Ouch!
• Be careful! Do not alias a table and then use the name instead of the alias.
• In the examples below, the first one will fail due to a syntax error (ANSI 92).
• The second will cause a 4-table join, one of which is a self join between Dept
(the aliased Department table) and Department!
SELECT Employee.Last_Name
FROM Employee AS Emp;
Or
SELECT Employee.Last_Name
FROM Department;
In each case, there are two (2) tables involved. As stated on the previous page, cross joins will
be performed by the database to make this happen.
Both forms of joins cause bad self joins when referring to the table
name in the select list instead of the alias!
• Subqueries and inner joins can both return inner result sets.
• Incorrect table and column references can cause incorrect result sets.
• Unlike subqueries, inner joins cannot return outer (NOT IN) result sets.
True or False:
1. For inner joins, each FROM clause requires an ON clause for join
conditions.
False – Only the explicit form requires an ON clause
2. Referencing a WHERE clause is invalid for the explicit form of inner join.
False – A WHERE clause may be needed for adding residual conditions
3. Many-to-many relationships are allowed with inner joins.
True
4. When performing a self join, table aliasing is required.
True – You may not reference the same table name twice without creating ambiguities
5. Inner join syntax requires at least one qualifying join column.
False – A WHERE clause may be needed for adding residual conditions
6. The explicit form of inner join can reject some uses of incorrect
qualifications.
True – But only in the ON clause and not in the project list
7. The implicit form of inner join is not ANSI standard.
False – Both forms are ANSI standard
1. List all employees by name, the name of their department, their original
salary, and salary again with a ten percent increase, for those working
in departments with budgets > $40,000.00. Make the last and first name
10 characters each and use the implicit form of inner join.
2. Find the department names and employee names for employees that
have both an “i” and an “e” in their last name. Make the last and first
name 10 characters each and use the explicit form of inner join.
Optional
4. Write a cross join that lists all possible combinations of first names
and last names from employee.
Outer Join
Having said all of this, outer join can be the most difficult to write. Much has yet to be learned.
Outer joins can retrieve both the matching (inner) and non-matching (outer) rows.
Areas: 1 (Emp only), 2 (Emp and Dept overlap), 3 (Dept only)
Employees and the departments in which they work (2), plus those with unmatched departments (1).
OR
Departments with people assigned to them (2) plus departments with no assigned people (3).
“If an employee has a department number that does not (or will not) match one in the
department table, what should be projected for their department name (or any other
department column)?”
This question gets to the very heart of outer joins. The simple answer to this question is that the
database will project a null for each and every column from the inner table that doesn’t match
the join criteria.
• Inner joins retrieve only the INNER (matching join condition) result sets.
• Outer joins retrieve both the INNER and OUTER (non-matching) result sets.
• For OUTER JOIN, you can write one that returns the following:
Employees with valid departments
Employees with invalid departments
(i.e. all employees)
What would an outer join project for department name of employees in department
numbers 300 and 400, which don’t exist in the department table?
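Employee (Last Name, Department Number) and Department (Department Number (Unique), Department Name)
IN: employee rows, such as Jones in department 100, whose department number matches a department row
NOT IN: employee rows whose department number matches no department row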
On the facing page we see the result of an inner join combined, via UNION ALL, with that of the
NOT IN subquery. The inner join can project columns for both tables, but since it cannot return the
non-matching (outer) results, we union these with the NOT IN to return the outer results as well. Note how it
projects a null for the column we are unable to project from the inner table found in the
subquery. You may now begin to understand how important it is to know and understand the
terms referenced from earlier modules.
Also, note how NULL (in the subquery) has been defined as character to match that for the
department name. The default data type for null is integer. This can be obtained by using the
TYPE function, like this SELECT TYPE(null);
Since the data type for null is different from that for its corresponding column from the first
SELECT, without the explicit conversion, an implicit conversion would have been performed for
the null causing a mismatch in data types. This would have then failed the query.
You can now contrast this syntax to that of the previous query. Note that, in the second query,
the subquery actually determines which table is inner and which is outer, while in the first query
it is the LEFT keyword that determines the outer table and, hence, by process of elimination,
which is the inner table.
Vs.
SELECT e.Last_Name,
e.Department_Number,
d.Department_Name
FROM Employee e, Department d
WHERE e.Department_Number = d.Department_Number
UNION ALL
SELECT Last_Name,
Department_Number,
NULL (CHAR(30))
FROM Employee
WHERE Department_Number NOT IN
(SELECT Department_Number FROM Department);
These both return the same result set, but have different perspectives.
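The equivalence of the two perspectives can be checked with a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for Teradata, so the Teradata-style NULL (CHAR(30)) is written as CAST(NULL AS TEXT). The sample rows are invented for illustration and place two employees in departments 300 and 400, which have no row in Department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Last_Name TEXT, Department_Number INTEGER);
CREATE TABLE Department (Department_Number INTEGER, Department_Name TEXT);
INSERT INTO Employee VALUES ('Jones', 100), ('Adams', 100),
                            ('Smith', 300), ('Brown', 400);
INSERT INTO Department VALUES (100, 'research'), (200, 'sales');
""")

# Outer join: the LEFT keyword makes Employee the outer table.
left_join = conn.execute("""
    SELECT e.Last_Name, e.Department_Number, d.Department_Name
    FROM Employee e LEFT OUTER JOIN Department d
      ON e.Department_Number = d.Department_Number
    ORDER BY 1
""").fetchall()

# Inner result (implicit join) plus outer result (NOT IN subquery).
union_all = conn.execute("""
    SELECT e.Last_Name, e.Department_Number, d.Department_Name
    FROM Employee e, Department d
    WHERE e.Department_Number = d.Department_Number
    UNION ALL
    SELECT Last_Name, Department_Number, CAST(NULL AS TEXT)
    FROM Employee
    WHERE Department_Number NOT IN
          (SELECT Department_Number FROM Department)
    ORDER BY 1
""").fetchall()

print(left_join)
```

Both queries return every employee, with a null department name for Smith and Brown.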
One more thing to note is that the key word “OUTER” is optional. That is to say, there is no
such thing as a RIGHT or LEFT or FULL inner join.
Notice that this result is exactly the same as that shown earlier using the UNION ALL
technique. No surprises here. Between the two, this one will perform better and is generally
easier to write. Keep in mind that this example is a simple one and that outer joins can become
much more complex! Try to become familiar with this syntax and use it instead of the UNION
ALL. The UNION ALL was shown simply to help us understand terms and concepts, and is not
meant as an alternative for practical usage.
In the result set, a null returned for an inner table column may belong to an inner result and does not
by itself indicate an outer result! If the department name column were defined as NOT NULL, then a null
value for department name could not exist in the table, and so a null for this column in the result set could
then be interpreted as an outer result row.
SELECT e.Last_Name,
d.Department_Number,
e.Department_Number
FROM Employee e LEFT OUTER JOIN Department d
ON e.Department_Number = d.Department_Number
AND d.Department_Name LIKE '%support%'
WHERE e.Last_Name = 'smith'
OR e.Last_Name = 'brown';
In general, qualifications on the columns of the inner table don’t make sense. Such
qualifications might make sense if the query were an inner join, since the result contains rows
only for matching conditions, where all column values, from each table, are available for
referencing. In the outer join the inner table column values may not be available for
qualification in that they may be null. Such usage would be equivalent
to an inner join.
• Areas 1 and 2
• Areas 2 and 3
• Areas 2 and 3
• Areas 1 and 2
Emp (areas 1 and 2), Dept (areas 2 and 3)
SELECT e.Last_Name, d.Department_Name
FROM Department d LEFT JOIN Employee e
ON e.Department_Number = d.Department_Number;
SELECT e.Last_Name,
e.First_Name,
D.Department_Name
FROM Employee e -- Left Table (Outer Table in this example)
LEFT [OUTER] JOIN
Department d -- Right Table (Inner Table in this example)
ON e.Dept# = d.Dept# -- Inner Join Condition
AND d.Dept_Name LIKE '%support%' -- Inner Join Search Condition (A residual condition to the inner result.)
WHERE Salary < 50000 -- Outer Search Condition (A residual condition to the outer result.)
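Where a condition on the inner table is placed changes the result, and a small sketch can show the difference. It runs on SQLite through Python's sqlite3 module as a stand-in for Teradata. The sample rows are hypothetical, and the two queries differ only in whether the LIKE condition sits in the ON clause or in the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Last_Name TEXT, Department_Number INTEGER);
CREATE TABLE Department (Department_Number INTEGER, Department_Name TEXT);
INSERT INTO Employee VALUES ('Smith', 401), ('Brown', 403), ('Jones', 999);
INSERT INTO Department VALUES (401, 'customer support'), (403, 'education');
""")

# Condition in ON: it restricts which inner rows match, but every
# outer (Employee) row is still returned.
in_on = conn.execute("""
    SELECT e.Last_Name, d.Department_Name
    FROM Employee e LEFT OUTER JOIN Department d
      ON e.Department_Number = d.Department_Number
     AND d.Department_Name LIKE '%support%'
    ORDER BY 1
""").fetchall()

# Condition in WHERE: it is applied after the join, where the nulls
# projected for non-matching rows fail the test, so outer rows vanish.
in_where = conn.execute("""
    SELECT e.Last_Name, d.Department_Name
    FROM Employee e LEFT OUTER JOIN Department d
      ON e.Department_Number = d.Department_Number
    WHERE d.Department_Name LIKE '%support%'
    ORDER BY 1
""").fetchall()

print(in_on)
print(in_where)
```

With the condition in the WHERE clause the query behaves like an inner join, which is why qualifying inner table columns there rarely makes sense in an outer join.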
Also recall the rules for placing parentheses. With or without the parentheses the query returns
the same result. How they can be placed was discussed in the module on inner joins.
SELECT e.Last_Name,
d.Department_Number,
j.Job_Code
FROM ((Employee e JOIN
Department d
ON e.Department_Number = d.Department_Number)
JOIN
Job j
ON e.Job_Code = j.Job_Code);
• The result of the first outer join (between Employee and Department) is obtained first.
Employee is the outer table in this join.
• The result of the first outer join then gets joined to the Job table. The outer table from the
previous join (Employee) is maintained as the outer table in the second join due to its
placement as the LEFT table.
These two queries return equivalent results because Employee remains the outermost
table.
There are many other ways in which one may write this and get the same result.
SELECT e.Last_Name,
d.Department_Name,
j.Description
FROM ( ( Employee e LEFT JOIN
Department d
ON e.Department_Number = d.Department_Number )
LEFT JOIN
Job j
ON e.Job_Code = j.Job_Code );
SELECT e.Last_Name,
d.Department_Name,
j.Description
FROM ( ( Department d RIGHT JOIN
Employee e
ON e.Department_Number = d.Department_Number )
LEFT JOIN
Job j
ON e.Job_Code = j.Job_Code );
These two outer join queries also return the same result.
SELECT e.Last_Name,
d.Department_Name,
j.Description
FROM (Department d RIGHT JOIN
(Job j RIGHT JOIN
Employee e
ON e.Job_Code = j.Job_Code )
ON e.Department_Number = d.Department_Number );
SELECT e.Last_Name,
d.Department_Name,
j.Description
FROM (Job j RIGHT JOIN
(Department d RIGHT JOIN
Employee e
ON e.Department_Number = d.Department_Number )
ON e.Job_Code = j.Job_Code );
• Outer joins share a similar syntax to the explicit form of inner join.
• Outer join results can be obtained by using UNION with inner join and
NOT IN.
• There are many ways one can write outer joins to achieve the same
result.
• Teradata uses only the ANSI standard syntax for outer join.
True or False:
1. All outer joins require use of either LEFT, RIGHT or FULL keywords.
True
2. Outer joins can return more rows than can inner joins.
True
3. Nulls returned from the inner table mean the result row is an outer
result.
False – Only for the join column, or if the column is defined as NOT NULL
4. The use of a WHERE clause is not allowed in an outer join.
False – WHERE can be used for writing residual conditions
5. The use of an ON clause is required when writing an outer join.
True
6. The keyword OUTER is required when writing outer joins.
False
7. The FULL outer join returns LEFT and RIGHT outer results.
True – It also returns inner results
1. From the employee and department tables, list employee last names,
first names, the department names and the employees’ department
numbers only, for all employees. Compare this to the number of rows
returned by the inner join.
2. For #1, include the department number from the department table to
the projection to see which rows are actually outer results.
3. From the employee, department, and job tables, list employee last
names, first names, and the join columns from all three tables for all
employees having salaries between $34,000.00 and $58,000.00. Do any
of these employees have both an invalid department and invalid job?
Correlated Subqueries
To understand the business question, start from the outer table and work in.
Subquery processing:
• Resolve bottom-most level query first
• Use results as input to next level up
• Continue until top level reached
SELECT last_name
FROM employee -- Outer Table
WHERE department_number IN
(SELECT department_number
FROM department -- Inner Table
WHERE department_name
LIKE ('%research%'));
Correlated subqueries:
• are subqueries.
• can only project from outer table.
• have the inner table “inter-connected” to outer table via a join condition
(correlated).
• are considered a row-at-a-time process as opposed to a set process.
• most typically reference EXISTS and NOT EXISTS.
As it turns out, the database can return a result set according to this logic quite efficiently! The
database is not restricted to performing in the manner defined; it is only necessary that it return the
result that this defined process would produce. The result differs from that of the ordinary subquery mainly
for NOT IN (vs. NOT EXISTS) rather than for IN (vs. EXISTS). This will be discussed later in
the module.
• Correlated subqueries provide an implicit loop function within any standard SQL
DML statement.
• The logic defining the relational processing is different from the internal database
processing.
Becomes
SELECT Dept#
FROM Dept D
WHERE NOT EXISTS
(SELECT * FROM Emp E
WHERE E.Dept# = D.Dept#);
SELECT Manager_Employee_Number
,Department_Number
,Job_Code
FROM Employee ee
WHERE
NOT EXISTS
(SELECT *
FROM Department d
WHERE ee.Department_Number = d.Department_Number)
AND
NOT EXISTS
(SELECT *
FROM Job j
WHERE ee.Job_code = j.Job_Code);
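The difference between NOT IN and NOT EXISTS shows up when the subquery column contains a null. The following sketch runs on SQLite through Python's sqlite3 module as a stand-in for Teradata. The rows are hypothetical, and one employee is deliberately given a null department number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (Department_Number INTEGER);
CREATE TABLE Employee (Employee_Number INTEGER, Department_Number INTEGER);
INSERT INTO Department VALUES (10), (20), (30);
INSERT INTO Employee VALUES (1, 10), (2, NULL);
""")

# NOT IN: comparing against a list that contains a null can never
# evaluate to true, so no department qualifies.
not_in = conn.execute("""
    SELECT Department_Number
    FROM Department
    WHERE Department_Number NOT IN
          (SELECT Department_Number FROM Employee)
    ORDER BY 1
""").fetchall()

# NOT EXISTS: the correlated condition simply finds no matching row
# for departments 20 and 30, so both are returned.
not_exists = conn.execute("""
    SELECT d.Department_Number
    FROM Department d
    WHERE NOT EXISTS
          (SELECT * FROM Employee e
           WHERE e.Department_Number = d.Department_Number)
    ORDER BY 1
""").fetchall()

print(not_in)
print(not_exists)
```

When the subquery column holds no nulls the two forms return the same departments, so the choice matters mainly for nullable columns.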
Correlated subqueries:
True or False:
Aggregation
They produce sums, averages and counts as well as minimum and maximum
values.
{ SUM
| AVERAGE | AVG
| MINIMUM | MIN
| MAXIMUM | MAX
| COUNT } ( expression )
SELECT SUM(Salary_Amount) AS SumSal FROM Employee;
SumSal
------------
1102050.00
The query is intended to illustrate how COUNT(*) can be used to derive an average, and how
this average would differ from that of the AVG aggregate function, which is itself replicated via a
derivation as well.
    Sum(c1)   Count(c1)    Count(*)     AvgFunc     ByCount      ByStar
----------- ----------- ----------- ----------- ----------- -----------
        240           4           6          60          60          40
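The numbers in this result can be reproduced with a six-row table in which two values are null. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for Teradata. The table name t and the four non-null values are assumptions chosen only so that they sum to 240.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 INTEGER)")
# Four non-null values summing to 240, plus two nulls (hypothetical data).
conn.executemany("INSERT INTO t VALUES (?)",
                 [(30,), (50,), (70,), (90,), (None,), (None,)])

row = conn.execute("""
    SELECT SUM(c1),
           COUNT(c1),              -- counts only the non-null values
           COUNT(*),               -- counts every row
           AVG(c1),                -- ignores nulls, like SUM/COUNT(c1)
           SUM(c1) / COUNT(c1),    -- "ByCount"
           SUM(c1) / COUNT(*)      -- "ByStar": nulls dilute the average
    FROM t
""").fetchone()

print(row)
```

AVG and the derivation by COUNT(c1) agree because both skip nulls, while dividing by COUNT(*) treats the null rows as if they contributed zero.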
To find the total salary for all employees in each department you could do this.
However:
• A separate SELECT is required for each department.
• Impractical for large numbers of departments.
An example of the last bullet follows. The result differs from the facing page due to the change
in the grouping.
SELECT SUM(Salary_Amount)
FROM Employee
GROUP BY Department_Number, Manager_Employee_Number;
Sum(salary_amount)
------------------
37850.00
201800.00
38750.00
91200.00
57700.00
58700.00
66000.00
24500.00
134125.00
52500.00
31200.00
100000.00
207725.00
The HAVING clause can reference a column either by its alias or by its actual aggregation as it
appears in the projection. HAVING that attempts to reference by numeric position interprets the
number used as a literal and not as a column position. So the following clause would always be
false, and no rows would be returned. An EXPLAIN will reveal this fact.
In this example, the value “2” is treated as a literal. Since the condition of “2 > 100000” can
never be true, no rows are ever returned. Observe the EXPLAIN plan.
SELECT Department_Number,
SUM(Salary_Amount) AS SumSal
FROM Employee
GROUP BY 1
HAVING 2 > 100000;
We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan with a
condition of ("2 > 100000") into Spool 1 (group_amps), which is built locally on the AMPs. The
size of Spool 1 is estimated with no confidence to be 15 rows (660 bytes). The estimated time
for this step is 0.02 seconds.
The EXPLAIN plan should read like the one for this query, referencing a “field”.
SELECT Department_Number,
SUM(Salary_Amount) AS SumSal
FROM Employee
GROUP BY 1
HAVING SumSal > 100000;
We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan with a
condition of ("Field_3 > 100000.00") into Spool 1 (group_amps), which is built locally on the
AMPs. The size of Spool 1 is estimated with no confidence to be 15 rows (660 bytes). The
estimated time for this step is 0.02 seconds.
department_number Sum(salary_amount)
----------------- ------------------
401 245575.00
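The contrast between a positional GROUP BY and a literal in HAVING can be tried out in a small sketch. It runs on SQLite through Python's sqlite3 module, which happens to treat both cases the same way as described here. The three salary rows are hypothetical, and the second query references the aggregation itself rather than its alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Department_Number INTEGER, Salary_Amount REAL);
INSERT INTO Employee VALUES (401, 90000.0), (401, 60000.0), (402, 50000.0);
""")

# GROUP BY 1 is positional, but the 2 in HAVING is just the number two.
by_literal = conn.execute("""
    SELECT Department_Number, SUM(Salary_Amount) AS SumSal
    FROM Employee
    GROUP BY 1
    HAVING 2 > 100000
""").fetchall()

# Referencing the aggregation itself gives the intended filter.
by_aggregate = conn.execute("""
    SELECT Department_Number, SUM(Salary_Amount) AS SumSal
    FROM Employee
    GROUP BY 1
    HAVING SUM(Salary_Amount) > 100000
""").fetchall()

print(by_literal)
print(by_aggregate)
```

The first query returns nothing for any data, while the second keeps only the department whose salary sum exceeds 100000.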
HAVING, like WHERE and GROUP BY, may reference column names or alias names.
SELECT department_number AS d#, SUM(Salary_Amount) AS SumSal
FROM Employee
GROUP BY d#
WHERE d# IN (401, 402)
HAVING SumSal > 100000;
The order in which certain clauses get executed is important. It implies that the HAVING
clause can reference a non-aggregate value as well. We will show an EXPLAIN plan of this
on the next page.
At the right is a partial list showing the order in which certain clauses take
place during a query’s execution.
The answer to the question posed at the bottom of the facing page is that it would fail because
the aggregation is not performed until after the WHERE conditions have been satisfied.
. . . .
3) We do an all-AMPs SUM step to aggregate from DLM.Employee by way of an all-rows scan
with no residual conditions, grouping by field1 ( DLM.Employee.department_number).
Aggregate Intermediate Results are computed globally, then placed in Spool 3. The size of
Spool 3 is estimated with no confidence to be 15 rows (555 bytes). The estimated time for this
step is 0.03 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan with a
condition of ("(department_number = 401) OR (department_number = 402)") into Spool 1
(group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no
confidence to be 15 rows (615 bytes). The estimated time for this step is 0.02 seconds.
. . . .
The value for “SumSal” represents the two employees in department 402, which has a null
department name in the department table. Observe the result if this query were an outer join
instead.
department_name SumSal
------------------------------ ------------
education 233000.00
research and development 116400.00
customer support 245575.00
? 306950.00
marketing sales 200125.00
The two employees that have a null department number (new hires?).
department_name SumSal
------------------------------ ------------
education 233000.00
research and development 116400.00
customer support 245575.00
? 77000.00
marketing sales 200125.00
The row with the null department name represents the sum of the salaries for those
employees working in a department that does not have a name for it in the
department table.
“No. As a subquery you may only project columns from the outer table.”
List employee information for those whose salary is greater than their
department average.
The following example shows how to find those department managers with salaries greater than
their department average. Except for the “>” symbol and “AVG” (instead of the “=” and
“MAX”), it is identical to the one on the facing page.
True or False:
1. Display the salary sums, by job code within department, for all
employees who work for manager 1003, 1004, and 1017. Order by job
code within department.
2. Find the average budget amount for each department, and another
average that includes a 50% budget increase.
CASE
“CASE is typically used to improve performance of a query when you can reduce multiple
passes of a table to just a single pass.”
As far as the terms themselves go, they are normally seen while reading reference material, or
when taking a class like this one.
CASE:
Specifies alternate values for a conditional expression or expressions
based on various equality and TRUTH conditions.
3731: The user must use IS NULL or IS NOT NULL to test for NULL values.
SELECT Last_Name,First_Name,
CASE Last_Name
WHEN 'Brown' THEN CASE First_Name
WHEN 'Alan' THEN 'Allan'
WHEN 'Allen' THEN 'Alen'
END
WHEN 'Trainer' THEN 'Ethel'
END
FROM employee
WHERE Last_Name IN ('brown', 'trainer');
It sometimes helps if you break up a query to better understand it. For instance.
SELECT
CAST (
SUM (
CASE department_number
WHEN 401 THEN salary_amount
ELSE 0
END)
/ SUM(salary_amount)
AS DECIMAL(2,2)
)
AS Sal_Ratio
FROM employee;
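The single-pass idea behind this query can be compared with a two-pass approach in a short sketch. It runs on SQLite through Python's sqlite3 module as a stand-in for Teradata. The three salary rows are hypothetical, and the CAST to DECIMAL(2,2) is left out because SQLite does not enforce decimal precision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (department_number INTEGER, salary_amount REAL);
INSERT INTO employee VALUES (401, 100.0), (401, 50.0), (402, 50.0);
""")

# Single pass: CASE feeds only department 401 salaries into the first SUM,
# while the second SUM totals every salary in the same scan.
single_pass = conn.execute("""
    SELECT SUM(CASE department_number
                    WHEN 401 THEN salary_amount
                    ELSE 0
               END)
           / SUM(salary_amount) AS Sal_Ratio
    FROM employee
""").fetchone()[0]

# Two passes: one query per total, combined afterwards.
dept_total = conn.execute(
    "SELECT SUM(salary_amount) FROM employee WHERE department_number = 401"
).fetchone()[0]
grand_total = conn.execute(
    "SELECT SUM(salary_amount) FROM employee"
).fetchone()[0]
two_pass = dept_total / grand_total

print(single_pass, two_pass)
```

Both approaches give the same ratio, but the CASE version reads the table once.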
NULLIF returns NULL if its arguments are equal, otherwise, it returns its first argument.
Without NULLIF:
SELECT job_code AS Job
,hourly_billing_rate AS Rate
FROM job
WHERE job_code < 200000;
Job         Rate
----------- --------
104202      .00
104201      .00
111100      .00
With NULLIF:
Find the ratio of hourly billing rate to hourly cost rate for all "analyst" jobs.
Without NULLIF:
SELECT description
,hourly_billing_rate / hourly_cost_rate AS "Billing to Cost Ratio"
FROM job
WHERE description like '%analyst%';
With NULLIF:
SELECT description
,hourly_billing_rate / (NULLIF(hourly_cost_rate, 0))
AS "Billing to Cost Ratio"
FROM job
WHERE description LIKE '%analyst%';
COALESCE returns NULL if all its arguments evaluate to null, otherwise, it returns the
value of the first non-null argument in the list.
Without COALESCE:
SELECT budget_amount
FROM department
WHERE department_number = 600;
budget_amount
-------------
?
With COALESCE:
SELECT COALESCE(budget_amount,0)
FROM department
WHERE department_number = 600;
<CASE expression>
------------------
.00
One other item worth mentioning is that CASE (which includes COALESCE and NULLIF) can
potentially project multiple data types into a single projected column. Each value will occupy
the proper character space for its data type, with character values left justified and
numbers right justified within their respective space.
Using COALESCE:
SELECT last_name
,COALESCE( office_phone,
cell_phone,
pager_number,
home_phone,
fax_number,
'No Number Found') AS "Phone Number"
FROM phone_table;
Using CASE:
SELECT last_name,
CASE WHEN office_phone IS NOT NULL THEN office_phone
WHEN cell_phone IS NOT NULL THEN cell_phone
WHEN pager_number IS NOT NULL THEN pager_number
WHEN home_phone IS NOT NULL THEN home_phone
WHEN fax_number IS NOT NULL THEN fax_number
ELSE 'No Number Found'
END AS "Phone Number"
FROM phone_table;
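That the COALESCE and CASE forms return the same values can be checked with a short sketch. It runs on SQLite through Python's sqlite3 module as a stand-in for Teradata. The three rows in phone_table and the alias Phone_Number are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE phone_table (
    last_name TEXT, office_phone TEXT, cell_phone TEXT,
    pager_number TEXT, home_phone TEXT, fax_number TEXT)""")
conn.executemany("INSERT INTO phone_table VALUES (?, ?, ?, ?, ?, ?)", [
    ("Smith", "555-0100", None, None, None, None),
    ("Jones", None, None, None, "555-0142", None),
    ("Adams", None, None, None, None, None),
])

# COALESCE returns the first non-null argument in the list.
with_coalesce = conn.execute("""
    SELECT last_name,
           COALESCE(office_phone, cell_phone, pager_number,
                    home_phone, fax_number, 'No Number Found') AS Phone_Number
    FROM phone_table
    ORDER BY last_name
""").fetchall()

# The searched CASE spells out the same test, one column at a time.
with_case = conn.execute("""
    SELECT last_name,
           CASE WHEN office_phone IS NOT NULL THEN office_phone
                WHEN cell_phone IS NOT NULL THEN cell_phone
                WHEN pager_number IS NOT NULL THEN pager_number
                WHEN home_phone IS NOT NULL THEN home_phone
                WHEN fax_number IS NOT NULL THEN fax_number
                ELSE 'No Number Found'
           END AS Phone_Number
    FROM phone_table
    ORDER BY last_name
""").fetchall()

print(with_coalesce)
```

COALESCE is shorthand for this particular searched CASE, which is why the two results match row for row.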
Determine the result for each select below given the data at the right.
• There are two basic forms for using CASE: Valued and Searched.
• The valued form is based on equality and can only reference a single
column or expression.
• The searched form can reference more than one column or expression
based on other than equality.
True or False:
Where their salary amount is null, make it equal to their job code BEFORE
DOING THE CHANGE.
Verify that, where different averages occur, the zero-to-null salary averages
should be larger.
While the dictionary is write locked on an object (e.g. a table, a view etc), attempts by the parser
to “resolve” the object for concurrent accesses block on the “write” lock, preventing those
queries from being parsed until the DDL is finished. This is normally an issue with explicit
transaction processing and not with implicit transaction processing.
Changes to dictionary tables require a write lock, and can block the
database's attempts to access this locked information during parsing.
CREATE < SET/MULTISET > TABLE tablename, < Table Level Attributes >
( column name < Column Level Data Types and Attributes >
. . . )
< Primary and Secondary Index Level Attributes >;
The defaults are SET (in Teradata mode) and MULTISET (in ANSI mode).
The concept of a duplicate row is a very important one in database theory, and case sensitivity
plays an important role in this discussion. Since Teradata is, by default, not case sensitive,
the uppercase and lowercase forms of a character evaluate as being the same character
value. With case-specific columns, the uppercase and lowercase forms of a character evaluate as being
different values. The biggest concern between the two, however, is beyond
the scope of this class and has to do with the following question:
“Why allow duplicate rows at all?”
As SQL goes, we simply discuss SET and MULTISET concepts and syntax, not strategies.
The default for Teradata Mode tables is “SET”. (No duplicate rows
allowed.)
The default for ANSI Mode tables is “MULTISET”. (Duplicate rows allowed.)
A duplicate row is where each and every column value for one row is equal
to its corresponding column value in another row.
For values that are defined as “case sensitive”, uppercase values differ
from corresponding lower case values for the same character value.
For values defined as “not case sensitive” equal character values are the
same whether upper case or lowercase.
Dept#  Last Name  First Name
100    'Smith'    'Mary'
100    'Smith'    'Mary '
100    'smith'    'Mary'
100    'Smith'    'mary'
If both character columns are not case sensitive, which rows would be duplicates?
If both character columns are case sensitive, which rows would be duplicates?
REFERENCES_OPTION IS
REFERENCES [WITH [NO] CHECK OPTION] tname [(cname [..., cname])]
[ PARTITION BY { partitioning_expression |
(partitioning_expression, partitioning_expression [..., partitioning_expression]) } ]
For indexes:
• Only one primary index allowed per table.
• Up to 32 secondary indexes are allowed per table.
• Up to 64 columns are allowed per index.
• Indexes may be unique or non-unique.
For a more complete list see the left-hand page.
The drop command, at the bottom of the facing page, removes not only the rows, but the table
definition as well. It performs slightly slower than the preceding delete commands because of
the amount of additional time that it takes to remove the dictionary entries.
To remove all data associated with a table, as well as the table structure
definition from the Data Dictionary, use the DROP TABLE statement.
Indexes may be provided with a name. This may be useful for dropping it later, by name. It may
be easier, perhaps for some, to just simply replace the “CREATE” keyword (used when creating
the index) with the “DROP” keyword. In this case, if the index is unique, do not provide the
“UNIQUE” keyword. Examples are shown on the facing page.
HELP INDEX emp_data;
HELP INDEX shows information on all indexes defined for a table.
The values in the Index Id column correlate to the index numbers referenced in EXPLAIN text.
Unique? Y
Primary//or//Secondary? P
Column Names employee_number
Index Id 1
Approximate Count 0
Index Name Emp_Key
Unique? N
Primary//or//Secondary? S
Column Names department_number
Index Id 4
Approximate Count 0
Index Name ?
Unique? Y
Primary//or//Secondary? S
Column Names last_name,first_name
Index Id 8
Approximate Count 0
Index Name Full_Name
Unique? N
Primary//or//Secondary? S
Column Names job_code
Index Id 12
Approximate Count 0
Index Name ?
One of the nice things about this strategy is that one can actually display the average salary,
which can’t be accomplished via a subquery. The bad thing about this strategy is that you need
to create the table and then drop it after using it. This requires more steps, and it requires DDL,
which is not terrific.
• The strategy below can be used to return the result for the business concern
shown.
• Correlated Subqueries can return the result, but without projecting the average
salary!
• Both queries are cross joins because no equality condition exists for the join
condition.
• This table will eventually have to be dropped.
Find employees whose salaries are greater than their department average.
Derived tables -
• are “database created” tables that are only available to a single query.
• are discarded by the database when they are no longer required.
• are materialized into spool.
• are referenced and treated as any “real” table.
• must be defined by the author of the query, and require -
a table name
columns and their names
a SELECT that is used to populate the table
SELECT Last_Name,
Salary_Amount,
AvgSal
FROM Employee e,
(SELECT AVG(Salary_Amount)
FROM Employee) AS AvgT (AvgSal)
WHERE e.Salary_Amount > AvgT.AvgSal;
SELECT Last_Name,
Salary_Amount,
AvgSal
FROM Employee e JOIN
(SELECT AVG(Salary_Amount) AS AvgSal
FROM Employee) AvgT
ON e.Salary_Amount > AvgT.AvgSal;
Of course, all of the necessary one-to-many joins are needed for correctly obtaining the
result.
• Department number is unique in department (one-to-many to employee)
• Department number is unique in AvgT (one-to-many to employee)
Show the department name for those having a salary larger than their department
avg.
SELECT d.Department_Name, e.Last_Name, e.Salary_Amount, AvgT.AvgSal
FROM Employee e,
Department d,
(SELECT Department_Number, AVG(Salary_Amount) AS AvgSal
FROM Employee
GROUP BY 1) AS AvgT
WHERE e.Department_Number = d.Department_Number
AND e.Department_Number = AvgT.Department_Number
AND e.Salary_Amount > AvgT.AvgSal
ORDER BY 1;
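The derived-table query above can be exercised on a few rows to see the average projected next to each qualifying employee. The sketch runs on SQLite through Python's sqlite3 module as a stand-in for Teradata. The sample employees and department names are hypothetical, and the derived table groups by the column name rather than by position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Last_Name TEXT, Department_Number INTEGER,
                       Salary_Amount REAL);
CREATE TABLE Department (Department_Number INTEGER, Department_Name TEXT);
INSERT INTO Employee VALUES ('Smith', 401, 200.0), ('Jones', 401, 100.0),
                            ('Adams', 402, 80.0), ('Brown', 402, 50.0),
                            ('Hart', 402, 50.0);
INSERT INTO Department VALUES (401, 'customer support'),
                              (402, 'software support');
""")

# AvgT exists only for the duration of this query. It holds one row per
# department, so its join to Employee is one-to-many.
rows = conn.execute("""
    SELECT d.Department_Name, e.Last_Name, e.Salary_Amount, AvgT.AvgSal
    FROM Employee e,
         Department d,
         (SELECT Department_Number, AVG(Salary_Amount) AS AvgSal
          FROM Employee
          GROUP BY Department_Number) AS AvgT
    WHERE e.Department_Number = d.Department_Number
      AND e.Department_Number = AvgT.Department_Number
      AND e.Salary_Amount > AvgT.AvgSal
    ORDER BY 1
""").fetchall()

for row in rows:
    print(row)
```

Unlike the correlated subquery, this form can project AvgSal because the derived table is a real participant in the join.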
True or False:
Notice that the facing page carefully states that the samples are considered to be random. It
should be understood that nothing is “truly” random. With this in mind, there are two different
methods the database can use for retrieving samples from tables: AMP proportional and a more
randomized allocation, either of which can be specified by the user as will be shown in this
module.
One may also use the sample feature to retrieve multiple samples within the same projection.
Each individual sample may be associated with a unique sample identification number that is
generated by the database for purpose of relating rows with their associated sample.
Another consideration that may be specified is whether-or-not the rows, once chosen for the
sample, will be replaced into the source and be made available for re-sampling (therefore
occurring again within the sample) or whether they will not be made available and, hence, will
occur no more than once within the sample.
By default (i.e. this can be altered via SQL), the sample is generated AMP
proportional, so that each AMP is responsible for a proportional fraction of the
rows in the sample.
• The sample is “AMP proportional” (that is, the database seeks to have each AMP
contribute a proportional share of the sample).
• The sample is performed “without replacement” (that is, no row will appear more
than once within the sample).
The fact that the employee numbers are unique reinforces the notion of “no replacement.” No
employee number should appear more than once within the sample.
employee_number
---------------
           1001
           1002
           1003
           1004
           1006
           1011
           1014
           1016
           1019
           1024

employee_number
---------------
           1003
           1004
           1006
           1007
           1011
           1016
           1019
0.25 * 26 = 6.5
Fractional results greater than .4999 generate an added row.
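The rounding rule noted with the result above (0.25 * 26 = 6.5, which yields 7 rows) can be written as a small calculation. This is a sketch of the rule as stated in the text, not of the database's internal algorithm; the function name is hypothetical.

```python
import math

def sample_row_count(table_rows: int, fraction: float) -> int:
    """Rows returned for a fractional sample; a fractional part of .5 or more adds a row."""
    return math.floor(table_rows * fraction + 0.5)

print(sample_row_count(26, 0.25))  # 6.5 rounds up to 7
```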
The default of “no replacement” is still in effect in that no row will appear within the same
sample more than once – moreover – no row will appear in more than one sample! Think of “no
replacement” as meaning that once a row is selected for appearing in a sample, it does not get
“replaced” back into the set, so it is not available for being selected again by the same sample.
“No replacement” also means that rows cannot appear across samples within the same
projection. The employee numbers being unique reinforce this notion, but so, also, does the fact
that we run out of values for sampling! There are only 26 employees in our table. Since the total
number of rows for all 3 samples gets exhausted during the final 10 row sample (only 6 rows are
left), the database returns the remaining 6 rows along with a warning describing the event.
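The exhaustion behavior described above can be simulated in a few lines. The sketch below assumes nothing about how the database picks rows; it only models "no replacement within or across samples" with a shuffled pool. The function name and the warning flag are hypothetical.

```python
import random

def draw_samples(rows, sizes, seed=None):
    """Draw several samples with no replacement within or across samples."""
    rng = random.Random(seed)
    pool = list(rows)
    rng.shuffle(pool)
    samples, warning = [], False
    for size in sizes:
        taken, pool = pool[:size], pool[size:]
        if len(taken) < size:
            warning = True  # pool exhausted: return whatever is left
        samples.append(sorted(taken))
    return samples, warning

employees = list(range(1001, 1027))  # 26 hypothetical employee numbers
samples, warning = draw_samples(employees, [10, 10, 10], seed=1)
print([len(s) for s in samples], warning)  # [10, 10, 6] True
```

The third sample receives only the 6 rows that remain, which corresponds to the warning the database issues.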
It should also be noticed that one cannot mix percentage samples with whole number samples in
the same projection.
When sampling with replacement, a sampled row, once sampled, is returned to the
sampling pool. As a result, a row might be sampled multiple times.
SELECT Employee_Number, SAMPLEID
FROM Employee
SAMPLE WITH REPLACEMENT 10, 10
ORDER BY 2, 1;
Note the replacement both within and across samples.
employee_number    SampleId
--------------- -----------
           1001           1
           1004           1
           1012           1
           1013           1
           1014           1
           1016           1
           1018           1
           1019           1
           1019           1
           1025           1
           1001           2
           1002           2
           1002           2
           1004           2
           1005           2
           1012           2
           1014           2
           1016           2
           1019           2
           1019           2
So far, the chart illustrates that one may write a single SQL request, for a single projection, that
includes:
• Many joins and other WHERE conditions
• Aggregations with or without a HAVING condition
• Samples
• A specified ordering
• Formatting
For all such requests, these features will be performed in the order described in the chart. That
is:
• The WHERE condition will restrict the number of qualifying rows which will participate
in the activities following it in the list.
• Aggregation will be performed on the qualified WHERE result.
• HAVING is performed on the aggregate result.
• SAMPLES are obtained on what results from the earlier steps.
• ORDER BY orders the (final?) result.
• Formatting is performed on the final spool as it gets returned to the user. (i.e. the final
spool is not formatted.)
The following rules apply to SAMPLE.
1. No more than 16 samples can be requested per fraction description or count description.
2. A sampled result set cannot be guaranteed to be repeated.
3. Sampling can be used in a derived table, view (discussed in a later module), or
INSERT-SELECT to reduce the number of rows to be considered for further computation.
4. You cannot use a SAMPLE clause in a subquery.
5. You cannot specify the SAMPLE clause in a SELECT statement that uses the set
operators UNION, INTERSECT, or MINUS.
Where SAMPLE falls into the order of operations.
1. WHERE {join conditions}
2. AGGREGATION
3. HAVING
4. SAMPLE
5. ORDER BY
6. FORMAT
The facing page shows how understanding the order of operations for a request can help one
determine why, exactly, a request returns the result that it does. In the example on the facing
page, the result is the same result as for the aggregation! Since the aggregation returns a single
row result, the SAMPLE, which follows in the order of operations, only has one row from which
to sample. As we have learned earlier in this module, requesting a 10 row sample from a one
row set (the aggregation result) results in a warning.
*** Warning: 7473 Requested sample is larger than table rows. All rows returned
Here we see a derived table used to force an aggregation to be performed prior to requesting a
sample.
• The derived table result must be obtained first, to create the desired set of data.
• The sample is then performed on the result “derived” by the “derived table.”
Each time one submits the request to the database a (potentially) different result occurs.
Example:
A retail application might divide a customer population into subgroups
composed of customers who pay for their purchases with cash, those who
pay by check, and those who buy on credit.
SELECT Last_Name,
Department_Number AS Dept#,
SAMPLEID
FROM employee
SAMPLE
WHEN department_number < 401 THEN 2, 2, 2
WHEN department_number < 501 THEN 3, 1
ELSE 2, 2
END
ORDER BY 3, 1, 2;
last_name      Dept#    SampleId
----------- -------- -----------
Short            201           1
Stein            301           1
Kanieski         301           2
Trainer          100           2
Kubic            301           3
Morrissey        201           3
Crane            402           4
Daly             402           4
Phillips         401           4
Hopkins          403           5
Rabbit           501           6
Runyon           501           6
Ratzlaff         501           7
Wilson           501           7
The reference to “ELSE” is optional.
Note that replacement can not occur:
• Within any level across samples.
e.g. across 1, 2 and 3
• Across levels.
e.g. No crossing level 1 (samples 1, 2 and 3) with level 2 (samples 4 and 5) or level 3
(samples 6 and 7)
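The stratified request above can be modeled as follows. This is a simplified simulation, assuming that each row belongs to the first WHEN level it satisfies, that samples within a level are drawn without replacement, and that sample ids are numbered consecutively across levels (as the result above suggests). The function name and the employee rows are hypothetical.

```python
import random

def stratified_sample(rows, levels, key, seed=None):
    """levels: list of (predicate, sizes); a row belongs to the first level it matches."""
    rng = random.Random(seed)
    buckets = [[] for _ in levels]
    for row in rows:
        for i, (predicate, _) in enumerate(levels):
            if predicate(key(row)):
                buckets[i].append(row)
                break
    result, sample_id = [], 0
    for bucket, (_, sizes) in zip(buckets, levels):
        rng.shuffle(bucket)
        for size in sizes:
            sample_id += 1
            # slicing removes the chosen rows, so nothing is replaced
            taken, bucket = bucket[:size], bucket[size:]
            result.extend((row, sample_id) for row in taken)
    return result

depts = [100, 201, 201, 301, 301, 301, 401, 402, 402, 403, 501, 501, 501, 501]
employees = [(f"emp{n}", dept) for n, dept in enumerate(depts)]
levels = [
    (lambda d: d < 401, [2, 2, 2]),  # level 1: samples 1, 2 and 3
    (lambda d: d < 501, [3, 1]),     # level 2: samples 4 and 5
    (lambda d: True, [2, 2]),        # ELSE level: samples 6 and 7
]
picked = stratified_sample(employees, levels, key=lambda r: r[1], seed=7)
for row, sample_id in picked:
    print(row, sample_id)
```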
SELECT Last_Name,
Department_Number AS Dept#,
SAMPLEID
FROM Employee
SAMPLE WITH REPLACEMENT
WHEN Dept# < 402 THEN .25, .25
WHEN Dept# < 501 THEN .50, .50
ELSE .25
END
ORDER BY 3, 1, 2;
last_name    Dept#    SampleId
---------- ------- -----------
Hoover         401           1
Johnson        401           1
Phillips       401           1
Stein          301           1
Hoover         401           2
Johnson        401           2
Phillips       401           2
Stein          301           2
Brown          403           3
Crane          402           3
Hopkins        403           3
Lombardo       403           3
Brown          403           4
Crane          402           4
Hopkins        403           4
Lombardo       403           4
Ratzlaff       501           5
Note that replacement can occur:
• Within any level across samples.
e.g. across 1 and 2
Note that replacement can not occur:
• Across levels.
e.g. No crossing level 1 (samples 1 and 2) with level 2 (3 and 4) or level 3 (sample 5)
The default row allocation method is proportional. This means that the requested rows are
allocated across the AMPs as a function of the number of rows on each AMP. This method is
much faster than randomized allocation, especially for large sample sizes. Because proportional
allocation does not include all possible sample sets, the resulting sample is not a simple random
sample, but it has sufficient randomness to suit the needs of most applications.
Note that simple random sampling, meaning that each element in the population has an equal and
independent probability of being sampled, is employed for each AMP in the system regardless of
the specified allocation method.
One way to decide on the appropriate allocation method for your application is to determine
whether it is acceptable to “stratify” (not to be confused with “stratified” sampling of data rows)
the sampling input across the AMPs to achieve the corresponding performance gain, or whether
you need to consider the table as a whole.
Randomized allocation means that the requested rows are allocated across the
AMPs by simulating random sampling (without regard for AMP proportionality).
The result is not discernibly different than that for the default.
employee_number SampleId
--------------- -----------
1002 1
1010 1
1012 1
1014 1
1001 2
1005 2
1012 2
1012 2
• RANDOM can only be called in one of the following SELECT query clauses:
o WHERE
(Using RANDOM as a WHERE condition is much like that for obtaining a
sampled percentage. Due to the nature of the RANDOM function, however, it
cannot guarantee the requested percentage, e.g. using the following to obtain a
66% sample.)
o GROUP BY
e.g. like this only:
SEL SUM(salary_amount)
FROM employee
GROUP BY RANDOM(1, 6);
o ORDER BY
e.g. like this only:
SEL last_name
FROM employee
ORDER BY RANDOM(1,6);
o HAVING
e.g. like this only:
SEL SUM(salary_amount)
FROM employee
HAVING RANDOM(1, 6) > 2;
INSERT t1 (RANDOM(1,10),...)
In this example, RANDOM causes an error to be reported if the first column
in the table is a primary index or partitioning column.
Both limits must be specified and both must be of data type integer.
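The point made above about RANDOM in a WHERE clause (it approximates, but cannot guarantee, a requested percentage) can be illustrated with a simulation. The WHERE example itself is not reproduced in this text, so a condition of the form RANDOM(1, 100) <= 66 is an assumption here, and the function name is hypothetical.

```python
import random

def random_filter(rows, percent, seed=None):
    """Keep a row when an integer drawn uniformly from 1..100 is <= percent."""
    rng = random.Random(seed)
    return [row for row in rows if rng.randint(1, 100) <= percent]

rows = list(range(10_000))
kept = random_filter(rows, 66, seed=42)
print(len(kept) / len(rows))  # near 0.66; the exact share varies from run to run
```

Each row is decided independently, which is why the returned share only hovers around the target, whereas SAMPLE .66 fixes the row count.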
• RANDOM may be used in a SELECT list or a WHERE clause, but not both.
• The SAMPLE feature can be used to “randomly” return rows for various
business purposes.
True or False:
1. Return two 15-row samples from the employee table. Anything less
than 15 rows per sample is unacceptable. Project last and first names
plus hire dates and birth dates.
3. Using the same sampling as #2, SUM the salaries for each sample.
TOP N
Syntax:
TOP { [ INTEGER | DECIMAL ] } [ PERCENT ] [ WITH TIES ]
The following options cannot appear in a SELECT statement that specifies the TOP N operator:
• DISTINCT option
• QUALIFY clause
• SAMPLE clause
• WITH clause (WITH/BY – see Appendix A)
• ORDER BY clause where the sort expression is an ordered analytical function
• Sub-selects of set operations
You cannot specify the TOP N operator in any of the following SQL statements or statement
components:
• Correlated subquery
• Subquery in a search condition
• CREATE JOIN INDEX statement
• CREATE HASH INDEX statement
• Seed statement or recursive statement in a CREATE RECURSIVE VIEW statement
or WITH RECURSIVE clause
The QUALIFY clause with the RANK or ROW_NUMBER ordered analytical functions (from
the Advanced SQL class) returns the same results as the TOP N operator.
The following is an excerpt from the SQL reference manual. It is placed here for reference only,
and only to show another SQL construct capable of providing similar results.
SELECT TOP 10 *
FROM sales ORDER BY county;
SELECT *
FROM sales QUALIFY ROW_NUMBER() OVER (ORDER BY COUNTY) <= 10;
SELECT *
FROM sales QUALIFY RANK() OVER (ORDER BY county) <= 10;
For best performance, use the TOP option instead of the QUALIFY clause with RANK or
ROW_NUMBER. In best-case scenarios, the TOP option provides better performance; in worst-
case scenarios, the TOP option provides equivalent performance.
department_number budget_amount
------------------ ----------------------
401 982300.00
403 932000.00
301 465600.00
100 400000.00
501 308000.00
The WITH TIES option only applies to those values that are tied at the bottom of the list. This is
true whether the order is ascending or descending. In our example it can be deduced that there
are no more values of $308,000.00 because the WITH TIES option will return all values at the
bottom that are the same. Without WITH TIES one can never be certain that no more such values exist.
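The WITH TIES rule described above (only ties at the bottom of the list add rows) can be sketched in plain Python. The function name is hypothetical, and the data are the six department rows that appear in this module's result sets; the actual table may hold more rows.

```python
def top_n(rows, n, key, with_ties=False):
    """First n rows in descending key order; optionally add rows tied with the last one."""
    ordered = sorted(rows, key=key, reverse=True)
    result = ordered[:n]
    if with_ties and result:
        last = key(result[-1])
        result += [r for r in ordered[n:] if key(r) == last]
    return result

# (department_number, budget_amount)
departments = [(401, 982300.00), (403, 932000.00), (301, 465600.00),
               (100, 400000.00), (501, 308000.00), (402, 308000.00)]

plain = top_n(departments, 5, key=lambda r: r[1])
ties = top_n(departments, 5, key=lambda r: r[1], with_ties=True)
print(len(plain), len(ties))  # 5 6
```

The tie between the two top amounts would never add a row; only the tie on 308000.00 at the bottom does.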
SELECT TOP 5 WITH TIES
department_number
,budget_amount
FROM department
ORDER BY 2 DESC;
department_number budget_amount
----------------- -------------
              401     982300.00
              403     932000.00
              301     465600.00
              100     400000.00
              501     308000.00
              402     308000.00
• This is the same output as the example using TOP 5 WITH TIES.
• Each is counted as a separate row for the top six.
• By default, rows with the same amount are counted as separate rows toward
the total.
• If the WITH TIES option had been specified in this example, how would the
result set differ?
SELECT TOP 6 WITH TIES
department_number
,budget_amount
FROM department
ORDER BY 2 DESC;
department_number budget_amount
----------------- -------------
              401     982300.00
              403     932000.00
              301     465600.00
              100     400000.00
              501     308000.00
              402     308000.00
• ORDER BY ASC reverses the ranking sequence and shows the bottom rankings.
• Two rows with the same salary are treated as two rows of output.
The TOP N function can be used effectively to quickly select rows from a table.
Get the top 45 percent of department budgets and allow for ties.
SELECT TOP 45 PERCENT WITH TIES
department_number
, budget_amount
FROM department
ORDER BY 2 DESC;
department_number budget_amount
----------------- -------------
              401     982300.00
              403     932000.00
              301     465600.00
              100     400000.00
              402     308000.00
              501     308000.00
Recall that a percent may be any integer or decimal number between 1 and 100.
• The WITH TIES option can be used to return all rows at the bottom of an
ordered list that share the same value.
True or False:
1. List the top 5 salary amount values in the employee table along with
the last names and first names of the employees.
Performing ordered analytical computations at the SQL level rather than through a higher level
OLAP calculation engine provides four distinct advantages.
• Reduced programming effort.
• Elimination of the need for external sort routines.
• Elimination of the need to export large data sets to external tools because ordered
analytical functions enable you to target the specific data for analysis within the
warehouse itself by specifying conditions in the query.
• Marked enhancement of analysis performance
The use of Teradata-specific functions is strongly discouraged. These functions are retained only
for backward compatibility with existing applications.
This module will focus only on a specific type of Window – the Group Window
You need not directly code SQL queries to take advantage of ordered analytical functions. Both
Teradata Database and many third-party query management and analytical tools have full access
to the Teradata SQL ordered analytical functions. Teradata Warehouse Miner, for example, a
tool that performs data mining preprocessing inside the database engine, relies on these features
to perform functions in the database itself rather than requiring data extraction.
Teradata Warehouse Miner includes approximately 40 predefined data mining functions in SQL,
based on the Teradata SQL-specific functions. For example, the Teradata Warehouse Miner
FREQ function uses the Teradata SQL-specific functions CSUM, RANK, and QUALIFY
to determine frequencies.
The following window specification is the default for this expression, meaning an empty pair of
parentheses accomplishes the very same thing.
,COUNT(salary)
OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
It should be of no surprise to anyone to see that the window aggregates follow the very same
rules as do the standard aggregate functions. This is to say, all nulls are ignored.
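Both statements (an empty OVER ( ) equals the fully unbounded window, and nulls are ignored) can be checked with a small experiment. The sketch below uses SQLite from Python rather than Teradata and assumes SQLite 3.25 or later, which added window functions; the three employee rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (last_name TEXT, salary REAL)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Adams", 30000), ("Baker", None), ("Clark", 50000)])

rows = con.execute("""
SELECT last_name,
       COUNT(salary) OVER () AS cnt_default,
       COUNT(salary) OVER (ROWS BETWEEN UNBOUNDED PRECEDING
                                AND UNBOUNDED FOLLOWING) AS cnt_explicit
FROM employee
ORDER BY last_name
""").fetchall()
print(rows)
```

All three detail rows are returned, and both counts are 2 on every row because the null salary is not counted.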
Of the four available window types, only the GROUP window is taught in this module. The
other windows will be taught in the following module. It is in those windows that the ORDER
BY becomes important. The ORDER BY can be an important consideration from a
performance-related point of view. This will be discussed later in this module.
1. WHERE
2. AGGREGATION
3. HAVING
4. OLAP { [ PARTITION BY ]
[ ORDER BY ] [ rows ] }
5. QUALIFY [ ORDER BY ]
6. RANDOM
7. SAMPLE | TOP N
8. ORDER BY
9. FORMAT
QUALIFY, on window aggregates, works much the same way as does HAVING on
standard aggregates.
The standard ORDER BY (shown) occurs in step 8 of the order of operations (also
shown). Notice its effect on the result.
SELECT last_name AS Name
,salary_amount AS Salary
,department_number AS Dept
,SUM(Salary) OVER (PARTITION BY Dept)
FROM Employee
WHERE Department_Number IN (301, 501)
ORDER BY 1;
1. WHERE
2. AGGREGATION
3. HAVING
4. OLAP { [ PARTITION BY ]
[ ORDER BY ] [ rows ] }
5. QUALIFY [ ORDER BY ]
6. RANDOM
7. SAMPLE | TOP N
8. ORDER BY
9. FORMAT
Note that this example contains a single ordering instead of two as found on the previous page.
The net result, however, is the same.
1. WHERE
2. AGGREGATION
3. HAVING
4. OLAP { [ PARTITION BY ]
[ ORDER BY ] [ rows ] }
5. QUALIFY [ ORDER BY ]
6. RANDOM
7. SAMPLE | TOP N
8. ORDER BY
9. FORMAT
According to our order of operations, this window aggregate result is not
available for WHERE conditioning.
SELECT last_name AS Name
,salary_amount AS Salary
,department_number AS Dept
,SUM(Salary) OVER ( ) AS TotSum
FROM Employee
WHERE Dept = 401
AND totsum / salary > 3
ORDER BY 3, 2;
*** Failure 5479 Ordered Analytical
Functions not allowed in WHERE Clause.
SELECT last_name AS Name
,salary_amount AS Salary
,department_number AS Dept
,SUM(Salary) OVER ( ) AS TotSum
FROM Employee
WHERE Dept = 401
QUALIFY totsum / salary > 3
ORDER BY 3, 2;
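QUALIFY is a Teradata extension, so the ordering shown above (WHERE first, the window aggregate next, the filter on the window result last) has to be written as a derived table in engines that lack it. The following sketch uses SQLite from Python as a stand-in, assuming SQLite 3.25 or later; the outer WHERE plays the role of QUALIFY, and the rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (last_name TEXT, salary REAL, dept INTEGER)")
con.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [("Adams", 20000, 401), ("Baker", 30000, 401),
                 ("Clark", 50000, 401), ("Davis", 90000, 403)])

rows = con.execute("""
SELECT name, salary, dept, totsum
FROM (SELECT last_name AS name, salary, dept,
             SUM(salary) OVER () AS totsum   -- window aggregate, computed after WHERE
      FROM employee
      WHERE dept = 401) AS w                 -- WHERE restricts the rows first
WHERE totsum / salary > 3                    -- stands in for QUALIFY
ORDER BY dept, salary
""").fetchall()

for row in rows:
    print(row)
```

Because WHERE runs first, the total is 100000 (department 401 only), and Clark is filtered out since 100000 / 50000 is not greater than 3.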
DeptNbr SumSal
----------- ------------
401 213275.00
501 200125.00
403 193500.00
Next, the window aggregate is performed on this result (shown without the QUALIFY):
• Like aggregation, they use SUM, COUNT, MIN, MAX and AVG.
• Unlike aggregation, they retain the detail data of each row.
• Can be partitioned into groups.
• Can be ordered.
• Can be used with QUALIFY
• Occur after aggregation in the order of operations.
True or False:
1. Of the four Windows, this module only discussed the GROUP Window.
True
2. With the Group Window, ORDER BY, in the OVER, will not change result
values.
True
3. In the OVER clause, ORDER BY must be after PARTITION, if both are used.
True
4. PARTITION and GROUP BY may both be present within the same
projection.
True – they may be alone or together or not present at all
5. QUALIFY need not reference a projected value.
True
6. HAVING must reference a projected value.
False
7. PARTITION may return a null value.
True
1. From the “SalesTbl”, list the store id, product id, sales, for each row,
include with each projected row the sum of the sales for each product
across all stores and order by store id within product id.
• PARTITION BY
• ORDER BY
• ROWS [BETWEEN]
• QUALIFY
• Cumulative Windows
• Moving Windows
• Remaining Windows
• RESET WHEN
• Moving Differences
• The preceding module taught how the use of "UNBOUNDED" determined the group window.
• The expression for a Group Window is:
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
• A GROUP includes either all rows in a PARTITION, or all rows in the result.
• ORDER BY, inside the window expression, did not affect the aggregate values within the
group.
There are 4 different windows.
1. Group Window
2. Cumulative Window (covered in this module)
3. Moving Window (covered in this module)
4. Remaining Window (covered in this module)
SELECT
ItemID
,SalesDate
,Sales
,SUM(Sales) OVER (ORDER BY SalesDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM SalesHist
WHERE itemid IN (4, 6);
Also note that, unlike the GROUP window, ORDER BY plays an extremely important role in a
cumulative window. A different ordering provides an entirely different result. By different
results, we mean different values for the very same table rows. Consider the following, where
the ordering is changed to item id and sales date.
SELECT
ItemID
,SalesDate
,Sales
,SUM(Sales) OVER (ORDER BY itemid, SalesDate ROWS
UNBOUNDED PRECEDING)
FROM SalesHist
WHERE itemid IN (4, 6);
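The effect of the ordering on a cumulative window can be seen with four rows. This sketch runs in SQLite from Python (3.25 or later assumed) on hypothetical sales values; the first ordering adds itemid as a tie-breaker so the outcome is deterministic, which the original query does not do.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE saleshist (itemid INTEGER, salesdate TEXT, sales REAL)")
con.executemany("INSERT INTO saleshist VALUES (?, ?, ?)",
                [(4, "2008-05-24", 100), (6, "2008-05-24", 10),
                 (4, "2008-05-25", 200), (6, "2008-05-25", 20)])

def running_sum(order_by):
    """Cumulative SUM(sales) per row for the given window ordering."""
    sql = f"""
    SELECT itemid, salesdate,
           SUM(sales) OVER (ORDER BY {order_by}
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM saleshist"""
    return {(item, day): total for item, day, total in con.execute(sql)}

by_date = running_sum("salesdate, itemid")
by_item = running_sum("itemid, salesdate")
print(by_date[(4, "2008-05-25")], by_item[(4, "2008-05-25")])  # 310.0 300.0
```

The same table row (item 4 on 2008-05-25) carries a different cumulative value under each ordering, although the final total of 330 is the same.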
Without an ORDER BY clause, the default ordering would be each of the un-aggregated values
within the previous one in ascending order.
SELECT ItemID
,SalesDate
,Sales
,SUM(Sales) OVER (PARTITION BY itemid
ORDER BY SalesDate
ROWS UNBOUNDED PRECEDING)
FROM SalesHist
WHERE itemid IN (4, 6);
SELECT itemid
,salesdate
,sales
,SUM(sales) OVER (PARTITION BY itemid
ORDER BY salesdate
ROWS 2 PRECEDING)
FROM saleshist
WHERE salesdate BETWEEN DATE'2008-05-24' AND DATE'2008-05-31';
Provide a 3-day moving sum for the sales of item 1 for the last week.
Project a 2-day moving average of the prior 2 days onto each row for comparison.
We could have added the following QUALIFY to limit the result to just the rows for just the
week of concern. The result of including this QUALIFY will be shown on the next page.
Find the daily differences of sales from one week to the next for item 1.
SELECT salesdate, ((((salesdate) - (DATE'1901-01-06')) MOD 7 ) + 1 ) AS DayOfWeek, sales,
sales - MIN(sales) OVER (ORDER BY salesdate
ROWS BETWEEN 7 PRECEDING AND 7 PRECEDING) AS Diff
FROM saleshist
WHERE itemid = 1
AND salesdate BETWEEN DATE'2008-05-25' AND DATE'2008-06-07'
The WHERE condition helps performance by having only the rows for the required 2 weeks
participate in the aggregate operation since it is applied in step-1. The QUALIFY limits the
result, from the predicate, to only the week of concern.
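The week-over-week difference and the day-of-week arithmetic in the query above can be reproduced on a small scale. The sketch below uses SQLite from Python (3.25 or later assumed) with 14 days of hypothetical sales for one item; `day_of_week` is a hypothetical helper mirroring the DATE'1901-01-06' formula, which works because that date was a Sunday.

```python
import sqlite3
from datetime import date, timedelta

def day_of_week(d: date) -> int:
    """1 = Sunday ... 7 = Saturday, counting days since Sunday 1901-01-06."""
    return (d - date(1901, 1, 6)).days % 7 + 1

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE saleshist (salesdate TEXT, sales REAL)")
start = date(2008, 5, 25)
con.executemany("INSERT INTO saleshist VALUES (?, ?)",
                [((start + timedelta(days=i)).isoformat(), 100 + 10 * i)
                 for i in range(14)])

# A frame of exactly one row, 7 rows back: MIN() simply returns that row's value.
rows = con.execute("""
SELECT salesdate,
       sales - MIN(sales) OVER (ORDER BY salesdate
                                ROWS BETWEEN 7 PRECEDING AND 7 PRECEDING) AS diff
FROM saleshist
ORDER BY salesdate
""").fetchall()

for salesdate, diff in rows:
    print(salesdate, day_of_week(date.fromisoformat(salesdate)), diff)
```

The first seven rows have no row 7 positions back, so their difference is null; that is why a QUALIFY is useful to keep only the second week.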
Find the daily differences of sales from one week to the next for item 1.
The very same result could have been obtained using a cumulative window. As a matter of fact,
in a normal query it is likely that the default heading would be replaced with an alias name.
Since it is the default heading that indicates this result to be a remaining window result, without
it no one would know the difference between this result and the result of a cumulative window
ascending. An ORDER BY 1 DESC would add a bit of sense to this query.
Sum each sales amount with all of the ones following it for 2008-05-25 to 2008-05-28, by item.
This feature is an enhancement to the Teradata window aggregate functions. “RESET WHEN”
has been added to the window function to add a dynamic condition to the query. This feature
provides applications with an easy-to-use method of creating “conditional partitions” as part of
window aggregate processing based on a user-specified RESET WHEN condition.
At run time, during the evaluation of the row, the RESET WHEN condition is evaluated. If the
condition is TRUE a new partition is created for statistical function evaluation. This feature adds
more dynamic handling than does the window PARTITION BY clause, which is more limited in
terms of the kind of analysis that could be performed on the data. RESET WHEN is added to the
ORDER BY clause as used in analytic functions.
Limitations
This feature imposes the following limitations:
• There must be an ORDER BY specification in the WINDOW function
• The RESET WHEN condition cannot have a SELECT clause.
• Nested RESET WHEN condition is not permitted, i.e. RESET WHEN clause in a
window function that is sub-expression in a RESET WHEN condition, is not supported.
Considerations:
• RESET is not a reserved word
• The RESET WHEN condition is equivalent in its scope to the condition allowed in
QUALIFY clause
• This is a Teradata extension to the ANSI SQL standard for windows aggregate functions.
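The idea of a conditional partition can be sketched procedurally. The simulation below assumes that the row for which the RESET WHEN condition is true becomes the first row of the new partition; the function name and values are hypothetical, and this is not the database's implementation.

```python
def running_sum_with_reset(values, reset_when):
    """Cumulative sum that restarts at each row where reset_when(prev, cur) is true."""
    result, total, prev = [], 0, None
    for cur in values:
        if prev is not None and reset_when(prev, cur):
            total = 0  # the current row opens a new conditional partition
        total += cur
        result.append(total)
        prev = cur
    return result

salaries = [30, 40, 35, 50, 60, 20]  # hypothetical values, already in window order
growth = running_sum_with_reset(salaries, lambda prev, cur: cur < prev)
print(growth)  # [30, 70, 35, 85, 145, 20]
```

A fixed PARTITION BY cannot express this, because the partition boundaries depend on a comparison between neighboring rows.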
True or False:
1. Write a window aggregate that provides a rank of salary amounts for all
employees using a cumulative window.
RANK
It should be noted that RANK can operate on character data values as well as numeric, though it
is likely a rare instance where this would be needed.
In the previous module, we saw how a Cumulative Count Window could act as a
method for ranking values.
What we didn't discuss is what happens for tied values.
Here is how the COUNT strategy works when ties occur.
SELECT itemid
,sales
,COUNT(itemid) OVER (ORDER BY sales DESC ROWS UNBOUNDED PRECEDING)
FROM saleshist
WHERE salesdate = date '2008-05-24'
itemid sales Cumulative Count(itemid)
----------- ------------ ------------------------
5 690.00 1
4 562.00 2
8 489.00 3
6 465.00 4
7 449.00 5
2 449.00 6
10 383.00 7
1 375.00 8
3 309.00 9
9 271.00 10
As we stated on the previous page, a tied value gets the same rank value, and here we see the
correct result for a RANK. Below, notice that this function works quite differently than the
window aggregates that we have seen earlier. Here we do not put a value into the rank function.
To be more precise, we are not allowed to place anything into the function (i.e. RANK(sales) is
not allowed with the window aggregate version of rank). Instead, what determines the ranking is
the ORDER BY used in the OVER portion of the syntax. The default for RANK is ASC.
SELECT itemid
,sales
,RANK() OVER (ORDER BY sales DESC)
FROM saleshist
WHERE salesdate = date '2008-05-24';
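The difference between the cumulative COUNT strategy and RANK on tied values can be put side by side. The sketch below runs in SQLite from Python (3.25 or later assumed) and loads the ten item and sales values from the cumulative count result shown earlier; itemid DESC is added to the COUNT ordering purely as a tie-breaker so that the output is deterministic.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE saleshist (itemid INTEGER, sales REAL)")
con.executemany("INSERT INTO saleshist VALUES (?, ?)",
                [(5, 690), (4, 562), (8, 489), (6, 465), (7, 449),
                 (2, 449), (10, 383), (1, 375), (3, 309), (9, 271)])

rows = con.execute("""
SELECT itemid, sales,
       COUNT(itemid) OVER (ORDER BY sales DESC, itemid DESC
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_count,
       RANK() OVER (ORDER BY sales DESC) AS rnk
FROM saleshist
ORDER BY sales DESC, itemid DESC
""").fetchall()

for row in rows:
    print(row)
```

The two rows with sales of 449.00 receive counts 5 and 6 but share rank 5, and the next rank is 7.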
SELECT itemid
,sales
,RANK( ) OVER (ORDER BY sales DESC) AS "Rank"
FROM saleshist
WHERE salesdate = date '2008-05-24'
QUALIFY "Rank" < 4;
Note that the reference to the alias, in the qualify, must include the double-quotes.
SELECT itemid
,sales
,RANK() OVER (ORDER BY sales DESC) AS "Rank"
FROM saleshist
WHERE salesdate = date '2008-01-01'
QUALIFY "Rank" < 7;
SELECT itemid
,sales
,RANK( ) OVER (ORDER BY sales DESC) AS "Rank"
FROM saleshist
WHERE salesdate = date '2008-01-01'
QUALIFY "Rank" < 6;
itemid      sales        Rank
----------- ------------ -----------
          4       690.00           1
          5       690.00           1
          8       489.00           3
          6       465.00           4
          7       449.00           5
          2       449.00           5
Because there is a tie for the last (5th ranked) value, there are 2 rows in the result for
that value.
Note that there are 6 ranked values altogether.
The extra rows are only projected when there is a tie for the very last value.
What will the next ranking value be?
SELECT itemid
,sales
FROM saleshist
WHERE salesdate = date '2008-05-24'
QUALIFY RANK( ) OVER (ORDER BY sales DESC) < 4;
Show the bottom 3 selling items across all dates and all items.
SELECT itemid
,sales
,RANK( ) OVER (ORDER BY sales ASC) AS "Rank"
FROM saleshist
QUALIFY "Rank" < 4;
Perhaps keeping in mind the order of operations makes this easier to understand.
1. WHERE
2. AGGREGATION
3. HAVING
4. OLAP { [ PARTITION BY ] [ ORDER BY ] [ rows ] }
5. QUALIFY [ ORDER BY ]
6. SAMPLE | TOP N
7. ORDER BY
8. FORMAT
Watch what happens if you attempt to shortcut this as shown in the following.
SELECT itemid
,sales
,RANK() OVER (ORDER BY sales DESC) AS "Rank"
FROM saleshist
QUALIFY "Rank" ASC < 4;
Show the bottom 3 selling items across all dates and all items
showing descending rank value.
Show the top 3 selling items for each day for a 3-day period, without
showing their rank value.
SELECT itemid
,salesdate
,sales
FROM saleshist
WHERE salesdate BETWEEN DATE'2008-05-24' AND DATE'2008-05-27'
QUALIFY RANK( ) OVER (PARTITION BY salesdate
ORDER BY sales DESC) < 4;
itemid      salesdate sales
----------- --------- ------------
          5  08/05/24       690.00
          4  08/05/24       690.00
          8  08/05/24       489.00
          1  08/05/25       549.00
         10  08/05/25       522.00
          9  08/05/25       474.00
          5  08/05/26       729.00
          8  08/05/26       629.00
          4  08/05/26       548.00
          8  08/05/27       729.00
          7  08/05/27       674.00
          2  08/05/27       586.00
This is based upon what we should already know about using PARTITION, only now with RANK.
SELECT Last_Name,
ROW_NUMBER( ) OVER (ORDER BY Last_Name),
RANK( ) OVER (ORDER BY Last_Name)
FROM Employee
WHERE Department_Number IN (401, 302);
SELECT BirthDate,
MIN(Salary_Amount) OVER (ORDER BY Birthdate DESC
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS PrevSal,
Salary_Amount,
SUM(Salary_Amount) OVER (ORDER BY Birthdate DESC
RESET WHEN Salary_Amount IS NULL OR Salary_Amount < PrevSal
ROWS UNBOUNDED PRECEDING) AS Growth
FROM Employee
WHERE birthdate BETWEEN DATE'1976-01-01' AND DATE'1983-01-01';
Vs.
SELECT SalesAmt,
ROW_NUMBER( ) OVER (ORDER BY SalesAmt)
AS rownum,
COUNT(*) OVER ( ) AS rowcount
FROM saleshist2
QUALIFY rownum = (rowcount + 1)/2 -- Lower median value if even or middle median if odd.
OR rownum = (rowcount/2) + 1; -- Higher median value if even or middle median if odd.
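The two QUALIFY conditions above rely on integer division. The sketch below checks them for an odd and an even row count using SQLite from Python (3.25 or later assumed); a derived table with an outer WHERE stands in for QUALIFY, and the amounts and the helper name are hypothetical.

```python
import sqlite3

def median_rows(amounts):
    """Return the middle row (odd count) or the two middle rows (even count)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE saleshist2 (SalesAmt INTEGER)")
    con.executemany("INSERT INTO saleshist2 VALUES (?)", [(a,) for a in amounts])
    sql = """
    SELECT SalesAmt
    FROM (SELECT SalesAmt,
                 ROW_NUMBER() OVER (ORDER BY SalesAmt) AS rownum,
                 COUNT(*) OVER () AS rowcount
          FROM saleshist2) AS t
    WHERE rownum = (rowcount + 1) / 2   -- lower median, or the middle row
       OR rownum = (rowcount / 2) + 1   -- higher median, or the middle row
    ORDER BY SalesAmt"""
    return [r[0] for r in con.execute(sql)]

print(median_rows([30, 10, 20]))      # [20]
print(median_rows([40, 10, 30, 20]))  # [20, 30]
```

With an odd count both conditions select the same row, so one row is returned; with an even count they select the two neighboring middle rows.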
True or False:
2. Rank all sales from the “salestbl” showing only the bottom three sales
amounts along with their bottom ranking values.
QUANTILE
Although the use of QUANTILE is discouraged in favor of deriving the result using known ANSI
functionality, the function is easier to use, since it can intuitively perform some of the more
complex variations of this capability with less syntax. It is a Teradata extension to
the ANSI SQL-2003 standard and is retained only for backward compatibility with existing
applications.
Since “n” is 100, this is an example of using the Quantile function to derive
percentiles.
Quantile is an OLAP function and can be used with QUALIFY to return qualified
result sets.
Show the salaries for those employees in department 401 whose salary is in the top
20 percent for that department.
As with RANK, you need not project the quantile value when using QUALIFY.
Show the salaries for those employees in department 401 whose salary is in the top
20 percent for that department.
employee_number salary_amount
--------------- -------------
1010 46000.00
Sum(sals)
------------
387825.00
The quantile result is placed into a derived table.
The sum is then performed on the quantile result.
Note:
SELECT department_number,
SUM(salary_amount) OVER () AS Sumsal,
QUANTILE(100, sumsal)
FROM employee;
*** Failure 5480 Ordered Analytical Functions cannot be
nested.
Trying to get the previous result in the same manner as a Window Aggregate will
fail.
QUANTILE is a Teradata extension that is mutually exclusive with both Window
Aggregates and standard aggregation.
*** Failure 5478 Aggregates are allowed only with Window Functions.
Unlike Window Aggregates, Quantile is a non-ANSI function that uses the facilities
of GROUP BY for Partitioning.
Get the employees who represent the top 25% of their respective department by salary.
SELECT Department_Number,
Salary_Amount,
QUANTILE (100, salary_amount) AS Quant
FROM employee
WHERE Department_Number IN (401, 403, 501)
QUALIFY QUANTILE(100, salary_amount) >= 75
GROUP BY Department_Number;
Note that the GROUP BY need not reference all non-aggregates as with aggregation.
GROUP BY operates as does PARTITION with Window Aggregates.
This makes Quantile mutually exclusive to aggregation.
department_number salary_amount Quant
----------------- ------------- -----------
              401      46000.00          85
              403      49700.00          83
              501      66000.00          75
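For readers who want to see how a quantile value relates to rank, the sketch below computes it in plain Python. It assumes the commonly cited ANSI rewrite of QUANTILE(n, value), namely (RANK() OVER (ORDER BY value) - 1) * n / COUNT(*) OVER ( ) with integer division; that equivalence is an assumption here, not something stated in this module, and the salaries and function name are hypothetical.

```python
def quantile(n, values):
    """Map each value to (rank - 1) * n // count, with rank taken in ascending order."""
    ordered = sorted(values)
    count = len(values)
    # list.index returns the first position of a value, so tied values share one rank
    return {v: ordered.index(v) * n // count for v in values}

salaries = [10, 20, 30, 40, 50, 60, 70]  # one hypothetical department of 7 rows
quant = quantile(100, salaries)
print(quant[70])  # 85
```

Under this assumption the highest of 7 values lands in percentile 85, since (7 - 1) * 100 // 7 = 85, and only values with a result of 75 or more would pass the QUALIFY shown above.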
True or False:
“Extended Grouping” functions share the same 5 syntax constructs and also summarize data but,
like standard aggregation, they lose detail. They are, therefore, more closely related to standard
aggregation than are window aggregates. For instance, for these functions all selected
non-aggregates must be part of the associated grouping construct. We say “grouping construct”
(instead of GROUP BY) because, this being an “extended grouping” feature, we extend the
capabilities of the “GROUP BY” by adding more to the syntax so that it looks like the following:
• GROUP BY ROLLUP
• GROUP BY CUBE
• GROUP BY GROUPING SETS
Suppose that we would like to also obtain a grand total for the 5 departments shown on the
facing page? The extended grouping functionality would allow us to obtain this in addition to
the data shown.
In another example, suppose that we would like to further summarize the following, which
retrieves a summary of salary amount by manager and department, so that it also includes a sum
by manager, or a sum by department, or both – and with grand totals? Extended grouping
functions can be used to do this as well.
Produce a total of salaries by department for department numbers less than 402.
The question mark represents a total line (in this case for all departments). If it weren’t for the
fact that we are doing a rollup of salaries, this can be confused with a result where the “?” might
actually be the sum of salaries for a null department. Later we will look at ways of changing this
value to something more descriptive.
The ROLLUP function is used when aggregation is desired across all levels of a
hierarchy within a single dimension (with rollup, this single dimension has to do
with the aspect of “direction”).
Note, below, that we have an extended grouping capability – “GROUP BY ROLLUP”.
As in aggregation,
“All selected non-aggregates must be part of the associated group.”
SELECT department_number AS DeptNum
,SUM(salary_amount) AS SumSal
FROM employee
WHERE department_number < 402
GROUP BY ROLLUP (department_number)
ORDER BY 1;
Note that the '?' does not represent a null department number; it represents
the 'total' of all department salaries.
The GROUPING function (discussed later) will allow us to differentiate the two.
DeptNum     SumSal
----------- ------------
          ?    591925.00   Total sum of all groups is added.
        100    100000.00   Result of normal aggregation (i.e., GROUP BY)
        201     73450.00
        301    116400.00
        302     56500.00
        401    245575.00
The query below performs standard aggregation. In both queries, all selected non-aggregates
must be part of the associated group. The parentheses are valid, though rarely used.
Sometimes a situation arises where there are both null groups and total lines.
Distinguishing between the two could become necessary.
SELECT department_number,
SUM(salary_amount)
FROM employee
GROUP BY ROLLUP (department_number)
ORDER BY 1;
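department_number Sum(salary_amount)
----------------- ------------------
                ?          129950.00
                ?          948050.00
              301           58700.00
              401          213275.00
              402           52500.00
              403          193500.00
              501          200125.00
              999          100000.00

In this example you can determine which is which by process of elimination: the six numbered
departments sum to 818100.00, so 948050.00 is the grand total and 129950.00 is the sum for the
null department.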
Note:
• That the heading of “Deptno” is left justified, signifying a character field.
• That, within this character field, there are both numeric (right-justified) and character
(left-justified) values.
• That numbers sort before characters.
As you work with this functionality, you will see the attention to detail that is required to
choose values that sort in a logical manner so that, for instance, grand total lines sort last, or
(as seen later) subtotal lines sort IMMEDIATELY after the group they are subtotaling.
The GROUPING function distinguishes rows with nulls from rows with aggregates.
GROUPING returns:
• A “0” (zero) if the actual data for the column is null.
• A “1” (one) if it represents a total or subtotal value for the column.
The result set for this query is provided on the following page.
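As a minimal sketch of how GROUPING could separate the two kinds of “?” rows, the query below adds a flag column to the earlier rollup. The alias IsTotal is an illustrative name and is not taken from the course material.

```sql
-- GROUPING(department_number) returns 1 on the row that ROLLUP generated
-- (the total line) and 0 on rows that carry actual data, including rows
-- whose department_number really is null.
SELECT department_number            AS DeptNum
      ,GROUPING(department_number)  AS IsTotal   -- illustrative alias
      ,SUM(salary_amount)           AS SumSal
FROM employee
GROUP BY ROLLUP (department_number)
ORDER BY 2, 1;   -- data rows first, total line last
```

Ordering on the flag first is one way to make the total line sort last regardless of how nulls sort.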
By changing ROLLUP to CUBE, we add the extra rollup level of “Dept Totals”, thus providing all
rollups for the group, including the normal aggregate result and the grand total.
Here is what happens to a CUBE of one level. The result should look familiar, as it matches the
one shown earlier for a ROLLUP of one level.
Mgr SumSal
----------- ------------
? 271975.00
801 37850.00
1003 175425.00
1019 58700.00
For the query on the facing page, we are requesting sums for only department and manager. We
will not return department-manager sums nor will we return a grand total.
Remember: “All selected non-aggregates must be accounted for as being in the GROUP BY.”
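The query from the facing page is not reproduced here, so the following is only a sketch of what a request for manager sums and department sums alone might look like. The column name manager_employee_number and the aliases are assumptions.

```sql
-- GROUPING SETS produces one group of rows per manager and one group of
-- rows per department. No manager/department combinations and no grand
-- total are returned.
-- manager_employee_number is an assumed column name.
SELECT manager_employee_number  AS Mgr
      ,department_number        AS Dept
      ,SUM(salary_amount)       AS SumSal
FROM employee
GROUP BY GROUPING SETS (manager_employee_number, department_number)
ORDER BY 1, 2;
```

In the rows summarized by manager, Dept shows “?”, and in the rows summarized by department, Mgr shows “?”.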
True or False:
Views
• Create a view.
• Use a view to provide secure access to data.
• Drop or modify a view.
• List several reasons for using views.
They can be used to filter the data of the table so that the referencing user can see only the
columns projected in the view. A view can also reformat the table data to provide a different
look and feel.
It is virtual in that no physical data really exists for the view. Hence, no indexes may be created
on it.
It can also be considered logical because it can alter the table’s appearance from many different
viewpoints without affecting the actual data found in the table that it references.
Views can be extremely helpful in eliminating the need for a referencing user to be involved with
writing complex SQL since this complexity can be embedded into the view.
Finally, a view may be also considered to be a “derived table” in that the spool generated from
the view can be used to replace the spool generated by a functionally equivalent “derived table”
discussed earlier in this course.
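To make the idea of embedding complexity in a view concrete, here is a sketch of how a join view such as emp_dept could be defined. The actual definition used in the course is not shown in this section, so the column list below is an assumption.

```sql
-- A view that hides the employee/department join from the referencing user.
-- The projected columns are assumed; only the view name emp_dept comes from
-- the course material.
CREATE VIEW emp_dept AS
SELECT e.employee_number
      ,e.last_name
      ,e.first_name
      ,d.department_name
FROM employee e INNER JOIN department d
  ON e.department_number = d.department_number;
```

A user who can select from the view can then query it like a table without writing the join.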
Result:
Copy this DDL to some editor (like notepad) and make the changes needed, including changing
“CREATE” to “REPLACE”.
3. Submit.
Who is employee 1002 and in which department do they work?

SELECT last_name
      ,first_name
      ,department_name
FROM emp_dept
WHERE employee_number = 1002;
Views may be joined together whether simple views like these or more complex
views that contain joins themselves.
This view can be used to provide formatting for use with SQL Assistant.
It will also work fine with BTEQ; however, the conversion strategies are
not necessary there.
SELECT *
FROM SQLA_View;
The facing page illustrates how this can be accomplished through the use of views that perform
the initial aggregate result.
last_name salary_amount
-------------------- -------------
Trainer 100000.00
Runyon 66000.00
Kubic 57700.00
Rogers 56500.00
Ratzlaff 54000.00
Wilson 53625.00
Daly 52500.00
Villegas 49700.00
Rogers 46000.00
Brown 43700.00
Global Temporary tables use “Temp” space, which can survive a restart; they can therefore be
used in a scripted application that runs unattended on a scheduled basis. Global Temporary
tables are significantly more complex than Volatile temp tables and are typically considered
more a concern of application development than of SQL.
Derived tables also use spool and, as such, do not survive a database restart, nor do they survive
a request failure. Volatile and Global temp tables can be created to survive request failures but,
by default, do not.
Views
• Local to a query
• Uses Spool
• May be replaced with derived tables.
Derived Tables
• Local to the query
• Incorporated into SQL query syntax
• Discarded when query finishes
• No Data Dictionary involvement
• May be replaced with views.
Volatile Tables
• Local to a session (are available to all queries during the session)
• Uses CREATE VOLATILE TABLE syntax
• Discarded automatically at session end
• No Data Dictionary involvement
Notice that the WITH form is structured completely “upside-down” from its counterpart.
“WITH” form:
• Definition appears at the top of the query, prior to the SELECT portion.
“FROM” form:
• Definition appears in the FROM portion, after the SELECT portion.
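The two forms can be compared side by side. The sketch below lists departments whose budget exceeds the average budget, first with the FROM form and then with the WITH form. The derived table name avgb and its column avg_budget are illustrative names.

```sql
-- FROM form: the derived table is defined inside the FROM portion,
-- after the SELECT list.
SELECT d.department_name
      ,d.budget_amount
FROM department d
    ,(SELECT AVG(budget_amount) FROM department) AS avgb (avg_budget)
WHERE d.budget_amount > avgb.avg_budget;

-- WITH form: the same derived table is defined first, ahead of the SELECT.
WITH avgb (avg_budget) AS
    (SELECT AVG(budget_amount) FROM department)
SELECT d.department_name
      ,d.budget_amount
FROM department d
    ,avgb
WHERE d.budget_amount > avgb.avg_budget;
```

Both requests should return the same rows; only the placement of the definition differs.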
If you frequently reuse particular volatile table definitions, consider writing a macro that contains
the CREATE TABLE text for those volatile tables. Because volatile tables are private to the
session that creates them, the system does not check their creation, access, modification, and
drop privileges. Any user that has spool can create them. Understand that by holding on to the
spool for the life of the session, the user has less spool available to them for other queries.
Volatile tables always use spool directly from the creating user’s spool definition. For this
reason, specifying another database name causes the create to fail. If, however, your default
database is set to some database other than your user name, and you don’t qualify the table
name in the create (taking your default), the creation of the volatile table succeeds because the
default is ignored.
By default, the table is also a SET table with LOG on. A defaulted primary index makes the
first column of the table a NUPI (non-unique primary index).
CREATE VOLATILE TABLE vt_deptsal1
( deptno SMALLINT,
  avgsal DECIMAL(9,2),
  maxsal DECIMAL(9,2),
  minsal DECIMAL(9,2),
  sumsal DECIMAL(9,2),
  empcnt SMALLINT );

The HELP DATABASE command does not show VTs, and they do not appear in the Explorer
Tree window in SQL Assistant.
Since the default is ON COMMIT DELETE ROWS, and the request is an implied commit, the
moment the rows are inserted they are deleted due to the commit.
Since the commit is withheld until an ET statement is encountered, the rows will be deleted on
our terms, and then the table will become empty.
BT;
INSERT INTO vt_deptsal (1, 2, 3, 4, 5, 6);
SELECT * FROM vt_deptsal;
deptno avgsal maxsal minsal sumsal empcnt
------ ----------- ----------- ----------- ----------- ------
1 2.00 3.00 4.00 5.00 6
ET;
SELECT * FROM vt_deptsal;
*** Query completed. No rows found.
The default of ON COMMIT DELETE ROWS deleted the rows immediately after the ET;
This would work the same for ANSI mode explicit transactions.
BT;
INSERT INTO vt_deptsal (1, 2, 3, 4, 5, 6);
SELECT * FROM vt_deptsal;
deptno avgsal maxsal minsal sumsal empcnt
------ ----------- ----------- ----------- ----------- ------
1 2.00 3.00 4.00 5.00 6
ET;
SELECT * FROM vt_deptsal;
deptno avgsal maxsal minsal sumsal empcnt
------ ----------- ----------- ----------- ----------- ------
1 2.00 3.00 4.00 5.00 6
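The second listing, in which the row is still present after the ET, is the behavior you would expect from a table created with ON COMMIT PRESERVE ROWS. The definition used for that listing is not shown, so the following is only a sketch under that assumption, reusing the column layout of the earlier volatile table.

```sql
-- With ON COMMIT PRESERVE ROWS, rows stay in the volatile table after the
-- transaction commits. They are discarded when the session ends.
CREATE VOLATILE TABLE vt_deptsal
( deptno SMALLINT,
  avgsal DECIMAL(9,2),
  maxsal DECIMAL(9,2),
  minsal DECIMAL(9,2),
  sumsal DECIMAL(9,2),
  empcnt SMALLINT )
ON COMMIT PRESERVE ROWS;
```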
The SELECT may include derived data values or involve any number
of features and functions as expressions.
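As an illustration of an INSERT SELECT whose SELECT supplies derived values, the sketch below loads one summary row per department into the volatile table defined earlier. It assumes a table named vt_deptsal with the six columns shown in that definition.

```sql
-- Each aggregate lines up positionally with a column of vt_deptsal:
-- deptno, avgsal, maxsal, minsal, sumsal, empcnt.
INSERT INTO vt_deptsal
SELECT department_number
      ,AVG(salary_amount)
      ,MAX(salary_amount)
      ,MIN(salary_amount)
      ,SUM(salary_amount)
      ,COUNT(employee_number)
FROM employee
GROUP BY department_number;
```

One request inserts as many rows as the SELECT returns.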
There is no typing shortcut facility for explicitly inserting multiple rows with this form.
Updating may also be done in bulk. In this case “bulk” means more than one row. An example
of a bulk operation would be, for instance, if the WHERE condition were to be removed from
this update so that all rows would be updated. Another example of a bulk operation is provided
in the next discussion on the following page.
The FROM clause is optional. Optional keywords are often referred to as being “noise”
(something that can be disregarded or ignored – easy enough to say I suppose).
UPDATE EMPLOYEE
SET Last_Name = DEFAULT
WHERE Salary_Amount = DEFAULT;
*** Failure 3811 Column 'last_name' is NOT NULL. Give the column a value.

UPDATE EMPLOYEE
SET Department_Number = DEFAULT
WHERE Salary_Amount = DEFAULT;
UPDATE modifies one or more rows in a single table.

EMPLOYEE
EMP   MGR EMP  DEPT  JOB     LAST      FIRST    HIRE    BIRTH   SAL
NUM   NUM      NUM   CODE    NAME      NAME     DATE    DATE    AMT
PK    FK       FK    FK
1006  1019     301   312101  Stein     John     761015  531015  2945000
1008  1019     301   312102  Kanieski  Carol    770201  580517  2925000
1005  0801     403   431100  Ryan      Loretta  761015  550910  3120000
1004  1003     401   412101  Johnson   Darlene  761015  460423  3630000
1007  1005     403   432101  Villegas  Arnando  770102  370131  4970000
1003  0801     401   411100  Trader    James    760731  470619  3785000

EMPLOYEE
EMP   MGR EMP  DEPT  JOB     LAST      FIRST    HIRE    BIRTH   SAL
NUM   NUM      NUM   CODE    NAME      NAME     DATE    DATE    AMT
PK    FK       FK    FK
1006  1019     301   312101  Stein     John     761015  531015  2945000
1008  1019     301   312102  Kanieski  Carol    770201  580517  2925000
1005  0801     403   431100  Ryan      Loretta  761015  550910  3120000
1004  1005     403   432101  Johnson   Darlene  761015  460423  3630000
1007  1005     403   432101  Villegas  Arnando  770102  370131  4970000
1003  0801     401   411100  Trader    James    760731  470619  3785000
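Comparing the two EMPLOYEE tables above, the only row that differs is employee 1004, whose manager, department and job code change. The statement that produced the change is not shown, so the following is a sketch of an UPDATE that would have this effect. The column name manager_employee_number is an assumption.

```sql
-- Changes three columns of a single row, identified by its primary key.
-- manager_employee_number is an assumed column name.
UPDATE employee
SET manager_employee_number = 1005
   ,department_number       = 403
   ,job_code                = 432101
WHERE employee_number = 1004;
```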
Using a subquery:
UPDATE employee
SET salary_amount = salary_amount * 1.10
WHERE department_number IN
  (SELECT department_number
   FROM department
   WHERE department_name LIKE '%Support%');

Using a correlated subquery:
UPDATE employee e
SET salary_amount = salary_amount * 1.10
WHERE department_number =
  (SELECT department_number
   FROM department d
   WHERE e.department_number = d.department_number
   AND department_name LIKE '%Support%');

Using an inner join:
UPDATE employee [ FROM Department ]
SET salary_amount = salary_amount * 1.10
WHERE employee.department_number = department.department_number
AND department_name LIKE '%Support%';
You may also use the DEFAULT keyword in a delete like this.
DELETE removes one or more rows from a table.

EMPLOYEE
EMP   MGR EMP  DEPT  JOB     LAST      FIRST    HIRE    BIRTH   SAL
NUM   NUM      NUM   CODE    NAME      NAME     DATE    DATE    AMT
PK    FK       FK    FK
1006  1019     301   312101  Stein     John     761015  531015  2945000
1008  1019     301   312102  Kanieski  Carol    770201  580517  2925000
1005  0801     403   431100  Ryan      Loretta  761015  550910  3120000
1004  1003     401   412101  Johnson   Darlene  761015  460423  3630000
1007  1005     403   432101  Villegas  Arnando  770102  370131  4970000
1003  0801     401   411100  Trader    James    760731  470619  3785000

Remove employees in department 301 from the employee table.
DELETE FROM employee
WHERE department_number = 301;
• There are two forms for writing derived tables: WITH and FROM.
• Views, Derived, Global and Volatile tables are all examples of temporary
instance objects.
• Volatile tables last until the user’s session logs off, when the
database drops them automatically.
2. Create another volatile table that averages the salary amounts for each
department. Now issue a HELP command to verify the existence of the
two volatile tables that you created. Do SHOW TABLE on one of them
and then drop the first one and repeat the earlier HELP command.
True or False:
1. The FLOAT data type has more precision than does a decimal data type.
False – float only has 15 digits of precision.
2. Character data types cannot be converted to numeric data types.
False
3. FORMAT 'd2' is a valid formatting option.
False – for 2-digit day formatting “dd” must be used.
4. The expression 'a   ' = 'A          ' evaluates true.
True (the first literal has 3 trailing spaces, the second has 10)
5. You can use the CAST function to change a data type or to format results.
True
6. The comma “,” is a valid formatting character.
True
7. The formatting character “9” may be used to display leading or trailing
zeroes.
False – it can only display leading zeroes.
True or False:
1. The INTERSECT operator returns the same result as does the MINUS
operator, however MINUS is a Teradata extension.
False – MINUS and EXCEPT are equivalent (though MINUS is an extension)
2. The following is valid for a set operator ORDER BY Last_Name
False – you must order by a positional number
3. The ALL option may potentially return more rows than if not using it.
True
4. Set operators may cause truncation among corresponding columns of the
result sets.
True
5. An INTERSECT is just another way of returning an inner result.
False
6. If all three different set operators are referenced in a query, the UNION is
performed first.
False – the INTERSECT is first
7. “SELECT *” is a valid projection in a set operation.
True – as long as the numbers of columns projected among projections
remains constant
True or False:
1. For inner joins, each FROM clause requires an ON clause for join
conditions.
False – Only the explicit form requires an ON clause
2. Referencing a WHERE clause is invalid for the explicit form of inner join.
False – A WHERE clause may be needed for adding residual conditions
3. Many-to-many relationships are allowed with inner joins.
True
4. When performing a self join, table aliasing is required.
True – You may not reference the same table name without creating ambiguities
5. Inner join syntax requires at least one qualifying join column.
False – A WHERE clause may be needed for adding residual conditions
6. The explicit form of inner join can reject some uses of incorrect
qualifications.
True – But only in the ON clause and not in the project list
7. The implicit form of inner join is not ANSI standard.
False – Both forms are ANSI standard
True or False:
1. All outer joins require use of either LEFT, RIGHT or FULL keywords.
True
2. Outer joins can return more rows than can inner joins.
True
3. Nulls returned from the inner table mean the result row is an outer
result.
False – Only for the join column, or if the column is defined as NOT NULL
4. The use of a WHERE clause is not allowed in an outer join.
False – WHERE can be used for writing residual conditions
5. The use of an ON clause is required when writing an outer join.
True
6. The keyword OUTER is required when writing outer joins.
False
7. The FULL outer join returns LEFT and RIGHT outer results.
True – It also returns inner results
True or False:
1. Of the four Windows, this module only discussed the GROUP Window.
True
2. With the Group Window, ORDER BY, in the OVER, will not change result
values.
True
3. In the OVER clause, ORDER BY must be after PARTITION, if both are used.
True
4. PARTITION and GROUP BY may both be present within the same
projection.
True – they may be alone or together or not present at all
5. QUALIFY need not reference a projected value.
True
6. HAVING must reference a projected value.
False
7. PARTITION may return a null value.
True
3. Try each of the following in the order shown, and note if it fails by looking at the
bottom-left portion of the utility screen. For those that fail, “double-click” the
“Notes” field for the failed request in the “History Window”.
HELP DATABASE;
3707: Syntax error, expected something like a name or a Unicode delimited identifier between
the 'DATABASE' keyword and ';'.
DATABASE yourusername;
The default database should be set to your logon and be reflected at the top of the screen in the
center banner.
DATABASE Employee_sales;
The default database should be set to Employee_sales and be reflected at the top of the screen
in the center banner.
Exercise 4 is hands-on.
Try the various drag-and-drop methods outlined in the module.
2. Request a report of employee last and first names and salary for all of
manager 1019's employees. Order the report in last name ascending
sequence.
3. Project a distinct list of job codes which have been assigned to people and are greater
than 510000 and sort the result descending.
job_code
-----------
512101
511100
last_name first_name
-------------------- ------------------------------
Brown Allen
Brown Alan
5. How many people have been assigned job codes greater than or equal to 510001?
(since aggregation has not been taught yet you will have to manually count
them? Or can SQL Assistant tell you?)
SELECT *
FROM Employee
WHERE Job_Code >= 510001;
The history window shows that 4 rows have been returned for this query.
last_name department_number
-------------------- -----------------
Kanieski 301
Kubic 301
Ratzlaff 501
Hoover 401
Rogers 401
Wilson 501
Phillips 401
Machado 401
Rabbit 501
Johnson 401
Stein 301
Trader 401
Brown 401
Runyon 501
2. Project the last names of employees whose salary is greater than or equal to
$28,078.
SELECT Last_Name
FROM Employee
WHERE Salary_Amount >= 28078
ORDER BY Last_Name;
last_name
--------------------
Brown
Brown
Daly
Hopkins
Johnson
Kanieski
Lombardo
Morrissey
Ratzlaff
Rogers
Rogers
Runyon
Ryan
Short
Stein
Trader
Trainer
Villegas
Wilson
last_name department_number
-------------------- -----------------
Brown 401
Hoover 401
Johnson 401
Kanieski 301
Kubic 301
Machado 401
Phillips 401
Rabbit 501
Ratzlaff 501
Rogers 401
Runyon 501
Stein 301
Trader 401
Wilson 501
4. Modify #4 to show only those whose salary amounts are between $50,000 and
$60,000.
last_name department_number
-------------------- -----------------
Ratzlaff 501
Wilson 501
2. Using an IN list, display employees with any of the following job codes: 412101,
412109, NULL.
4. List employees with unassigned job codes that have salaries between 30K and 40K.
1. Find and list employees first and last names for employees where their last name
begins with either “R”, “S” or “T”. (Do this without regard to case sensitivity.)
Another solution:
first_name last_name
------------------------------ --------------------
Peter Rabbit
Larry Ratzlaff
Frank Rogers
Nora Rogers
Irene Runyon
Loretta Ryan
Michael Short
John Stein
2. Write a request that will show the salary amount for the people identified in #1 if they
were given a 10% increase in salary that gave them a salary > 50K.
3. Project new employee job codes (from the Employee table) for all those job codes
ending in 101, increasing them by 100. Include last names, job codes and department
numbers to help verify results.
FullName
----------------------------------------------------
Brown, Allen
Phillips, Charles
Ratzlaff, Larry
2. Repeat #1. Replace your WHERE Clause, using LIKE to only list employees who have
an "LL" combination in their last name.
FullName
----------------------------------------------------
Phillips, Charles
Villegas, Arnando
3. Using POSITION, change #2 to also include last names having an “FF” combination in
their last name.
FullName
------------------------
Phillips, P.
Ratzlaff, R.
Villegas, V.
1. For those employees who work in departments 301 or 401, remove those whose
salary is less than $35,000.00 and order this result by last name and then first
name.
2. Use UNION to combine the results of #1 with those who earn more than $10,000.00.
Alias last name to LNM and first name to FNM.
4. Using a SET operator, change #1 to find those who satisfy both the department and
salary conditions.
1. Write a subquery that finds employees who are not employee managers. (i.e. Not
managers in the employee table.)
19 rows
2. Edit #1 to find employees who are neither employee managers nor department
managers.
Extra tough
4. Write a nested subquery that finds employees whose managers are department
managers that are not managers in the employee table.
No rows found
No rows found
1. List all employees by name, the name of their department, their original salary, and
salary again with a ten percent increase, for those working in departments with
budgets > $40,000.00. Make the last and first name 10 characters each and use the
implicit form of inner join.
2. Find the department names and employee names for employees that have both an
“i” and an “e” in their last name. Make the last and first name 10 characters each
and use the explicit form of inner join.
3. Use POSITION to list department names that have people working in them whose
job description has the word “sales” in it. List the employee names as well.
Optional
4. Write a cross join that lists all possible combinations of first names and last
names from employee.
1. From the employee and department tables, list employee last names, first names, the
department names and the employees department numbers only, for all employees.
Compare this to the number of rows returned by the inner join.
SELECT d.department_number,
CAST(e.last_name AS CHAR(10)),
CAST(e.first_name AS CHAR(10)),
d.department_name
FROM employee e LEFT JOIN department d
ON e.department_number = d.department_number
ORDER BY 1, 2;
last_name first_name
-------------------- ------------------------------
Brown Allen
Charles John
Hopkins Paulene
Lombardo Domingus
Short Michael
Villegas Arnando
1. Display the salary sums, by job code within department, for all employees who
work for manager 1003, 1004, and 1017.
SELECT department_number,
AVG(budget_amount) AS avgbudget,
CAST(Avgbudget * 1.5 AS DEC(15,2))
FROM department
GROUP BY 1
ORDER BY 1;
4. Count the number of distinct manager numbers and distinct departments from the
employee table.
CDM# CDD#
----------- -----------
7 6
1. Display employee information you deem necessary to compare salary changes for
the people in their respective departments as shown in the chart below.
Where their salary amount is null, make it equal to their job code BEFORE DOING THE
CHANGE.
Verify that, where different averages occur, the zero-to-null salary averages are
larger.
Note that “zero to null” averages should be equal to or larger than “null to zero” averages,
since ignoring nulls means dividing by a smaller row count.
Since there are no salary amounts of zero (0), the last average should match the second
average.
Since there are null salary amounts, the first average should be equal to or less than the
other two averages, since converting nulls to zero increases the number of rows to divide by.
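The three averages discussed above can be sketched in a single request. The lab's own solution is not shown in this section; the column order below is chosen to match the “first”, “second” and “last” wording, and the aliases are illustrative.

```sql
-- First average:  nulls are turned into 0, so they are included in the count.
-- Second average: nulls are left alone, so AVG ignores those rows.
-- Last average:   zeros are turned into null, so zeros and nulls are ignored.
SELECT department_number
      ,AVG(COALESCE(salary_amount, 0)) AS AvgNullToZero
      ,AVG(salary_amount)              AS AvgAsIs
      ,AVG(NULLIF(salary_amount, 0))   AS AvgZeroToNull
FROM employee
GROUP BY department_number
ORDER BY 1;
```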
1. Use a derived table to list those departments whose budgets are greater than the
average budget for all departments.
3. Modify #1 to add the differences between the department’s budget and the
average.
1. Return two 15-row samples from the employee table. Anything less than 15 rows
per sample is unacceptable. Project last and first names plus hire dates and birth
dates.
This will return 2 samples but, due to the nature of sampling, it cannot be duplicated.
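One way to approach this is sketched below. The quantile listings later in this section show 26 employee rows, so two full 15-row samples cannot both be drawn without reusing rows; the sketch therefore samples with replacement. The column names hire_date and birth_date and the alias SampleNum are assumptions.

```sql
-- SAMPLEID identifies which of the two samples each returned row belongs to.
-- WITH REPLACEMENT allows the same row to be selected more than once, which
-- lets both samples reach 15 rows.
SELECT last_name
      ,first_name
      ,hire_date
      ,birth_date
      ,SAMPLEID AS SampleNum
FROM employee
SAMPLE WITH REPLACEMENT 15, 15
ORDER BY SampleNum;
```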
2. Return two 50% row samples of employees for each of departments 401 and 501.
3. Using the same sampling as #2, SUM the salaries for each sample.
1. List the top 5 salary amount VALUES in the employee table along with the last
and first names of the employees.
2. To see if it can be used on character data values, list the top 3 department names
from the department table by VALUE (in descending order).
department_name
------------------------------
technical operations
research and development
product planning
3. Retrieve half of the job descriptions that have “manager” in the description using the
TOP N feature. Verify the result by doing a COUNT(*) from job.
Due to the nature of the feature you may not be able to duplicate this result.
description
----------------------------------------
?
Hardware Engineer
Dispatcher
Manager - Marketing Sales
Manager - Research and Development
Sales Rep
Manager - Education
Corporate President
Manager - Customer Support
Mechanical Assembler
Due to the nature of the feature you may not be able to duplicate this result.
SELECT TOP 3 *
FROM
(SELECT department_number, SUM(salary_amount) AS sumsal
FROM employee
GROUP BY 1) e
ORDER BY 2 DESC;
department_number sumsal
----------------- ------------
401 213275.00
501 200125.00
403 193500.00
1. From the “SalesTbl”, list the store id, product id, sales, for each row, include with
each projected row the sum of the sales for each product across all stores and
order by store id within product id.
Note that ordering by a non-unique store ID only, has a random effect on the order of
prod ID (as well as the other columns).
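A possible shape for this request is sketched below. The column names storeid and prodid and the alias ProdSum are assumptions, since the layout of the sales table is not shown in this section.

```sql
-- The group window (PARTITION BY with no ORDER BY in the OVER) repeats the
-- per-product total on every row of that product.
SELECT storeid
      ,prodid
      ,sales
      ,SUM(sales) OVER (PARTITION BY prodid) AS ProdSum
FROM salestbl
ORDER BY prodid, storeid;   -- store id within product id
```

Ordering on both columns avoids the effect described in the note above, where ordering on store id alone leaves the product ids in no particular order.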
1. Write a window aggregate that provides a rank of salary amounts for all employees
using a cumulative window.
1. Retrieve a ranking of salary amounts from the employee table. Next change the
rank to partition by department number.
2. Rank all sales from the “salestbl” showing only the bottom three sales amounts
along with their bottom ranking values.
sales Rank(sales)
----------- -----------
20000.00 11
25000.00 10
30000.00 9
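The result above uses the Rank(sales) heading of the Teradata RANK function. As a sketch only, the same idea can be written with the ANSI window form and QUALIFY; the alias SalesRank is illustrative.

```sql
-- The projected rank is descending, so the smallest sales amounts carry the
-- largest rank values. QUALIFY filters on an ascending rank to keep only the
-- lowest sales amounts; ties could return more than three rows.
SELECT sales
      ,RANK() OVER (ORDER BY sales DESC) AS SalesRank
FROM salestbl
QUALIFY RANK() OVER (ORDER BY sales ASC) <= 3
ORDER BY 2 DESC;
```

Note that QUALIFY references a ranking that is not itself projected.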
1. Display tertile (values from 0 to 2) of employee salary amounts. Then return decile
(0 to 9) followed by percentiles (0 to 100). Do each separately and note how the
rows-per-quantile thin out as the number increases. Note how the number of rows
per quantile value changes from lower to higher.
salary_amount Quantile(3,salary_amount)
------------- -------------------------
? 0
? 0
? 0
? 0
24500.00 0
25525.00 0
26500.00 0
29250.00 0
29450.00 0
31000.00 1
31200.00 1
34700.00 1
36300.00 1
37850.00 1
37900.00 1
38750.00 1
43100.00 1
43700.00 1
46000.00 2
49700.00 2
52500.00 2
53625.00 2
54000.00 2
56500.00 2
66000.00 2
100000.00 2
salary_amount Quantile(10,salary_amount)
------------- --------------------------
? 0
? 0
? 0
? 0
24500.00 1
25525.00 1
26500.00 2
29250.00 2
29450.00 3
31000.00 3
31200.00 3
34700.00 4
36300.00 4
37850.00 5
37900.00 5
38750.00 5
43100.00 6
43700.00 6
46000.00 6
49700.00 7
52500.00 7
53625.00 8
54000.00 8
56500.00 8
66000.00 9
100000.00 9
salary_amount Quantile(100,salary_amount)
------------- ---------------------------
? 0
? 0
? 0
? 0
24500.00 15
25525.00 19
26500.00 23
29250.00 26
29450.00 30
31000.00 34
31200.00 38
34700.00 42
36300.00 46
37850.00 50
37900.00 53
38750.00 57
43100.00 61
43700.00 65
46000.00 69
49700.00 73
52500.00 76
53625.00 80
54000.00 84
56500.00 88
66000.00 92
100000.00 96
SELECT salary_amount,
QUANTILE(3, salary_amount) AS Tert,
QUANTILE(10, salary_amount) AS Decl,
QUANTILE(100, salary_amount) AS Pctl
FROM EMPLOYEE;
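The three listings above appear consistent with computing each quantile as (rank - 1) * n / row count, using integer division. For example, 24500.00 is the fifth of 26 rows, and (5 - 1) * 100 / 26 truncates to 15. The sketch below expresses that arithmetic with ANSI window functions; it is an illustration of the calculation, not the documented definition of QUANTILE.

```sql
-- RANK gives tied values (here the four nulls) the same rank of 1, so they
-- all land in quantile 0. Integer arithmetic truncates the division.
SELECT salary_amount
      ,(RANK() OVER (ORDER BY salary_amount) - 1) * 3   / COUNT(*) OVER () AS Tert
      ,(RANK() OVER (ORDER BY salary_amount) - 1) * 10  / COUNT(*) OVER () AS Decl
      ,(RANK() OVER (ORDER BY salary_amount) - 1) * 100 / COUNT(*) OVER () AS Pctl
FROM employee
ORDER BY salary_amount;
```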
1. Display manager numbers, department numbers, and salary sums for employees by
manager and by department – only – (do not project a grand total).
1. Create a volatile table based on the definition of the Department table and then
populate it with data from the department table. Use the “preserve” option. Select
all rows from the table you created.
The volatile table should be populated and the SELECT should return rows.