
This booklet was prepared as part of the supplementary reading
material for a Statistics Concept Course for the staff of the
Central Bureau of Statistics, facilitated by Biometry Unit
Consultancy Services (BUCS) and the Statistical Services
Center (SSC) and funded by DFID.

You are welcome to use and share this material, as long as
due credit is given to BUCS & SSC.

Introduction to Excel for statistics

Part 1: Getting started

1. Introduction
This introductory guide covers the basics of Excel that are needed for data analysis. It
is in two parts.
We assume that you are not new to computers. In the first part of this guide we review
the basics of Windows and Excel that we assume you know. This is provided as pre-course
reading for the Phase 1 training given to CBS staff. It may also be used during
preparatory computer training.
The second part of this guide looks at the handling of data in Excel. We explain the
importance of keeping data in a list, and introduce data auditing, filtering, sorting
and calculation, among other topics. The data used for illustration are taken from the
1997 welfare monitoring survey, conducted by CBS. A CD is available from this
survey that provides the questionnaires, raw data, data dictionaries and reports.
This guide is used as supporting material for the first session of the Phase 1 training for
CBS, Kenya. Later sessions look at the production of tables and graphs in Excel, and
at the use of a statistical add-in, for which there are further guides.
The two parts of the guide follow the approach of the first chapters of Berk, K.N. and
Carey, P. (2000), Data Analysis with Microsoft Excel. Reference copies of this book are
available for those who need more detailed information. Some material is also taken
from the training notes for the first sessions of the short course called Excel for
Statistics: What you can and cannot do, given by SSC in the UK and BUCS in Kenya.

2. Using Windows
There are many different versions of Windows. These notes apply to all versions from
Windows 95 onwards.
In Fig. 2.1 we show a typical Windows desktop. This is the base from which you open
application programs, such as Excel.

This is an excellent resource for the training and we are grateful to CBS (Central Bureau of Statistics,
Kenya) for providing the information.
The guides are called Good Tables for Excel Users, Guidelines for Good Statistical Graphics in
Excel and Tutorial Introduction to SSC-Stat.
Fig. 2.1 The Windows desktop (labelled: My Computer, office software, statistical software, Recycle Bin, Start button and menu, taskbar including the programs currently open)
You must be comfortable with the use of your computer mouse to use Windows
effectively. In the table below we describe the four basic mouse operations.
Mouse operations
Operation         Description
Clicking          Move the mouse so the tip of the pointer touches the element you want to use. Then press and release the left mouse button.
Right-clicking    Same as clicking, but you press and release the right button.
Double-clicking   Press and release the left button twice in rapid succession.
Dragging          Press and hold down the left button. Then, with the button still pressed, move the mouse across the screen. Release the button when you are at your destination.
If you need practice, then here is a short exercise.
1. Double-click on the My Computer icon on the Windows desktop. This will
open the My Computer window, as shown in Fig. 2.2. (With some versions of
Windows you only have to single-click to open the window.)
Fig. 2.2 Practicing using the mouse (labelled: title bar; corner or side for dragging; My Computer icon on the desktop; My Computer icon on the taskbar; Minimise, Maximise and Close buttons)
2. Click the Minimise button. This will reduce the My Computer window to a
button on the taskbar.
3. Click on the My Computer icon on the taskbar, see Fig. 2.2. This will restore
the My Computer window to the desktop.
4. Click the Maximise button. The My Computer window now fills the whole
screen.
5. Click the middle button, which is now the Restore button. The My Computer
window is restored to its original size.
6. Move the mouse pointer so it is on the title bar of the My Computer window,
see Fig. 2.2. Drag the window to the bottom right-hand corner of the screen,
and then release the mouse button. Then drag the window back to roughly its
original position.
7. Point to the lower left-hand corner of the My Computer window. The pointer
will turn into a double-headed arrow. Drag the window corner down, then
release the button. This has enlarged the window. Return it to roughly its
previous size.
8. Repeat this operation, but start with the pointer at an edge of the window, rather
than a corner. This changes the shape of the window.
9. Click on the Close button. This closes the My Computer window.
If you are a relative beginner, then take time to practice these operations, both in the
type of exercise above, and when you start using an application, like Excel.
Fig. 2.3 Windows Help (labelled: Help command, Help window)

If you need more information about Windows, then also take time to explore the on-line
Help system. This provides tutorials and other information to help you to use Windows
effectively. We show an example of the Windows Help in Fig. 2.3.

When you are working with more than one window, you will often wish to arrange them conveniently
on the desktop.
Beginners to computing often look for books they can read to gain experience. If you
are in this situation we urge you mainly to practice with on-line help and information
instead. The best way to gain experience in computing is simply to use a computer!
One of the key features of using applications in Windows is that the way you use all
applications is basically the same. So look for the common points, like the way to use
Help, the way to open and save files, and so on. Then with each new application, all
you have to do is look for the new features, and they are probably why you chose that
application.
Once you gain experience you will also find that you do not need to wait for a formal
training course before starting to use a new application. Ask the presenters of the next
course that you attend how they learned. You will find that they often taught themselves,
using books and the on-line information, for many of the applications that they are now
trainers for.
3. Spreadsheets

Excel is an application for evaluating and presenting information in spreadsheet
format. Spreadsheets were originally developed for simple business uses, such as
financial reports or inventory management.
Excel is now so flexible that it is used for many other applications, including data
analysis. You can use Excel to enter data and then for simple analyses, such as the
construction of appropriate tables and graphs. The results can then be transferred to
other applications, such as Word for inclusion in a report, or PowerPoint for inclusion
in a presentation.
To start Excel you probably have an icon on the desktop that you can click, just as you
did to open the My Computer window. If you do this, the screen will look roughly as
shown in Fig. 3.1.
Otherwise click the Start button, then Programs, and look for Excel.

Fig. 3.1 A typical Excel window (labelled: menu bar, formula bar, active cell, column headings, row headings, scroll bars, more sheets making up a workbook)

The Excel window, shown in Fig. 3.1, is where you will analyse your data. If you are
not experienced in Excel, take time to note the elements shown in Fig. 3.1.
You instruct Excel by using the menus or the toolbars. For example, to open an Excel
file, click on the File menu, and then on Open.
The toolbars provide one-click access to many of the same commands as are in the
Excel menus. For example, in Fig. 3.1, clicking on the icon that looks like an open file
is the same as using File > Open.
Excel documents are called workbooks. Each workbook is made up of individual
worksheets. Each worksheet can have up to 256 columns, labelled A, B, C, and so on
up to IV. It can have up to 65,536 rows. A workbook can have up to 255 worksheets.
These were the maxima when this guide was written, in February 2003. If these
dimensions are limiting, you should probably be using other software anyway. Most
statistics packages do not have such limits. For large surveys the limit of 256 columns
is sometimes a restriction, because they have more questions than that.

4. A sample workbook

For illustration we use an Excel workbook that contains data from the 1997 household
survey conducted in Kenya by the Central Bureau of Statistics, CBS. We use only the
data from a single district.
In this section we review how you can open a workbook and navigate round it.
We assume that you are in Excel. (If Excel is not already open, then an alternative is to
use Windows Explorer. Find the file and double-click on it, to open Excel with the file
ready for use.) If Excel is open, then:
1. Click the Open button on the Excel toolbar. The Open dialogue box appears,
looking something like Fig. 4.1.
Fig 4.1 The Open Files dialogue (labelled: drop-down list with the folder tree, list of files in the current folder, selected file, button to open the selected file)
2. Click on the drop-down list to locate the folder with the Excel file to open.
3. Either double-click on the workbook you want to open, or click on it once and then
click on the Open button. The selected workbook, see Fig. 4.2, will open in Excel.
4. There are three ways to move from sheet to sheet in the workbook. Practice
each of these ways.
a. Click on a sheet tab at the bottom of the window, to move to the sheet you want.
b. Click on the worksheet navigation buttons.
c. Right-click on any of the worksheet navigation buttons, and select from
the list that will appear.
Fig. 4.2 Open workbook in Excel (labelled: active sheet, other sheets in the workbook, worksheet navigation buttons)
5. Within any worksheet, practice also with the vertical and horizontal scroll bars.
Check that you can answer the following questions:
a. From the worksheet with core information for the head of the household,
which we have called CoreHead, you will see that the data are from a
single district. Go to the sheet called codes to find which district this is.
b. Use the sheet with the codes to check how many districts were included
in this survey.
c. How many rows of data are there in the three sheets called CoreHead,
Expenditure and Agriculture? If they do not all have the same number,
can you think why?
d. Excel names the columns of data A, B, C, etc. What is the name of
the last column with data in the CoreHead sheet? So how many
columns of data are there in this sheet?
e. How many rows of data are there in the sheet called Coreperson? Could
you use this information to give you an idea of the average number of
people per household in the district?
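For question d, the column letters can be converted into a column count by treating them as a base-26 number in which A = 1, ..., Z = 26, so that AA is column 27. The short Python sketch below only illustrates that arithmetic; Python is not needed for this guide, and the function names are our own.

```python
def column_number(letters: str) -> int:
    """Convert Excel column letters such as "A", "AA" or "IV" to a column number (A = 1)."""
    number = 0
    for ch in letters.strip().upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column letter: {ch!r}")
        # Each letter is a base-26 digit with A = 1 ... Z = 26; there is no zero digit.
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def column_letters(number: int) -> str:
    """Inverse of column_number: 1 -> "A", 27 -> "AA", 256 -> "IV"."""
    if number < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


print(column_number("Z"))    # 26
print(column_number("AA"))   # 27
print(column_number("IV"))   # 256, the last column in the Excel versions this guide describes
print(column_letters(256))   # IV
```

Assuming the data start in column A with no empty columns, the number of the last column is also the number of columns of data; for example, a last column called AB would mean 28 columns.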
5. Basic workbook operations
We continue to use the workbook that was introduced in the previous section. In this
section we review how you can add and delete sheets. We also describe how data can
be copied and moved.
In this section and beyond it is important that you understand the logic of the tasks that
we ask you to do. Do not simply try to remember the route to do each task. In Windows
there are often different ways of doing the same task, and these will soon become
automatic for the common tasks.
The first task is to add a new sheet to the workbook. This is so that you can then copy
some data to this new sheet for your working.
1. Use Insert > Worksheet to insert a blank worksheet.
2. In the new worksheet, right-click on its tab at the bottom of the screen.
You should get a popup menu as shown in Fig. 5.1.
Fig. 5.1 Popup menu gives basic operations on the current sheet

3. Click on the option to Rename and call the sheet Temp to signify you will use
it for your temporary work.
4. We will now practice copying and pasting. Return to the sheet called
CoreHead. Make the top corner cell, <A1>, the active cell and then drag the
mouse down to the cell <D14>.
5. If that was easy, continue with this part; otherwise go straight to the copy
command at the end of this step. Press the CTRL key. With CTRL still pressed,
put the cursor on the cell <F1> and drag the mouse down to the cell <F14>, see
Fig. 5.2. Then use Edit > Copy to copy the selected parts of the sheet to the
clipboard. (Or use the Copy button on the toolbar, or press <Ctrl> <C>. If things
go wrong, then use the Undo button and try again.)
Fig. 5.2 Selecting parts of a worksheet (labelled: the cells that were dragged; the part where you used CTRL and then dragged)
6. Return to the new sheet that you called Temp and use Edit > Paste. (Or use the
Paste button on the toolbar, or press <Ctrl> <V>.)
7. Now we show a different way of pasting. Start by using the Undo button to
remove the contents you have just pasted. Then use Edit > Paste Special,
rather than the simple Paste. The dialogue is shown in Fig. 5.3. Here use
Paste Link, as shown.

Fig. 5.3 The Paste Special dialogue

Fig. 5.4 Linking information across worksheets (labelled: this cell is linked, and not just copied as the number 23)
8. Explain why Paste Link might be a useful feature in your future use of
Excel. Do you think it might have a downside?
We now show one way of moving a range of cells.
1. In the sheet called Temp, select the data in the first four columns.
2. Move the mouse to a border of the area that is selected. The pointer will change
from a + to an arrow.
3. Drag the selected area down 3 rows and across one column, as shown in Fig. 5.5.
4. Click on any cell to deselect the cell range.
5. When using Windows there are often different ways of doing the same thing.
Can you think of another way of moving a range of cells? Try it by pressing
the Undo button, and then repeating the move in the new way.
Fig. 5.5 Moving a range of cells (labelled: border of new destination, pointer is an arrow, tooltip to show destination)

(Answer to step 8 on Paste Link: it means the data are not duplicated, so any correction
in the original data will automatically change the working copy. On the other hand,
these links can sometimes slow the processing of the data, particularly with large files.)
(Hint for step 5 on moving cells: for example, you could try cutting the selected range,
and then pasting it at its new destination.)
You should save your work regularly when you make changes to a workbook. Excel,
and other Windows applications, offer you two options for saving. You can use the
Save command, or its shortcut, which keeps the name of the file. Or you can use the
Save As command, which allows you to give your new work a different name.
1. Click on File > Save As to open the corresponding dialogue box.
2. Check which folder you are saving your file in, see Fig. 5.6. If necessary
change the folder to the one you wish to use.
3. Give the file a new name, and click on the Save button.

Fig. 5.6 The Save As dialogue (labelled: folder in which the file will be saved, new name for the file)

Before finishing this section we suggest that you delete the temporary worksheet that
you created for your working.
1. Make sure that you are on the sheet called Temp.
2. Then use Edit > Delete Sheet. Excel will ask you if you really want to
continue. Say that you do.
3. Note that the Undo button on the toolbar is now inactive. This is an operation
that cannot be undone.
4. Can you think of another way you could have done this task?

You can right-click on the tab at the bottom of the sheet. This gives a popup menu and one option is to
delete the sheet.
6. Excel add-ins
Excel's capabilities can be extended through the use of special programs called
add-ins. These are then used in the same way as Excel itself. Some add-ins are supplied
with Excel, but are not automatically installed. Others are made available by other
organisations.
We check first that Excel's own Data Analysis Toolpack has been installed.
1. Click Tools > Data Analysis, if you can find the command called Data
Analysis. Otherwise see the steps below. If the command is there, the menu
shown in Fig. 6.1 will open. Click on Cancel, because we do not need these
features now.
Fig. 6.1 The data analysis toolpack.

2. As an exercise, click on Tools > Add-Ins. The dialogue shown in Fig. 6.2
appears. (You will have to do this if you could not find the Data Analysis
command on the Tools menu.)
Fig. 6.2 The Add-Ins dialogue (the entries may be different on each machine)
3. If you did not find the Data Analysis Toolpack earlier, then its entry in Fig. 6.2
will not be ticked. In that case, tick the entry and then press OK. Otherwise
press Cancel.
The second task is to install the add-in called SSC-Stat. This may already have been
done earlier, in which case you will have an extra menu, called SSC-Stat, every time
you load Excel.
If you do not have this extra menu, then the following steps will install the SSC-Stat
add-in.
1. The SSC-Stat add-in must first be installed on your computer. It is available on
CD, or can be downloaded from the web site www.reading.ac.uk/ssc. Once it is
downloaded, click on the file and it will install. You do this outside Excel.
2. Then go into Excel and use Tools > Add-Ins as before. Now use the Browse
button on the dialogue and move to the directory where the SSC-Stat file
was copied.
3. Double-click on the file called something like SSC-Stat.xla. Then click OK on
the Add-Ins dialogue and the extra menu should appear.
Add-ins are a very powerful feature of Excel. They enable third-parties to add to the
facilities provided by Excel, or to tailor existing facilities for particular users.
The SSC-Stat add-in is designed to encourage good statistics with Excel. It has
facilities to support the use of Excel for data manipulation, graphics and simple
analyses.
Fig. 6.3 SSC-Stat, showing the graphics menu.

It is used just like any other Excel menu, as indicated in Fig. 6.3.

If you do not remember where the add-in was copied, then you can search for the file. Search for all
files beginning with ssc-stat, e.g. ssc-stat*.xla, because the name may include the version number, for
example ssc-stat v2.0.xla.
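If you prefer to script the search described in the note above, the idea is a wildcard match on file names that ignores upper and lower case, as Windows does. The Python sketch below is only an illustration under those assumptions; the function names and the starting folder are hypothetical, and the Windows Search command works just as well.

```python
import fnmatch
import os


def is_ssc_stat_file(name: str) -> bool:
    """True for file names such as "SSC-Stat.xla" or "ssc-stat v2.0.xla" (case ignored)."""
    return fnmatch.fnmatchcase(name.lower(), "ssc-stat*.xla")


def find_ssc_stat(root: str) -> list:
    """Walk every folder below `root` and return the paths of matching add-in files."""
    matches = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if is_ssc_stat_file(name):
                matches.append(os.path.join(folder, name))
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical starting folder: change it to wherever the add-in may have been copied.
    for path in find_ssc_stat(r"C:\Program Files"):
        print(path)
```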

7. In conclusion
If you are a relative beginner with Excel, then we have introduced some of the key
aspects that apply whatever field of application you use the software for.
If you have used Excel before, then most of the material in this chapter should have
been familiar. However, Excel is a large package and existing users may have found
some points that were new, and that will help them to use the software more effectively.
One feature of Windows is that there is usually more than one way of accomplishing
the general tasks that have been introduced in this chapter. It is much more important
to understand the logic of the task than to remember the steps associated with one way
of doing the work. Always try to use logic and not memory.
A key feature of Windows is that once you have mastered one application, you will find
that others share many of its features. So, once you are confident in your use of Excel,
this should also help with your use of other software.

SSC-Stat has its own tutorial, which is loaded automatically when SSC-Stat is installed. The tutorial and
other help for the add-in are available from within SSC-Stat. Use SSC-Stat > General for access.
Introduction to Excel for Statistics

Part 2: Handling data
8. Introduction
In this part of the guide we look at the features of Excel that are particularly needed
when using Excel for handling and analysing data. Excel is an all-purpose tool, and
to use it effectively for statistical work you have to work with the same discipline that
would be automatic if you used a statistics package. We explain here what we mean
by this. For illustration we continue with the data set introduced in Part 1 of this guide.
One element of good practice is to avoid making any worksheet too complicated. It
is better to have workbooks with sheets that are simple and whose names show
clearly what they contain. Then it is easier to find a particular set of data, or a table of
results.
In the next section we describe the importance of keeping your data in what Excel calls
a List. Then we show briefly how data can be entered and checked in Excel.
The term meta-data is used for the information about the data themselves, for
example the units of measurement, or the explanation of each category of a column
that is coded 1 and 2. We consider how names are used in Excel, and also the value of
the feature of adding comments to a cell.
We then describe two powerful features that work only when the data are in the form of
a list. The first is to filter, or to examine a subset of the data, and the second is the
facility to sort columns.
In the remainder of this part of the guide we look briefly at Excel's system for
calculating functions and formulae, at protecting data and at how to cope with missing
values.
9. Structure of your data
For effective statistical work you need an efficient structure for holding, managing and
analysing the data. This is usually obtained by organising the data in rectangular
arrays.
In an array, usually:
Columns   Contain different measurements or information about the individuals,
          or units, being studied: the variables.
Rows      Contain all the information collected about a single individual: the cases.
Even more important for survey work is that Excel's powerful facility for tabulation requires data to be
in the form of a list.
The first row of the list contains labels for the columns. As an example we show part
of the data from the 1997 household survey. Four columns are shown, for just
10 of the 400 households in the district used.
Fig. 9.1 Data from the CBS survey is in list form.

These rectangular structures are called lists in Excel. They have the following advantages:
They allow the use of a range of database facilities.
They make it easy to exchange data with other software, such as databases and
statistical packages.
They give structure to the way data are handled. In particular, the use of lists
encourages the user to store information of the same kind within columns.
In the following table we show Excel's own guidelines for creating a list on a
worksheet, as they appear in the Help.

So a column of numbers should not contain text, and a text column should not include numbers that
will need summarising.

In this part of the guide we will see many facilities of Excel that rely on the data being
organised in a list. This starts with Excel's facilities for data entry, validation and
auditing, which we consider in the next section.
Microsoft Excel has a number of features that make it easy to manage and analyse
data in a list. To take advantage of these features, enter data in a list according to
the following guidelines.
List size and location
Avoid having more than one list on a worksheet. Some list management features,
such as filtering, can be used on only one list at a time.
Leave at least one blank column and one blank row between the list and other data
on the worksheet. Microsoft Excel can then more easily detect and select the list
when you sort, filter, or insert automatic subtotals.
Avoid putting blank rows and columns in the list so that Microsoft Excel can more
easily detect and select the list.
Avoid placing critical data to the left or right of the list; the data might be hidden
when you filter the list.
Column labels
Create column labels in the first row of the list. Microsoft Excel uses the labels to
create reports and to find and organise data.
Use a font, alignment, format, pattern, border, or capitalisation style for column
labels that is different from the format you assign to the data in the list.
When you want to separate labels from data, use cell borders and not blank rows or
dashed lines to insert lines below the labels.
Row and Column Contents
Design the list so that all rows have similar items in the same column.
Don't insert extra spaces at the beginning of a cell; extra spaces affect sorting and searching.
Don't use a blank row to separate column labels from the first row of data.
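Some of these guidelines can also be checked mechanically, for example on a sheet that has been saved as a CSV file. The Python function below is a hypothetical sketch, not an Excel feature: it assumes the cells arrive as rows of text with the labels in the first row, and it checks only a few of the rules (labels present and distinct, no blank rows, no leading spaces, no column mixing numbers and text). The HHID label in the example is made up.

```python
def _is_number(text: str) -> bool:
    """True if the cell text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_list(rows):
    """Return warnings for a few of the list guidelines.

    `rows` is a list of rows, each a list of cell texts, with the column labels in
    the first row (for example, rows read from a CSV export with the csv module).
    """
    if not rows:
        return ["the list is empty"]
    labels = rows[0]
    warnings = []
    # Column labels: every column should have one, and they should be distinct.
    for j, label in enumerate(labels, start=1):
        if not label.strip():
            warnings.append(f"column {j} has no label")
    if len(set(labels)) != len(labels):
        warnings.append("column labels are not unique")
    data_rows = []
    for i, row in enumerate(rows[1:], start=2):  # i is the worksheet row number
        if all(not cell.strip() for cell in row):
            warnings.append(f"row {i} is blank")  # blank rows split the list
            continue
        data_rows.append(row)
        for j, cell in enumerate(row, start=1):
            if cell != cell.lstrip():
                warnings.append(f"row {i}, column {j} starts with spaces")
    # Same kind of information in each column: flag columns mixing numbers and text.
    for j, label in enumerate(labels):
        values = [row[j].strip() for row in data_rows if j < len(row) and row[j].strip()]
        numeric = [_is_number(v) for v in values]
        if any(numeric) and not all(numeric):
            warnings.append(f"column {label!r} mixes numbers and text")
    return warnings


example = [
    ["HHID", "MEMBERS"],
    ["1", "4"],
    ["", ""],
    ["2", " 3"],
    ["3", "six"],
]
for warning in check_list(example):
    print(warning)
```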
10. Entering data
For large surveys, a system that is specially designed for data entry and verification is
usually used, rather than Excel. Hence we consider data entry only briefly here.
To see the sort of facilities for data entry in Excel you could try the following exercise:
1. Make a new sheet (as described in Part 1 of this guide).

Excel is commonly used for entering the data from small studies. We have a guide called Disciplined
Use of Spreadsheets for Data Entry, that describes the facilities available in Excel. This is produced in a
variety of forms and can be downloaded from www.reading.ac.uk/ssc if copies are not available locally.
2. Copy the first 21 rows and 8 columns from the sheet called Expenditure to this
new sheet.
3. Use Data > Form to produce the type of data entry form shown in Fig. 10.1.
Fig. 10.1 Data entry form on a subset from the expenditure survey

4. Use this form to explore the 20 records that you copied across.
5. Press New and enter the next record, which is 2, 1, Y, 23, 63, 3, 4, 3.65.
Excel also offers facilities for validation and auditing. To show how these are used we
assume that the number of people in a household is usually between 2 and 5. To set up
a validation rule:
6. Select the whole of column G, called MEMBERS, by clicking on the column
letter G.
7. Use Data > Validation and complete the dialogue as shown in Fig. 10.2.

Fig. 10.2 Setting a validation rule

We could use checks like this while we are entering data. They are also useful when
auditing data that have already been entered, as we illustrate below.
8. Use Tools > Auditing > Show Auditing Toolbar and click on the option to
Circle Invalid Data, as shown in Fig. 10.3.
9. Once you have seen the data that are outside the range, you can click on the
option to Clear Validation Circles, which hides the circles again.

Fig. 10.3 Auditing data that were already entered.
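The rule set up in step 7 is just a test applied to every value in the column, which is what Circle Invalid Data then displays. As an illustration only, the Python sketch below applies the same kind of test. It assumes the rule is "whole number between 2 and 5" (the exact settings are those in Fig. 10.2) and that blank cells are ignored; the function name and the values are hypothetical.

```python
def invalid_cells(values, low=2, high=5):
    """Return (worksheet row, value) for each value that breaks a
    'whole number between low and high' rule.

    The column label is assumed to be in row 1, so the first value is in row 2.
    """
    flagged = []
    for row, value in enumerate(values, start=2):
        if value is None or value == "":
            continue  # blank cells skipped, assumed to mirror Excel's "Ignore blank" option
        try:
            number = float(value)
        except (TypeError, ValueError):
            flagged.append((row, value))  # text is not a whole number
            continue
        if not number.is_integer() or not low <= number <= high:
            flagged.append((row, value))
    return flagged


# Hypothetical MEMBERS values, not taken from the survey
members = [4, 2, 7, 3, 5, 1, "?"]
print(invalid_cells(members))  # [(4, 7), (7, 1), (8, '?')]
```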

Finally in this section we look at how a column of data can be added, when it is a
regular sequence.
1. On the new sheet where you copied the 20 records, go to the top of the next free
column. This is probably column I. Type the label Index in cell I1.
2. Type the number 1 in the next cell down, cell I2, and 2 in cell I3.
3. Select the range I2:I3. Notice the small black box at the lower right-hand corner
of the selected cells. This is called the fill handle.
4. Move the mouse over the fill handle, so the pointer changes from a fat plus to a
simple +.
5. Click the mouse button and drag down to the end of the data. In Fig. 10.4 we
show the series when we are halfway down the column.

Fig. 10.4 Entering a sequence
Fig. 10.5 Finding the dialogue to enter patterned data

6. Now clear the data you have just filled, because we will show an alternative
way that is more flexible.
This second way is also an excuse to show one element of the Data Analysis Toolpack.
7. Use Tools > Data Analysis. This will give the dialogue shown in Fig. 10.5.
Use the option to generate random numbers, to give the dialogue shown in
Fig. 10.6.

(You are not going to generate random numbers, but to produce data with a regular pattern.)

Fig. 10.6 Generating a regular sequence

8. In the dialogue in Fig. 10.6, specify the distribution as Patterned. Complete the
dialogue as shown in Fig. 10.6 and press OK.
This facility to generate patterned data is very powerful. Also the dialogue gives more
control than you would have by dragging the mouse, particularly when the columns of
data are long, as they often are with large surveys.
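The Patterned option asks for a first value, a last value, a step, how many times to repeat each number, and how many times to repeat the whole sequence. To show what those settings produce, here is a Python sketch that builds the same kind of column. It is our own illustration, not the Toolpack's code; the function and parameter names are hypothetical, and it assumes a positive step.

```python
import math


def patterned(start, stop, step=1, repeat_each=1, repeat_sequence=1):
    """Values from start to stop in the given step, each value repeated
    repeat_each times, and the whole sequence repeated repeat_sequence times."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Number of distinct values; the small tolerance guards against rounding with decimal steps.
    count = math.floor((stop - start) / step + 1e-9) + 1
    one_pass = []
    for k in range(max(count, 0)):
        one_pass.extend([start + k * step] * repeat_each)
    return one_pass * repeat_sequence


print(patterned(1, 20))                    # the numbers 1 to 20: an index for 20 records
print(patterned(1, 3, repeat_each=2))      # [1, 1, 2, 2, 3, 3]
print(patterned(1, 3, repeat_sequence=2))  # [1, 2, 3, 1, 2, 3]
```

Repeating each number is useful when, for example, every household appears on several consecutive rows.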
11. Comments and names
Adding comments to cells is a very useful feature of Excel. As a simple exercise we
will add comments to some of the column names.
1. In the worksheet that you made in the last section, go to the cell H1, which is
called ADULTQ.
2. Right click to give the popup menu and take the option called Insert Comment.
3. The data dictionary states that this column is the number of adult-equivalent
people in the household. So add this comment, as shown in Fig. 11.1.

Fig 11.1 Adding a comment to a cell
Fig. 11.2 Options with comments

4. When you return to this cell, the comment will be shown as in Fig. 11.1.
5. When you are in a cell with a comment, right-click again. The popup menu is
given in Fig. 11.2 and shows that you can now edit, or delete the comment.
6. Add comments to some more cells. They can also be cells containing data that
you think need some further explanation.
7. If you have a large file with comments, it is sometimes useful to jump from
comment to comment. How could you find out if this is possible in Excel?
When the data are in an Excel list, then the first row gives the names of the columns.
These names can simplify the use of Excel for simple statistics. They are used by
Excel, and by various add-ins, including SSC-Stat.
1. Go to the sheet where you copied the data.
2. Use SSC-Stat > Analysis > Descriptive statistics.
3. The dialogue is shown in Fig. 11.3. Notice, from the variables that are listed,
that the dialogue has picked up the names of the columns.

It is possible. If you ask Excel's Help about Comment, it gives many options, including one to
select cells that contain comments. Then follow the instructions given.
This exercise assumes that SSC-Stat has been added to Excel. If not, then just read the exercise.

Fig. 11.3 SSC-Stat dialogue uses column names
Fig. 11.4 Results on a new worksheet

4. Select the ADULTQ column, as shown in Fig. 11.3 and press OK.
5. The results are shown in Fig. 11.4. Notice that an extra sheet has been
generated for the results and given the name, DescStat.
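SSC-Stat does these calculations for you. Purely to show what a few of the usual summaries are, here is a Python sketch using the standard statistics module. It is not SSC-Stat's code, the statistics it reports may differ from those in Fig. 11.4, and the ADULTQ values are invented.

```python
import statistics


def describe(values):
    """A few common summary statistics for one numeric column.

    Blank cells, given here as None, are left out.
    """
    data = [v for v in values if v is not None]
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "std dev": statistics.stdev(data),  # sample standard deviation (divisor n - 1), as in Excel's STDEV
        "minimum": min(data),
        "maximum": max(data),
    }


# Hypothetical ADULTQ values, invented for illustration (not survey data)
adultq = [3.2, 4.5, None, 2.0, 5.1, 3.8]
for name, value in describe(adultq).items():
    print(f"{name:8s} {round(value, 3)}")
```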
The SSC-Stat add-in makes an effort to work with column names. It is also useful to
use column names generally in Excel, so we will register the names of all the columns
in this sheet.
1. Go back to the sheet containing the 20 records you copied over.
2. Select all the data, from A1:I22.
3. Use Insert > Name > Create.
4. Check the dialogue is as in Fig. 11.5, so the names will be taken from the top
row of the data. Then press OK.

Fig. 11.5 Creating names from a selected array

5. To check that the names have been defined, press the down arrow on the Name
Box. (It is the box showing A1 in Fig. 11.5.)
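In effect, Insert > Name > Create with the top row ticked gives each column label a name for the data cells under it. The Python sketch below works out those references for the three columns whose labels we know from this guide (MEMBERS in G, ADULTQ in H, Index in I) within the selection A1:I22. It is only an illustration: the function names are our own, Excel's names also include the sheet name, which we leave out, and labels that are not valid names are adjusted by Excel in ways this sketch does not attempt.

```python
def column_letters(number: int) -> str:
    """1 -> "A", 26 -> "Z", 27 -> "AA"."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def names_from_top_row(labels, first_col, top_row, bottom_row):
    """Illustrate Insert > Name > Create with "Top row" ticked.

    Each label becomes a name for the cells below it in the same column of the
    selection, written as an absolute reference such as $G$2:$G$22.
    """
    names = {}
    for offset, label in enumerate(labels):
        col = column_letters(first_col + offset)
        names[label] = f"${col}${top_row + 1}:${col}${bottom_row}"
    return names


# Columns G, H and I of the selection A1:I22 (column G is number 7)
for name, ref in names_from_top_row(["MEMBERS", "ADULTQ", "Index"], 7, 1, 22).items():
    print(name, "refers to", ref)
```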
We have therefore seen that sheets can be named, and so can columns. We will
use these names in later sections. It is sometimes also useful to name the rectangle
containing the data, as we show in the next exercise.
1. Select the data on the sheet, if they are not already selected.
2. Use Insert > Name > Define and name the rectangle as Sample.
3. Go to another sheet, such as the sheet with the codes, see Fig. 11.5 and use the
pull-down name box.
4. Select the name Sample as shown in Fig. 11.5 and you return straight to the
rectangle that you named earlier.

Fig. 11.5 Using the name of an array

5. In this case the rectangle contains all the information on the sheet, but this is not
always so. With the data selected, move the rectangle down, so that it starts on
the 4th row.
6. Now add some information about the data in the top rows, as is often done. An
example is shown in Fig. 11.6.
Fig. 11.6 Adding metadata to an Excel worksheet (labelled: the column names are coloured and in bold; the data have been moved down three rows; the tool used to colour the cells)
7. In Fig. 11.6 we have also used the opportunity to colour the cells, and some of
the text. You could do the same, but this is an optional extra.

If the formatting tools are not visible in your version of Excel, then use View > Toolbars and tick the
formatting toolbar.
8. Now move to another sheet and again select the name Sample as before. You
will return to the same sheet, with the rectangle selected as before.
9. If you added the rows and coloured the cells, then you could practice returning
the sheet to its original form.
12. Selecting subsets (Filters)
Filters display a subset of cases in a list. Excel help describes how to use AutoFilter
roughly as follows:
a) Click a cell in the list you want to filter.
b) Use Data > Filter > AutoFilter.
c) To display only the rows that contain a specific value in one column, click the arrow
at the top of the corresponding column.
d) Click the appropriate value.
e) To apply an additional condition based on a value in another column, repeat steps c)
and d) in the other column.
f) To filter the list in a slightly more complicated way, click the arrow at the top of the
column, and then click Custom.
You can apply up to two conditions to a column with AutoFilter. If you need to do something more
complicated, you can use advanced filters.
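The Custom AutoFilter dialogue combines at most two conditions on one column, joined by And or Or. The short Python sketch below is not part of the Excel workflow; it only mimics that logic on made-up household sizes (the list `members` and the helper `custom_filter` are hypothetical), returning the positions of the rows a filter would keep.

```python
# Hypothetical household sizes, one per row of the list (not the survey data).
members = [4, 7, 2, 6, 3, 6, 5, 1, 6, 7]

def custom_filter(values, cond1, cond2=None, join="and"):
    """Return the 0-based positions of the rows kept by a filter with at most
    two conditions on one column, joined by "and" or "or"."""
    kept = []
    for row, v in enumerate(values):
        ok = cond1(v)
        if cond2 is not None:
            ok = (ok and cond2(v)) if join == "and" else (ok or cond2(v))
        if ok:
            kept.append(row)
    return kept

# One condition: "is greater than or equal to 6".
six_plus = custom_filter(members, lambda v: v >= 6)
# Two conditions: ">= 3" And "<= 5".
mid_sized = custom_filter(members, lambda v: v >= 3, lambda v: v <= 5, join="and")
print(six_plus)   # [1, 3, 5, 8, 9]
print(mid_sized)  # [0, 4, 6]
```

A condition needing three or more clauses would not fit this two-condition shape, which is why the guide points to advanced filters or a helper column for such cases.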

In the following exercise we just select the households with 6 people or more.
1. Click anywhere in the data and use Data > Filter > AutoFilter.
2. Click on the arrow at the top of the column called MEMBERS and then on the
option called Custom, see Fig. 12.1.

If you are still practising basic Windows operations, try removing the added rows in different ways. For
example, simply marking the rows and pressing <Delete> removes the contents, but not the rows themselves.
Try Edit > Delete for more options. Then use Undo, mark the whole of the first 3 rows by clicking the first
row and then using <Shift>+<Click> on the third, and use Edit > Delete. There is also the option of
marking the rows and then using a right click to explore!

Fig. 12.1 Applying a custom filter

3. Complete the resulting dialogue as shown in Fig. 12.3.
4. Fig. 12.4 then shows that just 5 households satisfied this criterion.
Fig. 12.3 The custom filter dialogue

Fig. 12.4 The filtered data (callouts: the selected rows; the arrow is blue to show the column used in the filter)

If you need to apply a more complicated condition, and wish to avoid using Excel's
advanced filters, an alternative is to use Excel's powerful calculation features to
make a new column that corresponds to the condition you need. We illustrate this below,
and look in more detail at Excel's functions in Section 14.
5. Use Data > Filter > AutoFilter again, to turn off the filter.
6. Go to the top of the next empty column, probably column J. In cell J1, type the
column name Sixplus.
7. Go to cell J2, type the formula =(MEMBERS>=6) and then press <Enter>. See
Fig. 12.5.
Fig. 12.5 Entering a logical calculation

8. Drag down, as shown in Fig. 12.5, so that the formula fills the whole column.
9. Now try filtering again, but this time just select the rows where the column called
Sixplus is TRUE.
The value of this approach is that the condition can be made more complicated, and it
is still easy to use Excel's simple AutoFilter. For example, the condition in step 7
above could be MEMBERS>=MEDIAN(MEMBERS), to show which families have at least the
median household size.
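The helper-column idea can be sketched outside Excel as a list of TRUE/FALSE flags, one per row, which a filter then uses to decide which rows to show. This Python sketch uses made-up household sizes (the list `members` is hypothetical) and the standard-library `statistics.median` in place of Excel's MEDIAN.

```python
import statistics

# Hypothetical values standing in for the MEMBERS column.
members = [4, 7, 2, 6, 3, 6, 5, 1, 6, 7]

# Like =(MEMBERS>=6) filled down a new column: one True/False per row.
sixplus = [m >= 6 for m in members]

# Like =(MEMBERS>=MEDIAN(MEMBERS)).
med = statistics.median(members)
at_least_median = [m >= med for m in members]

# Filtering on the helper column keeps only the rows flagged True.
kept = [m for m, flag in zip(members, sixplus) if flag]
print(med)   # 5.5
print(kept)  # [7, 6, 6, 6, 7]
```

However complicated the condition becomes, the filter itself only ever has to select one value (True) in one column.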
Be aware that filters only affect the way data are displayed, not the actual contents of
the list. Any statistical analysis using Excel's own statistical toolbox will still include
the data that are filtered out. To analyse just the subset that you see, it is necessary to
copy the filtered list into a new worksheet and then produce the required analysis. This
extra step is not needed if you use the SSC-Stat add-in: then what you see is what is
analysed.
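The point that a filter hides rows without removing them can be illustrated with a small Python sketch on made-up household sizes (all names here are hypothetical): a calculation over the whole list still sees every row, and only an explicit copy of the visible rows gives the subset analysis.

```python
# Hypothetical household sizes; the filter only decides which rows are shown.
members = [4, 7, 2, 6, 3, 6, 5, 1, 6, 7]
visible = [m >= 6 for m in members]   # rows left on screen by the filter

# An analysis that works on the whole list still uses every row ...
total_all = sum(members)
# ... so to analyse the subset, copy the visible rows to a new list first.
subset = [m for m, shown in zip(members, visible) if shown]
total_subset = sum(subset)
print(total_all, total_subset)  # 47 32
```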

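This shows again the value of using names, because the formulae are so much easier to understand. A
further example is the condition MEMBERS>=PERCENTILE(MEMBERS,0.8), to identify roughly the largest 20%
of household sizes, and so on. The second argument of PERCENTILE is a fraction between 0 and 1, so the
80th percentile is entered as 0.8, not 80.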
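To see what the percentile condition computes, here is a Python sketch of the linear-interpolation ("inclusive") percentile, which we believe matches Excel's PERCENTILE (called PERCENTILE.INC in newer versions). The function name `percentile_inc` and the data are hypothetical; treat this as an illustration of the calculation, not as Excel's own code.

```python
import math

def percentile_inc(values, k):
    """Percentile by linear interpolation, with 0 <= k <= 1 (0.8 = 80th)."""
    if not 0 <= k <= 1:
        raise ValueError("k must lie between 0 and 1, e.g. 0.8 rather than 80")
    data = sorted(values)
    pos = k * (len(data) - 1)           # 0-based position in the sorted list
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])

members = [4, 7, 2, 6, 3, 6, 5, 1, 6, 7]   # hypothetical household sizes
p80 = percentile_inc(members, 0.8)
top = [m for m in members if m >= p80]
print(round(p80, 2), top)  # 6.2 [7, 7]
```

With only a few distinct household sizes, ties mean the condition selects about 20% of households rather than exactly 20%.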
13. Sorting
Perhaps the data would be easier to check if the respondents were sorted on household size.
1. Make the active cell somewhere in the column called MEMBERS and use
Data > Sort.
2. Sort in descending order, see Fig. 13.1. Press OK to give the data sorted as
shown in Fig. 13.2.
Fig. 13.1 Sort dialogue
Fig. 13.2 Data sorted on household size

3. In Fig. 13.2 you can see clearly that there are up to 7 people in a household,
and that 5 households have six people or more.
4. Now put the data back into their original order.
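Sorting, and then restoring the original order from an index column, can be sketched in Python as below. The rows are hypothetical (index, household size) pairs; the key point is that sorting on the index column undoes any earlier sort, whatever it was.

```python
# Hypothetical rows: (index, household size). The index column records the
# original order of the respondents.
rows = [(1, 4), (2, 7), (3, 2), (4, 6), (5, 3), (6, 6)]

# Like Data > Sort on MEMBERS, descending. Python's sort is stable, so rows
# with equal sizes keep their original relative order.
by_size = sorted(rows, key=lambda r: r[1], reverse=True)

# Sorting on the index column puts the data back in their original order.
restored = sorted(by_size, key=lambda r: r[0])
print(by_size)           # [(2, 7), (4, 6), (6, 6), (1, 4), (5, 3), (3, 2)]
print(restored == rows)  # True
```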
14. Formulae and functions
To illustrate the use of Excel for calculation, we consider the expenditure of households
on primary education. In the Expenditure sheet the column QD5.1 gives the expenditure
on fees, while QD5.4, QD5.7, QD5.10, QD5.13 and QD5.16 give the expenditure on
uniforms, books, transport, food and harambee (fund raising) respectively.
Our task is first to calculate the total expenditure on primary education, and then the
percentage on books.
1. On the Expenditure sheet, select the data, and then use Insert > Name > Create
to name the columns of data.
2. In the first row of the next empty column (cell CB1 for us), type the name Primary.

Hint: Luckily you constructed an index column (see Fig. 13.2), which was originally in ascending order, so
you can sort on this column. Alternatively, as you have only just done the sorting, you could use Undo.
3. In the second row enter =qd5.1+qd5.4 and press <Enter>. We got 1300 for
the sum.
4. Now complete the formula to add the other columns.
5. Then drag down to the bottom of the data. This should give the extra column; see
Fig. 14.1.

6. The next calculations will be simpler if we define a name for the new column.
So select the whole column and use Insert > Name > Define.
7. Now go to an empty cell. We will use the SUM function, so type =sum(qd5.7)
to give the total expenditure on books.
8. In the next cell type =sum(Primary).
9. Then, in a third cell, divide the first result by the second.
10. Finally, do the same again, but in a single calculation, as in Fig. 14.2. Overall,
31% of the expenditure on primary education was on books.
Fig. 14.2 Calculations on expenditure for primary education (callout: percentage on books)

Next we might be asked what proportion (or percentage) of households spent money on
primary education.
11. One way to do this is to use the COUNTIF function. You could try using the
function wizard for this, or choose an empty cell and type =COUNTIF(Primary,">0").
12. Then also calculate the overall count, i.e. =COUNT(Primary).
13. Finally the percentage is given by dividing one by the other. We got
100*199/402 = 49.5%.
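The calculations in steps 3 to 13 can be mirrored in a short Python sketch. The three households below are invented for illustration (they are not the survey values, so the results differ from the 31% and 49.5% above); only the column names follow the Expenditure sheet. The criterion ">0" is our assumption for "spent money".

```python
# Hypothetical expenditure rows (not the survey data), one dict per household,
# keyed by the column names of the Expenditure sheet.
items = ["QD5.1", "QD5.4", "QD5.7", "QD5.10", "QD5.13", "QD5.16"]
households = [
    dict(zip(items, [500, 800, 300, 0, 0, 100])),
    dict(zip(items, [0, 0, 0, 0, 0, 0])),
    dict(zip(items, [200, 0, 400, 0, 0, 0])),
]

# The Primary column: =qd5.1+qd5.4+... filled down, one total per household.
primary = [sum(h[c] for c in items) for h in households]

# Percentage of all primary-education spending that went on books (QD5.7).
pct_books = 100 * sum(h["QD5.7"] for h in households) / sum(primary)

# Households that spent something, as a percentage of all households.
spenders = sum(1 for p in primary if p > 0)   # like COUNTIF(Primary,">0")
pct_spenders = 100 * spenders / len(primary)  # len() acts as COUNT (all numeric)
print(primary, round(pct_books, 1), round(pct_spenders, 1))  # [1700, 0, 600] 30.4 66.7
```

Note that the books percentage is a ratio of column totals, not an average of each household's own percentage; the latter would fail for households that spent nothing (0/0).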
15. Missing values
Missing values are common when analysing data. They are pieces of information that
are related to the variable of interest but that are of a different nature from the main
bulk of the data. In surveys they can arise because:
The respondent refused to reply
The respondent did not know
The question was not applicable
In general they can indicate that information was not available, that something was
impossible to measure, or that a mathematical calculation was impossible, such as 2/0.
An example is in Fig. 14.2, where 0/0 was given as #DIV/0!
The usual practice is to assign specific codes to different types of missing values.
These codes are often numbers that do not occur in the variable. For example, in Fig.
15.1 we see that the number of acres of agricultural land includes missing values, coded
as 999.90. From Fig. 15.2 we see that there were 4 missing values in this district.
Fig. 15.1 Missing values in the survey
Fig. 15.2 Showing 4 cases were missing

Unlike statistics packages, Excel does not have specific facilities to handle missing
values. The user has to make sure that any analysis treats missing values appropriately.
One approach is to leave the cells blank, and insert a comment to indicate the reason for
the missing value.
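Why the user must handle missing-value codes explicitly is easy to see in a small Python sketch. The acreages are invented; only the code 999.90 comes from Fig. 15.1. If the code is treated as a real number, the mean is badly distorted, so the coded cells must be excluded (in Excel, blank cells are ignored by functions such as SUM and AVERAGE, which is one reason for the blank-and-comment approach).

```python
import statistics

MISSING = 999.90   # code used for missing acreage in this survey (Fig. 15.1)

# Hypothetical acreages for one district; two values are missing.
acres = [2.5, 999.90, 1.0, 4.0, 999.90, 0.5]

# Treating the code as a real value distorts the mean ...
naive_mean = statistics.mean(acres)

# ... so exclude the coded cells before summarising.
valid = [a for a in acres if a != MISSING]
n_missing = len(acres) - len(valid)
print(n_missing, statistics.mean(valid))  # 2 2.0
```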
16. In conclusion
Some readers may be surprised that this guide is mainly concerned with data
manipulation, rather than analysis. However we find that organising the data is often
the most time-consuming part of data processing.
At the start of this part of the guide we stated that individual sheets should remain
simple. One aspect of this simplicity is that results are best kept on sheets separate
from those with the data. We did not do this ourselves when examining the expenditure
on primary education in Section 14 (see, for example, Fig. 14.2), because for these
small calculations it was more convenient to use part of the same sheet. However,
now that those calculations are done, the results can easily be cut and pasted to a new
sheet, as we show in Fig. 16.1.
Fig. 16.1 Putting results on a separate sheet

Sometimes you may want to show the actual values, perhaps sorted, together with summary
results for different groups, as separate rows on the sheet. In this case we suggest that
you first make a copy of the data sheet, and then mix data and results on the copy.
There is then always a master sheet containing just the data.
Excel is often used for tabulating data and for graphical work. These activities have
their own guides.

This happens automatically when a statistics package is used for the analysis. Such packages normally
have a spreadsheet-type display for the data, with a separate window for the results.

Biometry Unit Consultancy Services
College of Agriculture & Veterinary Sciences
University of Nairobi, Kabete Campus
E mail: bucs@uonbi.ac.ke
Website: http://www.uonbi.ac.ke/acad_depts/BUCS


Statistical Services Center
School of Applied Statistics, The University of Reading
Reading RG6 6FN, UK
Website: http://www.reading.ac.uk/ssc