The Fundamentals of Pointers

Written by Mike James


Although pointers have long been regarded as "dangerous", they are still deeply
embedded in the way we do things. Much of the difficulty in using them stems from not
understanding where they come from. Pointers are a sophisticated abstraction that wraps
some fundamentals of assembly language.

The whole concept of a pointer is bound up in the idea of a memory location and its address.

Dangerous or not, pointers are still essential programming material.

In modern programming the pointer has been transmuted into the "safe pointer" and more
recently into the managed reference - but it's still a pointer.

Let's look at the idea that is at the bottom of it all.

Pointers are natural


In assembly language you refer to a memory location by its address, for example 2000 refers to
the memory location at address 2000. The confusion inherent in the idea of a pointer starts at this
early stage of development.

Are you talking about the thing itself, i.e. the number 2000, or the thing stored at memory
location 2000?

For example, does:

LDA 2000

mean "load the A register with the value 2000" or "load the A register with the contents of
address 2000"?

The ambiguity is usually resolved by using an extra symbol if you mean the numeric value:

LDA #2000

means "load the A register with the value 2000" and

LDA 2000

means "load the A register with the contents of address 2000".
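
To make the distinction concrete, here is a minimal sketch of a toy machine in C. The Machine type and the lda_immediate and lda_direct functions are invented for illustration and don't correspond to any real processor's instruction set.

```c
#include <stdint.h>

/* A toy machine: 4096 words of memory and a single A register.
   Every name here is hypothetical, chosen only to mirror the text. */
enum { MEM_WORDS = 4096 };

typedef struct {
    uint16_t mem[MEM_WORDS];
    uint16_t a;                 /* the A register */
} Machine;

/* LDA #n : load the operand itself (the numeric value). */
void lda_immediate(Machine *m, uint16_t n) {
    m->a = n;
}

/* LDA n : load the contents of the memory location at address n. */
void lda_direct(Machine *m, uint16_t addr) {
    /* Wrap the address so the toy never reads outside its memory. */
    m->a = m->mem[addr % MEM_WORDS];
}
```

If location 2000 holds 42, then lda_immediate(&m, 2000) leaves 2000 in the A register while lda_direct(&m, 2000) leaves 42 there.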

Indirect addressing

Once you know the rules it's easy enough but mistakes are still common - especially when you
start using indirect addressing.

Indirect addressing is where the value stored in a memory location is treated as the address of
another memory location, and it is a common feature of most hardware.

So for example, the command:

LDA @2000

would mean load the A register with the value stored in the memory location whose address is
stored in the memory location at address 2000.

The idea is that the value stored in a memory location can be data or it can be an address of
another memory location. Direct addressing puts the address that the data is stored at in the
instruction.

instruction address---------->data

Indirect addressing puts the address of a location in the instruction, and that location holds
the address of the location that holds the data.

instruction address--------->address---------->data
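
The two diagrams can be sketched in C by treating memory as a plain array. This is only an illustration: the names CELLS, load_direct and load_indirect are made up, and the array stands in for real hardware memory.

```c
#include <stddef.h>

enum { CELLS = 4096 };   /* size of the pretend memory (an assumption) */

/* Direct: the operand is the address of the data. */
unsigned load_direct(const unsigned mem[CELLS], size_t addr) {
    return mem[addr % CELLS];
}

/* Indirect: the operand is the address of a cell that holds
   the address of the data, so two lookups are needed. */
unsigned load_indirect(const unsigned mem[CELLS], size_t addr) {
    size_t target = mem[addr % CELLS];   /* first hop: fetch the address */
    return mem[target % CELLS];          /* second hop: fetch the data   */
}
```

With 3000 stored at location 2000 and 7 stored at location 3000, load_direct(mem, 2000) gives 3000 and load_indirect(mem, 2000) gives 7.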

Confused?

Well so were thousands of novice assembly language programmers.

Redirection is where things get just complicated enough for mistakes to be the rule rather than the
exception, and redirection is something pointers allow you to do without thinking twice.

And once you have redirection you can easily invent re-redirection and so on - each one more
difficult and dangerous than the last.
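
In C, re-redirection shows up as a pointer to a pointer. The following is a rough sketch with hypothetical helper names (follow_once, follow_twice, retarget), intended only to show how each extra level adds another place where things can change under you.

```c
/* One level of redirection: p holds the address of an int. */
int follow_once(int *p) {
    return *p;
}

/* Two levels: pp holds the address of a pointer that holds
   the address of an int. */
int follow_twice(int **pp) {
    return **pp;
}

/* Changing the middle link silently changes what every route
   through it reaches. */
void retarget(int **pp, int *new_target) {
    *pp = new_target;
}
```

After a call to retarget, code that still holds the inner pointer sees different data without any of its own variables having been touched, which is one reason the extra levels are harder to reason about.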

Pointer Abstractions

When high level languages got going, addresses and the whole idea of memory
locations were hidden behind the facade of the variable.

When you use a variable you are using the address of a memory location as part of an
instruction.

There is no doubt that when you write something like

TOTAL=SUM+10

you are referring to the contents of SUM and TOTAL and there is no hint of addresses or
redirection. The addresses are still there but they have been abstracted away into the idea of a
variable.

Many high level languages stop right there.

But some don't - they re-invent the whole concept of addressing and indirection by way of
pointer variables. A pointer variable implements indirection by being a storage location that has
the address of another storage location or in this case variable.

That is, a pointer variable contains the address of another variable.
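
As a small illustration in C (used here only as a convenient language with explicit pointers), the TOTAL=SUM+10 calculation can be written both ways. The function names are invented for this sketch.

```c
/* TOTAL = SUM + 10 using the variable's value directly. */
int add_ten_by_name(int sum) {
    return sum + 10;
}

/* The same calculation, but reaching SUM through a pointer
   variable that contains SUM's address. */
int add_ten_by_pointer(const int *sum_ptr) {
    return *sum_ptr + 10;
}

/* Storing through a pointer changes the variable it points at. */
void store_through(int *target, int value) {
    *target = value;
}
```

Given int sum = 5 and a pointer set to &sum, both functions return 15, and store_through(p, 20) changes sum itself to 20.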

Pascal was one of the first truly high level languages to include pointers, and it did so from very
early in its development. The pattern it adopted was used by C# and many other modern
languages. It is worth seeing how it implemented pointers.

Of course Pascal being a strongly typed language means that pointers are typed as well.

That is, a pointer can only point to a variable of one type.

This is a strange idea at first because all pointer variables are pointers and they store
addresses so you might think that in a simple world all pointer variables would be of the same
type - pointer say. But it turns out to be better to make the type of a pointer include what it points
at. So instead of a simple pointer type you have pointer to integer, pointer to float, pointer to
string and so on.
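
C follows the same pattern, so it can serve as a quick sketch of why the pointed-at type matters. The two reader functions below are hypothetical, and the point is simply that each one knows how many bytes to fetch and how to interpret them.

```c
/* A pointer to int can only be dereferenced as an int... */
int read_int(const int *p) {
    return *p;
}

/* ...and a pointer to double only as a double. */
double read_double(const double *p) {
    return *p;
}

/* Passing the address of a double to read_int makes the compiler
   complain about incompatible pointer types, which is the check
   that a single untyped "pointer" type could not provide. */
```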

For example,

var a:^integer;

declares a to be a pointer to an integer and nothing but an integer.

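A pointer that isn't pointing to anything holds the special value nil. Strictly speaking, standard
Pascal leaves a newly declared pointer undefined until you assign nil to it or call NEW, which is
if anything worse. This is where we meet the first big problem with pointers - they don't always
point at anything!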

To give it something to point at you have to use the procedure NEW in Pascal. The statement
NEW(pointer) allocates enough storage for the type of data that the pointer is supposed to point
at.

For example, NEW(a) would allocate enough storage for an integer and set a to point at it.
Notice that the type and amount of storage allocated by NEW is determined by the type of the
pointer or rather what it points at.
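
A rough C counterpart of NEW(a) might look like the sketch below. The function new_int is made up for illustration, and zeroing the new storage is a choice made here, not something Pascal's NEW promises.

```c
#include <stdlib.h>

/* Allocate storage for one int and return a pointer to it,
   or NULL if the allocation fails. */
int *new_int(void) {
    /* The amount of storage comes from the type the pointer
       points at, not from the pointer itself. */
    int *a = malloc(sizeof *a);
    if (a != NULL) {
        *a = 0;   /* assumption: start from a known value */
    }
    return a;
}
```

The caller is responsible for checking the result against NULL before using it and for handing the storage back with free when it is finished.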

You can deallocate the storage that a pointer is pointing at using the complementary procedure
DISPOSE. That is, DISPOSE(a) frees the storage that a is pointing at for reuse. Apart from
dereferencing and comparing for equality, assignment is the only legal pointer operation - there
is no pointer arithmetic.

For example, if a and b are integer pointers then

NEW(a);
b:=a

results in a and b pointing to the same area of storage.

To refer to the value actually stored in the area of memory that the pointer points at you have to
use the ^ symbol in Pascal.

This is often called the dereferencing operator.

So a is a pointer to an area of memory that holds an integer and a^ is the actual value stored
there.

If you have followed the ideas so far you should be able to tell me the difference between:

a:=b

and

a^:=b^

The first makes a and b point at the same area of memory and the second one makes the area of
memory that a points at hold the same value as the area of memory that b points at.
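
The same two assignments can be sketched in C, where a:=b becomes a = b and a^:=b^ becomes *a = *b. The helper names are invented, and assign_pointer takes the address of a so that it can change the caller's pointer.

```c
/* a := b   - copy the address, so both pointers refer to the
   same storage afterwards. */
void assign_pointer(int **a_ref, int *b) {
    *a_ref = b;
}

/* a^ := b^ - copy the value pointed at; the pointers themselves
   still refer to separate storage. */
void assign_value(int *a, const int *b) {
    *a = *b;
}
```

After assign_value, a later change to what b points at is not seen through a. After assign_pointer, it is, because there is now only one area of storage involved.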

The most common error that beginners make is a:=b^, i.e. they try to assign the value pointed at
to the pointer. Being strongly typed, Pascal picks this error up at compile time.

Notice that even though Pascal is a high level language it is easy to fall into the habit of referring
to areas of memory. The idea of an address and indirection is lurking just below the surface. But
it is possible to describe all of this without such primitive concepts.

All you need to avoid bringing "memory" into the picture is the idea of an anonymous variable,
i.e. a variable without a name. In this way of explaining things NEW(a) creates an anonymous
integer variable that a is set to point at. Notice that even though this description is slightly higher
level it is still possible to make very strange errors using pointers to an anonymous variable.

For example, it is quite possible to lose an anonymous variable by overwriting all of the pointers
to it!

Another favourite error is to DISPOSE of the memory that a pointer is pointing at but then still
carry on using it - DISPOSE doesn't change the value of a pointer!
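
C's free behaves the same way, leaving the pointer holding a stale address. One common defensive habit, sketched here with hypothetical helpers dispose_int and read_or_default, is to reset the pointer as part of disposing of it.

```c
#include <stdlib.h>

/* Release the storage that *p_ref points at and reset the pointer,
   so that later code can at least detect that it points at nothing. */
void dispose_int(int **p_ref) {
    free(*p_ref);
    *p_ref = NULL;
}

/* Read through a pointer only if it points at something. */
int read_or_default(const int *p, int fallback) {
    return (p != NULL) ? *p : fallback;
}
```

This only protects the one pointer passed in. Any other pointer that was assigned the same address is left dangling, which is exactly the error described above.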

In short, Pascal programmers discovered all of the errors that plague the use of pointers -
dereferencing null pointers and dereferencing pointers that no longer point to valid data.

Using NEW and DISPOSE a Pascal programmer can create dynamic data structures such as
strings, linked lists, stacks and so on. This ability to create such dynamic data structures is the
main reason for the existence of pointers in programming languages. The only real alternative to
using pointers is to provide advanced dynamic data structures as standard types.
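
As an illustration of the kind of structure this makes possible, here is a minimal stack built as a linked list in C, with malloc and free playing the roles of NEW and DISPOSE. The Node type and the push and pop functions are a sketch written for this example, not code from any particular library.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Each node holds a value and a pointer to the node beneath it. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Push a value; returns false if no storage could be allocated. */
bool push(Node **top, int value) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) {
        return false;
    }
    n->value = value;
    n->next = *top;   /* new node points at the old top */
    *top = n;         /* and becomes the new top        */
    return true;
}

/* Pop the top value into *out; returns false if the stack is empty. */
bool pop(Node **top, int *out) {
    Node *n = *top;
    if (n == NULL) {
        return false;
    }
    *out = n->value;
    *top = n->next;
    free(n);          /* hand the node's storage back */
    return true;
}
```

An empty stack is just a NULL pointer, and the structure grows and shrinks one node at a time as values are pushed and popped.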

For example, you can use pointers to program a variable length string of characters, but modern
languages make advanced dynamic structures available without the use of pointers.

the simplest way of getting the job done.

For more on this see What's the matter with pointers.

You shouldn't be too convinced by the C# inclusion of pointers. As languages such as Java,
JavaScript and so on have proved, you really don't need low level pointers and pointer arithmetic
as long as the language supports sufficiently sophisticated data structures.