
Introduction to Embedded Linux

Covers versions 2.4.33.2 - 2.6.18-rc4
Version 1.0

Copyright 2004-2006, Michael Opdenacker
Copyright 2003-2006, Oron Peled
Copyright 2004-2006 Codefidence Ltd.

For full copyright information see last page.


Creative Commons Attribution-ShareAlike 2.0 license

Rights to copy
Attribution ShareAlike 2.0
You are free
to copy, distribute, display, and perform the work
to make derivative works
to make commercial use of the work
Under the following conditions
Attribution. You must give the original author credit.
Share Alike. If you alter, transform, or build upon this work,
you may distribute the resulting work only under a license
identical to this one.
For any reuse or distribution, you must make clear to others the
license terms of this work.
Any of these conditions can be waived if you get permission from
the copyright holder.
Your fair use and other rights are in no way affected by the above.
License text: http://creativecommons.org/licenses/by-sa/2.0/legalcode

This kit contains work by the following authors:
Copyright 2004-2006
Michael Opdenacker
michael@free-electrons.com
http://www.free-electrons.com
Copyright 2003-2006
Oron Peled
oron@actcom.co.il
http://www.actcom.co.il/~oron
Copyright 2004-2006
Codefidence Ltd.
info@codefidence.com
http://www.codefidence.com

What is Linux?
Linux is a kernel that implements the POSIX and Single Unix
Specification standards and is developed as an Open Source
project.
Usually when one talks of installing Linux, one is referring to a
Linux distribution.
A distribution is a combination of Linux and other programs and
libraries that together form an operating system.
Many such distributions exist for various purposes, from
high-end servers to embedded systems.
They all share the same interface, thanks to the LSB standard.
Linux runs on 24 main platforms and supports applications ranging
from ccNUMA super clusters to cellular phones and microcontrollers.
Linux is 15 years old, but is based on the 40-year-old Unix design
philosophy.

What is Open Source?


Open Source is a way to develop software in a
distributed fashion that allows multiple bodies to cooperate in
creating the end product.
They don't have to be from the same company or, indeed, any
company.
With Open Source software the source code is published and anyone
can use, learn, distribute, adapt and sell the program.
An Open Source program is protected under copyright law and is
licensed to its users under a software license agreement.
It is NOT software in the public domain.
When making use of Open Source software it is imperative to
understand which license governs the use of the work and what is
and is not allowed by the terms of the license.
The same is true for ANY external code used in a product.

Open Source Licenses

BSD, MIT, X11, Artistic
As long as you acknowledge my copyright and promise not to
sue me, you can do whatever you want with the code.

LGPL
Customer receives the same rights I have in the
SPECIFIC PART USED, including source code.

GPL, CPL, APL
Customer receives the same rights I have in
the WHOLE work, including source code.

Layers in a Linux system

[Diagram: layers of a Linux system. Labels: User programs, Application libraries, System libraries, C library, Kernel]

Linux vs. Legacy RTOS

[Diagram labels, Linux side: Process, Library, Shell, Kernel. RTOS side: Process, RTOS]

An RTOS is like the Linux kernel:
a single program with a single memory space that
manages memory, scheduling and interrupts.
Linux also has user tasks that run in their own
memory space. One of them is the shell.

This course

[Diagram labels: Process, Library, Shell, Kernel]

In this course we will go from the shell, through
the system libraries and applications, and on to the
kernel.

The Unix and GNU / Linux command line

Unix File System


Everything is a file
Almost everything in Unix is a file!
Regular files
Directories
Directories are just files listing a set of files
Symbolic links
Files referring to the name of another file
Devices and peripherals
Read and write from devices as with regular files
Pipes
Used to cascade programs
cat *.log | grep error
Sockets
Inter process communication
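
A minimal sketch of reading from a device and cascading programs with a pipe, assuming a POSIX-like shell with the usual /dev/zero and /dev/null device files. The log lines are made up for illustration and stand in for the *.log files mentioned above.

```shell
#!/bin/sh
# Read 16 bytes from a device file exactly as from a regular file.
zero_bytes=$(head -c 16 /dev/zero | wc -c)
echo "bytes read from /dev/zero: $zero_bytes"

# Writing to a device works like writing to a file; /dev/null discards data.
echo "discarded" > /dev/null

# A pipe cascades two programs: printf produces lines, grep counts matches.
# (Hypothetical log lines, not real log files.)
error_lines=$(printf 'boot ok\ndisk error\nnet ok\n' | grep -c error)
echo "lines containing 'error': $error_lines"
```

The same redirection and pipe syntax applies whether the source is a regular file, a device or another program's output.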


File names
File name features since the beginning of Unix
Case sensitive
No obvious length limit
Can contain any character (including whitespace), except / and the null character.
File types stored in the file (magic numbers).
File name extensions not needed and not interpreted. Just used for
user convenience.
File name examples:
README
.bashrc
Windows Buglist
index.htm
index.html
index.html.old


File paths
A path is a sequence of nested directories with a file or directory at
the end, separated by the / character
Relative path: documents/fun/microsoft_jokes.html
Relative to the current directory
Absolute path: /home/bill/bugs/crash9402031614568
/ : root directory.
Start of absolute paths for all files on the system (even for files on
removable devices or network shares).
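
A small sketch of the absolute versus relative distinction, assuming a POSIX shell. The helper name path_kind is hypothetical and introduced only for this illustration.

```shell
#!/bin/sh
# Report whether a path is absolute (starts at /) or relative.
path_kind() {
    case "$1" in
        /*) echo absolute ;;
        *)  echo relative ;;
    esac
}

path_kind documents/fun/microsoft_jokes.html
path_kind /home/bill/bugs/crash9402031614568

# A relative path is resolved against the current directory ($PWD).
echo "documents/fun resolves to $PWD/documents/fun from here"
```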


GNU / Linux filesystem structure (1)


Not imposed by the system. Can vary from one system to the other,
even between two GNU/Linux installations!
/         Root directory
/bin/     Basic, essential system commands
/boot/    Kernel images, initrd and configuration files
/dev/     Files representing devices
          /dev/hda: first IDE hard disk
/etc/     System configuration files
/home/    User directories
/lib/     Basic system shared libraries


GNU / Linux filesystem structure (2)


/lost+found   Corrupt files the system tried to recover
/mnt/         Mounted filesystems
              /mnt/usbdisk/, /mnt/windows/ ...
/opt/         Specific tools installed by the sysadmin
              /usr/local/ often used instead
/proc/        Access to system information
              /proc/cpuinfo, /proc/version ...
/root/        root user home directory
/sbin/        Administrator-only commands
/sys/         System and device controls
              (cpu frequency, device power, etc.)


GNU / Linux filesystem structure (3)


/tmp/         Temporary files
/usr/         Regular user tools (not essential to the system)
              /usr/bin/, /usr/lib/, /usr/sbin...
/usr/local/   Specific software installed by the sysadmin
              (often preferred to /opt/)
/var/         Data used by the system or system servers
              /var/log/, /var/spool/mail (incoming mail),
              /var/spool/lpd (print jobs)...


The Unix and GNU / Linux command line

Shells and file handling


Command line interpreters


Shells: tools to execute user commands
Called shells because they hide the details of the
underlying operating system under the shell's surface.
Commands are input in a text terminal, either a window in a
graphical environment or a text-only console.
Results are also displayed on the terminal. No graphics are
needed at all.
Shells can be scripted: they provide all the resources to write
complex programs (variables, conditionals, iterations...)
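
As an illustration of shell scripting, here is a minimal sketch that combines a variable, an iteration and a conditional. It assumes a POSIX shell; the file names are simply the examples used earlier in this course.

```shell
#!/bin/sh
# Variable: a list of example file names.
files="README .bashrc index.html"
count=0

# Iteration: loop over each word in the list.
for name in $files; do
    # Conditional: names starting with a dot are hidden files.
    case "$name" in
        .*) echo "$name is hidden" ;;
        *)  echo "$name is visible" ;;
    esac
    count=$((count + 1))
done

echo "checked $count names"
```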


The Unix and GNU / Linux command line

Task control


Full control on tasks


Since the beginning, Unix has supported true preemptive
multitasking.
Ability to run many tasks in parallel, and abort them even if
they corrupt their own state and data.
Ability to choose which programs you run.
Ability to choose which input your programs take, and
where their output goes.


Processes
Everything in Unix is a file
Everything in Unix that is not a file is a process
Processes
Instances of running programs
Several instances of the same program can run at the same time
Data associated to processes:
Open files, allocated memory, stack, process id, parent, priority, state...


Running jobs in background


Same usage throughout all the shells
Useful
For command line jobs whose output can be examined later,
especially for time consuming ones.
To start graphical applications from the command line and
then continue with the mouse.
Starting a task: add & at the end of your line:
find_prince_charming --cute --clever --rich &


Background job control


jobs
Returns the list of background jobs from the same shell
[1]- Running ~/bin/find_meaning_of_life --without-god &
[2]+ Running make mistakes &
fg
fg %<n>
Puts the last / nth background job in foreground mode
Moving the current task in background mode:
[Ctrl] Z
bg
kill %<n>
Aborts the nth job.

Job control example


> jobs
[1]- Running ~/bin/find_meaning_of_life --without-god &
[2]+ Running make mistakes &
> fg
make mistakes
> [Ctrl] Z
[2]+ Stopped make mistakes
> bg
[2]+ make mistakes &
> kill %1
[1]+ Terminated ~/bin/find_meaning_of_life --without-god
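
The session above is interactive. In a script, job numbers such as %1 are usually not available, so a rough equivalent (a sketch, assuming a POSIX shell) tracks the background task through $!, the process id of the last background command. The sleep command stands in for any long-running task.

```shell
#!/bin/sh
# Start a long-running task in the background with &.
sleep 30 &
job_pid=$!            # $! holds the process id of the last background job
echo "started background job $job_pid"

# The shell is free to run other commands meanwhile.
echo "doing other work"

# Abort the job, then wait for it to collect its exit status.
kill "$job_pid"
status=0
wait "$job_pid" || status=$?
echo "job exited with status $status"
```

A non-zero status here reflects that the task was terminated by a signal rather than finishing on its own.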


Listing all processes


... whatever shell, script or process they are started from
ps -ux
Lists all the processes belonging to the current user
ps -aux (Note: ps -edf on System V systems)
Lists all the processes running on the system
ps -aux | grep bart | grep bash
USER  PID %CPU %MEM  VSZ  RSS TTY   STAT START TIME COMMAND
bart 3039  0.0  0.2 5916 1380 pts/2 S    14:35 0:00 /bin/bash
bart 3134  0.0  0.2 5388 1380 pts/3 S    14:36 0:00 /bin/bash
bart 3190  0.0  0.2 6368 1360 pts/4 S    14:37 0:00 /bin/bash
bart 3416  0.0  0.0    0    0 pts/2 RW   15:07 0:00 [bash]
PID:  Process id
VSZ:  Virtual process size (code + data + stack)
RSS:  Process resident size: number of KB currently in RAM
TTY:  Terminal
STAT: Status: R (Runnable), S (Sleep), D (Uninterruptible sleep), Z (Zombie)...


Live process activity


top - Displays most important processes, sorted by cpu percentage
top - 15:44:33 up 1:11, 5 users, load average: 0.98, 0.61, 0.59
Tasks: 81 total, 5 running, 76 sleeping, 0 stopped, 0 zombie
Cpu(s): 92.7% us, 5.3% sy, 0.0% ni, 0.0% id, 1.7% wa, 0.3% hi, 0.0% si
Mem:   515344k total,  512384k used,    2960k free,   20464k buffers
Swap: 1044184k total,       0k used, 1044184k free,  277660k cached
 PID USER PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
3809 jdoe 25  0  6256 3932 1312 R 93.8  0.8 0:21.49 bunzip2
2769 root 16  0  157m  80m  90m R  2.7 16.0 5:21.01 X
3006 jdoe 15  0 30928  15m  27m S  0.3  3.0 0:22.40 kdeinit
3008 jdoe 16  0  5624  892 4468 S  0.3  0.2 0:06.59 autorun
3034 jdoe 15  0 26764  12m  24m S  0.3  2.5 0:12.68 kscd
3810 jdoe 16  0  2892  916 1620 R  0.3  0.2 0:00.06 top
You can change the sorting order by typing
M: Memory usage, P: %CPU, T: Time.
You can kill a task by typing k and the process id.

Killing processes (1)


kill <pids>
Sends a termination signal (SIGTERM) to the given processes. Lets processes save
data and exit by themselves. Should be used first. Example:
kill 3039 3134 3190 3416
kill -9 <pids>
Sends an immediate kill signal (SIGKILL). The system itself
terminates the processes. Useful when a process is really stuck
(doesn't respond to a plain kill).
kill -9 -1
Kills all the processes of the current user. -1 means all processes.
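
The difference between a plain kill and kill -9 can be sketched as follows, assuming a POSIX shell. The two child processes are hypothetical stand-ins: the first handles the termination signal and exits cleanly, the second ignores it and has to be killed with -9.

```shell
#!/bin/sh
# Child 1: installs a handler so it can clean up when asked to terminate.
sh -c 'trap "echo saving data; exit 0" TERM; while :; do sleep 1; done' &
child=$!
sleep 1                 # give the child time to install its handler

# Plain kill: the child saves its data and exits by itself.
kill "$child"
term_status=0
wait "$child" || term_status=$?
echo "after kill: exit status $term_status"

# Child 2: ignores the termination signal, simulating a stuck process.
sh -c 'trap "" TERM; while :; do sleep 1; done' &
stuck=$!
sleep 1
kill "$stuck"           # ignored by the child
sleep 1
kill -0 "$stuck" && echo "still running after plain kill"

# kill -9 cannot be caught or ignored: the system terminates the process.
kill -9 "$stuck"
kill_status=0
wait "$stuck" || kill_status=$?
echo "after kill -9: exit status $kill_status"
```

The first child gets a chance to save its state, which is why a plain kill should be tried first.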


Killing processes (2)


killall [-<signal>] <command>
Kills all the jobs running <command>. Example:
killall bash


Sequential commands
Can type the next command in your terminal even when the
current one is not over.
Can separate commands with the ; symbol:
echo I love thee; sleep 10; echo not
Conditionals: use || (or) or && (and):
more God || echo "Sorry, God doesn't exist"
Runs echo only if the first command fails
ls ~sd6 && cat ~sd6/* > ~sydney/recipes.txt
Only cats the directory contents if the ls command succeeds
(means read access).
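
The three operators can be tried safely with the sketch below, assuming a POSIX shell. The directory name /nonexistent_dir_for_demo is hypothetical and chosen so that ls fails.

```shell
#!/bin/sh
# ; runs commands one after another.
seq_out=$(echo first; echo second)

# || runs the right-hand command only if the left-hand one fails.
or_out=$(ls /nonexistent_dir_for_demo 2>/dev/null || echo "ls failed")

# && runs the right-hand command only if the left-hand one succeeds.
and_out=$(ls / >/dev/null && echo "ls succeeded")

echo "$seq_out"
echo "$or_out"
echo "$and_out"
```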


Measuring elapsed time


time find_expensive_housing --near
<...command output...>
real 0m2.304s (actual elapsed time)
user 0m0.449s (CPU time running program code)
sys  0m0.106s (CPU time running system calls)
real = user + sys + waiting
waiting = I/O waiting time + idle time (running other tasks)
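
Applying the formula to the figures above gives the waiting time. A minimal sketch, assuming awk is available (the shell itself has no floating point arithmetic):

```shell
#!/bin/sh
# Figures from the example output above, in seconds.
real=2.304
user=0.449
sys=0.106

# waiting = real - user - sys, computed by awk.
waiting=$(awk -v r="$real" -v u="$user" -v s="$sys" 'BEGIN { printf "%.3f", r - u - s }')
echo "waiting = ${waiting}s"
```

Here most of the elapsed time (about 1.7 of 2.3 seconds) was spent waiting rather than computing.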


Environment variables
Shells let the user define variables.
They can be reused in shell commands.
Convention: lower case names
You can also define environment variables: variables
that are also visible within scripts or executables called
from the shell.
Convention: upper case names.
env
Lists all defined environment variables and their value.


Shell variables examples


Shell variables (bash)
projdir=/home/marshall/coolstuff
ls -la $projdir; cd $projdir
Environment variables (bash)
cd $HOME
export DEBUG=1
./find_extraterrestrial_life
(displays debug information if DEBUG is set)
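
The difference between the two kinds of variables can be observed from a child process. A minimal sketch, assuming a POSIX shell and that projdir is not already exported in your environment:

```shell
#!/bin/sh
# A plain shell variable stays private to the current shell.
projdir=/home/marshall/coolstuff
child_sees_projdir=$(sh -c 'echo "${projdir:-unset}"')

# An exported variable is copied into the environment of child processes.
export DEBUG=1
child_sees_debug=$(sh -c 'echo "${DEBUG:-unset}"')

echo "child sees projdir: $child_sees_projdir"
echo "child sees DEBUG:   $child_sees_debug"
```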


The Unix and GNU / Linux command line

System administration basics


File ownership
chown -R sco /home/linux/src (-R: recursive)
Makes user sco the new owner of all the files in
/home/linux/src.
chgrp -R empire /home/askywalker
Makes empire the new group of everything in
/home/askywalker.
chown -R borg:aliens usss_entreprise/
chown can be used to change the owner and group at the
same time.


Shutting down
shutdown -h +5 (-h: halt)
Shuts the system down in 5 minutes.
Users get a warning in their consoles.
shutdown -r now (-r: reboot)
init 0
Another way to shut down (used by shutdown).
init 6
Another way to reboot (used by shutdown).
[Ctrl][Alt][Del]
Also works on GNU/Linux (at least on PCs!).


Network setup (1)


ifconfig -a
Prints details about all the network interfaces available
on your system.
ifconfig eth0
Lists details about the eth0 interface
ifconfig eth0 192.168.0.100
Assigns the 192.168.0.100 IP address
to eth0 (1 IP address per interface).
ifconfig eth0 down
Shuts down the eth0 interface
(frees its IP address).


Network setup (2)


route add default gw 192.168.0.1
Sets the default route for packets outside the local network.
The gateway (here 192.168.0.1) is responsible for
sending them to the next gateway, etc., until the final
destination.
route
Lists the existing routes
route del default
route del <IP>
Deletes the given route
Useful to redefine a new route.

Network testing
ping freshmeat.net
ping 192.168.1.1
Tries to send packets to the given machine and get acknowledgment
packets in return.
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=0 ttl=150 time=2.51 ms
64 bytes from 192.168.1.1: icmp_seq=1 ttl=150 time=3.16 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=150 time=2.71 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=150 time=2.67 ms

When you can ping your gateway, your network interface works fine.
When you can ping an external IP address, your network settings are
correct!

Network setup summary


Only for simple cases with 1 interface, no dhcp server...
Connect to the network (cable, wireless card or device...)
Identify your network interface:
ifconfig -a
Assign an IP address to your interface (assuming eth0)
ifconfig eth0 192.168.0.100 (example)
Add a route to your gateway (assuming 192.168.0.1) for
packets outside the network:
route add default gw 192.168.0.1
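
The steps above could be collected in a small script. This is only a sketch: the interface name, address and gateway are the example values from this slide, and the run helper and DRY_RUN variable are hypothetical names introduced here so that the commands are printed instead of executed by default.

```shell
#!/bin/sh
# Example values from the slide; adjust for your network.
IFACE=eth0
ADDR=192.168.0.100
GATEWAY=192.168.0.1

# DRY_RUN=1 (the default here) only prints the commands. Set DRY_RUN=0
# and run as root to really apply them.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run ifconfig "$IFACE" "$ADDR"
run route add default gw "$GATEWAY"
```

Printing first makes it easy to review the commands before changing a live network configuration.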


Name resolution
Your programs need to know what IP address corresponds to a
given host name (such as kernel.org)
Domain Name Servers (DNS) take care of this.
You just have to specify the IP address of 1 or more DNS
servers in your /etc/resolv.conf file:
nameserver 217.19.192.132
nameserver 212.27.32.177
The changes take effect immediately!


Creating filesystems
Examples
mkfs.ext2 /dev/sda1
Formats your USB key (/dev/sda1: 1st partition raw data) in ext2 format.
mkfs.ext2 -F disk.img
Formats a disk image file in ext2 format
mkfs.vfat -v -F 32 /dev/sda1 (-v: verbose)
Formats your USB key back to FAT32 format.
mkfs.vfat -v -F 32 disk.img
Formats a disk image file in FAT32 format.
Blank disk images can be created as in the example below:
dd if=/dev/zero of=disk.img bs=1024 count=65536
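
The size of the resulting image follows from bs times count: 1024 bytes x 65536 blocks = 64 MiB. A minimal sketch that checks this, assuming a POSIX-like shell with mktemp available; the image is written to a temporary directory and removed afterwards.

```shell
#!/bin/sh
# Work in a throwaway directory created by mktemp.
workdir=$(mktemp -d)
img="$workdir/disk.img"

# 65536 blocks of 1024 bytes, all zeros.
dd if=/dev/zero of="$img" bs=1024 count=65536 2>/dev/null

size_bytes=$(wc -c < "$img")
size_mib=$((size_bytes / 1024 / 1024))
echo "disk.img is $size_bytes bytes ($size_mib MiB)"

rm -r "$workdir"
```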


Mounting devices (1)


To make filesystems on any device (internal or external storage)
visible on your system, you have to mount them.
The first time, create a mount point in your system:
mkdir /mnt/usbdisk (example)
Now, mount it:
mount -t vfat /dev/sda1 /mnt/usbdisk
/dev/sda1: physical device
-t: specifies the filesystem (format) type
(ext2, ext3, vfat, reiserfs, iso9660...)


Mounting devices (2)


Lots of mount options are available, in particular to choose permissions or the
file owner and group... See the mount manual page for details.
Mount options for each device can be stored in the /etc/fstab file.
You can also mount a filesystem image stored in a regular file (loopback
devices)
Useful to access the contents of an ISO cdrom image without having to
burn it.
Useful to create a Linux partition on a hard disk with only Windows
partitions
cp /dev/sda1 usbkey.img
mount -o loop -t vfat usbkey.img /mnt/usbdisk


Listing mounted filesystems


Just use the mount command with no argument:
/dev/hda6 on / type ext3 (rw,noatime)
none on /proc type proc (rw,noatime)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/hda4 on /data type ext3 (rw,noatime)
none on /dev/shm type tmpfs (rw)
/dev/hda1 on /win type vfat (rw,uid=501,gid=501)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)

Or display the /etc/mtab file


(same result, updated by mount and umount each time they are run)


Unmounting devices
umount /mnt/usbdisk
Commits all pending writes and unmounts the given device,
which can then be removed in a safe way.
To be able to unmount a device, you have to close all the
open files in it:
Close applications opening data in the mounted partition
Make sure that none of your shells have a working directory in
this mount point.
You can run the lsof command (list open files) to view
which processes still have open files in the mounted partition.


Free Software tools for embedded systems

C library for the target device


C library options


glibc
http://www.gnu.org/software/libc/
License: LGPL
C library from the GNU project
Designed for performance, standards compliance and
portability
Found on all GNU / Linux host systems
Quite big for small embedded systems: about 1.7MB on
Familiar Linux iPAQs (libc: 1.2 MB, libm: 500 KB)


uClibc
http://www.uclibc.org/ from CodePoet Consulting
License: LGPL
Lightweight C library for small embedded systems, with
most features though.
The whole Debian Woody was ported to it...
You can assume it satisfies most needs!
Example size (arm): approx. 400 KB (libuClibc: 300 KB,
libm: 55 KB)


Honey, I shrunk the programs!

C program           Compiled with shared libraries   Compiled statically
                    glibc       uClibc               glibc       uClibc
Plain hello world   4.6 K       4.4 K                475 K       25 K
Busybox             245 K       231 K                843 K       311 K


C library options - Summary


GNU C library (glibc)
Designed for performance and compliance with standards.
Best for servers and desktops.
uClibc
Highly compatible, but implemented for small embedded
systems with little storage space and RAM


Free Software tools for embedded systems

GNU / Linux workstation


Cross-compiling toolchains


Standalone toolchain build


Building a cross-compiling toolchain by yourself is a difficult and painful task!
Can take days or weeks!
Lots of details to learn. Several components to build (building gcc twice: once for gcc
+ once for compilers that need the C library).
Lots of decisions to make (such as glibc version and configuration for your platform)
Need kernel headers and glibc sources
Need to be familiar with current gcc issues and patches on your platform
Useful to be familiar with building and configuring tools
http://www.aleph1.co.uk/armlinux/docs/toolchain/toolchHOWTO.pdf can show you
how fun it can be!


Get a precompiled toolchain


Can get one from several locations... Just need to know where!
Caution
Toolchains are not always relocatable! Install them in
/usr/local/<arch> if they were built for this location.
Make sure the toolchain you pick suits your needs:
CPU, endianness, C library and compiler versions...


uClibc toolchains
Free Electrons uClibc toolchains
http://free-electrons.com/community/tools/uclibc
Run on i386 GNU/Linux
Supported platforms
arm, armeb, i386, m68k, ppc, mips, mipsel, sh
Quickly updated at each new uClibc release!


Platform specific toolchains


ARM
Code Sourcery (glibc only, used by many):
http://www.codesourcery.com/gnu_toolchains/arm/
Also available for Solaris and Windows workstations.
ftp://ftp.handhelds.org/projects/toolchain/ (glibc only)
MIPS
http://www.linux-mips.org/wiki/Toolchains (useful links)


Toolchain building utilities


Buildroot: http://buildroot.uclibc.org/
Dedicated Makefile to build uClibc based toolchains and even
entire root filesystems.
Downloads sources and applies patches.
Crosstool: http://www.kegel.com/crosstool/
Dedicated script to build glibc based toolchains
Doesn't support uClibc yet.
Downloads sources and applies patches.


gcc / glibc / binutils / kernel versions


Crosstool build reports: http://kegel.com/crosstool/crosstool-0.38/buildlogs/

Can help you to find a working combination of gcc, glibc, binutils
and kernel headers versions!


PTXdist
http://www.pengutronix.de/software/ptxdist_en.html - A crosscompiling toolkit project
Makes it easier to cross-compile a complete embedded Linux
system
Supports 10 different platforms.
Uses crosstool to build cross compiler toolchains
Supported libraries: glibc, uClibc
uClibc support is still experimental/buggy...
Supports around 200 different packages.


PTXdist Screenshots


Toolchains - useful resources


CE Linux Forum's toolchain page
http://tree.celinuxforum.org/CelfPubWiki/ToolChains


Cross-compiling toolchains - Summary


Building a toolchain by yourself
Tough and very long to master.
Ready-to-use toolchains
Available from several locations for most platforms.
Tools to build toolchains: Buildroot and Crosstool.
Make it easy to create a toolchain for your exact needs.
Build systems: PTXdist
Help you to create and populate root filesystem.


Free Software tools for embedded systems

GNU / Linux workstation


Emulators


Usefulness of emulators
Of course, can be used to run one system on top of another.
Can be used to test kernel and applications ahead of time on a
workstation without the cost of a development board.
Make the operating system easier to debug than with actual
hardware. If the OS freezes, just restart the emulator.
Can provide debugger and tracing capabilities
However, often emulate only the processor and a few devices at
best. Don't replace the real hardware or a development board.


qemu
http://qemu.org
Fast processor emulator
using a portable dynamic translator.
2 operating modes
Full system emulation: processor and various peripherals
Supported: x86, x86_64, ppc, arm, sparc, mips
User mode emulation (Linux host only): can run applications
compiled for another CPU.
Supported: x86, ppc, arm, sparc, mips


qemu examples (1)


User emulation
Easy to run Busybox for arm on i386 GNU / Linux:
qemu-arm -L /usr/local/arm/3.3.2 \
/home/bart/arm/busybox-1.00-pre8/busybox ls
-L: target C library binaries path (here cross-compiler toolchain path)


qemu examples (2)


System emulation
Even easier to run: (x86 example)
qemu linux.img
linux.img: full partition image including the kernel
Other options let the user choose an external kernel, kernel command
line options, and many more!
Lots of images of free operating systems available on http://free.oszoo.org!
Of course, qemu can also be used to run Windows or MacOS X on your
computer (provided you have a legal copy of the system).


ARM emulators
Only Free Software, of course!
SkyEye: http://skyeye.sourceforge.net
Emulates several ARM platforms (AT91, Xscale...) and can
boot several operating systems (Linux, uClinux, and others)
Softgun: http://softgun.sourceforge.net
Virtual ARM system with many virtual on-board peripherals.
Boots Linux.
SWARM - Software ARM - arm7 emulator
http://www.cl.cam.ac.uk/~mwd24/phd/swarm.html
Can run uClinux


Other emulators
ColdFire emulator
http://www.slicer.ca/coldfire/
Can boot uClinux


Emulators - Summary
System emulators
Useful to experiment with a full system, including the kernel
qemu: x86, x86_64, arm, sparc, ppc, mips
SkyEye: several arm architectures
User emulators
Useful to run or debug user space binaries for other CPUs
qemu: x86, arm, sparc, ppc, mips


Free Software tools for embedded systems

Tools for the target device


Busybox


General purpose toolbox: busybox


http://www.busybox.net/ from Codepoet Consulting
Most Unix command line utilities within a single executable!
Even includes a web server!
Size: less than 500 KB (statically compiled with uClibc) or less
than 1 MB (statically compiled with glibc)
Easy to configure which features to include
The best choice for
Initrds with complex scripts
Small embedded systems with little RAM or ROM.
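
The "many utilities within a single executable" idea can be illustrated with a toy multi-call binary that picks an applet from the name it was invoked as. This is only a sketch: the applet table, the `dispatch()` helper and the three applets are hypothetical names made up for illustration, not Busybox's actual code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical applets; real Busybox applets are far more complete. */
static int applet_true(int argc, char **argv)  { (void)argc; (void)argv; return 0; }
static int applet_false(int argc, char **argv) { (void)argc; (void)argv; return 1; }
static int applet_echo(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        printf("%s%s", argv[i], i + 1 < argc ? " " : "");
    printf("\n");
    return 0;
}

struct applet {
    const char *name;
    int (*run)(int argc, char **argv);
};

static const struct applet applets[] = {
    { "true",  applet_true  },
    { "false", applet_false },
    { "echo",  applet_echo  },
};

/* Strip any leading directory so "/bin/echo" matches "echo". */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Look up the applet named by argv[0] and run it; 127 if unknown. */
int dispatch(int argc, char **argv)
{
    const char *name = base_name(argv[0]);

    for (size_t i = 0; i < sizeof(applets) / sizeof(applets[0]); i++)
        if (strcmp(name, applets[i].name) == 0)
            return applets[i].run(argc, argv);
    fprintf(stderr, "%s: applet not found\n", name);
    return 127;
}
```

In such a design, `main()` would simply return `dispatch(argc, argv)`, and each command name would be a symbolic link to the single executable.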


Busybox commands!
addgroup, adduser, adjtimex, ar, arping, ash, awk, basename, bunzip2,
bzcat, cal, cat, chgrp, chmod, chown, chroot, chvt, clear, cmp, cp, cpio,
crond, crontab, cut, date, dc, dd, deallocvt, delgroup, deluser, devfsd,
df, dirname, dmesg, dos2unix, dpkg, dpkg-deb, du, dumpkmap, dumpleases,
echo, egrep, env, expr, false, fbset, fdflush, fdformat, fdisk, fgrep,
find, fold, free, freeramdisk, fsck.minix, ftpget, ftpput, getopt, getty,
grep, gunzip, gzip, halt, hdparm, head, hexdump, hostid, hostname, httpd,
hush, hwclock, id, ifconfig, ifdown, ifup, inetd, init, insmod, install,
ip, ipaddr, ipcalc, iplink, iproute, iptunnel, kill, killall, klogd,
lash, last, length, linuxrc, ln, loadfont, loadkmap, logger, login,
logname, logread, losetup, ls, lsmod, makedevs, md5sum, mesg, mkdir,
mkfifo, mkfs.minix, mknod, mkswap, mktemp, modprobe, more, mount, msh,
mt, mv, nameif, nc, netstat, nslookup, od, openvt, passwd, patch, pidof,
ping, ping6, pipe_progress, pivot_root, poweroff, printf, ps, pwd, rdate,
readlink, realpath, reboot, renice, reset, rm, rmdir, rmmod, route, rpm,
rpm2cpio, run-parts, rx, sed, seq, setkeycodes, sha1sum, sleep, sort,
start-stop-daemon, strings, stty, su, sulogin, swapoff, swapon, sync,
sysctl, syslogd, tail, tar, tee, telnet, telnetd, test, tftp, time, top,
touch, tr, traceroute, true, tty, udhcpc, udhcpd, umount, uname,
uncompress, uniq, unix2dos, unzip, uptime, usleep, uudecode, uuencode,
vconfig, vi, vlock, watch, watchdog, wc, wget, which, who, whoami, xargs,
yes, zcat


Compiling busybox
Get the latest stable sources from http://busybox.net
Configure busybox (creates a .config file):
make menuconfig
Compile it:
make
Install it:
make install
Logged as root, mount the root fs image (e.g. /mnt/rootfs)
rsync -a _install/ /mnt/rootfs/
(the / characters are needed!)


logread
Part of Busybox.
Shows the messages from syslogd (using its circular buffer).
Replaces /var/log/messages. Reduces space and
complexity (no need to rotate this file).
Use logread -f to output data as the log grows.
Another possibility: use the dmesg command.


tinylogin
http://tinylogin.busybox.net/
Suite of tiny Unix utilities for
Handling logins
Being authenticated
Changing passwords
Maintaining users and groups.
Provides shadow password support
to enhance system security.

Tools and sizes
0K /usr/sbin/addgroup
24K /usr/sbin/adduser
16K /sbin/getty
36K /bin/login
28K /usr/bin/passwd
24K /bin/su
12K /sbin/sulogin

Commands already provided by
Busybox. Just useful if your
system doesn't use busybox.

Free Software tools for embedded systems

Tools for the target device


http and ssh servers


ssh server and client: dropbear


http://matt.ucc.asn.au/dropbear/dropbear.html
Very small memory footprint ssh server for embedded
systems, compared to OpenSSH.
Satisfies most needs. Both client and server!
Useful to:
Get a remote console on the target device
Copy files to and from the target device
rsync files with the target device


thttpd
Tiny/Turbo/Throttling HTTP server
http://acme.com/software/thttpd/
Simple
Implements the HTTP/1.1
minimum (or just a little more)
Simple to configure and run.
Small
Executable size: 88K (version
2.25b), Apache 2.0.52: 264K
Very low memory consumption:
does not fork and very careful
about memory consumption.

Portable
Compiles cleanly on most Unix-like operating systems
Fast
About as fast as full-featured
servers. Much faster under very high
loads (because it reduces the server
load for the same amount of
work)
Secure
Designed to protect the
webserver machine from attacks.


Other web servers (1)


Busybox http server: http://busybox.net
Tiny: only adds 12 K to Busybox 1.01 (compiled statically with uClibc)!
Sufficient features for many devices with a web interface,
including CGI and http authentication support.
License: GPL
KLone: http://koanlogic.com/kl/cont/gb/html/klone.html
Lightweight but full featured web server for embedded systems.
Can enclose dynamic (written in C/C++ <% code %>) and compressed
content all in an executable of an approximate size of 150 KB.
License: Dual GPL / Commercial.
See also http://linuxdevices.com/news/NS8234701895.html


Other web servers (2)


Boa: http://www.boa.org/
Designed to be simple, fast and secure.
Unlike thttpd, no particular care for memory or disk footprint though.
Embedded systems: pretty popular, though not targeted by developers.
lighttpd: http://lighttpd.net
Low footprint server good at managing high loads.
May be useful in embedded systems too.


Free Software tools for embedded systems

GNU / Linux workstation


Commercial toolsets


Commercial toolsets
Caution: commercial doesn't mean proprietary!
Vendors play fair with the GPL and do make their source code
available to their users, and most of the time, to the community.
As long as they distribute the sources to their users, the GPL
doesn't require vendors to share their sources with any third party.
No issue with all the GPL sources developed by or with the
community.
Graphical toolkits developed by the vendors look proprietary. Their
licenses are not advertised on their websites! You have to be a
customer, or get a free preview kit, to know.


Commercial toolset strengths


Technical advantages
Well tested and supported
kernel and tool versions
Including early patches not
supported by the mainstream
kernel yet
Complete development tool sets:
kernels, toolchains, utilities, binaries
for impressive lists of target
platforms
Integrated utilities for automatic
kernel image, initrd and filesystem
generation.

Graphical development tools
Development tools available on
multiple platforms: GNU / Linux,
Solaris, Windows...
Support services
Useful if you don't have your
own support resources
Long term support commitment,
even for versions considered as
obsolete by the community, but
not by your users!

Montavista
http://www.mvista.com/
The market leader
Employs some of the most active kernel hackers, in particular
on the arm platform
All kernel development shared with the community kernel
core and drivers (Linux 2.6 example: preemption option,
many drivers...)
Graphical development tools are proprietary


TimeSys
http://timesys.com
Similar toolset offering as other vendors.
Great flexibility available to their LinuxLink™ subscribers
Community friendly: share very interesting and generic
technical whitepapers and articles.
Free Software BSPs (Board Support Packages) available
Linux soft and hard real-time OS product
Development tools seem to be proprietary


Traditional RTOS vendors


Traditional RTOS expertise. Good for hard real-time needs!
Offer the same development tools for their proprietary RTOS and their
Linux offering. Possible to switch from one to the other if needed.
Though they are big players, they do not seem to participate
much in the development of Linux.
WindRiver: http://windriver.com/linux
The makers of VxWorks operating system. Now advertise both systems.
LynuxWorks BlueCat: http://lynuxworks.com/products/bluecat/
Mainly seems to be a way of surfing the Linux wave and attracting
customers to their LynxOS RTOS (API compatible with Linux).


Other vendors (1)


http://www.metrowerks.com/
Now owns Lineo
Now owned by Freescale
Similar tool offering, experience and
skills from Lineo.
Embedded applications portfolio.
Example: OpenPDA.
Only advisable for Freescale
customers: now mainly focusing on
Freescale platforms and giving up
support for competing architectures.

http://koansoftware.com
Makers of Klinux (http://klinux.org), a
GPL embedded Linux distribution for
industrial applications.
Klinux supports i386 and popular
arm platforms. Other platforms
supported upon request.
Includes several graphical toolkits and
supports hard real-time (RTAI).
Unfortunately, Klinux is GPL but not
available for public download.

Other vendors (2)


http://sysgo.com
ELinOS development toolset,
in particular based on Eclipse
and the Linux Trace Toolkit.
Includes FreeToolBox, a freely
downloadable compiling and
rootfs creating toolchain.
Supports i386, arm and ppc.
Real-time support with RTAI


Commercial toolsets - Summary


Major vendors
MontaVista, TimeSys (MetroWerks exiting the market)
Involved in Linux development
Smaller vendors
Koan, Sysgo...
Trying to differentiate their products
Traditional RTOS vendors
WindRiver, LynuxWorks
Forced to support Linux to retain market share.
Strong real-time expertise but lack of community involvement


Free Software tools for embedded systems

Tools for the target device


Precompiled packages, distributions


Distributions
Distribute ready to use root filesystems.
Can be used by any binary compatible platform.
Example: running Familiar Linux on a mobile phone!
Can be reused even in an early development phase.
No need to create your root filesystem from scratch!
Easy to upgrade, remove or add applications through binary
packages.


Embedded distributions (1)


arm little endian:
Familiar http://familiar.handhelds.org/
Targets PDAs and webpads (Siemens Simpad...)
IpkgFind: http://ipkgfind.handhelds.org/
References lots of arm binary packages for ipkg based
distributions (such as Familiar / OpenZaurus)
Search by category, keyword or name.


Embedded distributions (2)


Emdebian: http://emdebian.sourceforge.net
Supports all embedded platforms
Leverages standard Debian package descriptions and sources,
but adapts them for embedded systems: uClibc usage,
reduced package size (no documentation), removed
dependencies, configured with fewer options...
Packages not available yet, but makes it possible to build
packages using Debian utilities (such as dpkg-cross)
Unfortunately, lacking momentum in recent months


Ready-made root filesystems


Very useful to start porting work. Contain C library, and minimum apps.
uClibc
http://www.uclibc.org/toolchains.html (i386, arm, mipsel)
Images include a native gcc toolchain
ARM
http://www.arm.com/linux/linux_download.html
Include kernel image, bootloader, and rootfs (cramfs or packages)
Other architectures
If some readers know more useful links, they are welcome!


Buildroot
http://buildroot.uclibc.org/
Tool to automatically build a ready-made uClibc rootfs with
basic applications and a development toolchain. Useful in
early development.
Can be used to build only uClibc toolchains.
Supports lots of architectures
Automatically downloads source packages
Very easy to use: just run:
make


Linux porting projects


Useful to find patches, binaries, documentation, toolchains...
Only ports for embedded systems are listed
arm: http://www.arm.linux.org.uk/
m68k: http://www.linux-m68k.org/
mips: http://www.linux-mips.org/
ppc: http://penguinppc.org/embedded/
sh: http://linuxsh.sourceforge.net/
xtensa: http://xtensa.sourceforge.net/


Conclusion
Commercial distributions and toolsets
Best if you don't have your own support resources and have a sufficient
budget
Really help you focus on your real job: making an embedded device.
You can even subcontract driver development to the vendor
Community distributions and tools
Best if you are on a tight budget
Best if you are willing to build your own embedded Linux expertise and
train your own support resources.
In any case, your products are based on Free Software!


Coding Embedded Linux Applications

Writing Code


Hello World!

#include <stdio.h>

int main(int argc, char **argv)
{
    int i = 1;

    printf("Hello World! %d\n", i);
    return 0;
}


A simple Makefile

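# Cross-compiler from the toolchain installation
CC=/usr/local/arm/2.95.3/bin/arm-linux-gcc
CFLAGS=-g
LDFLAGS=-lpthread
.PHONY: clean all
# "test" is built from test.c by make's implicit rule
all: test
clean:
	@rm -f *.o *~ test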


Creating Processes using fork()


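/* Needs <stdio.h>, <stdlib.h>, <unistd.h>, <sys/types.h>, <sys/wait.h> */
pid_t pid;

pid = fork();
if (pid < 0) {
    perror("fork");               /* fork failed: no child was created */
    exit(1);
} else if (pid > 0) {
    int status;

    printf("I'm the parent!\n");
    wait(&status);                /* block until the child terminates */
    if (WIFEXITED(status))
        printf("Child exited with status %d\n", WEXITSTATUS(status));
} else {
    printf("I'm the child!\n");
    exit(0);
}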

Creating Processes (2)


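pid_t pid;

pid = fork();
if (pid < 0) {
    perror("fork");
} else if (pid > 0) {
    int status;

    printf("I'm the parent!\n");
    wait(&status);
} else {
    printf("I'm the child!\n");
    execve("/bin/ls", argv, envp);   /* only returns on failure */
    perror("execve");
    _exit(127);
}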
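
The fork, exec and wait steps can be combined into one helper. The following is a minimal sketch, assuming a hypothetical function name `run_and_wait()` and the convention (borrowed from the shell, not from these slides) that a child which cannot exec exits with status 127.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run argv[0] in a child process and return its exit status,
 * or -1 if fork/wait failed or the child was killed by a signal. */
int run_and_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execve(argv[0], argv, environ);
        perror("execve");          /* reached only if execve failed */
        _exit(127);
    }
    /* Parent: wait for this specific child. */
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Using `waitpid()` rather than `wait()` avoids collecting the status of some other child by accident when the program has several.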


Linux Processes Stack


The Linux process stack expands automatically.
The default stack size limit is typically 8 MB; stack pages are mapped on demand.
By default, use of additional stack will automatically trigger
allocation of more stack space.
This behavior can be limited by setting a resource limit on
the stack size.
More on that ahead.

Before going into real time critical sections of your code,
make sure the stack is allocated!
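
One common way to make sure the stack is allocated is to touch it once before the critical section starts. The sketch below assumes a worst-case stack use of 64 KB (`PREFAULT_STACK_BYTES`) and a helper name `prefault_stack()`; both are illustrative choices, to be sized for the real application.

```c
#include <stddef.h>
#include <unistd.h>

#define PREFAULT_STACK_BYTES (64 * 1024)  /* assumed worst-case stack use */

/* Write to one byte in every page of a large local array so the kernel
 * maps those stack pages now rather than inside a time-critical section.
 * Returns the number of page-sized steps that were touched. */
size_t prefault_stack(void)
{
    volatile unsigned char dummy[PREFAULT_STACK_BYTES];
    long page = sysconf(_SC_PAGESIZE);
    size_t step = page > 0 ? (size_t)page : 4096;
    size_t touched = 0;

    for (size_t i = 0; i < sizeof(dummy); i += step) {
        dummy[i] = 0;
        touched++;
    }
    dummy[sizeof(dummy) - 1] = 0;   /* also touch the last byte */
    return touched;
}
```

Touching the pages only guarantees they are mapped at that moment; keeping them resident additionally requires locking memory with mlockall().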


POSIX Priorities
int sched_setscheduler(pid_t pid, int policy, const struct sched_param *p);
struct sched_param {
...
int sched_priority;
};
sched_setscheduler sets both the scheduling policy and the associated parameters
for the process identified by pid. If pid equals zero, the scheduler of the calling
process will be set. The interpretation of the parameter p depends on the selected
policy. Currently, the following three scheduling policies are supported under
Linux: SCHED_FIFO, SCHED_RR, and SCHED_OTHER;
There is also a sched_getscheduler().


POSIX Priorities (2)


SCHED_OTHER: Default Linux time-sharing scheduling
Priority 0 is reserved for it.
Fair, no starvation.
Use this for any non critical tasks.
SCHED_FIFO: First In-First Out scheduling
Priorities 1 to 99.
Preemptive.
SCHED_RR: Round Robin scheduling
Like SCHED_FIFO + time slice
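
As an illustration of sched_setscheduler(), the sketch below moves the calling process to SCHED_FIFO. The helper name `make_fifo()` and its error convention (returning the errno value) are assumptions for this example; on most systems the call needs root privileges or CAP_SYS_NICE.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>

/* Switch the calling process (pid 0) to SCHED_FIFO at the given priority.
 * Returns 0 on success or the errno value on failure
 * (typically EPERM when the caller lacks the required privilege). */
int make_fifo(int priority)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
        return errno;
    return 0;
}
```

The valid priority range for a policy can be queried at run time with sched_get_priority_min() and sched_get_priority_max() instead of hard-coding 1 and 99.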

Locking Memory
int mlock(const void *addr, size_t len);
mlock disables paging for the memory in the range starting at addr with
length len bytes.

int mlockall(int flags);


mlockall disables paging for all pages mapped into the address space of
the calling process.
MCL_CURRENT Lock all pages which are currently mapped into the
address space of the process.
MCL_FUTURE Lock all pages which will become mapped into the
address space of the process in the future. These could be for instance
new pages required by a growing heap and stack as well as new memory
mapped files or shared memory regions.
Must lock memory to guarantee real time responses!
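
A minimal sketch of locking the whole address space at start-up follows. The wrapper name `lock_all_memory()` is hypothetical; the flags are the two described above.

```c
#include <errno.h>
#include <sys/mman.h>

/* Lock all current and future pages of the calling process in RAM.
 * Returns 0 on success or the errno value on failure (an unprivileged
 * process will typically get ENOMEM or EPERM). */
int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
        return errno;
    return 0;
}
```

Calling this once, early in main() and before any real-time work begins, keeps the page-fault cost out of the critical path; munlockall() undoes it.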

Resource Limits
#include <sys/time.h>
#include <sys/resource.h>
struct rlimit {
rlim_t rlim_cur; /* Soft limit */
rlim_t rlim_max; /* Hard limit */
};

int getrlimit(int resource, struct rlimit *rlim);
int setrlimit(int resource, const struct rlimit *rlim);

These system calls can be used to get/set soft and hard limits
on process resource usage.
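
The two calls are usually used together: read the current limits, change only the field of interest, and write them back. The sketch below lowers the soft stack limit; the helper name `cap_stack_soft_limit()` is an assumption made for this example.

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Lower the soft RLIMIT_STACK limit to at most `bytes`, leaving the
 * hard limit untouched. Returns 0 on success, -1 on failure. */
int cap_stack_soft_limit(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) == -1)
        return -1;
    if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur <= bytes)
        return 0;               /* already at or below the cap */
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;    /* soft limit may not exceed hard limit */
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_STACK, &rl);
}
```

An unprivileged process may lower its hard limit but not raise it again, which is why this sketch only touches the soft limit.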


Resource Limits Continued


RLIMIT_AS - maximum virtual memory the process can claim.
Allocations beyond the limit fail with ENOMEM.
Affects automatic stack expansion: if it fails, the process gets SIGSEGV.

RLIMIT_CPU - Limits CPU time the process can consume.
After the soft limit, sends SIGXCPU every second.
After the hard limit, sends SIGKILL.

RLIMIT_DATA - Limits data segment max size.

RLIMIT_STACK - Limits the maximum stack size.
After the limit, sends SIGSEGV.
Must register an alternate stack to handle the signal.

Threads and Processes


[Figure: two processes (123 and 124). Each process has its own memory, file descriptors and signal handlers, shared by its threads. Each thread has its own stack, state, signal mask and local storage.]

POSIX Threads
Linux uses the POSIX Threads threading mechanism.
Linux threads are Light Weight Processes: each thread is a task
scheduled by the kernel's scheduler.
Process creation takes roughly twice as long as thread creation.
But Linux process creation time is relatively low.
To use threads in your code:
#include <pthread.h>
In your Makefile:
Add -lpthread -lrt to LDFLAGS.


Creating Threads
Function: pthread_create()
int pthread_create(pthread_t * thread, const pthread_attr_t * attr,
void * (*start_routine)(void *), void * arg);
The pthread_create() routine creates a new thread within a
process. The new thread starts in the start routine start_routine
which has a start argument arg. The new thread has attributes
specified with attr, or default attributes if attr is NULL.
If the pthread_create() routine succeeds it will return 0 and put the
new thread id into thread, otherwise an error number shall be
returned indicating the error.
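
A minimal sketch of the call described above. The names double_it and run_one_thread are made up for this illustration; only pthread_create() and pthread_join() come from the pthread API.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Start routine: receives the pointer given as the last argument of
 * pthread_create() and doubles the int it points to. */
static void *double_it(void *arg)
{
    int *value = arg;
    *value *= 2;
    return NULL;
}

/* Creates one thread with default attributes (attr == NULL), waits for
 * it to finish and returns the doubled value, or -1 on error. */
int run_one_thread(int input)
{
    pthread_t tid;
    int value = input;

    int err = pthread_create(&tid, NULL, double_it, &value);
    if (err != 0) {
        /* The error number is the return value; errno is not set. */
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
        return -1;
    }
    pthread_join(tid, NULL);   /* value must stay alive until here */
    return value;
}
```

The argument points at a local variable, which is only safe here because the creating function joins the thread before returning.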


Creating Thread Attributes


Function: pthread_attr_init()
int pthread_attr_init(pthread_attr_t *attr);
Setting attributes for threads is achieved by filling a thread
attribute object attr of type pthread_attr_t, then passing it as
second argument to pthread_create(3). Passing NULL is
equivalent to passing a thread attribute object with all
attributes set to their default values.
pthread_attr_init initializes the thread attribute object attr and
fills it with default values for the attributes.
Each attribute attrname can be individually set using the function
pthread_attr_setattrname and retrieved using the function
pthread_attr_getattrname (for example, pthread_attr_setdetachstate).

Destroying Thread Attributes


Function: pthread_attr_destroy()
int pthread_attr_destroy(pthread_attr_t *attr);
pthread_attr_destroy destroys a thread attribute object, which
must not be reused until it is reinitialized. pthread_attr_destroy
does nothing in the LinuxThreads implementation.
Attribute objects are consulted only when creating a new thread.
The same attribute object can be used for creating several threads.
Modifying an attribute object after a call to pthread_create does
not change the attributes of the thread previously created.


Detach State
Thread Attribute: detachstate
Control whether the thread is created in the joinable state or in the
detached state. The default is joinable state.
In the joinable state, another thread can synchronize on the thread
termination and recover its termination code using pthread_join(3), but
some of the thread resources are kept allocated after the thread
terminates, and reclaimed only when another thread performs
pthread_join(3) on that thread.
In the detached state, the thread resources are immediately freed when it
terminates, but pthread_join(3) cannot be used to synchronize on the
thread termination.
A thread created in the joinable state can later be put in the detached
state using pthread_detach(3).
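
As an illustration of the detachstate attribute, the sketch below starts a thread that is detached from the outset. The names noop and start_detached are hypothetical helpers for this example.

```c
#include <pthread.h>

static void *noop(void *arg)
{
    (void)arg;
    return NULL;
}

/* Starts a thread in the detached state. Returns the detach state that
 * was stored in the attribute object, or -1 on error. */
int start_detached(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int state = -1;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_attr_getdetachstate(&attr, &state);

    int err = pthread_create(&tid, &attr, noop, NULL);
    /* The attribute object is only consulted during creation. */
    pthread_attr_destroy(&attr);
    return err == 0 ? state : -1;
}
```

No pthread_join() is needed (or allowed) for such a thread; its resources are released when it terminates.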

Sched Policy
Thread Attribute: schedpolicy
Select the scheduling policy for the thread: one of
SCHED_OTHER (regular, non-realtime scheduling),
SCHED_RR (realtime, round-robin) or SCHED_FIFO (realtime,
first-in first-out).
Default value: SCHED_OTHER.
The realtime scheduling policies SCHED_RR and SCHED_FIFO
are available only to processes with superuser privileges.
The scheduling policy of a thread can be changed after creation
with pthread_setschedparam(3).


Sched Param
Thread Attribute: schedparam
Contains the scheduling parameters (essentially, the scheduling
priority) for the thread.
Default value: priority is 0.
This attribute is not significant if the scheduling policy is
SCHED_OTHER; it only matters for the realtime policies
SCHED_RR and SCHED_FIFO.
The scheduling priority of a thread can be changed after creation
with pthread_setschedparam(3).


Inherit Sched
Thread Attribute: inheritsched
Indicates whether the scheduling policy and scheduling
parameters for the newly created thread are determined by the
values of the schedpolicy and schedparam attributes
(value PTHREAD_EXPLICIT_SCHED) or are inherited from the
parent thread (value PTHREAD_INHERIT_SCHED).

Default value: PTHREAD_EXPLICIT_SCHED in LinuxThreads. NPTL, the
current glibc implementation, defaults to PTHREAD_INHERIT_SCHED.
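
The three scheduling attributes are normally set together. The sketch below only prepares an attribute object and creates no thread, so it does not need superuser privileges; make_fifo_attr is a name invented for this example, and setting inheritsched explicitly avoids relying on either default.

```c
#include <pthread.h>
#include <sched.h>

/* Initializes attr so that a thread created with it requests SCHED_FIFO
 * at the given priority. Returns 0 or an error number; on error attr is
 * left destroyed. */
int make_fifo_attr(pthread_attr_t *attr, int priority)
{
    struct sched_param param = { .sched_priority = priority };
    int err = pthread_attr_init(attr);
    if (err != 0)
        return err;

    /* Ask for the policy and priority below to be used, rather than
     * the creating thread's own scheduling settings. */
    err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (err == 0)
        err = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    if (err == 0)
        err = pthread_attr_setschedparam(attr, &param);
    if (err != 0)
        pthread_attr_destroy(attr);
    return err;
}
```

Passing the resulting object to pthread_create() is expected to fail with EPERM when the process lacks the privilege to use realtime policies.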


Destroying Threads
Function: pthread_exit()
void pthread_exit(void * status);
The pthread_exit() routine terminates the calling thread and makes
status available to any thread that successfully joins it with
pthread_join(). pthread_exit() also runs any remaining cleanup
handlers, in the reverse of the order in which
pthread_cleanup_push() installed them, and then calls the
destructors for thread-specific data.
A thread other than the one in which main() was first called makes
an implicit call to pthread_exit() when it returns from the start
routine specified in pthread_create().


Waiting for a thread to finish


Function: pthread_join()
int pthread_join(pthread_t thread, void ** status);
If the target thread thread is not detached and there are no other
threads joined with the specified thread then the pthread_join()
function suspends execution of the current thread and waits for the
target thread thread to terminate. Otherwise the results are
undefined.
On success pthread_join() returns 0 and, if status is not NULL,
stores in *status the value the target thread passed to
pthread_exit(). On failure pthread_join() returns an error
number indicating the error.
The non-portable GNU extensions pthread_tryjoin_np() and
pthread_timedjoin_np() also exist.
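
A short sketch of how the value given to pthread_exit() reaches the joining thread. The helper names worker and join_result are assumptions made for this example, and the integer is smuggled through the void pointer purely for brevity.

```c
#include <pthread.h>
#include <stdint.h>

static void *worker(void *arg)
{
    intptr_t n = (intptr_t)arg;
    /* Same effect as "return (void *)(n + 1);" from the start routine. */
    pthread_exit((void *)(n + 1));
}

/* Returns the value the worker handed to pthread_exit(), or -1. */
long join_result(long n)
{
    pthread_t tid;
    void *status = NULL;

    if (pthread_create(&tid, NULL, worker, (void *)(intptr_t)n) != 0)
        return -1;
    if (pthread_join(tid, &status) != 0)
        return -1;
    return (long)(intptr_t)status;
}
```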

Canceling Threads
Function: pthread_cancel()
int pthread_cancel(pthread_t thread);
Cancellation is the mechanism by which a thread can terminate
the execution of another thread. More precisely, a thread can send
a cancellation request to another thread. Depending on its settings,
the target thread can then either ignore the request, honor it
immediately, or defer it till it reaches a cancellation point.
When a thread eventually honors a cancellation request, it
performs as if pthread_exit(PTHREAD_CANCELED) has been
called at that point.
pthread_cancel sends a cancellation request to the thread denoted
by the thread argument.

Cancel State
Function: pthread_setcancelstate()
int pthread_setcancelstate(int state, int *oldstate);
pthread_setcancelstate changes the cancellation state for the
calling thread -- that is, whether cancellation requests are
ignored or not.
The state argument is the new cancellation state: either
PTHREAD_CANCEL_ENABLE to enable cancellation, or
PTHREAD_CANCEL_DISABLE to disable cancellation
(cancellation requests are ignored).
If oldstate is not NULL, the previous cancellation state is stored
in the location pointed to by oldstate, and can thus be restored
later by another call to pthread_setcancelstate.

Cancel Type
Function: pthread_setcanceltype()
int pthread_setcanceltype(int type, int *oldtype);
pthread_setcanceltype changes the type of responses to
cancellation requests for the calling thread: asynchronous
(immediate) or deferred.
The type argument is the new cancellation type: either
PTHREAD_CANCEL_ASYNCHRONOUS to cancel the calling
thread as soon as the cancellation request is received, or
PTHREAD_CANCEL_DEFERRED to keep the cancellation
request pending until the next cancellation point. If oldtype is
not NULL, the previous cancellation type is stored in the location
pointed to by oldtype, and can thus be restored later by another
call to pthread_setcanceltype.

Cancel Point Defaults


Threads are always created by pthread_create(3) with
cancellation enabled and deferred. That is, the initial
cancellation state is PTHREAD_CANCEL_ENABLE and the
initial type is PTHREAD_CANCEL_DEFERRED.


Cancel Points
Cancellation points are those points in the program execution where a
test for pending cancellation requests is performed and cancellation is
executed if positive.
The following POSIX threads functions are cancellation points:
pthread_join(3), pthread_cond_wait(3), pthread_cond_timedwait(3),
pthread_testcancel(3), sem_wait(3), sigwait(3)
All other POSIX threads functions are guaranteed not to be
cancellation points. That is, they never perform cancellation in deferred
cancellation mode.
A number of system calls (basically, all system calls that may block,
such as read(2), write(2), wait(2), etc.) and library functions that
may call these system calls (e.g. fprintf(3)) are cancellation points.


Test Cancel
Function: pthread_testcancel()
void pthread_testcancel(void);
pthread_testcancel does nothing except test for a pending
cancellation request and act on it. Its purpose is to introduce
explicit checks for cancellation in long sequences of code that do
not otherwise call cancellation point functions.
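
The sketch below shows the situation pthread_testcancel() is meant for: a compute-only loop that a deferred cancellation request could otherwise never interrupt. The names spin and cancel_spinner are invented for the example.

```c
#include <pthread.h>

/* A loop with no blocking calls. The only cancellation point in it is
 * the explicit pthread_testcancel(). */
static void *spin(void *arg)
{
    volatile unsigned long *counter = arg;
    for (;;) {
        (*counter)++;
        pthread_testcancel();
    }
    return NULL;   /* not reached */
}

/* Starts the spinner, cancels it and joins it. Returns 1 if the thread
 * ended through cancellation, 0 if not, -1 on error. */
int cancel_spinner(void)
{
    pthread_t tid;
    unsigned long counter = 0;
    void *status = NULL;

    if (pthread_create(&tid, NULL, spin, &counter) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &status) != 0)
        return -1;
    /* A canceled thread reports PTHREAD_CANCELED as its exit status. */
    return status == PTHREAD_CANCELED;
}
```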


Pthread Once
Function: pthread_once()
int pthread_once(pthread_once_t *once_control,
void(*init_routine) (void));
The purpose of pthread_once is to ensure that a piece of
initialization code is executed at most once. The once_control
argument points to a static or extern variable statically
initialized to PTHREAD_ONCE_INIT.
The first time pthread_once is called with a given once_control
argument, it calls init_routine with no argument and changes the
value of the once_control variable to record that initialization has
been performed. Subsequent calls to pthread_once with the same
once_control argument do nothing.
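
To illustrate, several threads below race to call pthread_once() with the same control variable. The counter init_calls and the function count_init_calls are assumptions added for the demonstration.

```c
#include <pthread.h>

static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static int init_calls;           /* how many times init_routine ran */

static void init_routine(void)
{
    init_calls++;
}

static void *user(void *arg)
{
    (void)arg;
    pthread_once(&once_control, init_routine);
    return NULL;
}

/* Starts up to 16 threads that all call pthread_once() and returns the
 * number of times the initialization routine actually ran. */
int count_init_calls(int n)
{
    pthread_t tids[16];
    int started = 0;

    if (n > 16)
        n = 16;
    for (int i = 0; i < n; i++)
        if (pthread_create(&tids[started], NULL, user, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return init_calls;
}
```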

Cleanup Handlers
Cleanup handlers are functions that get called when a thread
terminates, either by calling pthread_exit(3) or because of
cancellation.
Cleanup handlers are installed and removed following a stack-like
discipline.
The purpose of cleanup handlers is to free the resources that a
thread may hold at the time it terminates.
In particular, if a thread exits or is canceled while it owns a
locked mutex, the mutex will remain locked forever and prevent
other threads from executing normally. The best way to avoid
this is, just before locking the mutex, to install a cleanup handler
whose effect is to unlock the mutex.

Cleanup Push
Function: pthread_cleanup_push()
void pthread_cleanup_push(void (*routine) (void *), void *arg);
pthread_cleanup_push installs the routine function with
argument arg as a cleanup handler. From this point on to the
matching pthread_cleanup_pop, the function routine will be called
with argument arg when the thread terminates, either
through pthread_exit(3) or by cancellation.
If several cleanup handlers are active at that point, they are
called in LIFO order: the most recently installed handler is called
first.


Cleanup Pop
Function: pthread_cleanup_pop()
void pthread_cleanup_pop(int execute);
pthread_cleanup_pop removes the most recently installed cleanup
handler.
If the execute argument is not 0, it also executes the handler, by
calling the routine function with argument arg.
If the execute argument is 0, the handler is only removed but not
executed.


Cleanup Continued
Matching pairs of pthread_cleanup_push and
pthread_cleanup_pop must occur in the same function, at the same
level of block nesting and behave like brackets.
Here is how to lock a mutex mut in such a way that it will be
unlocked if the thread is canceled while mut is locked:
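/* A handler must have type void (*)(void *), so wrap the unlock call. */
static void unlock_mutex(void *arg) { pthread_mutex_unlock(arg); }

pthread_cleanup_push(unlock_mutex, &mut);
pthread_mutex_lock(&mut);
/* do some work */
pthread_cleanup_pop(1);   /* removes the handler and runs it: unlocks mut */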
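
A self-contained sketch of the same pattern, in which the thread is canceled while it holds the mutex. The names unlock_handler, holder and mutex_free_after_cancel are hypothetical; sleep() is used only because it is a cancellation point.

```c
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup handlers take a void * and return nothing. */
static void unlock_handler(void *arg)
{
    pthread_mutex_unlock(arg);
}

static void *holder(void *arg)
{
    (void)arg;
    pthread_cleanup_push(unlock_handler, &mut);
    pthread_mutex_lock(&mut);
    for (;;)
        sleep(1);                /* cancellation is acted on here */
    pthread_cleanup_pop(1);      /* not reached, but must pair with push */
    return NULL;
}

/* Cancels the holder while it owns mut. Returns 1 if the mutex can be
 * taken afterwards, 0 if it stayed locked, -1 on error. */
int mutex_free_after_cancel(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, holder, NULL) != 0)
        return -1;
    pthread_cancel(tid);
    pthread_join(tid, NULL);

    if (pthread_mutex_trylock(&mut) != 0)
        return 0;
    pthread_mutex_unlock(&mut);
    return 1;
}
```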


Cleanup Extras
pthread_cleanup_push_defer_np is a non-portable extension
that combines pthread_cleanup_push and pthread_setcanceltype(3). It
pushes a cleanup handler just as pthread_cleanup_push does, but also
saves the current cancellation type and sets it to deferred cancellation.
This ensures that the cleanup mechanism is effective even if the thread
was initially in asynchronous cancellation mode.
pthread_cleanup_pop_restore_np pops a cleanup handler
introduced by pthread_cleanup_push_defer_np, and restores the
cancellation type to its value at the time pthread_cleanup_push_defer_np
was called.
pthread_cleanup_push_defer_np and pthread_cleanup_pop_restore_np
must occur in matching pairs, at the same level of block nesting.


Barrier Init
Function: pthread_barrier_init()
int pthread_barrier_init(pthread_barrier_t *restrict barrier, const
pthread_barrierattr_t *restrict attr, unsigned count);
The pthread_barrier_init() function shall allocate any
resources required to use the barrier referenced by barrier and
shall initialize the barrier with attributes referenced by attr.
The count argument specifies the number of threads that must
call pthread_barrier_wait() before any of them successfully return
from the call. The value specified by count must be greater than
zero.


Barrier Destroy
Function: pthread_barrier_destroy()
int pthread_barrier_destroy(pthread_barrier_t *barrier);
The pthread_barrier_destroy() function shall destroy the barrier
referenced by barrier and release any resources used by the
barrier.
The effect of subsequent use of the barrier is undefined until the
barrier is reinitialized by another call to pthread_barrier_init().
The results are undefined if pthread_barrier_destroy() is called
when any thread is blocked on the barrier, or if this function is
called with an uninitialized barrier.


Barrier Wait
Function: pthread_barrier_wait()
int pthread_barrier_wait(pthread_barrier_t *barrier);
The pthread_barrier_wait() function shall synchronize participating
threads at the barrier referenced by barrier. The calling thread shall
block until the required number of threads have called
pthread_barrier_wait() specifying the barrier.
When the required number of threads have called pthread_barrier_wait()
specifying the barrier, the constant
PTHREAD_BARRIER_SERIAL_THREAD shall be returned to one
unspecified thread and zero shall be returned to each of the remaining
threads. At this point, the barrier shall be reset to the state it had as a
result of the most recent pthread_barrier_init() function that referenced
it.

Linux IPC
POSIX IPC
SysV IPC
Pipes
Unix Domain Sockets
Signals
TCP/IP Sockets


Creating a Mutex
Function: pthread_mutex_init()
int pthread_mutex_init(pthread_mutex_t *mutex, const
pthread_mutexattr_t *attr);
The pthread_mutex_init() routine initializes the mutex referenced by
mutex, with attributes specified by attr, or default attributes if
attr is NULL.
If pthread_mutex_init() succeeds it returns 0 and the mutex is
initialized and unlocked; otherwise it returns an error number
indicating the error.


Destroying a Mutex
Function: pthread_mutex_destroy()
int pthread_mutex_destroy(pthread_mutex_t * mutex);
The pthread_mutex_destroy() routine destroys the mutex specified
by mutex.
If the pthread_mutex_destroy() routine succeeds it will return 0,
otherwise an error number shall be returned indicating the error.


Locking Mutexes
Function: pthread_mutex_lock()
int pthread_mutex_lock(pthread_mutex_t * mutex);
The pthread_mutex_lock() routine shall lock the mutex specified by mutex. If
the mutex is already locked the calling thread blocks until the mutex becomes
available.
If the pthread_mutex_lock() routine succeeds it will return 0, otherwise an error
number shall be returned indicating the error.
Function: pthread_mutex_trylock()
int pthread_mutex_trylock(pthread_mutex_t * mutex);
The pthread_mutex_trylock() routine locks the mutex specified by mutex and
returns 0 if the mutex is free. If the mutex is already locked, it returns the
error number EBUSY at once.
In all cases pthread_mutex_trylock() does not block the calling thread.
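
A small sketch of both calls. Two threads increment a shared counter under the mutex, and a second helper shows what pthread_mutex_trylock() reports for a mutex that is already held. All function and variable names here are illustrative.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;             /* protected by lock */

static void *add_many(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Two threads each add n to the counter. Returns the total, or -1. */
long count_with_two_threads(long n)
{
    pthread_t a, b;

    counter = 0;
    if (pthread_create(&a, NULL, add_many, &n) != 0)
        return -1;
    if (pthread_create(&b, NULL, add_many, &n) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}

/* Returns what trylock reports when the mutex is already locked. */
int trylock_when_held(void)
{
    pthread_mutex_lock(&lock);
    int rc = pthread_mutex_trylock(&lock);   /* returns at once */
    pthread_mutex_unlock(&lock);
    return rc;
}
```

Without the lock around counter++, the two threads could lose updates and the total would usually fall short of 2 * n.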

Locking Mutexes
Function: pthread_mutex_timedlock()
int pthread_mutex_timedlock(pthread_mutex_t *restrict mutex,
const struct timespec *restrict abs_timeout);
The pthread_mutex_timedlock() function shall lock the mutex object referenced
by mutex. If the mutex is already locked, the calling thread shall block until the
mutex becomes available as in the pthread_mutex_lock() function. If the mutex
cannot be locked without waiting for another thread to unlock the mutex, this
wait shall be terminated when the specified timeout expires.
The timeout shall expire when the absolute time specified by abs_timeout
passes, as measured by the clock on which timeouts are based (that is, when the
value of that clock equals or exceeds abs_timeout), or if the absolute time
specified by abs_timeout has already been passed at the time of the call.


Unlocking Mutexes
Function: pthread_mutex_unlock()
int pthread_mutex_unlock(pthread_mutex_t * mutex);
If the current thread is the owner of the mutex specified by mutex,
then the pthread_mutex_unlock() routine shall unlock the mutex.
If there are any threads blocked waiting for the mutex, the
scheduler will determine which thread obtains the lock on the
mutex; otherwise the mutex is available to the next thread that
calls pthread_mutex_lock() or pthread_mutex_trylock().
If the pthread_mutex_unlock() routine succeeds it will return 0,
otherwise an error number shall be returned indicating the error.

Priority Inversion
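[Figure: task priority (3, 50 and 99) plotted against time]
1. Low priority task (priority 3) takes the mutex.
2. High priority task (priority 99) preempts the low priority task.
3. High priority task blocks on the mutex.
4. Medium priority task (priority 50) preempts the low priority task
and, in effect, the high priority task.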

Priority Inheritance and Ceilings


Priority inheritance and ceilings are methods to protect
against priority inversions.
Linux only got support for them in 2.6.18-rc2, just a couple
of weeks ago.
Patches that add support to older versions exist.
Embedded Linux vendors usually provide patched kernels.

If the kernel version you're using is not patched, make sure to
protect against this scenario in your design.
One possible way: raise the priority of each task trying to
grab a mutex to the maximum priority of all possible
contenders.

POSIX Condition Variables


int pthread_cond_wait(pthread_cond_t *cond,
pthread_mutex_t *mutex);
int pthread_cond_timedwait(pthread_cond_t *cond,
pthread_mutex_t *mutex, const struct timespec *abstime);
int pthread_cond_signal(pthread_cond_t *cond);
int pthread_cond_broadcast(pthread_cond_t *cond);
The first two function calls are used for waiting on the condition variable.
The latter two function calls are used to wake up waiting tasks:
pthread_cond_signal() wakes up a single task that is waiting on the
condition variable.
pthread_cond_broadcast() wakes up all waiting tasks.

Condition Variable Usage


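/* Counting semaphore built from a mutex and a condition variable.
   Here sem_t is a user-defined struct with fields lock, cond and count,
   not the type from <semaphore.h>. */
void sem_wait(sem_t *sem) {
    pthread_mutex_lock(&sem->lock);
    while (sem->count == 0)            /* re-check after every wakeup */
        pthread_cond_wait(&sem->cond, &sem->lock);
    sem->count--;                      /* take one unit */
    pthread_mutex_unlock(&sem->lock);
}
void sem_post(sem_t *sem) {
    pthread_mutex_lock(&sem->lock);
    sem->count++;                      /* give one unit back */
    pthread_cond_signal(&sem->cond);   /* wake one waiter, if any */
    pthread_mutex_unlock(&sem->lock);
}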

The condition variable function calls are used within an area protected by the
mutex that belongs to the condition variable. The operating system releases the
mutex every time it blocks a task on the condition variable, and it locks the
mutex again before pthread_cond_wait() returns in the woken task.
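
A complete, compilable variant of the counting-semaphore idea is sketched below. To avoid clashing with <semaphore.h> it uses its own names (struct counter_sem, csem_wait, csem_post), all of which are assumptions of this example: waiting decrements the count, posting increments it.

```c
#include <pthread.h>

struct counter_sem {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    unsigned        count;       /* protected by lock */
};

static struct counter_sem items = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

static void csem_wait(struct counter_sem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)                    /* guards against spurious wakeups */
        pthread_cond_wait(&s->cond, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

static void csem_post(struct counter_sem *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

static void *consumer(void *arg)
{
    int *taken = arg;
    for (int i = 0; i < 3; i++) {
        csem_wait(&items);
        (*taken)++;
    }
    return NULL;
}

/* Posts three units to a consumer thread; returns how many it took. */
int consume_three(void)
{
    pthread_t tid;
    int taken = 0;

    if (pthread_create(&tid, NULL, consumer, &taken) != 0)
        return -1;
    for (int i = 0; i < 3; i++)
        csem_post(&items);
    pthread_join(tid, NULL);
    return taken;
}
```

The while loop, rather than an if, matters: a woken thread must re-check the count because another thread may have taken the unit first.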

POSIX Semaphores
#include <semaphore.h>
int sem_init(sem_t *sem, int pshared, unsigned int value);
int sem_wait(sem_t * sem);
int sem_trywait(sem_t * sem);
int sem_post(sem_t * sem);
int sem_getvalue(sem_t * sem, int * sval);
int sem_destroy(sem_t * sem);
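
The calls listed above can be walked through in a single thread. This is only a sketch of the counting behaviour; semaphore_walkthrough is a name invented for the example.

```c
#include <errno.h>
#include <semaphore.h>

/* Exercises an unnamed semaphore and returns its final value, or -1. */
int semaphore_walkthrough(void)
{
    sem_t sem;
    int value = -1;

    /* pshared = 0: shared between the threads of this process only. */
    if (sem_init(&sem, 0, 2) != 0)
        return -1;

    sem_wait(&sem);                          /* 2 -> 1 */
    sem_wait(&sem);                          /* 1 -> 0 */

    /* A third sem_wait() would block; sem_trywait() fails instead. */
    if (sem_trywait(&sem) == -1 && errno == EAGAIN)
        sem_post(&sem);                      /* 0 -> 1 */

    sem_getvalue(&sem, &value);
    sem_destroy(&sem);
    return value;
}
```

Unlike the pthread functions, the semaphore functions return -1 and set errno on failure.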


Shared Memory using mmap


void * mmap(void *start, size_t length, int prot , int flags, int fd, off_t
offset);
The mmap function asks to map length bytes starting at offset offset
from the file (or other object) specified by the file descriptor fd into
memory, preferably at address start. This latter address is a hint only,
and is usually specified as 0. The actual place where the object is
mapped is returned by mmap, and is never 0.
The return value in case of an error is MAP_FAILED, that is (void *) -1,
and NOT 0!
MAP_SHARED flag: share this mapping with all other processes that
map this object. Storing to the region is equivalent to writing to the file.
The file may not actually be updated until msync(2) or munmap(2)
is called.
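
A minimal sketch of a MAP_SHARED file mapping: data stored through the mapping is read back with read(2) on the same file. The function name and the temporary file template are assumptions of this example.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Writes msg into a temporary file through a shared mapping and reads
 * it back with read(2). Returns 0 if the file holds msg, else -1. */
int write_through_mapping(const char *msg)
{
    char path[] = "/tmp/mmap-demo-XXXXXX";   /* hypothetical file name */
    char buf[128];
    size_t len = strlen(msg) + 1;
    int ok = -1;

    if (len > sizeof buf)
        return -1;
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                /* the file lives until fd is closed */

    if (ftruncate(fd, (off_t)len) == 0) {    /* give the file a size */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {   /* compare with MAP_FAILED, not NULL */
            memcpy(p, msg, len);
            msync(p, len, MS_SYNC);          /* push changes to the file */
            munmap(p, len);
            if (read(fd, buf, len) == (ssize_t)len && strcmp(buf, msg) == 0)
                ok = 0;
        }
    }
    close(fd);
    return ok;
}
```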

POSIX Shared Memory


int shm_open(const char *name, int oflag, mode_t mode);
shm_open creates and opens a new, or opens an existing, POSIX
shared memory object and returns a file descriptor for it. A POSIX
shared memory object is in effect a handle which can be used by
unrelated processes to mmap(2) the same region of shared memory.
The shm_unlink function performs the converse operation, removing
an object previously created by shm_open.
The operation of shm_open is analogous to that of open(2). name
specifies the shared memory object to be created or opened. For
portable use, name should have an initial slash (/) and contain no
embedded slashes.
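
The usual sequence is shm_open(), ftruncate(), mmap() and finally shm_unlink(). In the sketch below the same object is mapped twice in one process to stand in for two unrelated processes; the object name and the function shm_roundtrip are made up for the example, and older C libraries need -lrt at link time.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Stores value through one mapping of a shared memory object and reads
 * it back through a second mapping. Returns the value read, or -1. */
int shm_roundtrip(int value)
{
    char name[64];
    int result = -1;

    /* One leading slash, no embedded slashes. */
    snprintf(name, sizeof name, "/shm-demo-%ld", (long)getpid());

    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd == -1)
        return -1;

    if (ftruncate(fd, sizeof(int)) == 0) {   /* new objects have size 0 */
        int *writer = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
        int *reader = mmap(NULL, sizeof(int), PROT_READ,
                           MAP_SHARED, fd, 0);
        if (writer != MAP_FAILED && reader != MAP_FAILED) {
            *writer = value;
            result = *reader;    /* both mappings see the same memory */
        }
        if (writer != MAP_FAILED)
            munmap(writer, sizeof(int));
        if (reader != MAP_FAILED)
            munmap(reader, sizeof(int));
    }
    close(fd);
    shm_unlink(name);            /* remove the name again */
    return result;
}
```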


SysV Mailboxes
key_t ftok(const char *pathname, int proj_id);
int msgget(key_t key, int msgflg);
int msgsnd(int msqid, const void *msgp, size_t msgsz, int msgflg);
ssize_t msgrcv(int msqid, void *msgp, size_t msgsz, long msgtyp,
int msgflg);
ftok() generates a key from the pathname and project id.
msgget() creates a mailbox from the key, or opens an existing one.
msgsnd() sends a message and msgrcv() retrieves it.
The flags and the message type control message types (priorities),
whether receiving blocks or not, and the mailbox permissions.
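
A sketch of one message passing through a mailbox. To stay self-contained it uses the key IPC_PRIVATE rather than a key from ftok(), and struct text_msg and mailbox_roundtrip are names invented here; a message must start with a long type field greater than zero.

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct text_msg {
    long mtype;                  /* message type, must be > 0 */
    char mtext[32];              /* payload */
};

/* Sends text as a type 1 message through a private queue and receives
 * it into out. Returns the payload size received, or -1. */
long mailbox_roundtrip(const char *text, char *out, size_t outsz)
{
    struct text_msg m = { .mtype = 1 };      /* mtext starts zeroed */
    struct text_msg r;
    long n = -1;

    int q = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (q == -1)
        return -1;

    strncpy(m.mtext, text, sizeof m.mtext - 1);
    size_t len = strlen(m.mtext) + 1;

    /* msgsz counts the payload only, not the mtype field. */
    if (msgsnd(q, &m, len, 0) == 0) {
        /* Type 1 selects our message; IPC_NOWAIT makes it non-blocking. */
        n = (long)msgrcv(q, &r, sizeof r.mtext, 1, IPC_NOWAIT);
        if (n > 0 && (size_t)n <= outsz)
            memcpy(out, r.mtext, (size_t)n);
        else
            n = -1;
    }
    msgctl(q, IPC_RMID, NULL);   /* queues persist until removed */
    return n;
}
```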


POSIX Pipes
int pipe(int filedes[2]);
pipe creates a pair of file descriptors, pointing to a pipe inode, and
places them in the array pointed to by filedes. filedes[0] is
for reading, filedes[1] is for writing.
FILE *popen(const char *command, const char *type);
The popen() function opens a process by creating a pipe, forking,
and invoking the shell. Since a pipe is by definition
unidirectional, the type argument may specify only reading or
writing, not both; the resulting stream is correspondingly read-only
or write-only.
A named version of a pipe, called a FIFO, also exists (see mkfifo(3)).
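
A sketch of the classic pipe() plus fork() pattern: the child writes into filedes[1] and the parent reads from filedes[0]. The function name read_from_child is an assumption of this example.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that writes msg into a pipe; the parent reads it into
 * buf. Returns the number of bytes read, or -1 on error. */
long read_from_child(const char *msg, char *buf, size_t bufsz)
{
    int fds[2];

    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                          /* child: writer */
        close(fds[0]);                       /* unused read end */
        ssize_t w = write(fds[1], msg, strlen(msg) + 1);
        _exit(w < 0 ? 1 : 0);
    }

    close(fds[1]);               /* parent: close the unused write end */
    ssize_t n = read(fds[0], buf, bufsz);
    close(fds[0]);
    waitpid(pid, NULL, 0);       /* reap the child */
    return (long)n;
}
```

Closing the unused ends matters: the reader only sees end-of-file once every copy of the write end has been closed.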

UNIX Domain Sockets


Files in the file system that act like sockets.
All normal socket operations apply.
struct sockaddr_un server;
int sock;
sock = socket(AF_UNIX, SOCK_STREAM, 0);
server.sun_family = AF_UNIX;
strcpy(server.sun_path, SOCKET_NAME);
bind(sock, (struct sockaddr *) &server, sizeof(struct sockaddr_un));
UDS can be used to pass file descriptors between processes by
sending an SCM_RIGHTS ancillary message with sendmsg(2).
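
The fragment above covers only the server's bind() step. A fuller sketch, with the server and client ends kept in one process for brevity, might look as follows; the socket path and the function uds_roundtrip are made up for the example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Sends msg from a client socket to a server socket over a UNIX domain
 * stream connection. Returns the number of bytes received, or -1. */
long uds_roundtrip(const char *msg, char *buf, size_t bufsz)
{
    struct sockaddr_un addr;
    int srv = -1, cli = -1, conn = -1;
    long n = -1;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    /* snprintf() cannot overflow sun_path, unlike strcpy(). */
    snprintf(addr.sun_path, sizeof addr.sun_path,
             "/tmp/uds-demo-%ld.sock", (long)getpid());
    unlink(addr.sun_path);       /* bind() fails if the file exists */

    srv = socket(AF_UNIX, SOCK_STREAM, 0);
    cli = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv == -1 || cli == -1)
        goto out;
    if (bind(srv, (struct sockaddr *)&addr, sizeof addr) == -1)
        goto out;
    if (listen(srv, 1) == -1)
        goto out;
    if (connect(cli, (struct sockaddr *)&addr, sizeof addr) == -1)
        goto out;
    if ((conn = accept(srv, NULL, NULL)) == -1)
        goto out;
    if (send(cli, msg, strlen(msg) + 1, 0) == -1)
        goto out;
    n = (long)recv(conn, buf, bufsz, 0);
out:
    if (conn != -1) close(conn);
    if (cli != -1) close(cli);
    if (srv != -1) close(srv);
    unlink(addr.sun_path);       /* the socket file is not removed automatically */
    return n;
}
```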

Signals
typedef void (*sighandler_t)(int);
sighandler_t signal(int signum, sighandler_t handler);
int kill(pid_t pid, int sig);
The signal() system call installs a new signal handler for the signal with
number signum. The signal handler is set to handler, which may be a
user-specified function, or either SIG_IGN or SIG_DFL.
kill() sends the signal sig to the process pid.
Upon arrival of a signal, if the corresponding handler is set to
SIG_IGN, then the signal is ignored. If the handler is set to
SIG_DFL, then the default action associated with the signal occurs.
Finally, if the handler is set to a function, then first either
the handler is reset to SIG_DFL or an implementation-dependent
blocking of the signal is performed, and next the function is called
with argument signum.

Regular Signals

SIGHUP    Hangup detected on controlling terminal
SIGINT    Interrupt from keyboard
SIGQUIT   Quit from keyboard
SIGILL    Illegal instruction
SIGABRT   Abort signal from abort(3)
SIGFPE    Floating point exception
SIGKILL   Kill signal
SIGSEGV   Invalid memory reference
SIGPIPE   Broken pipe: write to pipe with no readers
SIGALRM   Timer signal from alarm(2)
SIGTERM   Termination signal
SIGUSR1   User-defined signal 1
SIGUSR2   User-defined signal 2
SIGCHLD   Child stopped or terminated
SIGCONT   Continue if stopped
SIGSTOP   Stop process
SIGTSTP   Stop typed at tty
SIGTTIN   tty input for background process
SIGTTOU   tty output for background process
SIGBUS    Bus error (bad memory access)
SIGPOLL   Pollable event (Sys V). Synonym of SIGIO
SIGPROF   Profiling timer expired
SIGSYS    Bad argument to routine (SVID)
SIGTRAP   Trace/breakpoint trap
SIGURG    Urgent condition on socket (4.2 BSD)
SIGIO     I/O now possible (4.2 BSD)
SIGPWR    Power failure (System V)
SIGWINCH  Window resize signal (4.3 BSD, Sun)

Problem with Regular Signals


Too few signals
Only 2 signals are available for user-defined uses: SIGUSR1 and SIGUSR2.

No queuing
If the same signal is sent multiple times before a task processes
it, it is only delivered once.

No priority
Multiple signals are delivered according to the order sent.


Real Time Signals


Additional 32 signals
From SIGRTMIN to SIGRTMAX

No predefined meaning
But LinuxThreads lib makes use of the first 3.

Multiple instances of the same signal are queued.
A value can be sent with the signal.
Priority is guaranteed: the lowest-numbered real time signals are
delivered first. Instances of the same signal are delivered in the
order they were sent.
Regular signals have higher priority than real time signals.

Real Time Signals API


int sigaction(int signum, const struct sigaction *act, struct sigaction *oldact);
Alternate way to register a signal handler.
Allows the handler to receive an info struct (siginfo_t) with details.
Allows the handler to receive a value (integer or pointer) sent with the signal.
int sigqueue(pid_t pid, int sig, const union sigval value);
Alternate way to send signals
Allows sending real time signals that queue up


Signals & Threads


Signal masks are per thread.
Signal handlers are per process.
Exception signals (SIGSEGV, SIGBUS) will be caught by
the thread that caused the exception.
Other signals will be caught by any thread in the process
whose mask does not block the signal.
Use a dedicated signal handling thread that calls sigwait(3) to make
the choice of catching thread less random.


Process Timers
#include <sys/time.h>
int getitimer(int which, struct itimerval *value);
int setitimer(int which, const struct itimerval *value, struct itimerval *ovalue);

The system provides each process with three interval timers, each
decrementing in a distinct time domain.
When any timer expires, a signal is sent to the process, and
the timer (potentially) restarts.


Timers Expiry
The timer interval is controlled by the following structures:
struct itimerval {
    struct timeval it_interval; /* next value */
    struct timeval it_value;    /* current value */
};
struct timeval {
    long tv_sec;  /* seconds */
    long tv_usec; /* microseconds */
};

Timers decrement from it_value to zero, generate a signal, and
reset to it_interval.
A timer which is set to zero (it_value is zero or the timer expires
and it_interval is zero) stops.

Time Domains
ITIMER_REAL
Decrements in real time, and delivers SIGALRM upon
expiration.

ITIMER_VIRTUAL
Decrements only when the process is executing, and
delivers SIGVTALRM upon expiration.

ITIMER_PROF
Decrements both when the process executes and when the
system is executing on behalf of the process.
SIGPROF is delivered upon expiration.

More on Timers
Timers will never expire before the requested time, instead
expiring some short, constant time afterwards, dependent on
the system timer resolution.
If the timer expires while the process is active (always true
for ITIMER_VIRTUAL) the signal will be delivered
immediately when generated.
Otherwise the delivery will be offset by a small time
dependent on the system loading.
An alternate interface, timer_create(), allows
specifying which signal will be sent (so real time signals can be used)
and spawning a thread on timer expiry.

The Unix and GNU / Linux command line

Going further


Command help
Some Unix commands and most GNU / Linux commands offer
at least one help argument:
-h
(- is mostly used to introduce 1-character options)
--help
(-- is always used to introduce the corresponding long
option name, which makes scripts easier to understand)
You also often get a short summary of options when you input
an invalid argument.


Manual pages
man <keyword>
Displays one or several manual pages for <keyword>
man man
Most available manual pages are about Unix commands, but some
are also about C functions, headers or data structures, or even about
system configuration files!
man stdio.h
man fstab (for /etc/fstab)
Manual page files are looked for in the directories specified by the
MANPATH environment variable.


Info pages
In GNU, man pages are being replaced by info pages. Some
manual pages even tell you to refer to the info pages instead.
info <command>
info features:
Documentation structured in sections (nodes) and subsections
(subnodes)
Possibility to navigate in this structure: top, next, prev, up
Info pages generated from the same texinfo source as the HTML
documentation pages


Embedded Linux driver development

Kernel overview
Linux features


Studied kernel version: 2.6


Linux 2.4
Mature.
But developments stopped; very few developers willing to help.
Now obsolete and lacks recent features.
Still fine if you get your sources, tools and support from
commercial Linux vendors.

Linux 2.6
2 years old stable Linux release!
Support from the Linux development community and all
commercial vendors.
Now mature and more exhaustive. Most drivers upgraded.
Cutting edge features and increased performance.


Linux kernel key features


Portability and hardware support
Runs on most architectures.
Scalability
Can run on super computers as
well as on tiny devices
(4 MB of RAM is enough).
Compliance to standards and
interoperability.
Exhaustive networking support.

Security
It can't hide its flaws. Its code is
reviewed by many experts.
Stability and reliability.
Modularity
Can include only what a system
needs even at run time.
Easy to program
You can learn from existing code.
Many useful resources on the net.


Supported hardware architectures


See the arch/ directory in the kernel sources
Minimum: 32 bit processors, with or without MMU
32 bit architectures (arch/ subdirectories)
alpha, arm, cris, frv, h8300, i386, m32r, m68k, m68knommu,
mips, parisc, ppc, s390, sh, sparc, um, v850, xtensa
64 bit architectures:
ia64, mips64, ppc64, sh64, sparc64, x86_64
See arch/<arch>/Kconfig, arch/<arch>/README, or
Documentation/<arch>/ for details


Embedded Linux driver development

Kernel overview
Kernel code


Implemented in C
Implemented in C like all Unix systems.
(C was created to implement the first Unix systems)
A little Assembly is used too:
CPU and machine initialization, critical library routines.
See http://www.tux.org/lkml/#s15-3
for reasons for not using C++
(main reason: the kernel requires efficient code).


Compiled with GNU C


GNU C extensions are needed to compile the kernel.
So, you cannot use just any ANSI C compiler!
Some GNU C extensions used in the kernel:
Inline C functions
Inline assembly
Structure member initialization
in any order (also in ANSI C99)
Branch annotation (see next page)


Help gcc to optimize your code!


Use the likely and unlikely macros
(include/linux/compiler.h)
Example:
if (unlikely(err)) {
    ...
}
The GNU C compiler will make your code faster
for the most likely case.
Used in many places in kernel code!
Don't forget to use these macros!


No C library
The kernel has to be standalone and can't use user-space code.
Userspace is implemented on top of kernel services, not the opposite.
Kernel code has to supply its own library implementations
(string utilities, cryptography, uncompression ...)
So, you can't use standard C library functions in kernel code
(printf(), memset(), malloc()...).
You can only include kernel C headers.
Fortunately, the kernel provides similar C functions for your
convenience, like printk(), memset(), kmalloc() ...


Kernel Stack
Very small and fixed stack.
2 page stack (8k), per task.
Or, in 2.6, 1 page stack per task plus one for interrupts.
Chosen at build time via the configuration menu.
Not for all architectures.
For some architectures, the kernel provides a debug facility to
detect stack overruns.


Managing endianness
Linux supports both little and big endian architectures
Each architecture defines __BIG_ENDIAN or __LITTLE_ENDIAN
in <asm/byteorder.h>
Can be configured on some platforms supporting both.
To make your code portable, the kernel offers conversion macros
(that do nothing when no conversion is needed). Most useful ones:
u32 cpu_to_be32(u32); // CPU byte order to big endian
u32 cpu_to_le32(u32); // CPU byte order to little endian
u32 be32_to_cpu(u32); // Big endian to CPU byte order
u32 le32_to_cpu(u32); // Little endian to CPU byte order


Kernel coding guidelines


Never use floating point numbers in kernel code. Your code
may be run on a processor without a floating point unit (like on
arm). Floating point can be emulated by the kernel, but this is
very slow.
Define all symbols as static, except exported ones (avoid name
space pollution)
All system calls return negative numbers (error codes) for
errors:
#include <linux/errno.h>

See Documentation/CodingStyle for more guidelines


Kernel log
Printing to the kernel log is done via the printk function.
The kernel keeps the messages in a circular buffer
(so that it doesn't consume more memory with many messages)
Kernel log messages can be accessed from user space through system
calls, or through /proc/kmsg
Kernel log messages are also displayed in the system console.


printk
The printk function:
Similar to the C library's printf(3)
No floating point format.
Log messages are prefixed with <n>, where the number n
denotes severity, from 0 (most severe) to 7.
Macros are defined to be used for severity levels:
KERN_EMERG, KERN_ALERT, KERN_CRIT,
KERN_ERR, KERN_WARNING, KERN_NOTICE,
KERN_INFO, KERN_DEBUG.
Usage example:
printk(KERN_DEBUG "Hello World number %d\n", num);

Accessing the kernel log


Many ways are available!
Watch the system console
syslogd
Daemon gathering kernel messages
in /var/log/messages
Follow changes by running:
tail -f /var/log/messages
Caution: this file grows!
Use logrotate to control this
dmesg
Found in all systems
Displays the kernel log buffer

logread
Same. Often found in small
embedded systems with no
/var/log/messages or no
dmesg. Implemented by Busybox.
cat /proc/kmsg
Waits for kernel messages and
displays them.
Useful when none of the above
user space programs are available
(tiny system)


Linked Lists
Many constructs use doubly-linked lists.
List definition and initialization:
struct list_head mylist = LIST_HEAD_INIT(mylist);

or
LIST_HEAD(mylist);

or
INIT_LIST_HEAD(&mylist);


List Manipulation
Adding, deleting and moving entries:
void list_add(struct list_head *new, struct list_head *head);
void list_add_tail(struct list_head *new, struct list_head *head);
void list_del(struct list_head *entry);
void list_del_init(struct list_head *entry);
void list_move(struct list_head *list, struct list_head *head);
void list_move_tail(struct list_head *list, struct list_head *head);


List Manipulation (cont.)


List splicing and query:
void list_splice(struct list_head *list, struct list_head *head);
void list_splice_init(struct list_head *list, struct list_head *head);
int list_empty(const struct list_head *head);

In 2.6, there are variants of these APIs for RCU protected
lists (see section about Locks ahead).


List Iteration
Lists also have iterator macros defined:
list_for_each(pos, head);
list_for_each_prev(pos, head);
list_for_each_safe(pos, n, head);
list_for_each_entry(pos, head, member);

Example:
struct mydata *pos;
list_for_each_entry(pos, head, dev_list) {
pos->some_data = 0777;
}


Embedded Linux driver development

Kernel overview
Kernel subsystems


Kernel architecture
[Figure: kernel architecture, shown as three layers: user space, kernel space and hardware.]


Kernel Mode vs. User Mode


All modern CPUs support a dual mode of operation:
User mode, for regular tasks.
Supervisor (or privileged) mode, for the kernel.

The mode the CPU is in determines which instructions the
CPU is willing to execute:
Sensitive instructions will not be executed when the CPU is
in user mode.

The CPU mode is determined by one of the CPU registers,
which stores the current Ring Level (x86 terminology):
0 for supervisor mode, 3 for user mode, 1-2 unused by Linux.

The System Call Interface


When a user space task needs to use a kernel service, it
makes a System Call.
The C library places the parameters and the system call number in
registers and then issues a special trap instruction.
The trap atomically changes the ring level to supervisor
mode and sets the instruction pointer to a kernel entry point.
The kernel finds the required system call via the system
call table and executes it.
Returning from the system call does not require a special
instruction, since in supervisor mode the ring level can be
changed directly.

Linux System Call Path

[Figure: Linux system call path. Labels: Task, Glibc, function call, trap, entry.S, sys_name(), do_name(), kernel.]


Kernel memory constraints


Who can look after the kernel?
No memory protection
Accessing illegal memory locations results in (often fatal)
kernel oopses.
Fixed size stack (8 or 4 KB)
Unlike in userspace, no way to make it grow.
Kernel memory can't be swapped out (for the same reasons).

[Figure: userspace memory management, used to implement memory
protection, stack growth and memory swapping to disk.]


Embedded Linux driver development

Kernel overview
Linux versioning scheme and development process


Linux Kernel Development Timeline


Linux stable releases


Major versions
1 major version every 2 or 3 years
Examples: 1.0, 2.0, 2.4, 2.6 (even second number)

Stable releases
1 stable release every 1 or 2 months
Examples: 2.0.40, 2.2.26, 2.4.27, 2.6.7 ...
Stable release updates (since March 2005)
Updates to stable releases up to several times a week
Address only critical issues in the latest stable release
Examples: 2.6.11.1 to 2.6.11.7


Linux development and testing releases


Testing releases
Several testing releases per month, before the next stable one.
You can contribute to making kernel releases more stable by
testing them!
Example: 2.6.12-rc1
Development versions
Unstable versions used by kernel developers
before making a new stable major release
Examples: 2.3.42, 2.5.74 (odd second number)


Continued development in Linux 2.6


Since 2.6.0, kernel developers have been able to introduce lots
of new features one by one at a steady pace, without having to
make major changes in existing subsystems.
Opening a new Linux 2.7 (or 2.9) development branch will be
required only when Linux 2.6 is no longer able to accommodate
key features without undergoing traumatic changes.
Thanks to this, more features are released to users at a faster pace.
However, the internal kernel API can undergo changes between
two 2.6.x releases. A module compiled for a given version may
no longer compile or work on a more recent one.


Linux Kernel Development Process


What's new in each Linux release? (1)


commit 3c92c2ba33cd7d666c5f83cc32aa590e794e91b0
Author: Andi Kleen <ak@suse.de>
Date: Tue Oct 11 01:28:33 2005 +0200
[PATCH] i386: Don't discard upper 32bits of HWCR on K8
Need to use long long, not long when RMWing a MSR. I think
it's harmless right now, but still should be better fixed
if AMD adds any bits in the upper 32bit of HWCR.
Bug was introduced with the TLB flush filter fix for i386
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
...

The official list of changes for each Linux release is just a
huge list of individual patches!
Very difficult to find out the key changes and to get the
global picture out of individual changes.

What's new in each Linux release? (2)


Fortunately, a summary of key changes
with enough details is available on
http://wiki.kernelnewbies.org/LinuxChanges

For each new kernel release, you can also get the
changes in the kernel internal API:
http://lwn.net/Articles/2.6-kernel-api/
What's next?
Documentation/feature-removal-schedule.txt
lists the features, subsystems and APIs that are
planned for removal (announced 1 year in advance).


Embedded Linux driver development

Kernel overview
Kernel user interface


Mounting virtual filesystems


Linux makes system and kernel information available in userspace through
virtual filesystems (virtual files not existing on any real storage).
No need to know kernel programming to access this!

Mounting /proc:
mount -t proc none /proc
Mounting /sys:
mount -t sysfs none /sys

In these commands:
proc, sysfs: filesystem type
none: raw device or filesystem image
(in the case of virtual filesystems, any string is fine)
/proc, /sys: mount point


Kernel userspace interface


A few examples:
/proc/cpuinfo: processor information
/proc/meminfo: memory status
/proc/version: version and build information
/proc/cmdline: kernel command line
/proc/<pid>/environ: calling environment
/proc/<pid>/cmdline: process command line
... and many more! See for yourself!


Userspace interface documentation


Lots of details about the /proc interface are available in
Documentation/filesystems/proc.txt
(almost 2000 lines) in the kernel sources.
You can also find other details in the proc manual page:
man proc
See the New Device Model section for details about /sys


Embedded Linux driver development

Compiling and booting Linux


Getting the sources


Linux kernel size


Linux 2.6.16 sources:
Raw size: 260 MB (20400 files, approx 7 million lines of code)
bzip2 compressed tar archive: 39 MB (best choice)
gzip compressed tar archive: 49 MB
Minimum compiled Linux kernel size (with Linux-Tiny patches)
approx 300 KB (compressed), 800 KB (raw)
Why are these sources so big?
Because they include thousands of device drivers, many network
protocols, support many architectures and filesystems...
The Linux core (scheduler, memory management...) is pretty small!


kernel.org


Getting Linux sources: 2 possibilities


Full sources
The easiest way, but longer to download.
Example:
http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.14.1.tar.bz2
Or patch against the previous version
Assuming you already have the full sources of the previous version
Example:
http://kernel.org/pub/linux/kernel/v2.6/patch-2.6.14.bz2 (2.6.13 to 2.6.14)
http://kernel.org/pub/linux/kernel/v2.6/patch-2.6.14.7.bz2 (2.6.14 to 2.6.14.7)


Downloading full kernel sources


Downloading from the command line
With a web browser, identify the version you need on http://kernel.org
In the right directory, download the source archive and its signature
(copying the download address from the browser):
wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.11.12.tar.bz2
wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.11.12.tar.bz2.sign

Check the electronic signature of the archive:
gpg --verify linux-2.6.11.12.tar.bz2.sign

Extract the contents of the source archive:
tar jxvf linux-2.6.11.12.tar.bz2

~/.wgetrc config file for proxies:
http_proxy = <proxy>:<port>
ftp_proxy = <proxy>:<port>
proxy_user = <user> (if any)
proxy_password = <passwd> (if any)


Downloading kernel source patches (1)


Assuming you already have the linux-x.y.<n-1> version
Identify the patches you need on http://kernel.org with a web browser
Download the patch files and their signature:
Patch from 2.6.10 to 2.6.11
wget ftp://ftp.kernel.org/pub/linux/kernel/v2.6/patch-2.6.11.bz2
wget ftp://ftp.kernel.org/pub/linux/kernel/v2.6/patch-2.6.11.bz2.sign

Patch from 2.6.11 to 2.6.11.12 (latest stable fixes)

wget http://www.kernel.org/pub/linux/kernel/v2.6/patch-2.6.11.12.bz2
wget http://www.kernel.org/pub/linux/kernel/v2.6/patch-2.6.11.12.bz2.sign


Downloading kernel source patches (2)


Check the signature of patch files:
gpg --verify patch-2.6.11.bz2.sign
gpg --verify patch-2.6.11.12.bz2.sign

Apply the patches in the right order:


cd linux-2.6.10/
bzcat ../patch-2.6.11.bz2 | patch -p1
bzcat ../patch-2.6.11.12.bz2 | patch -p1
cd ..
mv linux-2.6.10 linux-2.6.11.12


Checking the integrity of sources


Kernel source integrity can be checked through OpenPGP digital signatures.
Full details on http://www.kernel.org/signature.html
If needed, read http://www.gnupg.org/gph/en/manual.html and create a new
private and public keypair for yourself.
Import the public GnuPG key of kernel developers:
gpg --keyserver pgp.mit.edu --recv-keys 0x517D0F0E

If blocked by your firewall, look for 0x517D0F0E on http://pgp.mit.edu/,
copy and paste the key to a linuxkey.txt file, then import it:
gpg --import linuxkey.txt
Check the signature of files:
gpg --verify linux-2.6.11.12.tar.bz2.sign


Anatomy of a patch file


A patch file is the output of the diff command

diff -Nru a/Makefile b/Makefile
--- a/Makefile 2005-03-04 09:27:15 -08:00
+++ b/Makefile 2005-03-04 09:27:15 -08:00
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 11
-EXTRAVERSION =
+EXTRAVERSION = .1
 NAME=Woozy Numbat

 # *DOCUMENTATION*

Reading the patch, from top to bottom:
diff -Nru ...: diff command line
--- and +++ lines: file date info
@@ -1,7 +1,7 @@: line numbers in files
Unmarked lines: context info, 3 lines before and 3 lines after the change.
Useful to apply a patch when line numbers changed
Lines starting with -: removed line(s) if any
Lines starting with +: added line(s) if any

Using the patch command


The patch command applies changes to files in the current directory:
Making changes to existing files

zcat diff_file.gz | patch -p<n>


n: number of directory levels to skip in the file paths
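
As a rough illustration of how -p<n> relates to the paths stored in a patch, the sketch below builds a toy patch with diff -Nru and applies it with patch -p1. The directory names a/, b/ and tree/ and the file demo.patch are made up for this example, and gzip -dc is used as an equivalent of zcat.

```shell
#!/bin/sh
# Toy demonstration of diff -Nru and patch -p1 (all names are made up).
set -e
work=$(mktemp -d)
cd "$work"
mkdir a b tree
printf 'VERSION = 2\nPATCHLEVEL = 6\nSUBLEVEL = 11\nEXTRAVERSION =\n' > a/Makefile
printf 'VERSION = 2\nPATCHLEVEL = 6\nSUBLEVEL = 11\nEXTRAVERSION = .1\n' > b/Makefile

# diff exits with status 1 when the trees differ, so keep set -e from aborting.
diff -Nru a b > demo.patch || true
gzip demo.patch

# The patch refers to a/Makefile and b/Makefile.
# -p1 skips the first directory level, leaving just "Makefile".
cp a/Makefile tree/
cd tree
gzip -dc ../demo.patch.gz | patch -p1

result=$(grep '^EXTRAVERSION' Makefile)
echo "$result"
```

With -p0 the same patch would look for a/Makefile below the current directory, which is why the number of skipped levels has to match where you run the command.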

Applying a Linux patch


Linux patches...
Always apply to the x.y.<z-1> version
Always produced for n=1 (that's what everybody does... do it too!)
Downloadable in gzip and bzip2 (much smaller) compressed files.
Linux patch command line example:
cd linux-2.6.10
bzcat ../patch-2.6.11.bz2 | patch -p1
cd ..; mv linux-2.6.10 linux-2.6.11
Keep patch files compressed: useful to check their signature later.
You can still view (or even edit) the uncompressed data with vi:
vi patch-2.6.11.bz2 (on the fly (un)compression)
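
Because each patch applies to the previous x.y.<z-1> release, moving several releases forward means applying every intermediate patch in order. A minimal sketch, assuming the patch-2.6.<z>.bz2 naming used above; patch_chain is a hypothetical helper written for this example, not a kernel.org tool, and it does not cover the 2.6.x.y stable patches.

```shell
#!/bin/sh
# Print the patch files needed to go from 2.6.<from> to 2.6.<to>, in order.
# patch_chain is a made-up helper name.
patch_chain() {
    from=$1
    to=$2
    z=$((from + 1))
    while [ "$z" -le "$to" ]; do
        echo "patch-2.6.$z.bz2"
        z=$((z + 1))
    done
}

# Going from 2.6.10 to 2.6.12 lists patch-2.6.11.bz2, then patch-2.6.12.bz2.
patch_chain 10 12
```

Each listed file would then be piped through bzcat into patch -p1, one after the other, from the top of the source tree.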


Accessing development sources (1)


Kernel development sources are now managed with git
You can browse Linus' git tree (if you just need to check a few files):
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=tree
Get and compile git from http://kernel.org/pub/software/scm/git/
Get and compile the cogito front-end from
http://kernel.org/pub/software/scm/cogito/
If you are behind a proxy, set Unix environment variables defining proxy
settings. Example:
export http_proxy="proxy.server.com:8080"
export ftp_proxy="proxy.server.com:8080"


Accessing development sources (2)


Pick up a git development tree on http://kernel.org/git/
Get a local copy (clone) of this tree.
Example (Linus tree, the one used for Linux stable releases):
cg-clone http://kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
or cg-clone rsync://rsync.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

Update your copy whenever needed (Linus tree example):


cd linux-2.6
cg-update origin
More details available on http://git.or.cz/ or http://linux.yyz.us/git-howto.html


Embedded Linux driver development

Compiling and booting Linux


Structure of source files


Linux sources structure (1)


arch/<arch>               Architecture specific code
arch/<arch>/mach-<mach>   Machine / board specific code
COPYING                   Linux copying conditions (GNU GPL)
CREDITS                   Linux main contributors
crypto/                   Cryptographic libraries
Documentation/            Kernel documentation. Don't miss it!
drivers/                  All device drivers (drivers/usb/, etc.)
fs/                       Filesystems (fs/ext3/, etc.)
include/                  Kernel headers
include/asm-<arch>        Architecture and machine dependent headers
include/linux             Linux kernel core headers
init/                     Linux initialization (including main.c)
ipc/                      Code used for process communication

Linux sources structure (2)


kernel/                   Linux kernel core (very small!)
lib/                      Misc library routines (zlib, crc32...)
MAINTAINERS               Maintainers of each kernel part. Very useful!
Makefile                  Top Linux makefile (sets arch and version)
mm/                       Memory management code (small too!)
net/                      Network support code (not drivers)
README                    Overview and building instructions
REPORTING-BUGS            Bug report instructions
scripts/                  Scripts for internal or external use
security/                 Security model implementations (SELinux...)
sound/                    Sound support code and drivers
usr/                      Early user-space code (initramfs)

On-line kernel documentation


http://free-electrons.com/kerneldoc/
Provided for all recent kernel releases
Easier than downloading kernel sources to access documentation
Indexed by Internet search engines
Makes pieces of kernel documentation easier to find!
Unlike most other sites offering this service, it also includes an
HTML translation of kernel documents in the DocBook format.
Never forget documentation in the kernel sources! It's a very
valuable way of getting information about the kernel.


Embedded Linux driver development

Compiling and booting Linux


Kernel source management tools


LXR: Linux Cross Reference


http://sourceforge.net/projects/lxr
Generic source indexing tool and code browser
Web server based
Very easy and fast to use
Identifier or text search available
Very easy to find the declaration, implementation or usages of symbols
Supports C and C++
Supports huge code projects such as the Linux kernel (260 M in Apr. 2006)
Takes a little bit of time and patience to setup (configuration, indexing, server configuration).
Initial indexing quite slow: Linux 2.6.11: 1h 40min on P4 M 1.6 GHz, 2 MB cache
You don't need to set up LXR by yourself. Use our http://lxr.free-electrons.com server!
Other servers available on the Internet:
http://free-electrons.com/community/kernel/lxr/


LXR screenshot


Embedded Linux driver development

Compiling and booting Linux


Kernel configuration


Kernel configuration overview


Editing the Makefile:
setting the version and target architecture if needed
Kernel configuration: defining what features to include in the kernel:
make [config|xconfig|gconfig|menuconfig|oldconfig]
Kernel configuration file (Makefile syntax) stored
in the .config file at the root of the kernel sources
Distribution kernel config files usually released in /boot/


Makefile changes
To distinguish your kernel image from others built from the
same sources, use the EXTRAVERSION variable:
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 15
EXTRAVERSION = -acme1
uname -r will return:
2.6.15-acme1
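
To make the link between the four Makefile variables and the uname -r string explicit, here is a small sketch that reassembles the release string from a sample Makefile fragment. The temporary file and the get helper are assumptions made for this example; the kernel build system computes the string itself.

```shell
#!/bin/sh
# Rebuild the release string from the version variables of a Makefile.
# The sample file below stands in for the top-level kernel Makefile.
mk=$(mktemp)
cat > "$mk" <<'EOF'
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 15
EXTRAVERSION = -acme1
EOF

# get NAME: print the value assigned to NAME at the start of a line.
get() {
    sed -n "s/^$1 *= *//p" "$mk"
}

release="$(get VERSION).$(get PATCHLEVEL).$(get SUBLEVEL)$(get EXTRAVERSION)"
echo "$release"
```

Note that EXTRAVERSION is appended without a separator, so it should carry its own leading dash or dot.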


make xconfig
make xconfig
New Qt configuration interface for Linux 2.6.
Much easier to use than in Linux 2.4!
Make sure you read
help -> introduction: useful options!
File browser: easier to load configuration files


make xconfig screenshot


Compiling statically or as a module


Compiled as a module (separate file):
CONFIG_ISO9660_FS=m
Driver options:
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
Compiled statically in the kernel:
CONFIG_UDF_FS=y
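
Since .config is a plain text file, the =m and =y settings can be inspected with ordinary text tools. The sketch below counts both kinds of entries in a sample file; the file contents are taken from the lines above plus one disabled option added for illustration.

```shell
#!/bin/sh
# Count module (=m) and built-in (=y) options in a .config style file.
# A temporary sample file is used instead of a real kernel .config.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=y
# CONFIG_NTFS_FS is not set
EOF

modules=$(grep -c '=m$' "$cfg")
builtin=$(grep -c '=y$' "$cfg")
echo "modules: $modules, built-in: $builtin"
```

Pointing cfg at the .config file of a real kernel tree gives a quick idea of how much of the configuration ends up in separate module files.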

make config / menuconfig / gconfig


make config
Asks you the questions 1 by 1. Extremely long!
make menuconfig
Same old text interface as in Linux 2.4.
Useful when no graphics are available.
Pretty convenient too!
make gconfig
New GTK based graphical configuration interface.
Functionality similar to that of make xconfig.


make oldconfig
make oldconfig
Needed very often!
Useful to upgrade a .config file from an earlier kernel
release
Issues warnings for obsolete symbols
Asks for values for new symbols
If you edit a .config file by hand, it's strongly recommended
to run make oldconfig afterwards!


make allnoconfig
make allnoconfig
Only sets strongly recommended settings to y.
Sets all other settings to n.
Very useful in embedded systems to select only the minimum
required set of features and drivers.
Much more convenient than unselecting hundreds of features
one by one!


make help
make help
Lists all available make targets
Useful to get a reminder, or to look for new or advanced
options!


Embedded Linux driver development

Compiling and booting Linux


Compiling the kernel


Compiling and installing the kernel


Compiling step
make
Install steps (logged as root!)
make install
make modules_install


Dependency management
When you modify a regular kernel source file, make only
rebuilds what needs recompiling. That's what it is used for.
However, the Makefile is quite pessimistic about
dependencies. When you make significant changes to the
.config file, make often redoes much of the compile job!


Compiling faster with ccache


http://ccache.samba.org/
Compiler cache for C and C++, already shipped by some distributions
Much faster when compiling the same file a second time!
Very useful when .config file changes are frequent.
Use it by adding a ccache prefix
to the CC and HOSTCC definitions in Makefile:
CC = ccache $(CROSS_COMPILE)gcc
HOSTCC = ccache gcc
Performance benchmarks:
-63%: with a Fedora Core 3 config file (many modules!)
-82%: with an embedded Linux config file (much fewer modules!)


Kernel compiling tips


View the full (gcc, ld...) command line:
make V=1
Clean-up generated files
(to force re-compiling drivers):
make clean
Remove all generated files
(mainly to create patches)
Caution: also removes your .config file!
make mrproper


Generated files
Created when you run the make command
vmlinux
Raw Linux kernel image, non compressed.
arch/<arch>/boot/zImage (default image on arm)
zlib compressed kernel image
arch/<arch>/boot/bzImage (default image on i386)
Also a zlib compressed kernel image.
Caution: bz means big zipped but not bzip2 compressed!
(bzip2 compression support only available on i386 as a tactical patch.
Not very attractive for small embedded systems though: consumes 1 MB
of RAM for decompression).


Files created by make install


/boot/vmlinuz-<version>
Compressed kernel image. Same as the one in arch/<arch>/boot
/boot/System.map-<version>
Stores kernel symbol addresses
/boot/initrd-<version>.img (when used by your distribution)
Initial RAM disk, storing the modules you need to mount your root
filesystem. make install runs mkinitrd for you!
/etc/grub.conf or /etc/lilo.conf
make install updates your bootloader configuration files to support
your new kernel! It reruns /sbin/lilo if LILO is your bootloader.
Not relevant for embedded systems.

Files created by make modules_install (1)


/lib/modules/<version>/: Kernel modules + extras
build/
Everything needed to build more modules for this kernel: Makefile,
.config file, module symbol information (Module.symvers),
kernel headers (include/ and include/asm/)
kernel/
Module .ko (Kernel Object) files, in the same directory structure as in
the sources.


Files created by make modules_install (2)


/lib/modules/<version>/ (continued)
modules.alias
Module aliases for module loading utilities. Example line:
alias sound-service-?-0 snd_mixer_oss
modules.dep
Module dependencies (see the Loadable kernel modules section)
modules.symbols
Tells which module a given symbol belongs to.
All the files in this directory are text files.
Don't hesitate to have a look by yourself!


Compiling the kernel in a nutshell


Edit version information in the Makefile
make xconfig
make
make install
make modules_install


Embedded Linux driver development

Compiling and booting Linux


Cross-compiling the kernel


Makefile changes
Update the version as usual
You should change the default target platform.
Example: ARM platform, cross-compiler command: arm-linux-gcc
ARCH ?= arm
CROSS_COMPILE ?= arm-linux-
(The Makefile later defines CC = $(CROSS_COMPILE)gcc)
See comments in Makefile for details


Configuring the kernel


make xconfig
Same as in native compiling.
Don't forget to set the right board / machine type!


Ready-made config files


assabet_defconfig
badge4_defconfig
bast_defconfig
cerfcube_defconfig
clps7500_defconfig
ebsa110_defconfig
edb7211_defconfig
enp2611_defconfig
ep80219_defconfig
epxa10db_defconfig
footbridge_defconfig
fortunet_defconfig
h3600_defconfig
h7201_defconfig
h7202_defconfig
hackkit_defconfig

integrator_defconfig
iq31244_defconfig
iq80321_defconfig
iq80331_defconfig
iq80332_defconfig
ixdp2400_defconfig
ixdp2401_defconfig
ixdp2800_defconfig
ixdp2801_defconfig
ixp4xx_defconfig
jornada720_defconfig
lart_defconfig
lpd7a400_defconfig
lpd7a404_defconfig
lubbock_defconfig
lusl7200_defconfig

mainstone_defconfig
mx1ads_defconfig
neponset_defconfig
netwinder_defconfig
omap_h2_1610_defconfig
omnimeter_defconfig
pleb_defconfig
pxa255-idp_defconfig
rpc_defconfig
s3c2410_defconfig
shannon_defconfig
shark_defconfig
simpad_defconfig
smdk2410_defconfig
versatile_defconfig

arch/arm/configs example

Using ready-made config files


Default configuration files available for many boards / machines!
Check if one exists in arch/<arch>/configs/ for your target.
Example: if you found an acme_defconfig file, you can run:
make acme_defconfig
Using arch/<arch>/configs/ is a very good way of
releasing a default configuration file for a group of users or developers.


Cross-compiling setup
Example
If you have an ARM cross-compiling toolchain
in /usr/local/arm/3.3.2/
You just have to add it to your Unix search path:
export PATH=/usr/local/arm/3.3.2/bin:$PATH
Choosing a toolchain
See the Documentation/Changes file in the sources for details
about minimum tool version requirements.
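
Before starting a build, it can help to confirm that the cross-compiler is actually reachable through PATH. A minimal sketch, assuming the compiler is named <prefix>gcc as in the arm-linux-gcc example; check_cross_compiler is a hypothetical helper written for this example.

```shell
#!/bin/sh
# Report whether a cross-compiler named <prefix>gcc is reachable through PATH.
# check_cross_compiler is a made-up helper name.
check_cross_compiler() {
    if command -v "${1}gcc" >/dev/null 2>&1; then
        echo "found"
    else
        echo "missing"
    fi
}

# Example: look for arm-linux-gcc, matching CROSS_COMPILE=arm-linux-
check_cross_compiler arm-linux-
```

If the answer is "missing", the export PATH line above probably points at the wrong directory or was not run in the current shell.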


Building the kernel


Run make
Copy arch/<arch>/boot/zImage to the target storage
You can customize arch/<arch>/boot/install.sh so that
make install does this automatically for you.
Run make INSTALL_MOD_PATH=<dir>/ modules_install
and copy <dir>/lib/modules/ to /lib/modules/ on the target storage


Cross-compiling summary
Edit Makefile: set ARCH, CROSS_COMPILE and EXTRAVERSION
Get the default configuration for your machine:
make <machine>_defconfig (if existing in arch/<arch>/configs)
Refine the configuration settings according to your requirements:
make xconfig
Add the cross-compiler path to your PATH environment variable
Compile the kernel: make
Copy the kernel image from arch/<arch>/boot/ to the target
Copy modules to a directory which you replicate on the target:
make INSTALL_MOD_PATH=<dir> modules_install
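
The summary above can be sketched as a dry run that only prints the commands. Instead of editing the Makefile, this variant passes ARCH and CROSS_COMPILE on the make command line, which overrides the ?= defaults. The board name acme and the staging directory are hypothetical values chosen for this example.

```shell
#!/bin/sh
# Dry run: print the cross-compiling steps instead of executing them.
ARCH=arm
CROSS_COMPILE=arm-linux-
MACHINE=acme                # hypothetical board name
STAGING=/tmp/target-root    # hypothetical staging directory

plan() {
    base="make ARCH=$ARCH CROSS_COMPILE=$CROSS_COMPILE"
    echo "$base ${MACHINE}_defconfig"
    echo "$base xconfig"
    echo "$base"
    echo "$base INSTALL_MOD_PATH=$STAGING modules_install"
}

plan
```

Running the printed lines from the top of the kernel sources, with the cross-compiler in PATH, would carry out the configuration, build and module staging steps.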


Embedded Linux driver development

Compiling and booting Linux


Overall system startup


Linux 2.4 booting sequence


Bootloader
- Executed by the hardware at a fixed location in ROM / Flash
- Initializes support for the device where the kernel image is found (local storage, network,
removable media)
- Loads the kernel image in RAM
- Executes the kernel image (with a specified command line)

Kernel
- Uncompresses itself
- Initializes the kernel core and statically compiled drivers (needed to access the root filesystem)
- Mounts the root filesystem (specified by the root kernel parameter)
- Executes the first userspace program

First userspace program


- Configures userspace and starts up system services


Linux 2.6 booting sequence


Bootloader
- Executed by the hardware at a fixed location in ROM / Flash
- Initializes support for the device where the images are found (local storage, network, removable media)
- Loads the kernel image in RAM
- Executes the kernel image (with a specified command line)

Kernel
- Uncompresses itself
- Initializes the kernel core and statically compiled drivers
- Uncompresses the initramfs cpio archive included in the kernel image into the file cache (no mounting, no filesystem).
- If found in the initramfs, executes the first userspace program: /init

Userspace: /init script (what follows is just a typical scenario)


- Runs userspace commands to configure the device (such as network setup, mounting /proc and /sys...)
- Mounts a new root filesystem. Switch to it (switch_root)
- Runs /sbin/init (or sometimes a new /linuxrc script)

Userspace: /sbin/init
- Runs commands to configure the device (if not done yet in the initramfs)
- Starts up system services (daemons, servers) and user programs
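
The typical /init scenario above could look roughly like the sketch below. It is only an outline under stated assumptions: the root device /dev/hda1 and the /mnt mount point are placeholders, and the DRY_RUN switch and run helper are additions for this example so that the steps can be printed safely on a development machine.

```shell
#!/bin/sh
# Sketch of an initramfs /init script.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
ROOT_DEV=${ROOT_DEV:-/dev/hda1}    # placeholder root device

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

init_steps() {
    run mount -t proc proc /proc           # configure the device
    run mount -t sysfs sysfs /sys
    run mount "$ROOT_DEV" /mnt             # mount the new root filesystem
    run exec switch_root /mnt /sbin/init   # switch to it and start /sbin/init
}

init_steps
```

A real script would usually also load any drivers needed to reach the root device and handle errors before switching root.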


Linux 2.6 booting sequence with initrd


Bootloader
- Executed by the hardware at a fixed location in ROM / Flash
- Initializes support for the device where the images are found (local storage, network, removable media)
- Loads the kernel and init ramdisk (initrd) images in RAM
- Executes the kernel image (with a specified command line)

Kernel
- Uncompresses itself
- Initializes statically compiled drivers
- Uncompresses the initramfs cpio archive included in the kernel. Mounts it. No /init executable found.
- So falls back to the old way of trying to locate and mount a root filesystem.
- Mounts the root filesystem specified by the root kernel parameter (initrd in our case)
- Executes the first userspace program: usually /linuxrc

Userspace: /linuxrc script in initrd (what follows is just a typical sequence)

- Runs userspace commands to configure the device (such as network setup, mounting /proc and /sys...)
- Loads kernel modules (drivers) stored in the initrd, needed to access the new root filesystem.
- Mounts the new root filesystem. Switch to it (pivot_root)
- Runs /sbin/init (or sometimes a new /linuxrc script)

Userspace: /sbin/init
- Runs commands to configure the device (if not done yet in the initrd)
- Starts up system services (daemons, servers) and user programs


Linux 2.4 booting sequence drawbacks


Trying to mount the filesystem specified
by the root kernel parameter is complex:
Need device and filesystem drivers to be loaded
Specifying the root filesystem requires ugly black magic device
naming (such as /dev/ram0, /dev/hda1...), while / doesn't
exist yet!
Can require a complex initialization to implement within the
kernel. Examples: NFS (set up an IP address, connect to the
server...), RAID (root filesystem on multiple physical drives)...
In a nutshell: too much complexity in kernel code!


Extra init ramdisk drawbacks


Init ramdisks are implemented as standard block devices
Need a ramdisk and filesystem driver
Fixed in size: cannot easily grow in size.
Any free space cannot be reused by anything else.
Needs to be created and modified like any block device:
formatting, mounting, editing, unmounting.
Root permissions needed.
Like in any block device, files are first read from the storage,
and then copied to the file cache.
Slow and duplication in RAM!!!


Initramfs features and advantages (1)


Root file system built into the kernel image
(embedded as a compressed cpio archive)
Very easy to create (at kernel build time).
No need for root permissions (for mount and mknod).
Compared to init ramdisks, just 1 file to handle.
Always present in the Linux 2.6 kernel (empty by default).
Just a plain compressed cpio archive.
Needs neither a block driver nor a filesystem driver.


Initramfs features and advantages (2)


ramfs: implemented in the file cache.
No duplication in RAM, no filesystem layer to manage.
Just uses the size of its files. Can grow if needed.
Loaded by the kernel earlier.
More initialization code moved to user-space!
Simpler to mount complex filesystems from flexible userspace
scripts rather than from rigid kernel code. More complexity
moved out to user-space!
No more magic naming of the root device.
pivot_root no longer needed.


Initramfs features and advantages (3)


Possible to add non GPL files (firmware, proprietary drivers)
in the filesystem. This is not linking, just file aggregation
(not considered as a derived work by the GPL).
Possibility to remove these files when no longer needed.
Still possible to use ramdisks.
More technical details about initramfs:
see Documentation/filesystems/ramfs-rootfs-initramfs.txt
and Documentation/early-userspace/README in kernel sources.
See also http://www.linuxdevices.com/articles/AT4017834659.html for a nice
overview of initramfs (by Rob Landley, new Busybox maintainer).


How to populate an initramfs


Using CONFIG_INITRAMFS_SOURCE
in kernel configuration (General Setup section)
Either specify an existing cpio archive
Or specify a list of files or directories
to be added to the archive.
Or specify a text specification file (see next page)
Can use a tiny C library: klibc
(ftp://ftp.kernel.org/pub/linux/libs/klibc/)


Initramfs specification file example


dir /dev 755 0 0
nod /dev/console 644 0 0 c 5 1
nod /dev/loop0 644 0 0 b 7 0
dir /bin 755 1000 1000
slink /bin/sh busybox 777 0 0
file /bin/busybox initramfs/busybox 755 0 0
dir /proc 755 0 0
dir /sys 755 0 0
dir /mnt 755 0 0
file /init initramfs/init.sh 755 0 0
No need for root user access!

In each entry, the two numbers following the permission mode are the user id and the group id.

How to handle compressed cpio archives


Useful when you want to build the kernel with a ready-made cpio
archive. Better let the kernel do this for you!

Extracting:
gzip -dc initramfs.img | cpio -id
Creating:
find <dir> -print -depth | cpio -ov -H newc | gzip -c > initramfs.img
(-H newc selects the cpio format expected by the kernel)


How to create an initrd


In case you really need an initrd (why?).
mkdir /mnt/initrd
dd if=/dev/zero of=initrd.img bs=1k count=2048
mkfs.ext2 -F initrd.img
mount -o loop initrd.img /mnt/initrd
Fill the ramdisk contents: busybox, modules, /linuxrc script
More details in the Free Software tools for embedded systems training!
umount /mnt/initrd
gzip --best -c initrd.img > initrd
More details in Documentation/initrd.txt in the kernel
sources! It also explains pivot rooting.

Embedded Linux driver development

Compiling and booting Linux


Bootloaders


x86 bootloaders
LILO: LInux LOader. Original Linux bootloader. Still in use!
http://freshmeat.net/projects/lilo/
Supports: x86
GRUB: GRand Unified Bootloader from GNU. More powerful.
http://www.gnu.org/software/grub/
Supports: x86
SYSLINUX: Utilities for network and removable media booting
http://syslinux.zytor.com
Supports: x86


Generic bootloaders
Das U-Boot: Universal Bootloader from DENX Software Engineering
The most widely used on ARM.
http://u-boot.sourceforge.net/
Supports: arm, ppc, mips, x86
RedBoot: eCos-based bootloader from Red Hat
http://sources.redhat.com/redboot/
Supports: x86, arm, ppc, mips, sh, m68k...
uMon: MicroMonitor general purpose, multi-OS bootloader
http://microcross.com/html/micromonitor.html
Supports: ARM, ColdFire, SH2, 68K, MIPS, PowerPC, Xscale...


Other bootloaders
LAB: Linux As Bootloader, from Handhelds.org
http://handhelds.org/cgi-bin/cvsweb.cgi/linux/kernel26/lab/
Idea: use a trimmed Linux kernel with only features needed in a
bootloader (no scheduling, etc.). Reuses flash and filesystem access,
LCD interface, without having to implement bootloader specific drivers.
Supports: arm (still experimental)
And many more: lots of platforms have their own!


Postprocessing kernel image for U-boot


The U-boot bootloader needs extra information to be added to
the kernel and initrd image files.
mkimage postprocessing utility provided in U-boot sources
Kernel image postprocessing:
make uImage
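
The "extra information" is a header placed in front of the image. As a rough userspace illustration, the sketch below checks whether a buffer starts with the legacy U-Boot image header. It assumes the legacy format (a 64-byte header whose first four bytes hold the big-endian magic number 0x27051956); the function and macro names are hypothetical and are not part of U-Boot.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed magic number of the legacy U-Boot image header (stored big-endian). */
#define UIMAGE_MAGIC 0x27051956u

/* Assumed size in bytes of the legacy header that mkimage prepends. */
#define UIMAGE_HEADER_SIZE 64u

/* Read a 32-bit big-endian value from a byte buffer. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return 1 if buf is long enough and starts with the magic number, else 0. */
static int looks_like_uimage(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < UIMAGE_HEADER_SIZE)
        return 0;
    return read_be32(buf) == UIMAGE_MAGIC;
}
```

A caller would read the first 64 bytes of a file and pass them to looks_like_uimage() before handing the file to the bootloader. The check only looks at the magic number; it does not verify the header checksum.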


Postprocessing initrd image for U-boot


mkimage \
-n initrd \
-A arm \
-O linux \
-T ramdisk \
-C gzip \
-d rd-ext2.gz \
uInitrd
Options, in order: -n name, -A architecture, -O operating system, -T type,
-C compression, -d input file. The last argument (uInitrd) is the output file.

Compiling U-boot mkimage


If you don't have mkimage yet
Get the U-boot sources from
http://u-boot.sourceforge.net/
In the U-boot source directory:
Find the name of the config file for your board in
include/configs (for example: omap1710h3.h)
make omap1710h3_config (.h replaced by _config)
make (or make -k if you have minor failures)
cp tools/mkimage /usr/local/bin/


Embedded Linux driver development

Compiling and booting Linux


Kernel booting


Kernel command line parameters


Like most C programs, the Linux kernel accepts command line
arguments.
Kernel command line arguments are part of the bootloader
configuration settings.
Useful to configure the kernel at boot time, without having to
recompile it.
Useful to perform advanced kernel and driver initialization,
without having to use complex user-space scripts.


Kernel command line example


HP iPAQ h2200 PDA booting example (a single command line, shown one parameter per row):
root=/dev/ram0              Root filesystem (first ramdisk)
rw                          Root filesystem mounting mode
init=/linuxrc               First userspace program
console=ttyS0,115200n8      Console (serial)
console=tty0                Other console (framebuffer)
ramdisk_size=8192           Misc parameters...
cachepolicy=writethrough    Misc parameters...

Hundreds of command line parameters are described in
Documentation/kernel-parameters.txt
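
To make the name=value structure of such a command line concrete, here is a minimal userspace sketch that looks up one parameter in a command line string (for example the contents of /proc/cmdline). The function name cmdline_get is hypothetical, and the sketch ignores quoting; when a name appears several times, it keeps the last value.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of the last "name=value" token found in a
 * whitespace-separated command line into out (truncated to fit).
 * Returns 1 if the parameter was found, 0 otherwise.
 * Flags without '=' (such as "rw") are not matched.
 */
static int cmdline_get(const char *cmdline, const char *name,
                       char *out, size_t outsz)
{
    size_t nlen = strlen(name);
    const char *p = cmdline;
    int found = 0;

    if (outsz == 0)
        return 0;

    while (*p != '\0') {
        size_t toklen;

        p += strspn(p, " \t\n");          /* skip separators */
        toklen = strcspn(p, " \t\n");     /* length of the current token */
        if (toklen > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            size_t vlen = toklen - nlen - 1;

            if (vlen >= outsz)
                vlen = outsz - 1;         /* truncate to fit the buffer */
            memcpy(out, p + nlen + 1, vlen);
            out[vlen] = '\0';
            found = 1;
        }
        p += toklen;                      /* move past the token */
    }
    return found;
}
```

With the iPAQ command line above, looking up "init" yields /linuxrc, and looking up "console" yields tty0, the last of the two console entries.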


Booting variants
XIP (Execute In Place)
The kernel image is directly executed from the storage
Can be faster and save RAM
However, the kernel image can't be compressed
No initramfs / initrd
Directly mounting the final root filesystem
(root kernel command line option)
No new root filesystem
Running the whole system from the initramfs


Usefulness of rootfs on NFS


Once networking works, your root filesystem could be a directory on
your GNU/Linux development host, exported by NFS (Network File
System). This is very convenient for system development:
Makes it very easy to update files (driver modules in particular) on
the root filesystem, without rebooting. Much faster than through the
serial port.
Can have a big root filesystem even if you don't have support for
internal or external storage yet.
The root filesystem can be huge. You can even build native compiler
tools and build all the tools you need on the target itself (better to
cross-compile though).


NFS boot setup (1)


On the PC (NFS server)
Add the following line to your /etc/exports file:
/home/rootfs 192.168.0.202(rw,insecure,sync,no_wdelay,no_root_squash)
(192.168.0.202 is the client address; the list in parentheses holds the NFS server options)
If it is not running yet, you may need to start portmap:
/etc/init.d/portmap start

Start or restart your NFS server:


Fedora Core: /etc/init.d/nfs restart
Debian (Knoppix, KernelKit): /etc/init.d/nfs-user-server restart


NFS boot setup (2)


On the target (NFS client)
Compile your kernel with CONFIG_NFS_FS=y
and CONFIG_ROOT_NFS=y
Boot the kernel with the following command line options:
root=/dev/nfs
(virtual device)
ip=192.168.1.111:192.168.1.110:192.168.1.100:255.255.255.0:at91:eth0
(fields, in order: local IP address, server IP address, gateway, netmask, hostname, device)
nfsroot=192.168.1.110:/home/nfsroot
(NFS server IP address, then the directory on the NFS server)
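
The colon-separated layout of the ip= value can be illustrated with a small userspace sketch that splits it into the six fields listed above. The structure and function names are hypothetical; the kernel has its own parser, and this is only meant to show the field order.

```c
#include <stddef.h>
#include <string.h>

#define IP_FIELD_MAX 32

/* Fields of the ip= value, in the order used in the example above. */
struct ip_param {
    char client[IP_FIELD_MAX];
    char server[IP_FIELD_MAX];
    char gateway[IP_FIELD_MAX];
    char netmask[IP_FIELD_MAX];
    char hostname[IP_FIELD_MAX];
    char device[IP_FIELD_MAX];
};

/*
 * Split "a:b:c:d:e:f" into the six fields above.
 * Empty or missing fields are left as empty strings, and over-long
 * fields are truncated. Returns the number of fields seen (1 to 6).
 */
static int parse_ip_param(const char *value, struct ip_param *out)
{
    char *fields[6];
    int n = 0;

    memset(out, 0, sizeof(*out));
    fields[0] = out->client;
    fields[1] = out->server;
    fields[2] = out->gateway;
    fields[3] = out->netmask;
    fields[4] = out->hostname;
    fields[5] = out->device;

    while (n < 6) {
        size_t len = strcspn(value, ":");   /* length up to the next ':' */
        size_t copy = len < IP_FIELD_MAX ? len : IP_FIELD_MAX - 1;

        memcpy(fields[n], value, copy);     /* buffer is already zeroed */
        n++;
        if (value[len] == '\0')
            break;
        value += len + 1;                   /* skip the ':' */
    }
    return n;
}
```

Passing the value from the example fills client with 192.168.1.111, server with 192.168.1.110 and device with eth0.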


First user-space program


Specified by the init kernel command line parameter
Executed at the end of booting by the kernel
Takes care of starting all other user-space programs
(system services and user programs).
Gets process number (pid) 1
Parent or ancestor of all user-space programs
The system won't let you kill it.
Only other user-space program called by the kernel:
/sbin/hotplug


/linuxrc
1 of the 2 default init programs
(if no init parameter is given to the kernel)
Traditionally used in initrds or in simple systems not using
/sbin/init.
Is most of the time a shell script, based on a very lightweight
shell: nash or busybox sh
This script can implement complex tasks: detecting drivers to
load, setting up networking, mounting partitions, switching
to a new root filesystem...


The init program


/sbin/init is the second default init program
Takes care of starting system services, and eventually the user
interfaces (sshd, X server...)
Also takes care of stopping system services
Lightweight, partial implementation available through busybox
See the Init runlevels annex section for more details about starting
and stopping system services with init.
However, simple startup scripts are often sufficient
in embedded systems.


Embedded Linux driver development

Compiling and booting Linux


Linux device files


Character device files


Accessed through a sequential flow of individual characters
Character devices can be identified by their c type (ls -l):
crw-rw----  1 root  uucp    4,  64 Feb 23 2004 /dev/ttyS0
crw--w----  1 jdoe  tty   136,   1 Feb 23 2004 /dev/pts/1
crw-------  1 root  root   13,  32 Feb 23 2004 /dev/input/mouse0
crw-rw-rw-  1 root  root    1,   3 Feb 23 2004 /dev/null

Example devices: keyboards, mice, parallel port, IrDA,
Bluetooth port, consoles, terminals, sound, video...


Block device files


Accessed through data blocks of a given size. Blocks can be
accessed in any order.
Block devices can be identified by their b type (ls -l):
brw-rw----  1 root  disk    3,  1 Feb 23 2004 hda1
brw-rw----  1 jdoe  floppy  2,  0 Feb 23 2004 fd0
brw-rw----  1 root  disk    7,  0 Feb 23 2004 loop0
brw-rw----  1 root  disk    1,  1 Feb 23 2004 ram1
brw-------  1 root  root    8,  1 Feb 23 2004 sda1

Example devices: hard or floppy disks, ram disks, loop devices...


Device major and minor numbers


As you could see in the previous examples,
device files have 2 numbers associated with them:
First number: major number
Second number: minor number
Major and minor numbers are used by the kernel to bind a driver
to the device file. Device file names don't matter to the kernel!
To find out which driver a device file corresponds to,
or when the device name is too cryptic,
see Documentation/devices.txt.
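
From user space, the same two numbers can be read with stat(2). The sketch below assumes a Linux system with glibc, where major(), minor() and makedev() come from <sys/sysmacros.h>; the struct and function names are hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor(), makedev() on glibc */

/* Type and numbers of a device file, as shown by ls -l. */
struct dev_id {
    char type;               /* 'c' for character, 'b' for block */
    unsigned int major_nr;
    unsigned int minor_nr;
};

/*
 * Fill id from the device file at path.
 * Returns 0 on success, -1 if stat() fails or if path is not a device file.
 */
static int device_id(const char *path, struct dev_id *id)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (S_ISCHR(st.st_mode))
        id->type = 'c';
    else if (S_ISBLK(st.st_mode))
        id->type = 'b';
    else
        return -1;
    id->major_nr = major(st.st_rdev);   /* first number in ls -l */
    id->minor_nr = minor(st.st_rdev);   /* second number in ls -l */
    return 0;
}
```

On a typical system, calling device_id() on /dev/ttyS0 should report type c with the major and minor numbers listed in Documentation/devices.txt for serial ports.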


Device file creation


Device files are not created when a driver is loaded.
They have to be created in advance:
mknod /dev/<device> [c|b] <major> <minor>
Examples:
mknod /dev/ttyS0 c 4 64
mknod /dev/hda1 b 3 1


Drivers without device files


They don't have any corresponding /dev entry you could read or
write through a regular Unix command.
Network drivers
They are represented by a network device such as ppp0, eth1,
usbnet, irda0 (listed by ifconfig -a)
Other drivers
Often intermediate or lowlevel drivers just interfacing with other
ones. Example: usbcore.


Embedded Linux driver development

Driver development
Loadable kernel modules


Loadable kernel modules (1)


Modules: add a given functionality to the kernel (drivers,
filesystem support, and many others)
Can be loaded and unloaded at any time, only when their
functionality is needed. Once loaded, they have full access to the
whole kernel. No particular protection.
Useful to keep the kernel image size to the minimum
(essential in GNU/Linux distributions for PCs).


Loadable kernel modules (2)


Useful to support incompatible drivers (either load one or the
other, but not both)
Useful to deliver binary-only drivers (bad idea) without
having to rebuild the kernel.
Modules make it easy to develop drivers without rebooting:
load, test, unload, rebuild, load...
Modules can also be compiled statically into the kernel.


Module dependencies
Module dependencies stored in
/lib/modules/<version>/modules.dep
They don't have to be described by the module writer.
They are automatically computed during kernel building from
module exported symbols. module2 depends on module1 if
module2 uses a symbol exported by module1.
You can update the modules.dep file by running (as root)
depmod -a [<version>]
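
As an illustration of what modules.dep holds, the userspace sketch below counts the dependencies on one line of the file. It assumes the line layout "module path, colon, then the space-separated paths of the modules it depends on"; this layout and the function name count_deps are assumptions made for the example, so check them against the file on your own system.

```c
#include <string.h>

/*
 * Count the dependencies listed on one modules.dep line of the assumed form
 *   <module path>: <dependency path> <dependency path> ...
 * Returns -1 if the line contains no ':' separator.
 */
static int count_deps(const char *line)
{
    const char *p = strchr(line, ':');
    int count = 0;

    if (p == NULL)
        return -1;
    p++;                                /* skip the ':' */
    while (*p != '\0') {
        p += strspn(p, " \t\n");        /* skip separators */
        if (*p == '\0')
            break;
        count++;                        /* one more dependency path */
        p += strcspn(p, " \t\n");       /* skip over that path */
    }
    return count;
}
```

A line for module2 that lists module1 after the colon gives 1; a line with nothing after the colon gives 0, meaning the module can be loaded on its own.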


hello module
/* hello.c */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

/* __init: removed after initialization (static kernel or module). */
static int __init hello_init(void)
{
        printk(KERN_ALERT "Good morrow\n");
        printk(KERN_ALERT "to this fair assembly.\n");
        return 0;
}

/* __exit: discarded when module compiled statically into the kernel. */
static void __exit hello_exit(void)
{
        printk(KERN_ALERT "Alas, poor world, what treasure\n");
        printk(KERN_ALERT "hast thou lost!\n");
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Greeting module");
MODULE_AUTHOR("William Shakespeare");

Example available on http://free-electrons.com/doc/c/hello.c



Module license usefulness


Used by kernel developers to identify issues coming from
proprietary drivers, which they can't do anything about.
Useful for users to check that their system is 100% free
Useful for GNU/Linux distributors for their release policy
checks.


Possible module license strings


Available license strings are explained in include/linux/module.h:
GPL: GNU Public License v2 or later
GPL v2: GNU Public License v2
GPL and additional rights: GNU Public License v2 rights and more
Dual BSD/GPL: GNU Public License v2 or BSD license choice
Dual MPL/GPL: GNU Public License v2 or Mozilla license choice
Proprietary: Non free products


Compiling a module
The Makefile below should be reusable for any Linux 2.6 module.
Just run make to build the hello.ko file.
Caution: make sure there is a [Tab] character (no spaces) at the beginning of
the $(MAKE) line (make syntax).
# Makefile for the hello module
obj-m := hello.o
# KDIR is either a full kernel source directory (configured and compiled)
# or just a kernel headers directory (minimum needed)
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
default:
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules

Example available on http://free-electrons.com/doc/c/Makefile



Using the module


You need to be logged in as root
Load the module:
insmod ./hello.ko
You will see the following in the kernel log:
Good morrow
to this fair assembly.
Now remove the module:
rmmod hello
You will see:
Alas, poor world, what treasure
hast thou lost!

Module utilities (1)


modinfo <module_name>
modinfo <module_path>.ko
Gets information about a module: parameters, license,
description. Very useful before deciding to load a module or not.
insmod <module_name>
insmod <module_path>.ko
Tries to load the given module, if needed by searching for its
.ko file throughout the default locations (can be redefined by
the MODPATH environment variable).


Module utilities (2)


modprobe <module_name>
Most common usage of modprobe: tries to load all the
modules the given module depends on, and then this module.
Lots of other options are available.
lsmod
Displays the list of loaded modules
Compare its output with the contents of /proc/modules!
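
To help with that comparison, here is a userspace sketch that parses one line of /proc/modules. It assumes the Linux 2.6 layout, where the first four whitespace-separated fields are the module name, its size, its use count and the list of modules using it ("-" when there are none); the struct and function names are hypothetical.

```c
#include <stdio.h>

/* First four fields of a /proc/modules line (assumed Linux 2.6 layout). */
struct module_line {
    char name[64];
    unsigned long size;      /* module size in bytes */
    int refcount;            /* use count */
    char used_by[256];       /* comma-separated users, or "-" */
};

/* Parse one line; returns 0 on success, -1 if the line does not match. */
static int parse_module_line(const char *line, struct module_line *m)
{
    int n = sscanf(line, "%63s %lu %d %255s",
                   m->name, &m->size, &m->refcount, m->used_by);

    return n == 4 ? 0 : -1;
}
```

lsmod shows essentially these same fields in columns, which is why its output and the raw file are easy to compare side by side.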


Module utilities (3)


rmmod <module_name>
Tries to remove the given module
modprobe -r <module_name>
Tries to remove the given module and all dependent modules
(which are no longer needed after the module removal)


Embedded Linux driver development

Driver development
Module parameters


hello module with parameters


/* hello_param.c */
/* Thanks to Jonathan Corbet for the example! */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/moduleparam.h>

MODULE_LICENSE("GPL");

/* A couple of parameters that can be passed in: how many times we say
   hello, and to whom */
static char *whom = "world";
module_param(whom, charp, 0);
static int howmany = 1;
module_param(howmany, int, 0);

static int __init hello_init(void)
{
        int i;

        for (i = 0; i < howmany; i++)
                printk(KERN_ALERT "(%d) Hello, %s\n", i, whom);
        return 0;
}

static void __exit hello_exit(void)
{
        printk(KERN_ALERT "Goodbye, cruel %s\n", whom);
}

module_init(hello_init);
module_exit(hello_exit);

Example available on http://free-electrons.com/doc/c/hello_param.c



Passing module parameters


Through insmod or modprobe:
insmod ./hello_param.ko howmany=2 whom=universe
Through modprobe, after adding this line to the /etc/modprobe.conf file:
options hello_param howmany=2 whom=universe
Through the kernel command line, when the module is built statically
into the kernel:
hello_param.howmany=2 hello_param.whom=universe
(format: <module name>.<module parameter name>=<module parameter value>)
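
The three parts of a command line token such as hello_param.howmany=2 can be separated as in the userspace sketch below. It only illustrates the naming scheme; the helper names are hypothetical and this is not how the kernel itself parses the command line.

```c
#include <stddef.h>
#include <string.h>

/* Copy len bytes of src into dst as a string; fail on empty or too long. */
static int copy_part(char *dst, size_t dstsz, const char *src, size_t len)
{
    if (len == 0 || len >= dstsz)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/*
 * Split "<module>.<param>=<value>" into its three parts.
 * Returns 0 on success, -1 if the token does not have that shape
 * or if a part is empty or does not fit in its buffer.
 */
static int split_module_param(const char *token,
                              char *module, size_t msz,
                              char *param, size_t psz,
                              char *value, size_t vsz)
{
    const char *eq = strchr(token, '=');    /* first '=' */
    const char *dot = strchr(token, '.');   /* first '.' */

    if (eq == NULL || dot == NULL || dot > eq)
        return -1;                          /* no '.' before the '=' */
    if (copy_part(module, msz, token, (size_t)(dot - token)) != 0 ||
        copy_part(param, psz, dot + 1, (size_t)(eq - dot - 1)) != 0 ||
        copy_part(value, vsz, eq + 1, strlen(eq + 1)) != 0)
        return -1;
    return 0;
}
```

Tokens without a dot before the equals sign, such as root=/dev/ram0, are rejected, which matches the idea that they are plain kernel parameters rather than module parameters.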

Declaring a module parameter


#include <linux/moduleparam.h>
module_param(
        name,   /* name of an already defined variable */
        type,   /* either byte, short, ushort, int, uint, long,
                   ulong, charp, bool or invbool
                   (checked at compile time!) */
        perm    /* for /sys/module/<module_name>/<param>
                   0: no such module parameter value file */
);
Example
int irq=5;
module_param(irq, int, S_IRUGO);


Declaring a module parameter array


#include <linux/moduleparam.h>
module_param_array(
        name,   /* name of an already defined array */
        type,   /* same as in module_param */
        num,    /* pointer to a variable receiving the number of values
                   supplied, or NULL if you don't need it */
        perm    /* same as in module_param */
);
Example
static int base[MAX_DEVICES] = { 0x820, 0x840 };
module_param_array(base, int, NULL, 0);


Embedded Linux

Driver development
Using the proc file system interface


hello module with proc file


#include <linux/proc_fs.h>

/* The proc file name */
#define MYNAME "driver/my_proc_file"

/* The proc_dir_entry struct */
static struct proc_dir_entry *my_proc_dir = NULL;

/* Callback function. page is the buffer we write to.
   io and irq are assumed to be variables defined elsewhere in the module. */
int mymodule_proc_read(char *page, char **start, off_t off,
                       int count, int *eof, void *data)
{
        int len = 0;

        len += sprintf(page + len, "io=%d\n", io);
        len += sprintf(page + len, "irq=%d\n", irq);
        if (len <= off + count)
                *eof = 1;       /* setting *eof to 1 means end of file */
        *start = page + off;    /* start is set to a pointer to where we wrote */
        len -= off;
        if (len > count)
                len = count;
        if (len < 0)
                len = 0;
        return len;
}

int __init startup_mymodule(void)
{
        /* Second parameter is the file permission,
           last parameter is a handle of the parent directory. */
        my_proc_dir = create_proc_entry(MYNAME, 0, NULL);
        if (my_proc_dir == NULL)        /* don't forget to check for NULL here! */
                return -ENOMEM;
        my_proc_dir->read_proc = mymodule_proc_read;
        return 0;
}

void __exit shutdown_mymodule(void)
{
        remove_proc_entry(MYNAME, NULL);
}

Some more proc details


You can also register a write_proc callback.
proc_dir_entry has a data field. The kernel does not use it, but
whatever you set there will be returned to you as the last parameter of
the callback.
The permissions (2nd) parameter of create_proc_entry is the same
as the mode flags of the open(2) system call.
0 means use the system wide defaults.
The directory handle (3rd) parameter of create_proc_entry is an
address of proc_dir_entry for a proc directory.
NULL means the proc root directory.
For large and complex files use the seq_file wrapper.

Using a proc file


Once the module is loaded, you can access the registered proc file:
From the shell:
Read: cat /proc/driver/my_proc_file
Write: echo 123 > /proc/driver/my_proc_file
Programmatically, using open(2), read(2), write(2) and related
functions.
You can't delete, move or rename a proc file.
Proc files usually don't have reported size.


Embedded Linux driver development

Driver development
Adding sources to the kernel tree


New directory in kernel sources (1)


To add an acme_drivers/ directory to the kernel sources:
Move the acme_drivers/ directory to the appropriate location
in kernel sources
Create an acme_drivers/Kconfig file
Create an acme_drivers/Makefile file based on the
Kconfig variables
In the parent directory Kconfig file, add
source acme_drivers/Kconfig


New directory in kernel sources (2)


In the parent directory Makefile file, add
obj-$(CONFIG_ACME) += acme_drivers/ (just 1 condition)

or
obj-y += acme_drivers/ (several conditions)

Run make xconfig and see your new options!


Run make and your new files are compiled!
See Documentation/kbuild/ for details


How to create Linux patches


Download the latest kernel sources
Make a copy of these sources:
rsync -a linux-2.6.9-rc2/ linux-2.6.9-rc2-patch/

Thanks to Nicolas Rougier (Copyright 2003, http://webloria.loria.fr/~rougier/) for the Tux image


Embedded Linux driver development

Driver development
Memory management


Physical Memory
In ccNUMA (Cache Coherent Non Uniform Memory Access) machines:
The memory of each node is represented in pg_data_t
These memories are linked into pgdat_list
In uniform memory access systems:
There is just one pg_data_t named contig_page_data
If you don't know which of these is your machine, you're
using a uniform memory access system :-)


Memory Zones
Each pg_data_t is split into three zones
Each zone has different properties:
ZONE_DMA
DMA operations on address-limited buses are possible

ZONE_NORMAL
Maps directly to linear addressing (<~1 GB on i386)
Always mapped to kernel space.

ZONE_HIGHMEM
Rest of memory.
Mapped into kernel space on demand.
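
As a worked example, the userspace sketch below classifies a physical address into one of the three zones. The boundaries used (16 MB for the end of the DMA zone, 896 MB for the start of high memory) are the classic i386 values; they are an assumption added here, not something stated above, and other architectures differ. All names are hypothetical.

```c
#include <stdint.h>

/* Assumed i386 zone boundaries, in bytes. */
#define SKETCH_DMA_LIMIT     (16ull * 1024 * 1024)    /* end of ZONE_DMA */
#define SKETCH_HIGHMEM_START (896ull * 1024 * 1024)   /* start of ZONE_HIGHMEM */

enum sketch_zone {
    SKETCH_ZONE_DMA,
    SKETCH_ZONE_NORMAL,
    SKETCH_ZONE_HIGHMEM
};

/* Return the zone a physical address would fall into under these limits. */
static enum sketch_zone zone_of(uint64_t phys)
{
    if (phys < SKETCH_DMA_LIMIT)
        return SKETCH_ZONE_DMA;
    if (phys < SKETCH_HIGHMEM_START)
        return SKETCH_ZONE_NORMAL;
    return SKETCH_ZONE_HIGHMEM;
}
```

Under these assumptions, a machine with 64 MB of RAM has no high memory at all, which is the common case on small embedded boards.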

Physical and virtual memory


[Figure: the CPU accesses the virtual address spaces (Kernel, Process1, Process2, each running from 0x00000000 to 0xFFFFFFFF) through the MMU (Memory Management Unit), which maps them onto the physical address space (RAM 0, RAM 1, Flash, I/O memory 1, I/O memory 2, I/O memory 3).]

All the processes have their own virtual address space, and run as if they
had access to the whole address space.

3:1 Virtual Memory Map


[Figure: 3:1 virtual memory map. In each virtual address space, user space covers the first three gigabytes, from 0x00000000 (zero page) up to PAGE_OFFSET (0xC0000000). Above PAGE_OFFSET come the kernel logical addresses, a 1:1 persistent mapping of physical memory. From VMALLOC_START up to 0xFFFFFFFF come the kernel virtual addresses, which hold random opportunistic mappings of physical memory and I/O memory.]

Address Types
Physical address
Physical memory as seen from the CPU, without MMU (Memory Management Unit)
translation.

Bus address
Physical memory as seen from the device bus.
May or may not be virtualized (via IOMMU, GART, etc).

Virtual address
Memory as seen from the CPU, with MMU translation.


Address Translation Macros


bus_to_phys(address)
phys_to_bus(address)
phys_to_virt(address)
virt_to_phys(address)
bus_to_virt(address)
virt_to_bus(address)
...
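
For kernel logical addresses, the virtual/physical conversion is a constant offset. The userspace sketch below shows the arithmetic under two assumptions: PAGE_OFFSET is 0xC0000000 (the 3:1 split) and physical RAM starts at address 0, as on i386. The sketch_ names are hypothetical stand-ins, not the kernel macros, and the arithmetic does not apply to vmalloc or user-space addresses.

```c
#include <stdint.h>

/* Assumed start of the kernel logical address range (3:1 split). */
#define SKETCH_PAGE_OFFSET 0xC0000000u

/* Kernel logical address -> physical address (RAM assumed to start at 0). */
static uint32_t sketch_virt_to_phys(uint32_t vaddr)
{
    return vaddr - SKETCH_PAGE_OFFSET;
}

/* Physical address -> kernel logical address. */
static uint32_t sketch_phys_to_virt(uint32_t paddr)
{
    return paddr + SKETCH_PAGE_OFFSET;
}
```

For instance, kernel logical address 0xC0100000 corresponds to physical address 0x00100000 under these assumptions. Bus addresses need a separate, platform-specific conversion.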


The MMU
[Figure: CPU -> MMU -> Memory. The MMU translates each access using a table such as:]

Task  Virtual  Physical  Permission
12    0x8000   0x5340    RWX
12    0x8001   0x1000    RX
15    0x8000   0x3390    RX
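
The table lookup done by the MMU can be mimicked in a few lines of userspace C. This is only a toy model built from the three table rows shown on this slide: a real MMU works in hardware on pages and page tables, and the names and the permission encoding below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits for the toy model. */
#define PERM_R 0x4u
#define PERM_W 0x2u
#define PERM_X 0x1u

struct mmu_entry {
    int task;
    uint32_t virt;
    uint32_t phys;
    unsigned int perm;
};

/* The three rows of the table shown on the slide. */
static const struct mmu_entry mmu_table[] = {
    { 12, 0x8000, 0x5340, PERM_R | PERM_W | PERM_X },
    { 12, 0x8001, 0x1000, PERM_R | PERM_X },
    { 15, 0x8000, 0x3390, PERM_R | PERM_X },
};

/*
 * Translate (task, virt) for the requested access bits.
 * Returns 0 and stores the physical value in *phys on success,
 * -1 (a "fault") if there is no mapping or a permission is missing.
 */
static int mmu_translate(int task, uint32_t virt, unsigned int access,
                         uint32_t *phys)
{
    size_t i;

    for (i = 0; i < sizeof(mmu_table) / sizeof(mmu_table[0]); i++) {
        const struct mmu_entry *e = &mmu_table[i];

        if (e->task != task || e->virt != virt)
            continue;
        if ((e->perm & access) != access)
            return -1;              /* mapped, but access not permitted */
        *phys = e->phys;
        return 0;
    }
    return -1;                      /* no mapping for this task */
}
```

Note how the same virtual value 0x8000 resolves to different physical values for tasks 12 and 15, and how a write by task 15 is refused because its entry is only RX.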

kmalloc and kfree


Basic allocators, kernel equivalents of glibc's malloc and free.
static inline void *kmalloc(size_t size, int flags);

size: number of bytes to allocate


flags: priority (see next page)
void kfree (const void *objp);
Example:
data = kmalloc(sizeof(*data), GFP_KERNEL);
...
kfree(data);


kmalloc features
Quick (unless it's blocked waiting for memory to be freed).
Doesn't initialize the allocated area.
You can use kcalloc or kzalloc to get zeroed memory.
The allocated area is contiguous in physical RAM.
Allocates by 2^n sizes, and uses a few management bytes.
So, don't round a request up to 1024 when you only need 1000: with the management bytes added, you'd get 2048!
Caution: drivers shouldn't try to kmalloc
more than 128 KB (upper limit in some architectures).
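
The effect of power-of-two size classes plus management bytes can be illustrated with a little arithmetic. This is a user-space sketch, not the kernel's allocator: round_up_pow2 and bucket_for are hypothetical helpers, and the number of management bytes is a made-up parameter.

```c
#include <stddef.h>

/* Smallest power of two that is >= n (n >= 1). */
static size_t round_up_pow2(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Size class consumed by a request, assuming the allocator adds
 * 'overhead' management bytes before rounding (hypothetical model).
 */
static size_t bucket_for(size_t request, size_t overhead)
{
    return round_up_pow2(request + overhead);
}
```

With an assumed overhead of 8 bytes, a 1000-byte request fits the 1024 class, while a 1024-byte request spills into the 2048 class.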


Main kmalloc flags (1)


Defined in include/linux/gfp.h (GFP: get_free_pages)
GFP_KERNEL
Standard kernel memory allocation. May block. Fine for most needs.
GFP_ATOMIC
Allocates RAM from interrupt handlers or code not triggered by user
processes. Never blocks.
GFP_USER
Allocates memory for user processes. May block. Lowest priority.


Main kmalloc flags (2)


Extra flags (can be added with |)
__GFP_DMA
Allocate in DMA zone
__GFP_REPEAT
Ask to try harder. May still block, but less likely.
__GFP_NORETRY
If allocation fails, doesn't try to get free pages.
__GFP_NOFAIL
Must not fail. Never gives up.
Caution: use only when mandatory!
Example:
GFP_KERNEL | __GFP_DMA


Slab caches
Also called lookaside caches
Slab: name of the standard Linux memory allocator
Slab caches: Objects that can hold any number
of memory areas of the same size.
Optimum use of available RAM and reduced fragmentation.
Mainly used in Linux core subsystems: filesystems (open files, inode
and file caches...), networking... Live stats on /proc/slabinfo.
May be useful in device drivers too, though not used so often.
Linux 2.6: used by USB and SCSI drivers.


Slab cache API (1)


#include <linux/slab.h>
Creating a cache:
cache = kmem_cache_create (
    name,         /* Name for /proc/slabinfo */
    size,         /* Cache object size */
    flags,        /* Options: alignment, DMA... */
    constructor,  /* Optional, called after each allocation */
    destructor);  /* Optional, called before each release */


Slab cache API (2)


Allocating from the cache:
object = kmem_cache_alloc (cache, flags);
Freeing an object:
kmem_cache_free (cache, object);
Destroying the whole cache:
kmem_cache_destroy (cache);
More details and an example in the Linux Device Drivers book:
http://lwn.net/images/pdf/LDD3/ch08.pdf


Memory pools
Useful for memory allocations that cannot fail
Kind of lookaside cache trying to keep a minimum number
of pre-allocated objects ahead of time.
Use with care: otherwise can result in a lot of unused
memory that cannot be reclaimed! Use other solutions
whenever possible.


Memory pool API (1)


#include <linux/mempool.h>
Mempool creation:
mempool = mempool_create (
min_nr,
alloc_function,
free_function,
pool_data);


Memory pool API (2)


Allocating objects:
object = mempool_alloc (pool, flags);
Freeing objects:
mempool_free (object, pool);
Resizing the pool:
status = mempool_resize (
pool, new_min_nr, flags);
Destroying the pool (caution: free all objects first!):
mempool_destroy (pool);


Memory pool implementation


mempool_create
Call the alloc function min_nr times.

mempool_alloc
Call the alloc function. Success?
Yes: return the new object.
No: take an object from the pool.

mempool_free
Pool count < min_nr?
Yes: add the freed object to the pool.
No: call the free function on the object.
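
The mempool_create / mempool_alloc / mempool_free flow above can be sketched in user space as follows. This is a simplified model, not the kernel implementation: toy_pool and its functions are hypothetical names, malloc and free stand in for the alloc and free functions, and where the sketch returns NULL on an empty reserve the real mempool_alloc may sleep until an object is freed.

```c
#include <stdlib.h>

#define TOY_POOL_MAX 16

struct toy_pool {
    int min_nr;                    /* objects to keep in reserve */
    int count;                     /* objects currently in reserve */
    void *reserve[TOY_POOL_MAX];
    void *(*alloc_fn)(void);
    void (*free_fn)(void *);
};

/* Pre-allocate min_nr objects; returns 0 on success, -1 on failure. */
static int toy_pool_create(struct toy_pool *p, int min_nr,
                           void *(*alloc_fn)(void), void (*free_fn)(void *))
{
    if (min_nr > TOY_POOL_MAX)
        return -1;
    p->min_nr = min_nr;
    p->count = 0;
    p->alloc_fn = alloc_fn;
    p->free_fn = free_fn;
    while (p->count < min_nr) {
        void *obj = alloc_fn();

        if (!obj) {
            while (p->count > 0)
                free_fn(p->reserve[--p->count]);
            return -1;
        }
        p->reserve[p->count++] = obj;
    }
    return 0;
}

/* Try the normal allocator first; fall back to the reserve. */
static void *toy_pool_alloc(struct toy_pool *p)
{
    void *obj = p->alloc_fn();

    if (obj)
        return obj;
    if (p->count > 0)
        return p->reserve[--p->count];
    return NULL;
}

/* Refill the reserve first; release the object only when it is full. */
static void toy_pool_free(struct toy_pool *p, void *obj)
{
    if (p->count < p->min_nr)
        p->reserve[p->count++] = obj;
    else
        p->free_fn(obj);
}

/* Demo allocator whose failures can be switched on for experiments. */
static int toy_fail;
static void *toy_alloc(void) { return toy_fail ? NULL : malloc(32); }
static void toy_free(void *obj) { free(obj); }
```

Setting toy_fail to 1 simulates memory pressure: allocations then come out of the reserve until it is empty, and freed objects go back into the reserve before anything is returned to the underlying allocator.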


Memory pools using slab caches


Idea: use slab cache functions to allocate and free objects.
The mempool_alloc_slab and mempool_free_slab
functions supply a link with slab cache routines.
So, you will find many code examples looking like:
cache = kmem_cache_create (...);
pool = mempool_create (
min_nr,
mempool_alloc_slab,
mempool_free_slab,
cache);


Allocating by pages
More appropriate when you need big slices of RAM:
unsigned long get_zeroed_page(int flags);
Returns a pointer to a free page and fills it up with zeros
unsigned long __get_free_page(int flags);
Same, but doesn't initialize the contents
unsigned long __get_free_pages(int flags,
unsigned long order);
Returns a pointer to an area of several contiguous pages in physical RAM.
order: log2(<number_of_pages>)
maximum order: MAX_ORDER - 1 = 10 (MAX_ORDER=11 in linux/mmzone.h), i.e. 1024 pages, or 4096 KB with 4 KB pages
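
Because the order argument is a base-2 logarithm of a page count, a byte size has to be converted before calling __get_free_pages. The kernel provides a get_order() helper for this; the function below is only a standalone user-space imitation under the assumption of 4 KB pages, with hypothetical toy_ names.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL   /* assumption: 4 KB pages */

/* Smallest order such that (1 << order) pages hold 'size' bytes. */
static unsigned int toy_get_order(size_t size)
{
    size_t pages = (size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
    unsigned int order = 0;

    while ((1UL << order) < pages)
        order++;
    return order;
}
```

A request for 3 pages gets order 2 (4 pages), so one page is wasted; this rounding is the price of power-of-two allocation.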


Freeing pages
void free_page(unsigned long addr);
void free_pages(unsigned long addr,
unsigned long order);
Need to use the same order as in allocation.


The Buddy System


Kernel memory page allocation follows the Buddy
System.
Free page frames are allocated in power-of-2 blocks:
If a suitable block is found, allocate it.
Else: seek a higher-order block, allocate half of it, and keep its buddy free.
When page frames are freed, free buddies are coalesced.


Buddy System 1

[Figure: one free 16 MB block]
We need 8 MB of memory, but don't find an exact match.
We do have a block of 16 MB of memory though.


Buddy System 2

[Figure: two 8 MB blocks]
So we'll split the 16 MB into two 8 MB areas.


Buddy System 3

[Figure: two 8 MB blocks, one in use and one free]
We'll use 8 MB and keep the rest as a free block of 8 MB.


Buddy System 4

[Figure: two 8 MB blocks]
When the allocated memory has been freed...


Buddy System 5

[Figure: one free 16 MB block]
We can once again combine the two blocks into a single 16 MB free block.
Because block sizes are powers of 2, it's easy to spot our buddy.
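
Why the buddy is easy to spot: when blocks are aligned on their own size, the two halves of a split block differ in exactly one bit of their starting page index. The sketch below illustrates that arithmetic in user space; buddy_of and merged_start are hypothetical helper names, not kernel functions.

```c
/*
 * Starting page index of the buddy of the block that starts at
 * page_idx and spans (1 << order) pages: flip bit 'order'.
 */
static unsigned long buddy_of(unsigned long page_idx, unsigned int order)
{
    return page_idx ^ (1UL << order);
}

/* Starting page index of the block obtained by merging both buddies. */
static unsigned long merged_start(unsigned long page_idx, unsigned int order)
{
    return page_idx & ~(1UL << order);
}
```

For example, the two order-3 (8-page) halves of a 16-page block start at pages 0 and 8, and each is the other's buddy.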

vmalloc
vmalloc can be used to obtain contiguous memory zones in virtual
address space (even if pages may not be contiguous in physical
memory).
void *vmalloc(unsigned long size);
void vfree(void *addr);


Memory utilities
void * memset(void * s, int c, size_t count);
Fills a region of memory with the given value.
void * memcpy(void * dest,
const void *src,
size_t count);
Copies one area of memory to another.
Use memmove with overlapping areas.
Lots of functions equivalent to standard C library ones defined in
include/linux/string.h


Memory management - Summary


Small allocations
kmalloc, kzalloc (and kfree!)
slab caches
memory pools

Bigger allocations
__get_free_page[s], get_zeroed_page, free_page[s]
vmalloc, vfree

Libc-like memory utilities
memset, memcpy, memmove...


Embedded Linux driver development

Driver development
I/O memory and ports


Requesting I/O ports


/proc/ioports example
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0100-013f : pcmcia_socket0
0170-0177 : ide1
01f0-01f7 : ide0
0376-0376 : ide1
0378-037a : parport0
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial
0800-087f : 0000:00:1f.0
  0800-0803 : PM1a_EVT_BLK
  0804-0805 : PM1a_CNT_BLK
  0808-080b : PM_TMR
  0820-0820 : PM2_CNT_BLK
  0828-082f : GPE0_BLK
...

struct resource *request_region(
    unsigned long start,
    unsigned long len,
    char *name);
Tries to reserve the given region and returns NULL if unsuccessful. Example:
request_region(0x0170, 8, "ide1");
void release_region(
    unsigned long start,
    unsigned long len);
See include/linux/ioport.h and kernel/resource.c


Reading / writing on I/O ports


The implementation of the below functions and the exact unsigned
type can vary from platform to platform!
bytes
unsigned inb(unsigned port);
void outb(unsigned char byte, unsigned port);
words
unsigned inw(unsigned port);
void outw(unsigned short word, unsigned port);
"long" integers
unsigned inl(unsigned port);
void outl(unsigned longword, unsigned port);


Reading / writing strings on I/O ports


Often more efficient than the corresponding C loop, if the processor
supports such operations!
byte strings
void insb(unsigned port, void *addr, unsigned long count);
void outsb(unsigned port, void *addr, unsigned long count);

word strings
void insw(unsigned port, void *addr, unsigned long count);
void outsw(unsigned port, void *addr, unsigned long count);

long strings
void insl(unsigned port, void *addr, unsigned long count);
void outsl(unsigned port, void *addr, unsigned long count);


Requesting I/O memory


/proc/iomem example
00000000-0009efff : System RAM
0009f000-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000cffff : Video ROM
000f0000-000fffff : System ROM
00100000-3ffadfff : System RAM
  00100000-0030afff : Kernel code
  0030b000-003b4bff : Kernel data
3ffae000-3fffffff : reserved
40000000-400003ff : 0000:00:1f.1
40001000-40001fff : 0000:02:01.0
  40001000-40001fff : yenta_socket
40002000-40002fff : 0000:02:01.1
  40002000-40002fff : yenta_socket
40400000-407fffff : PCI CardBus #03
40800000-40bfffff : PCI CardBus #03
40c00000-40ffffff : PCI CardBus #07
41000000-413fffff : PCI CardBus #07
a0000000-a0000fff : pcmcia_socket0
a0001000-a0001fff : pcmcia_socket1
e0000000-e7ffffff : 0000:00:00.0
e8000000-efffffff : PCI Bus #01
  e8000000-efffffff : 0000:01:00.0
...

Equivalent functions with the same interface as request_region and release_region:
struct resource *request_mem_region(
    unsigned long start,
    unsigned long len,
    char *name);
void release_mem_region(
    unsigned long start,
    unsigned long len);


Choosing I/O ranges


I/O port and memory ranges can be passed as module
parameters. An easy way to define those parameters is through
/etc/modprobe.conf.
Modules can also try to find free ranges by themselves (making
multiple calls to request_region or request_mem_region).


Mapping I/O memory in virtual memory


To access I/O memory, drivers need to have a virtual address
that the processor can handle.
The ioremap functions satisfy this need:
#include <asm/io.h>
void *ioremap(unsigned long phys_addr,
unsigned long size);
void iounmap(void *address);
Caution: check that ioremap doesn't return a NULL address!


Differences with standard memory


Reads and writes on memory can be cached
The compiler may choose to write the value in a cpu register,
and may never write it in main memory.
The compiler may decide to optimize or reorder read and
write instructions.


Avoiding I/O access issues


Caching on I/O ports or memory is already disabled, either by
the hardware or by Linux init code.
Memory barriers are supplied to avoid reordering.

Hardware independent
#include <linux/kernel.h>
void barrier(void);
Only impacts the behavior of the compiler. Doesn't prevent reordering in the processor!

Hardware dependent
#include <asm/system.h>
void rmb(void);
void wmb(void);
void mb(void);
Safe on all architectures!


Accessing I/O memory


Directly reading from or writing to addresses returned by ioremap
(pointer dereferencing) may not work on some architectures.
Use the below functions instead. They are always portable and safe:
unsigned int ioread8(void *addr); (same for 16 and 32)
void iowrite8(u8 value, void *addr); (same for 16 and 32)

To read or write a series of values:


void ioread8_rep(void *addr, void *buf, unsigned long count);
void iowrite8_rep(void *addr, const void *buf, unsigned long count);

Other useful functions:


void memset_io(void *addr, u8 value, unsigned int count);
void memcpy_fromio(void *dest, void *source, unsigned int count);
void memcpy_toio(void *dest, void *source, unsigned int count);


/dev/mem
Used to provide user-space applications with direct access to
physical addresses.
Actually only works with addresses that are non-RAM (I/O
memory) or with addresses that have some special flag set in
the kernel's data structures. Fortunately, doesn't provide
access to any address in physical RAM!
Used by applications such as the X server to write directly to
device memory.


Embedded Linux driver development

Driver development
Character drivers


Usefulness of character drivers


Except for storage device drivers, most drivers for devices with
input and output flows are implemented as character drivers.
So, most drivers you will face will be character drivers.
You will regret it if you sleep during this part!


Creating a character driver


User-space needs
The name of a device file in /dev to interact with the device driver through regular file operations (open, read, write, close...)

The kernel needs
To know which driver is in charge of device files with a given major / minor number pair.
For a given driver, to have handlers (file operations) to execute when user-space opens, reads, writes or closes the device file.

[Figure: in user-space, read and write calls on /dev/foo (identified by its major / minor numbers) reach the device driver in kernel space. The read handler fills the user's read buffer (copy to user); the write handler takes the written string from user-space (copy from user).]

Declaring a character driver


Device number registration
Need to register one or more device numbers (major / minor pairs),
depending on the number of devices managed by the driver.
Need to find free ones!
File operations registration
Need to register handler functions called when user space programs
access the device files: open, read, write, ioctl, close...


Information on registered devices


Registered devices are visible in /proc/devices:
Character devices:
1 mem
4 /dev/vc/0
4 tty
4 ttyS
5 /dev/tty
5 /dev/console
5 /dev/ptmx
6 lp
7 vcs
10 misc
13 input
14 sound
...

Block devices:
1 ramdisk
3 ide0
8 sd
9 md
22 ide1
65 sd
66 sd
67 sd
68 sd
69 sd

First column: major number. Second column: registered name.
Can be used to find free major numbers.


dev_t structure
Kernel data structure to represent a major / minor pair
Defined in <linux/kdev_t.h>
Linux 2.6: 32 bit size (major: 12 bits, minor: 20 bits)
Macro to create the structure:
MKDEV(int major, int minor);
Macro to extract the numbers:
MAJOR(dev_t dev);
MINOR(dev_t dev);
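
The 12-bit major / 20-bit minor split can be pictured as plain shifts and masks. The macros below are a user-space imitation written for illustration, with hypothetical TOY_ names; driver code should always use the kernel's own MKDEV, MAJOR and MINOR rather than assume a layout.

```c
#include <stdint.h>

typedef uint32_t toy_dev_t;

#define TOY_MINORBITS 20
#define TOY_MINORMASK ((1U << TOY_MINORBITS) - 1)

/* Pack a major / minor pair into one 32-bit value, and unpack it. */
#define TOY_MKDEV(ma, mi) (((toy_dev_t)(ma) << TOY_MINORBITS) | (mi))
#define TOY_MAJOR(dev)    ((unsigned int)((dev) >> TOY_MINORBITS))
#define TOY_MINOR(dev)    ((unsigned int)((dev) & TOY_MINORMASK))
```

Packing major 202 and minor 128 and unpacking the result gives back the same two numbers.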


Allocating fixed device numbers


#include <linux/fs.h>
int register_chrdev_region(
    dev_t from,         /* Starting device number */
    unsigned count,     /* Number of device numbers */
    const char *name);  /* Registered name */
Returns 0 if the allocation was successful.
Example
if (register_chrdev_region(MKDEV(202, 128),
                           acme_count, "acme")) {
    printk(KERN_ERR "Failed to allocate device number\n");
...


Dynamic allocation of device numbers


Safer: have the kernel allocate free numbers for you!
#include <linux/fs.h>
int alloc_chrdev_region(
    dev_t *dev,          /* Output: starting device number */
    unsigned baseminor,  /* Starting minor number, usually 0 */
    unsigned count,      /* Number of device numbers */
    const char *name);   /* Registered name */
Returns 0 if the allocation was successful.
Example
if (alloc_chrdev_region(&acme_dev, 0, acme_count, "acme")) {
    printk(KERN_ERR "Failed to allocate device number\n");
...


Creating device files


Issue: you can no longer create /dev entries in advance!
You have to create them on the fly after loading the driver according to
the allocated major number.
Trick: the script loading the module can then use /proc/devices:
module=foo; name=foo; device=foo
rm -f /dev/$device
insmod $module.ko
major=`awk "\\$2==\"$name\" {print \\$1}" /proc/devices`
mknod /dev/$device c $major 0
Caution: the major= line uses back quotes!
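
The lookup that the awk command performs on /proc/devices can also be expressed in C, which may help to see what is being matched. This is a hedged sketch: toy_find_major is a hypothetical helper that works on text already read into memory, and it returns the first matching line whether it comes from the character or the block section.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan /proc/devices-style text ("<major> <name>" per line) and return
 * the major number registered under 'name', or -1 if it is not listed.
 */
static int toy_find_major(const char *text, const char *name)
{
    const char *line = text;

    while (*line) {
        int major;
        char entry[64];

        /* Header lines such as "Character devices:" fail to parse. */
        if (sscanf(line, "%d %63s", &major, entry) == 2 &&
            strcmp(entry, name) == 0)
            return major;

        line = strchr(line, '\n');
        if (!line)
            break;
        line++;                 /* move to the start of the next line */
    }
    return -1;
}
```

A caller would read /proc/devices into a buffer, pass it with the name given at registration time, and use the result as the major number when creating the device file.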


file operations (1)


Before registering character devices, you have to define
file_operations (called fops) for the device files.
Here are the main ones:
int (*open) (
struct inode *, /* Corresponds to the device file */
struct file *); /* Corresponds to the open file descriptor */
Called when user-space opens the device file.
int (*release) (
struct inode *,
struct file *);
Called when user-space closes the file.


The file structure


Is created by the kernel during the open call. Represents open files.
Pointers to this structure are usually called "filp".
mode_t f_mode;
The file opening mode (FMODE_READ and/or FMODE_WRITE)
loff_t f_pos;
Current offset in the file.
struct file_operations *f_op;
Allows changing the file operations for different open files!
struct dentry *f_dentry;
Useful to get access to the inode: filp->f_dentry->d_inode.


file operations (2)


ssize_t (*read) (
    struct file *,        /* Open file descriptor */
    char __user *,        /* User-space buffer to fill up */
    size_t,               /* Size of the user-space buffer */
    loff_t *);            /* Offset in the open file */
Called when user-space reads from the device file.
ssize_t (*write) (
    struct file *,        /* Open file descriptor */
    const char __user *,  /* User-space buffer to write to the device */
    size_t,               /* Size of the user-space buffer */
    loff_t *);            /* Offset in the open file */
Called when user-space writes to the device file.


Exchanging data with user-space (1)

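In driver code, an address supplied by user-space cannot simply be dereferenced or passed to memcpy:
The user-space address may be swapped out to disk.
The user-space address may be invalid
(user-space process trying to access unauthorized data).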


Exchanging data with user-space (2)


You must use dedicated functions such as the following ones in your
read and write file operations code:
#include <asm/uaccess.h>
unsigned long copy_to_user (void __user *to,
    const void *from,
    unsigned long n);
unsigned long copy_from_user (void *to,
    const void __user *from,
    unsigned long n);
Make sure that these functions return 0!
They return the number of bytes that could not be copied, so any other value means that they failed.

file operations (3)


int (*ioctl) (struct inode *, struct file *,
unsigned int, unsigned long);
Can be used to send specific commands to the device, which are neither
reading nor writing (e.g. formatting a disk, configuration changes).
int (*mmap) (struct file *,
    struct vm_area_struct *);
Asks for device memory to be mapped into the address space of a user
process.
struct module *owner;
Used by the kernel to keep track of who's using this structure and count
the number of users of the module.


read operation example


static ssize_t
acme_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
{
    /* The hwdata address corresponds to a device I/O memory area */
    /* of size hwdata_size, obtained with ioremap() */
    int remaining_bytes;

    /* Number of bytes left to read in the open file */
    remaining_bytes = min(hwdata_size - (*ppos), count);
    if (remaining_bytes == 0) {
        /* All read, returning 0 (End Of File) */
        return 0;
    }

    if (copy_to_user(buf /* to */, *ppos + hwdata /* from */, remaining_bytes)) {
        return -EFAULT;
    } else {
        /* Increase the position in the open file */
        *ppos += remaining_bytes;
        return remaining_bytes;
    }
}

Read method: piece of code available on
http://free-electrons.com/doc/c/acme_read.c
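
The offset bookkeeping of such a read method can be tried out in user space. The sketch below is an imitation, not driver code: toy_read and toy_hwdata are hypothetical names, a static string stands in for the device memory, and memcpy stands in for copy_to_user (which must be used in a real driver).

```c
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Stand-in for the device memory area. */
static char toy_hwdata[] = "hello, world";
#define TOY_HWDATA_SIZE (sizeof(toy_hwdata) - 1)

/*
 * Copy at most 'count' bytes starting at *ppos, advance *ppos, and
 * return the number of bytes copied (0 means end of file).
 */
static ssize_t toy_read(char *buf, size_t count, long *ppos)
{
    size_t remaining;

    if (*ppos < 0 || (size_t)*ppos >= TOY_HWDATA_SIZE)
        return 0;                               /* end of file */

    remaining = TOY_HWDATA_SIZE - (size_t)*ppos;
    if (remaining > count)
        remaining = count;

    /* In a driver this would be copy_to_user(), with -EFAULT on error. */
    memcpy(buf, toy_hwdata + *ppos, remaining);
    *ppos += (long)remaining;
    return (ssize_t)remaining;
}
```

Calling it repeatedly with an 8-byte buffer returns 8 bytes, then the 4 remaining bytes, then 0, which is the sequence a program such as cat relies on to detect the end of the device.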


write operation example


static ssize_t
acme_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
{
    /* Assuming that hwdata corresponds to a physical address range */
    /* of size hwdata_size, obtained with ioremap() */
    int remaining_bytes;

    /* Number of bytes not written yet in the device */
    remaining_bytes = hwdata_size - (*ppos);
    if (count > remaining_bytes) {
        /* Can't write beyond the end of the device */
        return -EIO;
    }

    if (copy_from_user(*ppos + hwdata /* to */, buf /* from */, count)) {
        return -EFAULT;
    } else {
        /* Increase the position in the open file */
        *ppos += count;
        return count;
    }
}

Write method: piece of code available on
http://free-electrons.com/doc/c/acme_write.c


file operations definition example (3)


Defining a file_operations structure
#include <linux/fs.h>
static struct file_operations acme_fops =
{
.owner = THIS_MODULE,
.read = acme_read,
.write = acme_write,
};

You just need to supply the functions you implemented!


Defaults for other functions (such as open, release...)
are fine if you do not implement anything special.

Character device registration (1)


The kernel represents character drivers with a cdev structure
Declare this structure globally (within your module):
#include <linux/cdev.h>
static struct cdev *acme_cdev;
In the init function, allocate the structure and set its file operations:
acme_cdev = cdev_alloc();
acme_cdev->ops = &acme_fops;
acme_cdev->owner = THIS_MODULE;


Character device registration (2)


Then, now that your structure is ready, add it to the system:
int cdev_add(
    struct cdev *p,   /* Character device structure */
    dev_t dev,        /* Starting device major / minor number */
    unsigned count);  /* Number of devices */

Example (continued):
if (cdev_add(acme_cdev, acme_dev, acme_count)) {
    printk (KERN_ERR "Char driver registration failed\n");

...


Character device unregistration


First delete your character device:
void cdev_del(struct cdev *p);
Then, and only then, free the device number:
void unregister_chrdev_region(dev_t from,
unsigned count);
Example (continued):
cdev_del(acme_cdev);
unregister_chrdev_region(acme_dev, acme_count);


Linux error codes


Try to report errors with error numbers as accurate as possible!
Fortunately, macro names are explicit and you can remember
them quickly.
Generic error codes:
include/asm-generic/errno-base.h
Platform specific error codes:
include/asm/errno.h


Char driver example summary (1)


static void *acme_buf;
static int acme_bufsize=8192;
static int acme_count=1;
static dev_t acme_dev;
static struct cdev *acme_cdev;
static ssize_t acme_write(...) {...}
static ssize_t acme_read(...) {...}
static struct file_operations acme_fops =
{
.owner = THIS_MODULE,
.read = acme_read,
.write = acme_write,
};


Char driver example summary (2)


static int __init acme_init(void)
{
    int err;

    acme_buf = kmalloc(acme_bufsize, GFP_KERNEL);
    if (!acme_buf) {
        err = -ENOMEM;
        goto err_exit;
    }

    if (alloc_chrdev_region(&acme_dev, 0, acme_count, "acme")) {
        err = -ENODEV;
        goto err_free_buf;
    }

    acme_cdev = cdev_alloc();
    if (!acme_cdev) {
        err = -ENOMEM;
        goto err_dev_unregister;
    }
    acme_cdev->ops = &acme_fops;
    acme_cdev->owner = THIS_MODULE;

    if (cdev_add(acme_cdev, acme_dev, acme_count)) {
        err = -ENODEV;
        goto err_free_cdev;
    }
    return 0;

    /* Error path: undo the steps above in reverse order */
err_free_cdev:
    kfree(acme_cdev);
err_dev_unregister:
    unregister_chrdev_region(acme_dev, acme_count);
err_free_buf:
    kfree(acme_buf);
err_exit:
    return err;
}

static void __exit acme_exit(void)
{
    cdev_del(acme_cdev);
    unregister_chrdev_region(acme_dev, acme_count);
    kfree(acme_buf);
}

Shows how to handle errors and deallocate resources in the right order!
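
The unwinding order used in acme_init() can be tried outside the kernel. The following is a minimal user-space sketch, not driver code: malloc() and free() stand in for the three kernel allocation steps, and struct resources, step_alloc(), resources_init(), resources_exit() and the fail_at parameter are hypothetical names introduced only to force a failure at a chosen step.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's three resources. */
struct resources {
    void *buf;      /* plays the role of acme_buf */
    void *region;   /* plays the role of the device number region */
    void *cdev;     /* plays the role of acme_cdev */
};

/* fail_at: 0 = no failure, 1..3 = make that step fail. */
static void *step_alloc(int step, int fail_at)
{
    return (step == fail_at) ? NULL : malloc(16);
}

int resources_init(struct resources *r, int fail_at)
{
    int err;

    r->buf = r->region = r->cdev = NULL;

    r->buf = step_alloc(1, fail_at);
    if (!r->buf) {
        err = -ENOMEM;
        goto err_exit;
    }
    r->region = step_alloc(2, fail_at);
    if (!r->region) {
        err = -ENODEV;
        goto err_free_buf;
    }
    r->cdev = step_alloc(3, fail_at);
    if (!r->cdev) {
        err = -ENOMEM;
        goto err_free_region;
    }
    return 0;

    /* Each label releases what was acquired before the failing step. */
err_free_region:
    free(r->region);
    r->region = NULL;
err_free_buf:
    free(r->buf);
    r->buf = NULL;
err_exit:
    return err;
}

/* Normal teardown: reverse order of acquisition. */
void resources_exit(struct resources *r)
{
    free(r->cdev);
    free(r->region);
    free(r->buf);
    r->buf = r->region = r->cdev = NULL;
}
```

Calling resources_init() with fail_at set to 2 or 3 exercises the labels and leaves no allocation behind, which is the property the goto chain is meant to guarantee.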

Character driver summary


Character driver writer (kernel)
- Define the file operations callbacks for the device file: read, write, ioctl...
- In the module init function, get major and minor numbers with alloc_chrdev_region(),
init a cdev structure with your file operations and add it to the system with cdev_add().
- In the module exit function, call cdev_del() and unregister_chrdev_region()

System administration (user-space)
- Load the character driver module
- In /proc/devices, find the major number it uses.
- Create the device file with this major number
The device file is ready to use!

System user (user-space)
- Open the device file, read, write, or send ioctl's to it.

Kernel
- Executes the corresponding file operations


Embedded Linux driver development

Driver development
Debugging


Usefulness of a serial port


Most processors feature a serial port interface (usually very
well supported by Linux). You just need this interface to be
connected to the outside.
Easy way of getting the first messages of an early kernel
version, even before it finishes booting. A minimal kernel with only
serial port support is enough.
Once the kernel is fixed and has completed booting, it is possible
to access a serial console and issue commands.
The serial port can also be used to transfer files to the target.


When you don't have a serial port


On the host
Not an issue. You can get a USB to serial converter. Usually very
well supported on Linux and roughly costs $20. The device appears
as /dev/ttyUSB0 on the host.
On the target
Check whether you have an IrDA port. It's usually a serial port too.
If you have an Ethernet adapter, try with it
You may also try to manually hook-up the processor serial interface
(check the electrical specifications first!)


Debugging with printk


Universal debugging technique used since the beginning of
programming (first found in cavemen drawings)
Printed or not in the console or /var/log/messages
according to the priority. This is controlled by the loglevel
kernel parameter, or through /proc/sys/kernel/printk
(see Documentation/sysctl/kernel.txt)
Available priorities (include/linux/kernel.h):
#define KERN_EMERG   "<0>" /* system is unusable */
#define KERN_ALERT   "<1>" /* action must be taken immediately */
#define KERN_CRIT    "<2>" /* critical conditions */
#define KERN_ERR     "<3>" /* error conditions */
#define KERN_WARNING "<4>" /* warning conditions */
#define KERN_NOTICE  "<5>" /* normal but significant condition */
#define KERN_INFO    "<6>" /* informational */
#define KERN_DEBUG   "<7>" /* debug-level messages */
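
To make the filtering rule concrete, here is a small user-space sketch of how a "<N>" prefix can be compared against a console log level: a message is shown on the console only when its level is numerically lower than the console log level. The function names parse_level() and shown_on_console() and the default_level parameter are hypothetical; this illustrates the rule and is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Parse a "<N>" priority prefix (N in 0..7).
 * Returns the level and sets *body to the text after the prefix.
 * Without a prefix, returns default_level and *body = msg.
 */
int parse_level(const char *msg, int default_level, const char **body)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>') {
        *body = msg + 3;
        return msg[1] - '0';
    }
    *body = msg;
    return default_level;
}

/* Returns 1 if the message level is below console_loglevel, else 0. */
int shown_on_console(const char *msg, int console_loglevel, int default_level)
{
    const char *body;

    return parse_level(msg, default_level, &body) < console_loglevel;
}
```

With a console log level of 7, a KERN_ERR ("<3>") message passes the test and a KERN_DEBUG ("<7>") message does not.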


Debugging with /proc or /sys (1)


Instead of dumping messages in the kernel log, you can have your
drivers make information available to user space
Through a file in /proc or /sys, whose contents are handled by
callbacks defined and registered by your driver.
Can be used to show any piece of information
about your device or driver.
Can also be used to send data to the driver or to control it.
Caution: anybody can use these files.
You should remove your debugging interface in production!


Debugging with /proc or /sys (2)


Examples
cat /proc/acme/stats (dummy example)
Displays statistics about your acme driver.
cat /proc/acme/globals (dummy example)
Displays values of global variables used by your driver.
echo 600000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed

Adjusts the speed of the CPU (controlled by the cpufreq driver).


Debugging with ioctl


Can use the ioctl() system call to query information
about your driver (or device) or send commands to it.
This calls the ioctl file operation that you can register in
your driver.
Advantage: your debugging interface is not public.
You could even leave it when your system (or its driver) is in
the hands of its users.


Debugging with gdb


Schrödinger penguin principle.
If you execute the kernel from a debugger on the same machine,
this will interfere with the kernel behavior.
However, you can access the current kernel state with gdb:
gdb /usr/src/linux/vmlinux /proc/kcore
(vmlinux: uncompressed kernel, /proc/kcore: kernel address space)
You can access kernel structures, follow pointers... (read only!)
Requires the kernel to be compiled with CONFIG_DEBUG_INFO
(Kernel hacking section)


kgdb kernel patch


http://kgdb.linsyssoft.com/
The execution of the patched kernel is fully controlled by
gdb from another machine, connected through a serial line.
Can do almost everything, including inserting breakpoints in
interrupt handlers.
Supported architectures: i386, x86_64, ppc and s390.


Kernel crash analysis with kexec


kexec system call: makes it possible to
call a new kernel, without rebooting and
going through the BIOS / firmware.
Idea: after a kernel panic, make the
kernel automatically execute a new,
clean kernel from a reserved location in
RAM, to perform post-mortem analysis
of the memory of the crashed kernel.

1. Copy the debug kernel to reserved RAM
2. On kernel panic, kexec the debug kernel
3. Analyze the crashed kernel RAM

[Diagram: standard kernel in regular RAM, debug kernel in reserved RAM]

See Documentation/kdump/kdump.txt
in the kernel sources for details.

Kernel Oops
Caused by an exception in the kernel code itself.
The kernel issues a diagnostic message, called an Oops.
The message contains debug information, such as register
content, stack trace, code dump etc.
After the message is sent to the log and console:
From within a task: the kernel kills the task.
From an interrupt: the kernel panics and may reboot
automatically if the panic=secs parameter was given beforehand.

In 2.4, stack trace addresses can be translated via the
ksymoops utility. In 2.6, with the CONFIG_KALLSYMS
option on, the kernel does the translation internally.

Kernel Oops Example


Unable to handle kernel NULL pointer dereference at virtual address 00000014
printing eip: d8d4d545
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<d8d4d545>]    Tainted: PF
EFLAGS: 00013286
eax: c8b078c0   ebx: c8b078c0   ecx: 00000019   edx: c8b078c0
esi: 00000000   edi: 00000000   ebp: bffff75c   esp: c403bf84
ds: 0018   es: 0018   ss: 0018
Process vmware (pid: 5194, stackpage=c403b000)
Stack: c8b078c0 00000000 bffff35c 00000000 c0136d64 c403bfa4 c8b078c0 c0130185
Call Trace: [sys_stat64+100/112] [filp_close+53/112]
            [sys_close+67/96] [system_call+51/56]
Code: f6 46 14 01 74 1c 83 c4 f4 8b 06 50 e8 da 62 43 e7 83 c4 f8


Kernel debugging tips


If your kernel doesn't boot yet or hangs without any message, it
can help to activate Low Level debugging
(Kernel Hacking section, only available on arm):
CONFIG_DEBUG_LL=y
More about kernel debugging in the free
Linux Device Drivers book (References section)!


Embedded Linux driver development

Driver development
Concurrent access to resources


Sources of concurrency issues


The same resources can be accessed by several kernel processes in
parallel, causing potential concurrency issues
Several user-space programs accessing the same device data or
hardware. Several kernel processes could execute the same code on
behalf of user processes running in parallel.
Multiprocessing: the same driver code can be running on another
processor. This can also happen with single CPUs with hyperthreading.
Kernel preemption, interrupts: kernel code can be interrupted at any
time (with just a few exceptions), and the same data may be accessed by another
process before the execution continues.


Avoiding concurrency issues


Avoid using global variables and shared data whenever possible
(cannot be done with hardware resources)
Don't make resources available to other kernel processes until
they are ready to be used.
Use techniques to manage concurrent access to resources.
See Rusty Russell's Unreliable Guide To Locking
Documentation/DocBook/kernel-locking/
in the kernel sources.


Concurrency protection with semaphores


Kernel semaphores
Also called mutexes (Mutual Exclusion)

Two states: 1 (free) and 0 (locked)
P (down): takes the semaphore from 1 (free) to 0 (locked)
P: Probeer, "Try" (to decrement) in Dutch
V (up): takes the semaphore from 0 (locked) back to 1 (free)
V: Verhoog, "Increment" in Dutch

Initializing a semaphore
Statically
DECLARE_MUTEX(name);
DECLARE_MUTEX_LOCKED(name);
Dynamically
void init_MUTEX(struct semaphore *sem);
void init_MUTEX_LOCKED(struct semaphore *sem);


Locking and unlocking semaphores


void down (struct semaphore *sem);
Decrements the semaphore if set to 1, waits otherwise.
Caution: can't be interrupted, causing processes you cannot kill!
int down_interruptible (struct semaphore *sem);
Same, but can be interrupted. If interrupted, returns a non zero value
and doesn't hold the semaphore. Test the return value!!!
int down_trylock (struct semaphore *sem);
Never waits. Returns a non zero value if the semaphore is not
available.
void up (struct semaphore *sem);
Releases the semaphore. Make sure you do it as soon as possible!
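
The return-value convention of down_trylock() (zero on success, nonzero when the semaphore is not available) can be illustrated in user space. This is a minimal sketch built on C11 atomics, not the kernel semaphore implementation; struct bin_sem, bin_sem_init(), bin_sem_trylock(), bin_sem_up() and update_shared() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Binary semaphore: 1 = free, 0 = locked. */
struct bin_sem {
    atomic_int count;
};

void bin_sem_init(struct bin_sem *s)
{
    atomic_init(&s->count, 1);
}

/* Never waits. Returns 0 if the semaphore was taken, nonzero otherwise. */
int bin_sem_trylock(struct bin_sem *s)
{
    int expected = 1;

    return atomic_compare_exchange_strong(&s->count, &expected, 0) ? 0 : 1;
}

/* Releases the semaphore. */
void bin_sem_up(struct bin_sem *s)
{
    atomic_store(&s->count, 1);
}

/* Typical caller: test the return value, keep the critical section short. */
int update_shared(struct bin_sem *s, int *shared, int value)
{
    if (bin_sem_trylock(s))
        return -EBUSY;      /* not available: do not touch the data */
    *shared = value;
    bin_sem_up(s);          /* release as soon as possible */
    return 0;
}
```

The caller pattern mirrors the advice on this slide: check the result before touching shared data, and release the semaphore right after the update.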


Reader / writer semaphores


Allow shared access by unlimited readers, or by only 1 writer. Writers get priority.
void init_rwsem (struct rw_semaphore *sem);
void down_read (struct rw_semaphore *sem);
int down_read_trylock (struct rw_semaphore *sem);
void up_read (struct rw_semaphore *sem);
void down_write (struct rw_semaphore *sem);
int down_write_trylock (struct rw_semaphore *sem);
void up_write (struct rw_semaphore *sem);
Well suited for rare writes, holding the semaphore briefly. Otherwise, readers get
starved, waiting too long for the semaphore to be released.


When to use semaphores


Before and after accessing shared resources
Before and after making other resources available to other
parts of the kernel or to user-space (typically at module
initialization).
In situations when sleeping is allowed.


Spinlocks
Locks to be used for code that can't sleep (critical sections,
interrupt handlers...). Be very careful not to call functions which
can sleep!

Intended for multiprocessor systems

Spinlocks are not interruptible,
don't sleep and keep spinning in a loop
(still locked? try again) until the lock is available.

Spinlocks cause kernel preemption to be disabled on the CPU
executing them.
May require interrupts to be disabled too.


Initializing spinlocks
Static
spinlock_t my_lock = SPIN_LOCK_UNLOCKED;
Dynamic
void spin_lock_init (spinlock_t *lock);


Using spinlocks
void spin_[un]lock (spinlock_t *lock);
void spin_lock_irqsave / spin_unlock_irqrestore (spinlock_t *lock,
unsigned long flags);
Disables IRQs on the local CPU, saving the previous interrupt state in flags
void spin_[un]lock_irq (spinlock_t *lock);
Disables IRQs without saving flags. Use it when you're sure that nobody
already disabled interrupts.
void spin_[un]lock_bh (spinlock_t *lock);
Disables software interrupts, but not hardware ones
Note that reader / writer spinlocks also exist.
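
The "keep spinning until the lock is available" behavior can be sketched in user space with a C11 atomic_flag. This is only an illustration of the busy-wait idea: it does not disable preemption or interrupts as kernel spinlocks do, and toy_spinlock_t and the toy_spin_*() functions are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag locked;
} toy_spinlock_t;

void toy_spin_init(toy_spinlock_t *l)
{
    atomic_flag_clear(&l->locked);
}

/* Returns true if the lock was acquired, false if it was already held. */
bool toy_spin_trylock(toy_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Busy-waits (never sleeps) until the lock is available. */
void toy_spin_lock(toy_spinlock_t *l)
{
    while (!toy_spin_trylock(l))
        ;   /* still locked: spin */
}

void toy_spin_unlock(toy_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Because the waiter burns CPU time instead of sleeping, the protected section should stay very short, which is the same constraint that applies to kernel spinlocks.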


Deadlock situations
They can lock up your system. Make sure they never happen!

Don't call a function that can try to get access to the same lock:
Get lock1 -> call a function -> the function waits for lock1 -> deadlock!

Holding multiple locks is risky:
One path gets lock1, then tries to get lock2.
Another path gets lock2, then tries to get lock1 -> deadlock!
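
One common way to avoid the second situation is to always take multiple locks in the same global order. The sketch below is a user-space illustration under that assumption, ordering the two locks by address; toy_lock_t, lock_pair() and unlock_pair() are hypothetical names and the lock itself is a simple atomic_flag busy-wait, not a kernel primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    atomic_flag held;
} toy_lock_t;

static void toy_lock(toy_lock_t *l)
{
    while (atomic_flag_test_and_set(&l->held))
        ;   /* spin until free */
}

static void toy_unlock(toy_lock_t *l)
{
    atomic_flag_clear(&l->held);
}

/* Always take the lock with the lower address first. */
void lock_pair(toy_lock_t *a, toy_lock_t *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        toy_lock_t *tmp = a;
        a = b;
        b = tmp;
    }
    toy_lock(a);
    if (a != b)         /* same lock passed twice: take it only once */
        toy_lock(b);
}

void unlock_pair(toy_lock_t *a, toy_lock_t *b)
{
    toy_unlock(a);
    if (a != b)
        toy_unlock(b);
}
```

Since every caller acquires the pair in the same order regardless of argument order, two paths can no longer each hold one lock while waiting for the other.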

Alternatives to locking
As we have just seen, locking can have a strong negative impact on
system performance. In some situations, you could do without it.
By using lock-free algorithms like Read Copy Update (RCU).
RCU API available in the kernel
(See http://en.wikipedia.org/wiki/RCU).
When available, use atomic operations.


Atomic variables
Useful when the shared resource is an
integer value
Even an instruction like n++ is not
guaranteed to be atomic on all processors!

Header
#include <asm/atomic.h>

Type
atomic_t
contains a signed integer (at least 24 bits)

Atomic operations (main ones)

Set or read the counter:
void atomic_set (atomic_t *v, int i);
int atomic_read (atomic_t *v);

Operations without return value:
void atomic_inc (atomic_t *v);
void atomic_dec (atomic_t *v);
void atomic_add (int i, atomic_t *v);
void atomic_sub (int i, atomic_t *v);

Similar functions testing the result:
int atomic_inc_and_test (...);
int atomic_dec_and_test (...);
int atomic_sub_and_test (...);

Functions returning the new value:
int atomic_inc_return (...);
int atomic_dec_return (...);
int atomic_add_return (...);
int atomic_sub_return (...);
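
A typical use of the "test the result" variants is reference counting: the caller that brings the counter to zero is the one that releases the object. The following is a user-space sketch with C11 <stdatomic.h> standing in for atomic_t; struct object, object_init(), object_get(), object_put() and dec_and_test() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct object {
    atomic_int refcount;
    bool released;      /* set when the last reference is dropped */
};

void object_init(struct object *o)
{
    atomic_init(&o->refcount, 1);   /* creator holds the first reference */
    o->released = false;
}

void object_get(struct object *o)
{
    atomic_fetch_add(&o->refcount, 1);
}

/* Analogue of atomic_dec_and_test(): true when the counter reaches 0. */
static bool dec_and_test(atomic_int *v)
{
    /* atomic_fetch_sub() returns the value held before the decrement */
    return atomic_fetch_sub(v, 1) == 1;
}

/* Drops one reference. Returns true if it was the last one. */
bool object_put(struct object *o)
{
    if (dec_and_test(&o->refcount)) {
        o->released = true;
        return true;
    }
    return false;
}
```

The decrement and the zero test happen in one atomic step, so two concurrent object_put() calls cannot both conclude that they dropped the last reference.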


Atomic bit operations


Supply very fast, atomic operations
On most platforms, apply to an unsigned long type.
Apply to a void type on a few others.
Set, clear, toggle a given bit:
void set_bit(int nr, unsigned long * addr);
void clear_bit(int nr, unsigned long * addr);
void change_bit(int nr, unsigned long * addr);
Test bit value:
int test_bit(int nr, unsigned long *addr);
Test and modify (return the previous value):
int test_and_set_bit (...);
int test_and_clear_bit (...);
int test_and_change_bit (...);
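
The "test and modify" semantics (return the previous bit value while changing it atomically) can be sketched in user space with C11 atomics. This is an illustration only: the toy_*() names are hypothetical, and atomic_ulong replaces the plain unsigned long that the kernel functions operate on.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Sets bit nr in the bitmap at addr. Returns its previous value (0 or 1). */
int toy_test_and_set_bit(unsigned nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);
    unsigned long old = atomic_fetch_or(&addr[nr / BITS_PER_WORD], mask);

    return (old & mask) != 0;
}

/* Clears bit nr. Returns its previous value (0 or 1). */
int toy_test_and_clear_bit(unsigned nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);
    unsigned long old = atomic_fetch_and(&addr[nr / BITS_PER_WORD], ~mask);

    return (old & mask) != 0;
}

/* Reads bit nr without modifying it. */
int toy_test_bit(unsigned nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);

    return (atomic_load(&addr[nr / BITS_PER_WORD]) & mask) != 0;
}
```

A driver could use the same pattern as a cheap "device busy" flag: the caller that sees 0 returned from the test-and-set owns the device.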


Embedded Linux Driver Development

Driver development
Processes and scheduling


Processes and Threads


A process is an instance of a running program
Multiple instances of the same program can be running.
Program code (text section) memory is shared.
Each process has its own data section, address space, open
files and signal handlers.

A thread is a single task in a program
It belongs to a process and shares the common data section,
address space, open files and pending signals.
It has its own stack, pending signals and state.

It's common to refer to single threaded programs as processes.

The Kernel and Threads


The 2.4 kernel did not have a notion of threads.
All threads were implemented as processes that happen to
share the same address space, file system resources, file
descriptors and signal handlers as their parent process.

In 2.6 an explicit notion of processes and threads was
introduced to the kernel.
Scheduling is still done on a thread by thread basis.
The basic object the kernel works with is a task, which is
analogous to a thread.


task_struct
Each task is represented by a task_struct.
The task is linked in the task tree via:
parent: pointer to its parent
children: a linked list
sibling: a linked list

task_struct contains a pid field
pid is mapped to task_struct pointer via a hash table


Task Identifiers
Each task_struct has the following identities:
PID: Globally unique. Different one for each thread.
TGID: Thread Group Id. Returned to user space as getpid().
Shared by all threads of a process.
For a single threaded process, TGID == PID.
PGID: Process Group Id (POSIX.1).
SID: Session Id (POSIX.1).

current points to the current process task_struct
when applicable. Not valid in interrupt context.

A process life
[State diagram]

Parent process calls fork() and creates a new process
-> TASK_RUNNING (ready but not running)

TASK_RUNNING (ready but not running) -> TASK_RUNNING (actually running):
the process is elected by the scheduler

TASK_RUNNING (actually running) -> TASK_RUNNING (ready but not running):
the process is preempted by the scheduler to run a higher priority task

TASK_RUNNING (actually running) -> TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE (waiting):
the process decides to sleep on a wait queue for a specific event

TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE (waiting) -> TASK_RUNNING (ready but not running):
the event occurs or the process receives a signal. Process becomes runnable again

TASK_ZOMBIE:
task terminated but its resources are not freed yet.
Waiting for its parent to acknowledge its death.

Process context
User space programs and system calls are scheduled together

1. Process executing in user space...
(can be preempted)
2. System call or exception
3. Kernel code executed on behalf of user space
(can be preempted too!)
Still has access to process data (open files...)
4. Process continuing in user space...
(or replaced by a higher priority process)
(can be preempted)

Kernel threads
The kernel does not only react to user-space requests (system calls, exceptions)
or hardware events (interrupts). It also runs its own processes.
Kernel threads are standard processes scheduled and preempted in the same
way (you can view them with top or ps!). They just have no special
address space and usually run forever.
Kernel thread examples:
pdflush: regularly flushes dirty memory pages to disk (file
changes not committed to disk yet).
ksoftirqd: manages soft irqs.


Process priorities
Regular processes
Priorities from -20 (maximum) to 19 (minimum)
Only root can set negative priorities
(root can give a negative priority to a regular user process)
Use the nice command to run a job with a given priority:
nice -n <priority> <command>
Use the renice command to change a process priority:
renice <priority> -p <pid>


Real-time processes
Real-time processes can be started by root using the POSIX API
Available through <sched.h> (see man sched.h for details)
100 real-time priorities available
SCHED_FIFO scheduling class:
The process runs until completion unless it is blocked by an I/O, voluntarily
relinquishes the CPU, or is preempted by a higher priority process.
SCHED_RR scheduling class:
Difference: the processes are scheduled in a Round Robin way.
Each process is run until it exhausts a max time quantum. Then other
processes with the same priority are run, and so on...


Timer frequency
Timer interrupts are raised every 1/HZ second (= 1 jiffy)
HZ is now configurable (in Processor type and features):
100, 250 (i386 default) or 1000.
Supported on i386, ia64, ppc, ppc64, sparc64, x86_64
See kernel/Kconfig.hz.
Compromise between system responsiveness and global throughput.
Caution: not any value can be used. Constraints apply!
Another idea is to completely turn off CPU timer interrupts when the
system is idle (dynamic tick): see http://muru.com/linux/dyntick.
This saves power. Supports arm and i386 so far.


O(1) scheduler
The kernel maintains 2 priority arrays:
the active and the expired array.
Each array contains 140 entries (100 real-time priorities + 40
regular ones), 1 for each priority, each containing a list of
processes with the same priority.
The arrays are implemented in a way that makes it possible
to pick a process with the highest priority in constant time
(whatever the number of running processes).


Choosing and expiring processes


The scheduler finds the highest process priority
It executes the first process in the priority queue for this
priority.
Once the process has exhausted its timeslice, it is moved to
the expired array.
The scheduler gets back to selecting another process with the
highest priority available, and so on...
Once the active array is empty, the 2 arrays are swapped!
Again, everything is done in constant time!
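
The active/expired mechanism can be sketched as a toy user-space model. It is an illustration under simplifying assumptions, not the kernel's code: each priority holds a small fixed-size FIFO instead of a linked list, tasks are plain integers, and all names (struct prio_array, struct runqueue, rq_init(), enqueue(), pick_next(), QLEN) are hypothetical. The bitmap search is bounded by the number of priorities, never by the number of tasks.

```c
#include <assert.h>
#include <string.h>

#define MAX_PRIO  140                 /* 100 real-time + 40 regular */
#define WORD_BITS 32
#define NWORDS    ((MAX_PRIO + WORD_BITS - 1) / WORD_BITS)
#define QLEN      4                   /* toy limit of tasks per priority */

struct prio_array {
    unsigned int bitmap[NWORDS];      /* bit set = queue not empty */
    int queue[MAX_PRIO][QLEN];        /* task ids, FIFO per priority */
    int count[MAX_PRIO];
};

struct runqueue {
    struct prio_array arrays[2];
    struct prio_array *active, *expired;
};

void rq_init(struct runqueue *rq)
{
    memset(rq, 0, sizeof(*rq));
    rq->active = &rq->arrays[0];
    rq->expired = &rq->arrays[1];
}

int enqueue(struct prio_array *a, int prio, int task)
{
    if (a->count[prio] == QLEN)
        return -1;                    /* toy queue full */
    a->queue[prio][a->count[prio]++] = task;
    a->bitmap[prio / WORD_BITS] |= 1u << (prio % WORD_BITS);
    return 0;
}

/* Lowest set bit = highest priority. Cost bounded by MAX_PRIO only. */
static int first_prio(const struct prio_array *a)
{
    for (int w = 0; w < NWORDS; w++)
        if (a->bitmap[w])
            for (int b = 0; b < WORD_BITS; b++)
                if (a->bitmap[w] & (1u << b))
                    return w * WORD_BITS + b;
    return -1;
}

/* Picks the next task, "runs" it for one timeslice, then expires it. */
int pick_next(struct runqueue *rq)
{
    int prio = first_prio(rq->active);

    if (prio < 0) {                   /* active array empty: swap arrays */
        struct prio_array *tmp = rq->active;
        rq->active = rq->expired;
        rq->expired = tmp;
        prio = first_prio(rq->active);
        if (prio < 0)
            return -1;                /* nothing runnable at all */
    }

    struct prio_array *a = rq->active;
    int task = a->queue[prio][0];

    a->count[prio]--;
    memmove(&a->queue[prio][0], &a->queue[prio][1],
            a->count[prio] * sizeof(int));
    if (a->count[prio] == 0)
        a->bitmap[prio / WORD_BITS] &= ~(1u << (prio % WORD_BITS));

    enqueue(rq->expired, prio, task); /* timeslice exhausted */
    return task;
}
```

In this model a lower number means a higher priority, so a task queued at priority 100 is always picked before tasks at 120, and once every task has expired the two array pointers are simply exchanged.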


When is scheduling run?


Each process has a need_resched flag which is set:
After a process exhausted its timeslice.
After a process with a higher priority is awakened.
This flag is checked (possibly causing the execution of the scheduler)
When returning to user-space from a system call
When returning from an interrupt handler (including the cpu timer)
Scheduling also happens when kernel code explicitly runs
schedule() or executes an action that sleeps.


Timeslices
The scheduler also prioritizes high priority processes by giving
them a bigger timeslice.
Initial process timeslice: parent's timeslice split in 2
(otherwise process would cheat by forking).
Timeslice at minimum priority: 5 ms or 1 jiffy (whichever is larger)
Timeslice at default priority: 100 ms
Timeslice at maximum priority: 800 ms
Note: actually depends on HZ.
See kernel/sched.c for details.
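
The three figures above are consistent with a linear scaling of the timeslice with priority. The sketch below reproduces them in milliseconds; it is modeled on a recollection of the scaling used in 2.6-era kernel/sched.c and should be treated as an assumption to check against the source. The function names and the *_MS constants are hypothetical, and the "1 jiffy" lower bound is ignored.

```c
#include <assert.h>

#define MAX_PRIO         140   /* 100 real-time + 40 regular priorities */
#define MAX_USER_PRIO    40
#define MIN_TIMESLICE_MS 5
#define DEF_TIMESLICE_MS 100

/* nice -20..19 maps to static priority 100..139 */
static int nice_to_prio(int nice)
{
    return 120 + nice;
}

/* Linear scaling, never below the minimum timeslice. */
static int scale_prio(int base_ms, int prio)
{
    int ms = base_ms * (MAX_PRIO - prio) / (MAX_USER_PRIO / 2);

    return ms > MIN_TIMESLICE_MS ? ms : MIN_TIMESLICE_MS;
}

/* Negative nice values scale from a base four times larger. */
int timeslice_ms(int nice)
{
    int prio = nice_to_prio(nice);

    if (nice < 0)
        return scale_prio(DEF_TIMESLICE_MS * 4, prio);
    return scale_prio(DEF_TIMESLICE_MS, prio);
}
```

Under these assumptions, nice 19 gives the minimum timeslice, nice 0 the default one and nice -20 the maximum one, matching the values listed on this slide.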


Dynamic priorities
Only applies to regular processes
For a better user experience, the Linux scheduler boosts the priority
of interactive processes (processes which spend most of their time
sleeping, and take time to exhaust their timeslices). Such
processes often sleep but need to respond quickly after waking up
(example: word processor waiting for key presses).
Priority bonus: up to 5 points.
Conversely, the Linux scheduler reduces the priority of compute
intensive tasks (which quickly exhaust their timeslices).
Priority penalty: up to 5 points.


Embedded Linux driver development

Driver development
Sleeping


How to sleep (1)


Sleeping is needed when a user process is waiting for data which
are not ready yet. The process then puts itself in a waiting queue.
Static queue declaration
DECLARE_WAIT_QUEUE_HEAD (module_queue);
Dynamic queue declaration
wait_queue_head_t queue;
init_waitqueue_head(&queue);


How to sleep (2)


Several ways to make a kernel process sleep
wait_event(queue, condition);
Sleeps until the given boolean expression is true.
Caution: can't be interrupted (i.e. by killing the client process in user-space)
wait_event_interruptible(queue, condition);
Can be interrupted
wait_event_timeout(queue, condition, timeout);
Sleeps and automatically wakes up after the given timeout.
wait_event_interruptible_timeout(queue, condition, timeout);

Same as above, interruptible.


Waking up!
Typically done by interrupt handlers when the data that sleeping
processes are waiting for becomes available.
wake_up(&queue);
Wakes up all the waiting processes on the given queue
wake_up_interruptible(&queue);
Does the same job, but only wakes up processes in an interruptible
sleep. Usually called when processes waited using wait_event_interruptible.


Embedded Linux driver development

Driver development
Interrupt management


Need for interrupts


Internal processor interrupts are used by the processor itself, for
example for multi-task scheduling.
External interrupts are needed because most internal and external
devices are slower than the processor. Better not to keep the
processor waiting for input data to be ready or for data to be
output. When the device is ready again, it sends an interrupt
to get the processor's attention.


Interrupt handler constraints


Not run from a user context:
Can't transfer data to and from user space
(this needs to be done by system call handlers)
Interrupt handler execution is managed by the CPU, not by
the scheduler. Handlers can't run actions that may sleep,
because there is nothing to resume their execution.
In particular, memory must be allocated with GFP_ATOMIC.
Have to complete their job quickly enough:
they shouldn't block their interrupt line for too long.


Registering an interrupt handler (1)


Defined in include/linux/interrupt.h
int request_irq(
    unsigned int irq,              /* Requested irq channel */
    irqreturn_t (*handler) (...),  /* Interrupt handler */
    unsigned long irq_flags,       /* Option mask (see next page) */
    const char *devname,           /* Registered name */
    void *dev_id);                 /* Pointer to some handler data.
                                      Cannot be NULL and must be unique
                                      for shared irqs! */
void free_irq(unsigned int irq, void *dev_id);


Registering an interrupt handler (2)


irq_flags bit values (can be combined, none is fine too)
SA_INTERRUPT
"Quick" interrupt handler. Run with all interrupts disabled on the current cpu.
Shouldn't need to be used except in specific cases (such as timer interrupts)
SA_SHIRQ
Run with interrupts disabled only on the current irq line and on the local cpu.
The interrupt channel can be shared by several devices.
Requires a hardware status register telling whether an IRQ was raised or not.
SA_SAMPLE_RANDOM
Interrupts can be used to contribute to the system entropy pool used by
/dev/random and /dev/urandom. Useful to generate good random
numbers. Don't use this if the interrupt behavior of your device is predictable!


When to register the handler


Either at driver initialization time:
consumes lots of IRQ channels!
Or at device open time (first call to the open file operation):
better for saving free IRQ channels.
Need to count the number of times the device is opened, to
be able to free the IRQ channel when the device is no longer
in use.
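The open-count bookkeeping can be sketched as follows. This is a user-space mock-up under stated assumptions: mock_request_irq() and mock_free_irq() only stand in for request_irq() and free_irq(), the acme_ names are hypothetical, and a real driver would also need locking around the counter.

```c
/* Mock-ups standing in for the kernel's request_irq() / free_irq(). */
static int irq_registered;   /* 1 while the mock handler is installed */

static int mock_request_irq(void) { irq_registered = 1; return 0; }
static void mock_free_irq(void)   { irq_registered = 0; }

static int open_count;       /* number of processes holding the device open */

/* open(): register the handler for the first opener only. */
static int acme_open(void)
{
    if (open_count == 0) {
        int err = mock_request_irq();
        if (err)
            return err;      /* the IRQ channel could not be obtained */
    }
    open_count++;
    return 0;
}

/* release(): free the IRQ channel when the last opener goes away. */
static void acme_release(void)
{
    if (--open_count == 0)
        mock_free_irq();
}
```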


Information on installed handlers


/proc/interrupts
           CPU0
  0:    5616905   XT-PIC  timer               # Registered name
  1:       9828   XT-PIC  i8042
  2:          0   XT-PIC  cascade
  3:    1014243   XT-PIC  orinoco_cs
  7:        184   XT-PIC  Intel 82801DB-ICH4
  8:          1   XT-PIC  rtc
  9:          2   XT-PIC  acpi
 11:     566583   XT-PIC  ehci_hcd, uhci_hcd, uhci_hcd, uhci_hcd, yenta, yenta, radeon@PCI:1:0:0
 12:       5466   XT-PIC  i8042
 14:     121043   XT-PIC  ide0
 15:     200888   XT-PIC  ide1
NMI:          0                               # Non Maskable Interrupts
ERR:          0

Total number of interrupts


grep intr /proc/stat
intr 8190767 6092967 10377 0 1102775 5 2 0 196 ...
The first value is the total number of interrupts. The following
values are the per-IRQ totals, in IRQ number order (IRQ0, IRQ1,
IRQ2, IRQ3...).
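A program can read these counters by splitting the intr line. The function below is a minimal sketch (parse_intr_line is a hypothetical name); it assumes the first number is the grand total and the following ones are per-IRQ counts starting at IRQ0.

```c
#include <stdlib.h>
#include <string.h>

/* Parse an "intr" line taken from /proc/stat.
 * Stores the grand total in *total and up to max per-IRQ counts in
 * counts[] (counts[0] is IRQ0). Returns the number of per-IRQ counts
 * stored, or -1 if the line is not a valid intr line. */
static int parse_intr_line(const char *line, unsigned long *total,
                           unsigned long *counts, int max)
{
    char *end;
    int n = 0;

    if (strncmp(line, "intr ", 5) != 0)
        return -1;
    line += 5;

    *total = strtoul(line, &end, 10);
    if (end == line)              /* no digits after "intr " */
        return -1;
    line = end;

    while (n < max) {
        unsigned long v = strtoul(line, &end, 10);
        if (end == line)          /* no more numbers on the line */
            break;
        counts[n++] = v;
        line = end;
    }
    return n;
}
```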

Interrupt channel detection (1)


Useful when a driver can be used in different machines / architectures
Some devices announce their IRQ channel in a register
Manual detection
Register your interrupt handler for all possible channels
Ask for an interrupt
Let the called interrupt handler store the IRQ number in a global
variable.
Try again if no interrupt was received
Unregister unused interrupt handlers.


Interrupt channel detection (2)


Kernel detection utilities
mask = probe_irq_on();
Activate interrupts on the device
Deactivate interrupts on the device
irq = probe_irq_off(mask);
> 0: unique IRQ number found
= 0: no interrupt. Try again!
< 0: several interrupts happened. Try again!


The interrupt handler's job


Acknowledge the interrupt to the device
(otherwise no more interrupts will be generated)
Read/write data from/to the device
Wake up any process waiting for the completion of
this read/write operation:
wake_up_interruptible(&module_queue);


Interrupt handler prototype


irqreturn_t (*handler) (
    int,                   /* irq number */
    void *dev_id,          /* Pointer used to keep track of the
                              corresponding device. Useful
                              when several devices are
                              managed by the same module */
    struct pt_regs *regs   /* cpu register snapshot, rarely
                              needed */
);
Return value:
IRQ_HANDLED: recognized and handled interrupt
IRQ_NONE: not on a device managed by the module. Useful to share interrupt
channels and/or report spurious interrupts to the kernel.
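How a handler on a shared line picks its return value can be sketched in user space as below. This is a mock-up, not driver code: irqreturn_t is redefined locally as a stand-in, struct acme_dev and ACME_IRQ_PENDING are hypothetical, and the "status register" is a plain struct field.

```c
#include <stddef.h>

/* Local stand-ins for the kernel's irqreturn_t values. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

struct pt_regs;                      /* opaque here, unused by the handler */

#define ACME_IRQ_PENDING 0x1u        /* hypothetical status bit */

struct acme_dev {
    unsigned int status;             /* mock hardware status register */
    unsigned int handled;            /* number of interrupts serviced */
};

static irqreturn_t acme_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    struct acme_dev *dev = dev_id;   /* dev_id identifies our device */

    (void)irq;
    (void)regs;

    /* Shared line: check whether our device raised the interrupt. */
    if (!(dev->status & ACME_IRQ_PENDING))
        return IRQ_NONE;

    dev->status &= ~ACME_IRQ_PENDING;  /* acknowledge the interrupt */
    dev->handled++;
    return IRQ_HANDLED;
}
```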


Top half and bottom half processing (1)


Top half: the interrupt handler must complete as quickly as
possible. Once it has acknowledged the interrupt, it just
schedules the lengthy rest of the job (taking care of the data)
for later execution.
Bottom half: completing the rest of the interrupt handler job.
Handles data, and then wakes up any waiting user process.
Best implemented by softirqs, tasklets, timers or work
queues.


Linux Contexts
[Diagram: Linux execution contexts]
Kernel space, interrupt context: interrupt handlers; SoftIRQs (hi prio tasklets, timers, net stack, regular tasklets, ...)
Kernel space, user context: kernel threads, processes and threads; scheduling points
User space: processes and threads

Softirq
A fixed set (max 32) of software interrupts (prioritized):
HI_SOFTIRQ        Runs low latency tasklets
TIMER_SOFTIRQ     Runs timers
NET_TX_SOFTIRQ    Network stack Tx
NET_RX_SOFTIRQ    Network stack Rx
SCSI_SOFTIRQ      SCSI sub system
TASKLET_SOFTIRQ   Runs normal tasklets
Activated on return from interrupt (in do_IRQ())
Can run concurrently on SMP systems (even the same softirq).

Tasklets
Added in 2.4
Are run from softirqs (normal or low-latency)
A given tasklet runs on only one CPU at a time (serialization)
You can initialize a tasklet via:
void tasklet_init(struct tasklet_struct *t,
                  void (*func)(unsigned long),
                  unsigned long data);
Or declare the tasklet in the module source file:
DECLARE_TASKLET(module_tasklet,     /* name */
                module_do_tasklet,  /* function */
                0                   /* data */);

Tasklet scheduling and killing


Scheduling a tasklet in the top half part (interrupt handler):
For regular tasklets:
tasklet_schedule(&module_tasklet);
Or for low latency tasklets (run first):
tasklet_hi_schedule(&module_tasklet);
If this tasklet was already scheduled it is run only once.
If this tasklet was already running it is rescheduled for later.
On module exit, the tasklet should be killed:
tasklet_kill(&module_tasklet);


Tasklet Masking
Tasklets may be temporarily disabled/enabled:
tasklet_enable(&module_tasklet);
tasklet_disable(&module_tasklet);
tasklet_hi_enable(&module_tasklet);
tasklet_disable_nosync(&module_tasklet);


Timers
Run via softirq, like tasklets
But at a specific time
A timer is represented by a timer_list:
struct timer_list {
    /* ... */
    unsigned long expires;            /* In jiffies */
    void (*function)(unsigned long);
    unsigned long data;               /* Optional */
};
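Because expires is an absolute value of the jiffies counter, it is computed from the current count and must be compared in a way that survives counter wrap-around. The sketch below runs in user space: HZ = 100 is only an assumed tick rate, expires_in_seconds() is a hypothetical helper, and time_after() uses the same signed-difference comparison as the kernel macro of that name.

```c
/* Assumed tick rate. In the kernel, HZ is fixed at build time. */
#define HZ 100

/* True if a is later than b, even when the counter has wrapped around
 * (same signed-difference comparison as the kernel's time_after()). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/* Absolute expiry value for a timer firing `seconds` from `now`,
 * where `now` plays the role of the current jiffies value. */
static unsigned long expires_in_seconds(unsigned long now, unsigned long seconds)
{
    return now + seconds * HZ;
}
```

A plain `expires > now` test gives the wrong answer when the counter wraps, which is why the signed difference is used.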


Timer operations
Manipulated with:
void init_timer(struct timer_list *timer);
void add_timer(struct timer_list *timer);
void add_timer_on(struct timer_list *timer, int cpu);
int del_timer(struct timer_list *timer);
int del_timer_sync(struct timer_list *timer);
int mod_timer(struct timer_list *timer, unsigned long expires);
int timer_pending(const struct timer_list *timer);


Work Queues (2.6 only)


Each work queue has a kernel thread (task) per cpu.
Since 2.6.6 also a single threaded version exists.

Code in work queue:


Has a process context.
May sleep.

New work queues may be created/destroyed via:
struct workqueue_struct *create_workqueue(const char *name);
struct workqueue_struct *create_singlethread_workqueue(const char *name);
void destroy_workqueue(struct workqueue_struct *wq);

Working the Workqueue


Work is delivered to a workqueue via:
DECLARE_WORK(work, func, data);
INIT_WORK(work, func, data);
int queue_work(struct workqueue_struct *wq, struct work_struct *work);
int queue_delayed_work(struct workqueue_struct *wq,
                       struct work_struct *work, unsigned long delay);
void flush_workqueue(struct workqueue_struct *wq);


The Default Workqueue


One default work queue is run by the keventd kernel thread
For keventd, we have the more common:
int schedule_work(struct work_struct *work);
int schedule_delayed_work(struct work_struct *work, unsigned long delay);
int cancel_delayed_work(struct work_struct *work);
void flush_scheduled_work(void);
int current_is_keventd(void);


Disabling interrupts
May be useful in regular driver code...
Can be useful to ensure that an interrupt handler will not preempt your
code (including kernel preemption)
Disabling interrupts on the local CPU:
unsigned long flags;
local_irq_save(flags);    // Interrupts disabled
...
local_irq_restore(flags); // Interrupts restored to their previous state.
Note: must be run from within the same function!


Masking out an interrupt line


Useful to disable interrupts on a particular device
void disable_irq (unsigned int irq);
Disables the irq line for all processors in the system.
Waits for all currently executing handlers to complete.
void disable_irq_nosync (unsigned int irq);
Same, except it doesn't wait for handlers to complete.
void enable_irq (unsigned int irq);
Restores interrupts on the irq line.
void synchronize_irq (unsigned int irq);
Waits for irq handlers to complete (if any).


Checking interrupt status


Can be useful for code which can be run from either process or
interrupt context, to know whether or not it is allowed to call
code that may sleep.
irqs_disabled()
Tests whether local interrupt delivery is disabled.
in_interrupt()
Tests whether code is running in interrupt context
in_irq()
Tests whether code is running in an interrupt handler.


Interrupt management fun


In a training lab, somebody forgot to unregister a handler on
a shared interrupt line in the module exit function.
In a training lab, somebody freed the timer interrupt handler
by mistake (using the wrong irq number). The system froze.
Remember the kernel is not protected against itself!


Interrupt management summary


Device driver
When the device file is first opened, register an interrupt
handler for the device's interrupt channel.
Interrupt handler
Called when an interrupt is raised.
Acknowledge the interrupt.
If needed, schedule a tasklet taking care of handling data.
Otherwise, wake up processes waiting for the data.
Tasklet
Process the data.
Wake up processes waiting for the data.
Device driver
When the device is no longer opened by any process,
unregister the interrupt handler.

Embedded Linux driver development

Driver development
mmap


mmap (1)
Possibility to have parts of the virtual address space of a program
mapped to the contents of a file!
> cat /proc/1/maps (init process)
start-end          perm offset   major:minor inode    mapped file name
00771000-0077f000  r-xp 00000000 03:05       1165839  /lib/libselinux.so.1
0077f000-00781000  rw-p 0000d000 03:05       1165839  /lib/libselinux.so.1
0097d000-00992000  r-xp 00000000 03:05       1158767  /lib/ld-2.3.3.so
00992000-00993000  r--p 00014000 03:05       1158767  /lib/ld-2.3.3.so
00993000-00994000  rw-p 00015000 03:05       1158767  /lib/ld-2.3.3.so
00996000-00aac000  r-xp 00000000 03:05       1158770  /lib/tls/libc-2.3.3.so
00aac000-00aad000  r--p 00116000 03:05       1158770  /lib/tls/libc-2.3.3.so
00aad000-00ab0000  rw-p 00117000 03:05       1158770  /lib/tls/libc-2.3.3.so
00ab0000-00ab2000  rw-p 00ab0000 00:00       0
08048000-08050000  r-xp 00000000 03:05       571452   /sbin/init (text)
08050000-08051000  rw-p 00008000 03:05       571452   /sbin/init (data, stack)
08b43000-08b64000  rw-p 08b43000 00:00       0
f6fdf000-f6fe0000  rw-p f6fdf000 00:00       0
fefd4000-ff000000  rw-p fefd4000 00:00       0
ffffe000-fffff000  ---p 00000000 00:00       0

mmap (2)
Particularly useful when the file is a device file!
Allows access to device I/O memory and ports without having to go
through (expensive) read, write or ioctl calls!
X server example (maps excerpt)
start-end          perm offset   major:minor inode   mapped file name
08047000-081be000  r-xp 00000000 03:05       310295  /usr/X11R6/bin/Xorg
081be000-081f0000  rw-p 00176000 03:05       310295  /usr/X11R6/bin/Xorg
...
f4e08000-f4f09000  rw-s e0000000 03:05       655295  /dev/dri/card0
f4f09000-f4f0b000  rw-s 4281a000 03:05       655295  /dev/dri/card0
f4f0b000-f6f0b000  rw-s e8000000 03:05       652822  /dev/mem
f6f0b000-f6f8b000  rw-s fcff0000 03:05       652822  /dev/mem
A more user friendly way to get such information: pmap <pid>

How to implement mmap - User space


Open the device file
Call the mmap system call (see man mmap for details):
void *mmap(
    void *start,    /* Often 0, preferred starting address */
    size_t length,  /* Length of the mapped area */
    int prot,       /* Permissions: read, write, execute */
    int flags,      /* Options: shared mapping, private copy... */
    int fd,         /* Open file descriptor */
    off_t offset    /* Offset in the file */
);
Read from the returned virtual address or write to it.
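The two user-space steps (open, then mmap) can be sketched as below. To stay runnable without special hardware, the sketch maps a regular file rather than a device file; the calling sequence is the same. write_through_mmap is a hypothetical helper name, and the file is assumed to be at least len bytes long.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first len bytes of the file at path as a shared read-write
 * mapping, copy msg to the start of the mapping, then unmap.
 * Returns 0 on success, -1 on error. */
static int write_through_mmap(const char *path, const char *msg, size_t len)
{
    size_t n = strlen(msg);
    char *p;
    int fd;

    if (n > len)
        n = len;                  /* never write past the mapping */

    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                    /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    memcpy(p, msg, n);            /* a plain memory write, no write() call */
    return munmap(p, len);
}
```

With MAP_SHARED, the bytes written through the pointer become visible to other readers of the file, which is what makes the technique useful for device memory.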


How to implement mmap - Kernel space


Character driver: implement a mmap file operation
and add it to the driver file operations:
int (*mmap) (
    struct file *,            /* Open file structure */
    struct vm_area_struct *   /* Kernel VMA structure */
);
Initialize the mapping.
Can be done in most cases with the remap_pfn_range()
function, which takes care of most of the job.


remap_pfn_range()
pfn: page frame number
The most significant bits of the page address
(without the bits corresponding to the page size).
#include <linux/mm.h>
int remap_pfn_range(
    struct vm_area_struct *,   /* VMA struct */
    unsigned long virt_addr,   /* Starting user virtual address */
    unsigned long pfn,         /* pfn of the starting physical address */
    unsigned long size,        /* Mapping size */
    pgprot_t                   /* Page permissions */
);
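The relation between a physical address and its pfn is a simple shift. The sketch below assumes 4 KB pages (a shift of 12 bits); the real value is architecture dependent and comes from the kernel's PAGE_SHIFT. The DEMO_ and helper names are hypothetical.

```c
/* Assumed page size of 4 KB. In the kernel, use PAGE_SHIFT instead. */
#define DEMO_PAGE_SHIFT 12

/* Drop the in-page offset bits to get the page frame number. */
static unsigned long phys_to_pfn(unsigned long phys)
{
    return phys >> DEMO_PAGE_SHIFT;
}

/* Physical address of the first byte of the given page frame. */
static unsigned long pfn_to_phys(unsigned long pfn)
{
    return pfn << DEMO_PAGE_SHIFT;
}
```

Any two addresses inside the same page share one pfn, so the value passed to remap_pfn_range() should come from a page-aligned physical address.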


Simple mmap implementation


static int acme_mmap (
    struct file * file, struct vm_area_struct * vma)
{
    /* Size of the area requested by user space */
    unsigned long size = vma->vm_end - vma->vm_start;

    /* Refuse to map more than the device memory */
    if (size > ACME_SIZE)
        return -EINVAL;
    if (remap_pfn_range(vma,
                        vma->vm_start,
                        ACME_PHYS >> PAGE_SHIFT,
                        size,
                        vma->vm_page_prot))
        return -EAGAIN;
    return 0;
}

Embedded Linux driver development

Block Devices
and File Systems


Block device or MTD filesystems


Block devices
Floppy or hard disks (SCSI, IDE)
Compact Flash (seen as a regular IDE drive)
RAM disks
Loopback devices
Memory Technology Devices (MTD)
Flash, ROM or RAM chips
MTD emulation on block devices
Filesystems are either made for block or MTD storage devices.
See Documentation/filesystems/ for details.

I/O schedulers
Mission of I/O schedulers: re-order reads and writes to disk to
minimize disk head moves (time consuming!)
[Figure: a slower and a faster ordering of disk accesses]
2.4 has one fixed scheduler: the Linus Elevator.
2.6 has modular I/O schedulers: No-op, Elevator, Anticipatory,
Deadline, CFQ

Traditional block filesystems


Traditional filesystems
Hard to recover from crashes. Can be left in a corrupted (half
finished) state after a system crash or sudden power-off.
ext2: traditional Linux filesystem
(repair it with fsck.ext2)
vfat: traditional Windows filesystem
(repair it with fsck.vfat on GNU/Linux or Scandisk on
Windows)


Journaled filesystems
Designed to stay in a correct state even after system crashes
or a sudden power-off
All writes are first described in the journal before being
committed to files
[Diagram: in user space, the application writes to a file. In kernel
space, the filesystem writes an entry in the journal, writes to the
file, then clears the journal entry.]


Filesystem recovery after crashes


[Flowchart: after reboot, is the journal empty?
Yes: filesystem OK.
No: discard incomplete journal entries, execute the journal, then
filesystem OK.]
Thanks to the journal, the filesystem is never left in a corrupted state
Recently saved data could still be lost
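The recovery pass can be illustrated with a toy model. This is only a sketch of the idea, not how any real filesystem lays out its journal: the "disk" is an int array, and struct journal_entry and recover() are hypothetical names.

```c
/* One pending write recorded in the journal. */
struct journal_entry {
    int block;      /* index of the block to write */
    int value;      /* new contents of that block */
    int complete;   /* 1 once the entry was fully written to the journal */
};

/* Recovery after a crash: replay every complete entry, discard the
 * incomplete ones, and leave the journal empty.
 * Returns the number of entries replayed. */
static int recover(int *blocks, const struct journal_entry *journal, int *count)
{
    int replayed = 0;
    int i;

    for (i = 0; i < *count; i++) {
        if (!journal[i].complete)
            continue;                 /* incomplete entry: discard it */
        blocks[journal[i].block] = journal[i].value;
        replayed++;
    }
    *count = 0;                       /* the journal is empty again */
    return replayed;
}
```

The discarded entry models the "recently saved data could still be lost" case: the blocks stay consistent, but the interrupted write never reaches them.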


Journaled block filesystems


Journaled filesystems
ext3: ext2 with journal extension
reiserFS: most innovative (fast and extensible)
Others: JFS (IBM), XFS (SGI)
NTFS: well supported by Linux in read-only mode


Compressed block filesystems (1)


Cramfs
Simple, small, read-only compressed filesystem
designed for embedded systems.
Maximum filesystem size: 256 MB
Maximum file size: 16 MB
See Documentation/filesystems/cramfs.txt
in kernel sources.


Compressed block filesystems (2)


Squashfs: http://squashfs.sourceforge.net
A must-use replacement for Cramfs! Also read-only.
Maximum filesystem and file size: 2^32 bytes (4 GB)
Achieves better compression and much better performance.
Fully stable but released as a separate patch so far (waiting for
Linux 2.7 to start).
Successfully tested on i386, ppc, arm and sparc.
See benchmarks on
http://tree.celinuxforum.org/CelfPubWiki/SquashFsComparisons


ramdisk filesystems
Useful to store temporary data not kept after power off or reboot: system
log files, connection data, temporary files...
Traditional block filesystems: journaling not needed.
Many drawbacks: fixed in size. Remaining space not usable as RAM.
Files duplicated in RAM (in the block device and file cache)!
tmpfs (Config: File systems -> Pseudo filesystems)
Doesn't waste RAM: grows and shrinks to accommodate stored files
Saves RAM: no duplication; can swap out pages to disk when needed.
See Documentation/filesystems/tmpfs.txt in kernel sources.


Mixing read-only and read-write filesystems

Good idea to split your block storage into:
A compressed read-only partition (Squashfs)
Typically used for the root filesystem (binaries, kernel...).
Compression saves space. Read-only access protects your
system from mistakes and data corruption.
A read-write partition with a journaled filesystem (like ext3)
Used to store user or configuration data.
Guarantees filesystem integrity after power off or crashes.
A ramdisk for temporary files (tmpfs)
[Diagram: block storage holds Squashfs (read-only compressed root
filesystem) and ext3 (read-write user and configuration data); a
ramdisk holds tmpfs (read-write volatile data).]

The MTD subsystem


[Diagram: layers of the MTD subsystem, top to bottom]
Linux filesystem interface
MTD User modules: jffs2, yaffs2, char device, block device,
read-only block device, Flash Translation Layers for block device
emulation (FTL, NFTL, INFTL). Caution: patented algorithms!
MTD Chip drivers: CFI flash, NAND flash, DiskOnChip flash, RAM chips,
ROM chips, block device, virtual memory. Some of these are virtual
devices appearing as MTD devices.
Memory devices hardware

MTD filesystems - jffs2


jffs2: Journaling Flash File System v2
Designed to write flash sectors in a homogeneous way (wear leveling).
Flash bits can only be rewritten a relatively small number of times
(often < 100 000).
Compressed to fit as much data as possible on flash chips. Also
compensates for slower access time to those chips.
Power down reliable: can restart without any intervention
Shortcomings: low speed, big RAM consumption (4 MB for 128
MB of storage).


Mounting a jffs2 image


Useful to create or edit jffs2 images on your GNU / Linux PC!
Mounting an MTD device as a loop device is a somewhat complex task. Here's an
example for jffs2:
modprobe loop
modprobe mtdblock
losetup /dev/loop0 <file>.jffs2
modprobe blkmtd erasesz=256 device=/dev/loop0
(if not done yet)
mknod /dev/mtdblock0 b 31 0
mkdir /mnt/jffs2
(example mount point, if not done yet)
mount -t jffs2 /dev/mtdblock0 /mnt/jffs2/

It's very likely that your standard kernel lacks one of these modules. Check
the corresponding .c file in the kernel sources, and look in the corresponding
Makefile to find which option you need to recompile your kernel with.


MTD filesystems - yaffs2


yaffs2: Yet Another Flash Filing System, version 2
yaffs2 home: http://aleph1.co.uk/drupal/?q=node/35
Caution: site under reconstruction. Lots of broken links!
Features: NAND flash only. No compression. Several times
faster than jffs2 (mainly significant in boot time). Consumes
much less RAM. Also includes ECC and is power down reliable.
License: GPL or proprietary
Ships outside the Linux kernel. Get it from CVS:
http://aleph1.co.uk/cgi-bin/viewcvs.cgi/yaffs/


Filesystem choices for block flash devices


Typically for Compact Flash storage
Can't use jffs2 or yaffs2 on CF storage (block device). MTD Block
device emulation could be used, but jffs2 / yaffs2 writing schemes
could interfere with on-chip flash management (manufacturer dependent).
Never use block device journaled filesystems on unprotected flash chips!

noatime: doesn't write access time information in file inodes


sync: to perform writes immediately (reduce power down failure risks)


Filesystem choice summary

[Decision chart. Questions: "Storage type?" (Block or MTD),
"Contains flash?", "Read-only files?", "Volatile data?".
Outcomes: choose jffs2 or yaffs2; choose ext2 (read-only or
read-write, noatime + sync mount options); choose Squashfs
(read-only); choose ext3 or reiserfs; choose tmpfs.]

Embedded Linux driver development

The Network subsystem and Network device drivers


A view of the Linux networking subsystem


(Diagram: layers of the Linux networking subsystem, top to bottom)
Applications: App 1, App 2, App 3
Stack <> App interface: Socket Layer
Networking Stack: UDP, TCP, ICMP, IP, Bridge
Driver <> Stack interface
Driver
Driver <> Hardware interface

Network Device Driver Hardware Interface


(Diagram: Tx and Rx descriptor rings. The driver reaches them through memory access, the device through DMA. Tx descriptors are marked Free, Send, SentOK or SendErr; Rx descriptors are marked Free, RcvOK, RcvErr or RecvCRC.)

Driver allocates ring buffers.
Driver resets descriptors to their initial state.
Driver puts packets to be sent in Tx buffers.
Device puts received packets in Rx buffers.
Driver and device update descriptors to indicate state.
Usually, the device signals Rx and end of Tx with an interrupt, unless polling or NAPI is used.

Network Device Driver Stack Interface


A network device driver provides an interface to the network stack.
Unlike character devices, it does not have or use major/minor numbers.
A network driver is represented by a:
struct net_device

And is registered and unregistered via:

int register_netdev(struct net_device *dev);
void unregister_netdev(struct net_device *dev);

After filling in some important bits...



Socket Buffers
We need to manipulate packets through the stack
This manipulation involves efficiently:
Adding protocol headers/trailers down the stack.
Removing protocol headers/trailers up the stack.

Packets can be chained together.


Each protocol should have convenient access to header
fields.
To do all this the kernel provides the sk_buff structure.


sk_buff
An sk_buff represents a single packet.
This struct is passed through the protocol stack.
It holds pointers to a buffer with the packet data:
(Diagram: the head, data, tail and end fields point into the buffer, which is laid out as headroom, mac header, ip header, tcp header, telnet payload, tailroom. The mac, nh and h fields point to the link-layer, network and transport headers.)

sk_buff Manipulation
Manipulate sk_buff:
unsigned char *skb_put(struct sk_buff *skb, unsigned int len);
    tail += len
unsigned char *skb_push(struct sk_buff *skb, unsigned int len);
    data -= len
unsigned char *skb_pull(struct sk_buff *skb, unsigned int len);
    data += len


sk_buff Manipulation
Manipulate sk_buff:
int skb_headroom(const struct sk_buff *skb);
    data - head
int skb_tailroom(const struct sk_buff *skb);
    end - tail
void skb_reserve(struct sk_buff *skb, unsigned int len);
    data += len; tail += len (only valid on an empty buffer)


sk_buff Allocation
Low level allocation is done via:
struct sk_buff *alloc_skb(unsigned int size, int gfp_mask);

But it is better to use the wrapper:
struct sk_buff *dev_alloc_skb(unsigned int size);
which reserves some space for optimization.


sk_buff Allocation Example


Immediately after allocation, we should reserve the needed headroom:
struct sk_buff *skb;

skb = dev_alloc_skb(1500);
if (unlikely(!skb))
        break;

/* Mark as being used by this device */
skb->dev = dev;

/* Align the IP header on a 16-byte boundary (14-byte Ethernet header + 2) */
skb_reserve(skb, 2);

Softnet
Was introduced in kernel 2.4.x
Parallelizes packet handling on SMP machines
Packet transmit/receive is handled via two softirqs:
NET_TX_SOFTIRQ: feeds packets from the network stack to the driver.
NET_RX_SOFTIRQ: feeds packets from the driver to the network stack.

The transmit/receive queues are stored in the per-cpu softnet_data.


Packet Reception
The driver:
Allocates an skb.
Sets up a descriptor in the ring buffers for the hardware.

The driver Rx interrupt handler calls netif_rx(skb).

netif_rx(skb):
Deposits the sk_buff in the per-cpu input queue.
Marks the NET_RX_SOFTIRQ to run.

Later, net_rx_action() is called by NET_RX_SOFTIRQ,
which calls the driver poll() method to feed the packet up.
Normally poll() is set to process_backlog() by net_dev_init().

Packet Reception Overview


Packet Transmission
Each network device defines a method:
int (*hard_start_xmit) (struct sk_buff *skb, struct net_device *dev);

This method is indirectly called from the NET_TX_SOFTIRQ.
Calls to this method are serialized via dev->xmit_lock_owner.
The driver can manage the transmit queue:
void netif_start_queue(struct net_device *net);
void netif_stop_queue(struct net_device *net);
void netif_wake_queue(struct net_device *net);
int netif_queue_stopped(struct net_device *net);

Packet Reception Overview


Network Device Allocation


Each network device is represented by a struct net_device.
They are allocated using:
struct net_device *alloc_netdev(size, mask, setup_func);

size: size of our private data part
mask: a naming pattern (e.g. "eth%d")
setup_func: a function to prepare the rest of net_device.

And deallocated with:
void free_netdev(struct net_device *dev);


Network Device Allocation (cont.)


For Ethernet we have a short version:
struct net_device *alloc_etherdev(size);

which calls alloc_netdev(size, "eth%d", ether_setup);


Network Device Initialization


The net_device should be filled with numerous methods:
open: requests resources, registers interrupts, starts queues.
stop: deallocates resources, unregisters the IRQ, stops the queue.
get_stats: reports statistics.
set_multicast_list: configures the device for multicast.
do_ioctl: device-specific ioctl function.
change_mtu: controls the device MTU setting.
hard_start_xmit: called by the stack to initiate Tx.


Network Device Initialization (Cont.)


Also, dev->flags should be set according to device
capabilities:
IFF_MULTICAST: device supports multicast
IFF_NOARP: device does not support the ARP protocol


NAPI
New network API
Optional; provides interrupt mitigation under high load
Requirements:
A DMA ring buffer.
Ability to turn off receive interrupts.

It is used by defining a new method:
int (*poll) (struct net_device *dev, int *budget);

Called by the network stack periodically when signaled by
the driver to do so.

NAPI (cont.)
When a receive interrupt occurs, the driver:
Turns off receive interrupts.
Calls netif_rx_schedule(dev) to get the stack to start
calling its poll method.

Poll method:
Scans receive ring buffers, feeding packets to the stack via
netif_receive_skb(skb).
If work finished within the budget parameter, re-enables interrupts
and calls netif_rx_complete(dev).
Else, the stack will call the poll method again.

Embedded Linux driver development

Advice and resources


Getting help and contributions


Solving issues
If you face an issue, and it doesn't look specific to your work but
rather to the tools you are using, it is very likely that someone else
already faced it.
Search the Internet for similar error reports
On web sites or mailing list archives
(using a good search engine)
On newsgroups: http://groups.google.com/
You have great chances of finding a solution or workaround, or at
least an explanation for your issue.
Otherwise, reporting the issue is up to you!


Getting help
If you have a support contract, ask your vendor
Otherwise, don't hesitate to share your questions and issues
on mailing lists
Either contact the Linux mailing list for your architecture (like linux-arm-kernel or linuxsh-dev...)
Or contact the mailing list for the subsystem you're dealing with
(linux-usb-devel, linux-mtd...). Don't ask the maintainer directly!
Most mailing lists come with a FAQ page. Make sure you read it
before contacting the mailing list
Refrain from contacting the Linux Kernel mailing list, unless you're
an experienced developer and need advice


Embedded Linux driver development

Advice and resources


Bug report and patch submission


Reporting Linux bugs


First make sure you're using the latest version
Make sure you investigate the issue as much as you can:
see Documentation/BUG-HUNTING
Make sure the bug has not been reported yet. A bug tracking system
(http://bugzilla.kernel.org/) exists but very few kernel developers use it.
Best to use web search engines (accessing public mailing list archives)
If the subsystem you report a bug on has a mailing list, use it.
Otherwise, contact the official maintainer (see the MAINTAINERS file).
Always give as many useful details as possible.


How to submit patches or drivers


Don't merge patches addressing different issues
You should identify and contact the official maintainer for the
files to patch.
See Documentation/SubmittingPatches for details.
For trivial patches, you can copy the Trivial Patch Monkey.
Special subsystems:
ARM platform: it's best to submit your ARM patches to Russell
King's patch system:
http://www.arm.linux.org.uk/developer/patches/


How to become a kernel developer?


Greg Kroah-Hartman gathered useful references and advice for
people interested in contributing to kernel development:
Documentation/HOWTO (in kernel sources since 2.6.15-rc2)
Do not miss this very useful document!


Embedded Linux driver development

Advice and resources


References


Information sites (1)


Linux Weekly News
http://lwn.net/
The weekly digest of all Linux and free software
information sources
In depth technical discussions about the kernel
Subscribe to finance the editors ($5 / month)
Articles available for non subscribers
after 1 week.


Information sites (2)


KernelTrap
http://kerneltrap.org/
Forum website for kernel developers
News, articles, whitepapers, discussions, polls, interviews
Perfect if a digest is not enough!


Useful reading (1)


Linux Device Drivers, 3rd edition, Feb 2005
By Jonathan Corbet, Alessandro Rubini, Greg Kroah-Hartman, O'Reilly
http://www.oreilly.com/catalog/linuxdrive3/
Freely available on-line!
Great companion to the printed book for easy electronic searches!
http://lwn.net/Kernel/LDD3/ (1 PDF file per chapter)
http://free-electrons.com/community/kernel/ldd3/ (single PDF file)
A must-have book for Linux device driver writers!

Useful reading (2)


Linux Kernel Development, 2nd Edition, Jan 2005
Robert Love, Novell Press
http://rlove.org/kernel_book/
A concise and pleasant way to learn about kernel
subsystems (beyond the needs of device driver writers)
Understanding the Linux Kernel, 3rd edition, Nov 2005
Daniel P. Bovet, Marco Cesati, O'Reilly
http://oreilly.com/catalog/understandlk/
An extensive review of Linux kernel internals, covering Linux 2.6 at last.
Unfortunately, only covers the PC architecture.

Useful reading (3)


Building Embedded Linux Systems, April 2003
Karim Yaghmour, O'Reilly
http://www.oreilly.com/catalog/belinuxsys/
Not very fresh, but doesn't depend too much on kernel versions
See http://www.linuxdevices.com/articles/AT2969812114.html
for more embedded Linux books.

Useful on-line resources


Linux kernel mailing list FAQ
http://www.tux.org/lkml/
Complete Linux kernel FAQ
Read this before asking a question to the mailing list
Kernel Newbies
http://kernelnewbies.org/
Glossaries, articles, presentations, HOWTOs,
recommended reading, useful tools for people
getting familiar with Linux kernel or driver
development.


CE Linux Forum resources


CE Linux Forum's Wiki
is full of useful resources for embedded systems developers:
Kernel patches not available in mainstream yet
Many howto documents of all kinds
Details about ongoing projects, such as reducing kernel size,
boot time, or power consumption.
Contributions are welcome!
http://tree.celinuxforum.org/CelfPubWiki


ARM resources
ARM Linux project: http://www.arm.linux.org.uk/
Developer documentation:
http://www.arm.linux.org.uk/developer/
linux-arm-kernel mailing list:
http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm-kernel
FAQ: http://www.arm.linux.org.uk/armlinux/mlfaq.php
How to post kernel fixes:
http://www.arm.linux.org.uk/developer/patches/
ARMLinux @ Simtec: http://armlinux.simtec.co.uk/
A few useful resources: FAQ, documentation and Who's who!


International conferences (1)


Useful conferences featuring Linux kernel presentations
Ottawa Linux Symposium (July): http://linuxsymposium.org/
Right after the (private) kernel summit.
Lots of kernel topics. Many core kernel hackers still present.
Fosdem: http://fosdem.org (Brussels, February)
For developers. Kernel presentations from well-known kernel hackers.
CE Linux Forum: http://celinuxforum.org/
Organizes several international technical conferences, in particular in
California (San Jose) and in Japan. Now open to non CELF members!
Very interesting kernel topics for embedded systems developers.


International conferences (2)

Don't miss our free conference videos on
http://free-electrons.com/community/videos/conferences/


Embedded Linux driver development

Advice and resources


Last advice


Use the Source, Luke!


Many resources and tricks on the Internet find you will, but
solutions to all technical issues only in the Source lie.

Thanks to LucasArts

Embedded Linux driver development

Annexes
Quiz answers


Quiz answers
request_irq, free_irq
Q: Why does dev_id have to be unique for shared IRQs?
A: Otherwise, the kernel would have no way of knowing which handler to
release. Also needed for multiple devices (disks, serial ports...) managed
by the same driver, which rely on the same interrupt handler code.
Interrupt handling
Q: Why did the kernel segfault at module unload (forgetting to unregister
a handler in a shared interrupt line)?
A: Kernel memory is allocated at module load time, to host module code.
This memory is freed at module unload time. If you forget to unregister a
handler and an interrupt comes, the cpu will try to jump to the address of
the handler, which is in a freed memory area. Crash!


Embedded Linux driver development

Annexes
More Shell Tips & Tricks


ls command
Lists the files in the current directory, in alphanumeric order,
except files starting with the . character.
ls -a (all)
Lists all the files (including .* files)
ls -l (long)
Long listing (type, date, size, owner, permissions)
ls -t (time)
Lists the most recent files first
ls -S (size)
Lists the biggest files first
ls -r (reverse)
Reverses the sort order
ls -ltr (options can be combined)
Long listing, most recent files at the end
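The ls options above can be tried in a scratch directory. This is only a small sketch: the file names are invented for the example, and mktemp is assumed to be available on the system.

```shell
# Scratch directory with two visible files of different sizes and one hidden file
# (all names are hypothetical)
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'x' > small
printf 'xxxxxxxxxx' > big
: > .hidden

plain=$(ls)             # names starting with . are skipped
with_hidden=$(ls -a)    # also lists ., .. and .hidden
by_size=$(ls -S)        # biggest file first
by_size_rev=$(ls -rS)   # -r reverses the order: smallest first

echo "$by_size"
```

When the output does not go to a terminal, ls prints one name per line, which is why the captured values are multi-line strings.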

File name pattern substitutions


Better introduced by examples!
ls *txt
The shell first replaces *txt with all the file and directory names
ending with txt (including .txt), except those starting with .,
and then executes the ls command line.
ls -d .*
Lists all the files and directories starting with .
-d tells ls not to display the contents of directories.
cat ?.log
Displays all the files whose names consist of exactly 1 character
followed by .log
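A minimal sketch of these three patterns, run in a scratch directory. The file names are made up for illustration, and echo is used instead of ls or cat simply to show what the shell expands each pattern to.

```shell
# Throwaway directory; all file names here are hypothetical
demo=$(mktemp -d)
cd "$demo" || exit 1
touch notes.txt todo_txt .txt a.log b.log ab.log .hidden

# *txt: names ending with txt, except those starting with a dot
star_matches=$(echo *txt)

# .*: names starting with a dot (-d keeps ls from listing directory contents)
dot_matches=$(ls -d .*)

# ?.log: exactly one character followed by .log (ab.log does not match)
question_matches=$(echo ?.log)

echo "$star_matches"
echo "$question_matches"
```

Depending on the shell and its version, .* may or may not also expand to . and .., so the sketch does not rely on the exact contents of dot_matches.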


Special directories (1)


./
The current directory. Useful for commands taking a directory
argument. Also sometimes useful to run commands in the current
directory (see later).
So ./readme.txt and readme.txt are equivalent.
../
The parent (enclosing) directory. Always belongs to the . directory
(see ls -a). Only reference to the parent directory.
Typical usage:
cd ..


The cd and pwd commands


cd <dir>
Change the current directory to <dir>.
pwd
Displays the current directory ("working directory").


The cp command
cp <source_file> <target_file>
Copies the source file to the target.
cp file1 file2 file3 ... dir
Copies the files to the target directory (last argument).
cp -i (interactive)
Asks for user confirmation if the target file already exists
cp -r <source_dir> <target_dir> (recursive)
Copies the whole directory.


mv and rm commands
mv <old_name> <new_name> (move)
Renames the given file or directory.
mv -i (interactive)
If the new file already exists, asks for user confirmation.
rm file1 file2 file3 ... (remove)
Removes the given files.
rm -i (interactive)
Always asks for user confirmation.
rm -r dir1 dir2 dir3 (recursive)
Removes the given directories with all their contents.


Creating and removing directories


mkdir dir1 dir2 dir3 ... (make dir)
Creates directories with the given names.
rmdir dir1 dir2 dir3 ... (remove dir)
Removes the given directories
Safe: only works when directories are empty.
Alternative: rm -r (doesn't need empty directories).
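The difference between rmdir and rm -r can be checked with a short experiment. This is a sketch with invented directory names, assuming mktemp is available.

```shell
# Throwaway directory; "empty" and "full" are hypothetical names
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir empty full
touch full/file

rmdir empty                       # succeeds: the directory is empty

# rmdir refuses to remove a directory that still has contents
if rmdir full 2>/dev/null; then
    rmdir_full=removed
else
    rmdir_full=refused
fi

rm -r full                        # removes the directory and its contents
echo "$rmdir_full"
```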


Displaying file contents


Several ways of displaying the contents of files.
cat file1 file2 file3 ... (concatenate)
Concatenates and outputs the contents of the given files.
more file1 file2 file3 ...
After each page, asks the user to hit a key to continue.
Can also jump to the first occurrence of a keyword
(/ command).
less file1 file2 file3 ...
Does more than more with less.
Doesn't read the whole file before starting.
Supports backward movement in the file (? command).


The head and tail commands


head [-<n>] <file>
Displays the first <n> lines (or 10 by default) of the given file.
Doesn't have to open the whole file to do this!
tail [-<n>] <file>
Displays the last <n> lines (or 10 by default) of the given file.
No need to load the whole file in RAM! Very useful for huge files.
tail -f <file> (follow)
Displays the last 10 lines of the given file and continues to display new lines
when they are appended to the file.
Very useful to follow the changes in a log file, for example.
Examples
head windows_bugs.txt
tail -f outlook_vulnerabilities.txt
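A quick way to try head and tail is to generate a numbered sample file. The sketch below uses a made-up 100-line file and the -n <n> spelling, which is the POSIX form of the -<n> option shown above.

```shell
# Build a 100-line sample file: "line 1" ... "line 100"
log=$(mktemp)
i=1
while [ "$i" -le 100 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$log"

first_three=$(head -n 3 "$log")                    # first 3 lines
last_two=$(tail -n 2 "$log")                       # last 2 lines
default_count=$(head "$log" | wc -l | tr -d ' ')   # 10 lines by default

echo "$first_three"
echo "$last_two"
rm -f "$log"
```

tail -f is left out of the sketch because it keeps running until interrupted.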


The grep command


grep <pattern> <files>
Scans the given files and displays the lines which match the given pattern.
grep error *.log
Displays all the lines containing error in the *.log files
grep -i error *.log
Same, but case insensitive
grep -ri error .
Same, but recursively in all the files in . and its subdirectories
grep -v info *.log
Outputs all the lines in the files except those containing info.
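The grep variants above can be compared on a few sample files. This is an illustrative sketch: the file names and log lines are invented, and grep -r is assumed to be supported (it is in GNU and BSD grep).

```shell
# Throwaway directory with hypothetical log files
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir sub
printf 'info: started\nerror: disk full\nError: retrying\n' > app.log
printf 'info: ok\n' > other.log
printf 'ERROR in subdirectory\n' > sub/deep.txt

exact=$(grep error *.log)        # case-sensitive; file name is prefixed
nocase=$(grep -i error *.log)    # also matches "Error"
recursive=$(grep -ri error . | wc -l | tr -d ' ')   # also searches sub/
inverted=$(grep -v info app.log) # lines NOT containing info

echo "$exact"
echo "$inverted"
```

When several files are searched, grep prefixes each matching line with the file name, as in app.log:error: disk full.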


The sort command


sort <file>
Sorts the lines in the given file in character order and
outputs them.
sort -r <file>
Same, but in reverse order.
sort -ru <file>
u: unique. Same, but just outputs identical lines once.
More possibilities described later!
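The three sort invocations above can be compared on a tiny input. The word list is made up for the example.

```shell
# Hypothetical input with one duplicated line
words=$(mktemp)
printf 'pear\napple\npear\nfig\n' > "$words"

ascending=$(sort "$words")       # character order
descending=$(sort -r "$words")   # reverse order
unique_desc=$(sort -ru "$words") # reverse order, duplicates printed once

echo "$unique_desc"
rm -f "$words"
```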


Symbolic links
A symbolic link is a special file which is just a reference to the
name of another one (file or directory):
Useful to reduce disk usage and complexity when 2 files have the
same content.
Example:
anakin_skywalker_biography -> darth_vador_biography

How to identify symbolic links:


ls -l displays -> and the linked file name.
GNU ls displays links with a different color.


Creating symbolic links


To create a symbolic link (same order as in cp):
ln -s file_name link_name
To create a link to a file in another directory, with the
same name:
ln -s ../README.txt
To create multiple links at once in a given directory:
ln -s file1 file2 file3 ... dir
To remove a link:
rm link_name
Of course, this doesn't remove the linked file!
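The commands above can be exercised in a scratch directory. The sketch below uses invented file and directory names and assumes mktemp is available.

```shell
# Throwaway directory; README.txt, docs and readme_link are hypothetical names
demo=$(mktemp -d)
cd "$demo" || exit 1
echo "hello" > README.txt
mkdir docs

ln -s README.txt readme_link        # target first, link name second
(cd docs && ln -s ../README.txt)    # creates docs/README.txt, same name as the target

via_link=$(cat readme_link)         # reading through the link
via_docs=$(cat docs/README.txt)

rm readme_link                      # removes the link, not README.txt
ls -l docs
```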


Hard links
The default behavior for ln is to create hard links
A hard link to a file is a regular file with exactly the same
physical contents
While they still save space, hard links can't be distinguished
from the original files.
If you remove the original file, there is no impact on the hard
link contents.
The contents are removed only when no more file names (hard
links) refer to them.
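A short experiment shows this behavior. It is a sketch with made-up file names; ls -i is used to print inode numbers.

```shell
# Throwaway directory; "original" and "hardlink" are hypothetical names
demo=$(mktemp -d)
cd "$demo" || exit 1
echo "shared contents" > original
ln original hardlink                 # no -s: creates a hard link

# Both names refer to the same inode
inode_a=$(ls -i original | awk '{print $1}')
inode_b=$(ls -i hardlink | awk '{print $1}')

rm original                          # the data stays reachable via hardlink
survivor=$(cat hardlink)
echo "$survivor"
```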


Files names and inodes


Makes hard and symbolic (soft) links easier to understand!
(Diagram: users work with file names through the file name interface. A soft link refers to a file name; a file name and its hard links refer to the same inode, which the filesystem exposes through the inode interface. rm removes a name.)

File access rights


Use ls -l to check file access rights
3 types of access rights
Read access (r)
Write access (w)
Execute rights (x)

3 types of access levels


User (u): for the owner of the
file
Group (g): each file also has a
group attribute, corresponding
to a given list of users
Others (o): for all other users


Access right constraints


x without r is legal but is useless
You have to be able to read a file to execute it.
Both r and x permissions needed for directories:
x to enter, r to list its contents.
You can't rename, remove, copy files in a directory if you don't
have w access to this directory.
If you have w access to a directory, you CAN remove a file even if
you don't have write access to this file (remember that a directory
is just a file describing a list of files). This even lets you modify
(remove + recreate) a file even without w access to it.
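The last point can be verified with a small experiment. This is a sketch using an invented file name in a directory the user owns; rm -f is used so that rm does not ask for confirmation on the write-protected file.

```shell
# Directory writable by us, containing a file nobody may write to
demo=$(mktemp -d)
echo "v1" > "$demo/locked.txt"
chmod a-w "$demo/locked.txt"

# The directory is writable, so the entry can be removed and recreated
rm -f "$demo/locked.txt"
echo "v2" > "$demo/locked.txt"
new_contents=$(cat "$demo/locked.txt")
echo "$new_contents"
```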


Access rights examples


-rw-r--r--
Readable and writable for file owner, only readable for others
-rw-r-----
Readable and writable for file owner, only readable for users
belonging to the file group.
drwx------
Directory only accessible by its owner
-------r-x
File executable by others but neither by your friends nor by
yourself. Nice protections for a trap...


chmod: changing permissions


chmod <permissions> <files>
2 formats for permissions:
Octal format (abc):
a,b,c = r*4+w*2+x (r, w, x: booleans)
Example: chmod 644 <file>
(rw for u, r for g and o)
Or symbolic format. Easy to understand by examples:
chmod go+r: add read permissions to group and others.
chmod u-w: remove write permissions from user.
chmod a-x: (a: all) remove execute permission from all.
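
The octal formula and the symbolic format can be compared on a scratch file. This sketch assumes GNU stat for reading the mode back; the digit helper and notes.txt are made up for the illustration.

```shell
# One octal digit per access level: r*4 + w*2 + x, each flag being 0 or 1.
digit() { echo $(( $1 * 4 + $2 * 2 + $3 )); }

user=$(digit 1 1 0)      # rw- -> 6
group=$(digit 1 0 0)     # r-- -> 4
others=$(digit 1 0 0)    # r-- -> 4
octal="$user$group$others"

workdir=$(mktemp -d)
touch "$workdir/notes.txt"

chmod "$octal" "$workdir/notes.txt"            # octal format
mode_octal=$(stat -c '%a' "$workdir/notes.txt")

chmod u-w "$workdir/notes.txt"                 # symbolic format
mode_symbolic=$(stat -c '%a' "$workdir/notes.txt")

echo "octal=$octal, after chmod $octal: $mode_octal, after chmod u-w: $mode_symbolic"
```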


More chmod (1)


chmod -R a+rX linux/
Makes linux and everything in it available to everyone!
-R: apply changes recursively
X: x, but only for directories and files already executable
Very useful to open recursive access to directories,
without adding execution rights to all files.
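
The effect of X can be observed on a small private tree. A sketch assuming GNU chmod and stat; the directory and file names are made up.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/linux/docs"
echo "text" > "$workdir/linux/docs/README"
printf '#!/bin/sh\necho hi\n' > "$workdir/linux/build.sh"

# Start from a private tree: only the owner has any access.
chmod 700 "$workdir/linux" "$workdir/linux/docs" "$workdir/linux/build.sh"
chmod 600 "$workdir/linux/docs/README"

chmod -R a+rX "$workdir/linux"

dir_mode=$(stat -c '%a' "$workdir/linux/docs")          # directory: gets x
file_mode=$(stat -c '%a' "$workdir/linux/docs/README")  # plain file: no x added
script_mode=$(stat -c '%a' "$workdir/linux/build.sh")   # already executable: gets x

echo "docs=$dir_mode README=$file_mode build.sh=$script_mode"
```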


More chmod (2)


chmod a+t /tmp
t: (sticky). Special permission for directories, allowing
only the directory and file owner to delete a file in a
directory.
Useful for directories with write access to anyone,
like /tmp.
Displayed by ls -l with a t character.
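
A sketch of how the sticky bit shows up, using a scratch directory instead of the real /tmp. It assumes GNU chmod and stat; the directory name shared is made up.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/shared"
chmod 777 "$workdir/shared"    # write access for anyone, like /tmp
chmod a+t "$workdir/shared"    # add the sticky bit

octal_mode=$(stat -c '%a' "$workdir/shared")    # sticky shows as a leading 1
string_mode=$(stat -c '%A' "$workdir/shared")   # and as a final t

echo "$octal_mode $string_mode"
```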


Standard output
More about command output
All the commands outputting text on your terminal do it by
writing to their standard output.
Standard output can be written (redirected) to a file using the
> symbol
Standard output can be appended to an existing file using the
>> symbol


Standard output redirection examples


ls ~saddam/* > ~gwb/weapons_mass_destruction.txt
cat obiwan_kenobi.txt > starwars_biographies.txt
cat han_solo.txt >> starwars_biographies.txt
echo README: No such file or directory > README
Useful way of creating a file without a text editor.
Nice Unix joke too in this case.
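
The difference between > and >> is easy to check by counting lines. A minimal sketch with a made-up file name (log.txt):

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "first" > log.txt       # creates log.txt (or empties it if it exists)
echo "second" >> log.txt     # appends to the existing contents
lines_after_append=$(( $(wc -l < log.txt) ))

echo "third" > log.txt       # > starts again from an empty file
lines_after_overwrite=$(( $(wc -l < log.txt) ))

echo "after >>: $lines_after_append lines, after >: $lines_after_overwrite line"
```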


Standard input
More about command input
Lots of commands, when not given input arguments, can take their
input from standard input.
sort
windows
linux
[Ctrl][D]
linux
windows

sort takes its input from the standard input: in this case,
what you type in the terminal (ended by [Ctrl][D])

sort < participants.txt


The standard input of sort is taken from the given file.
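
Both ways of feeding sort can be compared in a script. In this sketch the terminal input is replaced by a here-document (a standard shell feature not covered in these slides), and participants.txt is created in a scratch directory.

```shell
workdir=$(mktemp -d)
printf 'windows\nlinux\n' > "$workdir/participants.txt"

# Standard input taken from a file.
from_file=$(sort < "$workdir/participants.txt")

# Standard input taken from the script itself, standing in for
# what would be typed in the terminal and ended by [Ctrl][D].
from_heredoc=$(sort <<EOF
windows
linux
EOF
)

first_line=$(printf '%s\n' "$from_file" | head -n 1)
echo "$from_file"
```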


Pipes
Unix pipes are very useful to redirect the standard output of a
command to the standard input of another one.
Examples
cat *.log | grep -i error | sort
grep -ri error . | grep -v ignored | sort -u \
> serious_errors.log
cat /home/*/homework.txt | grep mark | more

This is one of the most powerful features in Unix shells!
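
A runnable variant of the first example, on two small log files created for the occasion (a.log and b.log are made-up names and contents). LC_ALL=C is only there to make the sort order predictable.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'boot ok\nERROR: disk full\n' > a.log
printf 'error: timeout\nERROR: disk full\nshutdown ok\n' > b.log

# cat joins the files, grep keeps the lines mentioning "error"
# (any case), sort -u sorts them and drops duplicates.
result=$(cat *.log | grep -i error | LC_ALL=C sort -u)

count=$(( $(printf '%s\n' "$result" | wc -l) ))
first=$(printf '%s\n' "$result" | head -n 1)
echo "$result"
```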


The tee command


tee [-a] file
The tee command can be used to send standard output
to the screen and to a file simultaneously.
make | tee build.log
Runs the make command and stores its output to
build.log.
make install | tee -a build.log
Runs the make install command and appends its
output to build.log.
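
The same two steps can be tried without a real build. In this sketch, echo stands in for make and make install.

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "compiling" | tee build.log        # shown on screen and written to build.log
echo "installing" | tee -a build.log    # -a: append instead of overwriting

log_lines=$(( $(wc -l < build.log) ))
last_line=$(tail -n 1 build.log)
echo "build.log has $log_lines lines"
```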


Standard error
Error messages are usually output (if the program is well written)
to standard error instead of standard output.
Standard error can be redirected through 2> or 2>>
Example:
cat f1 f2 nofile > newfile 2> errfile
Note: 1 is the descriptor for standard output, so 1> is equivalent
to >.
Can redirect both standard output and standard error to the same
file using &> :
cat f1 f2 nofile &> wholefile
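
A sketch of the same commands on scratch files. Since &> is a bash extension, the portable form > file 2>&1 is used here for the combined case; || true is only there because cat reports a failure for the missing file.

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo "one" > f1
echo "two" > f2

# Standard output and standard error go to separate files.
cat f1 f2 nofile > newfile 2> errfile || true

# Both go to the same file: 2>&1 sends descriptor 2 where descriptor 1 points.
cat f1 f2 nofile > wholefile 2>&1 || true

out_lines=$(( $(wc -l < newfile) ))
err_lines=$(( $(wc -l < errfile) ))
whole_lines=$(( $(wc -l < wholefile) ))
echo "stdout: $out_lines lines, stderr: $err_lines line, both: $whole_lines lines"
```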


Special devices (1)


Device files with a special behavior or contents
/dev/null
The data sink! Discards all data written to this file.
Useful to get rid of unwanted output, typically log information:
mplayer black_adder_4th.avi &> /dev/null
/dev/zero
Reads from this file always return \0 characters
Useful to create a file filled with zeros:
dd if=/dev/zero of=disk.img bs=1k count=2048

See man null or man zero for details
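
A quick check of the dd example, run in a scratch directory: the image should be 2048 blocks of 1 KB and contain nothing but zero bytes.

```shell
workdir=$(mktemp -d)

# dd reports its statistics on standard error: discard them in /dev/null.
dd if=/dev/zero of="$workdir/disk.img" bs=1k count=2048 2> /dev/null

size=$(( $(wc -c < "$workdir/disk.img") ))

# Delete every \0 byte and count what is left.
nonzero=$(( $(tr -d '\0' < "$workdir/disk.img" | wc -c) ))

echo "size=$size bytes, non-zero bytes=$nonzero"
```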


Special devices (2)


/dev/random
Returns random bytes when read. Mainly used by cryptographic
programs. Uses interrupts from some device drivers as sources of
true randomness (entropy).
Reads can be blocked until enough entropy is gathered.
/dev/urandom
For programs for which pseudo random numbers are fine.
Always generates random bytes, even if not enough entropy is
available (in which case it is possible, though still difficult, to
predict future byte sequences from past ones).
See man random for details.
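
A small sketch reading 16 bytes from /dev/urandom and printing them in hexadecimal. /dev/urandom is used so that the read never waits for entropy.

```shell
# Read one 16-byte block, dump it as hex, then strip spaces and newlines.
sample=$(dd if=/dev/urandom bs=16 count=1 2> /dev/null | od -An -tx1 | tr -d ' \n')

# 16 bytes give 32 hexadecimal digits; the value differs on each run.
echo "$sample (${#sample} hex digits)"
```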


Embedded Linux driver development

Annexes
Init runlevels


System V init runlevels (1)


Introduced by System V Unix
Much more flexible than in BSD
Make it possible to start or stop different services for each runlevel
Correspond to the argument given to /sbin/init.
Runlevels defined in /etc/inittab.


/etc/inittab excerpt:
id:5:initdefault:
# System initialization.
si::sysinit:/etc/rc.d/rc.sysinit
l0:0:wait:/etc/rc.d/rc 0
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6

System V init runlevels (2)


Standard levels
init 0: Halt the system
init 1: Single user mode for maintenance
init 6: Reboot the system
init S: Single user mode for maintenance. Mounting only /. Often identical to 1

Customizable levels: 2, 3, 4, 5
init 3: Often multi-user mode, with only command-line login
init 5: Often multi-user mode, with graphical login


init scripts
According to /etc/inittab settings, init <n> runs:
First /etc/rc.d/rc.sysinit for all runlevels
Then scripts in /etc/rc<n>.d/
Starting services (1, 3, 5, S):
runs S* scripts with the start option
Killing services (0, 6):
runs K* scripts with the stop option
Scripts are run in file name lexical order
Just use ls -l to find out the order!
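
The ordering rule can be simulated without touching a real system. This is only a sketch, not the actual rc script: it creates empty files with typical link names in a scratch directory and prints the calls that would be made, assuming (as common rc implementations do) that K* scripts are handled before S* scripts.

```shell
rcdir=$(mktemp -d)
for name in S06cpuspeed K05atd S00single K01yum; do
    touch "$rcdir/$name"       # stand-ins for the links in /etc/rc<n>.d/
done

LC_ALL=C; export LC_ALL        # predictable lexical order for the globs below

order=""
for script in "$rcdir"/K*; do  # the shell expands globs in sorted order
    order="$order $(basename "$script"):stop"
done
for script in "$rcdir"/S*; do
    order="$order $(basename "$script"):start"
done
order=${order# }               # drop the leading space

echo "$order"
```

The two-digit number after K or S is what fixes the position of each script in the sequence.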


/etc/init.d
Repository for all available init scripts
/etc/rc<n>.d/ only contains links to the /etc/init.d/
scripts needed for runlevel n
/etc/rc1.d/ example (from Fedora Core 3)
K01yum -> ../init.d/yum
K02cups-config-daemon -> ../init.d/cups-config-daemon
K02haldaemon -> ../init.d/haldaemon
K02NetworkManager -> ../init.d/NetworkManager
K03messagebus -> ../init.d/messagebus
K03rhnsd -> ../init.d/rhnsd
K05anacron -> ../init.d/anacron
K05atd -> ../init.d/atd
S00single -> ../init.d/single
S01sysstat -> ../init.d/sysstat
S06cpuspeed -> ../init.d/cpuspeed

Handling init scripts by hand


Simply call the /etc/init.d scripts!
/etc/init.d/sshd start
Starting sshd:
/etc/init.d/nfs stop
Shutting down NFS mountd:
Shutting down NFS daemon:
Shutting down NFS quotas:
Shutting down NFS services:
/etc/init.d/pcmcia status
cardmgr (pid 3721) is running...
/etc/init.d/httpd restart
Stopping httpd:
Starting httpd:
(Each start or stop message is followed by an [ OK ] or [FAILED] status.)

Embedded Linux driver development

Annexes
Linux DMA API


DMA situations
Synchronous
A user process calls the read method of a driver. The driver
allocates a DMA buffer and asks the hardware to copy its data.
The process is put to sleep.
The hardware copies its data and raises an interrupt at the end.
The interrupt handler gets the data from the buffer and wakes
up the waiting process.

Asynchronous
The hardware sends an interrupt to announce new data.
The interrupt handler allocates a DMA buffer and tells the hardware
where to transfer data.
The hardware writes the data and raises a new interrupt.
The handler releases the new data, and wakes up the needed
processes.

Memory constraints
Need to use contiguous memory in physical space
Can use any memory allocated by kmalloc (up to 128 KB)
or __get_free_pages (up to 8MB)
Can use block I/O and networking buffers,
designed to support DMA.
Cannot use vmalloc memory
(would have to set up DMA on each individual page)


Reserving memory for DMA


To make sure you've got enough RAM for big DMA transfers...
Example assuming you have 32 MB of RAM, and need 2 MB for DMA:
Boot your kernel with mem=30M
The kernel will just use the first 30 MB of RAM.
Driver code can now reclaim the 2 MB left:
dmabuf = ioremap (
    0x1e00000,    /* Start: 30 MB */
    0x200000      /* Size: 2 MB */
);
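
The two constants follow directly from the memory sizes. A small sketch of the arithmetic, using the 32 MB and 2 MB figures of the example (the variable names are made up):

```shell
mb=$(( 1024 * 1024 ))      # bytes in 1 MB
total_mb=32                # RAM on the board
dma_mb=2                   # amount kept aside for DMA

kernel_mb=$(( total_mb - dma_mb ))                # value for the mem= argument
start=$(printf '0x%x' $(( kernel_mb * mb )))      # first byte not used by the kernel
size=$(printf '0x%x' $(( dma_mb * mb )))          # length of the reserved area

echo "mem=${kernel_mb}M  start=$start  size=$size"
```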


Memory synchronization issues


Memory caching could interfere with DMA
Before DMA to device:
Need to make sure that all writes to DMA buffer are committed.
After DMA from device:
Before drivers read from DMA buffer, need to make sure that memory
caches are flushed.
Bidirectional DMA
Need to flush caches before and after the DMA transfer.


Linux DMA API


The kernel DMA utilities can take care of:
Either allocating a buffer in a cache coherent area,
Or making sure caches are flushed when required,
Managing the DMA mappings and IOMMU (if any)
See Documentation/DMA-API.txt
for details about the Linux DMA generic API.
Most subsystems (such as PCI or USB) supply their own DMA API,
derived from the generic one. May be sufficient for most needs.


Limited DMA address range?


By default, the kernel assumes that your device can DMA to any
32 bit address. Not true for all devices!
To tell the kernel that your device can only handle 24 bit addresses:
if (dma_set_mask (dev,      /* device structure */
                  0xffffff) /* 24 bits */
    == 0)                   /* dma_set_mask returns 0 on success */
    use_dma = 1;            /* Able to use DMA */
else
    use_dma = 0;            /* Will have to do without DMA */
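
The mask value is simply "all ones on the low 24 bits". A sketch of the arithmetic in the shell, with a made-up helper (fits) that tells whether an address is reachable under such a mask; the two sample addresses are arbitrary.

```shell
bits=24
mask=$(printf '0x%x' $(( (1 << bits) - 1 )))

# An address fits if it has no bit set above the mask.
fits() { [ $(( $1 & ~((1 << bits) - 1) )) -eq 0 ]; }

if fits 0x00fffff0; then low=reachable; else low=unreachable; fi
if fits 0x01000000; then high=reachable; else high=unreachable; fi

echo "mask=$mask  0x00fffff0: $low  0x01000000: $high"
```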


Coherent or streaming DMA mappings


Coherent mappings
Can simultaneously be accessed by the CPU and device.
So, have to be in a cache coherent memory area.
Usually allocated for the whole time the module is loaded.
Can be expensive to set up and use.
Streaming mappings (recommended)
Set up for each transfer.
Keep DMA registers free on the hardware.
Some optimizations also available.


Allocating coherent mappings


The kernel takes care of both the buffer allocation and mapping:
#include <asm/dma-mapping.h>
void *                      /* Output: buffer address */
dma_alloc_coherent(
    struct device *dev,     /* device structure */
    size_t size,            /* Needed buffer size in bytes */
    dma_addr_t *handle,     /* Output: DMA bus address */
    gfp_t gfp               /* Standard GFP flags */
);

void dma_free_coherent(struct device *dev,
    size_t size, void *cpu_addr, dma_addr_t handle);


DMA pools (1)


dma_alloc_coherent usually allocates buffers with
__get_free_pages (minimum: 1 page).
You can use DMA pools to allocate smaller coherent mappings:
#include <linux/dmapool.h>
Create a dma pool:
struct dma_pool *
dma_pool_create (
    const char *name,       /* Name string */
    struct device *dev,     /* device structure */
    size_t size,            /* Size of pool buffers */
    size_t align,           /* Hardware alignment (bytes) */
    size_t allocation       /* Address boundaries not to be crossed */
);

DMA pools (2)


Allocate from pool
void * dma_pool_alloc (
struct dma_pool *pool,
gfp_t mem_flags,
dma_addr_t *handle
);
Free buffer from pool
void dma_pool_free (
struct dma_pool *pool,
void *vaddr,
dma_addr_t dma);
Destroy the pool (free all buffers first!)
void dma_pool_destroy (struct dma_pool *pool);


Setting up streaming mappings


Works on buffers already allocated by the driver
#include <linux/dma-mapping.h>
dma_addr_t dma_map_single(
    struct device *,            /* device structure */
    void *,                     /* input: buffer to use */
    size_t,                     /* buffer size */
    enum dma_data_direction     /* Either DMA_BIDIRECTIONAL,
                                   DMA_TO_DEVICE or DMA_FROM_DEVICE */
);
void dma_unmap_single(struct device *dev, dma_addr_t handle,
    size_t size, enum dma_data_direction dir);


DMA streaming mapping notes


When the mapping is active: only the device should access the buffer
(potential cache issues otherwise).
The CPU can access the buffer only after unmapping!
Another reason: if required, this API can create an intermediate bounce
buffer (used if the given buffer is not usable for DMA).
Possible for the CPU to access the buffer without unmapping it, using
the dma_sync_single_for_cpu() (ownership to cpu) and
dma_sync_single_for_device() functions (ownership back to
device).
The Linux API also supports scatter / gather DMA streaming mappings.


Lab Information
Login:
root:secret and gby:qwerty
Running the Eclipse IDE:
/opt/course/scripts/eclipse &
Running the QEMU Emulator:
/opt/course/scripts/run-qemu &
The emulated board files are at:
/opt/course/emulation/root
The kernel module template is at:
/opt/course/skeleton/

Copyrights
Copyright 2006-2004, Michael Opdenacker
Copyright 2003-2006, Oron Peled
Copyright 2004-2006 Codefidence Ltd.
Portions of this text are reprinted and reproduced in electronic form from IEEE Std 1003.1,
2003 Edition, Standard for Information Technology -- Portable Operating System Interface
(POSIX), The Open Group Base Specifications Issue 6, Copyright (C) 2001-2003 by the
Institute of Electrical and Electronics Engineers, Inc and The Open Group. In the event of
any discrepancy between this version and the original IEEE and The Open Group Standard,
the original IEEE and The Open Group Standard is the referee document. The original
Standard can be obtained online at http://www.opengroup.org/unix/online.html .
Portions Xavier Leroy <Xavier.Leroy@inria.fr>
Tux Image Copyright: (C) 1996 Larry Ewing
Linux is a registered trademark of Linus Torvalds.
All other trademarks are property of their respective owners.
Used and distributed under a Creative Commons Attribution-ShareAlike 2.0 license

