
A VOYAGE TO THE KERNEL
The Day Before the Voyage

LINUX For You | June 2008 | page 68 | www.openITis.com

This voyage, which may seem a bit tedious, is really full of fun and excitement. In fact, a journey through the waters deep in the interior of the Amazon may seem less adventurous. Unlike in those voyages, here the controls are in our hands. We need not worry about the whims of nature. We know the directions and the targets.

As the guide for this voyage, I should begin with an introduction. As many of the people on board are first-time travellers to this area, I shall try to keep the initial explanations qualitative in nature.

Before we start our voyage, let us look at the prerequisites. I expect readers to have a copy of the kernel source (version 2.2 is enough for our use, but you can get a copy of the latest stable release and compare). You can download the kernel from the official site, kernel.org. If you are a novice, I strongly recommend that you do not download the development release. (Of course, I take it for granted that you've set sail on a Linux boat.)

I have tested the code on my Intel x86 machine with 'VINU - THE FREE OS' as the platform [an operating system the author has developed, based on Gentoo]. You will get the same results if you try the code in the terminal of Gentoo or any of its derivatives, such as Sabayon.

By looking at the version number of the kernel, one can tell whether it is a stable release or not. The idea is very simple: under the numbering scheme used by the 2.x kernels up to the 2.6 series, if the second number is even (as in 2.2 or 2.4), the release is stable; if it is odd (as in 2.3 or 2.5), it is a development release. Quite a simple trick, right? Once you learn the basics, shell and kernel coding will be even simpler for you!

It should be emphasised that most of the significant data structures and many of the algorithms remain unaltered across Linux kernel versions. New versions are released to fix the bugs reported by users.

You might have read that Linux shares similarities with other UNIX-like operating systems, such as System V Release 4 (SVR4), developed by AT&T; the 4.4 BSD release from the University of California at Berkeley (4.4BSD); Digital Unix from Digital Equipment Corporation (now Compaq); AIX from IBM; HP-UX from Hewlett-Packard; and Solaris from Sun Microsystems.

Linus Torvalds developed Linux in 1991 as an operating system for IBM-compatible PCs based on the Intel 80386 microprocessor. Since then, developers have worked to make Linux available on other architectures, including Alpha, SPARC, PowerPC, Motorola MC680x0 and IBM System/390.

The following list gives an outline of the various architectures:
• arm: Acorn personal computers
• alpha: Compaq Alpha workstations
• i386: IBM-compatible personal computers based on Intel 80x86 or Intel 80x86-compatible microprocessors
• m68k: Personal computers based on Motorola MC680x0 microprocessors
• mips: Workstations based on Silicon Graphics MIPS microprocessors
• ppc: Workstations based on Motorola-IBM PowerPC microprocessors
• sparc: Workstations based on Sun Microsystems SPARC microprocessors
• sparc64: Workstations based on Sun Microsystems 64-bit Ultra SPARC microprocessors
• s390: IBM System/390 mainframes

A lion's share of the kernel source code is processor-independent and is written in C. But a small (and critical!) part is coded in assembly language. In the course of our voyage, we will be dealing with assembly language programming in Linux, as it is required to study the kernel thoroughly.

The kernel source tree used as the standard reference for this voyage has about 4,500 C and assembly files, stored in about 270 sub-directories (roughly 2 million lines of code!). We are concerned only with the parts that are essential to comprehend the working of the kernel. You can also browse the folder /usr/src/linux and familiarise yourself with the folders there.

Before we move on to kernel programming and related studies, it is essential to know some tools related to shell programming. I don't expect any prior knowledge of the field, but the reader should be familiar with the basic commands and how to execute them.

A basic knowledge of terms like commands and arguments, and of the syntax for writing them, is also expected. For instance, who am i is a special form of the who command. It takes arguments along with the command, and the general syntax for commands of this type is:

$ command argument1 argument2 argument3 ... argumentN

Box 1: INTERPRETERS

The line #!/bin/bash at the top of a script tells the system to execute the script using the Bourne-again shell. Typing the same text without the leading # at an interactive bash prompt does something different: bash treats the leading ! as a history expansion and looks for an earlier command that starts with /bin/bash. If there is none, you may get a response similar to what follows:

localhost ~ # !/bin/bash
bash: !/bin/bash: event not found

The following is a list of common shebang lines:
• #!/bin/bash: Execute using bash in the /bin/ directory
• #!/bin/csh: Execute using csh, the C shell
• #!/bin/ksh: Execute using the Korn shell
• #!/bin/awk -f: Execute using the awk program in the /bin/ directory
• #!/bin/sh: Execute using the Bourne shell (on some systems, such as Solaris) or a compatible shell that /bin/sh points to (on most Linux systems)
• #!/bin/zsh: Execute using zsh, the Z shell
• #!/usr/bin/perl: Execute using Perl
• #!/usr/bin/python: Execute using Python
• #!/usr/bin/env: Used to invoke any other program through the env program in /usr/bin
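As a rough illustration of what a shebang line does, the sketch below writes a two-line script into a temporary directory, marks it executable and runs it by path, so that the system picks /bin/sh from the first line of the file. The function name run_shebang_demo and the file name hello.sh are made up for this example, and it assumes that /bin/sh and mktemp are available, as they are on typical Linux systems.

```shell
#!/bin/sh
# Sketch only: run_shebang_demo and hello.sh are hypothetical names.
run_shebang_demo() {
    dir=$(mktemp -d) || return 1
    script="$dir/hello.sh"

    # Line 1 of the generated file is the shebang; line 2 is the payload.
    printf '%s\n' '#!/bin/sh' 'echo "Hello from a shebang script"' > "$script"

    chmod 755 "$script"    # without execute permission the next line fails
    "$script"              # run the file by path; /bin/sh interprets it
    head -n 1 "$script"    # show the shebang line itself

    rm -rf "$dir"
}

run_shebang_demo
```

Changing the first generated line to another interpreter from Box 1 (and the payload to something that interpreter understands) is an easy way to try the other entries, provided the interpreter is installed at that path.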
Also, to execute the date and who am i commands on a single line, we combine them into a compound command using the ; command separator, as follows:

$ date ; who am i ;

The syntax for a compound command is:

$ command1 ; command2 ; command3 ; ... ; commandN ;

You might have tried simple commands (like ls or cd) and powerful commands (like awk, sed, etc.). Suppose we give the command for the date:

localhost ~ # date

[Yes, here I have logged in as root; note the hash sign!] You will get a response similar to the one shown below:

localhost ~ # date
Thu Apr 17 05:15:19 IST 2008

Now the question is: how does the computer perform the task when you give this command? And here starts the story of the shell...

Well, let us follow a different approach. Normally, textbooks on shell programming first provide a historical description, then speak about the types of shells, and finally move towards the programming part. But I feel it will be better if you start doing small experiments before moving to those areas. On your terminal, execute the following command:

localhost ~ # !/bin/sh

And you will get a response like what's shown below:

localhost ~ # !/bin/sh
/bin/sh
sh-3.2#

(Strictly speaking, the # here is the root prompt and what you typed is !/bin/sh. Bash expands this from its history to the most recent command that started with /bin/sh, prints it and runs it, which is why a new sh prompt appears. If no such command is in your history, bash reports 'event not found' instead.)

You might be a little perplexed now, after seeing the '#!' symbol, and wonder why we require it. At the top of a script file, it tells the system that the file is a script and that it should be executed using the interpreter found at the path that follows. The pair is known as the shebang (also termed hashbang, hashpling, or pound bang) when it is used as the first two characters of a script. A shebang line consists of a hash sign followed by an exclamation mark, followed by the complete path to an interpreter program.

However, it should be noted that the interpreter itself generally ignores the shebang line, because in most scripting languages a line starting with # is a comment. You will comprehend this once we start our voyage.

Let us continue with our mini-experiments. Using Box 1 as a reference, you can try some interpreters.

You might have noticed the references to the C shell, the Korn shell, etc., in Box 1. They are different types of shells, and we will visit these areas first in our voyage. Wait till the next day and we will start!

By: Aasis Vinayak PG. The author is a hacker and a free software activist who does programming in the open source domain. He is the developer and CEO of the Mozhi Search engine. His research works/publications are available at www.aasisvinayak.com



A VOYAGE TO THE KERNEL
Day One, Part 2

We have seen that the shell acts as an interface to UNIX (it is the UNIX system's command interpreter). It is essential to make it clear that the shell is not just an interpreter but a door opening onto a very powerful programming language with conditional statements, loops and functions.

Broadly, we can categorise shells into the Bourne-type shells (sh, ksh and bash) and the C-type shells (csh and tcsh). Normally, one can guess the type of shell by looking at the prompt symbol. If you find the $ character by default, you are in a Bourne-type shell; if you have the % character by default, you are using a C-type shell.

As indicated earlier, there are different kinds of C-type and Bourne-type shells. The C-types are the C shell (csh) and the TENEX/TOPS C shell (tcsh). The second group includes the Bourne shell (sh), the Korn shell (ksh), the Bourne Again shell (bash) and the POSIX shell (sh).

The pages of history will tell you that the Bourne shell was written in the 1970s by Stephen R. Bourne (who was working at AT&T Bell Labs). If you have been exposed to ALGOL, you will find that the syntax used in the Bourne shell is quite similar to it.

Towards the end of the 1970s, Bill Joy (of the University of California at Berkeley) came out with the C shell. The prime motivation for developing it was that writing programs would be easier in a shell with C-like syntax than in one where you need to follow the ALGOL-style syntax. Moreover, the C language was quite familiar to the programmers working on UNIX at Berkeley. Features like command history (for recalling commands that were already executed), file name completion and aliases (mnemonic names for commands) made it quite popular.

We have already seen some simple commands. Today we will look at more useful ones. Let us begin with some interesting stuff. If you want to know when you booted the system, you can use who -b.

vinayak@gnubox:~$ who -b
system boot 2008-05-14 21:46

At times, you may need to know your current directory, especially when you want to work with your files. You can get the information by using the pwd command.

vinayak@gnubox:~$ pwd
/home/vinayak

While doing programming or compiling related work, we may want to check the virtual memory status. For that you can use vmstat (refer to Listing 1).

You might have used clrscr() many times when writing programs in C. Here you have a simpler command: just type clear, and that will do the work. Another useful one is the finger command, which tells you about the logged-in users, including each user's .plan and .project.

Following are some of the basic commands that we can use:
• pwd: Print the current directory
• cd: Change directory
• find: Locate files and directories
• ls: List files in a directory
• file: Print the type of a file

Listing 1:
vinayak@gnubox:~$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 216460  15148 344048    0    0   388    32 1002  532 12  4 77  7

80 juLY 2008 | LINuX For You | www.openITis.com


• cat: Display the contents of a file
• cp: Copy a file
• chmod: Change file modes (permissions)
• chown: Change the owner of a file
• chgrp: Change the group of a file
• wc: Count the number of words, lines and characters in a file
• rm: Remove a file
• mv: Move or rename a file
• mkdir: Create a directory
• rmdir: Remove a directory

Novice users may not know that they do not need the GUI to change their user password in UNIX. You can do it with the passwd command.

vinayak@gnubox:~$ passwd
Changing password for vinayak.
(current) UNIX password:
Enter new UNIX password:
Retype new UNIX password:

The following commands will give you more power to work with:
• write: Display a message on a user's screen
• wall: Display a message on all logged-in users' screens
• rwall: Display a message to all users on a remote host
• df: Filesystem statistics
• ps: Information on currently running processes
• rsh or remsh: Execute a command, or log in, on a remote host
• netstat: Show network status
• iostat: Show input/output status
• uname: Display the name of the current operating system
• sar: Show the system activity report
• basename: Base filename of a string parameter
• man: Display the reference manual
• su: Switch to another user (by default the super-user)
• cut: Write out selected characters

You may need to enable some utilities by downloading the packages for better results.

Some commands that are useful when you write programs:
• awk: Programming language to parse characters
• sed: Programming language for character substitution
• vi: Start the vi editor
• emacs: Start the emacs editor

Also, you can get filesystem statistics by executing df:

vinayak@gnubox:~$ df
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda5       19236308  6260592  11998564  35% /
varrun            481312       92    481220   1% /var/run
varlock           481312        0    481312   0% /var/lock
udev              481312      112    481200   1% /dev
devshm            481312        0    481312   0% /dev/shm
/dev/sda1         101086    92967      2900  97% /boot
/dev/sda6       28842748 22979020   4398604  84% /home
/dev/sda4       83625016 42685324  36691708  54% /media/sda4

Some fundamental commands that we have already stumbled upon:
• cal: Display a calendar
• date: Display the system date and time
• who: Display information about users on the system
• w: Extended who command
• whoami: Display the current user name (normally the same as the $LOGNAME or $USER environment parameter)
• who am I: Display login name, terminal and login details

The difference between whoami and who am I is illustrated below:

vinayak@gnubox:~$ whoami
vinayak
vinayak@gnubox:~$ who am I
vinayak pts/1 2008-05-16 18:02 (:0.0)

And more commands/tools...
• grep: Pattern matching
• egrep: grep command for extended regular expressions
• >>: Append to the end of a file
• >: Redirect; create or overwrite a file
• |: Pipe, used to string commands together
• ||: Logical OR; command1 || command2 executes command2 if command1 fails
• &: Execute in the background
• echo: Write strings to standard output
• sleep: Halt execution for the specified number of seconds
• head: View the top of a file
• tail: View the end of a file
• diff: Compare two files
• sdiff: Compare two files side by side (requires a 132-character display)
• spell: Spell checker
• lp, lpr, enq, qprt: Print a file
• lpstat: Status of system print queues
• enable: Enable, or start, a print queue
• disable: Disable, or stop, a print queue

The execution of these commands will make you feel more comfortable with the shell. But to comprehend shell programming, you need to understand the shell in a systematic way. We will be doing this over the next couple of articles, so that you can use it as a very powerful programming language with conditional statements, loops and functions!

By: Aasis Vinayak PG. The author is a hacker and a free software activist who does programming in the open source domain. He is the developer and CEO of the Mozhi Search engine. His research work/publications are available at www.aasisvinayak.com
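To see a few of the operators from the commands/tools list of this part in action (>, >>, |, || and the ; separator), here is a small sketch that works on a scratch file. The file name notes.txt and the function name demo_operators are invented for this illustration; nothing here comes from the original listings.

```shell
#!/bin/sh
# Sketch only: demo_operators and notes.txt are hypothetical names.
demo_operators() {
    dir=$(mktemp -d) || return 1
    file="$dir/notes.txt"

    echo "first line" > "$file"      # > creates or overwrites the file
    echo "second line" >> "$file"    # >> appends to the end of the file

    # | feeds the output of cat into wc, which counts the lines
    lines=$(cat "$file" | wc -l)
    echo "lines: $((lines))"

    # || runs the second command only if the first one fails
    grep "third" "$file" > /dev/null || echo "no third line yet"

    # ; simply runs the commands one after another
    head -n 1 "$file" ; tail -n 1 "$file"

    rm -rf "$dir"
}

demo_operators
```

Running it should print the line count, the fallback message from the || branch, and then the first and last lines of the file.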



A VOYAGE TO THE KERNEL
Day Two, Part 3

Now it's time to learn about the shell in a systematic way. Let us begin today's voyage by writing our first shell script. If you have not been exposed to the commands of the vi editor (my favourite editor after Emacs), you can use the following listing as a reference:
• To insert new text: Esc + i
• To save a file: Esc + : + w
• To save a file with the file name (save as): Esc + : + w filename
• To quit the vi editor: Esc + : + q
• To quit without saving: Esc + : + q!
• To save and quit the vi editor: Esc + : + wq
• To search for specified words in the forward direction: Esc + /word
• To continue with the search: n
• To search for a given word in the backward direction: Esc + ?word
• To copy the line where the cursor is placed: Esc + yy
• To paste the text just deleted or copied at the cursor: Esc + p
• To delete the entire line where the cursor is located: Esc + dd
• To delete the word from the cursor's position: Esc + dw

For beginners, I have included screenshots of the process. Open your terminal, type the code as shown in Figure 1, and save the file using the Esc + : + w filename command (Figure 2). Now try running the script from the terminal and you will see that you get an error:

aasisvinayak@free-laptop:~$ ./first
bash: ./first: Permission denied

This is because you have not set the execute permission on the file. You can use chmod 755 or your GUI tool to do this. Now try running it:

aasisvinayak@free-laptop:~$ chmod 755 first
aasisvinayak@free-laptop:~$ ./first
I love Linux

It works! So you have written your first program in the shell. Please note that it is always better to save shell scripts with the extension .sh, so that they can easily be identified.

Now let us try experimenting with user defined variables (UDV). Here (somewhat like #define in C), you define your variables as shown below:

#
# Script to test variables
# Script written for "A Voyage to the Kernel"
myname=Vinayak
myos="ubuntu 8.04"
echo "My name is $myname"
echo "My os is $myos. It is really good."

Note that there must be no spaces around the = sign, and that a value containing a space has to be quoted; otherwise the shell would treat 8.04 as a command to run.

You now have an idea regarding the initial steps. Hence, for the sake of saving space, I will not include the commands that are specific to the vi editor. You may use them wherever required.

Well, let's try executing the code that we wrote just now:

aasisvinayak@free-laptop:~/Desktop$ /home/aasisvinayak/Desktop/variable.sh
My name is Vinayak
My os is ubuntu 8.04. It is really good.

Yup! The variables are set. Now let me show you how to get input from the keyboard and how the program interacts with the user.

#Script to read one's name from key-board
#Script written for "A Voyage to the Kernel"
echo "Please type your name"
read name
echo "Hello $name, Let's be friends!"

102 AUgUsT 2008 | LINUX For YoU | www.openITis.com


Once executed, the program takes the value you entered and prints the statement.

aasisvinayak@free-laptop:~/Desktop$ /home/aasisvinayak/Desktop/inputname.sh
Please type your name
Aasis
Hello Aasis, Let's be friends!

Figure 1: The vi editor with some text
Figure 2: The vi editor immediately after the file is saved with the filename first

Now let's understand wild card characters. The following is the wild card expansion listing:
• ls * : This command will show all files (except hidden ones)
• ls a* : This will show all files with names starting with the letter 'a'
• ls *.c : This is used to list files whose names end with '.c'
• ls ib*.c : This will list all files that begin with 'ib' and end with '.c'
• ls ? : If you want to list files that have a single character as the file name, you can use this
• ls ab? : This will display files whose names begin with 'ab' and are 3 characters long

Here is an application of wild-card expansion (and it is self-explanatory):

aasisvinayak@free-laptop:~/Desktop$ ls /bin/[x-z]*
/bin/zcat /bin/zdiff /bin/zfgrep /bin/zgrep /bin/zmore
/bin/zcmp /bin/zegrep /bin/zforce /bin/zless /bin/znew
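The patterns in the listing above can be tried safely in a scratch directory. The sketch below creates a handful of empty files and lets the shell expand each pattern; echo is used instead of ls simply to show that the expansion is done by the shell before the command runs. The function name demo_wildcards and all the file names are made up for this example.

```shell
#!/bin/sh
# Sketch only: demo_wildcards and all the file names are hypothetical.
demo_wildcards() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        touch a z abc abd ib1.c ib2.c main.c notes.txt

        # The shell expands each pattern before echo ever sees it.
        echo "a*     -> $(echo a*)"
        echo "*.c    -> $(echo *.c)"
        echo "ib*.c  -> $(echo ib*.c)"
        echo "?      -> $(echo ?)"
        echo "ab?    -> $(echo ab?)"
        echo "[x-z]* -> $(echo [x-z]*)"
    )
    rm -rf "$dir"
}

demo_wildcards
```

With these files, a* should expand to a, abc and abd, while ? matches only the single-character names a and z.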

Now I will introduce you to another script that can manipulate the data a user has provided:

#
# Script using echo and read command for user interaction
# Script written for "A Voyage to the Kernel"
#
echo "Your good name please :"
read name
echo "Your age please :"
read age
newage=`expr $age + 5`
echo "Hello $name, in 5 years you will be $newage years old."

And if you have written the code well, you will get a result similar to the one shown below:

aasisvinayak@free-laptop:~$ /home/aasisvinayak/Desktop/usernameage.sh
Your good name please :
XYZ
Your age please :
100
Hello XYZ, in 5 years you will be 105 years old.

Let's move on to a slightly lengthy, yet simple, program:

#
# Script illustrating the way to create simple menus and take action according to user's input
#
while :
do
    clear
    echo "***************************************"
    echo "                 Menu                  "
    echo "***************************************"
    echo "[1] To show today's date/time"
    echo "[2] To show files in current directory"
    echo "[3] To show calendar"
    echo "[4] To start Vi editor"
    echo "[5] Exit"
    echo "======================================="
    echo -n "Enter your choice [1-5]: "
    read yourchoice
    case $yourchoice in
        1) echo "Today is `date`, Press any key . . ." ; read ;;
        2) echo "Files in `pwd`" ; ls -l ; echo "Press any key . . ." ; read ;;
        3) cal ; echo "Press any key . . ." ; read ;;
        4) vi ;;
        5) exit 0 ;;
        *) echo "Oops!!! Please select choice 1, 2, 3, 4, or 5";
           echo "Press any key . . ." ; read ;;
    esac
done

Figure 3: The output when you execute the lengthy program

Figure 3 shows what the program looks like when you execute it.

Tired of trying the terminal stuff? Let's have some fun (of course, at the terminal itself). Try the following code:

dialog --title "Suggestion" --backtitle "Voyage to the Kernel" \
    --infobox "This dialog box is just to show another utility

Please send your suggestion to improve the column at

aasisvinayak@gmail.com
www.aasisvinayak.com" 15 75

Well, before executing this code, make sure that you have the dialog utility installed in your OS. If not, install it and then run the script. The output is shown in Figure 4.

Figure 4: A simple program using the dialog utility

Wow! Surprised that the shell can even do such catchy stuff? Well, let's look for a way to receive an input from the user with dialog:

dialog --title "Linux for You" --backtitle "Shell script written for A Voyage to the Kernel" \
    --msgbox "Thanks for trying this script. Click Ok or press space bar/enter key" 8 30

Figure 5 shows that your program can respond to the input provided by the user.

Figure 5: An interactive dialog-based program

If you wish to create a program that's a little more advanced, where the script allows the user to enter some text, try the following one. dialog writes the entered text to standard error, so the script redirects it to a temporary file and then checks the exit status to see which button was pressed:

dialog --title "For queries - post@aasisvinayak.com" --backtitle "Script written for A Voyage to the kernel" \
    --inputbox "Please enter some text" 10 75 2> /tmp/input.$$
sel=$?
name=`cat /tmp/input.$$`
rm -f /tmp/input.$$
case $sel in
    0) echo "$name" ;;
    1) echo "Cancel was pressed" ;;
    255) echo "[ESCAPE] key pressed" ;;
esac

You will see an input box, where the user can enter the text (Figure 6).

Figure 6: Another dialog-based program that lets you enter comments

Now, are you in a mood to learn some theory? Just hold on till the next issue.

By: Aasis Vinayak PG. The author is a hacker and a free software activist who does programming in the open source domain. He is the developer and CEO of the Mozhi Search engine. His research work/publications are available at www.aasisvinayak.com



A VOYAGE TO THE KERNEL
Day 3, Part 4

Welcome back to the Voyage. We have already experimented a lot with shell programming. Now let us take a glance at the theory behind the programming. In the last issue, we dabbled in decision-making in the shell. If you observed it closely, you must have noticed that you need to follow a specific syntax:

if condition_command ; then
    command no1
    command no2
    ...
    last_command
fi

If you wish to test multiple conditions, you need to follow a different style:

if condition_command ; then
    command no1
    command no2
    ...
    last_command
elif condition_command2 ; then
    command no1
    command no2
    ...
    last_command
else
    command no1
    command no2
    ...
    last_command
fi

We know that an expression is nothing but a combination of values, relational operators (say > or <) and mathematical operators (say +, -, or /). We can easily incorporate these elements whenever we need to build a sensible condition.

Now consider the following code:

if cat $1
then
    echo -e "\nFile $1 was found and successfully echoed"
fi

This might be only a portion of a script, but it does have a specific function: it tries to print the file named by the first argument and, if cat succeeds, reports that the file was found.

We can also use the test command, or [ expr ], to see if an expression is true (in which case it returns 0) or false (in which case it returns a non-zero value). We have also seen the 'if-else-fi' style (i.e., if the condition is true, the first block of commands is executed; otherwise the second). Now, we can go for a 'nested if-else-fi' style, where you write an entire if-else construct within the body of another. Let me illustrate this with an example:

favcompany=0
echo -n "Select your favorite company [1 or 2]? "
echo "1. IBM"
echo "2. Wipro"

read favcompany
if [ $favcompany -eq 1 ] ; then
    echo "IBM ! Good selection"
else
    if [ $favcompany -eq 2 ] ; then
        echo "Wipro - Be proud of an Indian!"
    else
        echo "Hmm. So you don't like these companies"
    fi
fi

104 SepTember 2008 | LINUX For YoU | www.openITis.com


Once executed, it will give the following result:

$ /home/aasisvinayak/Desktop/favcomp.sh
Select your favorite company [1 or 2]? 1. IBM
2. Wipro
1
IBM ! Good selection

Vlanguage: Comprehend the Kernel in a better way!

V is a new computer language that is in the development phase. It essentially tries to do away with the need to learn any other programming language. Currently, you need to learn many languages, often getting confused by their syntaxes and functions. V intends to solve the issue by allowing you to write programs in a single language, which it then converts to any language you require. It has its own libraries, which allow you to execute any program with the .v extension. From the code, the program will decipher the nature of the language.

Further, from Stage 5 onwards, you need not learn the syntax of the new language. You can write programs in plain English, and the program, by itself, will convert them into the preferred syntax by employing the libraries of languages like AIML and artificial intelligence. So it averts the need to mug up stuff. In short, with V, you can easily write programs just by using your brains!

If you wish, you may try out the Internet version of V at grogammer.com (but the development work is going on at vlanguage.org), which is the Web-based application version of V. By employing the reverse process of V, you can translate the code in the kernel to plain English, so that any intermediate user can comprehend the code easily.

Another style using multi-level 'if-then-else' is as follows:

if condition
then
    if condition
    then
        .....
        ..
        do this
    else
        ....
        ..
        do this
    fi
else
    ...
    .....
    do this
fi
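To make the skeletons above concrete, here is a minimal sketch that combines if, elif, else and a nested if, with [ ... ] supplying the condition through its exit status. The function name describe_number and the thresholds are invented for this illustration.

```shell
#!/bin/sh
# Sketch only: describe_number is a hypothetical name.
describe_number() {
    if [ "$1" -lt 0 ] ; then
        echo "$1 is negative"
    elif [ "$1" -eq 0 ] ; then
        echo "$1 is zero"
    else
        if [ "$1" -lt 10 ] ; then        # nested if inside the else branch
            echo "$1 is a small positive number"
        else
            echo "$1 is a large positive number"
        fi
    fi
}

# test, written here as [ ... ], reports its verdict through its exit status
[ 3 -gt 2 ] && echo "a true test returns status $?"
[ 3 -gt 5 ] || echo "a false test returns status $?"

describe_number -4
describe_number 0
describe_number 7
describe_number 42
```

The two lines in the middle show the convention mentioned earlier: status 0 means true, and any non-zero status means false.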
Those who are into programming know that the computer can repeat a particular instruction again and again, till a particular condition is satisfied (as defined in the program). Yes, this is the familiar loop! Bash essentially supports the 'for' loop and the 'while' loop. For example, you can have:

for i in 1 2 3
do
    echo "$i I LOVE LINUX"
done

This is straightforward, and you get the result:

$ /home/aasisvinayak/Desktop/loop.sh
1 I LOVE LINUX
2 I LOVE LINUX
3 I LOVE LINUX

You can increase the complexity in different ways:

for (( i = 1; i < 10; i++ ))
do
    for (( j = 1 ; j <= 5; j++ ))
    do
        echo -n "$i"
    done
    echo " #HOPE THIS IS CLEAR"
done

Hope the above code is lucid.

$ /home/aasisvinayak/Desktop/complexloop.sh
11111 #HOPE THIS IS CLEAR
22222 #HOPE THIS IS CLEAR
33333 #HOPE THIS IS CLEAR
44444 #HOPE THIS IS CLEAR
55555 #HOPE THIS IS CLEAR
66666 #HOPE THIS IS CLEAR
77777 #HOPE THIS IS CLEAR
88888 #HOPE THIS IS CLEAR
99999 #HOPE THIS IS CLEAR

We have seen some of what is indispensable in the shell. We will wind up our visit to the shell by looking at some more essential stuff. After that, we can pass over to the kernel and other related topics.

By: Aasis Vinayak PG. The author is a hacker and a free software activist who does programming in the open source domain. He is the developer and CEO of the Mozhi Search engine. His research work/publications are available at www.aasisvinayak.com
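This part mentions the while loop without showing one. As a minimal sketch, the loop below keeps repeating its body until the condition in [ ... ] stops being true. The function name countdown is made up for this example.

```shell
#!/bin/sh
# Sketch only: countdown is a hypothetical name.
countdown() {
    n=$1
    while [ "$n" -gt 0 ]
    do
        echo "$n I LOVE LINUX"
        n=$(( n - 1 ))     # without this step the loop would never end
    done
    echo "done"
}

countdown 3
```

Unlike the for loops shown in this part, the while loop does not walk over a fixed list; it simply re-checks its condition before every pass.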



A VOYAGE TO THE KERNEL
Day Four, Part 5

A continuation of the journey of exploration, in search of all the treasures that the kernel holds!

Welcome back! We are now going to look at some more applications of shell programming. We saw the use of the dialog utility earlier. Now, we will learn some coding that we can incorporate into our main program so as to make it more attractive.

To begin with, let us glance through this script, which selects a random number (smaller than 25) and asks the user to guess it. (If you wish to avoid the display of previous commands and messages, please use the clear command before you start.)

#!/bin/sh
# Guess the random number
# Script written for A Voyage to the Kernel

biggest=25
userguess=-1    # start with a value the computer can never pick
totalguesses=0
varnumber=$(( $$ % $biggest ))

while [ "$userguess" -ne $varnumber ] ; do
    echo -n "The computer has selected a number which is less than 25. Can you guess the number? " ; read userguess
    if [ "$userguess" -lt $varnumber ] ; then
        echo "The original number is bigger than your guessed number!"
    elif [ "$userguess" -gt $varnumber ] ; then
        echo "The original number is smaller than your guessed number!"
    fi
    totalguesses=$(( $totalguesses + 1 ))
done

echo "You have guessed the number, $varnumber, in $totalguesses guesses."

exit 0

Once you execute this, you'll notice that the program reads your input and churns out clues to find the correct number. It is illustrated below:

aasisvinayak@free-laptop:~$ ./Desktop/voyage/guessnumber.sh
The computer has selected a number which is less than 25. Can you guess the number? 10
The original number is bigger than your guessed number!
The computer has selected a number which is less than 25. Can you guess the number? 15
The original number is bigger than your guessed number!
The computer has selected a number which is less than 25. Can you guess the number? 20
The original number is bigger than your guessed number!
The computer has selected a number which is less than 25. Can you guess the number? 24
The original number is smaller than your guessed number!
The computer has selected a number which is less than 25. Can you guess the number? 22
You have guessed the number, 22, in 5 guesses.

Let's assume that you have developed an application that allows users to enter a large amount of text. Why not then incorporate a spell-check utility into your main program? Here is a script that shows what exactly happens in the shell during the process. Of course, to use the script in a program, you need to customise it, and the input data should be fed to the program. Please make sure that you have ispell installed before trying the script.
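Before moving on, a brief aside on the number-guessing script: it gets its 'random' number by taking $$ (the process ID of the running shell) modulo the upper bound. The sketch below isolates that step so you can see the range it produces; the function name pick_number and the sample seed 1234 are invented for this illustration.

```shell
#!/bin/sh
# Sketch only: pick_number is a hypothetical name.
pick_number() {
    # $1 = seed (the guessing script uses $$), $2 = upper bound
    echo $(( $1 % $2 ))
}

echo "seed 1234, bound 25 -> $(pick_number 1234 25)"
echo "this shell's PID, bound 25 -> $(pick_number $$ 25)"
```

The result always lies between 0 and one less than the bound. Because the seed is just a process ID, the value is predictable rather than truly random, which is good enough for a small game.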

102 ocTober 2008 | LINUX For YoU | www.openITis.com


#!/bin/sh
# To check the spelling of the words entered
# Script written for A Voyage to the Kernel
spell="ispell -l"
for word
do
    if [ -z "$(echo $word | $spell)" ] ; then
        echo "$word -The word is spelled correctly"
    else
        echo "$word - The word is misspelled"
    fi
done
exit 0

The following is a demo:

aasisvinayak@free-laptop:~$ ./Desktop/voyage/spellcheck.sh goat
goat -The word is spelled correctly
aasisvinayak@free-laptop:~$ ./Desktop/voyage/spellcheck.sh linux
linux - The word is misspelled
aasisvinayak@free-laptop:~$ ./Desktop/voyage/spellcheck.sh Linux
Linux -The word is spelled correctly

You can see from the second and third trials that, for some words, ispell also checks whether the first character is capitalised.
We have seen the guess-number script. Now, let's discuss a guess-word script that's a little more complex.

#!/bin/sh
# Guess the Word (selected randomly from the list)
# Script written for A Voyage to the Kernel

blankdots=".................."

selectedrandomword()
{
    case $(( $$ % 8 )) in
        0 ) echo "Linux" ;; 1 ) echo "GNU" ;;
        2 ) echo "FSF" ;; 3 ) echo "Vlanguage" ;;
    esac
}

addthegussedcharactertotheformat()
{
    character=1
    while [ $character -le $characters ] ; do
        if [ "$(echo $word | cut -c$character)" = "$userguess" ] ; then
            before="$(( $character - 1 ))"; after="$(( $character + 1 ))"
            if [ $before -gt 0 ] ; then
                fbefore="$(echo $format | cut -c1-$before)"
            else
                fbefore=""
            fi
            if [ $after -gt $characters ] ; then
                format="$fbefore$userguess"
            else
                format="$fbefore$userguess$(echo $format | cut -c$after-$characters)"
            fi
        fi
        character=$(( $character + 1 ))
    done

    leftover=$(echo $format|sed 's/[^\.]//g'|wc -c|sed 's/[[:space:]]//g')
    leftover=$(( $leftover - 1 ))
}

word=$(selectedrandomword)
characters=$(echo $word | wc -c | sed 's/[[:space:]]//g')
characters=$(( $characters - 1 ))
format="$(echo $blankdots | cut -c1-$characters)"
leftover=$characters ; userguessed="" ; userguesses=0; userbadguesses=0

echo "** Try to guess a word with $characters characters **"

while [ $leftover -gt 0 ] ; do
    echo -n "Word is: $format Guess the character next to this? " ; read userguess
    userguesses=$(( $userguesses + 1 ))
    if echo $userguessed | grep -i $userguess > /dev/null ; then
        echo "You've already guessed that character. Try something else"
    elif ! echo $word | grep -i $userguess > /dev/null ; then
        echo "Sorry, the character you gussed , \"$guess\", is not in the random word selected."
        userguessed="$userguessed$userguess"
        userbadguesses=$(( $userbadguesses + 1 ))
    else
        echo "Good guess! The character $userguess is in the random word selected!"
        addthegussedcharactertotheformat $userguess
    fi
done
echo -n "Great! You guessed $word in $userguesses guesses"
echo " with $userbadguesses bad guesses"
exit 0

There is a function in the script called addthegussedcharactertotheformat. This function replaces the dots ('.') in the standard format with the characters guessed correctly. Also note that the string of dots ".................." must be longer than the longest word in the list. And the function selectedrandomword will select a random word. Now, let us try executing this. (If you have observed the script carefully, you can see that there are chances of one or

two bugs emerging in it. Can you trace them?)

aasisvinayak@free-laptop:~$ ./Desktop/voyage/guessword.sh
** Try to guess a word with 3 characters **
Word is: ... Guess the character next to this? F
Good guess! The character F is in the random word selected!
Word is: F.F Guess the character next to this? S
Good guess! The character S is in the random word selected!
Great! You guessed FSF in 2 guesses with 0 bad guesses
aasisvinayak@free-laptop:~$

Here, when you execute the code, it asks you to enter the characters one by one. As the selected word (random) is FSF, when you enter F, two positions are filled simultaneously. Though this is not a bug, you can try removing this by adding a line that prevents the filling of more than one place simultaneously.
The code given below shows the way in which the program reacts if you enter the wrong character:

aasisvinayak@free-laptop:~$ ./Desktop/voyage/guessword.sh
** Try to guess a word with 3 characters **
Word is: ... Guess the letter next to this? G
Sorry, the letter you gussed , "G", is not in the random word selected.
Word is: ... Guess the letter next to this? F
Good guess! The letter F is in the random word selected!
Word is: F.F Guess the letter next to this? S
Good guess! The letter S is in the random word selected!
Great! You guessed FSF in 3 guesses with 1 bad guesses

If you have not found the bugs, don't worry; we are going to discuss them. Look at the following demo:

aasisvinayak@free-laptop:~$ /home/aasisvinayak/Desktop/guessword.sh
** Try to guess a word with 9 characters **
Word is: ......... Guess the letter next to this? v
Good guess! The letter v is in the random word selected!
Word is: ......... Guess the letter next to this? l
Good guess! The letter l is in the random word selected!
Word is: .l....... Guess the letter next to this? a
Good guess! The letter a is in the random word selected!
Word is: .la...a.. Guess the letter next to this? n
Good guess! The letter n is in the random word selected!
Word is: .lan..a.. Guess the letter next to this? g
Good guess! The letter g is in the random word selected!
Word is: .lang.ag. Guess the letter next to this? u
Good guess! The letter u is in the random word selected!
Word is: .languag. Guess the letter next to this? a
Good guess! The letter a is in the random word selected!
Word is: .languag. Guess the letter next to this? h
Sorry, the letter you gussed , "", is not in the random word selected.
Word is: .languag. Guess the letter next to this? g
Good guess! The letter g is in the random word selected!
Word is: .languag. Guess the letter next to this? e
Good guess! The letter e is in the random word selected!
Word is: .language Guess the letter next to this? V
Good guess! The letter V is in the random word selected!
Great! You guessed Vlanguage in 11 guesses with 1 bad guesses
aasisvinayak@free-laptop:~$

Here the script recognises the character v, but it is not added to the displayed word. Why? The grep -i test ignores case, but the comparison inside addthegussedcharactertotheformat is case-sensitive, and as per the list the character should be in upper case. You can fix this by adding a statement that tells the script that both are equal.
And the following is yet another response that you may get at times:

** Try to guess a word with 0 characters **
Great! You guessed in 0 guesses with 0 bad guesses

Why? The case statement handles only the values 0 to 3, while $$ % 8 can be anything from 0 to 7. For the other values the word is empty, and cut is then asked for an invalid decreasing range. Try the cut --help command to find the resolution.

#!/bin/bash

echo "Set Positions"
echo '$1 = ' $1
echo '$2 = ' $2
echo '$3 = ' $3
echo '$4 = ' $4
echo '$5 = ' $5

Now assume that you have an input box where users will enter their preferences in a particular order. You can feed that to your dynamic script (which employs the positional parameters $1, $2 and so on). And the following code is the static equivalent to achieve that:

aasisvinayak@free-laptop:~$ ./Desktop/voyage/arrangebyposition.sh LFY EFY BenefIT IT FFU
Set Positions
$1 = LFY
$2 = EFY
$3 = BenefIT
$4 = IT
$5 = FFU

I am not going to illustrate this, as it is self-explanatory.
Assume that you want to add some colour to your program. For that, here is a way:

clear
echo -e "\033[24m Freedom"
echo -e "\033[32m Freedom"
echo -e "\033[36m Freedom"
echo -e "\033[31m Freedom"
echo -e "\033[33m Freedom"
echo -e "\033[34m Freedom"
echo -e "\033[35m Freedom"

Figure 1: Colours in the terminal output
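The escape sequences used in the colour script can also be wrapped in a small function. The following is only a sketch: the name colourise is hypothetical (it is not part of the original scripts), and it assumes a terminal that understands the same escape codes. Unlike the plain echo lines, it ends each line with \033[0m, which switches the terminal back to its default attributes.

```shell
#!/bin/bash
# Hypothetical helper (not from the original article): print the given text
# using one escape code and reset the terminal attributes afterwards.
colourise() {
    local code=$1
    shift
    # \033[<code>m switches the attribute on; \033[0m resets everything.
    printf '\033[%sm%s\033[0m\n' "$code" "$*"
}

colourise 32 "Freedom"              # green text
colourise 41 "A Voyage to Kernel"   # red background
```

Because of the reset, text printed after the call keeps the normal terminal colours.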

Figure 1 illustrates the execution of the Freedom script.
Now if you wish to highlight the items using colours, you can try something similar to the code shown below:

clear
echo -e "\033[41m A Voyage to Kernel"
echo -e "\033[46m A Voyage to Kernel"
echo -e "\033[43m A Voyage to Kernel"
echo -e "\033[44m A Voyage to Kernel"
echo -e "\033[42m A Voyage to Kernel"
echo -e "\033[45m A Voyage to Kernel"

Figure 2: Coloured background in the terminal output

Figure 2 shows the result of the above code on execution.
Today, we have explored many ways by which you can enhance your applications.

By: Aasis Vinayak PG. The author is a hacker and a free software activist who does programming in the open source domain. He is the developer and CEO of the Mozhi Search engine. His research work/publications are available at www.aasisvinayak.com



A
VOYAGE TO THE
KERNEL Part 6

Day Five—The end of the First Segment

We will now conclude the shell programming part of our voyage. In this column, I will try to review the tools described in earlier columns and apply those tools to solve slightly complicated problems, with solutions that you may exploit while coding.
This part addresses three categories of readers: newbies who have just started their experiments in shell, intermediates, and advanced users. Besides, I have skipped the illustration of some of the code.
Let us warm up by trying some code for newbies:

#!/bin/bash

NUMBER=0

echo -n "Please enter number between 2 and 9"
read NUMBER

if ! [ $NUMBER -ge 2 -a $NUMBER -le 9 ] ; then
    echo "Please enter number between 2 and 9"
    exit 1
fi

clear

for (( i=1; i<=NUMBER; i++ ))
do
    for (( n=NUMBER; n>=i; n-- ))
    do
        echo -n " "
    done
    for (( j=1; j<=i; j++ ))
    do
        echo -n " $i"
    done
    echo ""
done

Figure 1 shows the execution of the code. For a detailed explanation of the mode of functioning, please refer to the earlier columns.
Now, let us write the code to find the reverse of a given number. (By this time, you must know why we use exit 1 in the code.)

#!/bin/bash
if [ $# -ne 1 ]
then
    echo "Usage: $0 number"
    echo "This will help you to find reverse of a number"
    exit 1
fi

number=$1
reverse=0
division=0

while [ $number -gt 0 ]
do
    division=`expr $number % 10`
    reverse=`expr $reverse \* 10 + $division`
    number=`expr $number / 10`
done
echo "Reverse number is $reverse"

Can you guess what the following code does? Else, give it a try and find out:

102 November 2008 | LINUX For YoU | www.openITis.com


#!/bin/bash

echo "Enter number:"
read number
i=$number
while test $i != 0
do
    echo "$i"
    i=`expr $i - 1`
done

Sometimes, you may need to find out whether the user is logged in as a root user, especially when you write system tools (administration tools) in shell:

#!/bin/bash

ROOT_UID=0

if [ "$UID" -eq "$ROOT_UID" ]
then
    echo "Welcome, root."
else
    echo "Please login as root "
fi

exit 0

As the root user has $UID 0, you can easily uncover this with a conditional statement.
If you wish to have some fun in between, use the following code:

#!/bin/bash

echo "Enter number:"
read number

for (( i=1; i<=$number; i++ ))
do
    for (( j=1; j<=i; j++ ))
    do
        echo -n " |"
    done
    echo "_ "
done

echo "let's climb"

You can easily understand the above code by looking at i and j (and the increment factor associated with those). If you check the result of the operation "|" (and its alternative "_") you can guess what the final result will look like. A sample is given in Figure 2.

Figure 1: Terminal output after execution of the first code for newbies

Figure 2: The "let's climb" code

Sometimes you may wish to extract some content from the Web. You have many methods to do that, using shell. I will show you how to fetch a Web page (say, an article from Wikipedia):

#!/bin/bash
if [ -z "$1" ]
then echo "Usage: `basename $0` Wikipedia article name"
    exit
fi
article=$1
URL='http://en.wikipedia.org/wiki/'
wget -O ${article} "${URL}${article}"
exit $?

A demo of the code is shown in the following snippet:

hacker@free-laptop:~$ /home/hacker/Desktop/a H
--12:07:16-- http://en.wikipedia.org/wiki/H
=> `H'
Resolving en.wikipedia.org... 208.80.152.2
Connecting to en.wikipedia.org|208.80.152.2|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 66,846 (65K) [text/html]

100%[=============================>] 66,846 2.14K/s ETA 00:00

12:08:00 (2.08KB/s) - `H' saved [66846/66846]

If you need to crawl through some special pages, you can add some suffix to the URL. For example, if you need the printable version, you can add something like the following (quoted, so that the shell does not treat & as the background operator):

suffix='&printable=yes'

Then add ${suffix} to the URL passed to wget.
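The idea of appending a suffix to the URL can be sketched as a small function. This is only an illustration: build_url is a hypothetical name, and the sketch assumes that the option is attached with ? because the /wiki/ form of the address has no earlier query parameter. Whether the server honours the option is also an assumption. The function only prints the address, so it can be tried without a network connection.

```shell
#!/bin/bash
# Hypothetical helper: build the address that would be handed to wget.
build_url() {
    local base='http://en.wikipedia.org/wiki/'
    local article=$1
    local suffix=${2:-}      # optional, for example '?printable=yes'
    printf '%s%s%s\n' "$base" "$article" "$suffix"
}

build_url H
build_url H '?printable=yes'

# To fetch the page, one would then run something like:
#   wget -O H.html "$(build_url H '?printable=yes')"
```

Keeping the suffix in single quotes matters: an unquoted & or ? would be interpreted by the shell before wget ever sees it.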

Now, let's see how to capture the keystrokes using shell:

#!/bin/bash

keystrokes=10

old_pref=$(stty -g)

echo "Enter $keystrokes keystrokes:"
stty -icanon -echo

pressed_keys=$(dd bs=1 count=$keystrokes 2> /dev/null)

stty "$old_pref"

echo "You pressed the \"$pressed_keys\" keys."

exit 0

You can see that we are able to disable the canonical mode and local echo in this. And old_pref is used to restore the old preference settings that are saved using $(stty -g).
You may also change the keystrokes value to your preferred value.
While executing some shell-based commands, you may need to change the working directory automatically. The following code steps into each sub-directory of the current directory in turn and reads commands to run there (press Ctrl+D to move on to the next one):

#!/bin/sh

directory=`pwd`
for cmd in *
do
    if test -d $directory/$cmd
    then
        cd $directory/$cmd
        while echo "$cmd:~$"
            read commd
        do
            eval $commd
        done
        cd ..
    fi
done

If you wish to display the system time in your program, you can use the following code:

#!/bin/bash

temph=`date | cut -c12-13`
dat=`date +"%A %d in %B of %Y (%r)"`
dialog --backtitle "For "\
--title "A Voyage to Kernel"\
--infobox "\n Now it is $dat" 7 50

Figure 3 shows a demo. If you wish to have a dynamic greeting, you can use conditional statements and link them to the system timings.

Figure 3: What's the system time?

Figure 4: A tool that displays system information

Now, while writing admin tools you may have to display information regarding the system. The following code illustrates this (see Figure 4 for the demo):

#!/bin/bash

user=`who | wc -l`
echo -e "Log in as : $USER (Login name: $LOGNAME)" >> /tmp/info.tmp.01.$$$
echo -e "OS Type: $OSTYPE" >> /tmp/info.tmp.01.$$$
echo -e "Home Directory: $HOME" >> /tmp/info.tmp.01.$$$
echo -e "Current directory: `pwd`" >> /tmp/info.tmp.01.$$$

echo -e "----------------------------------" >> /tmp/info.tmp.01.$$$
echo -e "Computer CPU Information:" >> /tmp/info.tmp.01.$$$
echo -e "----------------------------------" >> /tmp/info.tmp.01.$$$
cat /proc/cpuinfo >> /tmp/info.tmp.01.$$$

dialog --backtitle "A Voyage to Kernel" --title "Press Up/Down Keys " --textbox /tmp/info.tmp.01.$$$ 21 70

rm -f /tmp/info.tmp.01.$$$

Similarly, you can display other information as well, for example that related to computer memory, the hard disk, filesystem, etc.

#!/bin/bash

echo -e "----------------------------------" >> /tmp/info.tmp.01.$$$
echo -e "Computer Memory Info:" >> /tmp/info.tmp.01.$$$
echo -e "----------------------------------" >> /tmp/info.tmp.01.$$$
cat /proc/meminfo >> /tmp/info.tmp.01.$$$
echo -e "----------------------------------" >> /tmp/info.tmp.01.$$$
echo -e "Hard disk info:" >> /tmp/info.tmp.01.$$$
echo -e "----------------------------------" >> /tmp/info.tmp.01.$$$
echo -e "Model: `cat /proc/ide/hda/model` " >> /tmp/info.tmp.01.$$$
echo -e "Driver: `cat /proc/ide/hda/driver` " >> /tmp/info.tmp.01.$$$
echo -e "Cache size: `cat /proc/ide/hda/cache` " >> /tmp/info.tmp.01.$$$

echo -e "----------------------------------" >> /tmp/info.tmp.01.$$$
echo -e "File System :" >> /tmp/info.tmp.01.$$$
echo -e "----------------------------------" >> /tmp/info.tmp.01.$$$
cat /proc/mounts >> /tmp/info.tmp.01.$$$

dialog --backtitle "A Voyage to Kernel" --title "Press Up/Down Keys " --textbox /tmp/info.tmp.01.$$$ 21 70

rm -f /tmp/info.tmp.01.$$$

Voyage music
Let's end our voyage by playing some music! Our journey to this locale is about to draw to a close and this piece further illustrates the clout of shell. You may see a reference to /dev/dsp, which is the digital signal processor (sound) device. You can also vary the tune, sound, etc, for better results. If you are good at music, you will find that I have used the European notation in the code. Now let's play the notes:

#!/bin/bash

duration=1000
volume=$'\xff' # Max volume = \xff
mute=$'\x80' # No volume = \x80

function voyage_music () # Voyage music note Hz in bytes
{
    for t in `seq 0 $duration`
    do
        test $(( $t % $1 )) = 0 && echo -n $volume || echo -n $mute
    done
}

e=`voyage_music 50`
g=`voyage_music 42`
a=`voyage_music 39`
b=`voyage_music 40`
c=`voyage_music 21`
cis=`voyage_music 23`
d=`voyage_music 21`
e2=`voyage_music 22`
n=`voyage_music 32767`

echo -n "$g$e2$d$c$d$c$a$g$n$g$e$n$g$e2$d$c$c$b$c$cis$n$cis$d \
$n$g$e2$d$c$d$c$a$g$n$g$e$n$g$a$d$c$b$a$b$c" > /dev/dsp

exit $?

The next segment
I was planning to take a leap into kernel programming directly. But from the e-mails that I have received, I understand that many readers are new to areas like writing device drivers. And some readers are unfamiliar with tasks like kernel compilation. Considering the requests from beginners and intermediates, I am changing our voyage schedule.
Instead of going directly into kernel programming, I shall introduce you to a new segment dealing with the mathematical skills required for problem solving. I am of the view that computer science has little to do with computers as such. It is the science of problem solving using algorithms.
Even in kernel programming, you can use many of these tools. This will enable intermediates to acquire more mathematical skills in programming, which are indispensable when playing around with the kernel. But I have ensured the layout of the next segment suits all programmers. Hence, even if you don't wish to meddle much with the kernel, you will find these tips useful for writing all types of algorithms.
Stay tuned!

By: Aasis Vinayak PG. The author is a hacker and a free software activist who does programming in the open source domain. He is the developer and CEO of the Mozhi Search engine. His research work/publications are available at www.aasisvinayak.com



A
VOYAGE TO THE
KERNEL Part 7

Day 6. Segment 2.1

This article will concentrate on the computational methods used for problem solving. Unlike in the previous segment, here we will be dealing more with the theory than with trials!
A problem associated with some of the standard books on computational methods is that they require the use of proprietary software. I remember reading Applied Quantum Mechanics by A. F. J. Levi. The book, though very useful and interesting, provides the solutions to problems in Matlab code. So, I shall deal with some of the free software tools that you can use while trying to apply the theory to your problems.

GNU Octave: For scientific computation
I use this wonderful software for all my work. There are a few people who prefer Scilab to Octave. But I prefer the latter. There are even tools associated with some of these programs for the conversion of code from Matlab. And many of them employ simple parsers for this purpose.
Figure 1 shows a typical Octave window.
You can get the latest copy of Octave (3.0.3) from www.gnu.org/software/octave/download.html or check your distribution's software repository for it. Another option is the Qt front-end (see Figure 2).

Some trials with Octave
To start with a simple one, let's find the square root of 3 using Octave:

octave-3.0.0:1> sqrt (3)
ans = 1.7321
octave-3.0.0:2>

Appears very simple, right?
If you wish to manipulate matrices, you can do that in Octave. You can enter the elements of the Matrix A in the manner elucidated in Figure 3.
While building (or making) simulations, you may need to have random values for checking. You can do that by using the rand command followed by the number of rows and columns as shown in Figure 4.
Another fact is that you can use many of your C commands in Octave, of which the following is a simple example:

octave-3.0.0:9> printf ("A Voyage to Kernel\n");
A Voyage to Kernel
octave-3.0.0:10>

And further, Octave has many built-in, loadable and mapping functions, function files, etc, for advanced tools.

Now let's see how to save the code in Octave.

#! /usr/bin/octave-3.0.0 -qf
# Script written for A Voyage to Kernel
printf ("Applied Quantum Mechanics lured me into scientific computing\n");

This is quite akin to the style we followed in shell programming. The first line invokes the interpreter. (Please note that if you use a different version, you need to change the interpreter name, unlike in shell programming.)
Octave has many in-built mathematical conversion tools. The following code shows that

104 December 2008 | LINUX For YoU | www.openITis.com


you can easily convert numbers from decimal to binary or hexadecimal:

octave-3.0.0:2> dec2bin (15)
ans = 1111
octave-3.0.0:4> dec2hex (475)
ans = 1DB

And there are many other in-built functions like tolower:

octave-3.0.0:5> tolower ("LInUX")
ans = linux

Another category is built-in variables (for example, history_size). You can get the details by issuing the corresponding command:

octave-3.0.0:6> history_size
ans = 1024

Just like in shell, you can have user-defined functions as shown below:

octave-3.0.0:9> function voyage_wish (wish)
>
> printf ("\a%s\n", wish);
>
> endfunction
octave-3.0.0:10> voyage_wish ("Happy Journey to Scientific Computation")
Happy Journey to Scientific Computation

Let us move on to something more complicated. If we wish to plot a function with respect to a variable by taking many parameters, it may seem a tedious task. But it is quite easy in Octave.

octave-3.0.0:1> t = 0:0.6:9.3;
octave-3.0.0:2> plot (t, sin(t), "3;sin(t);", t, cos(t), "+6;cos(t);");

You get the resultant graph as shown in Figure 5.
You can use other tools like clearplot, shg, closeplot, etc, for better results and to define it completely. You can also draw different types of graphs like histograms, bar graphs, pie-charts, etc, in Octave (Figure 6).

octave-3.0.0:4> hist (2 * t, t)

You can also find functions to perform computational tasks in other fields in mathematics. For example, for complex numbers we have functions like conj (z), imag (z), real (z), etc.

octave-3.0.0:9> abs (6 + 8i)
ans = 10

Figure 1: A Typical Octave Window

Figure 2: Qt front-end for Octave

Figure 3: Elements of the Matrix A

Towards more complexity
Most of you will be familiar with the simple numerical methods that we use for computing, like the Euler and Runge-Kutta methods. These are relatively simple yet powerful methods. For advanced-level problems we may need more functions as well. Let's take the beta function that is given mathematically as:

$$B(a, b) = \int_0^1 t^{a-1} (1 - t)^{b-1} \, dt = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a + b)}$$

Figure 4: rand generates random values

Figure 5: An Octave-generated graph

Figure 6: An Octave-generated bar graph

You can straight away proceed with the 'beta' function in Octave (provided you know its use) without worrying about the stuff inside:

octave-3.0.0:10> beta (3,4)
ans = 0.016667

Things will become more complex when you need to meddle with the incomplete beta function given by:

$$I_x(a, b) = \frac{1}{B(a, b)} \int_0^x t^{a-1} (1 - t)^{b-1} \, dt$$

But you are safe when you are in Octave as you have the betainc (x, a, b) function. So is the case with gamma and incomplete gamma functions.

octave-3.0.0:1> gamma (3)
ans = 2

Hence, you can easily write algorithms (in Octave-like language) just by remembering things like lgamma (x), gammaln (x), etc.
You can make use of these types of tools while trying to meddle with tasks like finding the Hessenberg decomposition of the matrix or computing the Cholesky factor.

Computational methods
Let me try to explain the simple numerical method (in scientific computation) to deal with differential equations.
Some of you might have lost touch with all this, so I shall go over the concepts. Please note that the concepts developed during the early stages of the voyage into the kernel will be used for solving problems in the upcoming days.
The most important point we need to note is that a differential equation relates a function to its derivatives, so that we can compute the function itself. Take, for example:

$$u'(t) = u(t)$$

The general solution for this will be of the form 'a constant multiplied by e^t'. If you are sceptical, try differentiating the solution!
The ordinary differential equations can be represented as shown below:

(The orders of the equations are different.)
We also have partial differential equations, which differ from ODEs.
And typical partial differential equations (PDE) can be classified as shown below:

Out of these, there are homogeneous and non-homogeneous ones. Some equations (like the Laplace equation) are homogeneous in nature:

$$\nabla^2 u = 0$$

While others are non-homogeneous (like the Poisson equation):

$$\nabla^2 u = f$$

Solving problems: by computational methods
Consider the equation

Also take:

Applying the Taylor series for smooth functions, we get:

…provided 'u' is twice differentiable and the series is applied for ∆t > 0.
Making use of the O-notation, we can have:

Now we can split the time interval into small parts. Let us take t_m as an integer multiple of ∆t. By invoking numerical approximation, we now consider v_m as the approximation of u(t_m). Let v_m be a known quantity. Then v_{m+1} can be computed. So we can write:

Using our definition:

…where,

From this, v_{m+1} is evaluated. This is a simple process using the Euler method. The advantage associated is that it can be implemented on a computer for any function.
We shall deal with more complex problems and their solutions, using computational methods, in the forthcoming articles. Please note that you can directly apply some of these methods when you deal with problems in kernel programming. But for others, you may need to 'customise' the method to suit the defined problem.

References:
Some of the books recommended for the voyage into the new segment:
1. G. Dahlquist, A. Bjorck, Numerical Methods, Englewood Cliffs, Prentice-Hall
2. S. D. Conte, C. de Boor, Elementary Numerical Analysis: An Algorithmic Approach, McGraw-Hill
3. J. D. Logan, Applied Mathematics: A Contemporary Approach, Wiley-Interscience
4. G. B. Whitham, Linear and Nonlinear Waves, Wiley-Interscience
5. H. O. Kreiss, J. Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations, Academic Press
6. J. Smoller, Shock Waves and Reaction-Diffusion Equations, Springer-Verlag
7. W. Hackbusch, Iterative Solution of Large Sparse Systems of Equations, Springer-Verlag
8. D. Gottlieb, S. A. Orszag, Numerical Analysis of Spectral Methods: Theory and Applications, SIAM, Regional Conference Series in Applied Mathematics
9. E. Isaacson, H. B. Keller, Analysis of Numerical Methods, Wiley
10. W. Aspray, John von Neumann and the Origins of Modern Computing, MIT Press

By: Aasis Vinayak PG. The author is a hacker and a free software activist who does programming in the open source domain. He is the developer and CEO of the Mozhi Search engine. His research work/publications are available at www.aasisvinayak.com



A Voyage to the
Kernel
Part 8
Segment: 2.2, Day 7

We are about to enter the core part of this segment: algorithms. An algorithm could be termed a sequence of computational steps that can transform an input into the output. Here, it should be emphasised that not every such sequence can be called an algorithm, since a wrong methodology will give an incorrect output.
Almost all the code used in this segment will be in a pseudo-code format that is akin to C code. Sometimes the algorithms are given in C itself. We have to stick to this format, as our primary intention is to meddle with the kernel.
Those who wish to have an overview of the importance of bringing in innovative algorithms and building simulations (based on them) can look up the GNU Hurd project.
As I promised earlier, this voyage will not neglect the novices. So, let's start with a few simple things. Let's suppose we have a set of values x1, x2, x3, ..., xn. If you are asked to arrange them in ascending order, you need to write an algorithm so as to get the output as xa, xb, xc, ..., which satisfies the condition xa < xb < xc < ...
This is a simple case of sorting, which is a common algorithm that we employ in our programs.
We just looked at an instance of the sorting problem. This may become more complex depending on the number of items to be sorted, the extent to which the items are to be sorted, the current state (sorting) of the elements, possible restrictions on the items, and even on the kind of storage device used.
Hence, while dealing with algorithms, we need to consider the data structures employed, with which we can manipulate the way to store and organise data in order to facilitate access and modifications as per our requirement.
Finding the shortest route is another problem that calls for an algorithm. Consider a trucking company with a central warehouse. Each day, it loads up a truck at the central warehouse and then sends it around to several locations to deliver the products. At the end of each day, the truck should return to the central warehouse so that it is ready to be loaded for the next day. To find out the lowest operating cost, the company needs an algorithm that will indicate a specific order of delivery stops such that the truck travels the lowest overall distance.
If you have enough data in your hands, you can write down an algorithm. Now, let's look at how to write such an algorithm. For the sake of simplicity, let's replace our problem with a simple one. Consider five cards of clubs (2, 4, 5, 7 and 10) which are placed randomly (just like we had random stops). If you were to play the game, you would have arranged the cards as shown in Figure 1.

Figure 1: Five cards of clubs arranged in an order

But what if a computer had to play your role? How would it arrange the cards? The answer is quite simple: by employing the sorting algorithm. The following code elucidates the algorithm:

INSERTION-SORT(X)
  for j ← 2 to length[X]
    do key-select ← X[j]
       ▹ Insert X[j] into the sorted series X[1 .. j-1]
       i ← j - 1
       while i > 0 and X[i] > key-select
         do X[i + 1] ← X[i]
            i ← i - 1
       X[i + 1] ← key-select

The pseudo code carries elements that are quite akin to those in C, and the letters used as loop indices are the conventional ones. The index 'j' indicates the 'current card' that is picked up and the '▹' sign symbolises that the remainder of the line is a

98  |  January 2009  |  LINUX For You  |  www.openITis.com



comment. The numbers that we wish to sort are represented as the key-select. By looking at the algorithm, you can see that the parameters are passed to a procedure by value.
As the algorithm is quite simple, it is self-explanatory. Now try to expand the same algorithm to another problem shown in Figure 2. If you have understood the first one, this will be quite simple, except that you need to bring in some additional rules as there are cards from different families and two cards here have the same value (of hearts and diamonds).

Figure 2: A set of cards from different families

Some fields that rely on algorithms
Writing algorithms is not just the headache of programmers alone. There are various other fields in which we need to rely on the algorithmic approach:
• The Human Genome Project has the goal of identifying all the 100,000 genes in the human DNA. It has to determine about three billion sequences of the chemical base pairs! This is virtually impossible unless we employ effective algorithms for pattern recognition and identification.
• Today we depend a lot on electronic commerce for the purchase of any commodity. A good system should have the ability to keep information (such as credit card numbers, passwords and bank statements) secure and encrypted. Algorithms are used for these cryptographic processes.
• Other branches like physical science (see the box on 'Simulation building in physical science') require the use of algorithms for solving complex problems.
• Even a manufacturing industry (or any other commercial setting) needs algorithms for allocating scarce resources.

Table 1
              1 Sec   1 Min   1 Hr   1 Day   1   1 Year
n!
n^2
ln (n)
log (n)
n x ln (n)
2^n

Analysing an algorithm has come to mean predicting the resources that the algorithm requires so as to get the desired output. It considers aspects like memory allocation, communication bandwidth, computer hardware, etc. We also need to take into account parameters like computational time when it comes to the practical side. These parameters may, in turn, depend on input size and the running time.
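To make the idea of analysis a little more concrete, the INSERTION-SORT pseudo code can be sketched in bash with a counter attached. This is only an illustration under stated assumptions: the names insertion_sort, SORTED and SHIFTS are hypothetical, the array is indexed from 0 (the pseudo code counts from 1), and the number of element shifts is used as a rough stand-in for the running time.

```shell
#!/bin/bash
# Sketch of INSERTION-SORT on a bash array (indices start at 0).
# Results are left in the global array SORTED; SHIFTS counts how many
# times an element had to be moved one place to the right.
insertion_sort() {
    SORTED=("$@")
    SHIFTS=0
    local j i key
    for (( j = 1; j < ${#SORTED[@]}; j++ )); do
        key=${SORTED[j]}          # the 'current card' that is picked up
        i=$(( j - 1 ))
        # Move every larger element of the sorted part one place right.
        while (( i >= 0 )) && (( SORTED[i] > key )); do
            SORTED[i+1]=${SORTED[i]}
            i=$(( i - 1 ))
            SHIFTS=$(( SHIFTS + 1 ))
        done
        SORTED[i+1]=$key
    done
}

insertion_sort 7 5 10 4 2
echo "Sorted: ${SORTED[*]} (shifts: $SHIFTS)"
```

For the five cards above the script needs 8 shifts. An input that is already sorted needs none, while a reversed input of n values needs n(n-1)/2, which is why the running time depends on the state of the input as well as on its size.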

Methodology: the divide-and-conquer approach

This is an effective methodology that we can adopt when we design algorithms. Applied to sorting, it involves:
• Divide: Divide the given sequence (with n elements) into two sub-sequences (of n/2 elements each).
• Conquer: Sort the two new sub-sequences recursively using the merge sort algorithm.
• Combine: Merge the two sorted sub-sequences to produce the desired result.
The merge mode is illustrated below:

MERGE(A, p, q, r)

MERGE-SORT (A, p, r)
  if p < r                          ▹ To check for base case
    then q ← ⌊(p + r)/2⌋            ▹ For dividing
         MERGE-SORT (A, p, q)       ▹ Conquering
         MERGE-SORT (A, q + 1, r)   ▹ Conquering
         MERGE (A, p, q, r)         ▹ Combining
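The three steps can also be tried out directly in bash. The following is a minimal sketch rather than a transcription of the pseudo code: it assumes a global array A indexed from 0, uses hypothetical function names merge and merge_sort, and copies the leftover elements explicitly instead of using the extra array slot that a sentinel-based MERGE would need.

```shell
#!/bin/bash
# Sketch of merge sort on the global array A (indices start at 0).

# merge p q r: combine the sorted runs A[p..q] and A[q+1..r].
merge() {
    local p=$1 q=$2 r=$3
    local left=("${A[@]:p:q-p+1}")
    local right=("${A[@]:q+1:r-q}")
    local i=0 j=0 k=$p
    while (( i < ${#left[@]} && j < ${#right[@]} )); do
        if (( left[i] <= right[j] )); then
            A[k]=${left[i]}; i=$(( i + 1 ))
        else
            A[k]=${right[j]}; j=$(( j + 1 ))
        fi
        k=$(( k + 1 ))
    done
    # Copy whatever remains of either run.
    while (( i < ${#left[@]} )); do
        A[k]=${left[i]}; i=$(( i + 1 )); k=$(( k + 1 ))
    done
    while (( j < ${#right[@]} )); do
        A[k]=${right[j]}; j=$(( j + 1 )); k=$(( k + 1 ))
    done
}

# merge_sort p r: sort A[p..r] by divide, conquer and combine.
merge_sort() {
    local p=$1 r=$2 q
    if (( p < r )); then                  # base case: one element
        q=$(( (p + r) / 2 ))              # divide
        merge_sort "$p" "$q"              # conquer
        merge_sort "$(( q + 1 ))" "$r"    # conquer
        merge "$p" "$q" "$r"              # combine
    fi
}

A=(5 2 4 7 1 3 2 6)
merge_sort 0 $(( ${#A[@]} - 1 ))
echo "${A[*]}"
```

The integer division in the divide step plays the role of rounding (p + r)/2 down, so the left half is never smaller than the right half by more than one element.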

Figure 3: Sorting and arranging an array of values in numbers

The MERGE-SORT pseudo code may not enlighten novice programmers, who can look at the expanded MERGE procedure given below and then use the above example to assimilate the core idea:

MERGE (A, p, q, r)
s1 ← q − p + 1
s2 ← r − q

www.openITis.com  |  LINUX For You  |  January 2009  |  99


A Voyage to the Kernel  |  Guest Column

Simulation building in physical science

Solid-state physics largely employs simulation techniques for modelling. By taking data from experiments, crystal lattice structures can be constructed easily. Figure 7 shows one such three-dimensional array of lattice points. The properties of the crystal structure can be inferred by building them. Simulations will give us a clear picture of the crystal properties by considering its primitives.

Figure 7: A 3D array of lattice points

So is the case with complex bodies. It is found that Jupiter is accompanied, in its orbit, by two groups of asteroids that precede it and follow it at an angular distance of π/3 (see Figure 8). By building simulations we can show that these are positions of stable equilibrium. A Runge-Kutta procedure with automatic step control can be used for analysing the data from Table 2.

Figure 8: Jupiter is accompanied by two groups of asteroids that precede and follow it at an angular distance of π/3

Table 2: Data for analysing a Runge-Kutta procedure with automatic step control

        Sun    Jupiter     Trojan 1    Trojan 2
Mass    1      0.001       0           0
x       0      0           -4.50333    4.50333
y       0      5.2         2.6         2.6
z       0      0           0           0
Vx      0      -2.75674    -1.37837    -1.37837
Vy      0      0           -2.38741    2.38741
Vz      0      0           0           0

Using the computational procedure we can get a prediction as shown in Figure 9.

Figure 9: This prediction can be made using computational procedures

You can also try solving simple problems of the following form:

The Runge-Kutta procedure is quite sufficient to handle these types of problems, provided you have enough data.
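To give a feel for what a Runge-Kutta procedure does, here is a sketch of the classical fourth-order step for a single equation dy/dt = f(t, y). Note the simplification: it uses a fixed step size, not the automatic step control mentioned above, and it is not set up for the orbital data of Table 2. The names ode_rhs, rk4_step, rk4_integrate and the test equation dy/dt = −y are assumptions chosen for illustration.

```c
/* Right-hand side of the ODE dy/dt = f(t, y). */
typedef double (*ode_rhs)(double t, double y);

/* One classical fourth-order Runge-Kutta step of size h. */
static double rk4_step(ode_rhs f, double t, double y, double h)
{
    double k1 = f(t, y);
    double k2 = f(t + h / 2, y + h / 2 * k1);
    double k3 = f(t + h / 2, y + h / 2 * k2);
    double k4 = f(t + h, y + h * k3);
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4);
}

/* Integrate from t0 to t1 in `steps` equal steps (fixed step size). */
static double rk4_integrate(ode_rhs f, double t0, double y0,
                            double t1, int steps)
{
    double h = (t1 - t0) / steps;
    double t = t0;
    double y = y0;

    for (int i = 0; i < steps; ++i) {
        y = rk4_step(f, t, y, h);
        t = t0 + (i + 1) * h;
    }
    return y;
}

/* Illustrative test equation: dy/dt = -y, exact solution y0 * exp(-t). */
static double decay(double t, double y)
{
    (void)t; /* the right-hand side does not depend on t */
    return -y;
}
```

Integrating decay from t = 0 to t = 1 with y(0) = 1 in 100 steps should land very close to exp(−1). A step-controlled variant would compare two estimates of each step and shrink or grow h accordingly.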

    make arrays L[1 .. s1 + 1] and R[1 .. s2 + 1]
    for i ← 1 to s1
        do L[i] ← A[p + i − 1]
    for j ← 1 to s2
        do R[j] ← A[q + j]
    L[s1 + 1] ← ∞
    R[s2 + 1] ← ∞
    i ← 1
    j ← 1
    for k ← p to r
        do if L[i] ≤ R[j]
            then A[k] ← L[i]
                 i ← i + 1
            else A[k] ← R[j]
                 j ← j + 1

Let’s look at an example to comprehend the idea better. I found the following problem in a book that deals with problem solving. A part of the problem involves arranging an array of values [5, 2, 4, 7, 1, 3, 2, 6]. This will be our initial (given) array. Now, we can use our methodology to solve the problem. Figure 3 shows the consecutive division and merging processes.
We can imagine the process in many different ways. If the set is random, then we could have another array with the same elements in a different order. The solution is shown in Figures 4, 5 and 6. Here, Figures 4 and 5 represent the initial steps and Figure 6 represents the final output. It should be noted that the intermediate stages are not shown here, as they are similar steps that can be envisaged in one’s mind. The step-by-step procedure will also throw ample light on the sub-arrays L and R that are created.
Once you are done with simple algorithms, you can apply the same ‘black box representation’ idea (that you employ while writing


programs) and use the simple algorithms as sub-routine calls in your main algorithm (depending on the model you opt for).

Figure 4: Initial steps of sorting the array
Figure 5: Initial steps of sorting the array
Figure 6: The final output

Now we can look at another simple algorithm that can handle errors. The following code shows the multiplication of matrices:

MATRIX-MULTIPLICATION(A, B)
    if columns[A] ≠ rows[B]
        then error “This operation is not allowed”
        else for i ← 1 to rows[A]
                 do for j ← 1 to columns[B]
                        do C[i, j] ← 0
                           for k ← 1 to columns[A]
                               do C[i, j] ← C[i, j] + A[i, k] · B[k, j]
             return C

Before we move on to the complex stuff, you can test yourself by considering the following problem (Table 1). You are required to find out the maximum value of n that corresponds to each of the stipulated times. The data provided along with it is that you have f(n) in milliseconds. Try solving it!
We can see that the style remains the same when we try to write algorithms in C. If we need comparison functions that work on pointers to integers, we can have the following:

#include "compare.h"

/* Returns non-zero if the two integers pointed to are equal. */
int int_is_equal(void *vplace1, void *vplace2)
{
    int *place1;
    int *place2;

    place1 = (int *) vplace1;
    place2 = (int *) vplace2;

    return *place1 == *place2;
}

/* Returns -1, 1 or 0 if the first integer is smaller than,
   greater than or equal to the second. */
int int_is_compare(void *vplace1, void *vplace2)
{
    int *place1;
    int *place2;

    place1 = (int *) vplace1;
    place2 = (int *) vplace2;

    if (*place1 < *place2) {
        return -1;
    } else if (*place1 > *place2) {
        return 1;
    } else {
        return 0;
    }
}

You may find a reference to a header file in the program. So you need compare.h along with it to get the desired result:

#ifndef ALGORITHM_COMPARE_INT_H
#define ALGORITHM_COMPARE_INT_H

#ifdef __cplusplus
extern "C" {
#endif

int int_is_equal(void *place1, void *place2);

int int_is_compare(void *place1, void *place2);

#ifdef __cplusplus
}
#endif

#endif

There could be problems that belong to the NP-complete category. No efficient (polynomial-time) algorithm is known for solving these problems exactly. We will be discussing these aspects once we are done with topics like asymptotic notations and complex algorithms.

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does programming in the open source domain. He is the developer of V-language, a programming language that employs AI and ANN. His research work/publications are available at www.aasisvinayak.com



A Voyage to the
Kernel
Part 9

Segment – 2.2, Day 8

In the last column, we discussed some basic algorithms and methodologies. Now we will generalise the scheme of an algorithm. It will have the following properties:
• Arranged in an ordered sequence that you can number (as steps)
• Unambiguous and well-defined
• Halts in finite time (i.e., the algorithm terminates!)
And we will generally stumble upon the following algorithmic operations:
• Sequential operations: instructions are executed in order
• Conditional operations: a true/false question is asked and the next instruction is selected based on the answer
• Iterative operations (loops): the execution of a set of instructions is repeated
It should be emphasised that, as we discussed before, not every problem has a ‘good’ algorithmic solution. There are:
• Unsolvable problems (such as the halting problem), for which no algorithm exists.
• Intractable problems (such as the travelling salesman problem), for which the known algorithms take too long to solve the problem.
• Problems with no known algorithmic solution.

Pseudo code structure
The algorithm may contain the following elements:
IF-THEN-ELSE: Binary choice on a given Boolean condition is indicated by IF, THEN, ELSE, and ENDIF. The general representation is:

IF condition THEN
    sequence 1
ELSE
    sequence 2
ENDIF

Please note that the ELSE keyword and “sequence 2” are optional.
WHILE: The ‘WHILE construct’ is used to specify a loop with a test at the top, so the loop body is executed only as long as the condition is true. WHILE and ENDWHILE identify the beginning and ending of the loop. The general form for WHILE is:

WHILE condition
    sequence
ENDWHILE

CASE: In order to handle mutually exclusive conditions, we can go for CASE. CASE, OF, OTHERS, and ENDCASE are the keywords commonly used for this. The alternatives available will be represented with the help of ‘conditions’.

CASE expression OF

100  |  February 2009  |  LINUX For You  |  www.openITis.com



condition 1 : sequence 1
condition 2 : sequence 2
...
condition n : sequence n
OTHERS:
default sequence

ENDCASE

REPEAT-UNTIL: This is quite akin to WHILE, but here the test is performed at the bottom of the loop, so the sequence is executed at least once.

REPEAT
    sequence
UNTIL condition

FOR: We employ FOR and ENDFOR in our algorithm for iterating the code block a specific number of times. It is also called the “counting” loop.

FOR iteration bounds
    sequence
ENDFOR

EXCEPTION HANDLING: The following code elucidates this.

BEGIN
    statements
EXCEPTION
    WHEN exception type
        statements to handle exception
    WHEN another exception type
        statements to handle exception
END

You can also utilise methods like NESTED CONSTRUCTS and INVOKING SUBPROCEDURES to construct elegant and powerful algorithms.
We can define the following golden rules for writing algorithms:
• Use good code and good English
• Ignore unnecessary details
• Take advantage of programming shorthand
• Be context specific
• Don’t lose sight of the base model
• Always check for balance (the algorithm should be easy to implement in a language)

Implementation
Let’s write the code that produces a random number. Random number generation is an important tool in simulation building and scientific computing. In order to elucidate the implementation of the algorithm, I have written the code in Pascal.

program random_no (input, output);
const m = 100000000; m1 = 10000; q = 31415821;
{ longint is used because these constants do not fit into a 16-bit integer }
var i, p, No: longint;

{ Multiplies r and s in pieces of four digits to avoid overflow }
function mult(r, s: longint): longint;
var r1, r0, s1, s0: longint;
begin
  r1 := r div m1; r0 := r mod m1;
  s1 := s div m1; s0 := s mod m1;
  mult := (((r0*s1 + r1*s0) mod m1)*m1 + r0*s0)
end;

function random: longint;
begin
  p := (mult(p, q) + 1) mod m;
  random := p
end;

begin
  read(No, p);
  for i := 1 to No do writeln(random)
end.

It’s assumed that you are skilled in grasping the meaning of simple codes. You can try this in your preferred Pascal compiler. (Figure 1 shows the code in Free Pascal Compiler, which I installed for trial.)

Figure 1: The free Pascal compiler

Euclid’s Algorithm: A response to the queries
Many readers who tried the algorithms in the reference books (suggested in the last column) posed the question concerning the simplification of Euclid’s algorithm and the base problem. As it was given as a problem, many of ‘our passengers’ got wedged between the lines!
For those who have not tried the exercises (if you don’t have the book, you can find many online resources), here is a prologue to the problem:
Let a and b be integers, with a ≥ b ≥ 0. Using the


division with remainder property, define the integers r_0, r_1, . . . , r_{λ+1} and q_1, . . . , q_λ, where λ ≥ 0, as follows:

a = r_0,
b = r_1,
r_0 = r_1 q_1 + r_2 (0 < r_2 < r_1),
.
.
.
r_{i−1} = r_i q_i + r_{i+1} (0 < r_{i+1} < r_i),
.
.
.
r_{λ−2} = r_{λ−1} q_{λ−1} + r_λ (0 < r_λ < r_{λ−1}),
r_{λ−1} = r_λ q_λ (r_{λ+1} = 0).

From the given definitions, λ = 0 if b = 0, and λ > 0 in all other cases. Then we have r_λ = gcd(a, b). Moreover, if b > 0, then λ ≤ log b / log ϕ + 1, where ϕ := (1 + √5)/2 ≈ 1.62.

Algorithm: Given integers a and b as input, such that a ≥ b ≥ 0, we can compute d = gcd(a, b) as exemplified below:

r ← a, r′ ← b
while r′ ≠ 0 do
    r″ ← r mod r′
    (r, r′) ← (r′, r″)
d ← r
output d

Designing algorithms for specific problems
We have the Legendre differential equation as:

(1 − x^2) y″ − 2x y′ + l(l + 1) y = 0

Also, please note that the name ‘Legendre equation’ is used as well for the Diophantine equation:

ax^2 + by^2 + cz^2 = 0

And we also have the Sturm-Liouville equation that is a real second-order linear equation of the form:

d/dx [p(x) dy/dx] + [λ w(x) − q(x)] y = 0

Here, y is a function of the free variable x, and p(x), q(x) and w(x) are the functions of x. We will be handling the Sturm-Liouville problem that intends to find the value of λ. Please refer to mathworld.wolfram.com/Sturm-LiouvilleEquation.html for more information. Now our concern is to build an algorithm. Let’s do this in C.
Problem: To solve the Legendre equation with the simplest algorithm for the Sturm-Liouville equation.

/* Solving the Legendre equation with the simplest algorithm for the
   Sturm-Liouville equation
   Code written for A Voyage to Kernel - 8 */

#include <stdio.h>
#include <math.h>
#define NOMAX 1000

void salgo();

int main(void)
{
    int no, inc;
    double dl = 1e-5;               /* tolerance of the bisection */
    double u[NOMAX];
    double z, r, s, uu, t, f0, f1;

    /* Initialization of our problem */

    no = NOMAX;
    z = 2.0/(no-1);                 /* step size on the interval [-1, 1] */
    s = 1.5;
    r = 0.5;
    t = 0.5;
    inc = 0;
    u[0] = -1;
    u[1] = -1+z;
    uu = r;
    salgo(no, z, uu, u);
    f0 = u[no-1]-1;

    /* Bisection on uu until the bracket [r, s] is narrower than dl */
    while (fabs(t) > dl)
    {
        uu = (r+s)/2;
        salgo(no, z, uu, u);
        f1 = u[no-1]-1;
        if ((f0*f1) < 0)
        {
            s = uu;
            t = s-r;
        }
        else
        {
            r = uu;
            t = s-r;
            f0 = f1;
        }
        inc = inc+1;
    }
    printf("%4d %16.8lf %16.8lf %16.8lf\n", inc, uu, t, f1);


}

/* Simplest algorithm for the Sturm-Liouville equation. */
void salgo(int no, double z, double uu, double u[])
{
    int i;
    double z2, q, p, p1, x;

    q = uu*(1+uu);
    z2 = 2*z*z;
    for (i = 1; i < no-1; ++i)
    {
        x = -1+i*z;          /* x has to be set before p and p1 use it */
        p = 2*(1-x*x);
        p1 = -2*x*z;
        u[i+1] = ((2*p-z2*q)*u[i]+(p1-p)*u[i-1])/(p1+p);
    }
}

You can extend this algorithmic method to solve other problems, as many equations can be written in this form. For example, the Bessel equation, which is given by…

x^2 y″ + x y′ + (λ^2 x^2 − ν^2) y = 0

…can be represented in the Sturm-Liouville form as

(x y′)′ + (λ^2 x − ν^2/x) y = 0

We will now try out Lagrange interpolation using the Aitken method. I’m sure you must have done a simple form of this already. But this one is slightly different.

/* Main program for the Lagrange interpolation with the Aitken method. */

#include <stdio.h>
#include <math.h>
#define NOMAX 3

void aitken_method_kernel_voyage(int n, double xi[], double fi[],
                                 double x, double *f, double *dfi);

int main(void)
{
    int i;
    double f, dfi;
    double h = 0.5, x = 0.9;
    /* only the first NOMAX values of fi are used */
    double xi[NOMAX], fi[] = {1.1, 0.9, 0.7, 0.5, 0.3};

    for (i = 0; i < NOMAX; i++)
    {
        xi[i] = h*i;
    }
    aitken_method_kernel_voyage(NOMAX, xi, fi, x, &f, &dfi);
    printf("%10.6lf %10.6lf %10.6lf\n", x, f, dfi);
    return 0;
}

/* Interpolates at x from the n points (xi, fi); the result is returned
   in *f and an error estimate in *dfi. */
void aitken_method_kernel_voyage(int n, double xi[], double fi[],
                                 double x, double *f, double *dfi)
{
    int i, j;
    double fn[NOMAX];
    double x1, x2, f1, f2;

    for (i = 0; i <= (n-1); ++i)
    {
        fn[i] = fi[i];
    }
    for (i = 0; i <= (n-2); ++i)
    {
        for (j = 0; j <= (n-i-2); ++j)
        {
            x1 = xi[j];
            x2 = xi[j+i+1];
            f1 = fn[j];
            f2 = fn[j+1];
            fn[j] = (x-x1)/(x2-x1)*f2+(x-x2)/(x1-x2)*f1;
        }
    }
    *f = fn[0];
    *dfi = (fabs(*f-f1)+fabs(*f-f2))/2;
}

You can see the output by executing this code in your compiler. (You can try changing the values assigned and see what happens.)

To my readers
Thanks for your response, feedback and suggestions. But I am extremely sorry that I am not in a position to reply to all the queries made through e-mails. The major problem is that e-mails often get messed up. Hence, I may miss your e-mail. Sometimes I mark your mails, but the new e-mails often mask them. To avoid this problem, I have created a new platform for answering questions:
aasisvinayak.com/new_zone/forum.php
Considering the suggestions from readers, the next column will focus on notations and numerical analysis. But the emphasis will be on the designing of algorithms, as this is a kernel-programming column and not a section that addresses pure scientific computing.

Iterations in coding
As we discussed before, iteration plays an important role when it comes to designing algorithms. Now, we will try to find the value of Pi using this.

/* Estimating Pi - Iteration*/


#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#define STD 23742246

int main(int argc, char *argv[])
{
    int no = 0;
    double x, y;
    int i, count = 0;
    double z;
    double pi;

    printf("Enter the number of iterations to be done to estimate the value of pi: ");
    scanf("%d", &no);
    srand(STD);
    count = 0;
    for (i = 0; i < no; i++) {
        x = (double)rand()/RAND_MAX;
        y = (double)rand()/RAND_MAX;
        z = x*x+y*y;
        if (z <= 1) count++;    /* the point lies inside the quarter circle */
    }
    pi = (double)count/no*4;
    printf("No of trials= %d , The Estimated value of pi is %g \n", no, pi);
    return 0;
}

You can see that the estimated value of Pi approaches the correct value as the number of trials in the procedure increases. If you look at the code carefully, you will see that the code terminates after a finite period (provided the input is finite!):

aasisvinayak@GNU-BOX:~/Documents/Desktop$ ./pi
Enter the number of iterations to be done to estimate the value of pi: 44
No of trials= 44 , The Estimated value of pi is 2.81818
aasisvinayak@GNU-BOX:~/Documents/Desktop$ ./pi
Enter the number of iterations to be done to estimate the value of pi: 1234567
No of trials= 1234567 , The Estimated value of pi is 3.14508

We have seen the implemented codes. Now, try writing the algorithms behind the codes. As you have the code, it is quite easy!
Another way of visualising our algorithms is the Tree Model. Beginners can opt for this since this will never cause any bewilderment. Figure 2 shows a model of A SAMPLE TREE. This will lend a hand to such users to comprehend the idea of branching.

Figure 2: A SAMPLE TREE

Today’s problem
We have a theorem: If Ω be the set of all exact execution paths for a defined ‘A’ on input x, then…

Can you prove this? (Note that all symbols have the usual meaning that we used for other algorithms/problems.)

Clues:
Start with…
…and deduce
Then, try to get…
…by considering …

If you can reach this juncture, the rest will follow. For all these, you can assume that A halts with probability α on input x. If α = 1, we define the distribution P: Ω → [0, 1].
Beginners need not worry much. We will be dealing with complex problems much later in our Voyage!

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does programming in the open source domain. He is the developer of V-language, a programming language that employs AI and ANN. His research work/publications are available at www.aasisvinayak.com



A Voyage to the
Kernel
Part 10
Day 9 | Segment 2.3

Algorithms in cryptography
The study of secret communication systems has lured people of all ages. And the old methods of encrypting messages are quite popular even in literature. But our interest is centred around two aspects: cryptography and cryptanalysis. Cryptography, in plain words, is concerned with the design of secret communications systems, while the latter studies the ways to compromise secret communications systems!
We all know that when a bank upgrades its systems to incorporate IT, it has to make sure that the methods of electronic funds transfer are just as secure as funds transferred by an armoured vehicle.
You might have seen the arithmetic and string-processing algorithms that people employ in this realm, which are what beginners are expected to study.
Cryptanalysis, for sure, can place an incredible strain on the available computational resources. That is why people consider this to be a very tedious process. To comprehend this, let’s discuss a simple case of cryptography.
Let sender (S) send a message (called plaintext) to a particular receiver (R). ‘S’ converts his plaintext message to a secret form for transmission (which we may call the ciphertext) with the aid of a cryptographic algorithm (CA) and some defined key (K) parameters. ‘CA’ is the encryption method used here.
The whole procedure assumes some prior method of communication, as ‘R’ needs to know the parameters. The headache of the cryptanalyst is that he needs to decipher the plaintext from the ciphertext without knowing the key parameters.
As we discussed, one of the simplest (and one of the oldest, too!) methods for encryption is the Caesar cipher method. Here, if a character in a particular place of the word is the Nth letter of the alphabet series, it is replaced by the (N + K)th letter in the series, where K is the parameter, an integer (Caesar used K = 3!).

CAESAR(CA, N, K)
    for_all_characters
        replace character(N) by character(N + K)

You can add more statements to fix bugs (say, if you’re using English, you can specify what to do if (N + K) exceeds 26).
Well, as said before, this method is very simple. Therefore, it’s no big deal for the cryptanalyst to crack the encrypted data. Things will become more complex if we use a general table to define the substitution and then use the same for the process. But here, too, our villain can try some tricks. He may choose the first character arbitrarily, say E (as E is the most frequent letter in English text). He may also choose not to go for certain digrams such as QJ (as they never occur together in English).
You can develop the method further by using multiple look-up tables. Then, you will come across many interesting cases like the one when the key is as long as the plaintext (‘one-time pad’ case) and so on. It should be noted that if the message and key are encoded in binary, a more common scheme for position-by-position encryption is to use the “exclusive-or” function: to encrypt the plaintext, “exclusive-or” it (bit by bit) with the key.

98  |  March 2009  |  LINUX For You  |  www.openITis.com



Geometric algorithms
This methodology can be adopted to solve complex problems that are inherently geometric. It can be applied to solve problems concerning physical objects ranging from large buildings (design) and automobiles, to very large-scale integrated circuits (ICs).
But, you will soon see that even the most elementary operations (even on points) are computationally challenging. The interesting aspect is that some of these problems can readily be solved just by looking at them (and some others by applying the concepts in graph theory). If we resort to computational methods, we may have to go in for non-trivial methodologies.
This branch is relatively new and many fundamental algorithms are still being developed. Hence you can consider this as a potentially challenging and promising realm.
In this introductory piece, we’ll restrict ourselves to the two-dimensional space. If you are able to properly define any point, then we can easily manage to include complex geometrical objects, say a line (as it is a pair of points connected by a straight line segment) or a polygon (defined by an array of points).
We can represent them by:

type point = record x, y: integer end;
     line = record p1, p2: point end;

It is quite easy to work with pictures compared to numbers, especially when it comes to developing a new design (algorithm) pattern. It is also very helpful while debugging the code.
Let’s see a recursive program that will enable us to ‘draw’ a line by drawing the endpoints.

procedure draw(l: line);
var dx, dy: integer;
    p: point; l1, l2: line;
begin
  dot(l.p1.x, l.p1.y); dot(l.p2.x, l.p2.y);
  dx := l.p2.x - l.p1.x; dy := l.p2.y - l.p1.y;
  if (abs(dx) > 1) or (abs(dy) > 1) then
  begin
    { split at the midpoint p and draw both halves }
    p.x := l.p1.x + dx div 2; p.y := l.p1.y + dy div 2;
    l1.p1 := l.p1; l1.p2 := p; draw(l1);
    l2.p1 := p; l2.p2 := l.p2; draw(l2);
  end;
end;

You can see that there is a division of the space into two parts, joined by using line segments. You may stumble upon many algorithms where we will be converting geometric objects to points in a specific way. We can group them under the term ‘scan-conversion algorithms’. To get a clear picture, you may write the pseudo code to check whether two lines are intersecting. (Hint: check for a common point.)
If you can’t straight away do it, try this function to compute these lines and check whether they meet our condition:

function same_point(l: line; p1, p2: point): integer;
var dx, dy, dx1, dx2, dy1, dy2: integer;
begin
  dx := l.p2.x - l.p1.x; dy := l.p2.y - l.p1.y;
  dx1 := p1.x - l.p1.x; dy1 := p1.y - l.p1.y;
  dx2 := p2.x - l.p2.x; dy2 := p2.y - l.p2.y;
  same_point := (dx*dy1 - dy*dx1)*(dx*dy2 - dy*dx2)
end;

If the quantity (dx*dy1 − dy*dx1) is non-zero, we can say that p1 is not on the line. The product returned is positive when p1 and p2 lie on the same side of the line l, and negative when they lie on opposite sides.

A problem for beginners
Here we are not trying to address a real problem! We will look at how to produce graphical output with the help of libraries. You might have drawn ‘pictures’ in BASIC while at school, but this is not that method. In fact, our intentions are different.
Let’s define our problem: We need to draw a sphere with the help of a few straight lines.
We can use HoloDraw (see the resource links for more information) as the library for drawing the sphere and we will do the codes in Shell. We start by ‘flattening’ the sphere to a flat rectangular map.
As it is a sphere, we will meddle with the changes in terms of ‘degrees’. We also need an input file for processing by the HoloDraw. (Before you proceed, download a copy of HoloDraw and untar it into a local directory. Also make sure that you have Perl installed.)
The input file, sphere.draw, will be quite akin to the following:

color=0 1 0
# draw a line around the sphere’s equator
line: 0 0 1000, 360 0 1000
line: 0 45 1000, 360 45 1000
line: 0 -45 1000, 360 -45 1000

color=0 0 1
line: 0 90 1000, 0 -90 1000
line: 180 90 1000, 180 -90 1000

line: 30 90 1000, 30 -90 1000
line: 60 90 1000, 60 -90 1000
line: 90 90 1000, 90 -90 1000
line: 120 90 1000, 120 -90 1000
line: 150 90 1000, 150 -90 1000
line: 210 90 1000, 210 -90 1000
line: 240 90 1000, 240 -90 1000
line: 270 90 1000, 270 -90 1000
line: 300 90 1000, 300 -90 1000


line: 330 90 1000, 330 -90 1000

Here the X and Y values (which you can identify from the codes directly) are in degrees around the sphere. And Z (or some axis reference) is the sphere’s radius. As you can see, we have used different colours for east-west lines and north-south lines.
Now we will create our flat grid file from this, using the following shell code:

#!/bin/sh
/path_to_holodraw/drawwrl.pl < /location_of_input_file/sphere.draw > flatgrid.wrl

But when we draw the sphere, we have to slice our long lines into small ones, so that our sphere will have a ‘smooth’ curve. We can do that by using the ‘drawchop’ and ‘drawball’ library files:

#!/bin/sh
/path_to_holodraw/drawchop.pl x=15+15 y=15+15 < /location_of_input_file/sphere.draw |
/path_to_holodraw/drawball.pl |
/path_to_holodraw/drawwrl.pl > ballgrid.wrl

Finding (opting for) a strategy and the efficiency factor
While designing strategies it is important to consider their viability, effectiveness and efficiency. To comprehend the idea completely, consider a basic problem in quantum mechanics.
Schrödinger equation for the time-dependent wave function can be written as:

iħ ∂/∂t |Ψ(t)⟩ = H |Ψ(t)⟩

We can also write an expression for the thermal expectation value of an observable X as:

⟨X⟩ = Tr(e^{−βH} X) / Tr(e^{−βH})

You can see that the above equation is modelled by a Hamiltonian H. Classically, it is quite easy to come out with a computational method to solve such equations (say by using Monte Carlo methods). But here the problem is that the objects (say operators or matrices) in QM do not necessarily commute.
Still, we can go for models defined by:

A lattice of L sites filled with L/2 electrons with up spin, and L/2 electrons with down spin, is a physical model that easily fits into this. (Please Google the term ‘Hubbard model’ for more information about a better model.) But to find out what is really required to carry out these few steps, we need an order of magnitude for M. And by using approximation methods (like Stirling’s approximation) we can see that:

This means that the quantity M increases exponentially with 2L (approximately). And if we allocate 8 bytes per floating point number, the amount of memory we need to store a single eigenvector will turn out to be:

So if I put L = 64, the memory required will be about 10^28 GB! This means that I need about 10^28 GB to study a quantum system of just 64 particles on 64 sites. If I submit a proposal with such high values, I am sure that no funding agency will accept this.
The only way I can do the computational task is to go for an algorithmic strategy that will reduce the amount of memory needed, at the expense of more CPU time. This is further considered in relation to ‘clouds’ and their effectiveness.

We can create the VRML (Virtual Reality Modelling Language) using the ‘drawwrl’ file:

#VRML V2.0 utf8
# draw a line around the sphere’s equator
Shape {
  appearance Appearance {
    material Material {
      emissiveColor 0 1 0
      transparency 0
    }
  }
  geometry IndexedLineSet {
    coord Coordinate {
      point [
        0 0 1000,
        500 0 866.025403784439,
        866.025403784439 0 500,
        1000 0 6.12303176911189e-14,
        866.025403784439 0 -500,
        500 0 -866.025403784439,
        1.22460635382238e-13 0 -1000,
        -500 0 -866.025403784439,
        -866.025403784438 0 -500,
        -1000 0 -1.83690953073357e-13,
        -866.025403784439 0 500,
        -500 0 866.025403784438,
        -2.44921270764475e-13 0 1000
      ]
    }
    coordIndex [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 ]


  }
}

Shape {
  appearance Appearance {
    material Material {
      emissiveColor 0 1 0
      transparency 0
    }
  }
  geometry IndexedLineSet {
    coord Coordinate {
      point [
        0 707.106781186547 707.106781186548,
        353.553390593274 707.106781186547 612.372435695795,
        612.372435695795 707.106781186547 353.553390593274,
        707.106781186548 707.106781186547 4.32963728535968e-14,
        612.372435695795 707.106781186547 -353.553390593274,
        353.553390593274 707.106781186547 -612.372435695795,
        8.65927457071935e-14 707.106781186547 -707.106781186548,
        -353.553390593274 707.106781186547 -612.372435695795,
        -612.372435695794 707.106781186547 -353.553390593274,
        -707.106781186548 707.106781186547 -1.2988911856079e-13,
        -612.372435695795 707.106781186547 353.553390593274,
        -353.553390593274 707.106781186547 612.372435695794,
        -1.73185491414387e-13 707.106781186547 707.106781186548
      ]
    }
    coordIndex [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 ]
  }
}
..........
...................

This way the code goes on. (The complete code of flatgrid.wrl is available at aasisvinayak.com/new_zone/forum.php?do=viewtopic&cat=2&topic=1)

Some ‘tree’ facts
In the last column, we discussed the use of trees. I will now list some of their properties that you can employ while designing the strategy:
• There is exactly one path connecting any two nodes in a tree
• If a tree has N nodes, there will be N-1 edges
• For any binary tree with N internal nodes, there are N+1 external nodes
• The height of a given full binary tree with N internal nodes is about log N / log 2

Some evolutionary concepts: In a nutshell
Evolutionary algorithms themselves form another major branch. We will confine ourselves to some basic ideas, problem definitions and generalisations (definitions).
General single-objective optimisation problem: This is defined as minimising (or maximising) f(x) subject to g_i(x) ≤ 0, i = {1, . . . , m}, and h_j(x) = 0, j = {1, . . . , p}, x ∈ Ω. A solution minimises (or maximises) the scalar f(x) where x is an n-dimensional decision variable vector x = (x_1, . . . , x_n) from some universe Ω.
Single-objective global minimum optimisation: Given a function f : Ω ⊆ R^n → R, Ω ≠ ∅, for x* ∈ Ω the value f* = f(x*) > −∞ is called a global minimum if and only if

∀x ∈ Ω : f(x*) ≤ f(x)

x* is by definition the global minimum solution, f is the objective function, and the set Ω is the feasible region of x.

Useful facts:
• The goal of finding the global minimum solution(s) is called the global optimisation problem for a single-objective problem.
• Evolutionary multi-objective optimisation (EMO) refers to the use of evolutionary algorithms of any sort (like genetic algorithms, evolution strategies, evolutionary programming or genetic programming) to solve multi-objective optimisation problems.
• Other meta-heuristics that are being used to solve multi-objective optimisation problems include particle swarm optimisation, artificial immune systems and cultural algorithms.
• Differential evolution, ant colony, tabu search, scatter search, and memetic algorithms are other key ideas in the realm.

Key ideas:
• You must see that non-dominated points are preserved in objective space, and the associated solution points in the decision space.
• The design should be such that it should continue to allow algorithmic progress towards the Pareto Front in the objective function space.
• Maintain the diversity of points on Pareto/phenotype front (space) or of Pareto optimal solutions on decision/genotype space.
• Provide the decision maker (DM) sufficient but limited number of Pareto points for the selection (which results in decision variable values).
Please let me know if you wish to discuss these ideas more in depth.

We can generalize it as:

Shape {
  appearance Appearance {
    material Material {
      emissiveColor x x x
      transparency x
    }
  }
  geometry IndexedLineSet {
    coord Coordinate {
      point [

www.openITis.com  |  LINUX For You  |  March 2009  |  101


A Voyage to the Kernel  |  Guest Column

        x y z
      ]
    }
    coordIndex [x,y,z]
  }
}

…where x, y and z are local variables with respect to each reference point. The footer lines will be akin to:

#HISTORY# /home/aasisvinayak/Documents/Desktop/holodraw.0.37/drawchop.pl x=30+30 y=30+30
#HISTORY# /home/aasisvinayak/Documents/Desktop/holodraw.0.37/drawball.pl

NavigationInfo {
  type [ "EXAMINE", "FLY", "WALK", "ANY" ]
  speed 1.0
}

#HISTORY# /home/aasisvinayak/Documents/Desktop/holodraw.0.37/drawwrl.pl

This keeps track of the functions we employed.
[Initially, I thought of putting the generated images here, but later I felt that it was better to put the code itself because once you have the copy of the 'drawwrl' Perl source file, you can use it to analyse our input and the corresponding output.]

We have seen that with the help of libraries, we can generate complex code quite easily. So you can employ such functions, libraries and black boxes when you write your algorithms.
If you are able to achieve this, then you can straight away try geometrical algorithms.
Having completed a good portion of our new segment, we can discuss the ideas you suggested. But I think it is too late to discuss notations (and advanced ideas in numerical computation) today. So wait for the forthcoming issues, in which we will address them.

Resources
• http://simkin.asu.edu/holodraw/download.html
• http://www.perl.com/
• http://www.dmoz.org/Computers/Software/Internet/Clients/VRML/Browser_Plugins/
• http://www.web3d.org/x3d/vrml/tools/viewers_and_browsers/

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does programming in the open source domain. He is the developer of V-language, a programming language that employs AI and ANN. His research work/publications are available at www.aasisvinayak.com



A Voyage to the
Kernel
Part 11
Segment 2.4, Day 10

We are winding up the segment with this column. Here I shall address some of the aspects we skipped in the previous issues. We will also discuss some shell coding examples towards the end of the segment.

More numerical methods
Let's try to handle some high-level mathematical methods of computing. We have meddled with the discrete Fourier transform (DFT or, simply, FT) of a complex sequence a = [a(0), a(1), ..., a(n−1)] of length n. It is essentially the sequence c = [c(0), c(1), ..., c(n−1)] defined by c = F[a].
This can be written as:

$$c(k) = \sum_{x=0}^{n-1} a(x)\, e^{+2\pi \mathrm{i}\, k x / n}, \qquad k = 0, 1, \ldots, n-1$$

Similarly, the back-transform (or inverse discrete Fourier transform, IDFT or IFT) can be represented as:

$$a(x) = \frac{1}{n} \sum_{k=0}^{n-1} c(k)\, e^{-2\pi \mathrm{i}\, k x / n}, \qquad x = 0, 1, \ldots, n-1$$

(The sign convention and the placement of the normalisation factor vary between texts; the routine below takes the sign as its third argument and applies no normalisation.)

Now, our intention is to come out with a straightforward implementation of the discrete Fourier transform. The computation of n sums (each of n terms) requires about n^2 operations.
We have the following algorithm for the purpose:

void ft(Complex *f, long n, int i)
// i is the sign of the exponent (+1 or -1)
{
    Complex h[n];
    const double ph0 = i*2.0*M_PI/n;
    for (long u=0; u<n; ++u)
    {
        Complex t = 0.0;
        for (long p=0; p<n; ++p)
        {
            // SinCos(phi) is assumed to return cos(phi) + i*sin(phi)
            t += f[p] * SinCos(ph0*p*u);
        }
        h[u] = t;
    }
    copy(h, f, n);
}

There are similarly straightforward sorting algorithms whose cost scales as about n^2, where n is the size of the array to be sorted. The method used here is selection sort: we find the minimum of the array and swap it with the first element, and then repeat the procedure for the remaining elements, as illustrated below:

void voyage_sort(Type *f, userlong n)
{
    for (userlong i=0; i<n; ++i)
    {
        Type z = f[i];
        userlong m = i;
        userlong j = n;
        while ( --j > i )
        {
            if ( f[j]<z )
            {
                m = j;
                z = f[m];
            }
        }
        swap(f[i], f[m]);
    }
}

One can see that 'userlong m = i;' initialises the position of the minimum and the 'while ( --j > i )' loop does the searching for the minimum.
Now we can see how a binary search algorithm works:

102  |  April 2009  | LINUX For You  |  www.openITis.com



userlong binary_search(const Type *f, userlong n, const Type z)
{
    userlong nlo=0, nmi=n-1;
    while ( nlo != nmi )
    {
        userlong t = (nmi+nlo)/2;
        if ( f[t] < z )  nlo = t + 1;
        else  nmi = t;
    }
    if ( f[nmi]==z )  return nmi;
    else  return ~0UL;
}

The above code works by repeatedly halving the range of the (sorted) data that is searched; it must not be called with n equal to zero. It returns ~0UL when z is not found.
To quantify it, consider an array of floating point numbers that are equally distributed in a given interval [with min(v) and max(v)]. If the value we are looking for equals v, we can go for:

$$\frac{n - \min(n)}{\max(n) - \min(n)} \approx \frac{v - \min(v)}{\max(v) - \min(v)}$$

Here n represents the index, min(n) and max(n) represent the index limits, and min(v) and max(v) are the smallest and largest values. We can obtain an equation for n as:

$$n \approx \min(n) + \bigl(\max(n) - \min(n)\bigr)\, \frac{v - \min(v)}{\max(v) - \min(v)}$$

This relation will be useful while writing the related algorithms. Just as with the previous case, we can go for an approximated search, as shown below:

userlong binarysearch_approx(const Type *f, userlong n, const Type v, Type da)
// binarysearch_ge() and binarysearch_le() are helper routines not listed in this column
{
    userlong p = binarysearch_ge(f, n, v-da);
    if ( p<n )  p = binarysearch_le(f+p, n-p, v+da);
    return p;
}

This routine is meaningful only when it deals with the float or double types.

Shell codes
We are going to discuss some more shell scripts that might be useful if you want to add extra features to your application (based on a custom algorithm). These scripts essentially beautify your mother program.
Here is a script that helps you to schedule events:

#!/bin/sh

schedule_file="$HOME/.schedule"

check_day_name()
{
  case $(echo $1 | tr '[[:upper:]]' '[[:lower:]]') in
    sun*|mon*|tue*|wed*|thu*|fri*|sat*) retval=0 ;;
    * ) retval=1 ;;
  esac
  return $retval
}

check_month_name()
{
  case $(echo $1 | tr '[[:upper:]]' '[[:lower:]]') in
    jan*|feb*|mar*|apr*|may*|jun*) return 0 ;;
    jul*|aug*|sep*|oct*|nov*|dec*) return 0 ;;
    * ) return 1 ;;
  esac
}

normalize()
{
  # First character in upper case, the next two in lower case
  echo -n $1 | cut -c1 | tr '[[:lower:]]' '[[:upper:]]'
  echo $1 | cut -c2-3| tr '[[:upper:]]' '[[:lower:]]'
}

if [ ! -w $HOME ] ; then
  echo "$0: The script can't write in your home directory ($HOME)" >&2
  exit 1
fi

echo "Schedule: Voyage to Kernel Code"
echo -n "Date of event (in proper format): "
read word1 word2 word3 junk

if check_day_name $word1 ; then
  if [ ! -z "$word2" ] ; then
    echo "Invalid dayname format: please specify the day name." >&2
    exit 1
  fi
  date="$(normalize $word1)"

else

  if [ -z "$word2" ] ; then
    echo "Invalid dayname format: unknown day name specified" >&2
    exit 1
  fi

  if [ ! -z "$(echo $word1|sed 's/[[:digit:]]//g')" ] ; then
    echo "Invalid date format: please specify day first, by day number" >&2
    exit 1
  fi

  if [ "$word1" -lt 1 -o "$word1" -gt 31 ] ; then
    echo "Invalid date format: day number can only be in range 1-31" >&2
    exit 1


Notations
All the definitions given here are provided by the National Institute of Standards and Technology.

O notation
This is a theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, saying some equation f(n) = O(g(n)) means it is less than some constant multiple of g(n). The notation is read, "f of n is big oh of g of n".
Formal definition: f(n) = O(g(n)) means there are positive constants c and k, such that 0 ≤ f(n) ≤ cg(n) for all n ≥ k. The values of c and k must be fixed for the function f and must not depend on n.
Also known as O.

Ω notation
This is a theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, saying some equation f(n) = Ω(g(n)) means it is more than some constant multiple of g(n).
Formal definition: f(n) = Ω(g(n)) means there are positive constants c and k, such that 0 ≤ cg(n) ≤ f(n) for all n ≥ k. The values of c and k must be fixed for the function f and must not depend on n.

ω notation
This is a theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, saying some equation f(n) = ω(g(n)) means g(n) becomes insignificant relative to f(n) as n goes to infinity.
Formal definition: f(n) = ω(g(n)) means that for any positive constant c, there exists a constant k, such that 0 ≤ cg(n) < f(n) for all n ≥ k. The value of k must not depend on n, but may depend on c.

Θ notation
This is a theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, saying some equation f(n) = Θ(g(n)) means it is within a constant multiple of g(n). The equation is read as, "f of n is theta g of n".
Formal definition: f(n) = Θ(g(n)) means there are positive constants c1, c2, and k, such that 0 ≤ c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ k. The values of c1, c2, and k must be fixed for the function f and must not depend on n.

o notation
This is a theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, saying some equation f(n) = o(g(n)) means f(n) becomes insignificant relative to g(n) as n approaches infinity. The notation is read as, "f of n is little oh of g of n".
Formal definition: f(n) = o(g(n)) means for all c > 0 there exists some k > 0 such that 0 ≤ f(n) < cg(n) for all n ≥ k. The value of k must not depend on n, but may depend on c.

Resources:
• Finding the Big-O of a function: www.eecs.harvard.edu/~ellard/Q-97/HTML/root/node7.html
• Big-O notation: en.wikipedia.org/wiki/Big_O_notation

  fi

  if ! check_month_name $word2 ; then
    echo "Invalid date format: unknown month name specified." >&2
    exit 1
  fi

  word2="$(normalize $word2)"

  if [ -z "$word3" ] ; then
    date="$word1$word2"
  else
    if [ ! -z "$(echo $word3|sed 's/[[:digit:]]//g')" ] ; then
      echo "Invalid date format: third field should be a year." >&2
      exit 1
    elif [ $word3 -lt 2009 -o $word3 -gt 3000 ] ; then
      echo "Invalid date format: year value should be between 2009-3000" >&2
      exit 1
    fi


    date="$word1$word2$word3"
  fi
fi

echo -n "Add description: "
read description

echo "$(echo $date|sed 's/ //g')|$description" >> $schedule_file

exit 0

The script appends each event to a .schedule file in your home directory, one 'date|description' entry per line, which can then be read back with any suitable command.
Here is another script that allows the user to change the date (from the application):

#!/bin/sh

# $1 = field name, $2 = default value, $3 = maximum value, $4 = required number of digits
userinput()
{
  echo -n "$1 [$2] : "
  read input
  if [ ${input:=$2} -gt $3 ] ; then
    echo "$0: $1 $input is found to be invalid"; exit 0
  elif [ "$(( $(echo $input | wc -c) - 1 ))" -lt $4 ] ; then
    echo "$0: $1 $input is very short. Please specify $4 digits"; exit 0
  fi
  eval $1=$input
}

eval $(date "+nyear=%Y nmon=%m nday=%d nhr=%H nmin=%M")

userinput year $nyear 5000 4
userinput month $nmon 12 2
userinput day $nday 31 2
userinput hour $nhr 24 2
userinput minute $nmin 59 2

# GNU date (as found on Linux) expects the order MMDDhhmmYYYY
format="$month$day$hour$minute$year"

echo "Setting date to $format. Please enter the root password:"
sudo date $format

exit 0

Now, coming to the server side, if you wish to analyse your Web server logs, you can use a suitable shell script (you can use the method while writing solutions for people working on the server side). For example, the following script analyses the hits from a search engine:

#!/bin/sh

maxmatches=50
count=0
temp="/tmp/$(basename $0).$$"

trap "/bin/rm -f $temp" 0

if [ $# -lt 2 -o ! -f "$1" ] ; then
  echo "Please follow this format: $(basename $0) logfile domain" >&2
  exit 1
fi

for URL in $(awk '{ if (length($11) > 4) { print $11 } }' "$1" | \
  grep $2)
do
  args="$(echo $URL | cut -d\? -f2 | tr '&' '\n' | \
    grep -E '(^q=|^sid=|^p=|query=|item=|ask=|name=|topic=)' | \
    cut -d= -f2)"
  echo $args | sed -e 's/+/ /g' -e 's/"//g' >> $temp
  count="$(( $count + 1 ))"
done
echo "$2 searches from ${1}:"

sort $temp | uniq -c | sort -rn | head -$maxmatches | sed 's/^/ /g'

exit 0

Now, let's see how to prepare for the installation of PHP on your server (it can be used if your application needs PHP and it is not currently installed). The script uses bash features (the function keyword, &> redirection and read -p), so it is run with bash:

#!/bin/bash

set -e

export DOMAIN="www.yourdomain.com"

SOURCEDIR=${HOME}/sourcefile
INSTALLDIR=${HOME}/php5.2.9
DISTDIR=${HOME}/dist

PHP5="php-5.2.9"
LIBICONV="libiconv-1.12"
ZLIB="zlib-1.2.3"
CURL="curl-7.17.1"
LIBIDN="libidn-0.6.8"
LIBMCRYPT="libmcrypt-2.5.7"
LIBXML2="libxml2-2.6.27"
LIBXSLT="libxslt-1.1.18"
MHASH="mhash-0.9.9"
CCLIENT="imap-2004g"
CCLIENT_DIR="imap-2004g"
FREETYPE="freetype-2.3.8"

export PATH=${INSTALLDIR}/bin:$PATH

function voyage_funtion_unpack () {

  if [ -f $DISTDIR/$1* ] ; then


    echo "Extracting $1";
    zcat ${DISTDIR}/$1* | tar -xvf - &>/dev/null;
    echo "Finished"; wait
  fi
}

function voyage_funtion_grab () {

  echo `basename $1`
  curl -L --retry 30 --max-time 3600 --retry-delay 60 -# -f --max-redirs 5 --remote-name "$1"
}

echo " ---- Downloads and unpacks all prerequisite packages ---"
echo " --- **SOURCEDIR and DISTDIR will be deleted---"
read -p " (Press any key to continue)" temp;
echo;

if [ -d "$SOURCEDIR" ] || [ -d "$DISTDIR" ];then
  echo
  echo "--- Cleaning up. Please wait ---"
  rm -rf $SOURCEDIR $DISTDIR &>/dev/null
  echo "Finished"
  wait
fi

mkdir -p ${SOURCEDIR} ${INSTALLDIR} ${DISTDIR} &>/dev/null

echo
echo "--- Downloading required packages. Please Wait ---"
echo

cd ${DISTDIR}
voyage_funtion_grab http://us.php.net/distributions/${PHP5}.tar.gz
voyage_funtion_grab http://mirrors.usc.edu/pub/gnu/libiconv/${LIBICONV}.tar.gz
voyage_funtion_grab http://umn.dl.sourceforge.net/sourceforge/mcrypt/${LIBMCRYPT}.tar.gz
voyage_funtion_grab ftp://xmlsoft.org/libxml2/${LIBXML2}.tar.gz
voyage_funtion_grab http://curl.askapache.com/download/${CURL}.tar.gz
voyage_funtion_grab http://easynews.dl.sourceforge.net/sourceforge/freetype/${FREETYPE}.tar.gz
voyage_funtion_grab ftp://alpha.gnu.org/pub/gnu/libidn/${LIBIDN}.tar.gz
voyage_funtion_grab ftp://ftp.cac.washington.edu/imap/old/${CCLIENT}.tar.Z
voyage_funtion_grab ftp://xmlsoft.org/libxml2/${LIBXSLT}.tar.gz
voyage_funtion_grab http://umn.dl.sourceforge.net/sourceforge/mhash/${MHASH}.tar.gz
voyage_funtion_grab http://www.zlib.net/${ZLIB}.tar.gz

wait
echo "Finished grabbing"

echo "--- The script is unpacking downloaded archives. Please wait ---"

cd ${SOURCEDIR}
voyage_funtion_unpack ${PHP5}
voyage_funtion_unpack ${LIBICONV}
voyage_funtion_unpack ${LIBMCRYPT}
voyage_funtion_unpack ${LIBXML2}
voyage_funtion_unpack ${LIBXSLT}
voyage_funtion_unpack ${MHASH}
voyage_funtion_unpack ${ZLIB}
voyage_funtion_unpack ${CURL}
voyage_funtion_unpack ${LIBIDN}
voyage_funtion_unpack ${CCLIENT}
voyage_funtion_unpack ${FREETYPE}
wait
echo "--------------------------------------------------"
echo "-- Successfully downloaded and unpacked prerequisites for installation --"
echo "--------------------------------------------------"

exit 0;

An 'extra' code
The iTunes library lists all your albums in a well-organised form. If you can extract that list, you can share it with others. The following script will give you an outline:

#!/bin/sh

itunehome="$HOME/Music/iTunes"
ituneconfig="$itunehome/iTunes_Music_Library.xml"

musiclibrary="/$(grep '>Music Folder<' "$ituneconfig" | cut -d/ -f5- | \
  cut -d\< -f1 | sed 's/%20/ /g')"

echo "Music library is located at $musiclibrary"

if [ ! -d "$musiclibrary" ] ; then
  echo "$0: Music library $musiclibrary is not a directory?" >&2
  exit 1
fi

exec find "$musiclibrary" -type d -mindepth 2 -maxdepth 2 \! -name '.*' -print | sed "s|$musiclibrary/||"

With this, we have come to the end of our second segment. I believe we have discussed almost all the topics in the realm. If you have any suggestions, please drop me a line. From the next column onwards, we will address kernel related topics.

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does programming in the open source domain. He is the developer of V-language, a programming language that employs AI and ANN. His research work/publications are available at www.aasisvinayak.com
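As a companion to the sorting and searching routines discussed in this part, here is a minimal, self-contained C sketch of the same two ideas: a selection sort followed by a binary search on the sorted array. It is an illustration only, not the column's own code. The names voyage_selection_sort, voyage_binary_search and VOYAGE_NOT_FOUND are assumptions made for this example, the element type is fixed to double, and (unlike the listing in the column) the search tolerates n equal to zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel for "not found"; plays the role of ~0UL in the column's listing. */
#define VOYAGE_NOT_FOUND ((size_t)-1)

/* Selection sort: for each position i, find the minimum of f[i..n-1]
   and swap it into place. About n^2/2 comparisons. */
void voyage_selection_sort(double *f, size_t n)
{
    assert(f != NULL || n == 0);
    for (size_t i = 0; i + 1 < n; ++i) {
        size_t m = i;                       /* position of the minimum so far */
        for (size_t j = i + 1; j < n; ++j)
            if (f[j] < f[m])
                m = j;
        double tmp = f[i];
        f[i] = f[m];
        f[m] = tmp;
    }
}

/* Binary search in an ascending array: returns the index of the first
   element equal to z, or VOYAGE_NOT_FOUND if z is absent. */
size_t voyage_binary_search(const double *f, size_t n, double z)
{
    if (n == 0)
        return VOYAGE_NOT_FOUND;
    size_t lo = 0, hi = n - 1;
    while (lo != hi) {
        size_t mid = lo + (hi - lo) / 2;    /* avoids overflow of lo + hi */
        if (f[mid] < z)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (f[hi] == z) ? hi : VOYAGE_NOT_FOUND;
}
```

Sort the array once with voyage_selection_sort, then call voyage_binary_search as often as needed; each search costs about log2(n) comparisons, against about n^2/2 for the sort.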



A Voyage to the
Kernel
Part 12

Segment 3.1, Day 11

We're entering a new phase in our journey: kernel programming. In the first part, we'll cover a broad introduction to the Linux platform for newbies, along with some history.

The Linux platform
The UNIX platform showed us the power of the multi-layer security architecture, and the beauty of a well-organised structure in terms of subsystems and layers. This is perhaps one aspect that inspired Linus Torvalds, who was then studying computer science at the University of Helsinki, to develop a free operating system. (It should be noted that there is still an ongoing project to remove the non-free portion of the kernel code.)
In 1991 he wrote:

From: torvalds@klaava.Helsinki.FI (Linus Benedict Torvalds)
Newsgroups: comp.os.minix
Subject: Gcc-1.40 and a posix-question
Message-ID: <1991Jul3.100050.9886@klaava.Helsinki.FI>
Date: 3 Jul 91 10:00:50 GMT
Hello netlanders,
Due to a project I'm working on (in minix), I'm interested in the posix
standard definition. Could somebody please point me to a (preferably)
machine-readable format of the latest posix rules? Ftp-sites would be
nice.

By the 90s, home PCs were powerful enough to run a full-blown UNIX OS. So Linus thought it would be a good idea to come out with a freely available academic version of UNIX. He modelled his project on Minix. To quote him again:

Date: Aug 26 1991, 11:12 am
Subject: What would you like to see most in minix?
To: comp.os.minix

Hello everybody out there using minix -

I'm doing a (free) operating system (just a hobby, won't be big and
professional like gnu) for 386(486) AT clones. This has been brewing
since april, and is starting to get ready. I'd like any feedback on
things people like/dislike in minix, as my OS resembles it somewhat
(same physical layout of the file-system (due to practical reasons)
among other things).

I've currently ported bash(1.08) and gcc(1.40), and things seem to work.
This implies that I'll get something practical within a few months, and
I'd like to know what features most people would want. Any suggestions
are welcome, but I won't promise I'll implement them :-)

Linus (torva...@kruuna.helsinki.fi)

PS. Yes - it's free of any minix code, and it has a multi-threaded fs.
It is NOT portable (uses 386 task switching etc), and it probably never
will support anything other than AT-harddisks, as that's all I have
:-(.

He wrote the first Linux kernel in 1991. Two years after Linus' post, there were 12,000 Linux users. Linux is essentially a full UNIX clone. And now the Linux kernel is over six million lines of code! The main reason Linux gained its popularity is that it was released under the GNU GPL licence (arguably one of the strongest 'copyleft' or quid pro quo licences). Later, Slackware, Red Hat, SuSE, Debian, et al, sprang up to provide packaged

102  |  May 2009  |  LINUX For You  |  www.LinuxForU.com



Linux distributions. This further spurred the growth of the GNU/Linux community.
Some of the earlier large-scale users include Amazon, the US Post Office and the German Army. It is interesting to note that Linux machine clusters were used while making movies like Titanic and Shrek. Today, we have Linux on gadgets like PDAs, mobiles, embedded applications and even on wristwatches. The introduction of 3D acceleration support, support for USB devices, and single-click updates of systems and packages made Linux more popular. Desktop users can now log in graphically and start all applications without typing a single character on a terminal. At the same time, you still have the ability to access the core of the system.
Here are a few other utilities and programs (mostly from GNU) that added power to Linux:
• Bash: The GNU shell
• GCC: The GNU C compiler
• coreutils: A set of basic UNIX-style utilities
• findutils: For searching and finding files
• GDB: The GNU debugger
• fontutils: For converting fonts from one format to another or making new fonts
• Emacs: A very powerful editor
• Ghostscript and Ghostview: An interpreter and graphical front-end for PostScript files
• GNU Photo: Software to manipulate the inputs from digital cameras
• Octave: To perform numerical computations
• GNOME and KDE: GUI desktop environments
• GNU SQL: A relational database system
• Radius: A remote authentication and accounting server

[Figure 1: A typical lsmod output]

An introduction to the tech side
The Linux kernel is monolithic with module support. Beginners may not fully appreciate the meaning of this statement. A monolithic kernel is essentially a single large complex "do-it-yourself" kernel program, which has several different logical entities, and these run in kernel mode. The Linux kernel has a modular design. When you boot the system, only a minimal resident kernel is loaded into memory. When you request a feature that is not in the kernel, a kernel module (or driver) is dynamically loaded into memory. The object code normally has a collection of functions that implements a file system, a device driver or any other feature at the kernel's upper layer. This has the following advantages: minimal main memory usage, modularised structure and platform independence.

Our first module
Let's begin the technical side by writing a module:

#include <linux/module.h>
#include <linux/kernel.h>   /* printk() */
#include <linux/init.h>     /* module_init(), module_exit() */

MODULE_AUTHOR("Aasis Vinayak PG");
MODULE_LICENSE("GPL");      /* must be a licence string the kernel recognises */

static int new__init(void)
{
    printk("Let's begin our new segment!\n");
    return 0;
}

static void goodbye__old(void)
{
    printk("Adieu to segment 2!\n");
}

module_init(new__init);
module_exit(goodbye__old);

We have written a module. Let's save this as module1.c in the home directory. Now we need a makefile (the object name must match the source file name, and the command lines must start with a tab):

obj-m += module1.o
all:
	make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) clean
clean-files := Module.symvers

Before proceeding, check the kernel version you are using:

aasisvinayak@GNU-BOX:~$ uname -r
2.6.27-7-generic

Now we can make our module:

aasisvinayak@GNU-BOX:~$ make -C /lib/modules/`uname -r`/build/ M=`pwd`
make: Entering directory `/usr/src/linux-headers-2.6.27-7-generic'
  CC [M]  /home/aasisvinayak/module1.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /home/aasisvinayak/module1.mod.o
  LD [M]  /home/aasisvinayak/module1.ko
make: Leaving directory `/usr/src/linux-headers-2.6.27-7-generic'

You may change the directory by changing the pwd


command to your directory. We can check our module by issuing the following commands:

aasisvinayak@GNU-BOX:~$ sudo dmesg -c > /dev/null
aasisvinayak@GNU-BOX:~$ sudo insmod module1.ko
aasisvinayak@GNU-BOX:~$ sudo dmesg -c
[74091.826941] Let's begin our new segment!
aasisvinayak@GNU-BOX:~$ sudo rmmod module1
aasisvinayak@GNU-BOX:~$ sudo dmesg -c
[74119.021887] Adieu to segment 2!

You can see that a module1.mod.c file was also created in the process:

#include <linux/module.h>
#include <linux/vermagic.h>
#include <linux/compiler.h>

MODULE_INFO(vermagic, VERMAGIC_STRING);

struct module __this_module
__attribute__((section(".gnu.linkonce.this_module"))) = {
 .name = KBUILD_MODNAME,
 .init = init_module,
#ifdef CONFIG_MODULE_UNLOAD
 .exit = cleanup_module,
#endif
 .arch = MODULE_ARCH_INIT,
};

static const struct modversion_info ____versions[]
__used
__attribute__((section("__versions"))) = {
	{ 0xa257c5a3, "struct_module" },
	{ 0xb72397d5, "printk" },
	{ 0xb4390f9a, "mcount" },
};

static const char __module_depends[]
__used
__attribute__((section(".modinfo"))) =
"depends=";

MODULE_INFO(srcversion, "4675BE4AD96DEF402B04BD1");

You will find that the code requires the following files as well:

#include <linux/list.h>
#include <linux/stat.h>
#include <linux/compiler.h>
#include <linux/cache.h>
#include <linux/kmod.h>
#include <linux/elf.h>
#include <linux/stringify.h>
#include <linux/kobject.h>
#include <linux/moduleparam.h>
#include <linux/marker.h>
#include <asm/local.h>
#include <asm/module.h>

If you are an experienced programmer, then by looking at the dependencies you will know how the new file has been created. Novice users need not worry, as this column will address everything from scratch. For the time being, you can just explore the directories /lib/modules/2.x.x-version and /usr/src/linux-headers-2.x.x-version.
As I mentioned before, you can load a module by issuing commands. lsmod will list the loaded modules (with a small description of their current usage) in your terminal. Figure 1 shows the result of the execution. You can also view the list of modules to load at boot time by checking the /etc/modules file. It will have entries similar to the ones listed below:

# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line.

fuse
lp
sbp2

# Generated by sensors-detect on Wed Jan 14 21:14:15 2009
# Chip drivers
coretemp

You can use modprobe to load a module together with the modules it depends on. Its usage is shown below:

Usage: modprobe [-v] [-V] [-C config-file] [-n] [-i] [-q] [-Q] [-b] [-o <modname>] [ --dump-modversions ] <modname> [parameters...]
modprobe -r [-n] [-i] [-v] <modulename> ...
modprobe -l -t <dirname> [ -a <modulename> ...]

As this is an introductory column, the kernel build process (which has five parts, viz. Makefile, .config, arch/$(ARCH)/Makefile, scripts/Makefile, Kbuild Makefiles) will be explained in detail in subsequent articles.

Compiling the kernel
Now we will look at how to compile a kernel:
• Download the latest Linux source. If you have installed Linux with the kernel source option, the source will already be there. You may copy the kernel source to a safe location so that you can restore it later. It is a good idea to back up your /boot directory as well, or at least the kernel image. On my machine, the source was located in /usr/src/linux-2.6.27-7.
• Create a soft link to your Linux folder, so that you


have /usr/src/linux:
ln -s /usr/src/linux-2.6.27-7 /usr/src/linux
• Make the necessary changes to the kernel
• make clean
• make mrproper
• make clean
• Configure the modules. You have to specify the modules to be loaded and those to be compiled in the kernel. It would be ideal to start with your existing configuration file.
• make dep (this step is needed only for 2.4 and earlier kernels; 2.6 kernels track dependencies automatically)
• Modify your Makefile EXTRAVERSION tag to give a unique name to your kernel
• Compile the kernel image
nohup make bzImage &
tail -f nohup.out
• Compile the modules
nohup make modules 1> modules.out 2> modules.err &
tail -f modules.err
• make modules_install
• Copy your kernel image and config file to the /boot directory (from 2.6.24 onwards the image is found under arch/x86/boot rather than arch/i386/boot):
cp arch/i386/boot/bzImage /boot/cpKernel
cp .config /boot/cpKernelConfig
• Create a new initrd image so that the modules can be loaded while booting. The modules created in the make modules step will be located in /lib/modules:
/sbin/mkinitrd /boot/initrd-2.6.27-7cpKernel.img /lib/modules/2.6.27-7cpKernel
• Now you can edit your /boot/grub/menu.lst file (if you are using the GRUB bootloader), so that while booting you can select which kernel to start. Here is a sample entry in /boot/grub/menu.lst:

timeout 10

splashimage=/boot/grub/splashimages/firework.xpm.gz

title VINU, kernel 2.6.27-7
uuid c21385ba-7c68-4ac0-924f-8bfafdaddc5f
kernel /boot/vmlinuz-2.6.27-7 root=UUID=c21385ba-7c68-4ac0-924f-8bfafdaddc5f ro quiet splash
initrd /boot/initrd.img-2.6.27-7
quiet

title VINU, kernel 2.6.27-7
uuid c21385ba-7c68-4ac0-924f-8bfafdaddc5f
kernel /boot/vmlinuz-2.6.27-7 root=UUID=c21385ba-7c68-4ac0-924f-8bfafdaddc5f ro single
initrd /boot/initrd.img-2.6.27-7

title VINU, memtest86+
uuid c21385ba-7c68-4ac0-924f-8bfafdaddc5f
kernel /boot/memtest86+.bin
quiet

• (Optional step) Delete the entry for the old kernel version in the config file
• Reboot. When the machine restarts, you can select the new kernel!
This is the standard procedure. There are simpler methods as well. (For example, in the case of Debian-based distros, you have a very easy method. Just Google it and find the resources.)

Kernel structure
Within the kernel layer, Linux is composed of five (or six, based on the classification style) major subsystems: the process scheduler (sched), the memory manager (mm), the virtual file system (vfs), the network interface (net), and the inter-process communication (ipc).

Organisation of code
We shall now discuss the way in which the source layout is organised. If you go to the root of the source tree and list the contents, you will see the following directories:
• arch – Contains architecture-specific files
• block – Has the implementation of I/O scheduling algorithms
• crypto – Has the implementation for cipher operations and contains the cryptographic API for implementing encryption algorithms
• Documentation – Contains descriptions about various kernel subsystems
• drivers – Device drivers for various device classes and peripheral controllers can be found here
• fs – Contains the implementation of file systems
• include – Kernel header files
• init – High-level initialisation and start-up code
• ipc – Contains support for Inter-Process Communication (IPC) mechanisms
• kernel – Architecture-independent portions of the base kernel
• lib – Has the library routines
• mm – Holds the memory management implementation
• net – Networking protocols
• scripts – Scripts used for kernel build
• security – Holds the framework for security
• sound – Linux audio subsystem
• usr – Has the initramfs implementation
We will dedicate a few more sessions to reviewing the kernel structure before we meddle with programming. So look out for the forthcoming columns to hack the kernel!

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does programming in the open source domain. He is the developer of V-language, a programming language that employs AI and ANN. His research work/publications are available at www.aasisvinayak.com
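Both the makefile and the manual make command in this part rely on `uname -r` to locate the kernel build tree under /lib/modules. As a small user-space illustration (not kernel code), the sketch below obtains the same release string through the uname(2) system call and assembles the corresponding path. The function name voyage_module_build_dir is an assumption made for this example, and whether the directory actually exists depends on whether the kernel headers are installed.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Write "/lib/modules/<running kernel release>/build" into buf.
   Returns 0 on success, -1 if uname() fails or buf is too small.
   The function name is hypothetical; only uname(2) and snprintf() are standard. */
int voyage_module_build_dir(char *buf, size_t len)
{
    struct utsname u;

    assert(buf != NULL && len > 0);
    if (uname(&u) != 0)
        return -1;

    /* u.release holds the same string that `uname -r` prints */
    int written = snprintf(buf, len, "/lib/modules/%s/build", u.release);
    if (written < 0 || (size_t)written >= len)
        return -1;
    return 0;
}
```

On the author's machine, where `uname -r` reports 2.6.27-7-generic, the buffer would hold /lib/modules/2.6.27-7-generic/build, which is the directory passed to make with -C.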



A Voyage to the
Kernel
Part 13
Segment 3.2: Day 12
In the last column we focused on some of the fundamentals in kernel programming. We
are going to dedicate today’s voyage to reviewing some more basic literature.

As I mentioned before, present day CPUs can run in two modes: kernel mode and user mode. Interrupt handlers and operating system services are examples of code that runs in kernel mode. In this mode, free access to the entire memory and to the device registers is supported with the help of an 'extended set' of instructions. When the CPU runs in user mode, on the other hand, it has access only to a specific restricted set of instructions (and cannot use the entire memory). These two modes are used for security reasons. This, in turn, provides reliability to the operating system. Normally, a program runs in user mode (which is 'safer'). When it needs more utilities and restricted resources, it will switch over to kernel mode. System calls are used to access kernel utilities and are essentially operating system services. These software interrupts are processed by the OS in kernel mode.
The OS maintains a list of system calls, with respective pointers to the functions that implement the calls in the kernel. Here, you can see how the list of such system calls is obtained (from the syscall.h file):

#ifndef _SYSCALL_H
#define _SYSCALL_H 1

/* This file should list the numbers of the system calls the system knows.
   But instead of duplicating this we use the information available
   from the kernel sources. */
#include <asm/unistd.h>

#ifndef _LIBC
/* The Linux kernel header file defines macros `__NR_<name>', but some
   programs expect the traditional form `SYS_<name>'. So in building libc
   we scan the kernel's list and produce <bits/syscall.h> with macros for
   all the `SYS_' names. */
# include <bits/syscall.h>
#endif

#endif

This code is taken from the GNU C library. Here, you can see a dependency file, <bits/syscall.h>. You may find some of the declarations it contains in the sidebox on 'syscalls'.
OS services can be accessed using these calls. In the last column, we discussed the module programming part. There we stumbled upon two types of modules: essential and loadable. As the name suggests, the loadable ones can be loaded (or unloaded) based on the needs of the user (or program). These modules can provide extra utilities and modes to the kernel. As most readers would be aware of the basic differences between a micro kernel and a monolithic one, you can understand why some people argue for the micro kernel model! If a new functionality offered by the 'module' has to be added directly to the original code, then it will be tedious, as you need to rebuild the kernel every time you add a new portion of 'extra code'!

102  |  June 2009  |  LINUX For You  |  www.LinuxForU.com


______________________________________________________________________________ Guest Column  |  A Voyage to the Kernel

A few important syscall declarations
• #define SYS_adjtimex __NR_adjtimex
• #define SYS_afs_syscall __NR_afs_syscall
• #define SYS_alarm __NR_alarm
• #define SYS_brk __NR_brk
• #define SYS_capget __NR_capget
• #define SYS_capset __NR_capset
• #define SYS_chdir __NR_chdir
• #define SYS_get_mempolicy __NR_get_mempolicy
• #define SYS_get_robust_list __NR_get_robust_list
• #define SYS_get_thread_area __NR_get_thread_area
• #define SYS_getcwd __NR_getcwd
• #define SYS_getdents __NR_getdents
• #define SYS_getdents64 __NR_getdents64
• #define SYS_getegid __NR_getegid
• #define SYS_pipe __NR_pipe
• #define SYS_pipe2 __NR_pipe2
• #define SYS_pivot_root __NR_pivot_root
• #define SYS_poll __NR_poll
• #define SYS_ppoll __NR_ppoll
• #define SYS_prctl __NR_prctl
• #define SYS_msgctl __NR_msgctl
• #define SYS_msgget __NR_msgget
• #define SYS_msgrcv __NR_msgrcv
Because of space constraints, I could not include all the calls here. But we will be discussing them in future columns.

Kernel programming can be really hectic because of the long debugging cycle. Thus, we go for loadable modules in kernel development. Linux is a modern monolithic kernel that can support loadable ‘modules’. We have already seen how to write a simple module. And here we will investigate more about the process.

If you use loadable modules for programming purposes, then you need not reboot the system every time. I will illustrate this with the help of an example:

    #include <stdio.h>
    #include <stdlib.h>   /* exit() */

    int main(void)
    {
        FILE *samplefile;
        char tempstring[1024];

        if (!(samplefile = fopen("/etc/passwd", "r")))
        {
            fprintf(stderr, "System could not open the file\n");
            exit(1);
        }
        /* Stop as soon as fscanf() cannot read another token; testing feof()
           first would print garbage for an empty file. %1023s keeps the read
           within the buffer. */
        while (fscanf(samplefile, "%1023s", tempstring) == 1)
        {
            fprintf(stdout, "%s\n", tempstring);
        }
        exit(0);
    }

The above code opens a particular file and prints its contents. It should be noted that system calls and library functions are different things. The main difference is that library functions are not part of the kernel; they run in user space and invoke system calls when they need the kernel's help.

In the code, we have used fopen (which is not a system call) to open the passwd file. But we can see the system calls invoked by a program by using the strace utility.

    aasisvinayak@GNU-BOX:~/Documents/Desktop$ strace ./sample.out
    execve("./sample.out", ["./sample.out"], [/* 36 vars */]) = 0
    brk(0) = 0x854e000
    access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
    mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7efa000
    access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
    open("/etc/ld.so.cache", O_RDONLY) = 3
    fstat64(3, {st_mode=S_IFREG|0644, st_size=137651, ...}) = 0
    mmap2(NULL, 137651, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7ed8000
    close(3) = 0
    access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
    open("/lib/tls/i686/cmov/libc.so.6", O_RDONLY) = 3
    read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340g\1"..., 512) = 512
    fstat64(3, {st_mode=S_IFREG|0755, st_size=1425800, ...}) = 0
    mmap2(NULL, 1431152, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7d7a000
    mmap2(0xb7ed2000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x158) = 0xb7ed2000
    mmap2(0xb7ed5000, 9840, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7ed5000
    close(3) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d79000
    set_thread_area({entry_number:-1 -> 6, base_addr:0xb7d796b0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
    mprotect(0xb7ed2000, 8192, PROT_READ) = 0
    mprotect(0x8049000, 4096, PROT_READ) = 0
    mprotect(0xb7f17000, 4096, PROT_READ) = 0
    munmap(0xb7ed8000, 137651) = 0
    brk(0) = 0x854e000
    brk(0x856f000) = 0x856f000
    open("/etc/passwd", O_RDONLY) = 3
    fstat64(3, {st_mode=S_IFREG|0644, st_size=1969, ...}) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ef9000
    read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1969
    fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ef8000
    .................
    .......................
    write(1, "administrator,,,:/var/lib/postgr"..., 47administrator,,,:/var/lib/postgresql:/bin/bash


    ) = 47
    read(3, "", 4096) = 0
    write(1, "jetty:x:118:65534::/usr/share/je"..., 47jetty:x:118:65534::/usr/share/jetty:/bin/false
    ) = 47
    exit_group(0) = ?
    Process 16610 detached

A close review of the above lines will give you a lucid picture of the actions invoked. Now, we will look at how to load the module. Let's have some new code:

    #include <linux/kernel.h>
    #include <sys/syscall.h>
    #include <linux/module.h>

    /* Table of system call handlers, indexed by __NR_<name> */
    extern void *sys_table[];
    /* Saved pointer to the original exit handler */
    asmlinkage int (*main_sys_exit)(int);

    /* Replacement handler: log the exit code, then call the original */
    asmlinkage int alt_exit_function(int err_code)
    {
        printk("Sys_exit called with err_code=%d\n", err_code);
        return main_sys_exit(err_code);
    }

    int init_module()
    {
        main_sys_exit = sys_table[__NR_exit];
        sys_table[__NR_exit] = alt_exit_function;
        return 0;   /* report successful initialisation */
    }

    void cleanup_module()
    {
        /* Restore the original handler when the module is unloaded */
        sys_table[__NR_exit] = main_sys_exit;
    }

You can run:

    gcc -Wall -DMODULE -D__KERNEL__ -DLINUX -c sample2.c

...to compile the code (which is assumed to be saved as sample2.c). Then you can use:

    insmod filename.o // where filename is your file name

...to insert this module. You can use lsmod to list the loaded modules.

When a system function is initiated by a program, the process switches to kernel mode. In the x86 family architecture, a system call is normally initiated by triggering the software interrupt 128 (0x80). Now, let's take the case of memory. In some cases, we may require dynamic memory allocation. An apt example is when you deal with temporary buffers. The functions used for this are kmalloc() and kfree(), which are implemented in the kmalloc file:

    void * kmalloc (size_t size, int priority);
    void kfree (void *obj);
    #define kfree_s(a,b) kfree(a)

It is good to note that static memory allocation is also handled in a similar way. The initialisation routine for character-oriented devices for this is handled by:

    memory_start = console_init(memory_start,memory_end);

For mapping functions we can have the functions in the header file sys/mman.h:

    extern caddr_t mmap (caddr_t addr, size_t len,
                         int prot, int flags, int fd, off_t off);
    extern int munmap (caddr_t addr, size_t len);
    extern int mprotect (caddr_t addr, size_t len, int prot);
    extern int msync;

An overview of the Linux kernel
We have seen some basic aspects of modules. Before we proceed, we need to go over some more general ideas about the Linux kernel. This is required for writing new modules and building your own custom kernel. In fact, there are some quick methods you can employ while making a custom package. For example, you can get the list of modules required to run during the boot process by looking at the kernel configuration (which is located in your current distribution's kernel package).

We can generally classify the Linux files as:
• Regular files
• Directories
• Symbolic links
• Block-oriented device files
• Character-oriented device files
• Pipes (also called ‘named pipes’)
• Sockets

Beginners should note that Linux treats even directories as files. We prefer GNU/Linux OS because of its unique features. For novices, let me summarise some of the main features here:
• Multi-tasking (here processes run independent of each other)
• Multi-user access (Linux allows a number of users to work at the same time)
• Multi-processing (supports multi-processor architectures)
• Architecture independence (works on a variety of hardware platforms)
• Support for loadable executables
• Paging *
• Dynamic cache for hard disks
• Shared libraries
• Support for POSIX
• Different formats for executable files
• Memory protected mode
• Support for national keyboards and fonts
• Different file systems are supported
• TCP/IP, SLIP and PPP support

*Though there are different mechanisms for efficient memory management, memory may sometimes run out. In that case, the OS looks for 4 KB memory pages that can be freed. The pages whose contents are already stored on the hard disk are then identified and discarded.

[Figure 1: Linux in a tree structure. The linux source directory contains modules, mm, net (unix, inet), kernel, init, lib, include (linux, asm-alpha, asm-i386, asm-m68k, asm-generic, asm-mips, asm-sparc), ipc, fs (ext, ext2, xiafs, isofs, hpfs, umsdos, nfs, proc, minix, msdos, sysv), arch (mips, alpha, sparc, ppc, i386; the last holds kernel, boot, mm and math-emu) and drivers (net, char, block, scsi, sound).]

The ability to support modules is really an amazing feature. In fact, there are a few differences between normal programs and modules. Normal C programs usually begin with a main() function and then execute a set of instructions. Modules, instead, have an init_module function, or the init function you point to using the module_init call. This essentially declares the functionality that the module can provide. And modules end by calling cleanup_module (or, as in the first case, the function you specify using a module_exit call).

We have seen that for a system call to be performed, there should be a transition from user mode to system mode. This is handled with the help of interrupts. The parameters sys_call_num and sys_call_args are used to represent the number of the system call and its arguments. For example, we can have:

    SAMPLE system_call( int sys_call_num , sys_call_args )

Let's look at some examples to comprehend this in a better way.

getpid (and getppid)

    #include <sys/types.h>
    #include <unistd.h>

    pid_t
    getpid(void);

    pid_t
    getppid(void);

The getpid() system call basically returns the process ID of the calling process. But it should not be used for constructing temporary file names, for security reasons. In the kernel, the call itself is handled by:

    asmlinkage int sys_getpid(void)
    {
        return current->pid;
    }

Remember the printk we have used before? Beginners might find it hard to cope with new functions (which may appear undefined to them). To put it simply, these are functions that the kernel makes available to modules. Here, too, you can find an included header file (more details about the subject can be accessed at en.wikipedia.org/wiki/Unistd.h).

And now let me describe a few commands (and their actions) for your reference. They will be useful when we proceed further:

• kill (send a signal to a process)

    #include <sys/types.h>
    #include <signal.h>

    int
    kill(pid_t pid, int sig);

• init (process control initialisation)

    init [0 | 1 | 6 | c | q]

• telnetd

    /usr/libexec/telnetd [-46BUhlkn] [-D debugmode] [-S tos] [-X authtype]
    [-a authmode] [-edebug] [-p loginprog] [-u len]
    [-debug [port]]

• gethostname, sethostname

    #include <unistd.h>

    int
    gethostname(char *name, size_t namelen);

    int
    sethostname(const char *name, int namelen);

We will continue to dedicate a few more days to reviewing the literature. Happy kernel hacking!

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does programming in the open source domain. He is the developer of V-language, a programming language that employs AI and ANN. His research work/publications are available at www.aasisvinayak.com



A Voyage to the
Kernel
Part 14

Segment 3.3: Day 13

Last time we discussed the various types of operations and system calls. In this article, we will focus more on the literature. As I mentioned, there are two different modes: kernel mode and user mode. Let's look at the two types of switching. The first is when you make a system call. After the call, the task executes code that runs in kernel mode. The other case is when you deal with interrupt requests (IRQs). Soon after an IRQ, a handler is called, and control then goes back to the task that was interrupted.

A system call may be used when you want to access a particular I/O device or file, or when you need to get privileged information. It may also be used when you need to execute a command or to change the execution context.

Now let me elucidate the whole process that governs an IRQ event. We'll assume that a particular process is running. An IRQ may occur while the process is running. The task is then interrupted, and the corresponding interrupt handler is called and executed right there. In the next step, as mentioned before, control goes back to the task (which is running in user mode) and the process is back in its original state.

Advanced users can comprehend the mode of initiation by looking at the code below:

    typedef irqreturn_t (*irq_handler_t)(int, void *);

    struct irqaction {
        irq_handler_t handler;
        unsigned long flags;
        cpumask_t mask;
        const char *name;
        void *dev_id;
        struct irqaction *next;
        int irq;
        struct proc_dir_entry *dir;
    };

    extern irqreturn_t no_action(int cpl, void *dev_id);
    extern int __must_check request_irq(unsigned int, irq_handler_t handler,
                    unsigned long, const char *, void *);
    extern void free_irq(unsigned int, void *);

    struct device;

    extern int __must_check devm_request_irq(struct device *dev, unsigned int irq,
                    irq_handler_t handler, unsigned long irqflags,
                    const char *devname, void *dev_id);
    extern void devm_free_irq(struct device *dev, unsigned int irq, void *dev_id);

Non-free elements in the kernel
In an earlier column, I had discussed the non-free code portions in the kernel. And I received a number of queries regarding the subject. So, I think it is appropriate to discuss it here.

It is true that the kernel (from the original


repository) contains non-free elements, especially hardware drivers that depend on non-free firmware. It will ask you to install additional non-free software that it doesn't contain.

In fact, there is a project (unfortunately, not very active!) involved in the process of removing software that is included without source code; say, with obfuscated or obscured source code. It is the Linux-libre project of FSF-LA.

Now let's have a look at the automated process (for an exploded tree) that does it!

    kver=2.6.21 extra=0++

    case $1 in
    --force) die () { echo ERROR: "$@": ignored >&2; }; shift;;
    *) die () { echo "$@" >&2; exit 1; };;
    esac

    if unifdef -Utest /dev/null; then :; else
        die unifdef is required
    fi

    check=`echo $0 | sed 's,/[^/]*$,,'`/deblob-check
    if [ ! -f $check ] ; then
        echo optional deblob-check missing, will remove entire files >&2
        have_check=false
    else
        have_check=:
    fi

Inode interface
• create(): creates file in the directory
• lookup(): finds files by name, in a directory
• link()/symlink()/unlink()/readlink()/follow_link(): manages filesystem links
• mkdir()/rmdir(): creates/removes sub-directories
• mknod(): creates a directory or file
• readpage()/writepage(): reads or writes a page of physical memory to backing store
• truncate(): sets the length of a file to zero
• permission(): checks to see if a user process has permission to execute a given operation
• smap(): maps a logical file block to a physical device sector
• bmap(): maps a logical file block to a physical device block
• rename(): renames a file/directory

File interface
• open()/release(): to open/close the file
• read()/write(): read the file/write to the file
• select(): waits until the file is in a given state
• lseek(): moves to a particular offset in the file (if supported)
• mmap(): maps a region of the file into the virtual memory of the user process
• fsync()/fasync(): synchronises memory buffers with physical device
• readdir: reads the files that are pointed by the directory file
• ioctl: sets file attributes
• check_media_change: checks if a removable media has been removed
• revalidate: verifies that all the cached information is valid

Those who are good with shell programming can follow the steps easily (please refer to Segment 1 of the ‘Voyage’ series for shell programming related queries).

Now you can see how it performs the locating process:

    clean_file () {
        #$1 = filename
        if test ! -f $1; then
            die $1 does not exist, something is wrong
        fi
        rm -v $1
    }

    check_changed () {
        if test ! -f $1; then
            die $1 does not exist, something is wrong
        elif cmp $1.deblob $1 > /dev/null; then
            die $1 did not change, something is wrong
        fi
        mv $1.deblob $1
    }

    clean_blob () {
        #$1 = filename
        if $have_check; then
            if test ! -f $1; then
                die $1 does not exist, something is wrong
            fi
            name=$1
            echo Removing blobs from $name
            set fnord "$@" -d
            shift 2
            $check "$@" -i linux-$kver $name > $name.deblob
            check_changed $name
        else
            clean_file $1
        fi
    }

    clean_kconfig () {


        #$1 = filename $2 = things to remove
        echo Marking config $2 as depending on NONFREE in $1
        sed "/^config \\($2\\)\$/{p;i\
    depends on NONFREE
    d;}" $1 > $1.deblob
        check_changed $1
    }

    clean_mk () {
        #$1 = config $2 = Makefile name
        # We don't clean up Makefiles any more --lxoliva
        # sed -i "/\\($1\\)/d" $2
        # check_changed $2
        if sed -n "/\\($1\\)/p" $2 | grep . > /dev/null; then
            :
        else
            die $2 does not contain matches for $1
        fi
    }

    clean_ifdef () {
        #$1 = filename $2 = macro to -U
        echo unifdefing $1 with -U$2
        unifdef -U$2 $1 > $1.deblob
        check_changed $1
    }

[Figure 1: System decomposition. Subsystems shown: Memory Manager (hardware-independent and hardware-dependent layers), Virtual File System (logical file system and hardware drivers), Process Scheduler, Inter-Process Communication, and Network (network protocols and hardware drivers). Legend: subsystem, subsystem layer, depends on.]

[Figure 2: Process scheduler structure. Elements shown: System-Call Interface, Process Scheduler (process scheduling, timer management, module management), Architecture Specific Modules. Legend: resource dependency, subsystem, source module.]

[Figure 3: Process scheduler dependencies. Subsystems shown: fs, ipc, sched, mm, net. Legend: subsystem, resource dependency.]

Please note that deblob-check looks for blobs in the tarballs, source files and patches. Then it tries to clean individual source files of non-free blobs. At the end, you should only have free and apparent blobs. The non-free bits are often derived from code under non-disclosure agreements that don't bestow permission for the code to be distributed under the GNU General Public License. Now, to handle the drivers:

    # First, check that files that contain firmwares and their
    # corresponding sources are present.

    for f in \
        drivers/char/ser_a2232fw.h \
        drivers/char/ser_a2232fw.ax \
        drivers/net/ixp2000/ixp2400_rx.ucode \
        drivers/net/ixp2000/ixp2400_rx.uc \
        drivers/net/ixp2000/ixp2400_tx.ucode \
        drivers/net/ixp2000/ixp2400_rx.uc \
        drivers/net/wan/wanxlfw.inc_shipped \
        drivers/net/wan/wanxlfw.S \
        drivers/net/wireless/atmel.c \
        drivers/net/wireless/atmel.c \
        drivers/scsi/53c700_d.h_shipped \
        drivers/scsi/53c700.scr \
        drivers/scsi/aic7xxx/aic79xx_seq.h_shipped \
        drivers/scsi/aic7xxx/aic79xx.seq \
        drivers/scsi/aic7xxx/aic7xxx_seq.h_shipped \
        drivers/scsi/aic7xxx/aic7xxx.seq \
        drivers/scsi/aic7xxx_old/aic7xxx_seq.c \
        drivers/scsi/aic7xxx_old/aic7xxx.seq \
        drivers/scsi/53c7xx_d.h_shipped \
        drivers/scsi/53c7xx.scr \
        drivers/scsi/sym53c8xx_2/sym_fw1.h \
        drivers/scsi/sym53c8xx_2/sym_fw1.h \
        drivers/scsi/sym53c8xx_2/sym_fw2.h \
        drivers/scsi/sym53c8xx_2/sym_fw2.h \
        drivers/usb/serial/keyspan_pda_fw.h \
        drivers/usb/serial/keyspan_pda.S \
        drivers/usb/serial/xircom_pgs_fw.h \
        drivers/usb/serial/xircom_pgs.S \


        sound/pci/cs46xx/imgs/cwcbinhack.h \
        sound/pci/cs46xx/imgs/cwcdma.asp \
    ; do
        # -f makes this a real "file is present" test
        if test ! -f $f; then
            die $f is not present, something is amiss
        fi
    done

For your reference, here are the functions performed by the scripts:
• deblob-main: The main script used to clean up the Linux tarball.
• deblob-check: The script that finds blobs. It can also do the clean-up work.
• deblob-2.6.##: The script that cleans up blobs from within a given exploded Linux source tree.

Now, coming to the removal:

    # Identify the tarball.
    sed -i "s,^EXTRAVERSION.*,&-libre$extra," Makefile

    #######################
    # Removed ATM Drivers #
    #######################

    # ATM_AMBASSADOR - Madge Ambassador (Collage PCI 155 Server)
    clean_blob drivers/atm/atmsar11.data

    # ATM_FORE200E_PCA
    # ATM_FORE200E_SBA - SBA-200E
    clean_kconfig drivers/atm/Kconfig 'ATM_FORE.*'
    clean_mk CONFIG_ATM_FORE200E drivers/atm/Makefile
    clean_file drivers/atm/pca200e.data
    clean_file drivers/atm/pca200e_ecd.data
    clean_file drivers/atm/sba200e_ecd.data
    clean_kconfig drivers/atm/Kconfig 'ATM_AMBASSADOR'
    clean_mk CONFIG_ATM_AMBASSADOR drivers/atm/Makefile

The interesting point is that maintaining Linux-libre is not a time-consuming process. And there are scripts that will inform the project manager whether there is anything that needs manual intervention.

David Woodhouse suggested having a separate branch of the kernel source tree (which would be excluded from a normal kernel build process) for non-free firmware. Thus, the non-free firmware could be distributed in a separate package. But the idea of ‘complete freedom’, as proposed by Linux-libre, is not respected here.

Outline of Linux kernel

[Figure 4: Memory manager structure. Elements shown: System-Call Interface; mmap (mmap, mremap, filemap); swap (swapfile, swap_state, swap, kswapd); core (page_alloc, memory); Architecture Specific Modules (memory, MMU). Legend: resource dependency, subsystem, source module, daemon.]

[Figure 5: Memory manager dependencies. Elements shown: fs, ipc, sched, net, and mm (mmap, swap, core, architecture specific). Legend: subsystem, resource dependency, source module grouping.]

Now let's consider the idea of tasks. We have already seen that Linux supports multi-tasking. Any application that runs in the memory of the system and shares the system's resources may be termed a task. And by multi-tasking, we are actually referring to the effective sharing of these resources among the tasks. Here, the system can switch from one task to another after a given timeslice (say 10 ms). This gives the impression that many tasks are handled simultaneously.

Here are the detailed steps of the process: Let's say task1 is running and using the resources. Then, a resource request is made that forces the system to put task1 in the block list and choose task2 from the ready list for task switching. This is what happens when it comes to two tasks. You can extend this idea to N tasks by choosing a timer IRQ for the switching stage.

Having discussed these ideas, we can now go back to the sub-system structure of the operating system. The process scheduler is employed to:
• Allow processes to create fresh copies
• Send signals to the user processes
• Manage the timer
• Select the process that can access the CPU
• Receive interrupts and route them to the


appropriate kernel subsystem
• Clean up process resources (final stage of the process)

There are two types of interfaces for this: a complete interface for the kernel system and a limited one for user processes. A process can initiate other processes by copying the existing process. For example, when the system is booting, only init will be running. Then the fork() system call is used to spawn off copies. This means that it creates a new child that is a true copy of its parent.

You can see that the process scheduler is also vital for the loading, execution and proper termination of user processes.

The structure task_struct is used to represent a task. It has a field that indicates the state, which may be any of the following: ready, waiting, running, returning from a system call, processing the INT routine and processing SC. You can also find fields that carry information about the clock interval and priority. From this structure, process ID information can be retrieved. If you take a look at files_struct (which is a substructure), you can see the list of files opened by the process. Fields concerning the amount of time the process has spent can also be located.

System-call interface

System-call interface
• mprotect: To change the protection on the virtual memory portion
• mmap()/munmap()/msync()/mremap(): To map files into the virtual memory portions
• mlock()/mlockall()/munlock()/munlockall(): Super-user routines to prevent memory from being swapped
• swapon()/swapoff(): Super-user routines to add and remove swap files
• malloc()/free(): To allocate or free a portion of the memory for the use of a given process

Intra-kernel interface
• verify_area(): To verify that a portion of the user memory is mapped with the necessary permissions
• kmalloc()/kfree(): To allocate and free memory for use by the data structures of the kernel
• get_free_page()/free_page(): To allocate and free memory pages
Now we can discuss the aspects concerning memory management. Here are a few of the main points concerning the unique features:
• A large pool of address space (so that user programs can refer to more memory than is physically available).
• Memory for a process is private and it cannot be modified by another process. The memory manager restricts processes from overwriting code and any read-only data.
• The memory-mapping feature can map a file into a portion of virtual memory and access the file as memory.
• The Fair Access to Physical Memory feature offers good system performance.
• The memory manager allows processes to share portions of their memory.

The memory manager offers two interfaces: a system-call interface that's used by user processes and another interface used by the kernel subsystems to perform their actions. Please see the sidebox titled 'System-call interface' for a detailed review.

Filesystem
We have already seen that Linux has been ported to various platforms ranging from computers to wristwatches. We know that even for one particular device, say a hard drive, there are many differences in the interfaces used by different vendors. Linux supports a large number of logical filesystems. Thus, inter-operations are made possible. The filesystem has the following advantages:
• Supports multiple hardware devices
• Supports multiple logical filesystems
• Supports multiple executable formats
• Offers a common interface to the logical filesystems
• Provides high-speed access to files
• Can restrict a user's access to files and the user's total file size, with quotas

There are two levels of interfaces here: a system-call interface for the user processes and an internal interface for other kernel subsystems. File subsystems expose the data structures and the implementation functions for direct manipulation by other kernel subsystems. You may note that two interfaces are exposed, viz., inodes and files. Please glance at the box for more information.

We have reached the end of today's voyage. I look forward to your feedback so that I can incorporate your ideas into our next voyage.

Happy kernel hacking!

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does programming in the open source domain. He is the developer of V-language, a programming language that employs AI and ANN. His research work/publications are available at www.aasisvinayak.com

www.LinuxForU.com  |  LINUX For You  |  July 2009  |  97


A Voyage to the
Kernel
Part 15

Segment 3.4: Day 14

Welcome to another segment of our voyage! In this article, we'll talk about device drivers.

Device drivers are important when it comes to topics closely related to kernel-level programming. The device driver layer is required to have an interface to all physical devices. There are three types of device drivers: character, block and network. The character and block devices are specific to the file subsystem. For example, in the case of tape drives and modems we meddle with character devices (which are to be accessed sequentially), while block devices can be accessed in any order.

Here is a sample piece of code that shows a ‘cooling operation’ (please refer to thermal_sys.c, in your kernel source, for more details and for all the steps).

    /* sys I/F for cooling device */
    #define to_cooling_device(_dev) \
        container_of(_dev, struct thermal_cooling_device, device)

    static ssize_t
    thermal_cooling_device_type_show(struct device *dev,
                     struct device_attribute *attr, char *buf)
    {
        struct thermal_cooling_device *cdev = to_cooling_device(dev);

        return sprintf(buf, "%s\n", cdev->type);
    }

    static ssize_t
    thermal_cooling_device_max_state_show(struct device *dev,
                     struct device_attribute *attr, char *buf)
    {
        struct thermal_cooling_device *cdev = to_cooling_device(dev);

        return cdev->ops->get_max_state(cdev, buf);
    }

    static ssize_t
    thermal_cooling_device_cur_state_show(struct device *dev,
                     struct device_attribute *attr, char *buf)
    {
        struct thermal_cooling_device *cdev = to_cooling_device(dev);

        return cdev->ops->get_cur_state(cdev, buf);
    }

    static ssize_t
    thermal_cooling_device_cur_state_store(struct device *dev,
                     struct device_attribute *attr,
                     const char *buf, size_t count)
    {
        struct thermal_cooling_device *cdev = to_cooling_device(dev);
        int state;
        int result;

        if (!sscanf(buf, "%d\n", &state))
            return -EINVAL;

        if (state < 0)
            return -EINVAL;

        result = cdev->ops->set_cur_state(cdev, state);
        if (result)
            return result;
        return count;
    }

www.LinuxForU.com  |  LINUX For You  |  AUGUST 2009  |  101



    static struct device_attribute dev_attr_cdev_type =
    __ATTR(type, 0444, thermal_cooling_device_type_show, NULL);
    static DEVICE_ATTR(max_state, 0444,
               thermal_cooling_device_max_state_show, NULL);
    static DEVICE_ATTR(cur_state, 0644,
               thermal_cooling_device_cur_state_show,
               thermal_cooling_device_cur_state_store);

    static ssize_t
    thermal_cooling_device_trip_point_show(struct device *dev,
                     struct device_attribute *attr, char *buf)
    {
        struct thermal_cooling_device_instance *instance;

        instance =
            container_of(attr, struct thermal_cooling_device_instance, attr);

        if (instance->trip == THERMAL_TRIPS_NONE)
            return sprintf(buf, "-1\n");
        else
            return sprintf(buf, "%d\n", instance->trip);
    }

[Figure 1: File subsystem structure. Elements shown: System-Call Interface; VFS (vfs common code); logical filesystems (affs, ext, ext2, fat, hpfs, isofs, minix, msdos, ncpfs, nfs, proc, smbfs, sysv, ufs, xiafs); bin_exec (a.out, ELF, Java, script); buffer-cache (buffer, kflushd); device drivers, char (ftape, keyboard, random, real-time clock, sound, mouse, mem, printers, video, serial, monitor, pc watchdog) and block (loop, cdrom, floppy, ide, XT hard drives, scsi). Legend: resource dependency, subsystem, source modules, daemon.]
We know that in Linux, a device is accessed as though it were a file in the
filesystem (let’s call this a device special file). Now you can guess why it
is easy to add a new device: you only incorporate the hardware-dependent code
needed to support the abstract file interface. Given the large number of
different hardware devices, the procedure for writing a new device driver
should not be complex in any respect.
It should be noted that the Linux kernel employs a buffer cache to deal
effectively with block devices, and access to them goes through the buffer
cache subsystem. This improves the overall system performance because it
reduces the reads and writes that reach the hardware devices. You may also
notice that each hardware device has a request queue: if the buffer cache is
not in a position to satisfy a request from its in-memory buffers, it adds a
request to the device’s request queue and sleeps until the request has been
satisfied.
Let’s now go to the technical side of the process. The buffer cache uses
kflushd (a kernel thread) to write buffer pages out to the device (and to
remove them from the cache). When a request has been placed before a device
driver (DD), the driver begins by manipulating the device’s control and
status registers (CSRs). There are broadly three ways to move data between
the core part of the system and the peripheral device: polling, direct memory
access (DMA), and interrupts. I’m sure that these are familiar terms to even
intermediate-level users. But, considering the good number of newbies
interested in learning about this topic, let me sum up the concepts.
In the first case, the device driver regularly checks the CSRs of the
peripheral to confirm whether the current request has been completed. If it
has, the driver proceeds to the next request. We go for polling with
low-speed hardware devices (say, a floppy drive).
In the case of DMA, the DD begins a DMA transfer between the system’s main
memory and the peripheral. This transfer leaves the CPU free to work on other
tasks in the meantime. Once the transfer is over, the CPU receives an
interrupt. You may see that this method is far more complicated than polling.

Interrupts
If a device intends to report a change, it does so by sending an interrupt to
the CPU. The same method is used to let the CPU know that a task has been
completed. When the interrupt signal is asserted (logical high), the CPU
ceases the current operation and begins to execute the Linux kernel’s
interrupt handling code. While a particular interrupt is being handled, other
interrupts may be put on hold until the original interrupt has been dealt
with. While programming, you need to keep the handlers efficient; otherwise
other interrupts could be lost. Now assume a condition in which the CPU is
unable to finish the work in the available time frame. It then employs a
bottom-half handler to deal with the remaining portion of the work. This will
be executed by the scheduler in the next phase.

Let’s experiment with customising your kernel
You may have plenty of reasons to consider customisation. Advanced users will
try to patch the kernel to support hardware that is not currently
supported by the kernel. And if you are an intermediate-level user, you might
want to take away some ‘extra’ features (and support) from the kernel and
build a custom one with just the essentials.
Well, we cannot address patching at this stage. But we can speak about how
customisation works. Please note that this demonstration is on an OS derived
from Debian.
In Segment 1, we discussed commands like uname. Now let’s run:

aasisvinayak@GNU-BOX:~$ uname -r
2.6.27-7-generic

We’ve got the version number. Now let’s get the source code by issuing the
following command:

sudo apt-get install linux-source-2.6.27 kernel-package libncurses5-dev fakeroot

You can use dpkg to show you the files. Please refer to the screenshot in
Figure 2. You can also see that the source is located at /usr/src/.

Figure 2: Using dpkg to list the kernel source files

aasisvinayak@GNU-BOX:~$ dpkg -L linux-source-2.6.27
/.
/usr
/usr/src
/usr/src/linux-source-2.6.27.tar.bz2
/usr/share
/usr/share/doc
/usr/share/doc/linux-source-2.6.27
/usr/share/doc/linux-source-2.6.27/copyright
/usr/share/doc/linux-source-2.6.27/changelog.Debian.gz

Now, as the root user (you may use: sudo /bin/bash), execute the following:

cd /usr/src
bunzip2 linux-source-2.6.27.tar.bz2
tar xvf linux-source-2.6.27.tar
ln -s linux-source-2.6.27 linux

Then, let’s make a copy of the existing kernel configuration by issuing the
following:

cp /boot/config-`uname -r` /usr/src/linux/.config

Figure 3: Configuring the kernel in Ubuntu

Now let’s start customising it:

root@GNU-BOX:/usr/src# cd /usr/src/linux-headers-2.6.27-7
root@GNU-BOX:/usr/src/linux-headers-2.6.27-7#
root@GNU-BOX:/usr/src/linux-headers-2.6.27-7# make menuconfig
HOSTCC scripts/basic/fixdep
HOSTCC scripts/basic/docproc
HOSTCC scripts/kconfig/conf.o
scripts/kconfig/conf.c: In function ‘conf_askvalue’:
scripts/kconfig/conf.c:104: warning: ignoring return value of ‘fgets’, declared with attribute warn_unused_result
scripts/kconfig/conf.c: In function ‘conf_choice’:
scripts/kconfig/conf.c:306: warning: ignoring return value of ‘fgets’, declared with attribute warn_unused_result
HOSTCC scripts/kconfig/kxgettext.o
HOSTCC scripts/kconfig/lxdialog/checklist.o
HOSTCC scripts/kconfig/lxdialog/inputbox.o
HOSTCC scripts/kconfig/lxdialog/menubox.o
HOSTCC scripts/kconfig/lxdialog/textbox.o
HOSTCC scripts/kconfig/lxdialog/util.o
HOSTCC scripts/kconfig/lxdialog/yesno.o
HOSTCC scripts/kconfig/mconf.o
SHIPPED scripts/kconfig/zconf.tab.c
SHIPPED scripts/kconfig/lex.zconf.c
SHIPPED scripts/kconfig/zconf.hash.c
HOSTCC scripts/kconfig/zconf.tab.o
In file included from scripts/kconfig/zconf.tab.c:2486:
scripts/kconfig/confdata.c: In function ‘conf_write’:
scripts/kconfig/confdata.c:501: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result
scripts/kconfig/confdata.c: In function ‘conf_write_autoconf’:
scripts/kconfig/confdata.c:739: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result
scripts/kconfig/confdata.c:740: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result
In file included from scripts/kconfig/zconf.tab.c:2487:
scripts/kconfig/expr.c: In function ‘expr_print_file_helper’:
scripts/kconfig/expr.c:1090: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result
HOSTLD scripts/kconfig/mconf
scripts/kconfig/mconf arch/x86/Kconfig

Note: For the time being, let us set aside the details concerning confdata.c
and expr.c. But you can use the above output as a reference in subsequent
discussions.

This will launch a configuration dialogue box that will allow you to
configure various aspects of your kernel. You will get an option to save your
alternate configuration file and use it later. This is elucidated in Figures
4 and 5.

Figure 4: The config dialogue box

Figure 5: Saving an alternate configuration file

You can also see that options like kernel debugging have an * (asterisk)
attached to them, which means that those features are built in.

Figure 6: Built-in options

Now let’s go back to the general set-up to make a few changes (see Figure 7).
If you stumble upon something new while configuring the kernel, you can rely
on the Help button.

Figure 7: Changing the configuration

After making the changes, we can save the new configuration and exit:

# using defaults found in /boot/config-2.6.27-7-generic
#
#
# configuration written to .config1
#

*** End of Linux kernel configuration.
*** Execute ‘make’ to build the kernel or try ‘make help’.

Figure 8: Running make clean

Time for a make clean? Refer to Figure 8 for a
typical terminal output.
So far, so good… Now about compilation:

fakeroot make-kpkg --initrd --append-to-version=-custom kernel_image kernel_headers

This command is most likely to work. If anything goes wrong, just delete the
entry in GRUB and give it another try after logging into the system using the
old kernel version.

Logical filesystems
Block devices are most comfortably accessed through a logical filesystem,
which can be mounted in the virtual filesystem. This implies that the block
device contains the file (and structure) details that permit the logical
filesystem to access the device. You need to note that a device can support
only one logical filesystem at a time.
You can guess that when a filesystem is mounted on a subdirectory, the
directories and files existing on the device will be seen as subdirectories
of the mount point (which itself is a subdirectory). This also offers a great
amount of abstraction (as the users need not know the specific details of the
logical filesystem) and flexibility.
You might already know that Linux uses the concept of inodes to represent a
file on a block device, and one can also comprehend that the idea is virtual.
An inode serves as the storage location for any information concerning an
open file on the disk, and it stores the associated buffers (and even the
mapping between file offsets and device blocks).

Linux kernel limits
Here is an overview of the default Linux kernel limits (the kernel
parameters):

1. Maximum number of processes
NR_TASKS            512
MAX_TASKS_PER_USER  (NR_TASKS/2)

2. Semaphores
SEMMNI  128              /* ? max # of semaphore identifiers */
SEMMSL  32               /* <= 512 max num of semaphores per id */
SEMMNS  (SEMMNI*SEMMSL)  /* ? max # of semaphores in system */
SEMOPM  32               /* ~ 100 max num of ops per semop call */
SEMVMX  32767            /* semaphore maximum value */

/* unused */
SEMUME  SEMOPM           /* max num of undo entries per process */
SEMMNU  SEMMNS           /* num of undo structures system wide */
SEMAEM  (SEMVMX >> 1)    /* adjust on exit max value */
SEMMAP  SEMMNS           /* # of entries in semaphore map */
SEMUSZ  20               /* sizeof struct sem_undo */

3. Shared memory
/*
 * Keep _SHM_ID_BITS as low as possible since SHMMNI
 * depends on it and there is a static array of size SHMMNI.
 */
_SHM_ID_BITS   7
_SHM_IDX_BITS  15

/*
 * _SHM_ID_BITS + _SHM_IDX_BITS must be <= 24 on the i386 and
 * SHMMAX <= (PAGE_SIZE << _SHM_IDX_BITS).
 */
SHMMAX  0x2000000                            /* max shared seg size (bytes) */
SHMMIN  1 /* really PAGE_SIZE */             /* min shared seg size (bytes) */
SHMMNI  (1<<_SHM_ID_BITS)                    /* max num of segs system wide */
SHMALL  (1<<(_SHM_IDX_BITS+_SHM_ID_BITS))    /* max shm system wide (pages) */
SHMLBA  PAGE_SIZE                            /* attach addr a multiple of this */
SHMSEG  SHMMNI                               /* max shared segs per process */

4. Number of open files
NR_INODE  3072  /* this should be bigger than NR_FILE */
NR_FILE   1024  /* this can well be larger on a larger system */

Now, it is time to wind up this voyage. IPC, external interface and data
structures are some of the important topics that we will discuss next month.
Experiment with your kernel and let me know the results.
Happy kernel hacking!

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does programming in
the open source domain. He is the developer of V-language, a programming
language that employs AI and ANN. His research work/publications are
available at www.aasisvinayak.com
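As a postscript to the ‘Linux kernel limits’ listing, it may help to work out what the derived limits actually evaluate to. The following is only a sketch: the macro values are copied from the listing, the leading underscore of the two *_BITS names is dropped because such names are reserved in ordinary C programs, ASSUMED_PAGE_SIZE (4 KiB, as on i386) is an assumption, and the helper functions are hypothetical names introduced for illustration. With these values the arithmetic gives 4096 semaphores system wide, 128 shared segments, 4194304 shared pages, and a maximum segment size of 32 MiB.

```c
#include <stdio.h>

/* Values copied from the limits listing; the page size is an assumption. */
#define ASSUMED_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages, as on i386 */
#define SEMMNI        128
#define SEMMSL        32
#define SEMMNS        (SEMMNI * SEMMSL)
#define SHM_ID_BITS   7            /* _SHM_ID_BITS in the listing */
#define SHM_IDX_BITS  15           /* _SHM_IDX_BITS in the listing */
#define SHMMAX        0x2000000UL
#define SHMMNI        (1 << SHM_ID_BITS)
#define SHMALL        (1UL << (SHM_IDX_BITS + SHM_ID_BITS))

/* Hypothetical helpers that expose the derived values. */
unsigned long max_semaphores(void)   { return SEMMNS; }
unsigned long max_shm_segments(void) { return SHMMNI; }
unsigned long max_shm_pages(void)    { return SHMALL; }
unsigned long max_segment_mib(void)  { return SHMMAX >> 20; }

/* Checks the two constraints stated in the listing's comment. */
int shm_limits_consistent(void)
{
    return (SHM_ID_BITS + SHM_IDX_BITS) <= 24 &&
           SHMMAX <= (ASSUMED_PAGE_SIZE << SHM_IDX_BITS);
}

/* Prints the derived limits in a readable form. */
void print_limits(void)
{
    printf("SEMMNS = %lu semaphores\n", max_semaphores());
    printf("SHMMNI = %lu segments\n", max_shm_segments());
    printf("SHMALL = %lu pages\n", max_shm_pages());
    printf("SHMMAX = %lu MiB\n", max_segment_mib());
    printf("constraints hold: %s\n", shm_limits_consistent() ? "yes" : "no");
}
```

Calling print_limits() from a small main() is enough to see the numbers; changing ASSUMED_PAGE_SIZE shows how the SHMMAX constraint depends on the page size.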


A Voyage to the Kernel
Part 16
Segment: 3.5, Day 15
We are dedicating today’s voyage to learning more theory and clarifying the
doubts of beginners.

100 | SEPTEMBER 2009 | LINUX For You | www.LinuxForU.com

Inter-Process Communication (IPC)
The Linux Inter-Process Communication (IPC) mechanism is an essential tool
that allows processes to synchronise their operations and facilitates the
sharing of resources. IPC is also employed for exchanging information with
another program. It is implemented using shared resources, kernel data
structures and wait queues. Refer to Figure 1 for the subsystem structure of
IPC.
Here are the main IPC implementation methods (note: the following are
generalised descriptions):
• Signals: These are the oldest form of UNIX IPC; they are asynchronous
  messages sent to a process.
• Wait queues: The system can use these to put a process to sleep if the
  corresponding operation is not yet completed (for example, bottom-half
  handling by the process scheduler).
• Pipes (and named pipes): A pipe gives a connection-oriented data transfer
  in one direction between two processes (or, via a named pipe in the
  filesystem, between any two processes).
• File locks: This IPC allows a process to declare a file (or part of it) as
  read-only to all other processes. Hence only the process that holds the
  lock can modify it.
• Unix domain sockets: This is also a connection-oriented data-transfer
  mechanism like the pipe, and the implementation is akin to the INET
  sockets.
• System V IPC:
  • Semaphores: This IPC model allows the creation of arrays of semaphores.
  • Message queues: This is a connectionless data-transfer model. A message
    is essentially a sequence of bytes, and it is retrieved by reading the
    message queue. Each message also has an associated type that restricts
    which messages are read.
  • Shared memory: As the name suggests, this mechanism lets several
    processes access a given portion of physical memory.
The following code shows the notification mechanism for IPC namespaces:

#include <linux/msg.h>
#include <linux/rcupdate.h>
#include <linux/notifier.h>
#include <linux/nsproxy.h>
#include <linux/ipc_namespace.h>

#include "util.h"

static BLOCKING_NOTIFIER_HEAD(ipcns_chain);

static int ipcns_callback(struct notifier_block *self,
                          unsigned long action, void *arg)
{
    struct ipc_namespace *ns;

    switch (action) {
    case IPCNS_MEMCHANGED:   /* amount of lowmem has changed */
    case IPCNS_CREATED:
    case IPCNS_REMOVED:
        /*
         * It's time to recompute msgmni
         */
        ns = container_of(self, struct ipc_namespace, ipcns_nb);
        /*
         * No need to get a reference on the ns: the 1st job of
         * free_ipc_ns() is to unregister the callback routine.
         * blocking_notifier_chain_unregister takes the wr lock to do
         * it.
         * When this callback routine is called the rd lock is held by
         * blocking_notifier_call_chain.
         * So the ipc ns cannot be freed while we are here.
         */
        recompute_msgmni(ns);
        break;
    default:
        break;
    }

    return NOTIFY_OK;
}

int register_ipcns_notifier(struct ipc_namespace *ns)
{
    int rc;

    memset(&ns->ipcns_nb, 0, sizeof(ns->ipcns_nb));
    ns->ipcns_nb.notifier_call = ipcns_callback;
    ns->ipcns_nb.priority = IPCNS_CALLBACK_PRI;
    rc = blocking_notifier_chain_register(&ipcns_chain, &ns->ipcns_nb);
    if (!rc)
        ns->auto_msgmni = 1;
    return rc;
}

int cond_register_ipcns_notifier(struct ipc_namespace *ns)
{
    int rc;

    memset(&ns->ipcns_nb, 0, sizeof(ns->ipcns_nb));
    ns->ipcns_nb.notifier_call = ipcns_callback;
    ns->ipcns_nb.priority = IPCNS_CALLBACK_PRI;
    rc = blocking_notifier_chain_cond_register(&ipcns_chain,
                                               &ns->ipcns_nb);
    if (!rc)
        ns->auto_msgmni = 1;
    return rc;
}

void unregister_ipcns_notifier(struct ipc_namespace *ns)
{
    blocking_notifier_chain_unregister(&ipcns_chain, &ns->ipcns_nb);
    ns->auto_msgmni = 0;
}

int ipcns_notify(unsigned long val)
{
    return blocking_notifier_call_chain(&ipcns_chain, val, NULL);
}

Figure 1: Subsystem structure of IPC (system-call interface; File IPC: fifo,
pipes; System V IPC: message queues, shared memory, semaphores; Net IPC:
domain sockets; Kernel IPC: wait queues, signals)

Figure 2: Sub-system dependencies of IPC (IPC depends on the process
scheduler, memory management and the file system)

Though we have discussed IPC mechanisms earlier, I think it is necessary to
go over some features of these IPC methods. We mainly use a signal to notify
a process, and it changes the state of the receiving process.
Theoretically, the kernel can send these signals to any executing process.
The point to note is that a user process can send a signal only to a process
for which it holds the associated PID (process ID), or the GID in the case of
a process group. Before a process starts running, the scheduler looks for a
signal and, if it finds one, uses the do_signal() function to handle the
signal sent. Wait queues, the second type, deal with any process that is in a
‘waiting state’ (awaiting a kernel event, say the conclusion of a DMA
transfer). A process can add itself to a wait queue by just calling the
sleep_on() function or the interruptible_sleep_on() function. Likewise, the
wake_up() function or the wake_up_interruptible() function is used to remove
it from the queue.
In the case of pipes, a file descriptor refers to the pipe, and one page of
memory (used as a circular buffer) is allotted to the opened pipe. Here, I
should clarify that if a process tries to read more data than is available,
it blocks.
Restricting the access to a file is quite important in a typical Linux
kernel, and this can be done with the help of file locks.
The implementation of UNIX domain sockets is akin to pipes (using the
circular buffer). But UNIX domain sockets can offer a separate buffer for
each communication direction.
When it comes to semaphores, I must point out that they are, in fact,
implemented using wait queues and follow the classical semaphore model. Every
semaphore has an associated value, on which the up() and down() operations
work. When the value is zero, the process that tries to decrement it is
blocked on the wait queue.
A message queue can be viewed as a linear linked list, where a process can
read or write information (as a series of bytes). There are two wait queues
in this case. The first wait queue is
for any process that is writing to a full message queue, while the second one
is for serialising the message writes. The maximum size of a message is
determined when the message queue is created.
Out of all these, shared memory is supposed to be the fastest IPC mechanism.
As we discussed, this IPC allows processes to share physical memory. The
memory management system (MMS) creates the shared physical memory regions.
The system call sys_shmat() links the shared pages into the user process’s
virtual memory space.

Figure 3: The interrupt process (a task moves from TASK_RUNNING to
TASK_INTERRUPTIBLE when __add_wait_queue() adds it to a wait queue, its state
is set to TASK_INTERRUPTIBLE and schedule() is called; schedule() calls
deactivate_task(), which removes the task from the runqueue. It returns to
TASK_RUNNING either when it receives a signal and executes the signal
handler, or when the event it is waiting for occurs: try_to_wake_up() sets
the task to TASK_RUNNING, calls activate_task() to add it to a runqueue and
calls schedule(), and __remove_wait_queue() removes the task from the wait
queue)

Data structures (for IPC mechanisms)
Now, we can talk about the data structures used to implement these IPC
mechanisms. Signals are implemented using the signal field in the task_struct
structure. Every signal is conveyed using a bit in the field, so the total
number of signals is limited to the number of bits in a word.
The wait_queue structure, which has a pointer to the associated task_struct,
is the one linked into wait queues. A pipe is represented by an inode in the
filesystem, which records the extra pipe-specific information in
pipe_inode_info. This holds a wait queue (for the processes), the quantity of
data, and the number of processes that are reading from or writing to the
pipe.
For file locks we have the file_lock structure. This has a pointer to the
task_struct of the owning process, a file descriptor of the locked file, a
wait queue for processes, and the area of the file that is locked. For every
open file, the file_lock structures are linked into a list. The socket data
structure is used for representing UNIX domain sockets.
When it comes to the System V IPC objects, we can see that these are created
in the kernel. The associated access permissions are recorded in the ipc_perm
structure. Semaphores are represented with the sem structure (which holds the
value and the PID of the process that last used the semaphore). The arrays
are represented by the semid_ds structure (which carries the access
permissions, the time of the last operation, the pointer to the first
semaphore in the list, etc). The sem_undo structure is employed to keep an
array of the semaphore operations performed by a process (used when the
process is killed).
Message queues are represented by the msqid_ds structure, which also holds
the control and management information. It can store the following:
• access permissions
• the current number of bytes in the queue
• the number of messages
• the size of the queue
• the process number of the last message sent
• the process number of the last message received
• the time of the last change and message sent and received
• link fields to implement the message queue
A message is recorded in the kernel with a msg structure. This has
information about the link field (for the linked list), the message type, the
address of the message data and the length. The shmid_ds structure is used to
represent the shared memory implementation. It has the access control
permissions, the PID of the creator, the number of processes to which the
shared physical memory region is linked (and the number of pages that make up
this region), the detach and change times, etc.
We can now summarise the overall architecture using Figure 4. (You can see
that this is a very generalised view to just put the idea in a nutshell; for
the exact structure, please refer to the earlier columns.)

Figure 4: The generalised architecture (Applications 1, 2 and 3 in
user-space; the system call interface, kernel subsystems and device drivers
in kernel-space; the hardware below)

Newbie zone
I’m going to discuss a few things that I have been telling novice
programmers. First of all, let’s look at the differences between Linux and
the UNIX kernel. Since Linux is based on the UNIX architecture, there are
many similarities. But Linux is not just a copy of UNIX.
The most interesting feature of Linux is that it allows you to load modules
dynamically. We have already learned how to write a module and load it into
the kernel. This feature is remarkable, especially when you consider that the
kernel is monolithic in nature and still supports it. Another characteristic
that needs special emphasis is the treatment of threads in Linux. It views
threads and normal processes with the same eye. It has symmetrical
multi-processor (SMP) support (this is available in most of the modern UNIX
derivatives, but not in the traditional ones) and is pre-emptive. Thus you
can pre-empt any task even if it is running in the kernel. You might have
seen this in derivatives like IRIX and Solaris, but not in traditional ones.
The operating system supports an object-oriented device model and sysfs
(user-space device filesystem). And you can also see that features like
STREAMS are not available in Linux. Linux designers avoided many such poorly
deployed ideas. What else? It is free and open source!

Tips
1. You can join the Linux kernel mailing list. To subscribe, go to
   http://vger.kernel.org. One thing that you need to take care of when you
   subscribe to lists like linux-kernel@vger.kernel.org: please create a new
   folder in your e-mail client (or create a filter, if you are on Gmail).
   Since this is a very high traffic list, you will get about 200 messages
   per day! By subscribing, you can keep yourself updated and interact with
   other developers.

2. I have already mentioned the ways to download the kernel source to your
   system (both by a direct download and from the command line). You can find
   the source under /usr/src/your_linux_kernel_version. You don’t need to use
   this location when you are performing some edits. You can maintain a
   directory under your home folder and do the trials there. You need to
   switch to root (su -) only when you are about to install it. Also, when
   you move from one version to another you don’t have to go for a full
   upgrade, but can use patches. For an incremental patch, you can run the
   command below (after getting inside your source tree):

   $ patch -p1 < ../patch-a.b.c

   Normally, we apply this against the version preceding the kernel that you
   want to use.

3. There are many utilities that support you while programming and debugging.
   There are even GUI versions available for many programs. The make config
   (command line) utility allows you to do the modifications (customisation)
   to the kernel (please refer to Day 14). You can also try an ncurses-based
   graphical utility like menuconfig for this. make xconfig (or try gconfig
   if you need a gtk+ based one) is another, X11-based graphical utility that
   aids you. You can make use of all these for customisation options.
   By using make defconfig you can make a config based on the default setting
   that suits your architecture. This is a very good tool for beginners since
   you may not have configured the kernel earlier:

root@GNU-BOX:/usr/src/linux-source-2.6.27# make defconfig
HOSTCC scripts/basic/fixdep
HOSTCC scripts/basic/docproc
HOSTCC scripts/kconfig/conf.o
scripts/kconfig/conf.c: In function ‘conf_askvalue’:
scripts/kconfig/conf.c:104: warning: ignoring return value of ‘fgets’, declared with attribute warn_unused_result
scripts/kconfig/conf.c: In function ‘conf_choice’:
scripts/kconfig/conf.c:306: warning: ignoring return value of ‘fgets’, declared with attribute warn_unused_result
HOSTCC scripts/kconfig/kxgettext.o
SHIPPED scripts/kconfig/zconf.tab.c
SHIPPED scripts/kconfig/lex.zconf.c
SHIPPED scripts/kconfig/zconf.hash.c
HOSTCC scripts/kconfig/zconf.tab.o
In file included from scripts/kconfig/zconf.tab.c:2486:
scripts/kconfig/confdata.c: In function ‘conf_write’:
scripts/kconfig/confdata.c:501: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result
scripts/kconfig/confdata.c: In function ‘conf_write_autoconf’:
scripts/kconfig/confdata.c:739: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result
scripts/kconfig/confdata.c:740: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result
In file included from scripts/kconfig/zconf.tab.c:2487:
scripts/kconfig/expr.c: In function ‘expr_print_file_helper’:
scripts/kconfig/expr.c:1090: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result
HOSTLD scripts/kconfig/conf
*** Default configuration is based on ‘i386_defconfig’
#
# configuration written to .config
#

   After executing this, you can find the .config file in your source tree
   with entries like this:

# Automatically generated make config: don't edit
# Linux kernel version: 2.6.27.18
# Tue Aug 12 08:39:24 2009
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"

   make oldconfig is another utility that helps you in validating and
   updating the configuration.
   Please note that if you are on an earlier kernel, make dep may be
   required. You can see that I am now trying a more recent version. There
   are a few hurdles when you change versions. I remember one reader had an
   issue when he used macros instead of syscall. From version 2.6.18 onwards,
   the API has been changed and you need to use syscall (‘…performs the
   system call whose assembly language interface has the specified number
   with the specified arguments.’). More details are available at:
   www.kernel.org/doc/man-pages/online/pages/man2/syscall.2.html

Figure 5: Kernel configuration using the graphical gconf tool

4. To minimise the build noise, you can use the following command:

   make > ../some_out

   This will just redirect the normal output from make (but you can still see
   the errors). I would suggest that you redirect it to /dev/null.

5. An advanced feature: The make program allows you to split the entire build
   process into a number of jobs. So, if your processor is quite efficient,
   you can split the process and run the jobs concurrently. This is a
   recommended way for hackers, as this will reduce your I/O wait time. You
   can just use the command given below:

   make -jn

   ...where n is the number of jobs. You may choose this (value of n) based
   on your processor.
   When it comes to the kernel installation side, it is better that you
   follow the instructions specific to your architecture and boot loader. But
   for modules you can simply issue the command below as root:

   make modules_install

6. Novices can use files like the package-list, which may be located in the
   root of the source (for example,
   /linux-source-kernel_version/package-list), to comprehend the files
   included, the dependencies (kernel image and modules), etc.

Package: acpi-modules
Depends: kernel-image
Priority: standard
Description: Support for ACPI

Package: fat-modules
Depends: kernel-image
Priority: standard
Description: FAT filesystem support
This includes Windows FAT and VFAT support.

Package: fb-modules
Depends: kernel-image
Priority: standard
Description: Framebuffer modules

Back to action
When you are programming, you will find that the kernel is not linked to the
standard C library, mostly due to size and speed factors (as even a decent
subset of the C library is quite large for the kernel). But you can use many
libc functions (say, functions for string manipulation) as they are
implemented inside the kernel.
Now, you might ask about the printf() in C. Of course, the kernel can’t
access printf(), but it has access to printk(). The syslog program reads the
log buffer entries made by printk(), and the function’s usage is similar to
that of printf().

Process descriptor and the task structure
The task list (a circular doubly-linked list) is used by the kernel to record
the list of processes.
Each element is a process descriptor of type struct task_struct, and the
reference can be found in <linux/sched.h>:

#ifdef CONFIG_SCHED_DEBUG
extern void proc_sched_show_task(struct task_struct *p, struct seq_file *m);
extern void proc_sched_set_task(struct task_struct *p);
extern void
print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq);
#else
static inline void
proc_sched_show_task(struct task_struct *p, struct seq_file *m)
{
}
static inline void proc_sched_set_task(struct task_struct *p)
{
}
static inline void
print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
{
}
#endif

extern unsigned long long time_sync_thresh;

The process descriptor has all the data concerning a specific process. The
task_struct structure is allocated using the slab allocator (to enable cache
colouring and object reuse). If you look at Figure 6, you can see the struct
thread_info. It lives at the bottom of the kernel stack for stacks that grow
down, and at the top for stacks that grow up. The following code shows the
thread_info structure as defined on x86:

struct thread_info {

    struct task_struct *task;
    struct exec_domain *exec_domain;
    unsigned long flags;
    unsigned long status;
    __u32 cpu;
    __s32 preempt_count;
    mm_segment_t addr_limit;
    struct restart_block restart_block;
    unsigned long previous_esp;
    __u8 supervisor_stack[0];
};

[Figure 6: Process descriptor and stack. The process kernel stack runs from the start of the stack at the highest memory address, past the stack pointer, down to the lowest memory address, where the structure returned by current_thread_info() sits (labelled struct thread_struct in the figure). thread_info has a pointer to the process descriptor, the process's struct task_struct.]

You may note that the task element of the structure is a pointer to the task's task_struct.

Runqueues
The fundamental data structure in the scheduler is the runqueue. For its definition, please see the (commented) code below:

struct runqueue {
    spinlock_t lock; /* spin lock that protects this runqueue */
    unsigned long nr_running; /* number of runnable tasks */
    unsigned long nr_switches; /* context switch count */
    unsigned long expired_timestamp; /* time of last array swap */
    unsigned long nr_uninterruptible; /* uninterruptible tasks */
    unsigned long long timestamp_last_tick; /* last scheduler tick */
    struct task_struct *curr; /* currently running task */
    struct task_struct *idle; /* this processor's idle task */
    struct mm_struct *prev_mm; /* mm_struct of the last task run */
    struct prio_array *active; /* active priority array */
    struct prio_array *expired; /* the expired priority array */
    struct prio_array arrays[2]; /* the actual priority arrays */
    struct task_struct *migration_thread; /* migration thread */
    struct list_head migration_queue; /* migration queue */
    atomic_t nr_iowait; /* number of tasks waiting on I/O */
};

If you wish to see the complete code, please look at the kernel/sched.c file (see struct runqueue in the code). The part of a runqueue that relates to the real-time classes is given here for your reference:

struct rt_rq {
    struct rt_prio_array active;
    unsigned long rt_nr_running;
#if defined CONFIG_SMP || defined CONFIG_RT_GROUP_SCHED
    int highest_prio; /* highest queued rt task prio */
#endif
#ifdef CONFIG_SMP
    unsigned long rt_nr_migratory;
    int overloaded;
#endif
    int rt_throttled;
    u64 rt_time;
    u64 rt_runtime;
    /* Nests inside the rq lock: */
    spinlock_t rt_runtime_lock;

#ifdef CONFIG_RT_GROUP_SCHED
    unsigned long rt_nr_boosted;

    struct rq *rq;
    struct list_head leaf_rt_rq_list;
    struct task_group *tg;
    struct sched_rt_entity *rt_se;
#endif
};

A runqueue is basically a list of the processes that are scheduled to run on a processor, and there is one such list for each processor. A given process never appears on two lists at once. You might have noticed that I have referred to a .c file and not a header. A header could be included by code outside the scheduler; keeping the definition in kernel/sched.c prevents that and hides the structure from the rest of the kernel. Two macros are important when it comes to runqueues: cpu_rq(), which returns a pointer to the runqueue linked to a particular processor, and this_rq(), which returns the runqueue of the current processor. There are other related macros as well (like task_rq(task)).

We are done for today. Wait for the next instalment to continue the voyage. Happy kernel hacking!

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does programming in the open source domain. He is the developer of V-language, a programming language that employs AI and ANN. His research work/publications are available at www.aasisvinayak.com
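To make the per-processor runqueue lookup from this instalment concrete, here is a minimal user-space sketch in C. It is not the kernel's code: the cut-down struct runqueue, the struct task stand-in, the NR_CPUS value and the enqueue_task() helper are all assumptions made for illustration; only the macro names cpu_rq and task_rq come from the text.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4                           /* assumed CPU count for this sketch */

/* Hypothetical stand-in for struct task_struct. */
struct task {
    int pid;
    int cpu;                                /* CPU whose runqueue holds the task */
};

/* Cut-down runqueue: only two of the fields listed in the article. */
struct runqueue {
    unsigned long nr_running;               /* number of runnable tasks */
    struct task *curr;                      /* currently running task */
};

static struct runqueue runqueues[NR_CPUS];  /* one runqueue per processor */

#define cpu_rq(cpu)  (&runqueues[(cpu)])    /* runqueue of a given CPU */
#define task_rq(t)   cpu_rq((t)->cpu)       /* runqueue a task is queued on */

/* Queue a task on one CPU; a task sits on exactly one runqueue. */
static void enqueue_task(struct task *t, int cpu)
{
    assert(t != NULL);
    assert(cpu >= 0 && cpu < NR_CPUS);
    t->cpu = cpu;
    cpu_rq(cpu)->nr_running++;
}
```

Because the array is indexed by CPU number, looking up a runqueue is a constant-time operation, and a task can be traced back to its runqueue from the CPU number it carries.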



A Voyage to the Kernel
Part 17

Segment: 3.6, Day 16

We have covered the details concerning interrupts and their descriptions during the previous day of our voyage. But one point I deliberately left out was that interrupt handlers form only the first half of the interrupt processing method. The main issue with handlers is that they run asynchronously and may even interrupt other code (sometimes even critical code!). The best way to limit the damage is to make the handlers run as quickly as possible. You can understand this point more clearly if you visualise the way in which they deal with your hardware. For this, you may consider handlers as part of a whole mechanism set up to manage actual hardware interrupts. So we can use them for all such time-critical activities.

But we need to route the 'less critical' part to another portion where interrupts are enabled. Thus, managing interrupts is two-fold. In the first segment (let's call it the 'top half'), the handlers are executed by the kernel asynchronously (as mentioned before) in response to the hardware interrupt. The second part deals with the interrupt-related actions that the handler leaves undone.

Technically speaking, this second part holds the lion's share of the whole work (since we are giving only 'quick' work to handlers). But the handler does an important job: acknowledging the receipt of interrupts and moving data to/from the hardware. All this is covered in the 'top half'.

Let's take an example. Assume that you want to transfer some data from hardware into memory. The transfer can be handled in the top part and the processing can be done in the bottom part.

This is an important feature as far as the programmer is concerned. It enables the device driver author to divide the work so as to get the best results. Here are a few guidelines you can use while you write your driver:
• If the work is time-specific (say, you want to finish it soon), use a handler for the work.
• If it is hardware-specific, priority should again be given to the interrupt handler.
• If you wish to handle the processing of any data, then consider the bottom half.

Tip: Before you code your own driver, you are advised to take a look at existing interrupt handlers and bottom halves. And the key point you need to remember is: the quicker the (handler) execution, the better.

By using the bottom part, you can limit the work that you intend to do in the handlers since they run with the current interrupt line disabled on all processors. And in the worst case, those

98  |  OCTOBER 2009  |  LINUX For You  |  www.LinuxForU.com



that employ SA_INTERRUPT will have all local interrupts disabled! Hence, you can see that reducing the time spent here (that is, the time for which the interrupts remain disabled) is vital when it comes to the overall performance of the system.

Deciding where to draw the line that separates the top and bottom halves takes careful thought (strictly speaking, 'part' would be a better term than 'half'). The whole point of this division is to improve the system performance. You can see that by this division we can actually postpone a lot of work and perform it when the system is 'less busy'. In most cases, these bottom halves run just after the interrupt returns. This separation enables us to get the best system performance.

There are various mechanisms that allow you to implement the bottom part effectively. Historically, Linux offered the 'bottom half' (BH) for handling all the bottom parts. The original interface was quite simple and somewhat elegant, providing a statically created list of 32 bottom halves. The top part marked a bottom half to run by setting a bit in a 32-bit integer. Each BH was globally synchronised and no two BHs were allowed to run in parallel.

But this model had many disadvantages, the most important being its inflexibility. Task queues were introduced later: each queue held a linked list of functions to call, and these were executed at specific points, depending on the queue. A driver could register its bottom halves in the appropriate queue. But this couldn't replace the old BH entirely, and it was not lightweight enough for performance-critical sub-systems like networking. During the 2.3 kernel development series, programmers addressed this problem by introducing softirqs and tasklets. The following code shows the softirq-related portion of the interrupt header file in the kernel source:

#define set_softirq_pending(x) (local_softirq_pending() = (x))
#define or_softirq_pending(x) (local_softirq_pending() |= (x))
#endif

/* Some architectures might implement lazy enabling/disabling of
 * interrupts. In some cases, such as stop_machine, we might want
 * to ensure that after a local_irq_disable(), interrupts have
 * really been disabled in hardware. Such architectures need to
 * implement the following hook.
 */
#ifndef hard_irq_disable
#define hard_irq_disable() do { } while(0)
#endif

/* PLEASE, avoid to allocate new softirqs, if you need not _really_ high
   frequency threaded job scheduling. For almost all the purposes
   tasklets are more than enough. F.e. all serial device BHs et
   al. should be converted to tasklets, not to softirqs.
 */

enum
{
    HI_SOFTIRQ=0,
    TIMER_SOFTIRQ,
    NET_TX_SOFTIRQ,
    NET_RX_SOFTIRQ,
    BLOCK_SOFTIRQ,
    TASKLET_SOFTIRQ,
    SCHED_SOFTIRQ,
#ifdef CONFIG_HIGH_RES_TIMERS
    HRTIMER_SOFTIRQ,
#endif
    RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */

    NR_SOFTIRQS
};

/* softirq mask and active fields moved to irq_cpustat_t in
 * asm/hardirq.h to get better cache usage. KAO
 */

struct softirq_action
{
    void (*action)(struct softirq_action *);
};

asmlinkage void do_softirq(void);
asmlinkage void __do_softirq(void);
extern void open_softirq(int nr, void (*action)(struct softirq_action *));
extern void softirq_init(void);
#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
extern void raise_softirq_irqoff(unsigned int nr);
extern void raise_softirq(unsigned int nr);

/* This is the worklist that queues up per-cpu softirq work.
 *
 * send_remote_sendirq() adds work to these lists, and
 * the softirq handler itself dequeues from them. The queues
 * are protected by disabling local cpu interrupts and they must
 * only be accessed by the local cpu that they are for.
 */
DECLARE_PER_CPU(struct list_head [NR_SOFTIRQS], softirq_work_list);

/* Try to send a softirq to a remote cpu. If this cannot be done, the
 * work will be queued to the local cpu.
 */
extern void send_remote_softirq(struct call_single_data *cp, int cpu, int softirq);

/* Like send_remote_softirq(), but the caller must disable local cpu interrupts
 * and compute the current cpu, passed in as 'this_cpu'.


 */
extern void __send_remote_softirq(struct call_single_data *cp, int cpu,
        int this_cpu, int softirq);

After their introduction the only issue was the compatibility with existing drivers.

softirqs are essentially a set of 32 statically defined bottom halves. Two softirqs, even two of the same type, can run simultaneously on different processors. tasklets, unlike softirqs, are dynamic and are actually built on top of softirqs. Two different tasklets can run in parallel on different processors, but two tasklets of the same type cannot run at the same time.

When you begin actual programming, you can see that tasklets are enough for handling your bottom half processing requirements. You may also note that sometimes we need softirqs for tasks like networking (owing to reasons concerning performance). Just keep in mind that softirqs demand much more care. Now let's have a glance at the initiation:

#ifndef __ARCH_IRQ_STAT
irq_cpustat_t irq_stat[NR_CPUS] ____cacheline_aligned;
EXPORT_SYMBOL(irq_stat);
#endif

static struct softirq_action softirq_vec[32] __cacheline_aligned_in_smp;
static DEFINE_PER_CPU(struct task_struct *, ksoftirqd);

static inline void wakeup_softirqd(void)
{
    /* Interrupts are disabled: no need to stop preemption */
    struct task_struct *tsk = __get_cpu_var(ksoftirqd);

    if (tsk && tsk->state != TASK_RUNNING)
        wake_up_process(tsk);
}

#define MAX_SOFTIRQ_RESTART 10

asmlinkage void __do_softirq(void)
{
    struct softirq_action *h;
    __u32 pending;
    int max_restart = MAX_SOFTIRQ_RESTART;
    int cpu;

    pending = local_softirq_pending();

    local_bh_disable();
    cpu = smp_processor_id();
restart:
    /* Reset the pending bitmask before enabling irqs */
    local_softirq_pending() = 0;

    local_irq_enable();

    h = softirq_vec;

    do {
        if (pending & 1) {
            h->action(h);
            rcu_bh_qsctr_inc(cpu);
        }
        h++;
        pending >>= 1;
    } while (pending);

    local_irq_disable();

    pending = local_softirq_pending();
    if (pending && --max_restart)
        goto restart;

    if (pending)
        wakeup_softirqd();

    __local_bh_enable();
}
#ifndef __ARCH_HAS_DO_SOFTIRQ

asmlinkage void do_softirq(void)
{
    __u32 pending;
    unsigned long flags;

    if (in_interrupt())
        return;

    local_irq_save(flags);

    pending = local_softirq_pending();

    if (pending)
        __do_softirq();

    local_irq_restore(flags);
}

EXPORT_SYMBOL(do_softirq);

#endif

Another aspect that needs the attention of the programmer is that softirqs need to be registered statically (during compilation) while code can register tasklets dynamically. You may also note that converting BHs to softirqs (or even tasklets) is a non-trivial thing!

Fortunately, the 'conversion' later materialised in the 2.5 development series. The tasklet finally appeared in the guise of a modified softirq, which could be handled easily. (Now you can understand why some authors refer to bottom halves as software interrupts or softirqs.) This finally led to


the three bottom-half mechanisms that we mostly deal with (in the 2.6 series). The mechanisms are softirqs, tasklets, and work queues.

The kernel timer also deserves a mention here. This mechanism essentially postpones tasks for specific intervals of time. We will discuss its technical details in the coming days.

In this context, you may have a glance at the code portion that handles kernel internal timers, kernel timekeeping and basic process system calls. This is initiated by:

static DEFINE_PER_CPU(tvec_base_t, tvec_bases) = { SPIN_LOCK_UNLOCKED };

static void check_timer_failed(struct timer_list *timer)
{
    static int whine_count;
    if (whine_count < 16) {
        whine_count++;
        printk("Uninitialised timer!\n");
        printk("This is just a warning. Your computer is OK\n");
        printk("function=0x%p, data=0x%lx\n",
            timer->function, timer->data);
        dump_stack();
    }
    /*
     * Now fix it up
     */
    spin_lock_init(&timer->lock);
    timer->magic = TIMER_MAGIC;
}

And the 'starting' is done using:

void add_timer_on(struct timer_list *timer, int cpu)
{
    tvec_base_t *base = &per_cpu(tvec_bases, cpu);
    unsigned long flags;

    BUG_ON(timer_pending(timer) || !timer->function);

    check_timer(timer);

    spin_lock_irqsave(&base->lock, flags);
    internal_add_timer(base, timer);
    timer->base = base;
    spin_unlock_irqrestore(&base->lock, flags);
}

You can see the actual code that governs softirqs in kernel/softirq.c. (As I said, we rarely use softirqs. In most cases tasklets are employed, and most drivers use tasklets for their bottom half.) softirqs are represented using softirq_action, which is defined as:

struct softirq_action {
    void (*action)(struct softirq_action *); /* function to run */
    void *our_data; /* data to pass to the function */
};

Correspondingly, a 32-entry array of this structure can be found in the same file (softirq.c). Since each softirq needs one entry, there can be a maximum of only 32 registered softirqs. Also, you may see that the kernel actually uses only a few of these 32 entries.

Here is the prototype of a softirq handler for your reference:

void softirq_handler(struct softirq_action *)

When the kernel runs a softirq handler, it calls this action function with a pointer to the respective softirq_action structure. It is worth mentioning that the kernel passes the entire structure, which allows future additions to the structure without changing every handler. The handler retrieves the data value by dereferencing the argument and reading its data member (our_data in the listing above).

Also, you may find that a softirq never preempts another softirq; the only thing that can preempt a softirq is an interrupt handler.

A registered softirq must be marked before it will execute (technically termed raising the softirq). Normally, an interrupt handler marks the corresponding softirq for execution before returning.

The pending ones are executed in the following cases:
• On return from hardware interrupt code
• In the ksoftirqd kernel thread
• In any code that explicitly checks for pending softirqs

The execution occurs when do_softirq() is called. The logic is simple: if there are any pending softirqs, do_softirq() loops over them and runs each handler. Now let's look at this part of do_softirq():

u32 pending = softirq_pending(cpu);

if (pending) {
    struct softirq_action *h = softirq_vec;

    softirq_pending(cpu) = 0;

    do {
        if (pending & 1)
            h->action(h);
        h++;


        pending >>= 1;
    } while (pending);
}

This is the core mechanism associated with softirq processing. You can see that it looks for pending softirqs and executes them. A handler is registered at run-time via open_softirq(), which in older kernels takes three parameters: the softirq's index, the handler function and a value for the data field. (The prototype in the header excerpt shown earlier has only the first two.)

Now let's come to tasklets. First, you need to remember that they have got nothing to do with tasks! As we discussed earlier, tasklets work in a fashion similar to that of softirqs. But they have an easy-to-handle interface and relaxed locking rules. You will be making use of tasklets in most cases. Only in very rare cases (when very high-frequency and highly threaded requirements demand it) do we employ softirqs. tasklets are represented by two softirqs: HI_SOFTIRQ and TASKLET_SOFTIRQ (tasklets based on the latter run after the HI_SOFTIRQ-based ones). Since they are built on top of softirqs, the implementation looks very similar.

The tasklet_struct structure is used to represent a tasklet and is defined by:

struct tasklet_struct {
    struct tasklet_struct *next; /* next tasklet in the list */
    unsigned long state; /* state of the tasklet */
    atomic_t count; /* reference counter */
    void (*func)(unsigned long); /* tasklet handler function */
    unsigned long data; /* argument to the tasklet function */
};

You can employ this in your code, and the good news is that you don't have to work with the old BH interface any more, as the developers have removed it completely.

Random number generator

Linux has implemented a strong random number generator, which is actually based on PGP's random number generation method. The generator takes in 'environmental noise' from device drivers and directs it to an entropy pool. We can access this pool from both user and kernel mode. The generator is designed so that an outsider cannot feasibly predict its output.

Those who know the working mechanism of PGP know the importance of these types of numbers in areas like cryptography. Another point that you need to take into account is that this generator aims to produce true random numbers. They are different from the pseudo-random ones that you create using functions in the C library; there the problem is that if you know one number in the series (and the generating function), you can work out the other numbers in the series.

Now let's take a dip in the 'pool' and the associated initiation:

static int trickle_thresh = INPUT_POOL_WORDS * 28;

static DEFINE_PER_CPU(int, trickle_count) = 0;

static struct poolinfo {
    int poolwords;
    int tap1, tap2, tap3, tap4, tap5;
} poolinfo_table[] = {
    /* x^128 + x^103 + x^76 + x^51 +x^25 + x + 1 -- 105 */
    { 128, 103, 76, 51, 25, 1 },
    /* x^32 + x^26 + x^20 + x^14 + x^7 + x + 1 -- 15 */
    { 32, 26, 20, 14, 7, 1 },
#if 0
    /* x^2048 + x^1638 + x^1231 + x^819 + x^411 + x + 1 -- 115 */
    { 2048, 1638, 1231, 819, 411, 1 },

    /* x^1024 + x^817 + x^615 + x^412 + x^204 + x + 1 -- 290 */
    { 1024, 817, 615, 412, 204, 1 },

    /* x^1024 + x^819 + x^616 + x^410 + x^207 + x^2 + 1 -- 115 */
    { 1024, 819, 616, 410, 207, 2 },

    /* x^512 + x^411 + x^308 + x^208 + x^104 + x + 1 -- 225 */
    { 512, 411, 308, 208, 104, 1 },

    /* x^512 + x^409 + x^307 + x^206 + x^102 + x^2 + 1 -- 95 */
    { 512, 409, 307, 206, 102, 2 },
    /* x^512 + x^409 + x^309 + x^205 + x^103 + x^2 + 1 -- 95 */
    { 512, 409, 309, 205, 103, 2 },

    /* x^256 + x^205 + x^155 + x^101 + x^52 + x + 1 -- 125 */
    { 256, 205, 155, 101, 52, 1 },

    /* x^128 + x^103 + x^78 + x^51 + x^27 + x^2 + 1 -- 70 */
    { 128, 103, 78, 51, 27, 2 },

    /* x^64 + x^52 + x^39 + x^26 + x^14 + x + 1 -- 15 */
    { 64, 52, 39, 26, 14, 1 },
#endif
};

Here the true random number is fully independent of its generating function. Now, let's see how this is done. We know from thermodynamics (Physics!) that entropy is a measurement of disorder and randomness in a system. John von Neumann is said to have suggested the term 'entropy' to Claude Shannon for the randomness in information, and Shannon used it in his theory (hence 'Shannon entropy'). Please read the Wikipedia entry on Shannon entropy at en.wikipedia.org/wiki/Entropy_(information_theory) for more details.

static int debug = 0;


module_param(debug, bool, 0644);

#define DEBUG_ENT(fmt, arg...) do { if (debug) \
    printk(KERN_DEBUG "random %04d %04d %04d: " \
    fmt,\
    input_pool.entropy_count,\
    blocking_pool.entropy_count,\
    nonblocking_pool.entropy_count,\
    ## arg); } while (0)
#else
#define DEBUG_ENT(fmt, arg...) do {} while (0)
#endif

struct entropy_store;
struct entropy_store {
    /* mostly-read data: */
    struct poolinfo *poolinfo;
    __u32 *pool;
    const char *name;
    int limit;
    struct entropy_store *pull;

    /* read-write data: */
    spinlock_t lock ____cacheline_aligned_in_smp;
    unsigned add_ptr;
    int entropy_count;
    int input_rotate;
};

It would be good if you could get a copy of A Mathematical Theory of Communication (written by Shannon himself). In it, the idea of information theory is discussed and Shannon entropy is introduced from scratch.

Shannon entropy is an important concept when we deal with random number generators. Roughly, I can say that high entropy corresponds to 'less useful information', which in turn corresponds to a large amount of random stuff in a set of characters. Linux has an entropy pool, which holds data from non-deterministic device events, and that is what makes it random. The kernel also calculates the 'entropy level change' (technically called the entropy estimate) when data is fed into the pool, and uses it as a measure of the uncertainty. It does the same when there is a reduction in randomness. add_timer_randomness is shown below:

static void add_timer_randomness(struct timer_rand_state *state,
        unsigned num)
{
    struct {
        cycles_t cycles;
        long jiffies;
        unsigned num;
    } sample;
    long delta, delta2, delta3;

    /* ... (the original listing omits the rest of this function) ... */

extern void add_input_randomness(unsigned int type, unsigned int code,
        unsigned int value)
{
    static unsigned char last_value;

    /* ignore autorepeat and the like */
    if (value == last_value)
        return;

    DEBUG_ENT("input event\n");
    last_value = value;
    add_timer_randomness(&input_timer_state,
        (type << 4) ^ code ^ (code >> 4) ^ value);
}

void add_interrupt_randomness(int irq)
{
    if (irq >= NR_IRQS || irq_timer_state[irq] == 0)
        return;

    DEBUG_ENT("irq event %d\n", irq);
    add_timer_randomness(irq_timer_state[irq], 0x100 + irq);
}

void add_disk_randomness(struct gendisk *disk)
{
    if (!disk || !disk->random)
        return;
    /* first major is 1, so we get >= 0x200 here */
    DEBUG_ENT("disk event %d:%d\n", disk->major, disk->first_minor);

    add_timer_randomness(disk->random,
        0x100 + MKDEV(disk->major, disk->first_minor));
}

EXPORT_SYMBOL(add_disk_randomness);

Linux 1.3.30 saw the introduction of the kernel random number generator, which programmers consider a useful tool.

With that we have reached the end of today's voyage. Our next destination will be time management, and later we will move on to kernel synchronisation. Wait till the next instalment to hack more kernel topics. Happy Kernel Hacking!

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does programming in the open source domain. He is the developer of V-language, a programming language that employs AI and ANN. His research work/publications are available at www.aasisvinayak.com
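The pending-bitmask loop at the heart of do_softirq() in this instalment can be tried out in user space. The sketch below is only an illustration under stated assumptions: open_softirq_sketch(), raise_softirq_sketch(), do_softirq_sketch(), count_run() and the run_count array are hypothetical names invented here, and the handler signature is simplified to take the softirq number instead of a struct softirq_action pointer. It involves no locking or interrupt control, both of which the real kernel code needs.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SOFTIRQS 32                      /* one slot per bit of the mask */

typedef void (*softirq_fn)(int nr);         /* simplified handler signature */

static softirq_fn softirq_vec[NR_SOFTIRQS]; /* registered handlers */
static uint32_t softirq_pending;            /* bit n set => softirq n raised */
static int run_count[NR_SOFTIRQS];          /* how often each handler ran */

/* Example handler: just counts its own invocations. */
static void count_run(int nr)
{
    run_count[nr]++;
}

/* Register a handler for softirq number nr. */
static void open_softirq_sketch(int nr, softirq_fn fn)
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    softirq_vec[nr] = fn;
}

/* Mark softirq nr as pending ("raising" it). Raising twice sets one bit. */
static void raise_softirq_sketch(int nr)
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    softirq_pending |= (uint32_t)1 << nr;
}

/* Walk the mask from bit 0 upwards and run every raised handler once.
 * Returns the number of handlers that ran. */
static int do_softirq_sketch(void)
{
    uint32_t pending = softirq_pending;
    int nr = 0, ran = 0;

    softirq_pending = 0;                    /* reset before running handlers */
    while (pending) {
        if ((pending & 1) && softirq_vec[nr]) {
            softirq_vec[nr](nr);
            ran++;
        }
        nr++;
        pending >>= 1;
    }
    return ran;
}
```

Two things fall out of this design: lower-numbered softirqs always run first, and raising the same softirq several times before the loop runs still results in a single invocation, because the mask has only one bit per softirq.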

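For readers who want the idea of Shannon entropy from this instalment in one line, the standard definition is given below as an aside. The symbols H, X, n and p_i are notation introduced here rather than taken from the article: X is a source that emits one of n possible symbols, and p_i is the probability of the i-th symbol.

```latex
\[
  H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i \quad \text{(bits per symbol)}
\]
% Worked example, fair coin: p = (1/2, 1/2)
\[
  H = -\tfrac12\log_2\tfrac12 - \tfrac12\log_2\tfrac12 = 1 \text{ bit}
\]
% Worked example, biased coin: p = (0.9, 0.1)
\[
  H = -0.9\log_2 0.9 - 0.1\log_2 0.1 \approx 0.137 + 0.332 \approx 0.469 \text{ bits}
\]
```

The biased coin is more predictable, so each toss carries less than half a bit of randomness. This is the sense in which an entropy estimate measures how much uncertainty a sample adds.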


A Voyage to the Kernel
Part 18

Segment 3.7, Day 17

We have looked at various aspects of the Linux kernel over the past issues of this magazine. We have also tried to code and load our own module. Of late, our focus has shifted to the theoretical side of kernel design and implementation. I will try to wind up the theoretical aspects over a couple of articles, and then we can devote our time to trials.

We have already seen how the system organises processes into the user and kernel space. Figure 1 summarises the overall architecture of the design we have looked at. The figure encapsulates the various concepts we've discussed: the interaction of applications with the kernel, accessing system calls, using glibc, etc. If you have missed any of the earlier columns, you can find them at www.linuxforu.com.

In order to avoid some of the perplexities associated with the actual operations taking place, take a look at Figure 2, which illustrates the process in depth so that novice users can comprehend it well.

When discussing the configuration (modification) of kernel properties, I didn't mention the code that performs the action. Now we can look at some of the relevant portions that perform the related functions. If you need to review the 'global and useful constants' section, here is the associated code:

static ssize_t
ikconfig_read_current(struct file *file, char __user *buf,
        size_t len, loff_t * offset)
{
    loff_t pos = *offset;
    ssize_t count;

    if (pos >= kernel_config_data_size)
        return 0;

    count = min(len, (size_t)(kernel_config_data_size - pos));
    if (copy_to_user(buf, kernel_config_data + MAGIC_SIZE + pos,
            count))
        return -EFAULT;

    *offset += count;
    return count;
}

Now let's look at the ikconfig_init segment (a critical one), which does the initiation:

static int __init ikconfig_init(void)
{
    struct proc_dir_entry *entry;

    /* create the current config file */
    entry = create_proc_entry("config.gz", S_IFREG | S_IRUGO,
            &proc_root);
    if (!entry)
        return -ENOMEM;

    entry->proc_fops = &ikconfig_file_ops;
    entry->size = kernel_config_data_size;

    return 0;
}

And finally, here is the code that performs the clean-up work:

static void __exit ikconfig_cleanup(void)
{
    remove_proc_entry("config.gz", &proc_root);
}

module_init(ikconfig_init);
module_exit(ikconfig_cleanup);

Device drivers

It is very interesting to meddle around with device drivers (DD). And it is even more exciting to write our own DD! In this column, I will restrict myself to some of the basic ideas concerning DD, and we will come back to the topic when we start our trial section.

Well, as you might know, in Linux, devices are represented as files. If you have not seen this, I suggest you glance through your /dev/ directory. You might think

100  |  NOVEMBER 2009  |  LINUX For You  |  www.LinuxForU.com



that this is not a good idea since it allows unauthorised access to


hardware. But that’s not true. If you implement it in the proper
way, the device can never be accessed wrongly by a program. User Applications
User
In Linux, you can find many drivers and they are identified Space
by their unique major number. It is also interesting to note that GNU C Library (glibc)
since a particular DD may be used to control different physical
and virtual devices (the HDD and partitions, for example), the GNU/ System Call Interface
individual device will be assigned a minor number (ranging from Linux

0 to 255). You can comprehend this better if you look at the box Kernel Kernel
titled ‘Device driver nomenclature’. Space

Note the exceptions mentioned in the box. These exceptions Architecutre-Dependent Kernel Code
are seen for DD corresponding to terminals and serial interfaces
(which are assigned major numbers 4 and 5). Here, the devices Hardware Platform
with the number 4 are essentially virtual consoles, simple serial
interfaces and pseudo-terminals. You may note that the virtual
Figure 1: Tier architecture
consoles are assigned the numbers ranging from 0 (which
obviously corresponds to tty0!) to 63 and /dev/tty0 or /dev/console
User-space Kernel-space
corresponds to the current virtual console.
For a serial interface there are two logical devices—ttySn User Application C-Library Kernel System call
(dial-in device) and cuan (call-out device). When ttySn is
getpid(void) Load arguments
opened, the kernel restricts access to it for other programs eax=_NR_getpid,
transition to kernel (int 80) system_call
till the DTR line is active. And when it comes to the accessing
call
of cuan, the corresponding process will be provided with system_call_table[eax]

sys_getpid()
immediate access to the serial interface (provided it is not in use). This will keep on blocking any process that tries to use ttySn (assigned minor numbers 64 to 127). You can also see that the system assigns the minor numbers from 128 to 255 to pseudo-terminals. The master terminal (ptyn) is assigned 128+n and the corresponding slave (ttypn) is given 192+n.

[Figure 2: User space and kernel space (labels: syscall_exit, resume_userspace, return)]

The major number 5 is assigned to the current terminal and the call-out devices. /dev/tty is given the minor number 0, and the corresponding cuan devices are assigned minor numbers 64+n.
Here is another list that could be handy when you write a DD to access some input devices:
• 11 char Raw keyboard device (Linux/SPARC only)
    • 0 = /dev/kbd (raw keyboard device)
• 11 char Serial Mux device (Linux/PA-RISC only)
    • 0 = /dev/ttyB0 (first mux port)
    • 1 = /dev/ttyB1 (second mux port)
• 11 block SCSI CD-ROM devices
    • 0 = /dev/scd0 (first SCSI CD-ROM)
    • 1 = /dev/scd1 (second SCSI CD-ROM)
Akin to the filesystem (you can guess why it is so!), the DD is required to be 'made known' to the kernel. This is made possible with the help of the driver modules that are initialised while booting the system. When you code, the following functions will be helpful while performing this:

int register_chrdev(unsigned int major, const char * name, struct file_operations *fops);
int register_blkdev(unsigned int major, const char * name, struct file_operations *fops);

Since we are dealing with device drivers, I think some of you might be interested in the special initialisers that we use for USB Mass Storage devices:

#include <linux/sched.h>
#include <linux/errno.h>
#include "usb.h"
#include "initializers.h"
#include "debug.h"
#include "transport.h"

/* This places the Shuttle/SCM USB<->SCSI bridge devices in multi-target
 * mode */
int usb_stor_euscsi_init(struct us_data *us)
{
    int result;

    US_DEBUGP("Attempting to init eUSCSI bridge...\n");
    us->iobuf[0] = 0x1;
    result = usb_stor_control_msg(us, us->send_ctrl_pipe,
            0x0C, USB_RECIP_INTERFACE | USB_TYPE_VENDOR,
            0x01, 0x0, us->iobuf, 0x1, 5*HZ);
    US_DEBUGP("-- result is %d\n", result);

    return 0;
}
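
To make the minor-number scheme described above a little more concrete, here is a minimal user-space sketch in C (it is not kernel code). It builds device numbers with the glibc makedev() macro and reads them back with major() and minor(). The helper names (legacy_dev, legacy_ttyS, legacy_pty_master, legacy_pty_slave, legacy_cua) are my own, and I am assuming that the serial and pseudo-terminal ranges sit under major number 4, as the nomenclature table in this article suggests. Treat it as an illustration of the arithmetic, not as a definitive map of your system.

```c
#include <assert.h>
#include <sys/sysmacros.h>   /* makedev(), major(), minor() (glibc) */
#include <sys/types.h>       /* dev_t */

/* Major numbers: 5 is quoted in the text; 4 is an assumption taken
 * from the nomenclature table (TTY devices). */
enum { MAJOR_TTY = 4, MAJOR_TTY_AUX = 5 };

/* Each range described in the text holds 64 devices. */
enum { DEVICES_PER_RANGE = 64 };

/* Hypothetical helper: device number for the n-th device of a range
 * that starts at minor number 'base'. */
static dev_t legacy_dev(unsigned int major_no, unsigned int base, unsigned int n)
{
    assert(n < DEVICES_PER_RANGE);
    return makedev(major_no, base + n);
}

/* ttySn: minor 64+n */
static dev_t legacy_ttyS(unsigned int n)
{
    return legacy_dev(MAJOR_TTY, 64, n);
}

/* Master terminal ptyn: minor 128+n */
static dev_t legacy_pty_master(unsigned int n)
{
    return legacy_dev(MAJOR_TTY, 128, n);
}

/* Slave terminal ttypn: minor 192+n */
static dev_t legacy_pty_slave(unsigned int n)
{
    return legacy_dev(MAJOR_TTY, 192, n);
}

/* Call-out device cuan: major 5, minor 64+n */
static dev_t legacy_cua(unsigned int n)
{
    return legacy_dev(MAJOR_TTY_AUX, 64, n);
}
```

For instance, minor(legacy_pty_slave(3)) evaluates to 195, and major(legacy_cua(1)) evaluates to 5.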

www.LinuxForU.com  |  LINUX For You  |  NOVEMBER 2009  |  101


A Voyage to the Kernel  |  Guest Column ______________________________________________________________________________

/* This function is required to activate all four slots on the UCR-61S2B
 * flash reader */
int usb_stor_ucr61s2b_init(struct us_data *us)
{
    struct bulk_cb_wrap *bcb = (struct bulk_cb_wrap*) us->iobuf;
    struct bulk_cs_wrap *bcs = (struct bulk_cs_wrap*) us->iobuf;
    int res, partial;
    static char init_string[] = "\xec\x0a\x06\x00$PCCHIPS";

    US_DEBUGP("Sending UCR-61S2B initialization packet...\n");

    bcb->Signature = cpu_to_le32(US_BULK_CB_SIGN);
    bcb->Tag = 0;
    bcb->DataTransferLength = cpu_to_le32(0);
    bcb->Flags = bcb->Lun = 0;
    bcb->Length = sizeof(init_string) - 1;
    memset(bcb->CDB, 0, sizeof(bcb->CDB));
    memcpy(bcb->CDB, init_string, sizeof(init_string) - 1);

    res = usb_stor_bulk_transfer_buf(us, us->send_bulk_pipe, bcb,
            US_BULK_CB_WRAP_LEN, &partial);
    if(res)
        return res;

    US_DEBUGP("Getting status packet...\n");
    res = usb_stor_bulk_transfer_buf(us, us->recv_bulk_pipe, bcs,
            US_BULK_CS_WRAP_LEN, &partial);

    return (res ? -1 : 0);
}

We can meddle with all these when we begin our experimental session!
If a DD is already registered under a particular major number and its file operations do not match the ones being registered, the register_chrdev() function will return a negative value.

Device driver nomenclature

0 Unnamed devices (e.g., non-device mounts)
      0 = reserved as null device number
    See block major 144, 145, 146 for expansion areas.

1 char Memory devices
      1 = /dev/mem        Physical memory access
      2 = /dev/kmem       Kernel virtual memory access
      3 = /dev/null       Null device
      4 = /dev/port       I/O port access
      5 = /dev/zero       Null byte source
      6 = /dev/core       OBSOLETE - replaced by /proc/kcore
      7 = /dev/full       Returns ENOSPC on write
      8 = /dev/random     Nondeterministic random number gen.
      9 = /dev/urandom    Faster, less secure random number gen.
     10 = /dev/aio        Asynchronous I/O notification interface
     11 = /dev/kmsg       Writes to this come out as printk's

1 block RAM disk
      0 = /dev/ram0       First RAM disk
      1 = /dev/ram1       Second RAM disk
      ...
    250 = /dev/initrd     Initial RAM disk {2.6}
    Older kernels had /dev/ramdisk (1, 1) here. /dev/initrd refers to a RAM disk which was preloaded by the boot loader; newer kernels use /dev/ram0 for the initrd.

2 char Pseudo-TTY masters
      0 = /dev/ptyp0      First PTY master
      1 = /dev/ptyp1      Second PTY master
      ...
    255 = /dev/ptyef      256th PTY master

2 block Floppy disks
      0 = /dev/fd0        Controller 0, drive 0, autodetect
      1 = /dev/fd1        Controller 0, drive 1, autodetect
      2 = /dev/fd2        Controller 0, drive 2, autodetect
      3 = /dev/fd3        Controller 0, drive 3, autodetect
    128 = /dev/fd4        Controller 1, drive 0, autodetect
    129 = /dev/fd5        Controller 1, drive 1, autodetect
    130 = /dev/fd6        Controller 1, drive 2, autodetect
    131 = /dev/fd7        Controller 1, drive 3, autodetect
    and so on ...
(Continued on next page…)

You may come across two types of devices, viz., block-oriented and character-oriented devices. In the case of the first set of devices, any given block can be read or written to at the will of the programmer (i.e., they support random access). This task is done using a cache. This feature


of random access is essential for filesystems. So you can guess why we are mounting them as block devices.
In the second case, the access is processed sequentially (without using a buffer). Devices like printers, scanners, sound cards, etc, come under this. But you may note that some of the internal operations of these devices still rely on blocks that are, again, not randomly accessible.

Device driver nomenclature (continued from previous page)

However, you may come across an 'exception' to this rule if you look at the following list:

4 char TTY devices
      0 = /dev/tty0       Current virtual console
      1 = /dev/tty1       First virtual console
      ...
     63 = /dev/tty63      63rd virtual console
     64 = /dev/ttyS0      First UART serial port
      ...
    255 = /dev/ttyS191    192nd UART serial port
    UART serial ports refer to 8250/16450/16550 series devices.
    Older versions of the Linux kernel used this major number for BSD PTY devices. As of Linux 2.1.115, this is no longer supported. Use major numbers 2 and 3.

4 block
    Aliases for dynamically allocated major devices to be used when it's not possible to create the real device nodes because the root filesystem is mounted as read-only.
      0 = /dev/root

Polling count helps us to track errors in the data terminal. It also has a time-out feature, which is illustrated in the following code:

        if(need_resched) schedule();
    } while(!LP_READY(minor,status) && count < LP_CHAR(minor));
    if (count == LP_CHAR(minor)) {
        return 0;
        /* Timeout, current character not printed */
    }
    outb_p(lpchar, LP_B(minor));
    return 1;
}

You can find that the LP_CHAR(minor) count is set to LP_INIT_CHAR. This can be changed using ioctl.
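
The time-out idea in the fragment above can be tried out in user space as well. The following is a minimal sketch, assuming a made-up status callback (ready_fn) and a fake device (struct fake_printer) that stand in for LP_READY() and the real port; none of these names come from the kernel. Note one deliberate difference: the fragment above treats count == LP_CHAR(minor) as a time-out, whereas this sketch still reports success if the device becomes ready on the very last poll.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true once the device reports "ready". */
typedef bool (*ready_fn)(void *ctx);

/*
 * Poll is_ready() at most max_polls times.
 * Returns true if the device became ready, false on time-out.
 * If polls_used is not NULL, it receives the number of status reads made.
 */
static bool poll_until_ready(ready_fn is_ready, void *ctx,
                             unsigned long max_polls,
                             unsigned long *polls_used)
{
    unsigned long count = 0;
    bool ready = false;

    assert(is_ready != NULL);

    while (count < max_polls) {
        count++;
        if (is_ready(ctx)) {
            ready = true;
            break;
        }
        /* The driver would give up the CPU here when need_resched is set. */
    }

    if (polls_used != NULL)
        *polls_used = count;
    return ready;
}

/* Fake device for experiments: becomes ready after a fixed number of reads. */
struct fake_printer {
    unsigned long reads_until_ready;
};

static bool fake_printer_ready(void *ctx)
{
    struct fake_printer *p = ctx;

    if (p->reads_until_ready == 0)
        return true;
    p->reads_until_ready--;
    return false;
}
```

With a fake printer that needs three 'busy' reads, poll_until_ready(fake_printer_ready, &printer, 10, &used) succeeds after four polls; with a limit of 10 and a printer that needs 100 reads, it gives up after ten.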
Device driver nomenclature (continued)

5 char Alternate TTY devices
      0 = /dev/tty        Current TTY device
      1 = /dev/console    System console
      2 = /dev/ptmx       PTY master multiplex
     64 = /dev/cua0       Callout device for ttyS0
      ...
    255 = /dev/cua191     Callout device for ttyS191

The ioctl functionality (part of the system call interface) gives user-space programs access to device-specific operations in kernel space. Here is the code that performs the initial dispatch for this call:

static long do_ioctl(struct file *filp, unsigned int cmd,
        unsigned long arg)
{
    int error = -ENOTTY;

    if (!filp->f_op)
        goto out;

    if (filp->f_op->unlocked_ioctl) {
        error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
        if (error == -ENOIOCTLCMD)
            error = -EINVAL;
        goto out;
    } else if (filp->f_op->ioctl) {
        lock_kernel();
        error = filp->f_op->ioctl(filp->f_dentry->d_inode,
                filp, cmd, arg);
        unlock_kernel();
    }

out:
    return error;
}

static int file_ioctl(struct file *filp, unsigned int cmd,
        unsigned long arg)
{
    int error;
    int block;
    struct inode * inode = filp->f_dentry->d_inode;
    int __user *p = (int __user *)arg;

    switch (cmd) {
        case FIBMAP:
        {
            struct address_space *mapping = filp->f_mapping;
            int res;
            /* do we support this mess? */
            if (!mapping->a_ops->bmap)
                return -EINVAL;
            if (!capable(CAP_SYS_RAWIO))
                return -EPERM;
            if ((error = get_user(block, p)) != 0)
                return error;

            lock_kernel();
            res = mapping->a_ops->bmap(mapping, block);


            unlock_kernel();
            return put_user(res, p);
        }
        case FIGETBSZ:
            if (inode->i_sb == NULL)
                return -EBADF;
            return put_user(inode->i_sb->s_blocksize, p);
        case FIONREAD:
            return put_user(i_size_read(inode) - filp->f_pos, p);
    }

    return do_ioctl(filp, cmd, arg);
}

Networking
Today, we will look at some of the fundamental functionalities (that can handle network services) required when you code. Since some of our readers may not have a strong background in networking, we will spend some time reviewing the basic networking related concepts.
In Linux, you can use sockets for accessing network services. And you can employ the following functionalities to do higher-end tasks:

int socket(int addr_family, int type, int protocol);
int bind(int s, struct sockaddr *address, int address_len);
int listen(int s, int backlog);
int connect(int s, struct sockaddr *address, int address_len);
int accept(int s, struct sockaddr *address, int *address_len);
int send(int s, char *msg, int len, int flags);
int sendto(int s, char *msg, int len, int flags, struct sockaddr *to, int tolen);
int recv(int s, char *buf, int len, int flags);
int recvfrom(int s, char *buf, int len, int flags,
        struct sockaddr *from, int *fromlen);
int getsockopt(int s, int level, int oname, char *ovalue,
        int *olen);
int setsockopt(int s, int level, int oname, char *ovalue,
        int olen);

The above code shows the set of C library routines that are included in the interface. It is important to note that, on architectures such as i386, these functions rely on the system call socketcall. On the kernel side, the file operations for sockets are declared by the following code:

static int sock_no_open(struct inode *irrelevant, struct file *dontcare);
static ssize_t sock_aio_read(struct kiocb *iocb, char __user *buf,
        size_t size, loff_t pos);
static ssize_t sock_aio_write(struct kiocb *iocb, const char __user *buf,
        size_t size, loff_t pos);
static int sock_mmap(struct file *file, struct vm_area_struct * vma);

static int sock_close(struct inode *inode, struct file *file);
static unsigned int sock_poll(struct file *file,
        struct poll_table_struct *wait);
static long sock_ioctl(struct file *file,
        unsigned int cmd, unsigned long arg);
static int sock_fasync(int fd, struct file *filp, int on);
static ssize_t sock_readv(struct file *file, const struct iovec *vector,
        unsigned long count, loff_t *ppos);
static ssize_t sock_writev(struct file *file, const struct iovec *vector,
        unsigned long count, loff_t *ppos);
static ssize_t sock_sendpage(struct file *file, struct page *page,
        int offset, size_t size, loff_t *ppos, int more);

static struct file_operations socket_file_ops = {
    .owner = THIS_MODULE,
    .llseek = no_llseek,
    .aio_read = sock_aio_read,
    .aio_write = sock_aio_write,
    .poll = sock_poll,
    .unlocked_ioctl = sock_ioctl,
    .mmap = sock_mmap,
    .open = sock_no_open, /* special open code to disallow open via /proc */
    .release = sock_close,
    .fasync = sock_fasync,
    .readv = sock_readv,
    .writev = sock_writev,
    .sendpage = sock_sendpage
};

Various functions like net_family_write_lock are also included:

static void net_family_write_lock(void)
{
    spin_lock(&net_family_lock);
    while (atomic_read(&net_family_lockct) != 0) {
        spin_unlock(&net_family_lock);

        yield();

        spin_lock(&net_family_lock);
    }
}

As I said, since networking in Linux is a vast subject, we will be dedicating the next article entirely to it. I will also be briefing readers about the basic concepts required to meddle with networking in Linux. I would recommend that you refer to an undergraduate module in networking, if you don't have a clear picture. I will be including concepts related to the layer architecture models, protocols, conversion algorithms, etc, in the next column.
Happy kernel hacking!

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does programming in the open source domain. He is the developer of V-language, a programming language that employs AI and ANN. His research work/publications are available at www.aasisvinayak.com



A Voyage to the Kernel
Part 19
Day 18: Segment 3.8

As I promised in the last issue, in this article, we will explore networking.

OSI reference model
The OSI (Open Systems Interconnection) reference model is a multi-layered computer network protocol architecture that has seven distinct layers from top to bottom (refer to Figure 1). These are the application, presentation, session, transport, network, data link and physical layers. After covering some of the unique features of the architecture, we will move on to networking in Linux systems.
The purpose of this division is to break the process of networking into manageable parts. Conceptually, each layer is supported by the layer directly below it. By definition, you can have a 'link' between two instances belonging to a particular type of layer. You can assume that there is a kind of horizontal protocol connection existing between them.
Let's discuss these layers from the bottom to the top. The physical layer, as the name suggests, is what deals with the hardware device and the physical medium. This obviously covers items like hubs, repeaters, network adapters, etc. You can also place cable specifications and pins under this group.
The data link layer deals mainly with the procedural means to transfer data between the network and the other elements in the network. This may perplex some people. Well, in order to avoid that confusion, you can assume that the physical layer simply meddles with the linking of a single entity with the physical medium, while the layer above it handles multiple devices. The data link layer has the ability to perform error correction and control the flow of data.
The network layer deals with the transferring of data sequences of different lengths from a given source to the intended destination. This layer does the routing functions and is also responsible for fragmentation and reassembly of data in many cases. The network layer can report delivery errors as well, and routers operate at this layer. The Internet Protocol (IP) is the most well known among those belonging to this layer.
The transport layer is responsible for maintaining the quality of service. It does so by requesting this quality of the layer below it and, in turn, helping in reliable data transfer. The layer can also perform segmentation and de-segmentation processes. Just like the data link layer, this layer plays a role in flow and error control as well. Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) are two examples that belong to this category.
The session layer, obviously, deals with the handling of sessions. It manages the connection between two computers (say, a local and remote application). The presentation layer is actually a linking layer that provides a translation service. It takes inputs from the higher layer and feeds them into the session layer in such a way that the layer can handle them.
The top-most layer is the application layer that interacts with software applications and is architecturally positioned close to the end user. Typical examples could be Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), Simple Mail Transfer Protocol (SMTP), etc.
In short, you can see that in the OSI model, the Nth layer is supported by the (N-1)th layer, which helps the former layer to enable error-free transfer of data. These architectural changes were introduced as per the demand.
The Internet has expanded a lot over the last

102  |  DECEMBER 2009  |  LINUX For You  |  www.LinuxForU.com



15 years. If you look at the ARPANET logical map (see Figure 2), you can see how simple (!) the network was. Back then, nobody thought that we might run out of unique IPv4 addresses and have to go for alternate options like IPv6. This indicates that the layers can change in future to improve the performance of the network.

[Figure 1: The seven layers of OSI. From top to bottom: Application, Presentation, Session, Transport, Network, Data Link and Physical layers; user data is transmitted down the stack, across the physical link, and received up the stack.]

Though I mentioned TCP and UDP while discussing Layer 4, I think it is necessary to discuss them a bit more. The TCP/IP suite provides the core protocols we use on the Internet. IP is largely responsible for the delivery of the data by taking it all the way through the network. But TCP is concerned with issues only at the source and at the destination. A typical example could be a Web server and a browser. Please refer to Figure 3 for information concerning its header.
UDP can be employed to send messages to other hosts on the network without the requirement that earlier communications should set up special transmission channels. Its header format is shown in Figure 4.
One thing that you may notice is that since UDP doesn't require a formal set-up process for transmission, it does not guarantee the delivery or the ordering of the data transmitted. If you wonder why we need that, then I would suggest you think about services that are real-time and those applications in which waiting for lost packets is not the preferred option.
If you wish to explore more about networking in general, I would suggest you read Computer Networks by Andrew Tanenbaum.
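
Coming back to the UDP header mentioned above: it is small enough to sketch in a few lines of C. RFC 768 defines four 16-bit fields (source port, destination port, length and checksum), all sent in network byte order. The struct and function names below (udp_header, udp_header_pack, udp_header_unpack) are my own and are not taken from the kernel sources; this is only a minimal illustration of the layout, not the way the kernel builds its packets.

```c
#include <arpa/inet.h>   /* htons(), ntohs() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The four 16-bit fields of the UDP header, in the order given by RFC 768.
 * Values are kept in host byte order inside this struct. */
struct udp_header {
    uint16_t source_port;
    uint16_t dest_port;
    uint16_t length;      /* header plus payload, in bytes */
    uint16_t checksum;    /* 0 means "no checksum computed" over IPv4 */
};

enum { UDP_HEADER_LEN = 8 };

/* Write the header into buf in network byte order.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t udp_header_pack(const struct udp_header *h,
                              uint8_t *buf, size_t buflen)
{
    uint16_t wire[4];

    assert(h != NULL && buf != NULL);
    if (buflen < UDP_HEADER_LEN)
        return 0;

    wire[0] = htons(h->source_port);
    wire[1] = htons(h->dest_port);
    wire[2] = htons(h->length);
    wire[3] = htons(h->checksum);
    memcpy(buf, wire, UDP_HEADER_LEN);
    return UDP_HEADER_LEN;
}

/* Read a header back from buf. Returns 0 on success, -1 if buf is too short. */
static int udp_header_unpack(const uint8_t *buf, size_t buflen,
                             struct udp_header *out)
{
    uint16_t wire[4];

    assert(buf != NULL && out != NULL);
    if (buflen < UDP_HEADER_LEN)
        return -1;

    memcpy(wire, buf, UDP_HEADER_LEN);
    out->source_port = ntohs(wire[0]);
    out->dest_port   = ntohs(wire[1]);
    out->length      = ntohs(wire[2]);
    out->checksum    = ntohs(wire[3]);
    return 0;
}
```

Packing a header with source port 5353 and destination port 53 produces the bytes 0x14 0xE9 0x00 0x35 at the start of the buffer, which shows the big-endian (network) ordering regardless of the host CPU.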

[Figure 2: ARPANET logical map, March 1977 (Source: The Computer History Museum)]

[Figure 3: The TCP/IP header format (Source: RFC761)]

Linux networking
Let's ponder a little more about networking in the Linux platform. First of all, we will look at some basic commands.
ifconfig: This is used to configure the kernel-resident network interfaces. If you wish to display the status of all interfaces (including those that are down), you can issue the following command (please don't skip the results, as that may help you get accustomed to the available interfaces):

aasisvinayak@GNU-BOX:~$ ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:25:64:56:f4:e1
          inet addr:---hidden---  Bcast:131.227.156.255  Mask:255.255.255.0
          inet6 addr: ---hidden--- Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:371084 errors:0 dropped:0 overruns:0 frame:0
          TX packets:245076 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:476799041 (476.7 MB)  TX bytes:22141513 (22.1 MB)
          Interrupt:29 Base address:0x6000

eth2      Link encap:Ethernet  HWaddr 00:26:5e:7d:8d:53
          inet6 addr: fe80::226:5eff:fe7d:8d53/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:17 Base address:0xc000

lo        Link encap:Local Loopback


          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:974 errors:0 dropped:0 overruns:0 frame:0
          TX packets:974 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:153795 (153.7 KB)  TX bytes:153795 (153.7 KB)

pan0      Link encap:Ethernet  HWaddr 66:a2:20:ba:f3:10
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

vboxnet0  Link encap:Ethernet  HWaddr 0a:00:27:00:00:00
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

You may use the down option to shut down an interface, and arp (or -arp) to enable (or disable) the use of the ARP protocol on an interface. io_addr addr is another option, which sets the start address in the I/O space for the device. And if you wish to assign an IP address, you may just employ the address option.

[Figure 4: The UDP header (Source: RFC768)]

Displaying the output (content) of /proc/interrupts and analysing it is quite handy in some cases (say, if you want to look at the number of interrupts per IRQ). You can do this by issuing the following command:

aasisvinayak@GNU-BOX:~$ sudo cat /proc/interrupts
[sudo] password for aasisvinayak:
           CPU0       CPU1
  0:    2275331    1552933   IO-APIC-edge      timer
  1:        917        457   IO-APIC-edge      i8042
  8:          0          1   IO-APIC-edge      rtc0
  9:          9          4   IO-APIC-fasteoi   acpi
 12:      79030      76412   IO-APIC-edge      i8042
 17:       4193       2697   IO-APIC-fasteoi   eth2
 20:        457        159   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb6
 21:      83736      41592   IO-APIC-fasteoi   uhci_hcd:usb4, uhci_hcd:usb7, HDA Intel
 22:         27         13   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb5, uhci_hcd:usb8
 28:      79298      55301   PCI-MSI-edge      ahci
 29:     299499     306516   PCI-MSI-edge      eth0
 30:     181527     188303   PCI-MSI-edge      i915@pci:0000:00:02.0
NMI:          0          0   Non-maskable interrupts
LOC:    1551082    2026604   Local timer interrupts
SPU:          0          0   Spurious interrupts
CNT:          0          0   Performance counter interrupts
PND:          0          0   Performance pending work
RES:     789579     767812   Rescheduling interrupts
CAL:        156        160   Function call interrupts
TLB:       5216       7330   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:         34         34   Machine check polls
ERR:          0
MIS:

If you wish to look at the kernel's routing table, you can cat the /proc/net/route file. Figure 5 shows the output on my PC.

[Figure 5: The /proc/net/route file]

You might have tried the netstat command as well. But the above commands will work even if you don't have the netstat utility. (I have seen some changes in 2.4 and 2.6, but you need not worry about this now.)
Since most of the distros have this utility, you can issue netstat -r (or -nr) to obtain the routing table; readability is better in this case (refer to Figure 6). You can also use route -n for the same purpose.

[Figure 6: The output for netstat -r]

/etc/services is a vital file that helps you find how the port numbers are linked to the named services. For your reference, here are some standard entries from the file:

tcpmux          1/tcp                   # TCP port service multiplexer


echo            7/tcp
echo            7/udp
discard         9/tcp       sink null
discard         9/udp       sink null
systat          11/tcp      users
daytime         13/tcp
daytime         13/udp
netstat         15/tcp
qotd            17/tcp      quote
msp             18/tcp                  # message send protocol
msp             18/udp
chargen         19/tcp      ttytst source
chargen         19/udp      ttytst source
ftp-data        20/tcp
ftp             21/tcp
fsp             21/udp      fspd
ssh             22/tcp                  # SSH Remote Login Protocol
ssh             22/udp
telnet          23/tcp
smtp            25/tcp      mail
time            37/tcp      timserver
time            37/udp      timserver
rlp             39/udp      resource    # resource location
nameserver      42/tcp      name        # IEN 116
whois           43/tcp      nicname
tacacs          49/tcp                  # Login Host Protocol (TACACS)

Another important file is /etc/protocols (you may need to install nmap for an extensive list). You can use this file to translate protocol names to numbers (so that the IP layer on other hosts can understand):

ip       0   IP         # internet protocol, pseudo protocol number
#hopopt  0   HOPOPT     # IPv6 Hop-by-Hop Option [RFC1883]
icmp     1   ICMP       # internet control message protocol
igmp     2   IGMP       # Internet Group Management
ggp      3   GGP        # gateway-gateway protocol
ipencap  4   IP-ENCAP   # IP encapsulated in IP (officially ``IP'')
---------[output truncated]-----------

The getprotobyname() function returns a protoent structure for the line in /etc/protocols that matches a given protocol name. It is declared, along with its related functions, as follows:

#include <netdb.h>

struct protoent *getprotoent(void);
struct protoent *getprotobyname(const char *name);
struct protoent *getprotobynumber(int proto);
void setprotoent(int stayopen);
void endprotoent(void);

The system uses /etc/nsswitch.conf to configure which services are to be employed by it in order to find information such as host names, password files, and group files. Here are some typical entries:

passwd:         compat
group:          compat
shadow:         compat

hosts:          files mdns4_minimal [NOTFOUND=return] dns mdns4
networks:       files

protocols:      db files
services:       db files
ethers:         db files
rpc:            db files

netgroup:       nis

We can use the ethtool command followed by the interface name to display the Ethernet card settings:

aasisvinayak@GNU-BOX:~$ sudo ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000033 (51)
        Link detected: yes

This tool has a wide range of applications; check the man page for details.
We have covered some of the fundamentals in networking, though we still have a lot more basic stuff to discuss: say, files like /etc/inetd.conf and /etc/securetty, tools like tcpd (access control facility), configuration of /etc/hosts and so on. We will cover these in the next edition. Then we shall move on to kernel-specific zones.
Happy kernel hacking!

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does programming in the open source domain. He is the developer of V-language, a programming language that employs AI and ANN. His research work/publications are available at www.aasisvinayak.com



A Voyage to the Kernel
Part 20
Segment: 3.9, Day 19

In the last segment, we covered some of the basic aspects related to Linux networking. Some readers asked why they were unable to locate a few files specific to networking. You need to be aware that the location of files varies between distros. Your distro's doc/help files will help you find them. For your reference, I am including Table 1 that gives the location of a few important files in different distros.

Table 1
Distro      Configuration               Boot specific ones
Debian*     /etc/init.d/network         /etc/rc2.d/
RedHat      /etc/rc.d/init.d/network    /etc/rc3.d/
Slackware   /etc/rc.d/rc.inet1          /etc/rc.d/rc.inet2
(*there could be minor changes in Debian-based distros)

Here is the shell script that performs the boot actions in my Debian-based distro (this file is located at /etc/init.d/networking):

#!/bin/sh -e
PATH="/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin"
[ -x /sbin/ifup ] || exit 0
. /lib/lsb/init-functions

# helper function to set the usplash timeout. https://launchpad.net/bugs/21617
usplash_timeout () {
    TIMEOUT=$1
    if [ -x /sbin/usplash_write ]; then
        /sbin/usplash_write "TIMEOUT $TIMEOUT" || true
    fi
}

process_options() {
    [ -e /etc/network/options ] || return 0
    log_warning_msg "/etc/network/options still exists and it will be IGNORED! Read README.Debian of netbase."
}

check_network_file_systems() {
    [ -e /proc/mounts ] || return 0

    exec 9<&0 < /proc/mounts
    while read DEV MTPT FSTYPE REST; do
        case $DEV in
        /dev/nbd*|/dev/nd[a-z]*|/dev/etherd/e*)
            log_warning_msg "not deconfiguring network interfaces: network devices still mounted."
            exit 0
            ;;
        esac
        case $FSTYPE in
        nfs|nfs4|smbfs|ncp|ncpfs|cifs|coda|ocfs2|gfs|pvfs|pvfs2|fuse.httpfs|fuse.curlftpfs)
            log_warning_msg "not deconfiguring network interfaces: network file systems still mounted."
            exit 0
            ;;
        esac
    done
    exec 0<&9 9<&-
}

…..code truncated

force-reload|restart)

102  |  JANUARY 2010  |  LINUX For You  |  www.LinuxForU.com



    process_options

    log_action_begin_msg "Reconfiguring network interfaces"
    ifdown -a --exclude=lo || true
    if ifup -a --exclude=lo; then
        log_action_end_msg $?
    else
        log_action_end_msg $?
    fi
    ;;

<<<<code truncated>>>>

Now, let's focus on another important command that we've discussed previously: ifconfig. In the last issue we looked at how to obtain details, like the MAC address and so on, about the network interfaces. In this edition, we will explore the ways to use this command.
• up: This can be used to activate a particular interface.
• down: As the name suggests, you may use this to deactivate an interface (in order to obtain the list of interfaces, you can issue the ifconfig command).
• arp: If you wish to enable (or disable) the address resolution protocol on a particular interface, you can try this.
• allmulti: This is another option that is used in the intranet zones of many organisations. Its main use is to enable (or disable) the reception of all multicast packets on the interface. You may employ this if you wish to receive multicast messages sent to a set of hosts with special network addresses (destination hosts). A typical example could be the case where a university is multicasting special lectures to students of a particular department who are connected to the university intranet.
• netmask <address>: This helps you to assign the network mask of the network to which you want to connect a system. I suggest you explore topics like DHCP and auto-configure mechanisms to learn more about typical implementations in modern operating systems.
• mtu N: If you wish to set a custom maximum transmission unit (MTU) value, you can try this. Different network types recommend different MTU values. Here are a few of them (format used – Network: MTU in bytes):
    • Ethernet: 1500
    • IEEE 802.3/802.2: 1492
    • 16 Mbit/s Token Ring: 17914
    • 4 Mbit/s Token Ring: 4464
    • FDDI: 4352
  In my Debian-based system, I can issue a command like the one below to change the MTU value:
  sudo ifconfig eth0 mtu 1500
• broadcast <address>: You can enable (or disable) the interface to accept datagrams that are carrying the broadcast address; when an address is given, it is used as the broadcast address for the interface.
• irq <address>: This option is not used frequently. But you can experiment by setting the IRQ of the hardware, using this option.
• pointtopoint <address>: You can set the address of the machine at the remote end of a point-to-point link.
• hw <type> <address>: This allows you to assign the hardware (hw) address of devices. You may Google for "Change MAC address in Linux" or "GNU MAC Changer" to get more information. (Please be very careful while using this option, especially when you are in a VPN or in a network that uses special authentication methods.)

Configuration*
As with the standard resolver library, the file /etc/resolv.conf must be set up before the resolver can function. In addition, the file /etc/host.conf contains configuration information specific to resolv+.
The host.conf file should contain one configuration keyword per line, followed by appropriate configuration information. The keywords recognised are order, trim, multi, nospoof, alert and reorder. Each keyword is described below:
• order: This keyword specifies how host lookups are to be performed. It should be followed by one or more lookup methods, separated by commas. Valid methods are bind, hosts and nis.
• trim: This keyword may be listed more than once. Each time it should be followed by a single domain name, with the leading dot. When set, the resolv+ library will automatically trim the given domain name from the end of any hostname resolved via DNS. This is intended for use with local hosts and domains. (Note: trim will not affect hostnames gathered via NIS or the hosts file. Care should be taken to ensure that the first hostname for each entry in the hosts file is fully qualified or non-qualified, as appropriate for the local installation.)
• multi: Valid values are On and Off. If set to On, the resolv+ library will return all valid addresses for a host that appears in the /etc/hosts file, instead of only the first. This is off by default, as it may cause a substantial performance loss at sites with large hosts files.
• nospoof: Valid values are On and Off. If set to On, the resolv+ library will attempt to prevent hostname spoofing to enhance the security of rlogin and rsh. It works as follows: after performing a host address lookup, resolv+ will perform a hostname lookup for that address. If the two hostnames do not match, the query will fail.
• alert: If this option is set to On and the nospoof option is also set, resolv+ will log a warning of the error via the syslog facility. The default value is Off.
• reorder: Valid values are On and Off. If set to On, resolv+ will attempt to reorder host addresses so that local addresses (that is, on the same subnet) are listed first when a gethostbyname() is performed. Reordering is done for all lookup methods. The default value is Off.
*Adapted from resolv+ man page.

Last month we discussed the ethtool and its uses

www.LinuxForU.com  |  LINUX For You  |  JANUARY 2010  |  103


A Voyage to the Kernel  |  Guest Column ______________________________________________________________________________

(commands). Here is the shell script that is specific to this utility:

#!/bin/sh
ETHTOOL=/usr/sbin/ethtool
test -x $ETHTOOL || exit 0

# Find settings with a given prefix and print them as they appeared in
# /etc/network/interfaces, only with the prefix removed.
gather_settings () {
    env | awk -F= "/^IF_$1/ {
        sub(\"^IF_$1\", \"\");
        gsub(\"_\", \"-\");
        print tolower(\$1), \$2
    }"
}

# Gather together the mixed bag of settings applied with -s/--change
SETTINGS="\
${IF_LINK_SPEED:+ speed $IF_LINK_SPEED}\
${IF_LINK_DUPLEX:+ duplex $IF_LINK_DUPLEX}\
"

# WOL has an optional pass-key
set -- $IF_ETHERNET_WOL
SETTINGS="$SETTINGS${1:+ wol $1}${2:+ sopass $2}"

# Autonegotiation can be on|off or an advertising mask
case "$IF_ETHERNET_AUTONEG" in
'') ;;
on|off) SETTINGS="$SETTINGS autoneg $IF_ETHERNET_AUTONEG" ;;
*) SETTINGS="$SETTINGS autoneg on $IF_ETHERNET_AUTONEG" ;;
esac

[ -z "$SETTINGS" ] || $ETHTOOL --change "$IFACE" $SETTINGS

SETTINGS="$(gather_settings ETHERNET_PAUSE_)"
[ -z "$SETTINGS" ] || $ETHTOOL --pause "$IFACE" $SETTINGS

SETTINGS="$(gather_settings HARDWARE_IRQ_COALESCE_)"
[ -z "$SETTINGS" ] || $ETHTOOL --coalesce "$IFACE" $SETTINGS

SETTINGS="$(gather_settings HARDWARE_DMA_RING_)"
[ -z "$SETTINGS" ] || $ETHTOOL --ring "$IFACE" $SETTINGS

SETTINGS="$(gather_settings OFFLOAD_)"
[ -z "$SETTINGS" ] || $ETHTOOL --offload "$IFACE" $SETTINGS

Now, let's turn our attention to some of the configuration files. The /etc/resolv.conf file is the one that configures the 'name resolver' (and has a single keyword per line). It may have a domain, search or nameserver entry in each line. Here is a typical sample:

# Generated by NetworkManager
nameserver <ip address>
nameserver <ip address>

The keyword 'domain' is used to specify a local domain name and 'search' is used to specify alternate domain names. 'nameserver' is used to identify the IP address of a DNS server that will handle queries.
The Avahi daemon also uses this file. Avahi is essentially a Multicast DNS Service Discovery mechanism (more details towards the end of this column), which allows you to find services and hosts running on a local network without manual configuration.
It checks the /etc/resolv.conf file entries for nameservers. Here is the relevant code:

dns_reachable() {
    $(grep -q nameserver /etc/resolv.conf) || return 1;

    # If there is no local nameserver and we have no global ip addresses
    # then we can't reach any nameservers
    if ! $(egrep -q "nameserver 127.0.0.1|::1" /etc/resolv.conf); then
        # Get addresses of all running interfaces
        ADDRS=$(LC_ALL=C ifconfig | grep ' addr:')
        # Filter out all local addresses
        ADDRS=$(echo "${ADDRS}" | egrep -v ':127|Scope:Host|Scope:Link')
        if [ -z "${ADDRS}" ] ; then
            return 1;
        fi
    fi

    return 0
}

The /etc/host.conf file is another file that you can use to configure and manage the resolver. A typical sample is given below:

order hosts,bind
multi on

The above lines ask the resolver to look in the /etc/hosts file (see the description below) before performing a query using the nameserver. They also instruct the resolver to return all the addresses listed for a host in the /etc/hosts file. You may refer to the box titled 'Configuration' for more details.
We have seen that the /etc/hosts file is where the resolver would be looking for addresses and names of local hosts. It means that if you add the name of a host here, then the resolver will not use the nameserver to get the address; rather it will use the corresponding IP address found in the /etc/hosts file. Normally, you could see entries for the loopback interface and local host names. Here is a sample entry:

127.0.0.1 localhost
127.0.1.1 GNU-BOX


# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

For your reference, here are the DNS-specific entries of the Avahi daemon:

PATH=/bin:/usr/bin:/sbin:/usr/sbin

RUNDIR="/var/run/avahi-daemon/"
DISABLE_TAG="$RUNDIR/disabled-for-unicast-local"
NS_CACHE="$RUNDIR/checked_nameservers"

…...........
…................. (code truncated)

dns_has_local() {
    # Some magic to do tests
    if [ -n "${FAKE_HOST_RETURN}" ] ; then
        if [ "${FAKE_HOST_RETURN}" = "true" ]; then
            return 0;
        else
            return 1;
        fi
    fi

    OUT=`LC_ALL=C host -t soa local. 2>&1`

    if [ $? -eq 0 ] ; then
        if echo "$OUT" | egrep -vq 'has no|not found'; then
            return 0
        fi
    else
        # Checking the dns servers failed. Assuming no .local unicast dns, but
        # remove the nameserver cache so we recheck the next time we're triggered
        rm -f ${NS_CACHE}
    fi
    return 1
}

dns_needs_check() {
    TMP_CACHE="${NS_CACHE}.$$"
    RET=0

    ensure_rundir
    cat /etc/resolv.conf | grep "nameserver" | sort > ${TMP_CACHE} || return 0

    if [ -e ${NS_CACHE} ]; then
        DIFFERENCE=$(diff -w ${NS_CACHE} ${TMP_CACHE})
        echo "${DIFFERENCE}" | grep -q '^>'
        ADDED=$?
        echo "${DIFFERENCE}" | grep -q '^<'
        REMOVED=$?
        # Avahi was disabled and no servers were removed, no need to recheck
        [ -e ${DISABLE_TAG} ] && [ ${REMOVED} -ne 0 ] && RET=1
        # Avahi was enabled and no servers were added, no need to recheck
        [ ! -e ${DISABLE_TAG} ] && [ ${ADDED} -ne 0 ] && RET=1
    fi

    mv ${TMP_CACHE} ${NS_CACHE}
    return ${RET};
}

The /etc/hosts.allow file is the one that manages the 'host permissions'. So, you can use this file to allow a particular host to access a service running on your system. (If you have installed Apache, please glance at the /etc/apache2 folder as well.) This file has a format similar to the one shown below:

# /etc/hosts.allow
#
# <service list>: <host list> [: command]

Take a look at the syntax:
 'service list' – used to provide a delimited list of server names for which you are defining the rule (say, telnetd)
 'host list' – also a delimited list of host names (you could also use IP addresses or wild cards here)
 'command' – an optional parameter used to run a particular command every time the rule is matched.
 You may also note that PARANOID matches any host name that does not tally with the corresponding address.
The /etc/inetd.conf file is another important one used by the inetd server daemon. You can specify inetd to accept connections for a given service. But you need to define properly how this should be handled. The syntax used is given below:

service socket_type proto flags user server_path server_args
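
To make the seven fields of this syntax line concrete, here is a minimal user-space sketch in C that splits one entry into those fields. It is not taken from inetd itself: the struct inetd_entry and the function parse_inetd_line() are hypothetical names chosen for illustration, and the sketch assumes fields are separated by plain spaces or tabs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of one /etc/inetd.conf entry. */
struct inetd_entry {
    char service[32];
    char socket_type[16];
    char proto[16];
    char flags[16];
    char user[32];
    char server_path[128];
    char server_args[128];   /* optional, stays empty if absent */
};

/*
 * Split one line into its whitespace-separated fields.
 * Returns 0 on success and -1 for comments, blank lines
 * or entries with fewer than six fields.
 */
static int parse_inetd_line(const char *line, struct inetd_entry *e)
{
    int fields;
    int consumed = -1;

    memset(e, 0, sizeof(*e));
    if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
        return -1;

    /* %n records where the optional server_args part begins. */
    fields = sscanf(line, "%31s %15s %15s %15s %31s %127s %n",
                    e->service, e->socket_type, e->proto,
                    e->flags, e->user, e->server_path, &consumed);
    if (fields < 6)
        return -1;

    if (consumed >= 0) {
        /* Everything after server_path is kept as one argument string. */
        strncpy(e->server_args, line + consumed, sizeof(e->server_args) - 1);
        e->server_args[strcspn(e->server_args, "\r\n")] = '\0';
    }
    return 0;
}
```

Feeding it an assumed, typical-looking entry such as "ftp stream tcp nowait root /usr/sbin/tcpd in.ftpd -l -a" would leave "nowait" in the flags field and "/usr/sbin/tcpd" in server_path.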


'service' corresponds to one of the services that we specified in the /etc/services file (see the previous column for more information). 'socket_type' is used to specify the relevant socket type for the 'service' entry (stream, dgram, raw, rdm, etc) and 'proto' will indicate the valid protocol to be used for this (tcp, udp, etc).
'flags' (wait or nowait) can be set to instruct the daemon whether or not to free a socket after its use by a connection. You have to look at your requirements list and the protocols used, in order to decide the value for this field.
The 'user' field assigns a particular user in the system (from the /etc/passwd file) as the owner of the network daemon. It is a security related entry as you are restricting the access rights and privileges using this. Typically, the root user will be shown as the owner of the network daemon. You can use 'server_path' to provide the pathname to the actual server program and 'server_args' (optional entry) to pass arguments to the daemon program.
/etc/ftpusers is another important file. It is used to deny certain users (of the system) access to the ftp service. The daemon program (ftpd) looks for the entries in this file every time it receives a request for a connection. Here is a sample entry:

# /etc/ftpusers - users not allowed to login via ftp
root
uucp
bin
mail

You can find an entry corresponding to the 'root' user in the above listing. This is done for security reasons.
By using the /etc/securetty file, you can specify which tty devices are open for the root to log in. Here is a typical entry:

# /etc/securetty - tty's on which root is allowed to login
tty1
tty2
tty3
tty4

You need to be aware that this is also security related and you should be careful while meddling with this file.
The /etc/hosts.deny file is yet another configuration file. This is used by the /usr/sbin/tcpd program to find the hosts that are disallowed from using a particular service. Here is a valid entry for the same:

# /etc/hosts.deny
#
# Disallow all hosts with suspect hostnames

ALL: PARANOID
#
# Disallow all hosts.

ALL: ALL

Avahi daemon
We have looked at some of the Avahi-related points earlier. As we discussed, it is a daemon that helps in service discovery on a local network. This essentially lets you connect your laptop to a network and find other systems/services in the network. It is based on Lennart Poettering's mDNS implementation (called FlexMDNS) for Linux.
Here is the code portion that handles the enabling and disabling processes:

enable_avahi () {
    # no unicast .local conflict, so remove the tag and start avahi again
    if [ -e ${DISABLE_TAG} ]; then
        rm -f ${DISABLE_TAG}
        start avahi-daemon || :
    fi
}

disable_avahi () {
    [ -e ${DISABLE_TAG} ] && return

    stop avahi-daemon || :
    if [ -x /usr/bin/logger ]; then
        logger -p daemon.warning -t avahi <<EOF
Avahi detected that your currently configured local DNS server serves
a domain .local. This is inherently incompatible with Avahi and thus
Avahi disabled itself. If you want to use Avahi in this network, please
contact your administrator and convince him to use a different DNS domain,
since .local should be used exclusively for Zeroconf technology.
For more information, see http://avahi.org/wiki/AvahiAndUnicastDotLocal
EOF
    fi
    ensure_rundir
    touch ${DISABLE_TAG}
}

The daemon is basically a free Zeroconf implementation (based on the Apple Zeroconf specification) and implements mDNS, DNS-SD and RFC 3927/IPv4LL. You could find this in almost all Linux flavours, and it also provides a set of language bindings. Avahi has already been incorporated into GNOME's Virtual File System (VFS) and KDE's input/output architecture.
We have covered most of the basic networking aspects. Till the next issue, Happy Kernel Hacking!

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does programming in the open source domain. He is the developer of V-language, a programming language that employs AI and ANN. His research work/publications are available at www.aasisvinayak.com
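
As a recap of the rule format shared by /etc/hosts.allow and /etc/hosts.deny, namely <service list>: <host list> [: command], here is a small user-space sketch in C that splits one rule into its three parts. It is only an illustration and not how tcpd itself parses rules: the names access_rule, copy_trimmed() and parse_access_rule() are hypothetical, and the sketch assumes that a colon never appears inside a field (so bracketed IPv6 addresses are out of scope).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the three parts of one access rule. */
struct access_rule {
    char services[64];
    char hosts[128];
    char command[128];   /* optional, empty when the rule has no command */
};

/* Copy len bytes of src into dst without surrounding whitespace. */
static void copy_trimmed(char *dst, size_t dstlen, const char *src, size_t len)
{
    while (len > 0 && isspace((unsigned char)*src)) {
        src++;
        len--;
    }
    while (len > 0 && isspace((unsigned char)src[len - 1]))
        len--;
    if (len >= dstlen)
        len = dstlen - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Returns 0 when both a service list and a host list were found. */
static int parse_access_rule(const char *line, struct access_rule *r)
{
    const char *first, *second;

    memset(r, 0, sizeof(*r));
    if (line[0] == '#')
        return -1;                      /* comment line */

    first = strchr(line, ':');
    if (first == NULL)
        return -1;                      /* blank or malformed line */
    copy_trimmed(r->services, sizeof(r->services), line,
                 (size_t)(first - line));

    second = strchr(first + 1, ':');
    if (second == NULL) {
        copy_trimmed(r->hosts, sizeof(r->hosts), first + 1,
                     strlen(first + 1));
    } else {
        copy_trimmed(r->hosts, sizeof(r->hosts), first + 1,
                     (size_t)(second - (first + 1)));
        copy_trimmed(r->command, sizeof(r->command), second + 1,
                     strlen(second + 1));
    }
    return (r->services[0] != '\0' && r->hosts[0] != '\0') ? 0 : -1;
}
```

Applied to the hosts.deny entry "ALL: PARANOID", the sketch yields the service list "ALL", the host list "PARANOID" and an empty command.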



A Voyage to the Kernel
Part 21
Segment: 4.1, Day 20

In Segment 3, we'd looked at some fundamental aspects of Linux kernel module programming and we discussed commands like lsmod and files like /etc/modprobe.conf. We also meddled with __init and __exit while writing a simple module. This new segment is dedicated to kernel programming, in general.
The easiest way to add code to the kernel is to just add the lines to the source tree and recompile it. You may recall that you can actually pre-configure the kernel you are about to compile (please refer to the previous segment for more details).
We have also seen that we can add some code to the 'kernel' even when it is alive. This is what we did when we wrote simple kernel modules (or loadable kernel modules) and loaded them into the kernel. Typically, if you just want to write a driver for new hardware that you have made, and enable the device, then you don't have to compile the entire kernel to make the device run. A module will serve the purpose. So is the case with system calls. This is one of the properties of the Linux kernel. Well, for our discussions, I'm going to call the kernel that gets loaded while booting up the 'base kernel' and the loadable ones modules (as a generic term).
Module programming became popular during the mid-90s and attracted many developers (and vendors). But the earlier modules were not that easily 'loadable'. It was only by the beginning of this decade that they attained the shape of modules, as we see them today.

Why should it be loadable?
This is one of the fundamental questions that beginners ask. “Why should I write a loadable module? Why not incorporate it into the kernel code itself?” Well, you may find some reasons if you think of yourself as a hardware vendor who's going to offer a new piece of hardware. As discussed before, the users’ convenience is a factor. They don't have to rebuild the kernel each time they 'add' a new device driver.
Also, if you try to rebuild the kernel with your code, then it may be hard for the developers to fix the error that may pop up during the process. Sometimes the error may even prevent you from logging into the system if you have a serious bug in the code you just added (since it will be executed during the boot stage).
If you look at the resource side, then you can see that the module saves much of your memory, as you will be loading it only when it is needed. Another fact is that developers (say, hardware vendors) find it easy to maintain their code if it is written as a module.
This is even helpful during the development phase. You can experiment with your code very easily if you write it as a module. It will save you time as you don't have to reboot the system each time you update the code. But you also need to note that if you are writing a driver for a device that is essential for providing the service, then you should try to incorporate the code into the base kernel itself.

Useful utilities
While developing the module, you may find the following utilities quite handy:
 insmod – to load a module into the kernel (refer to the previous segment; for your information, I'm putting the moduleloader header file code here):

#ifndef _LINUX_MODULELOADER_H
#define _LINUX_MODULELOADER_H
#include <linux/module.h>
#include <linux/elf.h>
int module_frob_arch_sections(Elf_Ehdr *hdr,
                              Elf_Shdr *sechdrs,
                              char *secstrings,
                              struct module *mod);
unsigned int arch_mod_section_prepend(struct module *mod,
                                      unsigned int section);
void *module_alloc(unsigned long size);
void module_free(struct module *mod, void *module_region);
int apply_relocate(Elf_Shdr *sechdrs,
                   const char *strtab,
                   unsigned int symindex,

102  |  FEBRUARY 2010 | LINUX FoR YoU | www.LinuxForU.com



                   unsigned int relsec,
                   struct module *mod);
int apply_relocate_add(Elf_Shdr *sechdrs,
                       const char *strtab,
                       unsigned int symindex,
                       unsigned int relsec,
                       struct module *mod);
int module_finalize(const Elf_Ehdr *hdr,
                    const Elf_Shdr *sechdrs,
                    struct module *mod);
void module_arch_cleanup(struct module *mod);
#endif

 rmmod – to remove a module from the kernel
 depmod – to find out the dependencies between modules
 kerneld – for invoking the kerneld daemon
 ksyms – to display the symbols exported by the kernel, which the module can use
 lsmod – to list the loaded modules (see the listing below for the structure of its output)

aasisvinayak@GNU-BOX:~$ lsmod
Module                  Size  Used by
cbc                     3516  1052
aes_i586                8124  1053
aes_generic            27484  1 aes_i586
ecb                     2524  1
binfmt_misc             8356  1
ppdev                   6688  0
vboxnetadp             78760  0
vboxnetflt             85288  0
vboxdrv               121608  1 vboxnetflt
dm_crypt               12928  0
<<<CODE TRUNCATED>>>

 modinfo – to display the contents of the modinfo section. For example, the information about my video device can be displayed using the following command:

aasisvinayak@GNU-BOX:~$ modinfo videodev
filename: /lib/modules/2.6.31-17-generic-pae/kernel/drivers/media/video/videodev.ko
alias: char-major-81-*
license: GPL
description: Device registrar for Video4Linux drivers v2
author: Alan Cox, Mauro Carvalho Chehab <mchehab@infradead.org>
srcversion: 943A5776552E0E369C89F77
depends: v4l1-compat
vermagic: 2.6.31-17-generic-pae SMP mod_unload modversions 586

 modprobe – helps you to load/remove modules in a better way. (We will discuss this later in this segment as well.) Here is the output that shows how dummy.ko is loaded into the kernel:

aasisvinayak@GNU-BOX:~$ strace modprobe dummy
execve("/sbin/modprobe", ["modprobe", "dummy"], [/* 35 vars */]) = 0
brk(0) = 0x9dec000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb770c000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=89804, ...}) = 0
mmap2(NULL, 89804, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb76f6000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/tls/i686/cmov/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\260l\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1319364, ...}) = 0
mmap2(NULL, 1325416, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb75b2000
mmap2(0xb76f0000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x13e) = 0xb76f0000
mmap2(0xb76f3000, 10600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb76f3000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb75b1000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb75b18d0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb76f0000, 8192, PROT_READ) = 0
mprotect(0x8052000, 4096, PROT_READ) = 0
mprotect(0xb772a000, 4096, PROT_READ) = 0
munmap(0xb76f6000, 89804) = 0
uname({sys="Linux", node="GNU-BOX", ...}) = 0
fstat64(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
brk(0) = 0x9dec000
brk(0x9e0d000) = 0x9e0d000
open("/etc/modprobe.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/modprobe.d", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
fcntl64(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
getdents(3, /* 12 entries */, 32768) = 360
getdents(3, /* 0 entries */, 32768) = 0
close(3) = 0
open("/etc/modprobe.d/alsa-base.conf", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=2497, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb770b000
read(3, "# autoloader aliases\ninstall sou"..., 4096) = 2497
read(3, "", 4096) = 0
close(3) = 0
munmap(0xb770b000, 4096) = 0

I also suggest that beginners take a look at the /proc/modules file (please note that this will get updated even if you leave it open):

msdos 7836 0 - Live 0xf856d000
fat 51452 1 msdos, Live 0xfac5e000
cbc 3516 1163 - Live 0xfa37d000


aes_i586 8124 1164 - Live 0xf96a5000
aes_generic 27484 1 aes_i586, Live 0xfa366000
ecb 2524 1 - Live 0xf9591000
binfmt_misc 8356 1 - Live 0xf95d6000
ppdev 6688 0 - Live 0xf9578000
vboxnetadp 78760 0 - Live 0xfa06a000
vboxnetflt 85288 0 - Live 0xfa03a000
vboxdrv 121608 1 vboxnetflt, Live 0xf9fff000
dm_crypt 12928 0 - Live 0xf827e000
bridge 47952 0 - Live 0xf9732000
stp 2272 1 bridge, Live 0xf971c000
bnep 12060 2 - Live 0xf9713000
snd_hda_codec_idt 59844 1 - Live 0xf96c4000
snd_hda_intel 26984 4 - Live 0xf9696000
snd_hda_codec 75708 2 snd_hda_codec_idt,snd_hda_intel, Live 0xf9674000
snd_hwdep 7200 1 snd_hda_codec, Live 0xf9653000

Another important file is /etc/modprobe.d/blacklist. You may also check other files in this folder. This file (which contains a list) allows you to block some of the modules from getting loaded, thus paving the way to use alternative drivers. Sometimes a fake module may also try to block your 'free access' to a device. You can use this file to block the same from getting loaded. Here is a sample of its entries, for your reference:

# evbug is a debug tool that should be loaded explicitly
blacklist evbug
# these drivers are very simple, the HID drivers are usually preferred
blacklist usbmouse
blacklist usbkbd

# replaced by e100
blacklist eepro100

# replaced by tulip
blacklist de4x5
# causes no end of confusion by creating unexpected network interfaces
blacklist eth1394
# snd_intel8x0m can interfere with snd_intel8x0, doesn't seem to support much
# hardware on its own (Ubuntu bug #2011, #6810)
blacklist snd_intel8x0m

<<<OUTPUT TRUNCATED>>>

Having discussed these aspects, let's write a module that does more than the first one we wrote (refer to Segment 3 for the first tutorial):
Open your editor and write the following lines of code:

#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/stat.h>

/* Note: the kernel treats only strings such as "GPL" or "GPL v2" as
 * GPL-compatible; with "GPL v3" the module still loads but taints the kernel. */
MODULE_LICENSE("GPL v3");
MODULE_AUTHOR("Aasis Vinayak PG ");

/* Default values, used when no parameter is passed at load time */
static short int short_integer = 0;
static int integer = 100;
static long int long_integer = 999999;
static char *string_entry = "GNU-Linux";
static int integerArray[2] = { -1, -1 };
static int arr_arg = 0;    /* receives the number of array elements passed */

module_param(short_integer, short, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP);
MODULE_PARM_DESC(short_integer, "A short integer");
module_param(integer, int, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
MODULE_PARM_DESC(integer, "An integer");
module_param(long_integer, long, S_IRUSR);
MODULE_PARM_DESC(long_integer, "A long integer");
module_param(string_entry, charp, 0000);
MODULE_PARM_DESC(string_entry, "A String");
module_param_array(integerArray, int, &arr_arg, 0000);
MODULE_PARM_DESC(integerArray, "An array of integers");

static int __init tutorial1_init(void)
{
    int i;
    printk(KERN_INFO "Welcome to another day of this Voyage\n");
    printk(KERN_INFO "short_integer is a short integer: %hd\n", short_integer);
    printk(KERN_INFO "integer is an integer: %d\n", integer);
    printk(KERN_INFO "long_integer is a long integer: %ld\n", long_integer);
    printk(KERN_INFO "string_entry is a string: %s\n", string_entry);
    for (i = 0; i < (sizeof integerArray / sizeof (int)); i++)
    {
        printk(KERN_INFO "The integerArray[%d] = %d\n", i, integerArray[i]);
    }
    printk(KERN_INFO "has %d arguments.\n", arr_arg);
    return 0;
}

static void __exit tutorial1__exit(void)
{
    printk(KERN_INFO "Exiting the module\n");
}

module_init(tutorial1_init);
module_exit(tutorial1__exit);

As you can see in the code, after including the relevant header files (necessary for compiling) and declaring the variables, we start the kernel specific code portion. The next part of the code is used for handling the parameters. And we have already seen why we use __init and __exit. Even if you are a beginner, I am sure that you would have guessed what this module is doing: we have written a module that can receive values as parameters!
Now we need to write a Makefile. And here is the code for it:


obj-m += VTK_module_tutorial_1.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

[Figure 1: More files are created after running make]

Save this in the same directory in which you have saved your C program. Now you can cd to that directory from the terminal and issue the make command:

aasisvinayak@GNU-BOX:~/Desktop/modules$ make
make -C /lib/modules/2.6.31-17-generic-pae/build M=/home/aasisvinayak/Desktop/modules modules
make[1]: Entering directory `/usr/src/linux-headers-2.6.31-17-generic-pae'
  CC [M]  /home/aasisvinayak/Desktop/modules/VTK_module_tutorial_1.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /home/aasisvinayak/Desktop/modules/VTK_module_tutorial_1.mod.o
  LD [M]  /home/aasisvinayak/Desktop/modules/VTK_module_tutorial_1.ko
make[1]: Leaving directory `/usr/src/linux-headers-2.6.31-17-generic-pae'

You can now see a few other files created in the directory (refer to Figure 1).
If you are a Java developer, then you must have used getters and setters while meddling with Java beans. Here, in one of the automatically generated files (source), you can find something similar:

#include <linux/module.h>
#include <linux/vermagic.h>
#include <linux/compiler.h>

MODULE_INFO(vermagic, VERMAGIC_STRING);

struct module __this_module
__attribute__((section(".gnu.linkonce.this_module"))) = {
    .name = KBUILD_MODNAME,
    .init = init_module,
#ifdef CONFIG_MODULE_UNLOAD
    .exit = cleanup_module,
#endif
    .arch = MODULE_ARCH_INIT,
};
static const struct modversion_info ____versions[]
__used
__attribute__((section("__versions"))) = {
    { 0x4eead741, "module_layout" },
    { 0xb224fbe2, "param_get_short" },
    { 0x4333eadb, "param_set_short" },
    { 0x6980fe91, "param_get_int" },
    { 0xff964b25, "param_set_int" },
    { 0x8bd5b603, "param_get_long" },
    { 0x3457cb68, "param_set_long" },
    { 0x41344088, "param_get_charp" },
    { 0x6ad065f4, "param_set_charp" },
    { 0x43ab66c3, "param_array_get" },
    { 0x45947727, "param_array_set" },
    { 0xb72397d5, "printk" },
};
static const char __module_depends[]
__used
__attribute__((section(".modinfo"))) =
"depends=";

MODULE_INFO(srcversion, "FEC7A3AF90AAF6E9B3EA05B");

Though we have seen a few aspects/properties of these files already, we will revisit these areas and cover them in depth. Since this is an introductory column for the new segment, I'm avoiding the minute details.
Now you can load the module into the kernel and check if it is getting loaded properly:

aasisvinayak@GNU-BOX:~/Desktop/modules$ sudo insmod VTK_module_tutorial_1.ko
aasisvinayak@GNU-BOX:~/Desktop/modules$ lsmod
Module                  Size  Used by
VTK_module_tutorial_1   2916  0
msdos                   7836  0
fat                    51452  1 msdos
<<<OUTPUT TRUNCATED>>>

Now let's test it by passing a parameter. We will pass a valid argument at first and then we will pass an invalid one. (If the module is still loaded from the previous step, remove it first with sudo rmmod VTK_module_tutorial_1.)

aasisvinayak@GNU-BOX:~/Desktop/modules$ sudo insmod VTK_module_tutorial_1.ko string_entry="hooray" integerArray=-1
aasisvinayak@GNU-BOX:~/Desktop/modules$ sudo rmmod VTK_module_tutorial_1
aasisvinayak@GNU-BOX:~/Desktop/modules$ sudo insmod VTK_module_tutorial_1.ko string_entry="hoo" integerArray=-1 invalid_entry="null"
insmod: error inserting 'VTK_module_tutorial_1.ko': -1 Unknown symbol in module

We have not yet covered some other basic elements like checking the modules (loaded) regularly and removing (cleaning) the ones that are not in use. We will cover these topics in the next segment.
Happy kernel hacking!

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does programming in the open source domain. He is the developer of V-language, a programming language that employs AI and ANN. His research work/publications are available at www.aasisvinayak.com



A Voyage to the Kernel
Part 22
Segment: 4.2, Day 21

In the previous article, we’d looked at how to add more features to a module. But you couldn’t call what we’d created a 'device driver'. In this article, we will devote our time to developing a driver for a simple 'device'. As we discussed before, Linux treats devices as files. You may scan through your /dev/ directory to see the listing. It is through the addition of a file node in this directory that we make a physical device accessible.
You may notice that many files are listed in this directory. This does not mean that all these files (devices) are active. Issue the following command to see the active ones:

cat /proc/devices

We have also seen that there are character and block devices. (We looked at network devices as well.) The type of the device informs us about the way in which data will be written to the corresponding device. In the case of a character device, it is added serially (byte by byte) and for a block device, the data is added as large segments (an ideal example will be an HDD).
Now I am going to list the active devices in my system:

aasisvinayak@GNU-BOX:~$ cat /proc/devices
Character devices:
1 mem
4 /dev/vc/0
4 tty
4 ttyS
5 /dev/tty
5 /dev/console
5 /dev/ptmx
6 lp
7 vcs
10 misc
13 input
14 sound
21 sg
29 fb
81 video4linux
99 ppdev
108 ppp
116 alsa
128 ptm
136 pts
180 usb
189 usb_device

<<<OUTPUT TRUNCATED>>>

Here I can see some names and a numerical value. The numeral is actually the major number associated with the device driver. (We will discuss major and minor numbers after writing the code for the new driver.)
Having got this information, I can now allocate a major number that has not been assigned previously. All I need to note is that the new number should not be on this list.
Now let me show you the complete code for the new driver:

#include "LFY_device_header.h"
#include <linux/module.h>
#include <linux/init.h>

MODULE_AUTHOR("Aasis Vinayak PG");
MODULE_DESCRIPTION("Written for Voyage to Kernel");

static int LFY_device_init(void);
static void LFY_device_cleanup(void);

module_init(LFY_device_init);
module_exit(LFY_device_cleanup);

static int LFY_device_init(void)
{
    /* register_chrdev() returns a negative error code on failure */
    int ret = register_chrdev(161, "LFY_device", &LFY_ops);

    if (ret < 0)
    {
        printk(KERN_ALERT "LFY_device: failed to register\n");
        return ret;
    }
    return 0;

www.LinuxForU.com  |  LINUX For You  |  MARCH 2010  |  103



}

static void LFY_device_cleanup(void)
{
    unregister_chrdev(161,"LFY_device");
    return ;
}

As you can see, the above module code contains an include
statement for the file LFY_device_header.h. This is a custom
header file for our driver. Let me show you that file as well:

#ifndef _LFY_DEVICE_H
#define _LFY_DEVICE_H
#include <linux/fs.h>
#include <linux/sched.h>
#include <linux/errno.h>
#include <asm/current.h>
#include <asm/segment.h>
#include <asm/uaccess.h>

char LFY_device_data[80]="Sample data - A Voyage to Kernel";
int LFY_device_open(struct inode *inode,struct file *filp);
int LFY_device_release(struct inode *inode,struct file *filp);
ssize_t LFY_device_read(struct file *filp,char *buffer,size_t count,loff_t *offp );
ssize_t LFY_device_write(struct file *filp,const char *buffer,size_t count,loff_t *offp );

struct file_operations LFY_ops={
    open: LFY_device_open,
    read: LFY_device_read,
    write: LFY_device_write,
    release:LFY_device_release,
};
int LFY_device_open(struct inode *inode,struct file *filp)
{
    return 0;
}

int LFY_device_release(struct inode *inode,struct file *filp)
{
    return 0;
}
ssize_t LFY_device_read(struct file *filp,char *buffer,size_t count,loff_t *offp )
{
    /* count and offp are ignored: every read returns the whole string */
    if (copy_to_user(buffer,LFY_device_data,strlen(LFY_device_data)) != 0 )
        printk( "User-space - copy failed\n" );
    return strlen(LFY_device_data);
}
ssize_t LFY_device_write(struct file *filp,const char *buffer,size_t count,loff_t *offp )
{
    /* count is not checked against the 80-byte buffer */
    if ( copy_from_user(LFY_device_data,buffer,count) != 0 )
        printk( "Copy failed\n" );
    return 0; /* reports that no bytes were written */
}
#endif

Now, let's analyse the code. One of the important files that the
code refers to is linux/fs.h, which describes the file_operations
structure. If you look at this file carefully, you can see that it
contains the pointers to the functions that we used in our driver.
Each field holds the address of a function, and the basic operations
on a device (reading, writing and so on) are handled through these
functions. For your reference, here is the struct file_operations code:

struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
    ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
    int (*readdir) (struct file *, void *, filldir_t);
    unsigned int (*poll) (struct file *, struct poll_table_struct *);
    int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
    long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
    long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *, fl_owner_t id);
    int (*release) (struct inode *, struct file *);
    int (*fsync) (struct file *, struct dentry *, int datasync);
    int (*aio_fsync) (struct kiocb *, int datasync);
    int (*fasync) (int, struct file *, int);
    int (*lock) (struct file *, int, struct file_lock *);
    ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
    unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
    int (*check_flags)(int);
    int (*flock) (struct file *, int, struct file_lock *);
    ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
    ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
    int (*setlease)(struct file *, long, struct file_lock **);
};

You may also notice that this structure supports many more operations,
such as reading a directory structure. Our driver does not perform
these operations, so the corresponding entries can simply be left
as NULL. In our code, we used the following format:

struct file_operations LFY_ops={
    open: LFY_device_open,
    read: LFY_device_read,
    write: LFY_device_write,


    release:LFY_device_release,
};

You may note that some modern drivers use another
format, as shown below:

struct file_operations LFY_ops = {
    .open = LFY_device_open,
    .read = LFY_device_read,
    .write = LFY_device_write,
    .release = LFY_device_release
};

This form, with a dot before each field name, is the standard C99
designated-initialiser syntax and is the more portable of the two;
the older 'field:' form used in our driver is a GCC extension.
You can use the old format for this tutorial.
Similarly, each open device file under /dev/ is represented by a file
structure (you may look at linux/fs.h for more details). There are a
few points that you should note here. The 'file' (in this context) is
a kernel-level structure and never reaches user-space programs. Do not
mistake it for FILE, which is defined by the glibc library and is not
part of the kernel. Our 'file' is not a file on the disk (that is
represented by struct inode) but rather an abstract 'open file'.
Also, we have the inode_operations. For your information,
here is the struct inode_operations:

struct inode_operations {
    int (*create) (struct inode *,struct dentry *,int, struct nameidata *);
    struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameidata *);
    int (*link) (struct dentry *,struct inode *,struct dentry *);
    int (*unlink) (struct inode *,struct dentry *);
    int (*symlink) (struct inode *,struct dentry *,const char *);
    int (*mkdir) (struct inode *,struct dentry *,int);
    int (*rmdir) (struct inode *,struct dentry *);
    int (*mknod) (struct inode *,struct dentry *,int,dev_t);
    int (*rename) (struct inode *, struct dentry *,
            struct inode *, struct dentry *);
    int (*readlink) (struct dentry *, char __user *,int);
    void * (*follow_link) (struct dentry *, struct nameidata *);
    void (*put_link) (struct dentry *, struct nameidata *, void *);
    void (*truncate) (struct inode *);
    int (*permission) (struct inode *, int);
    int (*setattr) (struct dentry *, struct iattr *);
    int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
    int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
    ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t);
    ssize_t (*listxattr) (struct dentry *, char *, size_t);
    int (*removexattr) (struct dentry *, const char *);
    void (*truncate_range)(struct inode *, loff_t, loff_t);
    long (*fallocate)(struct inode *inode, int mode, loff_t offset,
            loff_t len);
    int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
            u64 len);
};

Another important one is filp, which is the instance of file
(struct):

extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int force);

Here is the relevant code corresponding to the struct file:

struct file {
    /*
     * fu_list becomes invalid after file_free is called and queued via
     * fu_rcuhead for RCU freeing
     */
    union {
        struct list_head fu_list;
        struct rcu_head fu_rcuhead;
    } f_u;
    struct path f_path;
#define f_dentry f_path.dentry
#define f_vfsmnt f_path.mnt
    const struct file_operations *f_op;
    spinlock_t f_lock; /* f_ep_links, f_flags, no IRQ */
    atomic_long_t f_count;
    unsigned int f_flags;
    fmode_t f_mode;
    loff_t f_pos;
    struct fown_struct f_owner;
    const struct cred *f_cred;
    struct file_ra_state f_ra;

    u64 f_version;
#ifdef CONFIG_SECURITY
    void *f_security;
#endif
    /* needed for tty driver, and maybe others */
    void *private_data;

#ifdef CONFIG_EPOLL
    /* Used by fs/eventpoll.c to link all the hooks to this file */
    struct list_head f_ep_links;
#endif /* #ifdef CONFIG_EPOLL */
    struct address_space *f_mapping;
#ifdef CONFIG_DEBUG_WRITECOUNT
    unsigned long f_mnt_write_state;
#endif
};

You may have noticed that we have used a major
number in our code. This is used while registering the
character device:

extern int register_chrdev_region(dev_t, unsigned, const char *);
extern int register_chrdev(unsigned int, const char *,
        const struct file_operations *);

The major number is employed to indicate the driver


to be used to access the hardware. Each driver is assigned a
unique major number, and all device files carrying the same
major number are handled by the same driver. Because this
number is unique, we could use cat /proc/devices earlier to
see the major numbers of the active devices.
You may also note that we have a minor number (which is
a sort of internal number) that is used by the driver to identify
the different hardware it can access. (See the output of cat
/proc/modules in Figure 1.)
You can also see the minor and major numbers of the
devices using ls -l command (Figure 2).
During the 'registration' (see the code given above),
the unsigned int corresponds to the major number and the
const char * refers to the name of the device (to appear in
the output of cat /proc/devices). As we discussed earlier,
we use the struct file_operations *fops to refer to the 'file
operations table'.

Figure 1: Output of cat /proc/modules
When you develop an actual device driver, it is advisable
that you consult the documentation provided by your
distribution to find out an unused major number.
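
One way to see whether a candidate major number is already taken on the running system is to look for it in the 'Character devices:' section of /proc/devices. The sketch below rests on stated assumptions: the helper name char_major_in_use is hypothetical, it works on text that has already been read into memory, and it assumes the usual layout of /proc/devices (a section header followed by 'number name' lines). It complements the documentation rather than replacing it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `wanted` is listed under
 * "Character devices:" in text laid out like /proc/devices,
 * and 0 otherwise. Entries under "Block devices:" are ignored.
 */
int char_major_in_use(const char *text, unsigned int wanted)
{
    int in_char_section = 0;
    const char *line = text;

    assert(text != NULL);

    while (*line != '\0') {
        unsigned int num;
        size_t len = strcspn(line, "\n");   /* length of this line */

        if (strncmp(line, "Character devices:", 18) == 0)
            in_char_section = 1;
        else if (strncmp(line, "Block devices:", 14) == 0)
            in_char_section = 0;
        else if (in_char_section && len > 0 &&
                 sscanf(line, "%u", &num) == 1 && num == wanted)
            return 1;

        line += len;                        /* move to the next line */
        if (*line == '\n')
            line++;
    }
    return 0;
}
```

A caller would read /proc/devices into a buffer (for instance with fopen() and fread()), terminate it with '\0' and pass it in together with the number it intends to register.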
Some of you might be wondering how we are going to
use the driver we wrote, without having hardware (dev) to
test. We can create a virtual one using the mknod command.
Before doing that, let us compile our driver. (If you find it
hard to create a simple 'make file' for this, please refer to the
previous article.)
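Figure 2: Listing the devices with major and minor numbers

If you look at the output (Figure 3), you will notice an
error. This is because we forgot to add a licence statement in
our code. You can remove the error by adding the following
line to the code:

MODULE_LICENSE("GPL");

Note that the kernel recognises only a fixed set of licence strings,
such as "GPL" and "GPL v2"; a string like "GPL v3" is not on that
list and would mark the module as non-GPL.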

Now the driver has been compiled. Let's add (load) the
driver to the kernel by issuing the following command:

sudo insmod LFY_dev.ko


It is time to create our device! We can do that by using the
following command as the root user:

mknod /dev/LFY_device c 161 0

Figure 3: Compiling the driver

This creates a new file in the /dev/ directory (Figure 4).
In the command we have also specified that it should be
a character device (c), that the major number should be 161
and that the minor number should be 0. You can see the major
and minor numbers of the new device by using the ls -l
command (see Figure 5).
Our driver and device are ready for use. Let's send some
data to the device (after becoming the root):

echo "Voyage to Kernel" > /dev/LFY_device

If you look at the code carefully, you can see that this
command can effectively keep on sending the data again
and again: LFY_device_write returns 0, which tells the caller
that nothing was written, so the write is typically retried
(press Ctrl+C to stop it). Now we can 'read' the device by
issuing the following command:

cat /dev/LFY_device

Figure 4: Listing of /dev/ directory

It will give us the output shown in Figure 6.


One point we missed out in our discussion is copy_from_user
(asm/uaccess.h) and its counterpart copy_to_user. Here is the
code for both:

static inline long copy_from_user(void *to,
        const void __user * from, unsigned long n)
{
    might_sleep();
    if (access_ok(VERIFY_READ, from, n))
        return __copy_from_user(to, from, n);
    else
        return n;
}

static inline long copy_to_user(void __user *to,
        const void *from, unsigned long n)
{
    might_sleep();
    if (access_ok(VERIFY_WRITE, to, n))
        return __copy_to_user(to, from, n);
    else
        return n;
}

Figure 5: Listing the new file created

You can use the code given above to comprehend how the
actual process works.
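
Note that copy_from_user and copy_to_user only check that the user-space range is accessible; they know nothing about the size of the buffer on the kernel side. Clamping count and tracking the file offset is therefore the job of the read and write handlers. The following is a user-space model of that bookkeeping, offered as a sketch and not as the driver's actual code: memcpy stands in for the kernel copy helpers, a plain long stands in for loff_t, and the names lfy_model_read and lfy_model_write are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LFY_BUF_SIZE 80

static char lfy_data[LFY_BUF_SIZE] = "Sample data - A Voyage to Kernel";

/*
 * Model of a bounded read: copy at most `count` bytes starting at
 * *offp, advance *offp, and return 0 once the data is exhausted so
 * that a reader such as cat knows when to stop.
 */
long lfy_model_read(char *dst, size_t count, long *offp)
{
    size_t len = strlen(lfy_data);
    size_t left;

    assert(dst != NULL && offp != NULL && *offp >= 0);

    if ((size_t)*offp >= len)
        return 0;                           /* end of data */
    left = len - (size_t)*offp;
    if (count > left)
        count = left;                       /* never read past the string */
    memcpy(dst, lfy_data + *offp, count);   /* kernel code: copy_to_user() */
    *offp += (long)count;
    return (long)count;
}

/*
 * Model of a bounded write: accept at most LFY_BUF_SIZE - 1 bytes,
 * keep the buffer NUL-terminated and report how many bytes were taken.
 */
long lfy_model_write(const char *src, size_t count)
{
    assert(src != NULL);

    if (count > LFY_BUF_SIZE - 1)
        count = LFY_BUF_SIZE - 1;           /* never overrun the buffer */
    memcpy(lfy_data, src, count);           /* kernel code: copy_from_user() */
    lfy_data[count] = '\0';
    return (long)count;
}
```

In a real handler, a failed copy would normally be reported by returning -EFAULT instead of only logging it; that part is left out of the model.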
One new header file that we added in this tutorial is the
errno.h file. And here are the 'important numbers' from the file:

#define ERESTARTSYS 512
#define ERESTARTNOINTR 513
#define ERESTARTNOHAND 514 /* restart if no handler.. */
#define ENOIOCTLCMD 515 /* No ioctl command */
#define ERESTART_RESTARTBLOCK 516 /* restart by calling sys_restart_syscall */
#define EBADHANDLE 521 /* Illegal NFS file handle */
#define ENOTSYNC 522 /* Update synchronization mismatch */
#define EBADCOOKIE 523 /* Cookie is stale */
#define ENOTSUPP 524 /* Operation is not supported */
#define ETOOSMALL 525 /* Buffer or request is too small */
#define ESERVERFAULT 526 /* An untranslatable error occurred */
#define EBADTYPE 527 /* Type not supported by server */
#define EJUKEBOX 528 /* Request initiated, but will not complete before timeout */
#define EIOCBQUEUED 529 /* iocb queued, will get completion event */
#define EIOCBRETRY 530 /* iocb queued, will trigger a retry */

Figure 6: Output of cat /dev/LFY_device

Now we know how to write a driver for a device. If you
wish to dig deeper into the subject by making a simple piece
of actual hardware and controlling it with a Linux driver,
please let me know. I will include such experiments in the
upcoming days of the voyage.
Happy hacking!

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does
programming in the open source domain. He is the developer
of V-language, a programming language that employs AI
and ANN. His research work/publications are available at
www.aasisvinayak.com


A Voyage to the
Kernel
Part 23

Segment 4.3, Day 22

To recapitulate what we've covered earlier, we have looked at
some of the basic aspects of Linux kernel module programming
and we also wrote a few sample drivers for testing. Now, we can
start experimenting with the kernel code itself. In order to
meddle with the core kernel code and contribute to the project's
development, we need to use a good source code management (SCM)
tool.
This edition is dedicated entirely to introducing GIT, the tool
that I will use for source code management. Unlike CVS, it is a
distributed version control system. This means that if you are
using GIT, you don't always have to rely on a central repository,
since it allows you to have your own repository.
I have tried many revision control solutions before, and I feel
that GIT and Mercurial are the best among them. I am sure that
most of you would have used CVS or SVN. Accessing and downloading
a repository is not a painful process with any of these, and many
of you would have used them to download source files from
SourceForge (most of the projects use CVS and a few use GIT,
Bazaar, or Mercurial) or Google Code (which supports SVN and
Mercurial).
The problem lies in the merging process. Since systems like CVS
don't support atomic operations, the work of a project manager
becomes hectic. Moreover, you have to rely on a central
repository, since CVS is not distributed in nature. As far as
Linux kernel development is concerned, the distributed nature is
essential; besides, the merging process should be fast and
efficient. I won't go into more details about the advantages of
GIT over other systems as that is not what we are interested in
right now. If you want to find out more about these aspects and
learn why you ought to switch to GIT to manage your project,
please see
http://techblog.aasisvinayak.com/why-git-should-be-used-for-source-code-management-scm-or-revision-control/ .

Getting started with GIT
You can go to the official website (http://git-scm.com/) and
download the tar ball. Then, install it 'from source'. If you are
using a popular distribution, you can find GIT in its package
repository.
After installing GIT, open the terminal and issue the following
command:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6

Internally, this command uses get_repo_path() to do the following:

static char *get_repo_path(const char *repo, int *is_bundle)
{
    static char *suffix[] = { "/.git", ".git", "" };
    static char *bundle_suffix[] = { ".bundle", "" };
    struct stat st;
    int i;

    for (i = 0; i < ARRAY_SIZE(suffix); i++) {
        const char *path;
        path = mkpath("%s%s", repo, suffix[i]);
        if (is_directory(path)) {
            *is_bundle = 0;

94  | APRIL 2010 | LINUX FoR YoU | www.LinuxForU.com



            return xstrdup(make_nonrelative_path(path));
        }
    }

    for (i = 0; i < ARRAY_SIZE(bundle_suffix); i++) {
        const char *path;
        path = mkpath("%s%s", repo, bundle_suffix[i]);
        if (!stat(path, &st) && S_ISREG(st.st_mode)) {
            *is_bundle = 1;
            return xstrdup(make_nonrelative_path(path));
        }
    }

    return NULL;
}

It will create a kernel 2.6 tree in your system. However,
note that it will download around 300 MB of data.
If this step is successful, you will see an output similar to
the one shown in Figure 1.

Figure 1: Creating a kernel 2.6.x tree in your system using GIT

Now run the following command to make sure that you
have the latest kernel tree (this is essentially merging!):

aasisvinayak@GNU-BOX:~/linux-2.6$ git pull
Already up-to-date.

You can see that GIT has created a new directory named
linux-2.6 (since that was the name you gave while issuing
the first command) in your home directory. Now go to that
directory and look at the listings.
In order to create your own branch, issue the following
command:

aasisvinayak@GNU-BOX:~/linux-2.6$ git checkout -b new-branch-name master

Issue git status to check the current branch.

Installing GUI tools
In my opinion, Giggle is the best GUI tool for GIT. You
can download this from http://live.gnome.org/giggle. After
installing it, you can open the repository you created and
browse through the contents.

Figure 2: Giggle, the GUI front-end for GIT

You may also note that there is a .git/ folder in the
directory, which GIT uses for source code revisioning.
When you hover your mouse pointer over the source code,
a pop-up shows the description of the specific code portion
(author name, date added and so on; see Figure 3).

Figure 3: Who wrote this source code?

Another option is the tree view (using History), which
summarises the development stages of a particular
file (or the entire project).

Using GIT for source code management
In order to explain source code management, I am going
to use one of the Java programs that I have written. And I
will show you how to add a new project to GIT and create a
new repository.
First of all, let me add the project package and create an
empty GIT repository:

aasisvinayak@GNU-BOX:~/Desktop/git_tutorial$ tar xzf Java_app.tar.gz
aasisvinayak@GNU-BOX:~/Desktop/git_tutorial$ cd Java_app
aasisvinayak@GNU-BOX:~/Desktop/git_tutorial/Java_app$ git init-db
Initialized empty Git repository in /home/aasisvinayak/Desktop/git_tutorial/Java_app/.git/

Note that you can also use git init instead of git init-db
(refer to the source to find the difference). You can also
see that in this step we are not adding any project files, but
nonetheless some default files will get created.

static int create_default_files(const char *template_path)
{
    const char *git_dir = get_git_dir();
    unsigned len = strlen(git_dir);
    static char path[PATH_MAX];


    struct stat st1;
    char repo_version_string[10];
    char junk[2];
    int reinit;
    int filemode;

    if (len > sizeof(path)-50)
        die("insane git directory %s", git_dir);
    memcpy(path, git_dir, len);

    if (len && path[len-1] != '/')
        path[len++] = '/';

    /*
     * Create .git/refs/{heads,tags}
     */
    safe_create_dir(git_path("refs"), 1);
    safe_create_dir(git_path("refs/heads"), 1);
    safe_create_dir(git_path("refs/tags"), 1);

<<CODE TRUNCATED>>

In the next stage, we will add files to the repository:

aasisvinayak@GNU-BOX:~/Desktop/git_tutorial/Java_app$ git add .

And here is the function for your reference:

static int add_files(struct dir_struct *dir, int flags)
{
    int i, exit_status = 0;

    if (dir->ignored_nr) {
        fprintf(stderr, ignore_error);
        for (i = 0; i < dir->ignored_nr; i++)
            fprintf(stderr, "%s\n", dir->ignored[i]->name);
        fprintf(stderr, "Use -f if you really want to add them.\n");
        die("no files added");
    }

    for (i = 0; i < dir->nr; i++)
        if (add_file_to_cache(dir->entries[i]->name, flags)) {
            if (!ignore_add_errors)
                die("adding files failed");
            exit_status = 1;
        }
    return exit_status;
}

Figure 4: The tree view summarises the development stages of a file

Now we can issue the commit command in order to
store it permanently. You might be wondering why we
need to do this when we have already 'added' files to the
repository. Well, previously when we 'added' the files, GIT
simply stored them to a temporary storage called index. As
you can see in the source, there is an add_files_to_cache
function for this:

int add_files_to_cache(const char *prefix, const char **pathspec, int flags)
{
    struct update_callback_data data;
    struct rev_info rev;
    init_revisions(&rev, prefix);
    setup_revisions(0, NULL, &rev, NULL);
    rev.prune_data = pathspec;
    rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
    rev.diffopt.format_callback = update_callback;
    data.flags = flags;
    data.add_errors = 0;
    rev.diffopt.format_callback_data = &data;
    run_diff_files(&rev, DIFF_RACY_IS_MODIFIED);
    return !!data.add_errors;
}

Figure 5: Entering a commit message

Figure 6: Console reports files added to repository after commit


Only after issuing the commit command will GIT save it
permanently in the repository.
Once you issue the command, GIT will also ask you to
enter your commit message. It will automatically add some of
the information about the committer as comments. As shown in
Figure 5, enter your 'commit message' and then save the file.
At this stage, you can see that the files are actually added
to the repository (Figure 6), which is handled by the following
lines of code:

static int commit_index_files(void)
{
    int err = 0;

    switch (commit_style) {
    case COMMIT_AS_IS:
        break; /* nothing to do */
    case COMMIT_NORMAL:
        err = commit_lock_file(&index_lock);
        break;
    case COMMIT_PARTIAL:
        err = commit_lock_file(&index_lock);
        rollback_lock_file(&false_lock);
        break;
    }

    return err;
}

You can cross-check this one using Giggle as shown in
Figure 7.

Figure 7: Giggle shows the sample commit

Testing GIT
We have created a repository now. Let's check the source code
revision control mechanism used by GIT. In order to do that,
we need to edit one of the files that we have added. In Figure 8
you can see I have edited one of the files by adding a comment
line for testing GIT.

Figure 8: Adding a comment line to one of the files

Let's issue add now:

aasisvinayak@GNU-BOX:~/Desktop/git_tutorial/Java_app$ git add ./src/java/esd/cw/model/GamesScheme.java

This will be handled by the add_files function:

static int add_files(struct dir_struct *dir, int flags)
{
    int i, exit_status = 0;

    if (dir->ignored_nr) {
        fprintf(stderr, ignore_error);
        for (i = 0; i < dir->ignored_nr; i++)
            fprintf(stderr, "%s\n", dir->ignored[i]->name);
        fprintf(stderr, "Use -f if you really want to add them.\n");
        die("no files added");
    }

    for (i = 0; i < dir->nr; i++)
        if (add_file_to_cache(dir->entries[i]->name, flags)) {
            if (!ignore_add_errors)
                die("adding files failed");
            exit_status = 1;
        }
    return exit_status;
}

Use diff to check the revision:

aasisvinayak@GNU-BOX:~/Desktop/git_tutorial/Java_app$ git diff --cached
diff --git a/src/java/esd/cw/model/GamesScheme.java b/src/java/esd/cw/model/GamesScheme.java
index 31587f1..f2b1ce8 100644
--- a/src/java/esd/cw/model/GamesScheme.java
+++ b/src/java/esd/cw/model/GamesScheme.java
@@ -1,4 +1,5 @@
 package esd.cw.model;
+#For testing GIT
 import java.io.Serializable;
 import java.util.Date;
(END)

As you can see, we are using the argument --cached here, which
compares the index (the staged files) with the last commit.
You can comprehend the reason by looking at the code:

static int builtin_diff_index(struct rev_info *revs,
        int argc, const char **argv)
{


    int cached = 0;
    while (1 < argc) {
        const char *arg = argv[1];
        if (!strcmp(arg, "--cached") || !strcmp(arg, "--staged"))
            cached = 1;
        else
            usage(builtin_diff_usage);
        argv++; argc--;
    }
    if (!cached)
        setup_work_tree();
    /*
     * Make sure there is one revision (i.e. pending object),
     * and there is no revision filtering parameters.
     */
    if (revs->pending.nr != 1 ||
        revs->max_count != -1 || revs->min_age != -1 ||
        revs->max_age != -1)
        usage(builtin_diff_usage);
    if (read_cache_preload(revs->diffopt.paths) < 0) {
        perror("read_cache_preload");
        return -1;
    }
    return run_diff_index(revs, cached);
}

Now 'commit' (just like we did before). After committing,
GIT will also give a summary of the changes.

aasisvinayak@GNU-BOX:~/Desktop/git_tutorial/Java_app$ git commit
[master f199bc0] tested GIT techblog.aasisvinayak.com
1 files changed, 1 insertions(+), 0 deletions(-)

Since I am using the master branch, there is nothing much
to do. You can also view the changes using the GUI tools.

Distributed model
As you already know, kernel development takes place in a
distributed way. Let me elucidate this by decentralising our
existing repository.
To begin with, I need to host the code on a server. Before
uploading the contents, I need to configure the local machine
by issuing the following commands:

aasisvinayak@GNU-BOX:~$ git config --global user.name "Aasis Vinayak"
aasisvinayak@GNU-BOX:~$ git config --global user.email "aasisvinayak@gmail.com"

I can communicate with an online GIT repository server
after configuring the certificates (as we do normally).
Now I can issue the following commands:

aasisvinayak@GNU-BOX:~$ git checkout master
aasisvinayak@GNU-BOX:~$ git remote add origin git@mydomain.com:sample-java-project-for-LFYcolumn/sample-java-project-for-LFYcolumn.git
aasisvinayak@GNU-BOX:~$ git push origin master

This will push the local repository to the externally
hosted one. Now anyone can pull this just like we cloned the
repository from kernel.org.
After pulling, you can create your own branch and
your own repository. After making some good additions
to the code, you can ask the owner of the master branch to
'pull' from your repository. He or she can pull it from your
repository and check the code. Even if he or she doesn't
perform this task, your branch will still be the modified one
(in your repository).
Here are a few more commands that you can use while
playing around with GIT:
• git checkout -b branch-name
• git checkout -- file
• git add file
• git diff HEAD
• git commit -a -s
• git reset --soft HEAD^
• git diff ORIG_HEAD
• git commit -a -c ORIG_HEAD
• git checkout master
• git merge 'segment'
• git log --since='time period'

In the next couple of days we will discuss some more
kernel code and the functions used in it. This will help you
to start experimenting with actual kernel code, and enable
you to make contributions to its development. Happy
hacking!

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does
programming in the open source domain. He is the developer
of V-language, a programming language that employs AI
and ANN. His research work/publications are available at
www.aasisvinayak.com



A Voyage to the
Kernel
Part 24
Segment: 4.4, Day 23

In the last issue, we looked at how to track the changes in
kernel code using GIT. And we have already discussed how to
write very simple device drivers. Now, we can start
experimenting with code and contribute. If the code is
good enough, it will appear in the main kernel tree!
One of the main requests from novices, regarding
my column, is to start a new segment on C/C++
programming. So I think it is better to clarify my
stand here. I don't think it's a good idea to start
another segment to address such matters, mainly for
two reasons. First, it is too late to start any such
discussion: we are already meddling with kernel
topics. Second, we would be deviating too much from the
actual subject.
In order to come out with really novel and sound
ideas, you must be good at writing algorithms. This
was the reason I dedicated a couple of columns to this
aspect earlier. From this point of view, it is important
that you know C and C++ in depth. Considering
these factors, the best thing I could do is to put up
more tutorials online. I have already posted some
introductory material that you can find at
http://techblog.aasisvinayak.com/category/cc. If you could
specify what topics you would like me to cover, I can
make posts related to that space. So now let's go back
to our core topic.

Miscellaneous character drivers
Though we did discuss character drivers, we didn't
go too deep into 'Miscellaneous Character Drivers'.
Consider this scenario: You are writing a small device
driver, which is nothing but a small add-on or hack to an
existing hardware or piece of software. Though you can
register this as a new module and load the same (refer
to the previous columns for more information), it is not
really the recommended way of performing this task.
For this purpose, the Linux kernel offers an 'interface' that
allows developers to register their small drivers. This is
the whole purpose of the misc driver.
In the previous columns, we have seen the use of
major and minor numbers. And we know that if we use
a major number to indicate a particular device, then
the same driver is used for accessing it, irrespective of
the minor number details. As a consequence, you need
to register two different major numbers, say DD1 and
DD2, to access the same device. Only then can we use it
as a 'pointer'. But if you do this in each and every case,
then you will be essentially wasting RAM resources.
This can be solved to an extent by using the misc
driver. You may note that the misc driver is assigned
the static major number 10. You can register your
mini drivers (or modules) with this driver and assign a
minor number to each of them.
For your reference, here are the minor numbers of the mini
drivers from the source:

#define PSMOUSE_MINOR 1
#define MS_BUSMOUSE_MINOR 2
#define ATIXL_BUSMOUSE_MINOR 3
/*#define AMIGAMOUSE_MINOR 4 FIXME OBSOLETE */
#define ATARIMOUSE_MINOR 5
#define SUN_MOUSE_MINOR 6
#define APOLLO_MOUSE_MINOR 7
#define PC110PAD_MINOR 9
/*#define ADB_MOUSE_MINOR 10 FIXME OBSOLETE */
#define WATCHDOG_MINOR 130 /* Watchdog timer */
#define TEMP_MINOR 131 /* Temperature Sensor */
#define RTC_MINOR 135
#define EFI_RTC_MINOR 136 /* EFI Time services */
#define SUN_OPENPROM_MINOR 139
#define DMAPI_MINOR 140 /* DMAPI */
#define NVRAM_MINOR 144
#define SGI_MMTIMER 153
#define STORE_QUEUE_MINOR 155
#define I2O_MINOR 166
#define MICROCODE_MINOR 184
#define TUN_MINOR 200
#define MWAVE_MINOR 219 /* ACP/Mwave Modem */
#define MPT_MINOR 220

100  |  MAY 2010  |  LINUX For You  |  www.LinuxForU.com



#define MPT2SAS_MINOR 221
#define HPET_MINOR 228
#define FUSE_MINOR 229
#define KVM_MINOR 232
#define MISC_DYNAMIC_MINOR 255

The structure of the miscdevice is defined below:

struct miscdevice {
    int minor;
    const char *name;
    const struct file_operations *fops;
    struct list_head list;
    struct device *parent;
    struct device *this_device;
    const char *devnode;
};

The minor used in the structure actually refers to the
minor number being registered. It is expected that you choose
a unique number while registering the driver (since you are
going to use this as the pointer to the device from the driver
DD1 or DD2). You can easily do this by looking at the
hard-coded entries (see the #define list above) and by 'cat-ing'
the /proc/misc file (which will also show the name of each
device registered with the misc driver):

linux-el8p:/proc/bus/usb # cat /proc/misc
57 vboxnetctl
58 vboxdrv
229 fuse
59 device-mapper
60 rfkill
130 watchdog
175 agpgart
61 network_throughput
62 network_latency
63 cpu_dma_latency
1 psaux
144 nvram
228 hpet
231 snapshot
227 mcelog
linux-el8p:/proc/bus/usb #

As you would have guessed, the name used in the
structure of the miscdevice indicates the name for the device
that you are going to handle (just like the names that appeared
while reading the entries from the /proc/misc file). And fops is
used to indicate the file operations that you want to perform
on a particular device. Here is the file operations table that
the misc driver itself uses for the /proc/misc file (it can be
found in drivers/char/misc.c):

static const struct file_operations misc_proc_fops = {
    .owner = THIS_MODULE,
    .open = misc_seq_open,
    .read = seq_read,
    .llseek = seq_lseek,
    .release = seq_release,
};

In order to register and unregister, you need to have the
following lines in the code:

#include <linux/miscdevice.h>
int misc_register(struct miscdevice * misc);
int misc_deregister(struct miscdevice * misc);

As mentioned before, you need to assign a unique value (i.e., one
that is not already in use) to the driver, and you can also do that
dynamically by using the following lines of code:

static struct miscdevice my_device;

int init_module(void)
{
    int my_min_no; /* holds the return code of misc_register() */
    my_device.minor = MISC_DYNAMIC_MINOR;
    my_device.name = "DD1";
    my_device.fops = &my_fops;
    my_min_no = misc_register(&my_device);
    if (my_min_no) return my_min_no; /* non-zero means failure */
    printk("DD1 was assigned minor %i\n",my_device.minor);
    return 0;
}

MISC_DYNAMIC_MINOR is used with the misc_register
function, which registers a miscellaneous device.
If you use MISC_DYNAMIC_MINOR in the code, a free minor
number is picked and stored in the minor field of the
structure (which I showed you earlier). If you are not
using this method, the misc_register function will use
the number that you have hardcoded in your source.
Here is the code that performs the 'registration':

int misc_register(struct miscdevice * misc)
{
    struct miscdevice *c;
    dev_t dev;
    int err = 0;

    INIT_LIST_HEAD(&misc->list);

    mutex_lock(&misc_mtx);
    list_for_each_entry(c, &misc_list, list) {
        if (c->minor == misc->minor) {
            mutex_unlock(&misc_mtx);
            return -EBUSY;
        }
    }

    if (misc->minor == MISC_DYNAMIC_MINOR) {

        int i = DYNAMIC_MINORS;
        while (--i >= 0)
            if ( (misc_minors[i>>3] & (1 << (i&7))) == 0)
                break;
        if (i<0) {
            mutex_unlock(&misc_mtx);
            return -EBUSY;
        }
        misc->minor = i;
    }

    if (misc->minor < DYNAMIC_MINORS)
        misc_minors[misc->minor >> 3] |= 1 << (misc->minor & 7);
    dev = MKDEV(MISC_MAJOR, misc->minor);

    misc->this_device = device_create(misc_class, misc->parent, dev,
                                      misc, "%s", misc->name);
    if (IS_ERR(misc->this_device)) {
        err = PTR_ERR(misc->this_device);
        goto out;
    }

    list_add(&misc->list, &misc_list);
out:
    mutex_unlock(&misc_mtx);
    return err;
}

As you can see, the code uses device_create() for the process, and if that fails, the corresponding error code is returned to the caller (the codes are defined in linux/errno.h). If everything goes fine, it returns zero.
After a miscellaneous device has been registered successfully, it lives until you call misc_deregister(). Just like in the earlier case, if deregistering is successful, it returns zero; otherwise it returns a negative error number:

int misc_deregister(struct miscdevice *misc)
{
    int i = misc->minor;

    if (list_empty(&misc->list))
        return -EINVAL;

    mutex_lock(&misc_mtx);
    list_del(&misc->list);
    device_destroy(misc_class, MKDEV(MISC_MAJOR, misc->minor));
    if (i < DYNAMIC_MINORS && i>0) {
        misc_minors[i>>3] &= ~(1 << (misc->minor & 7));
    }
    mutex_unlock(&misc_mtx);
    return 0;
}

You can find more information on this if you check the misc.c file. One thing to note from the registration code is that if the minor number you hardcode has already been assigned, misc_register() does not pick another one for you; it simply fails with -EBUSY. So you need to be careful while choosing the numbers.

USB class devices and the USB device filesystem
If you have started using Linux recently, then you may not have experienced any problems with USB devices. But this was not the case a few years back. I had issues while meddling with different types of data transfers. Control transfer operations were comparatively fine in my case, as the following basic operations (you can guess the 'operation' from the name itself) were supported by every device:
• GET_STATUS
• CLEAR_FEATURE
• SET_FEATURE
• SET_ADDRESS
• GET_DESCRIPTOR
• SET_DESCRIPTOR
• GET_CONFIGURATION
• SET_CONFIGURATION
• GET_INTERFACE
• SET_INTERFACE
• SYNCH_FRAME
This was not the case when it came to isochronous transfers (especially for video and audio). But now Linux supports almost all types of USB class devices, thanks to the USB sub-system!
Since we need separate drivers for each and every USB device, Linux developers came up with a skeleton for the drivers. For your reference, here is the structure of the (skeleton) USB driver:

static struct usb_driver skel_driver = {
    .name = "skeleton",
    .probe = skel_probe,
    .disconnect = skel_disconnect,
    .suspend = skel_suspend,
    .resume = skel_resume,
    .pre_reset = skel_pre_reset,
    .post_reset = skel_post_reset,
    .id_table = skel_table,
    .supports_autosuspend = 1,
};

Similarly, they also defined the structure of file operations:

static const struct file_operations skel_fops = {
    .owner = THIS_MODULE,
    .read = skel_read,
    .write = skel_write,
    .open = skel_open,
    .release = skel_release,
    .flush = skel_flush,
};

Further, they documented the operations, which helped USB device vendors to contribute easily to the project.
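The .id_table entry in skel_driver is what ties a driver to particular hardware: the USB core compares each newly attached device against that table and calls .probe only when an entry matches. The block below is a minimal userspace sketch of that idea, not kernel code. The names demo_id, demo_table and demo_match(), as well as the vendor and product values, are hypothetical and chosen only for illustration (the real table is built from struct usb_device_id entries).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct usb_device_id. */
struct demo_id {
    uint16_t vendor;
    uint16_t product;
};

/* Hypothetical table; an all-zero entry terminates it. */
static const struct demo_id demo_table[] = {
    { 0xfff0, 0xfff0 },
    { 0x1234, 0x0001 },
    { 0, 0 }
};

/* Return the matching entry, or NULL if the driver does not claim the device. */
static const struct demo_id *demo_match(const struct demo_id *table,
                                        uint16_t vendor, uint16_t product)
{
    const struct demo_id *id;

    for (id = table; id->vendor != 0 || id->product != 0; id++) {
        if (id->vendor == vendor && id->product == product)
            return id;
    }
    return NULL;
}
```

With this table, demo_match(demo_table, 0x1234, 0x0001) returns a pointer to the second entry, while a pair that is not listed yields NULL, which in the real driver model would mean that probe is never called for that device.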

Here is the code that handles 'read':

static ssize_t skel_read(struct file *file, char *buffer, size_t count, loff_t *ppos)
{
    struct usb_skel *dev;
    int retval;
    int bytes_read;

    dev = (struct usb_skel *)file->private_data;

    mutex_lock(&dev->io_mutex);
    if (!dev->interface) { /* disconnect() was called */
        retval = -ENODEV;
        goto exit;
    }

    /* do a blocking bulk read to get data from the device */
    retval = usb_bulk_msg(dev->udev,
                          usb_rcvbulkpipe(dev->udev, dev->bulk_in_endpointAddr),
                          dev->bulk_in_buffer,
                          min(dev->bulk_in_size, count),
                          &bytes_read, 10000);

    /* if the read was successful, copy the data to userspace */
    if (!retval) {
        if (copy_to_user(buffer, dev->bulk_in_buffer, bytes_read))
            retval = -EFAULT;
        else
            retval = bytes_read;
    }

exit:
    mutex_unlock(&dev->io_mutex);
    return retval;
}

Having mentioned the drivers, let's focus our attention on the USB filesystem (USBFS). This essentially tries to make the USB devices (that you have attached to your system) work like ordinary files in Linux.
USBFS tracks all the USB devices that you attach to and remove from the system ('bus'). You may also note that the filesystem was initially developed as an extension of the devfs filesystem. And USBFS is quite similar to the /proc filesystem, if you consider the dynamic nature of both filesystems.
An important point worth mentioning here is that we follow a set of 'rules' by convention, though the filesystem will still work properly even if you don't follow them. For example, you can mount the filesystem anywhere, but the recommended location is /proc/bus/usb so that the user-space utilities are not affected. Also, while mounting this filesystem, we use the following command (as root):

mount -t usbfs none /proc/bus/usb

Here, too, you can choose your own custom name instead of 'none'; some people prefer to use 'usbdevfs' instead of 'none'. After mounting the filesystem, we can see the files listed under it:

linux-el8p:/home/aasisvinayak # mount -t usbfs none /proc/bus/usb
linux-el8p:/home/aasisvinayak # cd /proc/bus/usb
linux-el8p:/home/aasisvinayak # ls
001 002 003 004 005 006 007 devices
linux-el8p:/home/aasisvinayak #

You may note that the directories (the numbered entries 001 to 007 in the above snippet) lead you to the attached devices. You can also find a file named devices, which contains many entries in the following pattern:

T: Bus=08 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1d6b ProdID=0001 Rev= 2.06
S: Manufacturer=Linux 2.6.31.12-0.1-pae uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1d.2
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

These entries list all the USB devices attached to your system. And here are the explanations of the code letters used in the file:
• T – Topology related aspects
• B – Bandwidth specific details
• D – Device descriptor information
• P – Product ID and vendor information
• S – For adding any 'string descriptors' (see the code)
• C – Carries descriptions concerning configuration
• I – Interface descriptions
• E – Endpoint descriptions
Coming back to the directory structure, you may have noticed that it follows this pattern: /proc/bus/usb/BBB/. Here 'BBB' is the number of the USB bus. And this number is used by the application for communicating with the USB device. As you can see in the above terminal snippet, the names start from 001 and go up to the total number of buses.

Meddling with USB devices
If you try to read the device file, the code will return the raw USB descriptors. These include the device descriptor (D) and the configuration descriptors (C). You can use ioctl calls if you want to send or receive data. Here is the structure of usbdevfs_ioctl:

struct usbdevfs_ioctl {
    int ifno;          /* interface 0..N ; negative numbers reserved */
    int ioctl_code;    /* MUST encode size + direction of data so the
                        * macros in <asm/ioctl.h> give correct values */
    void __user *data; /* param buffer (in, or out) */
};
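As mentioned under 'Meddling with USB devices', reading a device file yields the raw descriptors, beginning with the 18-byte device descriptor. The following is a minimal sketch of decoding such a descriptor from a byte buffer. The struct demo_dev_desc and the helpers demo_le16() and demo_parse_dev_desc() are hypothetical names, and the code assumes that the 16-bit fields sit in the buffer in little-endian order, as laid out in the USB specification (which is also what you would see on an x86 machine).

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_DEV_DESC_SIZE 18   /* length of a USB device descriptor */
#define DEMO_DT_DEVICE     0x01 /* bDescriptorType of a device descriptor */

/* Hypothetical container for the fields shown in the D: and P: lines. */
struct demo_dev_desc {
    uint16_t bcd_usb;      /* 'Ver', BCD encoded (0x0110 means 1.10) */
    uint8_t  dev_class;    /* 'Cls' */
    uint8_t  max_packet0;  /* 'MxPS' */
    uint16_t vendor;       /* 'Vendor' */
    uint16_t product;      /* 'ProdID' */
    uint16_t bcd_device;   /* 'Rev', BCD encoded */
    uint8_t  num_configs;  /* '#Cfgs' */
};

/* Read a 16-bit little-endian value from two consecutive bytes. */
static uint16_t demo_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer does not hold a device descriptor. */
static int demo_parse_dev_desc(const uint8_t *buf, size_t len,
                               struct demo_dev_desc *out)
{
    if (len < DEMO_DEV_DESC_SIZE || buf[0] != DEMO_DEV_DESC_SIZE ||
        buf[1] != DEMO_DT_DEVICE)
        return -1;

    out->bcd_usb     = demo_le16(buf + 2);
    out->dev_class   = buf[4];
    out->max_packet0 = buf[7];
    out->vendor      = demo_le16(buf + 8);
    out->product     = demo_le16(buf + 10);
    out->bcd_device  = demo_le16(buf + 12);
    out->num_configs = buf[17];
    return 0;
}
```

Fed a descriptor equivalent to the root hub entry shown in the devices listing, the function would fill in vendor 0x1d6b, product 0x0001, class 0x09 and one configuration. A buffer that is too short, or whose type byte is not 0x01, is rejected with -1.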

Another structure that you may find useful while writing code is the bulk transfer:

struct usbdevfs_bulktransfer {
    unsigned int ep;
    unsigned int len;
    unsigned int timeout; /* in milliseconds */
    void __user *data;
};

You may also find the 'names given to the constants' useful while writing the drivers:

#define USBDEVFS_URB_SHORT_NOT_OK 0x01
#define USBDEVFS_URB_ISO_ASAP 0x02
#define USBDEVFS_URB_NO_FSBR 0x20
#define USBDEVFS_URB_ZERO_PACKET 0x40
#define USBDEVFS_URB_NO_INTERRUPT 0x80

#define USBDEVFS_URB_TYPE_ISO 0
#define USBDEVFS_URB_TYPE_INTERRUPT 1
#define USBDEVFS_URB_TYPE_CONTROL 2
#define USBDEVFS_URB_TYPE_BULK 3

How hard is it to write a USB driver?
Since you have all these tools (especially the ioctls and the information about the structures), you can start writing a USB device driver, provided you are a power user. I know that beginners may find it hard to start off at this stage with just the information available in this article. So I suggest you get a copy of the USB specifications (you can ask Google to fetch the URL for you) so that you know where to start.
What about the programmers of intermediate level? Well, you may face various hurdles. But you don't have to worry; I'm going to introduce you to a library that resides on top of these tools and provides a simple interface: libusb!
You can get a copy of the library from http://www.libusb.org/. Then you can use the following functions (these belong to the older libusb-0.1 API) to perform all the basic tasks for you:

usb_init();
usb_find_busses();
usb_find_devices();

If you want to exit, you can just use the libusb_exit() API provided by the library (part of the newer libusb-1.0 API) and it will take care of all the housekeeping work for you. Here is how the library implements it:

API_EXPORTED void libusb_exit(struct libusb_context *ctx)
{
    USBI_GET_CONTEXT(ctx);
    usbi_dbg("");

    /* a little sanity check. doesn't bother with open_devs locking because
     * unless there is an application bug, nobody will be accessing this.
     */
    if (!list_empty(&ctx->open_devs))
        usbi_warn(ctx, "application left some devices open");

    usbi_io_exit(ctx);
    if (usbi_backend->exit)
        usbi_backend->exit();

    pthread_mutex_lock(&default_context_lock);
    if (ctx == usbi_default_context) {
        usbi_dbg("freeing default context");
        usbi_default_context = NULL;
    }
    pthread_mutex_unlock(&default_context_lock);

    free(ctx);
}

Writing code using libusb
Let us assume that we want to print the IDs of all the USB devices attached to your system. Well, you can do that without using libusb. But if you use the library, the code will be more compact.
Let me write a simple program to demonstrate how to use libusb to do this:

#include <stdio.h>
#include <sys/types.h>
#include <libusb/libusb.h>

static void print_all_devices(libusb_device **devs)
{
    /* declarations */
    libusb_device *mydevice;
    int i = 0;

    while ((mydevice = devs[i++]) != NULL) {
        struct libusb_device_descriptor desc;
        int retval = libusb_get_device_descriptor(mydevice, &desc);

        /* check retval: a negative value means the descriptor could not be read */
        if (retval < 0) {
            fprintf(stderr, "failed to get device descriptor\n");
            return;
        }

        printf("%04x:%04x (bus %d, device %d)\n",
               desc.idVendor, desc.idProduct,
               libusb_get_bus_number(mydevice), libusb_get_device_address(mydevice));
    }
}

int main(void)
{
    /* declarations */
    libusb_device **devs;
    ssize_t cnt;

    libusb_init(NULL);
    cnt = libusb_get_device_list(NULL, &devs);

    /* check cnt: a negative value is a libusb error code */
    if (cnt < 0)
        return (int) cnt;

    print_all_devices(devs);
    libusb_free_device_list(devs, 1);
    libusb_exit(NULL);
    return 0;
}

As you can see in this code, we first called the libusb_init() function and then the inbuilt libusb_get_device_list() function to get the list. Then we used a custom function, print_all_devices(), to print all the devices (devs). You may have noticed that this function, in turn, invoked libusb_get_device_descriptor() to get the descriptors and print them. The only function that is obscure here is libusb_free_device_list(), which frees a list of devices previously found. Then, we safely exit the program.
Isn't this quite easy? The only points to be aware of are the declarations and the checks on the returned values (see the comments in the code).
You might be aware of the fact that all USB devices have their own vendor and product identification values. So, if you want to narrow the list down to a particular device, all you need to do is perform another check for these two IDs (shown here in the style of the older libusb-0.1 API, where each device is a struct usb_device with an embedded descriptor):

if ((mydevice->descriptor.idVendor == YOUR_ID1) &&
    (mydevice->descriptor.idProduct == YOUR_ID2))
{
    return mydevice;
}

From our previous discussions, it is clear that this code returns a pointer to the device (if the device is found). And this corresponds to the structure usb_device that we saw earlier.
This means that you can now start doing basic operations (you may refer to the file operations we described earlier). Let's say you want to open the device to initiate a 'communication'. You can do so using the following lines of code:

usb_opr = usb_open(usb_dev);
if (usb_opr == NULL) {
    fprintf(stderr, "Failed to open\n");
    goto exit;
}

That's it!
You might find it interesting to note that usb_opr is a pointer to struct usb_dev_handle, and it is through this handle that libusb communicates with a particular USB device. The moment you issue that call, it will perform all the initial steps required to start the communication.
Once you are done with the communication, you can use the following code to close it safely:

usb_close(usb_opr);

Error codes returned by usb_submit_urb

Non-USB type
0               URB submission went fine
-ENOMEM         No memory for allocation of internal structures

USB related
-ENODEV         Specified USB device or bus doesn't exist
-ENOENT         Specified interface or endpoint does not exist or is not enabled
-ENXIO          Host controller driver does not support queuing of this type of urb (treat it as a host controller bug)
-EINVAL         a) Invalid transfer type specified (or not supported)
                b) Invalid or unsupported periodic transfer interval
                c) ISO: attempted to change transfer interval
                d) ISO: number_of_packets is < 0
                e) Various other cases
-EAGAIN         a) Specified ISO start frame too early
                b) (using ISO-ASAP) Too much scheduled for the future; wait some time and try again
-EFBIG          Host controller driver can't schedule that many ISO frames
-EPIPE          Specified endpoint is stalled. For non-control endpoints, reset this status with usb_clear_halt()
-EMSGSIZE       a) Endpoint maxpacket size is zero; it is not usable in the current interface's altsetting
                b) ISO packet is larger than the endpoint maxpacket
                c) Requested data transfer length is invalid: negative or too large for the host controller
-ENOSPC         This request would overcommit the USB bandwidth reserved for periodic transfers (interrupt, isochronous)
-ESHUTDOWN      The device or host controller has been disabled due to some problem that could not be worked around
-EPERM          Submission failed because urb->reject was set
-EHOSTUNREACH   URB was rejected because the device is suspended

It will be helpful if you remember some of the error codes returned by usb_submit_urb (in case of an asynchronous transfer request). This will help you to fix bugs in your code. You can find the details in the box on "Error codes returned by usb_submit_urb".
You are now ready to write drivers for your USB devices. Go ahead and give it a try. If you need any further assistance in writing a good USB driver, please let me know.
Happy kernel hacking!

By: Aasis Vinayak PG
The author is a hacker and a free software activist who does programming in the open source domain. He is the developer of V-language, a programming language that employs AI and ANN. His research work/publications are available at www.aasisvinayak.com

www.LinuxForU.com  |  LINUX For You  |  MAY 2010  |  105