
Introduction to Operating Systems: A Hands-On Approach Using the OpenSolaris Project
Instructor Guide

Sun Microsystems, Inc.


4150 Network Circle
Santa Clara, CA 95054
U.S.A.

Part No: 819–5580–12


August, 2007
Copyright 2007 Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.

Sun Microsystems, Inc. has intellectual property rights relating to technology embodied in the product that is described in this document. In particular,
and without limitation, these intellectual property rights may include one or more U.S. patents or pending patent applications in the U.S. and in other
countries.
U.S. Government Rights – Commercial software. Government users are subject to the Sun Microsystems, Inc. standard license agreement and
applicable provisions of the FAR and its supplements.
This distribution may include materials developed by third parties.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S.
and other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, the Solaris logo, the Java Coffee Cup logo, docs.sun.com, Java, and Solaris are trademarks or registered trademarks
of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of
SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun
Microsystems, Inc.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the
pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a
non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs
and otherwise comply with Sun's written license agreements.
Products covered by and information contained in this publication are controlled by U.S. Export Control laws and may be subject to the export or
import laws in other countries. Nuclear, missile, chemical or biological weapons or nuclear maritime end uses or end users, whether direct or indirect,
are strictly prohibited. Export or reexport to countries subject to U.S. embargo or to entities identified on U.S. export exclusion lists, including, but not
limited to, the denied persons and specially designated nationals lists is strictly prohibited.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.


Instructor Notes

Contents

1 What is the OpenSolaris Project? ............................................................................................. 7


Country Portals ............................................................................................................................9
Web Resources for OpenSolaris ..............................................................................................10
Discussions .................................................................................................................................10
Communities .............................................................................................................................10
Projects ........................................................................................................................................11
Source Repositories ...................................................................................................................12
OpenGrok ...................................................................................................................................12

2 OpenSolaris Advocacy .............................................................................................................13


Why Use OpenSolaris? ..............................................................................................................14
Price .............................................................................................................................................14
Innovative Core Features ..........................................................................................................14
Backward Compatibility ...........................................................................................................15
Hardware Platform Neutrality .................................................................................................15
Development Tools ...................................................................................................................16
Acknowledgments .....................................................................................................................17

3 Planning the OpenSolaris Environment ............................................................................... 19


Development Environment Configuration ............................................................................20
Networking .................................................................................................................................21
Network Auto-Configuration Daemon ..................................................................................23
Zone Overview ...........................................................................................................................24
Zones Administration ...............................................................................................................25
Getting Started With Zones Administration ..................................................................26

Web Server Virtualization With Zones ...................................................................................29
Creating Non-Global Zones ..............................................................................................30
Creating ZFS Storage Pools and File Systems .........................................................................34
Creating a Mirrored ZFS Storage Pool ....................................................................................35
Creating ZFS File Systems as Home Directories ....................................................................37
Creating a RAID-Z Configuration ..........................................................................................40

4 Userland Consolidations .........................................................................................................43


Userland Consolidations and Descriptions ...........................................................................44

5 Core Features of the Solaris OS ............................................................................................... 45


Development Process and Coding Style .................................................................................46
Overview .....................................................................................................................................49
FireEngine ..................................................................................................................................49
Least Privilege ............................................................................................................................51
Packet Filtering ..........................................................................................................................52
Zones ...........................................................................................................................................54
Branded Zones (BrandZ) ..........................................................................................................55
Zones Networking .....................................................................................................................56
Predictive Self-Healing ..............................................................................................................58
Dynamic Tracing (DTrace) ......................................................................................................59
Modular Debugger (MDB) .......................................................................................................60
ZFS File System ..........................................................................................................................60
Services Management Facility (SMF) ......................................................................................61

6 Programming Concepts ...........................................................................................................63


Process and System Management ............................................................................................64
Threaded Programming ...........................................................................................................64
CPU Scheduling .........................................................................................................................67
Kernel Overview ........................................................................................................................69
Process Debugging ....................................................................................................................72

7 Getting Started With DTrace ................................................................................................... 75
Enabling Simple DTrace Probes ..............................................................................................76
Listing Traceable Probes ...........................................................................................................79
Programming in D .....................................................................................................................82

8 Debugging Applications With DTrace ................................................................................... 85


Enabling User Mode Probes .....................................................................................................86
DTracing Applications ......................................................................................................87

9 Debugging C++ Applications With DTrace ........................................................................... 91


Using DTrace to Profile and Debug A C++ Program ............................................................92

10 Managing Memory with DTrace and MDB .......................................................................... 103


Software Memory Management ............................................................................................104
Using DTrace and MDB to Examine Virtual Memory .......................................................105

11 Debugging Drivers With DTrace ........................................................................................... 117


Porting the smbfs Driver from Linux to the Solaris OS ......................................................118

A OpenSolaris Resources ..........................................................................................................127

MODULE 1

What is the OpenSolaris Project?

Objectives
The objective of this course is to learn about operating system computing by using the Solaris™ Operating System source code that is freely available through the OpenSolaris project.

Tip – To receive a free OpenSolaris Starter Kit that includes training materials,
source code, and developer tools, register online at
http://get.opensolaris.org.

We'll start by showing you the user groups, portals, and documentation you will
use to get started with UNIX® computing. Next, we'll show you where to go to
access the code, communities, discussions, projects, and source browser for the
OpenSolaris project. Then, we'll give you steps to configure zones, ZFS,
networking, and the environment. Finally, we'll demonstrate debugging
processes, applications, page faults, and device drivers with DTrace in the lab
exercises.

The OpenSolaris project was launched on June 14, 2005 to create a community development effort using the Solaris OS code as a starting point. It is a nexus where contributors from Sun and elsewhere can collaborate on developing and improving operating system technology. The OpenSolaris source code will find a variety of uses, including serving as the basis for future versions of the Solaris OS product, other operating system projects, and third-party products and distributions of interest to the community. The OpenSolaris project is currently sponsored by Sun Microsystems, Inc.

In its first two years, more than 60,000 participants became registered members. The engineering community is continually growing and changing to meet the needs of developers, system administrators, and end users of the Solaris Operating System.

Teaching with the OpenSolaris project provides the following advantages over
instructional operating systems:
■ Access to code for the revolutionary technologies in the Solaris 10 operating
system
■ Access to code for a commercial OS that is used in many environments and
that scales to large systems
■ Superior observability and debugging tools
■ Hardware platform support including SPARC, x86 and x64 architectures
■ Leadership on 64-bit computing
■ $0.00 for infinite right-to-use
■ Free, exciting, innovative, complete, seamless, and rock-solid code base
■ Availability under the OSI-approved Common Development and
Distribution License (CDDL) allows royalty-free use, modification, and
derived works

Module 1 • What is the OpenSolaris Project? 8



Country Portals
The Internationalization and Localization Community is helping to translate the
OpenSolaris English web site into many languages. So far, eight country portals
are under development, as follows:
■ India portal – http://in.opensolaris.org
■ China portal – http://cn.opensolaris.org
■ Japan portal – http://jp.opensolaris.org
■ Poland portal – http://pl.opensolaris.org
■ France portal – http://fr.opensolaris.org
■ Brazil Portal – http://opensolaris.org/os/project/br
■ Spanish Portal – http://opensolaris.org/os/project/es

Portals for Germany, Russia, Czech Republic, Spain, Korea, and Mexico are
planned. See the OpenSolaris Portals project to get involved, or chat on one of the
seven OpenSolaris chat rooms using IRC at irc.freenode.net. See
http://opensolaris.org/os/chat/


Web Resources for OpenSolaris


You can download the OpenSolaris source, view the license terms and access
instructions for building source and installing the pre-built archives at:
http://www.opensolaris.org/os/downloads.

The icons in the upper-right of the OpenSolaris web pages link you to
discussions, communities, projects, downloads, and source browser resources.

In addition, the OpenSolaris web site provides search across all of the site content
and aggregated blogs.

Discussions
Discussions provide you with access to the experts who are working on new open
source technologies. Discussions also provide an archive of previous
conversations that you can reference for answers to your questions. See
http://www.opensolaris.org/os/discussions for the complete list of forums
to which you can subscribe.

Communities
Communities provide connections to other participants with similar interests in
the OpenSolaris project. Communities form around interest groups,
technologies, support, tools, and user groups, for example:

Academic and Research    http://www.opensolaris.org/os/community/edu

DTrace    http://www.opensolaris.org/os/community/dtrace

ZFS    http://www.opensolaris.org/os/community/zfs

Networking    http://www.opensolaris.org/os/community/networking

Zones    http://www.opensolaris.org/os/community/zones

Documentation    http://www.opensolaris.org/os/community/documentation

Device Drivers    http://www.opensolaris.org/os/community/device_drivers


Tools    http://www.opensolaris.org/os/community/tools

Advocates    http://www.opensolaris.org/os/community/advocacy

Security    http://www.opensolaris.org/os/community/security

Performance    http://www.opensolaris.org/os/community/performance

Storage    http://www.opensolaris.org/os/community/storage

System Administrators    http://www.opensolaris.org/os/community/sysadmin

➊–➌ These are only a few of the 30 communities actively working on OpenSolaris. See http://www.opensolaris.org/os/communities for the complete list.

➊ Sun intends to have non-Sun contributors and wants to foster collaborations across industrial and academic affiliations.
➋ The OpenSolaris project will empower and expand the existing Solaris community.
➌ The OpenSolaris project will allow for the creation of entirely new communities.

Projects

Projects hosted on the http://www.opensolaris.org/ web site are collaborative efforts that produce objects such as code changes, documents, graphics, or joint-authored products. Projects have code repositories and committers and may live within a community or independently.

➍ Projects give you the opportunity to share files, disk space, and an email alias.
➎ You can collaborate with other engineers across the globe to work on a new technology or an improvement to existing technology.

New projects are initiated by participants by request on the discussions. Projects that are submitted and accepted by at least one other interested participant in the sponsoring community are given space on the projects page to get started. See http://www.opensolaris.org/os/projects for the current list of new projects.

➍–➎ Chime Visualization Tool for DTrace    http://www.opensolaris.org/os/project/dtrace-chime

Google Summer of Code    http://www.opensolaris.org/os/project/powerPC

Indiana    http://www.opensolaris.org/os/project/indiana

OpenGrok    http://www.opensolaris.org/os/project/opengrok

Programming Contest    http://www.opensolaris.org/os/project/contest

Starter Kit    http://www.opensolaris.org/os/project/starterkit

Solaris iSCSI Target    http://www.opensolaris.org/os/project/iscsitgt


Source Repositories
Centralized and distributed source repositories are hosted on the
opensolaris.org web site. The centralized source management model uses the
Subversion (SVN) source control management program. Repositories managed
in a distributed fashion use the Mercurial (hg) source control management
program.

A source repository on opensolaris.org is created by a Project Leader using the Project web pages. Developers with commit rights access repositories through their opensolaris.org accounts; commit rights are managed by Project Leaders. If you need an account, you can sign up for one. You must also provide a Secure Shell (SSH) public key. Refer to the Tools community for the most recent source control information, downloads, and instructions: http://opensolaris.org/os/community/tools

OpenGrok
OpenGrok™ is the fast and usable source code search and cross-reference engine used in OpenSolaris. See http://cvs.opensolaris.org/source to try it out!

The first project to be hosted on opensolaris.org was OpenGrok. See http://www.opensolaris.org/os/project/opengrok to find out about the ongoing development project.

Take an online tour of the source and you'll discover cleanly written, extensively
commented code that reads like a book. If you're interested in working on an
OpenSolaris project, you can download the complete codebase. If you just need to
know how some features work in the Solaris OS, the source code browser
provides a convenient alternative. OpenGrok understands various program file
formats and version control histories like SCCS, RCS, and CVS, so that you can
better understand the open source code.

MODULE 2

OpenSolaris Advocacy

Objectives
The Advocates Community exists to help people around the world get involved
in the OpenSolaris Community. We welcome participation from people of all
languages and cultures and people with all levels of technical and non-technical
skills. Everyone has something to contribute.

See http://opensolaris.org/os/community/advocacy/

In the Advocates community you will find independent user group projects,
presentations, news, articles, blogs, technical & non-technical content, videos
and podcasts, events and conferences, community metrics, swag, badges,
buttons, and a variety of other promotional projects.


Why Use OpenSolaris?


This section describes practical reasons to choose OpenSolaris as your development platform.

Price
Since the availability of the Solaris 10 Operating System in January 2005, its popularity has exploded. As of July 2007, more than 8.7 million copies had been registered, more than all of the previous versions of the Solaris OS combined. Further fueling this growth was the release of OpenSolaris in June 2005. Given this surge in the number of users, more developers (commercial and open-source alike) are seeing the Solaris operating system as a viable target for their software.

One of the reasons the Solaris OS enjoyed a huge popularity boost was its price: $0 for everyone, for any use (commercial and non-commercial), on any machine (on both SPARC and x86 platforms). Another reason was Sun's promise, and its delivery on that promise, of making the Solaris source code available under an OSI-approved open-source license, the Common Development and Distribution License (CDDL).

Innovative Core Features


However, the most important reason for the popularity of the Solaris OS is
arguably the vast wealth of features it offers. In no particular order, these include
the following:
■ Solaris Zones – Provide the ability to partition a machine into numerous virtual machines, each of which is isolated from the others.
■ DTrace – A comprehensive dynamic tracing tool for investigating system behavior, safely on production machines.
■ New IP stack – Provides vastly increased performance.
■ ZFS – A 128-bit, state-of-the-art file system, with end-to-end error checking and correction, a simple command-line interface, and virtually limitless storage capacity.

Module 2 • OpenSolaris Advocacy 14



Backward Compatibility
All of these features build on what long-time Solaris OS users have come to
expect: rock-solid stability, huge scalability, high performance, and guaranteed
backwards compatibility. The last of these is especially important to commercial
software developers, because maintenance is usually the largest expense
associated with a piece of software. With its backwards compatibility guarantee,
software vendors know that (provided they use only published APIs) software
built for Solaris N will run correctly on Solaris N+1 and subsequent versions.
(Contrast this with some other operating systems, where incompatible changes to
system components -- for example, libraries -- are made without regard to the
effect on applications. The net effect is application breakage, resulting in
increased maintenance costs and frustration for application vendors and users.)

Hardware Platform Neutrality


The preceding paragraphs give some reasons why we should develop for the Solaris OS, but there are additional reasons to develop on the Solaris platform. One is that Solaris is a multi-platform OS, supporting both SPARC and x86 architectures (a community-driven port to Power is in the works, too). Although there was an issue a few years ago with the Solaris OS for x86 platforms, the fact that Sun has introduced a range of AMD-based servers and workstations demonstrates the company's commitment to x86 technology.

From the developer's perspective, the Solaris versions for SPARC and x86
platforms have the same feature set and APIs. This means that developers can
concentrate on the other issues endemic to cross-platform development, like
CPU endianness. The SPARC platform is big-endian and x86 is little-endian, so
an application that is developed and tested on the Solaris platform has a high
probability of being free from endian-related problems. The Solaris OS also
supports both 32-bit and 64-bit applications on both platforms, thus helping to
eliminate bugs due to assumptions about word size.

Perhaps the most compelling reason to develop software on the Solaris OS is the
wealth of professional-grade development tools available for it.


Development Tools
One of the most important features of an OS from a developer's point of view is
the variety and quality of the development tools available. Compilers and
debuggers are the most obvious examples of these tools, but other examples
include code checkers (to ensure that our code is free from subtle errors the
compiler might not catch), cross-reference generators (to see which functions
reference other functions and variables), and performance analyzers.

The Sun Studio suite is the product of choice for Solaris OS developers. Available
as a free download from the http://developers.sun.com web site, Sun Studio
software is a collection of professional-grade compilers and tools. It includes C,
C++, and FORTRAN compilers, code analysis tools, an integrated development
environment (IDE), the dbx source-level debugger, and editors. Other tools
included with Studio software are cscope (an interactive source browser), ctrace
(a tool to generate a self-tracing version of our programs), cxref (a C program
cross-referencer), dmake (for distributed parallel makes), and lint (the C
program checker).

The Solaris OS ships with the GNU C compiler, gcc, and its companion
source-level debugger, gdb. The Solaris OS also ships with the very powerful
modular debugger, mdb. However, mdb is not a source-level debugger. It is most
useful when we are debugging kernel code, or performing post-mortem analysis
on programs for which the source is not available. See the Solaris Modular
Debugger Guide and Solaris Performance and Tools by McDougall, Mauro, and
Gregg for more information about mdb.


Acknowledgments
The following members of the OpenSolaris Community reviewed and provided
feedback on this document:
■ Boyd Adamson
■ Pradhap Devarajan
■ Alan Coopersmith
■ Brian Gupta
■ Rainer Heilke
■ Eric Lowe
■ Ben Rockwood
■ Cindy Swearingen

The following OpenSolaris community members provided excellent new content:
■ Dong-Hai Han
■ Narayana Janga
■ Shivani Khosa
■ Rich Teer
■ Sunay Tripathi
■ Yifan Xu

Many thanks also go to Steven Cogorno, David Comay, Teresa Giacomini, Stephen Hahn, Patrick Finch, and Sue Weber for their work to make the initial version possible.

To participate in future reviews of this document, use the instructions at the following URL:

http://www.opensolaris.org/os/community/documentation/reviews

MODULE 3

Planning the OpenSolaris Environment

Objectives
The objective of this module is to understand the system requirements, support
information, and documentation available for the OpenSolaris project
installation and configuration.

Additional Resources
■ Solaris Express Developer Edition Installation Guide: Laptop Installations. Sun Microsystems, Inc., 2007.
■ Resources for Running Solaris OS on a Laptop: http://www.sun.com/bigadmin/features/articles/laptop_resources.html
■ OpenSolaris Laptop Community: http://opensolaris.org/os/community/laptop
■ OpenSolaris Starter Kit: http://opensolaris.org/os/project/starterkit
■ System Administration Guide: IP Services, Sun Microsystems, Inc., 2007
■ OpenSolaris Networking Community: http://www.opensolaris.org/os/community/networking
■ ZFS Administration Guide and man pages: http://opensolaris.org/os/community/zfs/docs


Development Environment Configuration

There is no substitute for hands-on experience with operating system code and
direct access to kernel modules. The unique challenges of kernel development
and access to root privileges for a system are made simpler by the tools, forums,
and documentation provided for the OpenSolaris project.

Tip – To receive an OpenSolaris Starter Kit that includes training materials, source
code, and developer tools, go to http://get.opensolaris.org.

➊–➍ Consider the following features of OpenSolaris as you plan your development
environment:

➊ The OpenSolaris 64-bit kernel is seamless: 32-bit applications run unmodified
on it.

➋ One may alternate between the 32-bit and the 64-bit kernel with only a reboot.

➌ All of the architectures supported by the OpenSolaris project are built from the
same source code base. The 64-bit kernel isn't a separate version or variant of
the system.

➍ 32-bit and 64-bit applications and libraries coexist seamlessly.

TABLE 3–1 Configurable Lab Component Support

Configurable Component Support From the OpenSolaris Project

Hardware OpenSolaris supports systems that use the SPARC® and x86 families of processor
architectures: UltraSPARC®, SPARC64, AMD64, Pentium, and Xeon EM64T.
For supported systems, see the Solaris OS Hardware Compatibility List at
http://www.sun.com/bigadmin/hcl.

Source files See http://opensolaris.org/os/downloads for detailed instructions about
how to build from source.

Install images Pre-built OpenSolaris distributions are limited to the Solaris Express:
Community Edition [DVD Version], Build 32 or newer, Solaris Express:
Developer Edition, Nexenta, Schillix, Martux and Belenix.
For the OpenSolaris kernel with the GNU user environment, try
http://www.gnusolaris.org/gswiki/Download-form.

BFU archives The on-bfu-DATE.PLATFORM.tar.bz2 file is provided if you are installing from
pre-built archives.

Build tools The SUNWonbld-DATE.PLATFORM.tar.bz2 file is provided if you build from
source.

Compilers and tools Sun Studio 11 compilers and tools are freely available for use by OpenSolaris
developers. See
http://www.opensolaris.org/os/community/tools/sun_studio_tools/ for
instructions about how to download and install the latest versions. Also, refer to
http://www.opensolaris.org/os/community/tools/gcc for the gcc
community.



TABLE 3–1 Configurable Lab Component Support (Continued)

Configurable Component Support From the OpenSolaris Project

Memory/Disk Requirements
■ Memory requirement: 256 Mbytes minimum (text installer only),
1 Gbyte recommended
■ Memory requirement: 768 Mbytes minimum for the Solaris Express
Developer Edition installer
■ Disk space requirement: 350 Mbytes

Virtual OS environments
Zones and Branded Zones in OpenSolaris provide protected and virtualized
operating system environments within an instance of Solaris, allowing one or
more processes to run in isolation from other activity on the system.
OpenSolaris supports Xen, an open-source virtual machine monitor developed
by the Xen team at the University of Cambridge Computer Laboratory. See
http://www.opensolaris.org/os/community/xen/ for details and links to the
Xen project.
OpenSolaris also runs as a VMware™ guest. See
http://www.opensolaris.org/os/project/content for a recent article
describing how to get started.

➊ Problem: machines are underutilized; utilization can be increased through
virtualization with Zones. Each zone looks, feels, and smells like its own
machine; you can even reboot them!

➋ Most other virtualization technologies virtualize at the hardware layer.

➌ Zones are a new facility in OpenSolaris that instead virtualizes at the
operating system layer.

➊–➌ Refer to Module 4 for more information about how Zones and Branded Zones
enable kernel and user mode development of Solaris and Linux applications
without impacting developers in separate zones.

Participation in the OpenSolaris project can improve overall performance across
your network with the latest technologies. Your lab environment becomes
self-sustaining when hosted on OpenSolaris because you are always running the
latest and greatest environment, empowered to update it yourself.

Networking
The OpenSolaris project meets future networking challenges by radically
improving your network performance without requiring changes to your existing
applications.
■ Speeds application performance by about 50 percent by using an enhanced
TCP/IP stack
■ Supports many of the latest networking technologies, such as 10 Gigabit
Ethernet, wireless networking, and hardware offloading



■ Accommodates high-availability, streaming, and Voice over IP (VoIP)
networking features through extended routing and protocol support
■ Supports current IPv6 specifications

Find out more about ongoing networking developments from the OpenSolaris
Networking Community:
http://www.opensolaris.org/os/community/networking.




Network Auto-Configuration Daemon


In the Solaris Express Developer Edition 5/07 release, the boot process runs the
nwamd daemon. This daemon implements an alternate instance of the SMF
service svc:/network/physical, which enables automated network
configuration with minimal intervention.

The nwamd daemon monitors the Ethernet port and automatically enables DHCP
on the appropriate IP interface. If no cable is plugged into a wired network, the
nwamd daemon conducts a wireless scan and prompts the user to select a WiFi
access point to connect to.

You don't need to spend extensive amounts of time manually configuring the
interfaces on your systems. Automatic configuration also simplifies
administration, because you can reconfigure network addresses with minimal
intervention.

To view your NWAM status, type the following command in a terminal window.

# svcs nwam

STATE STIME FMRI

online 11:29:50 svc:/network/physical:nwam

The OpenSolaris Network Auto-Magic Phase 0 page and nwamd man page contain
further details, including instructions for turning off the nwamd daemon, if
preferred. For more information and a link to the nwamd(1M) man page, see
http://www.opensolaris.org/os/project/nwam.
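
If you prefer to manage interfaces manually, NWAM can be turned off by
switching SMF instances of the network/physical service. The following is a
sketch of the commands involved; the instance names assume the default Solaris
Express configuration:

# svcadm disable svc:/network/physical:nwam
# svcadm enable svc:/network/physical:default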




Zone Overview
A zone can be thought of as a container in which one or more applications run
isolated from all other applications on the system. Most software that runs on
OpenSolaris will run unmodified in a zone. Since zones do not change the
OpenSolaris application programming interfaces (APIs) or application binary
interface (ABI), recompiling an application is not necessary in order to run it
inside a zone.




Zones Administration
Zone administration consists of the following commands:
■ zonecfg – Creates and configures zones (adds resources and properties).
Stores the configuration in a private XML file under /etc/zones.
■ zoneadm – Performs administrative steps for zones, such as list, install,
(re)boot, and halt.
■ zlogin – Allows a user to log in to the zone to perform maintenance tasks.
■ zonename – Displays the current zone name.

The following global scope properties are used with zones:


■ zonepath – Path in the global zone to the root directory under which the zone
will be installed
■ autoboot – Whether the zone boots automatically when the global zone boots
■ pool – Resource pool to which the zone should be bound

Resources may include any of the following types:
■ fs – File system
■ inherit-pkg-dir – Directory that has its associated packages inherited
from the global zone
■ net – Network device
■ device – Devices
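
As a quick illustration of these commands working together, zoneadm list -cv
summarizes every configured zone and its state. The zone name, path, and exact
columns shown here are illustrative only and vary by release:

# zoneadm list -cv
  ID NAME      STATUS     PATH
   0 global    running    /
   1 Apache    running    /export/home/Apache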




Getting Started With Zones Administration


This lab exercise will introduce you to creating zones.

Summary
This exercise uses detailed examples to help you understand the process of
creating, installing, and booting a zone.

Note – This procedure does not apply to an lx branded zone.




To Create, Install, and Boot a Zone


1 Use the following example to configure your new zone:

Note – The following example uses a shared-IP stack, which is the default for a
zone.

# zonecfg -z Apache
Apache: No such zone configured
Use ’create’ to begin configuring a new zone.
zonecfg:Apache> create
zonecfg:Apache> set zonepath=/export/home/Apache
zonecfg:Apache> add net
zonecfg:Apache:net> set address=192.168.0.50
zonecfg:Apache:net> set physical=bge0
zonecfg:Apache:net> end
zonecfg:Apache> verify
zonecfg:Apache> commit
zonecfg:Apache> exit

2 Use the following example to install and boot your new zone:
# zoneadm -z Apache install
Preparing to install zone <Apache>.
Creating list of files to copy from the global zone.
Copying <6029> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <1038> packages on the zone.
Initialized <1038> packages on zone.
Zone <Apache> is initialized.
Installation of these packages generated warnings: ....
The file </export/home/Apache/root/var/sadm/system/logs/install_log>
contains a log of the zone installation.

The necessary directories are created. The zone is ready for booting.

3 View the directories:


# ls /export/home/Apache/root
bin etc home mnt platform sbin
tmp var dev export lib opt
proc system usr



Packages are not reinstalled.

# /etc/mount
/export/home/Apache/root/lib on /lib read only
/export/home/Apache/root/platform on /platform read only
/export/home/Apache/root/sbin on /sbin read only
/export/home/Apache/root/usr on /usr read only
/export/home/Apache/root/proc on proc
read/write/setuid/nodevices/zone=Apache

4 Boot the zone.


# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL>
mtu 8232 index 1 inet 127.0.0.1 netmask ff000000
bge0: flags=1004803<UP,BROADCAST,MULTICAST,DHCP,IPv4> mtu 1500 index 2
inet 192.168.0.4 netmask ffffff00 broadcast 192.168.0.255
ether 0:c0:9f:61:88:c9
# zoneadm -z Apache boot
# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL>
mtu 8232 index 1 inet 127.0.0.1 netmask ff000000
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL>
mtu 8232 index 1 zone Apache inet 127.0.0.1
bge0: flags=1004803 inet 192.168.0.4 netmask ffffff00 broadcast
192.168.0.255 ether 0:c0:9f:61:88:c9
bge0:1: flags=1000803 mtu 1500 index 2 zone Apache inet
192.168.0.50 netmask ffffff00 broadcast 192.168.0.255

5 Configure the zone and log in:


# zlogin -C Apache
[Connected to zone ’Apache’ pts/5]
# ifconfig -a
lo0:2: flags=2001000849 mtu 8232 index 1 inet 127.0.0.1
netmask ff000000
bge0:2: flags=1000803 inet 192.168.0.50 netmask ffffff00
broadcast 192.168.0.255
# ping -s 192.168.0.50
64 bytes from 192.168.0.50: icmp_seq=0. time=0.146 ms
# exit
[Connection to zone ’Apache’ pts/5 closed]




Web Server Virtualization With Zones


Each zone has its own characteristics, for example, zone name, IP addresses,
hostname, naming services, and root and non-root users. By default, the OS runs
in a global zone. The administrator can virtualize the execution environment by
defining one or more non-global zones. Network services can be run in a zone,
limiting the damage possible in the event of a security violation. Since zones are
implemented in software, they aren't limited to the granularity defined by
hardware boundaries. Instead, zones offer sub-CPU granularity.




Creating Non-Global Zones


This lab exercise will demonstrate how to support two different sets of web server
user groups on one physical host.

Summary
Simultaneous access to both web servers will be configured so that each web
server and system will be protected should one become compromised.




Creating Two Non-Global Zones


1 Create non-global zone Apache1:
# zonecfg -z Apache1 info
zonepath: /export/home/Apache1
autoboot: false
pool:
inherit-pkg-dir: dir: /lib
inherit-pkg-dir: dir: /platform
inherit-pkg-dir: dir: /sbin
inherit-pkg-dir: dir: /usr
net: address: 192.168.0.100/24
physical: bge0

2 Create non-global zone Apache2:


# zonecfg -z Apache2 info
zonepath: /export/home/Apache2
autoboot: false
pool:
inherit-pkg-dir: dir: /lib
inherit-pkg-dir: dir: /platform
inherit-pkg-dir: dir: /sbin
inherit-pkg-dir: dir: /usr
net: address: 192.168.0.200/24
physical: bge0

3 Log in to Apache1 and install the application:


# zlogin Apache1
# zonename
Apache1
# ls /Apachedir
apache_1.3.9 apache_1.3.9-i86pc-sun-solaris2.270.tar
# cd /Apachedir/apache_1.3.9 ; ./install-bindist.sh /local
You now have successfully installed the Apache 1.3.9 HTTP server.

4 Log in to Apache2 and install the application:


# zlogin Apache2
# zonename
Apache2
# ls /Apachedir
httpd-2.0.50 httpd-2.0.50-i386-pc-solaris2.8.tar
# cd /Apachedir/httpd-2.0.50; ./install-bindist.sh /local
You now have successfully installed the Apache 2.0.50 HTTP server.



5 Start the Apache1 application:
# zonename
Apache1
# hostname
Apache1zone
# /local/bin/apachectl start
/local/bin/apachectl start: httpd started

6 Start the Apache2 application:


# zonename
Apache2
# hostname
Apache2zone
# /local/bin/apachectl start
/local/bin/apachectl start: httpd started

7 In the global zone, edit the /etc/hosts file:


# cat /etc/hosts
#
# Internet host table
#
127.0.0.1 localhost
192.168.0.1 loghost
192.168.0.100 Apache1zone
192.168.0.200 Apache2zone

8 Open a web browser and navigate to the following URL:


http://apache1zone/manual/index.html
The Apache1 web server is up and running.

9 Open a web browser and navigate to the following URL:

http://apache2zone/manual/
The Apache2 web server is up and running.




Discussion
The end user sees each zone as a different system. Each web server has its own
name service configuration:
■ /etc/nsswitch.conf
■ /etc/resolv.conf

A malicious attack on one web server is contained to that zone. Port conflicts are
no longer a problem!




Creating ZFS Storage Pools and File Systems

Each ZFS storage pool is composed of one or more virtual devices, which
describe the layout of physical storage and its fault characteristics.

➊–➋ In this module, we'll start by learning about mirrored storage pool
configuration.
➌–➏ Then we'll show you how to create a RAID-Z configuration.

➊ The most basic building block for a storage pool is a piece of physical storage.
This can be any block device of at least 128 Mbytes in size. Typically, this is a
hard drive that is visible to the system in the /dev/dsk directory.

➋ A storage device can be a whole disk (c0t0d0) or an individual slice
(c0t0d0s7). The recommended mode of operation is to use an entire disk, in
which case the disk does not need to be specially formatted. ZFS formats the disk
using an EFI label to contain a single, large slice.

➌ ZFS uses the concept of storage pools to manage physical storage. Historically,
file systems were constructed on top of a single physical device. To address
multiple devices and provide for data redundancy, the concept of a volume
manager was introduced to provide the image of a single device so that file
systems would not have to be modified to take advantage of multiple devices.
This design added another layer of complexity and ultimately prevented certain
file system advances, because the file system had no control over the physical
placement of data on the virtualized volumes.

➍ The application issues a read. The ZFS mirror tries the first disk.

➎ The checksum reveals that the block is corrupt on disk. ZFS tries the second
disk.

➏ The checksum indicates that the block is good. ZFS returns good data to the
application and repairs the damaged block on the first disk.
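
The self-healing sequence described in notes ➍–➏ can also be triggered on
demand: a scrub makes ZFS read and verify every block in a pool and repair any
bad copies it finds on a mirror or RAID-Z device. The following sketch uses the
pool name from the labs that follow; the status output is abbreviated and varies
by system:

# zpool scrub tank
# zpool status tank
  ...
  scrub: scrub completed with 0 errors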




Creating a Mirrored ZFS Storage Pool


The objective of this lab exercise is to create and list a mirrored storage pool using
the zpool command.

For information about determining whether a ZFS mirrored storage pool
configuration or a RAID-Z storage pool configuration is right for your
environment, go to:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Summary
ZFS is easy, so let's get on with it! It's time to create your first pool:




To Create Mirrored Storage Pools


1 Open a terminal window.

2 Create a mirrored storage pool named tank. Then, display information about the
pool.
# zpool create tank mirror c1t1d0 c2t2d0
# zpool status tank
pool: tank
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM


tank ONLINE 0 0 0
mirror ONLINE 0 0 0
c1t1d0 ONLINE 0 0 0
c2t2d0 ONLINE 0 0 0

errors: No known data errors

The capacity of the c1t1d0 and c2t2d0 disks is 36 Gbytes each. Because the disks
are mirrored, the total capacity of the pool reflects the approximate size of one of
the disks. Pool metadata consumes a small quantity of disk space. For example:

# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
tank 33.8G 89K 33.7G 0% ONLINE -




Creating ZFS File Systems as Home Directories


The objective of this lab exercise is to learn how to set up ZFS file systems as
home directories for several developers.

By using ZFS file system features available in the OpenSolaris project, you might
be able to simplify your kernel development environment by using snapshots
and their rollback capability.

Summary
In this lab, we'll use the zfs command to create a filesystem and set its
mountpoint.




To Create ZFS File Systems as Home Directories


1 Display the default ZFS file system that is created automatically when the storage
pool is created.
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
tank 86K 33.2G 24.5K /tank

2 Create the tank/home file system:


# zfs create tank/home

3 Then, set the mount point for the tank/home file system:
# zfs set mountpoint=/export/home tank/home

4 Finally, create tank/home file systems for all of your developers:


# zfs create tank/home/developer1
# zfs create tank/home/developer2
# zfs create tank/home/developer3
# zfs create tank/home/developer4

The mountpoint property is inherited as a pathname prefix. That is,
tank/home/developer1 is automatically mounted at /export/home/developer1
because tank/home is mounted at /export/home.
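
You can confirm the inheritance with the zfs get command, whose SOURCE
column reports where each mountpoint value comes from. The output here is
abbreviated and illustrative:

# zfs get -r mountpoint tank/home
NAME                  PROPERTY    VALUE                    SOURCE
tank/home             mountpoint  /export/home             local
tank/home/developer1  mountpoint  /export/home/developer1  inherited from tank/home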

5 Confirm that the ZFS file systems are created:


# zfs list
NAME USED AVAIL REFER MOUNTPOINT
tank 246K 33.2G 26.5K /tank
tank/home 128K 33.2G 29.5K /export/home
tank/home/developer1 24.5K 33.2G 24.5K /export/home/developer1
tank/home/developer2 24.5K 33.2G 24.5K /export/home/developer2
tank/home/developer3 24.5K 33.2G 24.5K /export/home/developer3
tank/home/developer4 24.5K 33.2G 24.5K /export/home/developer4

6 Take a recursive snapshot of the tank/home file system. Then, display the
snapshot information:
# zfs snapshot -r tank/home@today
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
tank 252K 33.2G 26.5K /tank
tank/home 128K 33.2G 29.5K /tank/home



tank/home@today 0 - 29.5K -
tank/home/developer1 24.5K 33.2G 24.5K /tank/home/developer1
tank/home/developer1@today 0 - 24.5K -
tank/home/developer2 24.5K 33.2G 24.5K /tank/home/developer2
tank/home/developer2@today 0 - 24.5K -
tank/home/developer3 24.5K 33.2G 24.5K /tank/home/developer3
tank/home/developer3@today 0 - 24.5K -
tank/home/developer4 24.5K 33.2G 24.5K /tank/home/developer4
tank/home/developer4@today 0 - 24.5K -
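
Because the snapshot was taken recursively, each developer's file system can
later be rolled back independently. For example, the following sketch discards
all changes made to developer1's home directory since the snapshot, so use it
with care:

# zfs rollback tank/home/developer1@today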

For more information, see the zfs(1M) man page.




Creating a RAID-Z Configuration


The objective of this lab exercise is to introduce you to the RAID-Z configuration.

Summary
You might want to create a RAID-Z configuration as an alternative to a mirrored
storage pool configuration if you need to maximize disk space.




To Create a RAID-Z Configuration


1 Open a terminal window.

2 Create a pool with a single RAID-Z device consisting of 5 disks. Then, display
information about the storage pool.
# zpool create tank raidz c1t1d0 c2t2d0 c3t3d0 c4t4d0 c5t5d0
# zpool status tank
pool: tank
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM


tank ONLINE 0 0 0
raidz1 ONLINE 0 0 0
c1t1d0 ONLINE 0 0 0
c2t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c4t4d0 ONLINE 0 0 0
c5t5d0 ONLINE 0 0 0

errors: No known data errors

Disks can be specified by using their shorthand name or the full path. For
example, /dev/dsk/c4t4d0 is identical to c4t4d0.
It is possible to use disk slices for both mirrored and RAID-Z storage pool
configurations, but these configurations are not recommended for production
environments. For more information about using ZFS in production
environments, go to:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide




M O D U L E 4

Userland Consolidations

Objectives
The objective of this module is to introduce you to the userland consolidations of
OpenSolaris. In general, you can think of userland consolidations as existing
outside of the kernel and as components with which users interact. Each of the
following consolidations deliver source files to the opensolaris.org web site or
download center. To access each consolidation, refer to the following URL:
http://opensolaris.org/os/downloads/


Userland Consolidations and Descriptions


Application Server The Glassfish Application Server

Developer Product Tools (DevPro) The system math library, the media library, the microtasking
library, SCCS, make, and the C++ runtime libraries.

Documentation (Docs) Developer and administration technical documentation.

Globalization Support (G11N) Internationalization and localization support.

Installation Support (Install) Installation support and packaging tools.

Java Desktop (JDS) A secure and comprehensive enterprise desktop software solution.

Java Platform, Standard Edition Binaries for the Java Development Kit (JDK) and Java Runtime
(Java SE) Environment (JRE) are available.

Manual Pages Source code to the SunOS Reference Manual Pages.

Message Queue The Sun Java System Message Queue.

Network Storage (NWS) Network storage device support.

SFW (Solaris FreeWare) Open source software that is bundled with Solaris/OpenSolaris.

SPARC Graphics Support The SPARC graphics consolidation has drivers available in binary
form.

Test OpenSolaris Test Suites and Test Tools.

X Window System (X11) X11 software.




M O D U L E 5

Core Features of the Solaris OS

Objectives
The objective of this module is to describe the core features of the Solaris OS and
how the features have fundamentally changed operating system computing.

Additional Resources
OpenSolaris Development Process;
http://www.opensolaris.org/os/community/onnv/os_dev_process/

C Style and Coding Standards for SunOS; http://www.opensolaris.org/


os/community/documentation/getting_started_docs/


Development Process and Coding Style


The development process and coding style used by the OS/Net consolidation
(ON) govern how the core operating system and networking components are
delivered to Solaris. ON contains the source for the kernel on all platforms and
architectures, the bulk of the drivers, the file systems, the core libraries, and the
basic commands that you'd expect to find on a Solaris system. The development
process for the OpenSolaris project follows these high-level steps:
1. Idea
First, someone has an idea for an enhancement or has a gripe about a defect.
Search for an existing bug or file a new bug or request for enhancement (RFE)
by using the http://bugs.opensolaris.org/ web page. Next, announce it to
other developers on the appropriate E-mail list. The announcement has the
following benefits:
■ Precipitate discussion of the change or enhancement
■ Determine the complexity of the proposed change(s)
■ Gauge community interest
■ Identify potential team members
2. Design
The Design phase determines whether or not a formal design review is even
needed. If a formal review is needed, complete the following next steps:
■ Identify design and architectural reviewers
■ Write a design document
■ Write a test plan
■ Conduct design reviews and get the appropriate approvals
3. Implementation
The Implementation phase consists of the following:
■ Writing of the actual code in accordance with policies and standards
Download C Style and Coding Standards for SunOS here:
http://www.opensolaris.org/
os/community/documentation/getting_started_docs/.
■ Writing the test suites
■ Passing various unit and pre-integration tests



■ Writing or updating the user documentation, if needed
■ Identifying code reviewers in preparation for integration
4. Integration
Integration happens after all reviews have been completed and permission to
integrate has been granted.

The purpose of the Integration phase is to make sure that everything that was
supposed to be done has in fact been done, which means conducting reviews for
code, documentation, and completeness.

➊ Sometimes, the integrated change needs to be communicated by sending
heads-up messages to appropriate communities and possibly presenting a
transfer of information (TOI) to a support organization to help them understand
the change.

➊ The formal process document for OpenSolaris describes the previous steps in
greater detail, with flow charts that illustrate the development phases. That
document also details the following design principles and core values that are to
be applied to source code development for the OpenSolaris project:
■ Reliability – OpenSolaris must perform correctly, providing accurate results
with no data loss or corruption.
■ Availability – Services must be designed to be restartable in the event of an
application failure and OpenSolaris itself must be able to recover from
non-fatal hardware failures.
■ Serviceability – It must be possible to diagnose both fatal and transient issues
and wherever possible, automate the diagnosis.
■ Security – OpenSolaris security must be designed into the operating system,
with mechanisms in place to audit what changes are made to the system and
by whom.
■ Performance – The performance of OpenSolaris must be second to none
when compared to other operating systems running on identical
environments.
■ Manageability – It must allow for the management of individual components,
software or hardware, in a consistent and straightforward manner.
■ Compatibility – New subsystems and interfaces must be extensible and
versioned in order to allow for future enhancements and changes without
sacrificing compatibility.
■ Maintainability – OpenSolaris must be architected so that common
subroutines are combined into libraries or kernel modules that can be used by
an arbitrary number of consumers.



■ Platform Neutrality – OpenSolaris must continue to be platform neutral, and
lower level abstractions must be designed with multiple and future platforms
in mind.

Refer to http://www.opensolaris.org/os/community/onnv/os_dev_process/
for more detailed information about the process that is used for collaborative
development of OpenSolaris code.

Like many projects, OpenSolaris enforces a coding style on contributed code,
regardless of its source. This style is described in detail at
http://opensolaris.org/os/community/onnv/.

➊–➌ Two tools for checking many elements of the coding style are available as
part of the OpenSolaris distribution. These tools are cstyle(1) for verifying
compliance of C code with most style guidelines, and hdrchk(1) for checking the
style of C and C++ headers.

➊ This coding style is very similar to that used by the Linux kernel, BSD systems,
and many other non-GNU projects (the GNU project uses its own unique coding
style). Also, examine the files in usr/src/prototypes; these provide examples of
the correct general layout and style for most types of source files.

➋ There are style mistakes that cannot be caught by any reasonable tool, and
others that cannot be caught by the particular implementations.

➌ Improving the accuracy and completeness of these tools is an ongoing task.
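
Both tools are typically run against changed source files before code review. A
hypothetical run follows; the file name and diagnostic text are illustrative only,
but cstyle does print one message per violating line:

# cstyle -p usr/src/uts/common/os/mynew.c
usr/src/uts/common/os/mynew.c: 42: indent by spaces instead of tabs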




Overview
Now that you have considered the development environment, processes, and
values applied to engineering by OpenSolaris developers, let's discuss in more
depth the features of the operating system that exemplify performance, security,
serviceability, and manageability:
■ Performance
■ FireEngine
■ Nemo
■ Crossbow
■ Security
■ Least Privilege
■ Packet Filtering
■ Zones
■ Branded Zones (BrandZ)
■ Serviceability
■ Predictive Self-Healing
■ Dynamic Tracing Facility (DTrace)
■ Modular Debugger (MDB)
■ Manageability
■ Services Management Facility (SMF)
■ ZFS

FireEngine
The "FireEngine" approach in Solaris 10 merges all protocol layers into one
STREAMS module that is fully multithreaded. Inside the merged module, instead
of per-data structure locks, a per-CPU synchronization mechanism called
"vertical perimeter" is used. The "vertical perimeter" is implemented using a
serialization queue abstraction called "squeue." Each squeue is bound to a CPU,
and each connection is in turn bound to a squeue that provides any
synchronization and mutual exclusion needed for the connection-specific data
structures.




Synchronization
Since the stack is fully multithreaded (barring the per-CPU serialization enforced
by the vertical perimeter), it uses a reference-based scheme to ensure that the
connection instance is available when needed. For an established TCP
connection, three references are guaranteed to be on it. Each protocol layer has a
reference on the instance (one each for TCP and IP) and the classifier itself has a
reference since it is an established connection. Each time a packet arrives for the
connection and the classifier looks up the connection instance, an extra reference
is placed, which is dropped when the protocol layer finishes processing that
packet.

TCP, IP, and UDP


The Solaris 10 OS provides the same view for TCP as previous releases -- that is,
TCP appears as a clone device but it is actually a composite, with the TCP and IP
code merged into a single D_MP STREAMS module. The operational part of
TCP is fully protected by the vertical perimeter, which is entered through the
squeue primitives. FireEngine changes the interface between TCP and IP from
the existing STREAMS-based message passing interface to a functional call-based
interface, both in the control and data paths.

There is a fully multithreaded UDP module running under the same protection
domain as IP. Though UDP and IP are running in the same protection domain,
they are still separate STREAMS modules. Therefore, STREAMS plumbing is
kept unchanged and a UDP module instance is always pushed above IP. The
Solaris 10 platform allows for the following plumbing modes:
■ Normal – IP is first opened and later UDP is pushed directly on top. This is
the default action that happens when a UDP socket or device is opened.
■ SNMP – UDP is pushed on top of a module other than IP. When this happens,
only SNMP semantics will be supported.

GLDv3
Solaris 10 software introduces a new device driver framework called GLDv3
along with the new stack. Most of the major device drivers were ported to this
framework, and all future and 10Gb device drivers will be based on this
framework. This framework also provides a STREAMS-based DLPI layer for
backward compatibility (to allow external, non-IP modules to continue to work).

The GLDv3 architecture virtualizes Layer 2 of the network stack. A one-to-one
correspondence between network interfaces and devices no longer exists.

Refer to the Nemo project hosted on opensolaris.org for more information
about the framework, the MAC services module, and the Data-Link Services
module.

Virtualization
Project Crossbow creates virtual stacks around any service (HTTP, HTTPS, FTP,
NFS, etc.), protocol (TCP, UDP, SCTP, etc.), or Solaris Containers technology.
The virtual stacks are separated by means of a H/W classification engine such that
traffic for one stack does not impact other virtual stacks. Each virtual stack can be
assigned its own priority and bandwidth on a shared NIC without causing
performance degradation to the system or the service/container. The architecture
dynamically manages priority and bandwidth resources, and can provide better
defense against denial-of-service attacks directed at a particular service or
container by isolating the impact to just that service or container.

Least Privilege
UNIX® has historically had an all-or-nothing privilege model that imposes the
following restrictions:
■ No way to limit root user privileges
■ No way for non-root users to perform privileged operations
■ Applications needing only a few privileged operations must run as root
■ Very few are trusted with root privileges and virtually no students are so
trusted

In the Solaris OS we've developed fine-grained privileges, which allow
applications and users to run with just the privileges they need. Least privilege
allows students to be granted the privileges that they need to complete their
course work, participate in research, and maintain a portion of the campus or
department infrastructure.


Packet Filtering
Solaris IP Filter provides stateful packet filtering and network address translation
(NAT). Solaris IP Filter is derived from the open source IP Filter software. IP
Filter can filter by IP address, port, protocol, or network interface according to
filter rules.

IP Filter
The Packet Filtering Hooks (PFHooks) API was introduced in Solaris 10
Update 4 to replace the STREAMS-based implementation of IP Filter. Using the
PFHooks framework, the performance of firewall software like IP Filter is
significantly improved. PFHooks also provides the ability to intercept loopback
and inter-zone traffic. Third-party firewall software is developed and registered
with the PFHooks API using the net_register_hook(info, event, hook)
function.

Enabling Simple Packet Filters


The objective of this exercise is to learn about packet filtering. Solaris IP Filter is
installed with the Solaris operating system. However, packet filtering is not
enabled by default. IP Filter can filter by IP address, port, protocol, or network
interface according to filter rules. Following is an example filter rule:

block in on ce0 proto tcp from 192.168.0.0/16 to any port = 23

To use Solaris IP Filter, simply enter your filter rules in the /etc/ipf/ipf.conf
file. Then, enable and restart the svc:/network/ipfilter service by using the
svcadm command.

Note – You can also use the ipf command to work with the rule sets.

Solaris IP Filter can perform network address translation (NAT) for a source
address or a destination address according to NAT rules. Following is an example
of a NAT rule:

map ce0 192.168.1.0/24 -> 10.1.0.0/16

To use network address translation, enter your NAT rules in the
/etc/ipf/ipnat.conf file. Then, enable and restart the
svc:/network/ipfilter service by using the svcadm command.

Note – You can also use the ipnat command to work with rule sets.

Sample Packet Filtering Rules


This section includes various examples of filtering rule syntax. Invoke the rules
by adding them to the /etc/ipf/ipf.conf file. Then, enable Solaris IP Filter and
restart the ipfilter service as detailed in the previous exercise.
Log all inbound packets on le0 which have IP options present.

log in on le0 from any to any with ipopts

Block any inbound packets on le0 which are fragmented and too short to do any
meaningful comparison on. This actually only applies to TCP packets which can
be missing the flags/ports (depending on which part of the fragment you see).

block in log quick on le0 from any to any with short frag

Log all inbound TCP packets with the SYN flag (only) set.

Note – If it was an inbound TCP packet with the SYN flag set and it had IP options
present, this rule and the above rule would cause it to be logged twice.

log in on le0 proto tcp from any to any flags S/SA

Block and log any inbound ICMP unreachables.

block in log on le0 proto icmp from any to any icmp-type unreach

Block and log any inbound UDP packets on le0 which are going to port 2049 (the
NFS port).

block in log on le0 proto udp from any to any port = 2049

Quickly allow any packets to/from a particular pair of hosts.

pass in quick from any to 10.1.3.2/32
pass in quick from any to 10.1.0.13/32
pass in quick from 10.1.3.2/32 to any
pass in quick from 10.1.0.13/32 to any

Block (and stop matching) any packet with IP options present.

block in quick on le0 from any to any with ipopts

Allow any packet through.

pass in from any to any

Block any inbound UDP packets destined for these subnets.

block in on le0 proto udp from any to 10.1.3.0/24


block in on le0 proto udp from any to 10.1.1.0/24
block in on le0 proto udp from any to 10.1.2.0/24

Block any inbound TCP packets with only the SYN flag set that are destined for
these subnets.

block in on le0 proto tcp from any to 10.1.3.0/24 flags S/SA


block in on le0 proto tcp from any to 10.1.2.0/24 flags S/SA
block in on le0 proto tcp from any to 10.1.1.0/24 flags S/SA

Block any inbound ICMP packets destined for these subnets.

block in on le0 proto icmp from any to 10.1.3.0/24


block in on le0 proto icmp from any to 10.1.1.0/24
block in on le0 proto icmp from any to 10.1.2.0/24

Zones
A zone is a virtual operating system abstraction that provides a protected
environment in which applications run. The applications are protected from each
other to provide software fault isolation. To ease the labor of managing multiple
applications and their environments, they co-exist within one operating system
instance, and are usually managed as one entity.

A small number of applications which are normally run as root or with certain
privileges may not run inside a zone if they rely on being able to access or change
some global resource. An example might be the ability to change the system's
time-of-day clock. The few applications which fall into this category may need
modification to run properly inside a zone or, in some cases, should continue to
be used within the global zone.
Here are some guidelines:

■ An application which accesses the network and files, and performs no other
I/O, should work correctly.
■ Applications which require direct access to certain devices, for example, a disk
partition, will usually work if the zone is configured correctly. However, in
some cases this may increase security risks.
■ Applications which require direct access to other devices, for example,
/dev/kmem or a network device, may need to be modified to work correctly.
Applications should instead use one of the many IP services.

➊–➍ Zones can be combined with the resource management facilities which are
present in OpenSolaris to provide more complete, isolated environments. While
the zone supplies the security, name space and fault isolation, the resource
management facilities can be used to prevent processes in one zone from using
too much of a system resource or to guarantee them a certain service level.
Together, zones and resource management are often referred to as containers.

➊ Super-user in a zone can't affect or obtain privileges in other zones.

➋ This allows students a safe sandbox in which to experiment.

➌ Zones can be used as an instructional tool or infrastructure component.

➍ For example, you can allocate each student an IP address and a zone and
allow them all to safely share one machine.
See http://opensolaris.org/os/community/zones/faq for answers to a large
number of common questions about zones and links to the latest administration
documentation.
Zones provide protected environments for Solaris applications. Separate and
protected run-time environments are available through the OpenSolaris project
by using BrandZ.

Branded Zones (BrandZ)


BrandZ is a framework that extends the zones infrastructure to create Branded
Zones, which are zones that contain non-native operating environments. A
branded zone may be as simple as an environment where the standard Solaris
utilities are replaced by their GNU equivalents, or as complex as a complete
Linux user space.
BrandZ extends the Zones infrastructure in user space in the following ways:

■ A brand is an attribute of a zone, set at zone configuration time.
■ Each brand provides its own installation routine, which allows us to install an
arbitrary collection of software in the branded zone.
■ Each brand may provide pre-boot and post-boot scripts that allow us to do
any final boot-time setup or configuration.
■ The zonecfg and zoneadm tools can set and report a zone's brand type.
BrandZ provides a set of interposition points in the kernel:
■ These points are found in the syscall path, process loading path, thread
creation path, etc.
■ These interposition points are only applied to processes in a branded zone.
■ At each of these points, a brand may choose to supplement or replace the
standard behavior of the Solaris OS.
■ Fundamentally different brands may require new interposition points.
The lx brand enables Linux binary applications to run unmodified on Solaris,
within zones that are running a complete Linux user space. The lx brand enables
user-level Linux software to run on a machine with an OpenSolaris kernel, and
includes the tools necessary to install a CentOS or Red Hat Enterprise Linux
distribution inside a zone on a Solaris system. The lx brand will run on x86/x64
systems booted with either a 32-bit or a 64-bit kernel. Regardless of the
underlying kernel, only 32-bit Linux applications are able to run. This feature is
only available for x86 and AMD x64 architectures at this time; however, porting
to SPARC might be an interesting community project, because BrandZ lx is still
very much a work in progress.
Refer to http://opensolaris.org/os/community/brandz/install for the
installation requirements and instructions.
The OpenSolaris project addresses the unique challenges of operating system
development and testing for application performance using features like zones.

Zones Networking
Solaris zones can be designated as one of the following:
■ Exclusive-IP zone
■ Shared-IP zone

Exclusive-IP zones have their own IP stacks and may have their own physical
interfaces. An exclusive-IP zone may also have its own VLAN interfaces. The
configuration of exclusive-IP zones is the same as that of a physical machine.
Shared-IP zones share the IP stack with the global zone, so shared-IP zones are
shielded from the configuration details for devices, routing and so on. Each
shared-IP zone can be assigned IPv4/IPv6 addresses. Each shared-IP zone also
has its own port space. Applications can bind to INADDR_ANY and will only
receive traffic for that zone.
Neither type of zone can see the traffic of other zones. Packets coming from a
zone have a source address belonging to that zone. A shared-IP zone can only
send packets on an interface on which it has an address. A shared-IP zone can
only use a default router if it is directly reachable from the zone. The default
router has to be in the same IP subnet as the zone.

Shared-IP zones cannot change their network configuration or routing table and
cannot see the configuration of other zones. /dev/ip is not present in the
shared-IP zone. SNMP agents must open /dev/arp instead. Multiple shared-IP
zones can share a broadcast address and may join the same multicast group.

Shared-IP zones have the following networking limitations:


■ Cannot put a physical interface inside a zone
■ IP Filter does not work between zones
■ No DHCP for zone IP addresses
■ No dynamic routing

Exclusive-IP zones do not have the above limitations, and can change their
network configuration or routing table inside the zone. /dev/ip is present in the
exclusive-IP zone.

Zones Identity, CPU Visibility, and Packaging


Each zone controls its node name, timezone, and naming services like LDAP and
NIS. The sysidtool can set this up. Separate /etc/passwd files mean that root
privileges can be delegated to the zone. User IDs may map to different names
when domains differ.

By default, all zones see all CPUs. Restricted view is enabled automatically when
resource pools are enabled.

Zones can add their own packages, and patches can be applied to those packages.
System patches are applied in the global zone; then, each non-global zone
automatically boots to single-user mode (boot -s) to apply the patch. Packages
with the SUNW_PKG_ALLZONES parameter set should be kept consistent
between the global zone and all non-global zones. The SUNW_PKG_HOLLOW
parameter causes the package name to appear in non-global zones (NGZ) for
dependency purposes, but the contents are not installed.

Zones Devices
Each zone has its own devices. Zones see a subset of safe pseudo devices in their
/dev directory. Applications reference the logical path to a device presented in
/dev. The /dev directory exists in non-global zones, the /devices directory does
not. Devices like random, console, and null are safe, but others like /dev/kmem are
not.

Zones can modify the permissions of their devices but cannot issue mknod(2).
Physical device files like those for raw disks can be put in a zone with caution.
Devices may be shared among zones, but doing so requires careful consideration
of the security implications.

For example, you might have devices that you want to assign to specific zones.
Allowing unprivileged users to access block devices could permit those devices to
be used to cause system panic, bus resets, or other adverse effects. Placing a
physical device into more than one zone can create a covert channel between
zones. Global zone applications that use such a device risk the possibility of
compromised data or data corruption by a non-global zone.

Predictive Self-Healing
Predictive self-healing was implemented in two ways in the Solaris 10 OS. This
section describes the new Fault Management Architecture and Services
Management Facility that make up the self-healing technology.

Fault Management Architecture (FMA)


The Solaris OS provides a new architecture, FMA, for building resilient error
handlers, error telemetry, automated diagnosis software, response agents, and a
consistent model of system failures for a management stack. Many parts of
Solaris are already participating in FMA, including the CPU and Memory error
handling for UltraSPARC III and IV, the UltraSPARC PCI HBAs, and Opteron. A
variety of projects are underway, including full support for CPU, Memory, and
I/O faults on Opteron, conversion of key device drivers, and integration with
various management stacks.

When a subsystem is converted to participate in Fault Management, error
handling is made resilient so that the system can continue to operate despite
some underlying failure, and telemetry events are produced that drive automated
diagnosis and response. The Fault Management tools and architecture enable
development of self-healing content for software and hardware failures, for both
microscopic and macroscopic system resources, all with a unified, simple view for
administrators and system management software.

➊–➋ See http://opensolaris.org/os/community/fm for information about how to
participate in the Fault Management community or to download the Fault
Management MIB that is currently in development.

➊ The Fault Manager associates diagnosis state with persistent identifiers
corresponding to the system resources, such as hardware serial numbers. As a
result, the Fault Manager automatically updates this state after most repair
actions, without requiring any manual intervention.

➋ The legacy UNIX failure model was simply to leave error handling up to each
subsystem author, and simply provide the ability to emit an error message for a
human to the system log in a non-standard format.

Dynamic Tracing (DTrace)
DTrace provides a powerful infrastructure to permit administrators, developers,
and service personnel to concisely answer arbitrary questions about the behavior
of the operating system and user programs. DTrace enables you to do the
following:

➌–➏ ■ Dynamically enable and manage thousands of probes
■ Dynamically associate predicates and actions with probes
■ Dynamically manage trace buffers and probe overhead
■ Examine trace data from a live system or from a system crash dump
■ Implement new trace data providers that plug into DTrace
■ Implement trace data consumers that provide data display
■ Implement tools that configure DTrace probes

Find the DTrace community pages here
http://opensolaris.org/os/community/dtrace.
➐–➒ In addition to DTrace, the OpenSolaris project provides debugging facilities
for low-level types of development, for example, device driver development.

➌ Historically, it has been difficult to look inside of complicated software
systems, with no way to see interactions between processes and no way to
observe kernel activity.

➍ This makes it difficult to understand even a single application.

➎ DTrace is a new facility in the OpenSolaris project for systemic dynamic
instrumentation.

➏ By using a special-purpose control language, DTrace can give concise answers
to arbitrary questions about the system.

➐ This gives students the means to take software apart and understand its inner
workings.

➑ It enables computer science educators to show principles from the classroom
on a real, production machine.

➒ It allows researchers to better and more quickly understand and improve
software systems.


Modular Debugger (MDB)
MDB is a debugger designed to facilitate analysis of problems that require
low-level debugging facilities, examination of core files, and knowledge of
assembly language to diagnose and correct. Generally, kernel and device
developers rely on mdb to determine why and where their code went wrong.
MDB is available as two commands that share common features: mdb and kmdb.
You can use the mdb command interactively or in scripts to debug live user
processes, user process core files, kernel crash dumps, the live operating system,
object files, and other files. You can use the kmdb command to debug the live
operating system kernel and device drivers when you also need to control and
halt the execution of the kernel.

There is an active community for MDB, where you can ask the experts or review
previous conversations and common questions. See
http://opensolaris.org/os/community/mdb.

ZFS File System
ZFS filesystems are not constrained to specific devices, so they can be created
easily and quickly like directories. They grow automatically within the space
allocated to the storage pool.

Checksumming and Data Recovery
With ZFS, all data and metadata is checksummed using a user-selectable
algorithm. All checksumming and data recovery is done at the file system layer,
and is transparent to applications. In addition, ZFS provides for self-healing data.
ZFS supports storage pools with varying levels of data redundancy, including
mirroring and a variation on RAID-5. When a bad data block is detected, ZFS
fetches the correct data from another redundant copy, and repairs the bad data,
replacing it with the good copy.

➊–➌ ZFS presents a pooled storage model that eliminates the concept of volumes
and the associated problems of partitions, provisioning, wasted bandwidth, and
stranded storage.

➍ The combined I/O bandwidth of all devices in the pool is available to all
filesystems at all times.

➊ Traditional file systems that do provide checksumming have performed it on
a per-block basis, out of necessity due to the volume management layer and
traditional file system design. The traditional design means that certain failure
modes, such as writing a complete block to an incorrect location, can result in
properly checksummed data that is actually incorrect. ZFS checksums are stored
in a way such that these failure modes are detected and can be recovered from
gracefully.

➋ Old way: create one filesystem, such as /export/home, to manage many user
subdirectories.

➌ ZFS way: just create one filesystem per user.

➍ Thousands of file systems can draw from a common storage pool, each one
consuming only as much space as it actually needs.

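A toy two-way mirror can illustrate the read path described above: verify the
checksum and, on a mismatch, fetch the data from the redundant copy and
repair the bad block. Everything here is hypothetical teaching code; in
particular, the checksum is a trivial hash rather than one of ZFS's
user-selectable algorithms, and real ZFS stores checksums in the parent block
pointers of an on-disk tree.

```c
#include <assert.h>
#include <string.h>

#define BLKSZ 16

/* Toy two-way mirror with a per-block checksum stored separately from
 * the data it covers. */
typedef struct mirror {
    unsigned char copy[2][BLKSZ];
    unsigned long cksum;               /* checksum of the good data */
} mirror_t;

static unsigned long toy_cksum(const unsigned char *buf)
{
    unsigned long c = 0;
    for (int i = 0; i < BLKSZ; i++)
        c = c * 31 + buf[i];
    return c;
}

void mirror_write(mirror_t *m, const unsigned char *buf)
{
    memcpy(m->copy[0], buf, BLKSZ);
    memcpy(m->copy[1], buf, BLKSZ);
    m->cksum = toy_cksum(buf);
}

/* Read with self-healing: if a copy fails its checksum, satisfy the read
 * from the other copy and rewrite the bad one with the good data.
 * Returns the index of the copy used, or -1 if both copies are bad. */
int mirror_read(mirror_t *m, unsigned char *out)
{
    for (int i = 0; i < 2; i++) {
        if (toy_cksum(m->copy[i]) == m->cksum) {
            memcpy(out, m->copy[i], BLKSZ);
            memcpy(m->copy[1 - i], m->copy[i], BLKSZ);  /* heal the twin */
            return i;
        }
    }
    return -1;   /* unrecoverable: no redundant copy survived */
}

int zfs_demo(void)
{
    mirror_t m;
    unsigned char data[BLKSZ] = "hello, world!!!";
    unsigned char out[BLKSZ];
    mirror_write(&m, data);
    m.copy[0][3] ^= 0xff;             /* silently corrupt copy 0 */
    int used = mirror_read(&m, out);  /* read detects it and repairs */
    int healed = (toy_cksum(m.copy[0]) == m.cksum);
    return used == 1 && healed && memcmp(out, data, BLKSZ) == 0;
}
```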
Each storage pool is comprised of one or more virtual devices, which describe the
layout of physical storage and its fault characteristics. See
http://opensolaris.org/os/community/zfs/demos/basics for 100 Mirrored
Filesystems in 5 Minutes, a demonstration of administering ZFS file systems.

RAID-Z
In addition to pooled storage, ZFS provides redundant mirrored and RAID-Z
data redundancy configurations. A RAID-Z configuration is a virtual device that
stores data and parity on multiple disks, similar to RAID-5.

➊–➋ In RAID-Z, ZFS uses variable-width RAID stripes so that all writes are
full-stripe writes. This feature is only possible because ZFS integrates filesystem
and device management in such a way that the filesystem's metadata has enough
information about the underlying data replication model to handle
variable-width RAID stripes. RAID-Z is the world's first software-only solution
to the RAID-5 write hole.

➊ All traditional RAID-5-like algorithms, including RAID-4, RAID-5, RAID-6,
RDP, and EVEN-ODD, suffer from a problem known as the "RAID-5 write
hole": if only part of a RAID-5 stripe is written, and power is lost before all
blocks have made it to disk, the parity will remain out of sync with the data.

➋ The parity is therefore useless forever, unless a subsequent full-stripe write
overwrites it.

Services Management Facility (SMF)


SMF creates a supported, unified model for management of an enormous
number of services, such as email delivery, ftp requests, and remote command
execution in the OpenSolaris project. The smf(5) framework replaces (in a
compatible manner) the existing init.d(4) startup mechanism and includes an
enhanced inetd(1M). SMF gives developers the following:
■ Automated restart of services in dependency order, whether a service failed
because of administrative error, a software bug, or an uncorrectable hardware
error
■ A single API for service management, configuration, and observation
■ Access to service-based resource management
■ Simplified boot-process debugging

See http://opensolaris.org/os/community/smf/scfdot for a graph of the SMF
services and their dependencies on an x86 system freshly installed with Solaris
Nevada.


M O D U L E 6

Programming Concepts

Objectives
This module provides a high-level description of the fundamental concepts of the
OpenSolaris programming environment, as follows:
■ Process and System Management
■ Threaded Programming
■ Kernel Overview
■ CPU Scheduling
■ Process Debugging

Additional Resources
■ Solaris Internals (2nd Edition), Prentice Hall PTR (May 12, 2006) by Jim
Mauro and Richard McDougall
■ Solaris Systems Programming, Prentice Hall PTR (August 19, 2004), by Rich
Teer
■ Multithreaded Programming Guide. Sun Microsystems, Inc., 2005.
■ STREAMS Programming Guide. Sun Microsystems, Inc., 2005.
■ Solaris 64-bit Developer’s Guide. Sun Microsystems, Inc., 2005.


Process and System Management


The basic unit of workload is the process. Process IDs (PIDs) are numbered
sequentially throughout the system. By default, each user is assigned by the
system administrator to a project, which is a network-wide administrative
identifier. Each successful login to a project creates a new task, which is a
grouping mechanism for processes. A task contains the login process as well as
subsequent child processes.
The resource pools facility brings together process-bindable resources into a
common abstraction called a pool. Processor sets and other entities are
configured, grouped, and labelled such that workload components are associated
with a subset of a system's total resources. When the pools facility is disabled, all
processes belong to the same pool, pool_default, and processor sets are
managed through the pset() system call. When the pools facility is enabled,
processor sets must be managed by using the pools facility. New pools can be
created and associated with processor sets. Processes may be bound to pools that
have non-empty resource sets.
If we search OpenGrok for pool.c, we find extensive code comments that
describe relationships between tasks, pools, projects, and processes, as follows:
The operation that binds tasks and projects to pools is atomic. That is, either all
processes in a given task or a project will be bound to a new pool, or (in case of an
error) they will be all left bound to the old pool. Processes in a given task or a
given project can only be bound to different pools if they were rebound
individually one by one as single processes. Threads or LWPs of the same process
do not have pool bindings, and are bound to the same resource sets associated
with the resource pool of that process.
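The all-or-nothing semantics can be modeled in a few lines of C. This is an
illustrative sketch only (the real logic in pool.c deals with process locks, zones,
and partial failure in far more detail): validate every process in the task before
touching anything, and only then commit, so an error leaves every binding
unchanged.

```c
#include <assert.h>

#define NPROCS 4

/* Toy model of atomic task-to-pool rebinding. */
typedef struct proc {
    int pool_id;
    int bindable;   /* stand-in for the many real reasons a move can fail */
} proc_t;

typedef struct task {
    proc_t procs[NPROCS];
} task_t;

/* Bind every process in the task to new_pool, atomically: either all
 * processes move, or (on error) all stay bound to their old pool. */
int task_bind_pool(task_t *tk, int new_pool)
{
    /* Phase 1: validate everything before changing anything. */
    for (int i = 0; i < NPROCS; i++)
        if (!tk->procs[i].bindable)
            return -1;               /* fail: no binding has changed */
    /* Phase 2: commit; nothing can fail past this point. */
    for (int i = 0; i < NPROCS; i++)
        tk->procs[i].pool_id = new_pool;
    return 0;
}

int pool_demo(void)
{
    task_t tk;
    for (int i = 0; i < NPROCS; i++) {
        tk.procs[i].pool_id = 0;       /* start in pool_default */
        tk.procs[i].bindable = 1;
    }
    tk.procs[2].bindable = 0;          /* one process cannot move */
    int rc1 = task_bind_pool(&tk, 7);  /* must fail as a whole... */
    int unchanged = (tk.procs[0].pool_id == 0 && tk.procs[3].pool_id == 0);
    tk.procs[2].bindable = 1;
    int rc2 = task_bind_pool(&tk, 7);  /* ...and now succeeds as a whole */
    int moved = (tk.procs[0].pool_id == 7 && tk.procs[3].pool_id == 7);
    return rc1 == -1 && unchanged && rc2 == 0 && moved;
}
```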
Processes can optionally be run inside a zone. Zones are set up by system
administrators, often for security purposes, in order to isolate groups of users or
processes from one another.

Threaded Programming
Now that we've learned about processes in the context of tasks, projects, resource
pools, zones, and branded zones, let's discuss processes in the context of threads.
Traditional UNIX already supports the concept of threads. Each process contains
a single thread, so programming with multiple processes is programming with
multiple threads. But, a process is also an address space, and creating a process
involves creating a new address space.

➊–➌ Communication between the threads of one process is simple because the
threads share everything, including a common address space and open file
descriptors. So, data produced by one thread is immediately available to all the
other threads.
The libraries are libpthread for POSIX threads, and libthread for OpenSolaris
threads. Multithreading provides flexibility by decoupling kernel-level and
user-level resources. In OpenSolaris, multithreading support for both sets of
interfaces is provided by the standard C library.

➊ Creating a thread is less expensive than creating a new process because the
newly created thread uses the current process address space.

➋ The time that is required to switch between threads is less than the time
required to switch between processes.

➌ A switch between threads is faster because no switching between address
spaces occurs.
Use pthread_create(3C) to add a new thread of control to the current process.

int pthread_create(pthread_t *tid, const pthread_attr_t *tattr,
    void *(*start_routine)(void *), void *arg);

The pthread_create() function is called with tattr, which contains the
necessary state behavior. start_routine is the function with which the new
thread begins execution. When start_routine returns, the thread exits with the
exit status set to the value returned by start_routine. pthread_create()
returns zero when the call completes successfully. Any other return value
indicates that an error occurred. Go to /on/usr/src/lib/libc/spec/threads.spec in
OpenGrok for the complete list of pthread functions and declarations.
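A minimal, complete use of pthread_create() looks like the following (compile
with -lpthread; the square() start routine is just an example):

```c
#include <assert.h>
#include <pthread.h>

/* The start routine receives the arg pointer passed to pthread_create(),
 * and its return value becomes the thread's exit status. */
static void *square(void *arg)
{
    long n = (long)arg;
    return (void *)(n * n);
}

long run_in_thread(long n)
{
    pthread_t tid;
    void *status;
    /* NULL attributes ask for the default thread attributes. */
    int rc = pthread_create(&tid, NULL, square, (void *)n);
    assert(rc == 0);                 /* zero means success */
    pthread_join(tid, &status);      /* wait and collect the exit status */
    return (long)status;
}
```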
Thread synchronization enables you to control program flow and access to
shared data for concurrently executing threads. The four synchronization objects
are mutex locks, read/write locks, condition variables, and semaphores.
■ Mutex locks allow only one thread at a time to execute a specific section of
code, or to access specific data.
■ Read/write locks permit concurrent reads and exclusive writes to a protected
shared resource. To modify a resource, a thread must first acquire the
exclusive write lock. An exclusive write lock is not permitted until all read
locks have been released.
■ Condition variables block threads until a particular condition is true.
■ Counting semaphores typically coordinate access to resources. The count is
the limit on how many threads can have access to a semaphore. When the
count is reached, the thread that is trying to access the resource blocks.
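Two of the four objects can be seen working together in a short example: a
mutex makes the counter increments atomic, and a condition variable lets the
main thread block until all workers report completion (compile with -lpthread;
the names are illustrative):

```c
#include <assert.h>
#include <pthread.h>

#define NWORKERS 4
#define NITERS   10000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  done_cv = PTHREAD_COND_INITIALIZER;
static long counter;
static int  finished;                /* how many workers have completed */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&lock);   /* only one thread in here at a time */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    finished++;
    pthread_cond_signal(&done_cv);   /* wake the waiter to recheck */
    pthread_mutex_unlock(&lock);
    return NULL;
}

long run_counter_demo(void)
{
    pthread_t tids[NWORKERS];
    counter = 0;
    finished = 0;
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    /* Condition waits always sit in a loop: wakeups may be spurious. */
    pthread_mutex_lock(&lock);
    while (finished < NWORKERS)
        pthread_cond_wait(&done_cv, &lock);
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tids[i], NULL);
    return counter;                  /* no increments were lost */
}
```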


Synchronization
Synchronization objects are variables in memory that you access just like data.
Threads in different processes can communicate with each other through
synchronization objects that are placed in threads-controlled shared memory.
The threads can communicate with each other even though the threads in
different processes are generally invisible to each other. Synchronization objects
can also be placed in files. The synchronization objects can have lifetimes beyond
the life of the creating process.

We can use OpenGrok to find libthread in the source code tree, and the second
most relevant result is found in mutex.c, accompanied by the following code
comment excerpt:

➊ Implementation of all threads interfaces between ld.so.1 and
libthread. In a non-threaded environment all thread interfaces are
vectored to noops. When called via _ld_concurrency() from libthread
these vectors are reassigned to real threads interfaces. Two models
are supported:

TI_VERSION == 1 Under this model libthread provides
rw_rwlock/rw_unlock, through which we vector all
rt_mutex_lock/rt_mutex_unlock calls. Under lib/libthread these
interfaces provided _sigon/_sigoff (unlike lwp/libthread that
provided signal blocking via bind_guard/bind_clear).

TI_VERSION == 2 Under this model only libthreads
bind_guard/bind_clear and thr_self interfaces are used. Both
libthreads block signals under the bind_guard/bind_clear interfaces.
Lower level locking is derived from internally bound _lwp_
interfaces. This removes recursive problems encountered when
obtaining locking interfaces from libthread. The use of mutexes over
reader/writer locks also enables the use of condition variables for
controlling thread concurrency (allows access to objects only after
their .init has completed).

Now that you understand a bit about how synchronization objects are defined in
multi-threaded programming, let's learn how these objects are managed by using
scheduling classes.


CPU Scheduling
Processes run in a scheduling class with a separate scheduling policy applied to
each class, as follows:
■ Realtime (RT) – The highest-priority scheduling class provides a policy for
those processes that require fast response and absolute user or application
control of scheduling priorities. RT scheduling can be applied to a whole
process or to one or more lightweight processes (LWPs) in a process. You
must have the proc_priocntl privilege to use the Realtime class. See the
privileges(5) man page for details.
■ System (SYS) – The middle-priority scheduling class, the system class cannot
be applied to a user process.
■ Timeshare (TS) – The lowest-priority scheduling class is TS, which is also the
default class. The TS policy distributes the processing resource fairly among
processes with varying CPU consumption characteristics. Other parts of the
kernel can monopolize the processor for short intervals without degrading the
response time seen by the user.
■ Inter-Active (IA) – The IA policy distributes the processing resource fairly
among processes with varying CPU consumption characteristics, while also
providing good responsiveness for user interaction.
■ Fair Share (FSS) – The FSS policy distributes the processing resource fairly
among projects, independent of the number of processes they own by
specifying shares to control the process entitlement to CPU resources.
Resource usage is remembered over time, so that entitlement is reduced for
heavy usage and increased for light usage with respect to other projects.
■ Fixed-Priority (FX) – The FX policy provides a fixed priority preemptive
scheduling policy for those processes requiring that the scheduling priorities
do not get dynamically adjusted by the system and that the user or application
have control of the scheduling priorities. This class is a useful starting point
for affecting CPU allocation policies.

A scheduling class is maintained for each lightweight process (LWP). Threads
have the scheduling class and priority of their underlying LWPs. Each LWP in a
process can have a unique scheduling class and priority that are visible to the
kernel. Thread priorities regulate contention for synchronization objects.

➊ We can use OpenGrok to quickly find the file and view its comments.

The RT and TS scheduling classes both call priocntl(2) to set the priority level of
processes or LWPs within a process. Using OpenGrok to search the code base for
priocntl, we find the variables that are used in the RT and TS scheduling classes
in the rtsched.c file as follows:

27 #pragma ident "@(#)rtsched.c 1.10 05/06/08 SMI"


28
29 #include "lint.h"
30 #include "thr_uberdata.h"
31 #include <sched.h>
32 #include <sys/priocntl.h>
33 #include <sys/rtpriocntl.h>
34 #include <sys/tspriocntl.h>
35 #include <sys/rt.h>
36 #include <sys/ts.h>
37
38 /*
39 * The following variables are used for caching information
40 * for priocntl TS and RT scheduling classs.
41 */
42 struct pcclass ts_class, rt_class;
43
44 static rtdpent_t *rt_dptbl; /* RT class parameter table */
45 static int rt_rrmin;
46 static int rt_rrmax;
47 static int rt_fifomin;
48 static int rt_fifomax;
49 static int rt_othermin;
50 static int rt_othermax;
...

➊ Typing the man priocntl command in a terminal window shows the details of
each scheduling class and describes attributes and usage. For example:

% man priocntl
Reformatting page. Please Wait... done

User Commands priocntl(1)

NAME
priocntl - display or set scheduling parameters of specified
process(es)

SYNOPSIS

➊ We can use the man command to view detailed man pages for more usage
information and an explanation of the command options.

priocntl -l
priocntl -d [-i idtype] [idlist]
priocntl -s [-c class] [class-specific options] [-i idtype] [idlist]
priocntl -e [-c class] [class-specific options] command [argument(s)]

DESCRIPTION
The priocntl command displays or sets scheduling parameters
of the specified process(es). It can also be used to display
the current configuration information for the system’s pro-
cess scheduler or execute a command with specified schedul-
ing parameters.

Processes fall into distinct classes with a separate
scheduling policy applied to each class. The process classes
currently supported are the real-time class, time-sharing
class, interactive class, fair-share class, and the fixed
priority class. The characteristics of these classes and the
class-specific options they accept are described below in
the USAGE section under the headings Real-Time Class, Time-
Sharing Class, Inter-Active Class, Fair-Share Class, and
--More--(4%)

Kernel Overview
Now that you have a high-level understanding of processes, threads, and
scheduling, let's discuss the kernel and how kernel modules are different from
user programs. The Solaris kernel does the following:
■ Manages the system resources, including file systems, processes, and physical
devices.
■ Provides applications with system services such as I/O management, virtual
memory, and scheduling.
■ Coordinates interactions of all user processes and system resources.
■ Assigns priorities, services resource requests, and services hardware interrupts
and exceptions.
■ Schedules and switches threads, pages memory, and swaps processes.

The following section discusses several important differences between kernel
modules and user programs.

Execution Differences Between Kernel Modules and User Programs
The following characteristics of kernel modules highlight important differences
between the execution of kernel modules and the execution of user programs:
■ Kernel modules have separate address space. A module runs in kernel space.
An application runs in user space. System software is protected from user
programs. Kernel space and user space have their own memory address
spaces.
■ Kernel modules have higher execution privilege. Code that runs in kernel
space has greater privilege than code that runs in user space.
■ Kernel modules do not execute sequentially. A user program typically
executes sequentially and performs a single task from beginning to end. A
kernel module does not execute sequentially. A kernel module registers itself
in order to serve future requests.
■ Kernel modules can be interrupted. More than one process can request your
kernel module at the same time. For example, an interrupt handler can request
your kernel module at the same time that your kernel module is serving a
system call. In a symmetric multiprocessor (SMP) system, your kernel module
could be executing concurrently on more than one CPU.
■ Kernel modules must be preemptable. You cannot assume that your kernel
module code is safe just because your driver code does not block. Design your
driver assuming your module might be preempted.
■ Kernel modules can share data. Different threads of an application program
need not share data. By contrast, the data structures and routines that
constitute a driver are shared by all threads that use the driver. Your driver
must be able to handle contention issues that result from multiple requests.
Design your driver data structures carefully to keep multiple threads of
execution separate.


Structural Differences Between Kernel Modules and User Programs
The following characteristics of kernel modules highlight important differences
between the structure of kernel modules and the structure of user programs:
■ Kernel modules do not define a main program. Kernel modules, including
device drivers, have no main() routine. Instead, a kernel module is a collection
of subroutines and data.
■ Kernel modules are linked only to the kernel. Kernel modules do not link in
the same libraries that user programs link in. The only functions a kernel
module can call are functions that are exported by the kernel.
■ Kernel modules use different header files. Kernel modules require a different
set of header files than user programs require. The required header files are
listed in the man page for each function. Kernel modules can include header
files that are shared by user programs if the user and kernel interfaces within
such shared header files are defined conditionally using the _KERNEL macro.
■ Kernel modules should avoid global variables. Avoiding global variables in
kernel modules is even more important than avoiding global variables in user
programs. As much as possible, declare symbols as static. When you must
use global symbols, give them a prefix that is unique within the kernel. Using
this prefix for private symbols within the module also is a good practice.
■ Kernel modules can be customized for hardware. Kernel modules can
dedicate processor registers to specific roles, and kernel code can be
optimized for a specific processor. Customized libraries are also possible;
OpenSolaris provides them for some of the more recent x86/x64 and
UltraSPARC platforms. So, while the kernel can dedicate certain registers to
certain roles, customized code can otherwise be written for both the kernel
and user libraries.
■ Kernel modules can be loaded and unloaded on demand. The collection of
subroutines and data that constitute a device driver can be compiled into a
single loadable module of object code. This loadable module can then be
statically or dynamically linked into the kernel and unlinked from the kernel.
You can add functionality to the kernel while the system is up and running.
You can test new versions of your driver without rebooting your system.
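A minimal loadable module makes several of these points concrete: there is no main(), only kernel headers are included, and a _init/_info/_fini triple registers the module with the kernel. The skeleton below is our own illustration of the canonical form; it must be built against the Solaris DDI headers and loaded with modload(1M), not compiled as an ordinary user program:

```c
#include <sys/modctl.h>
#include <sys/types.h>

/* A "misc" module with no functionality: it only registers itself. */
static struct modlmisc modlmisc = {
        &mod_miscops, "minimal example module"
};

static struct modlinkage modlinkage = {
        MODREV_1, (void *)&modlmisc, NULL
};

int
_init(void)
{
        /* Called at load time; registers the module with the kernel. */
        return (mod_install(&modlinkage));
}

int
_info(struct modinfo *modinfop)
{
        /* Reports module information, e.g. for modinfo(1M). */
        return (mod_info(&modlinkage, modinfop));
}

int
_fini(void)
{
        /* Called at unload time; deregisters the module. */
        return (mod_remove(&modlinkage));
}
```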


Process Debugging

➊ Again, use OpenGrok to quickly find the file and view its code comments, as
excerpted here:

Debugging processes at all levels of the development stack is a key part of writing
kernel modules.

A full search for libthread in OpenGrok reveals the following code comments in
the mdb_tdb.c file that describe the connection between multi-threaded
debugging and how mdb works:
➊ In order to properly debug multi-threaded programs, the proc target
must be able to query and modify information such as a thread’s
register set using either the native LWP services provided by
libproc (if the process is not linked with libthread), or using the
services provided by libthread_db (if the process is linked with
libthread). Additionally, a process may begin life as a
single-threaded process and then later dlopen() libthread, so we
must be prepared to switch modes on-the-fly. There are also two
possible libthread implementations (one in /usr/lib and one in
/usr/lib/lwp) so we cannot link mdb against libthread_db directly;
instead, we must dlopen the appropriate libthread_db on-the-fly
based on which libthread.so the victim process has open. Finally,
mdb is designed so that multiple targets can be active
simultaneously, so we could even have *both* libthread_db’s open at
the same time. This might happen if you were looking at two
multi-threaded user processes inside of a crash dump, one using
/usr/lib/libthread.so and the other using
/usr/lib/lwp/libthread.so. To meet these requirements, we implement
a libthread_db "cache" in this file. The proc target calls
mdb_tdb_load() with the pathname of a libthread_db to load, and if it
is not already open, we dlopen() it, look up the symbols we need to
reference, and fill in an ops vector which we return to the caller.
Once an object is loaded, we don’t bother unloading it unless the
entire cache is explicitly flushed. This mechanism also has the nice
property that we don’t bother loading libthread_db until we need it,
so the debugger starts up faster.
The following mdb commands can be used to access the LWPs of a multi-threaded
program:
■ $l Prints the LWP ID of the representative thread if the target is a user process.

■ $L Prints the LWP IDs of each LWP in the target if the target is a user process.
■ pid::attach Attaches to process by using the pid, or process ID.
■ ::release Releases the previously attached process or core file. The process
can subsequently be continued by prun(1) or it can be resumed by applying
MDB or another debugger.
■ address::context Context switch to the specified process.

The following commands to set conditional breakpoints are often useful:
■ [ addr ] ::bp [+/-dDestT] [-c cmd] [-n count] sym ... Set a
breakpoint at the specified locations.
■ addr ::delete [id | all] Delete the event specifiers with the given ID
number.
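Putting a few of these together, a short session against a running multi-threaded process might look like the following sketch (the PID here is hypothetical):

```
# mdb -p 8198        (attach to, and stop, the live process)
> $L                 (print the LWP IDs of each LWP in the target)
> ::release          (detach; the process can be continued with prun(1))
```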

DTrace probes are constructed in a manner similar to MDB queries. We'll
continue the hands-on lab exercises with DTrace and then add MDB when the
debugging becomes more complex.


M O D U L E 7

Getting Started With DTrace


➊ The DTrace provider includes three probes: BEGIN, END, and ERROR.

➋ BEGIN is the first probe to fire. All BEGIN clauses will fire before any other
probe fires. BEGIN is typically used to initialize.

➌ END will fire after all other probes are completed and can be used to output
results.

➍ ERROR fires under an error condition and is used for error handling.

Objectives
➊–➍ The objective of this lab is to introduce you to DTrace by using a probe
script for a system call.

Additional Resources
■ Solaris Dynamic Tracing Guide. Sun Microsystems, Inc., 2007.
■ DTrace User Guide. Sun Microsystems, Inc., 2006.


Enabling Simple DTrace Probes


Completion of the lab exercise will result in a basic understanding of DTrace
probes.

Summary
We're going to start learning DTrace by building some very simple requests using
the probe named BEGIN, which fires once each time you start a new tracing
request. You can use the dtrace(1M) utility's -n option to enable a probe using its
string name.

Module 7 • Getting Started With DTrace 76



To Enable a Simple DTrace Probe


1 Open a terminal window.

2 Enable the probe:


# dtrace -n BEGIN

After a brief pause, you will see dtrace tell you that one probe was enabled and
you will see a line of output indicating that the BEGIN probe fired. Once you see
this output, dtrace remains paused waiting for other probes to fire. Since you
haven't enabled any other probes and BEGIN only fires once, press Control-C in
your shell to exit dtrace and return to your shell prompt:

3 Return to your shell prompt by pressing Control-C:


# dtrace -n BEGIN
dtrace: description 'BEGIN' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN
^C
#

The output tells you that the probe named BEGIN fired once, and both its name
and integer ID, 1, are printed. Notice that, by default, the integer ID of the CPU
on which this probe fired is displayed. In this example, the CPU column indicates
that the dtrace command was executing on CPU 0 when the probe fired.

You can construct DTrace requests using arbitrary numbers of probes and
actions. Let's create a simple request using two probes by adding the END probe
to the previous example command. The END probe fires once when tracing is
completed.

4 Add the END probe:


# dtrace -n BEGIN -n END
dtrace: description 'BEGIN' matched 1 probe
dtrace: description 'END' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN
^C
  0      2                           :END
#

The END probe fires once when tracing is completed. As you can see, pressing
Control-C to exit DTrace triggers the END probe. DTrace reports this probe
firing before exiting.


Listing Traceable Probes


The objective of this lab is to explore probes in more detail and to show you how
to list the probes on a system.

Summary
In the preceding examples, you learned to use two simple probes named BEGIN
and END. But where did these probes come from? DTrace probes come from a set
of kernel modules called providers, each of which performs a particular kind of
instrumentation to create probes. For example, the syscall provider provides
probes in every system call and the fbt provider provides probes into every
function in the kernel.

When you use DTrace, each provider is given an opportunity to publish the
probes it can provide to the DTrace framework. You can then enable and bind
your tracing actions to any of the probes that have been published.
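For example, binding a simple action to the syscall provider's probe at the entry to the read(2) system call looks like the following sketch (probe availability and output vary by system):

```
# dtrace -n 'syscall::read:entry { trace(execname); }'
```

Each read(2) entry traces the name of the executable making the call; press Control-C to stop tracing.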


To List Traceable Probes


1 Open a terminal window.

2 Type the following command:


# dtrace

The dtrace command options are printed to the output.

3 Type the dtrace command with the -l option:


# dtrace -l | more
ID PROVIDER MODULE FUNCTION NAME
1 dtrace BEGIN
2 dtrace END
3 dtrace ERROR
4 lockstat genunix mutex_enter adaptive-acquire
5 lockstat genunix mutex_enter adaptive-block
6 lockstat genunix mutex_enter adaptive-spin
7 lockstat genunix mutex_exit adaptive-release
--More--

The probes that are available on your system are listed with the following five
pieces of data:
■ ID - Internal ID of the probe listed.
■ Provider - Name of the Provider. Providers are used to classify the probes.
This is also the method of instrumentation.
■ Module - The name of the Unix module or application library of the probe.
■ Function - The name of the function in which the probe exists.
■ Name - The name of the probe.

4 Pipe the previous command to wc to find the total number of probes in your
system:
# dtrace -l | wc -l
30122

The number of probes that your system is currently aware of is listed in the
output. The number will vary depending on your system type.

5 Add one of the following options to filter the list:

■ -P for provider
■ -m for module
■ -f for function
■ -n for name
Consider the following examples:

# dtrace -l -P lockstat
ID PROVIDER MODULE FUNCTION NAME
4 lockstat genunix mutex_enter adaptive-acquire
5 lockstat genunix mutex_enter adaptive-block
6 lockstat genunix mutex_enter adaptive-spin
7 lockstat genunix mutex_exit adaptive-release

Only the probes that are available in the lockstat provider are listed in the
output.

# dtrace -l -m ufs
ID PROVIDER MODULE FUNCTION NAME
15 sysinfo ufs ufs_idle_free ufsinopage
16 sysinfo ufs ufs_iget_internal ufsiget
356 fbt ufs allocg entry

Only the probes that are in the UFS module are listed in the output.

# dtrace -l -f open
ID PROVIDER MODULE FUNCTION NAME
4 syscall open entry
5 syscall open return
116 fbt genunix open entry
117 fbt genunix open return

Only the probes with the function name open are listed.

# dtrace -l -n start
ID PROVIDER MODULE FUNCTION NAME
506 proc unix lwp_rtt_initial start
2766 io genunix default_physio start
2768 io genunix aphysio start
5909 io nfs nfs4_bio start

The above command lists all the probes that have the probe name start.


Programming in D
Now that you understand a little bit about naming, enabling, and listing probes,
you're ready to write the DTrace version of everyone's first program, "Hello,
World."

Summary
This lab demonstrates that, in addition to constructing DTrace experiments on
the command line, you can also write them in text files using the D programming
language.


To Write a DTrace Program


1 Open a terminal window.

2 In a text editor, create a new file called hello.d.

3 Type in your first D program:


BEGIN
{
        trace("hello, world");
        exit(0);
}

4 Save the hello.d file.

5 Run the program by using the dtrace -s option:


# dtrace -s hello.d
dtrace: script 'hello.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN   hello, world
#

As you can see, dtrace printed the same output as before followed by the text
“hello, world”. Unlike the previous example, you did not have to wait and press
Control-C, either. These changes were the result of the actions you specified for
your BEGIN probe in hello.d. Let's explore the structure of your D program in
more detail in order to understand what happened.


Discussion
Each D program consists of a series of clauses, each clause describing one or more
probes to enable, and an optional set of actions to perform when the probe fires.
The actions are listed as a series of statements enclosed in braces { } following the
probe name. Each statement ends with a semicolon (;).

Your first statement uses the function trace() to indicate that DTrace should
record the specified argument, the string “hello, world”, when the BEGIN probe
fires, and then print it out. The second statement uses the function exit() to
indicate that DTrace should cease tracing and exit the dtrace command.

DTrace provides a set of useful functions like trace() and exit() for you to call
in your D programs. To call a function, you specify its name followed by a
parenthesized list of arguments. The complete set of D functions is described in
Solaris Dynamic Tracing Guide.
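For example, the built-in printf() function accepts a format string and arguments much like its C counterpart, and built-in variables such as pid can be traced directly. This small extension of hello.d is our own sketch, not from the guide:

```
BEGIN
{
        printf("hello, world from pid %d\n", pid);
        exit(0);
}
```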

By now, if you're familiar with the C programming language, you've probably
realized from the name and our examples that DTrace's D programming
language is very similar to C and awk(1). Indeed, D is derived from a large subset
of C combined with a special set of functions and variables to help make tracing
easy.

If you've written a C program before, you will be able to immediately transfer
most of your knowledge to building tracing programs in D. If you've never
written a C program before, learning D is still very easy. But first, let's take a step
back from language rules and learn more about how DTrace works, and then
we'll return to learning how to build more interesting D programs.


M O D U L E 8

Debugging Applications With DTrace

Objectives
The objective of this module is to use DTrace to monitor application events.

Additional Resources
Application Packaging Developer’s Guide. Sun Microsystems, Inc., 2005.


Enabling User Mode Probes

➊ The pid provider is extremely flexible and allows you to instrument any
instruction in user land, including entry and exit.

➋ The pid provider creates probes on the fly when they are needed. This is why
they do not appear in the dtrace -l listing.

➌ You can use the pid provider to trace function boundaries or any arbitrary
instruction in a given function.

DTrace allows you to dynamically add probes into user-level functions. The user
code does not need any recompilation, special flags, or even a restart. DTrace
probes can be turned on just by calling the provider.

➊–➌ A probe description has the following syntax:

pid:mod:function:name

■ pid: the string pid followed by the process ID (for example, pid5234)
■ mod: the name of the library or a.out (the executable)
■ function: the name of the function
■ name: entry for function entry, return for function return
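For example, assuming a target process ID of 5234 (a hypothetical value), all of the following are valid probe descriptions:

```
pid5234:libc:malloc:entry      (entry to malloc() in libc)
pid5234:a.out::entry           (entry to every function in the executable)
pid5234::free:return           (return from free() in any loaded module)
```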

Module 8 • Debugging Applications With DTrace 86



DTracing Applications
In this exercise we will learn to use DTrace on user applications.

Summary
This lab builds on the use of a process ID in the probe description to trace the
associated application. The steps increase in complexity toward the end of the
exercise, increasing the amount and depth of information that is output about
the application's behavior.


To DTrace gcalctool
1 From the Application or Program menu, start the calculator.

2 Find the process ID of the process you just started:


# pgrep gcalctool
8198

This number is the process ID of the gcalctool process; we will call it procid.

3 Follow the steps below to create a D-script that counts the number of times any
function in the gcalctool is called.

a. In a text editor, create a new file called proc_func.d.

b. Use pid$1:::entry as the probe-description.


$1 is the first argument that you will pass to your script; leave the predicate
part empty.

c. In the action section, add an aggregation to count the number of times each
function is called, using the statement @[probefunc]=count().
pid$1:::entry
{
        @[probefunc] = count();
}

d. Run the script that you just wrote.


# dtrace -qs proc_func.d procid

Replace procid with the process ID of your gcalctool

e. Perform a calculation on the calculator.

f. Press Control+C in the window where you ran the D-script.

Note – The DTrace script collects data and waits for you to stop the collection by
pressing Control+C. If you do not explicitly print the aggregation you collected,
DTrace prints it for you on exit.

4 Now, modify the script to only count functions from the libc library.

a. Copy the proc_func.d to proc_libc.d.

b. Modify the probe description in the proc_libc.d file to the following:


pid$1:libc::entry

c. Your new script should look like the following:


pid$1:libc::entry
{
        @[probefunc] = count();
}

5 Now run the script.


# dtrace -qs proc_libc.d procid

Replace procid with the process ID of your gcalctool

a. Perform a calculation on the calculator.

b. Press Control+C in the window where you ran the D-script to see the output.

6 Finally, modify the script to find how much time is spent in each function.

a. Create a file and name it func_time.d.


We will use two probe descriptions in func_time.d.

b. Write the first probe as follows:


pid$1:::entry

c. Write the second probe as follows:


pid$1:::return

d. In the action section of the first probe, save the timestamp in the
thread-local variable self->ts. timestamp is a DTrace built-in variable that
counts the number of nanoseconds from a point in the past.

e. In the action section of the second probe, calculate the nanoseconds that
have passed by using the following aggregation:
@[probefunc]=sum(timestamp - self->ts)

f. The new func_time.d script should match the following (self->ts is
thread-local, so concurrent threads do not overwrite each other's saved
timestamps, and the predicate skips returns whose entry was not seen):
pid$1:::entry
{
        self->ts = timestamp;
}

pid$1:::return
/self->ts/
{
        @[probefunc] = sum(timestamp - self->ts);
        self->ts = 0;
}

7 Run the new func_time.d script:


# dtrace -qs func_time.d procid

Replace procid with the process ID of your gcalctool

a. Perform a calculation on the calculator.

b. Press Control+C in the window where you ran the D-script to see the output.
^C
gdk_xid__equal 2468
_XSetLastRequestRead 2998
_XDeq 3092

...

The left column shows you the name of the function and the right column shows
you the amount of wall clock time that was spent in that function. The time is in
nanoseconds.
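The sum() aggregation reports one total per function. To see the distribution of individual call durations instead, the same pair of probes can feed quantize(), DTrace's power-of-two histogram aggregation. This variant is our own sketch; self->ts is a thread-local variable:

```
pid$1:::entry
{
        self->ts = timestamp;
}

pid$1:::return
/self->ts/
{
        @[probefunc] = quantize(timestamp - self->ts);
        self->ts = 0;
}
```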


M O D U L E 9

Debugging C++ Applications With DTrace


➊ The dtrace command compiles the D language script. DTrace instructs the
provider to enable the probes.

➋ The intermediate code is checked for safety (like Java bytecode).

➌ The compiled code is executed in the kernel by DTrace.

➍ As soon as the D program exits, all instrumentation is removed.

Objectives
The examples in this module demonstrate the use of DTrace to diagnose C++
application errors. These examples are also used to compare DTrace with other
➊–➍ application debugging tools, including Sun Studio 10 software and mdb.


Using DTrace to Profile and Debug A C++ Program

➊ There is no limit (except system resources) on the number of D scripts that
can be run simultaneously.

➋ Different users can debug the system simultaneously without causing data
corruption or collision issues.

A sample program, CCtest, was created to demonstrate an error common to C++
applications -- the memory leak. In many cases, a memory leak occurs when an
object is created but never destroyed, and such is the case with the program
contained in this module.

➊–➋ When debugging a C++ program, you may notice that your compiler converts
some C++ names into mangled, semi-intelligible strings of characters and digits.
This name mangling is an implementation detail required for support of C++
function overloading, to provide valid external names for C++ function names
that include special characters, and to distinguish instances of the same name
declared in different namespaces and classes.

For example, using nm to extract the symbol table from a sample program named
CCtest produces the following output:

# /usr/ccs/bin/nm CCtest
...
[61] | 134549248| 53|FUNC |GLOB |0 |9 |__1cJTestClass2T5B6M_v_
[85] | 134549301| 47|FUNC |GLOB |0 |9 |__1cJTestClass2T6M_v_
[76] | 134549136| 37|FUNC |GLOB |0 |9 |__1cJTestClass2t5B6M_v_
[62] | 134549173| 71|FUNC |GLOB |0 |9 |__1cJTestClass2t5B6Mpc_v_
[64] | 134549136| 37|FUNC |GLOB |0 |9 |__1cJTestClass2t6M_v_
[89] | 134549173| 71|FUNC |GLOB |0 |9 |__1cJTestClass2t6Mpc_v_
[80] | 134616000| 16|OBJT |GLOB |0 |18 |__1cJTestClassG__vtbl_
[91] | 134549348| 16|FUNC |GLOB |0 |9 |__1cJTestClassJClassName6kM_pc_
...

Note – Source code and makefile for CCtest are included at the end of this module.

From this output, you may correctly assume that a number of these mangled
symbols are associated with a class named TestClass, but you cannot readily
determine whether these symbols are associated with constructors, destructors,
or class functions.

The Sun Studio compiler includes the following three utilities that can be used to
translate the mangled symbols to their C++ counterparts: nm -C, dem, and
c++filt.

Module 9 • Debugging C++ Applications With DTrace 92



Note – Sun Studio 10 software is used here, but the examples were tested with both
Sun Studio 9 and 10.

If your C++ application was compiled with gcc/g++, you have an additional
choice for demangling your application -- in addition to c++filt, which
recognizes both Sun Studio and GNU mangled names, the open source gc++filt
found in /usr/sfw/bin can be used to demangle the symbols contained in your
g++ application.

Examples: Sun Studio symbols without c++filt:

# nm CCtest | grep TestClass


[65] | 134549280| 37|FUNC |GLOB |0 |9 |__1cJTestClass2t6M_v_
[56] | 134549352| 54|FUNC |GLOB |0 |9 |__1cJTestClass2t6Mi_v_
[92] | 134549317| 35|FUNC |GLOB |0 |9 |__1cJTestClass2t6Mpc_v_
...

Sun Studio symbols with c++filt:

# nm CCtest | grep TestClass | c++filt


[65] | 134549280| 37|FUNC |GLOB |0 |9 |TestClass::TestClass()
[56] | 134549352| 54|FUNC |GLOB |0 |9 |TestClass::TestClass(int)
[92] | 134549317| 35|FUNC |GLOB |0 |9 |TestClass::TestClass(char*)
...

g++ symbols without gc++filt:

# nm gCCtest | grep TestClass
[86] | 134550070| 41|FUNC |GLOB |0 |12 |_ZN9TestClassC1EPc


[110] | 134550180| 68|FUNC |GLOB |0 |12 |_ZN9TestClassC1Ei
[114] | 134549984| 43|FUNC |GLOB |0 |12 |_ZN9TestClassC1Ev
...

g++ symbols with gc++filt:

# nm gCCtest | grep TestClass | gc++filt


[86] | 134550070| 41|FUNC |GLOB |0 |12 |TestClass::TestClass(char*)
[110] | 134550180| 68|FUNC |GLOB |0 |12 |TestClass::TestClass(int)
[114] | 134549984| 43|FUNC |GLOB |0 |12 |TestClass::TestClass()
...

And finally, displaying symbols with nm -C:

[64] | 134549344| 71|FUNC |GLOB |0 |9 |TestClass::TestClass()
[__1cJTestClass2t6M_v_]
[87] | 134549424| 70|FUNC |GLOB |0 |9 |TestClass::TestClass(const char*)
[__1cJTestClass2t6Mpkc_v_]
[57] | 134549504| 95|FUNC |GLOB |0 |9 |TestClass::TestClass(int)
[__1cJTestClass2t6Mi_v_]

Let's use this information to create a DTrace script to perform an aggregation on
the object calls associated with our test program. We can use the DTrace pid
provider to enable probes associated with our mangled C++ symbols.

To test our constructor/destructor theory, let's start by counting the following:
■ The number of objects created -- calls to new()
■ The number of objects destroyed -- calls to delete()
Use the following script to extract the symbols corresponding to the new() and
delete() functions from the CCtest program:

# dem `nm CCtest | awk -F\| '{ print $NF; }'` | egrep "new|delete"
__1c2k6Fpv_v_ == void operator delete(void*)
__1c2n6FI_pv_ == void*operator new(unsigned)

The corresponding DTrace script (saved as CCagg.d) enables probes on new()
and delete():

#!/usr/sbin/dtrace -s

pid$1::__1c2n6FI_pv_:entry
{
@n[probefunc] = count();
}
pid$1::__1c2k6Fpv_v_:entry
{
@d[probefunc] = count();
}

END
{
printa(@n);
printa(@d);
}

Start the CCtest program in one window, then execute the script we just created
in another window as follows:

# dtrace -s ./CCagg.d `pgrep CCtest` | c++filt

The DTrace output is piped through c++filt to demangle the C++ symbols, with
the following caution.

Caution – You can't exit the DTrace script with ^C as you normally would,
because c++filt would be killed along with DTrace, leaving you with no output.
Instead, to display the output of this command, go to another window on your
system and type:

# pkill dtrace

Use this sequence of steps for the rest of the exercises:

Window 1:

# ./CCtest

Window 2:

# dtrace -s scriptname | c++filt

Window 3:

# pkill dtrace

The output of our aggregation script in window 2 should look like this:

void*operator new(unsigned) 12
void operator delete(void*) 8

So, we may be on the right track with the theory that we are creating more objects
than we are deleting.

Let's check the memory addresses of our objects and attempt to match the
instances of new() and delete(). The DTrace argument variables are used to
display the addresses associated with our objects. Since a pointer to the object is
contained in the return value of new(), we should see the same pointer value as
arg0 in the call to delete(). With a slight modification to our initial script, we
now have the following script, named CCaddr.d:

#!/usr/sbin/dtrace -s

#pragma D option quiet


/*
__1c2k6Fpv_v_ == void operator delete(void*)
__1c2n6FI_pv_ == void*operator new(unsigned)
*/

/* return from new() */
pid$1::__1c2n6FI_pv_:return
{
printf("%s: %x\n", probefunc, arg1);
}

/* call to delete() */
pid$1::__1c2k6Fpv_v_:entry
{
printf("%s: %x\n", probefunc, arg0);
}

Execute this script:

# dtrace -s ./CCaddr.d `pgrep CCtest` | c++filt

Wait for a bit, then type this in window 3:

# pkill dtrace

Our output looks like a repeating pattern of three calls to new() and two calls to
delete():

void*operator new(unsigned): 809e480
void*operator new(unsigned): 8068a70
void*operator new(unsigned): 809e4a0
void operator delete(void*): 8068a70
void operator delete(void*): 809e4a0

As you inspect the repeating output, a pattern emerges. It seems that the first
new() of the repeating pattern does not have a corresponding call to delete(). At
this point we have identified the source of the memory leak!

Let's continue with DTrace and see what else we can learn from this information.
We still do not know what type of class is associated with the object created at
address 809e480. Including a call to ustack() on entry to new() provides a hint.
Here's the modification to our previous script, renamed CCstack.d:

#!/usr/sbin/dtrace -s

#pragma D option quiet

/*
__1c2k6Fpv_v_ == void operator delete(void*)
__1c2n6FI_pv_ == void*operator new(unsigned)
*/

pid$1::__1c2n6FI_pv_:entry
{
ustack();
}
pid$1::__1c2n6FI_pv_:return
{
printf("%s: %x\n", probefunc, arg1);
}
pid$1::__1c2k6Fpv_v_:entry
{
printf("%s: %x\n", probefunc, arg0);
}

Execute CCstack.d in Window 2, then type pkill dtrace in Window 3 to print
the following output:

# dtrace -s ./CCstack.d `pgrep CCtest` | c++filt

libCrun.so.1`void*operator new(unsigned)
CCtest`main+0x19
CCtest`0x8050cda
void*operator new(unsigned): 80a2bd0

libCrun.so.1`void*operator new(unsigned)
CCtest`main+0x57
CCtest`0x8050cda
void*operator new(unsigned): 8068a70

libCrun.so.1`void*operator new(unsigned)
CCtest`main+0x9a
CCtest`0x8050cda
void*operator new(unsigned): 80a2bf0
void operator delete(void*): 8068a70
void operator delete(void*): 80a2bf0

The ustack() data tells us that new() is called from main+0x19, main+0x57, and
main+0x9a -- we're interested in the object associated with the first call to new(),
at main+0x19.

To determine the type of constructor called at main+0x19, we can use mdb as
follows:
# gcore `pgrep CCtest`
gcore: core.1478 dumped
# mdb core.1478
Loading modules: [ libc.so.1 ld.so.1 ]
> main::dis
main: pushl %ebp
main+1: movl %esp,%ebp
main+3: subl $0x38,%esp
main+6: movl %esp,-0x2c(%ebp)
main+9: movl %ebx,-0x30(%ebp)
main+0xc: movl %esi,-0x34(%ebp)
main+0xf: movl %edi,-0x38(%ebp)
main+0x12: pushl $0x8
main+0x14: call -0x2e4 <PLT=libCrun.so.1`__1c2n6FI_pv_>
main+0x19: addl $0x4,%esp
main+0x1c: movl %eax,-0x10(%ebp)
main+0x1f: movl -0x10(%ebp),%eax
main+0x22: pushl %eax
main+0x23: call +0x1d5 <__1cJTestClass2t5B6M_v_>
...

Our constructor is called just after the call to new, at offset main+0x23. So, we
have identified an object, built by the constructor __1cJTestClass2t5B6M_v_,
that is never destroyed. Using dem to demangle this symbol produces:

# dem __1cJTestClass2t5B6M_v_
__1cJTestClass2t5B6M_v_ == TestClass::TestClass #Nvariant 1()

Thus, a call to new TestClass() at main+0x19 is the cause of the memory leak.
Examining the CCtest.cc source file reveals:

...
t = new TestClass();
cout << t->ClassName();

t = new TestClass((const char *)"Hello.");
cout << t->ClassName();


tt = new TestClass((const char *)"Goodbye.");
cout << tt->ClassName();

delete(t);
delete(tt);
...

It's clear that the pointer assigned by the first t = new TestClass(); is
overwritten by the second assignment, t = new TestClass((const char
*)"Hello.");, before the first object is ever deleted. The memory leak has been
identified and a fix can be implemented.

The DTrace pid provider allows you to enable a probe at any instruction
associated with a process that is being examined. This example is intended to
model the DTrace approach to interactive process debugging. DTrace features
used in this example include: aggregations, displaying function arguments and
return values, and viewing the user call stack. The dem and c++filt commands in
Sun Studio software, and the gc++filt command shipped with gcc, were used to
demangle the function symbols extracted from the program symbol table and to
display the DTrace output in a source-compatible format. The source files
created for this example follow:

EXAMPLE 9–1 TestClass.h

class TestClass
{
public:
TestClass();
TestClass(const char *name);
TestClass(int i);
virtual ~TestClass();
virtual char *ClassName() const;
private:
char *str;
};

TestClass.cc:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include "TestClass.h"

TestClass::TestClass() {


str=strdup("empty.");
}

TestClass::TestClass(const char *name) {
str=strdup(name);
}

TestClass::TestClass(int i) {
str=(char *)malloc(128);
sprintf(str, "Integer = %d", i);
}

TestClass::~TestClass() {
if ( str )
free(str);
}

char *TestClass::ClassName() const {
return str;
}

EXAMPLE 9–2 CCtest.cc

#include <iostream.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "TestClass.h"

int main(int argc, char **argv)
{
TestClass *t;
TestClass *tt;

while (1) {
t = new TestClass();
cout << t->ClassName();

t = new TestClass((const char *)"Hello.");
cout << t->ClassName();

tt = new TestClass((const char *)"Goodbye.");
cout << tt->ClassName();


delete(t);
delete(tt);
sleep(1);
}
}

EXAMPLE 9–3 Makefile

OBJS=CCtest.o TestClass.o
PROGS=CCtest

CC=CC

all: $(PROGS)
echo "Done."

clean:
rm $(OBJS) $(PROGS)

CCtest: $(OBJS)
$(CC) -o CCtest $(OBJS)

.cc.o:
$(CC) $(CFLAGS) -c $<




M O D U L E 1 0

Managing Memory with DTrace and MDB

Objectives
This module will build on what we've learned about using DTrace to observe
processes by examining a page fault. Then, we'll incorporate low-level debugging
with MDB to find the problem in the code.

Additional Resources
Solaris Modular Debugger Guide, Sun Microsystems, Inc., 2007.


Software Memory Management

OpenSolaris memory management uses software constructs called segments to
manage virtual memory of processes as well as the kernel itself. Most of the data
structures involved in the software side of memory management are defined in
/usr/include/vm/*.h. In this module, we'll examine the code and data
structures used to handle page faults.

➊ The particular fault shown in this module is a major page fault, that is, one
that results in I/O on the disk.

➋ By contrast, a minor page fault does not result in I/O.

➌ For example, paging in a page of code for an executable is a major fault.

➍ Faulting a new heap page is a minor fault. Heap pages can simply be
allocated and zeroed out (no need to access the disk).

Module 10 • Managing Memory with DTrace and MDB 104


Using DTrace and MDB to Examine Virtual Memory
The objective of this lab is to examine a page fault using DTrace and MDB.

Summary
We'll start with a DTrace script to trace the actions of a single page fault for a
given process. The script prints the user virtual address that caused the fault, and
then traces every function that is called from the time of the fault until the page
fault handler returns. We'll use the output of the script to determine what source
code needs to be examined for more detail.

Note – In this module, we've added text to the extensive code output to guide the
exercise. Look for the <----symbol to find associated text in the output.


DTracing a Page Fault for a Single Process


1 Open a terminal window.

2 Create a file called pagefault.d with the following script:


#!/usr/sbin/dtrace -s

#pragma D option flowindent

pagefault:entry
/execname == $$1/
{
printf("fault occurred on address = %p\n", args[0]);
self->in = 1;
}

pagefault:return
/self->in == 1/
{
self->in = 0;
exit(0);
}

entry
/self->in == 1/
{
}

return
/self->in == 1/
{
}

3 Run the script on Mozilla.

Note – You need to specify mozilla-bin as the executable name, because the
process name is mozilla-bin and mozilla is not an exact match. Also, assertions
are turned on in debug kernels only, so you'll see various calls to functions such
as mutex_owner(), which is used only with ASSERT().

# ./pagefault.d mozilla-bin
dtrace: script './pagefault.d' matched 42626 probes

CPU FUNCTION
0 -> pagefault fault occurred on address = fb985ea2

0 | pagefault:entry <-- i86pc/vm/vm_machdep.c or sun4/vm/vm_dep.c


0 -> as_fault <-- generic address space fault common/vm/vm_as.c
0 -> as_segat
0 -> avl_find <-- segments are in AVL tree
0 -> as_segcompar <-- search segments for segment
0 <- as_segcompar <-- containing fault address
0 -> as_segcompar <-- common/vm/vm_as.c
0 <- as_segcompar
0 -> as_segcompar
0 <- as_segcompar
0 -> as_segcompar
0 <- as_segcompar
0 -> as_segcompar
0 <- as_segcompar
0 -> as_segcompar
0 <- as_segcompar
0 -> as_segcompar
0 <- as_segcompar
0 -> as_segcompar
0 <- as_segcompar
0 <- avl_find
0 <- as_segat
0 -> segvn_fault<-- segment containing fault is found, (not SEGV)
<-- common/vm/seg_vn.c
0 -> hat_probe <-- look for page table entry for page
<-- i86pc/vm/hat_i86.c or sfmmu/vm/hat_sfmmu.c
0 -> htable_getpage <-- page tables are hashed on x86
0 -> htable_getpte <-- i86pc/vm/htable.c
0 -> htable_lookup
0 <- htable_lookup
0 -> htable_va2entry
0 <- htable_va2entry
0 -> x86pte_get <-- return a page table entry
0 -> x86pte_access_pagetable
0 -> hat_kpm_pfn2va
0 <- hat_kpm_pfn2va
0 <- x86pte_access_pagetable
0 -> x86pte_release_pagetable
0 <- x86pte_release_pagetable
0 <- x86pte_get
0 <- htable_getpte
0 <- htable_getpage
0 -> htable_release

0 <- htable_release
0 <- hat_probe
0 -> fop_getpage <-- file operation to retrieve page(s)
0 -> ufs_getpage<--file in ufs fs(common/fs/ufs/ufs_vnops.c)
0 -> bmap_has_holes <-- check for sparse file
0 <- bmap_has_holes
0 -> page_lookup <-- check for page already in memory
0 -> page_lookup_create <-- common/vm/vm_page.c
0 <- page_lookup_create <-- create page if needed
0 <- page_lookup
0 -> ufs_getpage_miss <-- page wasn't in memory
0 -> bmap_read <-- get block number of page from inode
0 -> bread_common
0 -> getblk_common
0 <- getblk_common
0 <- bread_common
0 <- bmap_read
0 -> pvn_read_kluster <-- read pages (common/vm/vm_pvn.c)
0 -> page_create_va <-- create some pages
0 <- page_create_va
0 -> segvn_kluster
0 <- segvn_kluster
0 <- pvn_read_kluster
0 -> pageio_setup <-- setup page(s) for io common/os/bio.c
0 <- pageio_setup
0 -> lufs_read_strategy <-- logged ufs read
0 -> bdev_strategy <-- read device common/os/driver.c
0 -> cmdkstrategy <-- common disk driver (cmdk(7D))
<-- common/io/dktp/disk/cmdk.c
0 -> dadk_strategy <-- direct attached disk (dad(7D))
<-- for ide disks(common/io/dktp/dcdev/dadk.c)
<-- driver sets up dma and starts page in
0 <- dadk_strategy
0 <- cmdkstrategy
0 <- bdev_strategy
0 -> biowait <-- wait for pagein complete common/os/bio.c
0 -> sema_p <-- wakeup sema_v from completion interrupt
0 -> swtch <-- let someone else run(common/disp/disp.c)
0 -> disp <-- dispatch to next thread to run
0 <- disp
0 -> resume <-- actual switching occurs here
<-- intel/ia32/ml/swtch.s
0 -> savectx <-- save old context
0 <- savectx
<-- someone else is running here...
0 -> restorectx <-- restore context (we're awakened)

0 <- restorectx
0 <- resume
0 <- swtch
0 <- sema_p
0 <- biowait
0 -> pageio_done <-- undo pageio_setup
0 <- pageio_done
0 -> pvn_plist_init
0 <- pvn_plist_init
0 <- ufs_getpage_miss <-- page is in memory
0 <- ufs_getpage
0 <- fop_getpage
0 -> segvn_faultpage <-- call hat to load pte(s) for page(s)
0 -> hat_memload
0 -> page_pptonum <-- get page frame number
0 <- page_pptonum
0 -> hati_mkpte <-- build page table entry
0 <- hati_mkpte
0 -> hati_pte_map <-- locate entry in page table
0 -> x86_hm_enter
0 <- x86_hm_enter
0 -> hment_prepare
0 <- hment_prepare
0 -> x86pte_set <-- fill in pte into page table
0 -> x86pte_access_pagetable
0 -> hat_kpm_pfn2va
0 <- hat_kpm_pfn2va
0 <- x86pte_access_pagetable
0 -> x86pte_release_pagetable
0 <- x86pte_release_pagetable
0 <- x86pte_set
0 -> hment_assign
0 <- hment_assign
0 -> x86_hm_exit
0 <- x86_hm_exit
0 <- hati_pte_map
0 <- hat_memload
0 <- segvn_faultpage
0 <- segvn_fault
0 <- as_fault
0 <- pagefault

Remember that the above output has been shortened. At a high level, the
following has happened on the page fault:

■ The pagefault() routine is called to handle page faults.
■ The pagefault() routine calls as_fault() to handle faults on a given address
space.
■ as_fault() walks an AVL tree of seg structures looking for a segment
containing the faulting address. If no such segment is found, the process is
sent a SIGSEGV (segmentation violation) signal.
■ If the segment is found, a segment-specific fault handler is called. For most
segments, this is segvn_fault().
■ segvn_fault() looks for the faulting page already in memory. If the page
already exists (but has been freed), it is "reclaimed" off the free list. If the page
does not already exist, we need to page it in. Here, the page is not already in
memory, so we call ufs_getpage().
■ ufs_getpage() finds the block number(s) of the page(s) within the file system
by calling bmap_read().
■ Then we call a device driver strategy routine; see strategy(9E) for an
overview of what a strategy routine is supposed to do.
■ While the page is being read, the thread causing the page fault blocks (i.e.,
switches out) via a call to swtch(). At this point, other threads will run.
■ When the paging I/O has completed, the disk driver interrupt handler wakes
up the blocked mozilla-bin thread.
■ The disk driver returns through the file system code out to segvn_fault().
■ segvn_fault() then calls segvn_faultpage().
■ segvn_faultpage() calls the HAT (Hardware Address Translation) layer to
load the page table entries (PTEs) for the page.
■ At this point, the virtual address that caused the page fault should now be
mapped to a valid physical page. When pagefault() returns, the instruction
causing the page fault will be retried and should now complete successfully.

4 Use mdb to examine the kernel data structures and locate the page of physical
memory that corresponds to the fault as follows:

a. Open a terminal window.

b. Find the number of segments used by mozilla by using pmap as follows:
# pmap -x `pgrep mozilla-bin` | wc
368 2730 23105
#
The output shows that there are 368 segments.

Note – The search for the segment containing the fault address found the
correct segment after 8 segments. See calls to as_segcompar in the DTrace
output above. Using an AVL tree shortens the search!

c. Use mdb to locate the segment containing the fault address.

Note – If you want to follow along, you may want to use: ::log /tmp/logfile
in mdb and then !vi /tmp/logfile to search. Or, you can just run mdb within
an editor buffer.

# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace
ufs ip sctp usba random fctl s1394
nca lofs crypto nfs audiosup sppp cpc fcip ptm ipc ]
> ::ps !grep mozilla-bin <-- find the mozilla-bin process
R 933 919 887 885 100 0x42014000 ffffffff81d6a040 mozilla-bin

> ffffffff81d6a040::print proc_t p_as | ::walk seg | ::print struct seg
<-- Lots of output has been omitted... -->
{
s_base = 0xfb800000 <-- the seg we want, fault addr (fb985ea2)
s_size = 0x561000 <-- greater/equal to base and < base+size
s_szc = 0
s_flags = 0
s_as = 0xffffffff828b61d0
s_tree = {
avl_child = [ 0xffffffff82fa7920, 0xffffffff82fa7c80 ]
avl_pcb = 0xffffffff82fa796d
}
s_ops = segvn_ops
s_data = 0xffffffff82d85070
}
<-- and lots more output omitted -->

> ffffffff82d85070::print segvn_data_t <-- from s_data

{
lock = {
_opaque = [ 0 ]
}
segp_slock = {
_opaque = [ 0 ]
}
pageprot = 0x1
prot = 0xd
maxprot = 0xf
type = 0x2
offset = 0
vp = 0xffffffff82f9e480 <-- points to a vnode_t
anon_index = 0
amp = 0 <-- we'll look at anonymous space later
vpage = 0xffffffff82552000
cred = 0xffffffff81f95018
swresv = 0
advice = 0
pageadvice = 0x1
flags = 0x490
softlockcnt = 0
policy_info = {
mem_policy = 0x1
mem_reserved = 0
}
}

> ffffffff82f9e480::print vnode_t v_path
v_path = 0xffffffff82f71090
"/usr/sfw/lib/mozilla/components/libgklayout.so"

> fb985ea2-fb800000=K <-- offset within segment
185ea2 <-- rounding down gives 185000 (4K page size)

> ffffffff82f9e480::walk page !wc <-- walk list of pages on vnode_t
1236 1236 21012 <-- 1236 pages (not all are necessarily valid)

> ffffffff82f9e480::walk page | ::print page_t <-- walk pg list on vnode
<-- lots of pages omitted in output -->
{
p_offset = 0x185000 <-- here is matching page
p_vnode = 0xffffffff82f9e480
p_selock = 0
p_selockpad = 0
p_hash = 0xfffffffffae21c00

p_vpnext = 0xfffffffffaca9760
p_vpprev = 0xfffffffffb3467f8
p_next = 0xfffffffffad8f800
p_prev = 0xfffffffffad8f800
p_lckcnt = 0
p_cowcnt = 0
p_cv = {
_opaque = 0
}
p_io_cv = {
_opaque = 0
}
p_iolock_state = 0
p_szc = 0
p_fsdata = 0
p_state = 0
p_nrm = 0x2
p_embed = 0x1
p_index = 0
p_toxic = 0
p_mapping = 0xffffffff82d265f0
p_pagenum = 0xbd62 <-- the page frame number of page
p_share = 0
p_sharepad = 0
p_msresv_1 = 0
p_mlentry = 0x185
p_msresv_2 = 0
}

<-- and lots more output omitted -->

> bd62*1000=K <-- multiply page frame number by page size (hex)
bd62000 <-- here is physical address of page

> bd62000+ea2,10/K <-- dump 16 64-bit hex values at physical address
0xbd62ea2: 2ccec81ec8b55 e8575653f0e48300 32c3815b00000000
5d89d46589003ea7 840ff6850c758be0 e445c7000007df
1216e8000000 dbe850e4458d5650 7d830cc483ffeeea
791840f00e4 c085e8458904468b 500c498b088b2474
8b17eb04c483d1ff e8458de05d8bd465 c483ffeeeac8e850
458b0000074ce904

> bd62000+ea2,10/ai <-- data looks like code, let’s try dumping as code
0xbd62ea2:
0xbd62ea2: pushq %rbp
0xbd62ea3: movl %esp,%ebp

0xbd62ea5: subl $0x2cc,%esp
0xbd62eab: andl $0xfffffff0,%esp
0xbd62eae: pushq %rbx
0xbd62eaf: pushq %rsi
0xbd62eb0: pushq %rdi
0xbd62eb1: call +0x5 <0xbd62eb6>
0xbd62eb6: popq %rbx
0xbd62eb7: addl $0x3ea732,%ebx
0xbd62ebd: movl %esp,-0x2c(%rbp)
0xbd62ec0: movl %ebx,-0x20(%rbp)
0xbd62ec3: movl 0xc(%rbp),%esi
0xbd62ec6: testl %esi,%esi
0xbd62ec8: je +0x7e5 <0xbd636ad>
0xbd62ece: movl $0x0,-0x1c(%rbp)

> ffffffff81d6a040::context <-- change context from kernel to mozilla-bin
debugger context set to proc ffffffff81d6a040, the address of the process

> fb985ea2,10/ai <-- and dump from faulting virtual address
0xfb985ea2:
0xfb985ea2: pushq %rbp <-- looks like a match
0xfb985ea3: movl %esp,%ebp
0xfb985ea5: subl $0x2cc,%esp
0xfb985eab: andl $0xfffffff0,%esp
0xfb985eae: pushq %rbx
0xfb985eaf: pushq %rsi
0xfb985eb0: pushq %rdi
0xfb985eb1: call +0x5 <0xfb985eb6>
0xfb985eb6: popq %rbx
0xfb985eb7: addl $0x3ea732,%ebx
0xfb985ebd: movl %esp,-0x2c(%rbp)
0xfb985ec0: movl %ebx,-0x20(%rbp)
0xfb985ec3: movl 0xc(%rbp),%esi
0xfb985ec6: testl %esi,%esi
0xfb985ec8: je +0x7e5 <0xfb9866ad>
0xfb985ece: movl $0x0,-0x1c(%rbp)

> 0::context
debugger context set to kernel

> ffffffff81d6a040::print proc_t p_as <-- get as for mozilla-bin
p_as = 0xffffffff828b61d0

> fb985ea2::vtop -a ffffffff828b61d0 <-- check our work
virtual fb985ea2 mapped to physical bd62ea2 <-- physical address matches

Once the segment is found, we print the segvn_data structure. In this
segment, a vnode_t maps the segment data. The vnode_t contains a list of
pages that "belong to" the vnode_t. We locate the page corresponding to the
offset within the segment. Once the page_t is located, we have the page frame
number. We then convert the page frame number to a physical address and
examine some of the data at the address. It turns out this data is code. We then
check the physical address by using the vtop (virtual-to-physical) mdb
command.

d. Extra credit: walk the page tables of the process to see how a virtual address
gets translated into a physical one.




M O D U L E 1 1

Debugging Drivers With DTrace

Objectives
The objective of this module is to learn about how you can use DTrace to debug
your driver development projects by reviewing a case study.

Porting the smbfs Driver from Linux to the Solaris OS
This case study focuses on leveraging the DTrace capability for device driver
development.

Historically, debugging a device driver required that a developer use function
calls like cmn_err() to log diagnostic information to the /var/adm/messages file.
This cumbersome process requires guesswork, re-compilation, and system
reboots to uncover software coding errors. Developers with a talent for assembly
language can use adb and create custom modules in C for mdb to diagnose
software errors. However, historical approaches to kernel development and
debugging are quite time-consuming.

DTrace provides a diagnostic shortcut. Instead of sifting through the
/var/adm/messages file or pages of truss output, DTrace can be used to capture
information on only the events that you as a developer wish to view. The
magnitude of the benefit provided by DTrace is best illustrated through a few
simple examples.

First, create an smbfs driver template based on Sun's nfs driver. After the driver
compiles successfully, test that it can be loaded and unloaded. Copy the
prototype driver to /usr/kernel/fs and attempt to modload it by hand:

# modload /usr/kernel/fs/smbfs
can't load module: Out of memory or no room in system tables

And the /var/adm/messages file contains:

genunix: [ID 104096 kern.warning] WARNING: system call missing
from bind file

Searching for the "system call missing" message reveals that it comes from the
function mod_getsysent() in the file modconf.c, on a failed call to
mod_getsysnum(). Instead of manually searching the flow of mod_getsysnum()
from source file to source file, here's a simple DTrace script that enables all entry
and return probes in the fbt (Function Boundary Tracing) provider once
mod_getsysnum() is entered.

#!/usr/sbin/dtrace -s

#pragma D option flowindent


fbt::mod_getsysnum:entry
/execname == "modload"/
{
self->follow = 1;
}

fbt::mod_getsysnum:return
{
self->follow = 0;
trace(arg1);
}

fbt:::entry
/self->follow/
{
}

fbt:::return
/self->follow/
{
trace(arg1);
}

Note – trace(arg1) displays the function's return value.

Executing this script and running the modload command in another window
produces the following output:

# ./mod_getsysnum.d
dtrace: script './mod_getsysnum.d' matched 35750 probes

CPU FUNCTION
0 -> mod_getsysnum
0 -> find_mbind
0 -> nm_hash
0 <- nm_hash 41
0 -> strcmp
0 <- strcmp 4294967295
0 -> strcmp
0 <- strcmp 7
0 <- find_mbind 0
0 <- mod_getsysnum 4294967295

Thus either find_mbind() returning 0 or nm_hash() returning 41 is the culprit.
A quick look at find_mbind() reveals that a return value of 0 indicates an error
state. Viewing the source to find_mbind() in
/usr/src/uts/common/os/modsubr.c reveals that we're searching for a character
string in a hash table. Let's use DTrace to display the contents of the search string
and hash table.

To view the contents of the search string, we add a strcmp() trace to our previous
mod_getsysnum.d script:

fbt::strcmp:entry
{
printf("name:%s, hash:%s", stringof(arg0),
stringof(arg1));
}

Here are the results of our next attempt to load our driver:

# ./mod_getsysnum.d
dtrace: script './mod_getsysnum.d' matched 35751 probes
CPU FUNCTION
0 -> mod_getsysnum
0 -> find_mbind
0 -> nm_hash
0 <- nm_hash 41
0 -> strcmp
0 | strcmp:entry name:smbfs,
hash:timer_getoverrun
0 <- strcmp 4294967295
0 -> strcmp
0 | strcmp:entry name:smbfs,
hash:lwp_sema_post
0 <- strcmp 7
0 <- find_mbind 0
0 <- mod_getsysnum 4294967295

So we're looking for smbfs in a hash table, and it's not present. How does smbfs
get into this hash table? Let's return to find_mbind() and observe that the hash
table variable sb_hashtab is passed to the failing nm_hash() function.

A quick search of the source code reveals that sb_hashtab is initialized with a call
to read_binding_file(), which takes as its arguments a config file, the hash
table, and a function pointer. A few more clicks on our source code browser

reveal the contents of the config file to be defined as /etc/name_to_sysnum in
the file /usr/src/uts/common/os/modctl.c. It looks like we forgot to include a
configuration entry for our driver. Add the following line to the
/etc/name_to_sysnum file and reboot (the binding file is read by
read_binding_file() only once, at boot time):

smbfs 177

After rebooting, the driver can be loaded successfully:

# modload /usr/kernel/fs/smbfs

Verify that the driver is loaded with the modinfo command:

# modinfo | grep smbfs


160 feb21a58 351ac 177 1 smbfs (SMBFS syscall,client,comm)
160 feb21a58 351ac 24 1 smbfs (network filesystem)
160 feb21a58 351ac 25 1 smbfs (network filesystem version 2)
160 feb21a58 351ac 26 1 smbfs (network filesystem version 3)

Note – Remember that this driver was based on an nfs template, which explains
this output.

Let's make sure we can also unload the module:

# modunload -i 160
can't unload the module: Device busy

This is most likely due to an EBUSY errno return value. But now, since the smbfs
driver is a loaded module, we have access to all of the smbfs functions:

# dtrace -l -n fbt:smbfs:: | wc -l
1002

This is amazing! Without any special coding, we now have access to 1002 entry
and return probes contained in the driver. These probes allow us to debug our
work without a special "instrumented" version of the driver!
Let's monitor all smbfs calls when modunload is called, using this simple DTrace
script:

#!/usr/sbin/dtrace -s

#pragma D option flowindent

fbt:smbfs::entry
{
}

fbt:smbfs::return
{
trace(arg1);
}

Running this script while modunload executes produces no output, so it seems
that the smbfs code is never reached by modunload. Let's use DTrace to look at
modunload itself with this script:

#!/usr/sbin/dtrace -s

#pragma D option flowindent

fbt::modunload:entry
{
self->follow = 1;
trace(execname);
trace(arg0);
}

fbt::modunload:return
{
self->follow = 0;
trace(arg1);
}

fbt:::entry
/self->follow/
{
}

fbt:::return
/self->follow/
{
trace(arg1);
}

Here's the output of this script:

# ./modunload.d
dtrace: script './modunload.d' matched 36695 probes
CPU FUNCTION
0 -> modunload modunload 160
0 | modunload:entry
0 -> mod_hold_by_id
0 -> mod_circdep
0 <- mod_circdep 0
0 -> mod_hold_by_modctl
0 <- mod_hold_by_modctl 0
0 <- mod_hold_by_id 3602566648
0 -> moduninstall
0 <- moduninstall 16
0 -> mod_release_mod
0 -> mod_release
0 <- mod_release 3602566648
0 <- mod_release_mod 3602566648
0 <- modunload 16

Observe that the EBUSY return value (16) is coming from moduninstall. Let's take
a look at the source code for moduninstall. It returns EBUSY in a few
locations, so let's consider the following possibilities:
1. if (mp->mod_prim || mp->mod_ref || mp->mod_nenabled != 0) return (EBUSY);
2. if (detach_driver(mp->mod_modname) != 0) return (EBUSY);
3. if (kobj_lookup(mp->mod_mp, "_fini") == NULL)
4. A failed call to the smbfs _fini() routine

We can't directly observe all of these possibilities, so let's approach them by
a process of elimination. We'll use the following script to display the contents
of the various structures and the return values inside moduninstall:

#!/usr/sbin/dtrace -s

#pragma D option flowindent

fbt::moduninstall:entry
{
self->follow = 1;
printf("mod_prim:%d\n",
((struct modctl *)arg0)->mod_prim);
printf("mod_ref:%d\n",
((struct modctl *)arg0)->mod_ref);
printf("mod_nenabled:%d\n",
((struct modctl *)arg0)->mod_nenabled);
printf("mod_loadflags:%d\n",
((struct modctl *)arg0)->mod_loadflags);
}

fbt::moduninstall:return
{
self->follow = 0;
trace(arg1);
}

fbt::kobj_lookup:entry
/self->follow/
{
}

fbt::kobj_lookup:return
/self->follow/
{
trace(arg1);
}

fbt::detach_driver:entry
/self->follow/
{
}

fbt::detach_driver:return
/self->follow/
{
trace(arg1);
}

This script produces the following output:

# ./moduninstall.d
dtrace: script './moduninstall.d' matched 6 probes
CPU FUNCTION
0 -> moduninstall
mod_prim:0
mod_ref:0
mod_nenabled:0
mod_loadflags:1
0 -> detach_driver
0 <- detach_driver 0
0 -> kobj_lookup
0 <- kobj_lookup 4273103456
0 <- moduninstall 16

Comparing this output to the code tells us that the failure is not due to the mp
structure values or to the return values from detach_driver() or kobj_lookup().
Thus, by a process of elimination, it must be the status returned by the status =
(*func)(); call, which invokes the smbfs _fini() routine. And here is what the
smbfs _fini() routine contains:

int
_fini(void)
{
	/* don't allow module to be unloaded */
	return (EBUSY);
}

Changing the return value to 0 and recompiling the code gives us a driver that
we can now both load and unload, so we have completed the objectives of this
exercise. We've used the Function Boundary Tracing (fbt) provider exclusively in
these examples; note that fbt is only one of DTrace's many providers.




APPENDIX A

OpenSolaris Resources

To get more information, support, and training, use the following resources.
■ Community Documentation —
http://opensolaris.org/os/community/documentation
■ Sun Documentation — http://www.sun.com/documentation
■ Sun Support — http://www.sun.com/support
■ Sun Training — Sun offers a complete range of professional Solaris training
and certification options to help you apply this powerful platform for greater
success in your operations. To find out more about Solaris training, please go
to: http://www.sun.com/training/catalog/operating_systems/index.xml

