Operating Systems: A Hands-On
Approach Using the
OpenSolaris Project
Instructor Guide
Sun Microsystems, Inc. has intellectual property rights relating to technology embodied in the product that is described in this document. In particular,
and without limitation, these intellectual property rights may include one or more U.S. patents or pending patent applications in the U.S. and in other
countries.
U.S. Government Rights – Commercial software. Government users are subject to the Sun Microsystems, Inc. standard license agreement and
applicable provisions of the FAR and its supplements.
This distribution may include materials developed by third parties.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S.
and other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, the Solaris logo, the Java Coffee Cup logo, docs.sun.com, Java, and Solaris are trademarks or registered trademarks
of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of
SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun
Microsystems, Inc.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the
pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a
non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs
and otherwise comply with Sun's written license agreements.
Products covered by and information contained in this publication are controlled by U.S. Export Control laws and may be subject to the export or
import laws in other countries. Nuclear, missile, chemical or biological weapons or nuclear maritime end uses or end users, whether direct or indirect,
are strictly prohibited. Export or reexport to countries subject to U.S. embargo or to entities identified on U.S. export exclusion lists, including, but not
limited to, the denied persons and specially designated nationals lists is strictly prohibited.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Copyright 2007 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054, U.S.A. All rights reserved.
Instructor Notes

Contents

Web Server Virtualization With Zones
    Creating Non-Global Zones
    Creating ZFS Storage Pools and File Systems
    Creating a Mirrored ZFS Storage Pool
    Creating ZFS File Systems as Home Directories
    Creating a RAID-Z Configuration
7 Getting Started With DTrace
    Enabling Simple DTrace Probes
    Listing Traceable Probes
    Programming in D
M O D U L E 1
What is the OpenSolaris Project?

Objectives
The objective of this course is to learn about operating system computing by
using the Solaris™ Operating System source code that is freely available through
the OpenSolaris project.
Tip – To receive a free OpenSolaris Starter Kit that includes training materials,
source code, and developer tools, register online at
http://get.opensolaris.org.
We'll start by showing you the user groups, portals, and documentation you will
use to get started with UNIX® computing. Next, we'll show you where to go to
access the code, communities, discussions, projects, and source browser for the
OpenSolaris project. Then, we'll give you steps to configure zones, ZFS,
networking, and the environment. Finally, we'll demonstrate debugging
processes, applications, page faults, and device drivers with DTrace in the lab
exercises.
The OpenSolaris project was launched on June 14, 2005 to create a community
development effort using the Solaris OS code as a starting point. It is a nexus for a
community development effort where contributors from Sun and elsewhere can
collaborate on developing and improving operating system technology. The
OpenSolaris source code will find a variety of uses, including serving as the
basis for future versions of the Solaris OS product, other operating system
projects, and third-party products and distributions of interest to the community. The
OpenSolaris project is currently sponsored by Sun Microsystems, Inc.
In the first two years, over 60,000 participants became registered members.
The engineering community is continually growing and changing to meet the
needs of developers, system administrators, and end users of the Solaris
Operating System.
Teaching with the OpenSolaris project provides the following advantages over
instructional operating systems:
■ Access to code for the revolutionary technologies in the Solaris 10 operating
system
■ Access to code for a commercial OS that is used in many environments and
that scales to large systems
■ Superior observability and debugging tools
■ Hardware platform support including SPARC, x86 and x64 architectures
■ Leadership in 64-bit computing
■ $0.00 for infinite right-to-use
■ Free, exciting, innovative, complete, seamless, and rock-solid code base
■ Availability under the OSI-approved Common Development and
Distribution License (CDDL) allows royalty-free use, modification, and
derived works
Country Portals
The Internationalization and Localization Community is helping to translate the
OpenSolaris English web site into many languages. So far, eight country portals
are under development, as follows:
■ India portal – http://in.opensolaris.org
■ China portal – http://cn.opensolaris.org
■ Japan portal – http://jp.opensolaris.org
■ Poland portal – http://pl.opensolaris.org
■ France portal – http://fr.opensolaris.org
■ Brazil Portal – http://opensolaris.org/os/project/br
■ Spanish Portal – http://opensolaris.org/os/project/es
Portals for Germany, Russia, Czech Republic, Spain, Korea, and Mexico are
planned. See the OpenSolaris Portals project to get involved, or chat on one of the
seven OpenSolaris chat rooms using IRC at irc.freenode.net. See
http://opensolaris.org/os/chat/
The icons in the upper-right of the OpenSolaris web pages link you to
discussions, communities, projects, downloads, and source browser resources.
In addition, the OpenSolaris web site provides search across all of the site content
and aggregated blogs.
Discussions
Discussions provide you with access to the experts who are working on new open
source technologies. Discussions also provide an archive of previous
conversations that you can reference for answers to your questions. See
http://www.opensolaris.org/os/discussions for the complete list of forums
to which you can subscribe.
Communities
Communities provide connections to other participants with similar interests in
the OpenSolaris project. Communities form around interest groups,
technologies, support, tools, and user groups, for example:
DTrace                 http://www.opensolaris.org/os/community/dtrace
ZFS                    http://www.opensolaris.org/os/community/zfs
Networking             http://www.opensolaris.org/os/community/networking
Zones                  http://www.opensolaris.org/os/community/zones
Documentation          http://www.opensolaris.org/os/community/documentation
Performance            http://www.opensolaris.org/os/community/performance
Storage                http://www.opensolaris.org/os/community/storage
System Administrators  http://www.opensolaris.org/os/community/sysadmin

➊–➌ These are only a few of 30 communities actively working on OpenSolaris. See
http://www.opensolaris.org/os/communities for the complete list.

Indiana                http://www.opensolaris.org/os/project/indiana
OpenGrok               http://www.opensolaris.org/os/project/opengrok

➋ The OpenSolaris project will empower and expand the existing Solaris community.
➌ The OpenSolaris project will allow for the creation of entirely new communities.
➍ Projects give you the opportunity to share files, disk space, and an email alias.
Source Repositories
Centralized and distributed source repositories are hosted on the
opensolaris.org web site. The centralized source management model uses the
Subversion (SVN) source control management program. Repositories managed
in a distributed fashion use the Mercurial (hg) source control management
program.
OpenGrok
OpenGrok™ is the fast and usable source code search and cross-reference engine
used in OpenSolaris. See http://cvs.opensolaris.org/source to try it out!
Take an online tour of the source and you'll discover cleanly written, extensively
commented code that reads like a book. If you're interested in working on an
OpenSolaris project, you can download the complete codebase. If you just need to
know how some features work in the Solaris OS, the source code browser
provides a convenient alternative. OpenGrok understands various program file
formats and version control histories like SCCS, RCS, and CVS, so that you can
better understand the open source.
M O D U L E 2
OpenSolaris Advocacy
Objectives
The Advocates Community exists to help people around the world get involved
in the OpenSolaris Community. We welcome participation from people of all
languages and cultures and people with all levels of technical and non-technical
skills. Everyone has something to contribute.
See http://opensolaris.org/os/community/advocacy/
In the Advocates community you will find independent user group projects,
presentations, news, articles, blogs, technical & non-technical content, videos
and podcasts, events and conferences, community metrics, swag, badges,
buttons, and a variety of other promotional projects.
Why Use OpenSolaris?
Price
Since the availability of the Solaris 10 Operating System in January 2005, its
popularity has exploded. As of July 2007, more than 8.7 million copies had been
registered, more than all of the previous versions of the Solaris OS combined.
Further fueling this frenzy was the release of OpenSolaris in June 2005. Given this
surge in the number of users, more developers (commercial and open-source
alike) are seeing the Solaris operating system as a viable target for their software.
One of the reasons the Solaris OS enjoyed a huge popularity boost was its price:
$0 for everyone, for any use (commercial and non-commercial), on any machine
(using both SPARC and x86 platforms). Another reason was Sun's promise, and its
delivery on that promise, of making the Solaris source code available under an
OSI-approved open-source license, the Common Development and Distribution
License (CDDL).
Backward Compatibility
All of these features build on what long-time Solaris OS users have come to
expect: rock-solid stability, huge scalability, high performance, and guaranteed
backwards compatibility. The last of these is especially important to commercial
software developers, because maintenance is usually the largest expense
associated with a piece of software. With its backwards compatibility guarantee,
software vendors know that (provided they use only published APIs) software
built for Solaris N will run correctly on Solaris N+1 and subsequent versions.
(Contrast this with some other operating systems, where incompatible changes to
system components -- for example, libraries -- are made without regard to the
effect on applications. The net effect is application breakage, resulting in
increased maintenance costs and frustration for application vendors and users.)
From the developer's perspective, the Solaris versions for SPARC and x86
platforms have the same feature set and APIs. This means that developers can
concentrate on the other issues endemic to cross-platform development, like
CPU endianness. The SPARC platform is big-endian and x86 is little-endian, so
an application that is developed and tested on the Solaris platform has a high
probability of being free from endian-related problems. The Solaris OS also
supports both 32-bit and 64-bit applications on both platforms, thus helping to
eliminate bugs due to assumptions about word size.
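The byte-order difference described above is easy to observe from a shell prompt. This small sketch uses only the POSIX printf(1) and od(1) utilities, so it behaves the same on Solaris and other UNIX systems: it writes the bytes 01 02 03 04 and asks od to read them back as a single 4-byte word in host byte order.

```shell
# On a little-endian CPU (x86/x64) the least-significant byte is stored
# first, so the word reads back as 04030201; on a big-endian CPU (SPARC)
# it reads back as 01020304.
printf '\001\002\003\004' | od -An -tx4
```

An application that silently assumes one of these orderings will misread data when moved to the other architecture, which is why developing and testing on both SPARC and x86 platforms is valuable.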
Perhaps the most compelling reason to develop software on the Solaris OS is the
wealth of professional-grade development tools available for it.
Development Tools
One of the most important features of an OS from a developer's point of view is
the variety and quality of the development tools available. Compilers and
debuggers are the most obvious examples of these tools, but other examples
include code checkers (to ensure that our code is free from subtle errors the
compiler might not catch), cross-reference generators (to see which functions
reference other functions and variables), and performance analyzers.
The Sun Studio suite is the product of choice for Solaris OS developers. Available
as a free download from the http://developers.sun.com web site, Sun Studio
software is a collection of professional-grade compilers and tools. It includes C,
C++, and FORTRAN compilers, code analysis tools, an integrated development
environment (IDE), the dbx source-level debugger, and editors. Other tools
included with Studio software are cscope (an interactive source browser), ctrace
(a tool to generate a self-tracing version of our programs), cxref (a C program
cross-referencer), dmake (for distributed parallel makes), and lint (the C
program checker).
The Solaris OS ships with the GNU C compiler, gcc, and its companion
source-level debugger, gdb. The Solaris OS also ships with the very powerful
modular debugger, mdb. However, mdb is not a source-level debugger. It is most
useful when we are debugging kernel code, or performing post-mortem analysis
on programs for which the source is not available. See the Solaris Modular
Debugger Guide and Solaris Performance and Tools by McDougall, Mauro, and
Gregg for more information about mdb.
Acknowledgments
The following members of the OpenSolaris Community reviewed and provided
feedback on this document:
■ Boyd Adamson
■ Pradhap Devarajan
■ Alan Coopersmith
■ Brian Gupta
■ Rainer Heilke
■ Eric Lowe
■ Ben Rockwood
■ Cindy Swearingen
http://www.opensolaris.org/os/community/documentation/reviews
M O D U L E 3
Objectives
The objective of this module is to understand the system requirements, support
information, and documentation available for the OpenSolaris project
installation and configuration.
Additional Resources
■ Solaris Express Developer Edition Installation Guide: Laptop Installations. Sun
Microsystems, Inc., 2007.
■ Resources for Running Solaris OS on a Laptop:
http://www.sun.com/bigadmin/features/articles/laptop_resources.html
■ OpenSolaris Laptop Community:
http://opensolaris.org/os/community/laptop
■ OpenSolaris Starter Kit: http://opensolaris.org/os/project/starterkit
■ System Administration Guide: IP Services, Sun Microsystems, Inc., 2007
■ OpenSolaris Networking Community at
http://www.opensolaris.org/os/community/networking
■ ZFS Administration Guide and man pages:
http://opensolaris.org/os/community/zfs/docs
Development Environment Configuration
Hardware – OpenSolaris supports systems that use the SPARC® and x86 families of
processor architectures: UltraSPARC®, SPARC64, AMD64, Pentium, and Xeon EM64T.
For supported systems, see the Solaris OS Hardware Compatibility List at
http://www.sun.com/bigadmin/hcl.

Install images – Pre-built OpenSolaris distributions are limited to the Solaris
Express: Community Edition [DVD Version], Build 32 or newer, Solaris Express:
Developer Edition, Nexenta, Schillix, Martux, and Belenix. For the OpenSolaris
kernel with the GNU user environment, try
http://www.gnusolaris.org/gswiki/Download-form.

BFU archives – The on-bfu-DATE.PLATFORM.tar.bz2 file is provided if you are
installing from pre-built archives.

Compilers and tools – Sun Studio 11 compilers and tools are freely available for
use by OpenSolaris developers. See
http://www.opensolaris.org/os/community/tools/sun_studio_tools/ for
instructions about how to download and install the latest versions. Also, refer to
http://www.opensolaris.org/os/community/tools/gcc for the gcc community.
➊–➌ Refer to Module 4 for more information about how Zones and Branded Zones
enable kernel and user mode development of Solaris and Linux applications
without impacting developers in separate zones.
Networking
The OpenSolaris project meets future networking challenges by radically
improving your network performance without requiring changes to your existing
applications.
■ Speeds application performance by about 50 percent by using an enhanced
TCP/IP stack
■ Supports many of the latest networking technologies, such as 10 Gigabit
Ethernet, wireless networking, and hardware offloading
Find out more about ongoing networking developments from the OpenSolaris
Networking Community:
http://www.opensolaris.org/os/community/networking.
The nwamd daemon monitors the Ethernet port and automatically enables DHCP
on the appropriate IP interface. If no cable is plugged into a wired network, the
nwamd daemon conducts a wireless scan and sends queries to the user for a WiFi
access point to connect to.
You don't need to spend extensive amounts of time manually configuring the
interfaces on your systems. Automatic configuration also aids you in
administration, because you can reconfigure network addresses with minimal
intervention.
To view your NWAM status, type the following command in a terminal window.
# svcs nwam
The OpenSolaris Network Auto-Magic Phase 0 page and nwamd man page contain
further details, including instructions for turning off the nwamd daemon, if
preferred. For more information and a link to the nwamd(1M) man page, see
http://www.opensolaris.org/os/project/nwam.
Zone Overview
A zone can be thought of as a container in which one or more applications run
isolated from all other applications on the system. Most software that runs on
OpenSolaris will run unmodified in a zone. Since zones do not change the
OpenSolaris application programming interfaces (APIs) or application binary
interface (ABI), recompiling an application is not necessary in order to run it
inside a zone.
Zones Administration
Zone administration consists of the following commands:
■ zonecfg – Creates zones, configures zones (add resources and properties).
Stores the configuration in a private XML file under /etc/zones.
■ zoneadm – Performs administrative steps for zones such as list, install,
(re)boot, and halt.
■ zlogin – Allows user to log in to the zone to perform maintenance tasks.
■ zonename – Displays the current zone name.
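The zonecfg command can also read its subcommands from a file, which is handy for scripting zone creation. The fragment below is a hypothetical command file; the zone name, zonepath, IP address, and network interface are illustrative assumptions, not values from this course.

```shell
# web.cfg -- hypothetical zonecfg(1M) command file.
# Apply it with:  zonecfg -z web -f web.cfg
create
set zonepath=/export/home/web
add net
set address=192.168.0.51
set physical=bge0
end
verify
commit
```

Applying this file produces the same configuration as typing the subcommands at the interactive zonecfg prompt, as shown in the lab exercise below.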
Summary
This exercise uses detailed examples to help you understand the process of
creating, installing, and booting a zone.
Note – The following example uses a shared-IP stack, which is the default for a
zone.
1 Use the following example to configure a new zone named Apache:
# zonecfg -z Apache
Apache: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:Apache> create
zonecfg:Apache> set zonepath=/export/home/Apache
zonecfg:Apache> add net
zonecfg:Apache:net> set address=192.168.0.50
zonecfg:Apache:net> set physical=bge0
zonecfg:Apache:net> end
zonecfg:Apache> verify
zonecfg:Apache> commit
zonecfg:Apache> exit
2 Use the following example to install and boot your new zone:
# zoneadm -z Apache install
Preparing to install zone <Apache>.
Creating list of files to copy from the global zone.
Copying <6029> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <1038> packages on the zone.
Initialized <1038> packages on zone.
Zone <Apache> is initialized.
Installation of these packages generated warnings: ....
The file </export/home/Apache/root/var/sadm/system/logs/install_log>
contains a log of the zone installation.
The necessary directories are created. The zone is ready for booting.
# mount
/export/home/Apache/root/lib on /lib read only
/export/home/Apache/root/platform on /platform read only
/export/home/Apache/root/sbin on /sbin read only
/export/home/Apache/root/usr on /usr read only
/export/home/Apache/root/proc on proc
read/write/setuid/nodevices/zone=Apache
Summary
Simultaneous access to both web servers will be configured so that each web
server, and the system as a whole, remains protected should one server become
compromised.
10 Browse to http://apache2zone/manual/ to confirm that the Apache2 web server
is up and running.
Discussion
The end user sees each zone as a different system. Each web server has its own
name service:
■ /etc/nsswitch.conf
■ /etc/resolv.conf
A malicious attack on one web server is contained to that zone. Port conflicts are
no longer a problem!
Creating ZFS Storage Pools and File Systems

Each ZFS storage pool comprises one or more virtual devices, which describe the
layout of physical storage and its fault characteristics.

➊–➋ In this module, we'll start by learning about mirrored storage pool
configuration.
➌–➏ Then we'll show you how to create a RAID-Z configuration.

➊ The most basic building block for a storage pool is a piece of physical
storage. This can be any block device of at least 128 Mbytes in size. Typically,
this is a hard drive that is visible to the system in the /dev/dsk directory.

➋ A storage device can be a whole disk (c0t0d0) or an individual slice
(c0t0d0s7). The recommended mode of operation is to use an entire disk, in which
case the disk does not need to be specially formatted. ZFS formats the disk
using an EFI label to contain a single, large slice.
Summary
ZFS is easy, so let's get on with it! It's time to create your first pool:
2 Create a mirrored storage pool named tank. Then, display information about the
pool.
# zpool create tank mirror c1t1d0 c2t2d0
# zpool status tank
pool: tank
state: ONLINE
scrub: none requested
config:
The capacity of the c1t1d0 and c2t2d0 disks is 36 Gbytes each. Because the disks
are mirrored, the total capacity of the pool reflects the approximate size of one of
the disks. Pool metadata consumes a small quantity of disk space. For example:
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
tank 33.8G 89K 33.7G 0% ONLINE -
By using ZFS file system features available in the OpenSolaris project, such as
snapshots and rollback, you might be able to simplify your kernel development
environment.
Summary
In this lab, we'll use the zfs command to create a filesystem and set its
mountpoint.
3 Then, set the mount point for the tank/home file system:
# zfs set mountpoint=/export/home tank/home
6 Take a recursive snapshot of the tank/home file system. Then, display the
snapshot information:
# zfs snapshot -r tank/home@today
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
tank 252K 33.2G 26.5K /tank
tank/home 128K 33.2G 29.5K /tank/home
Summary
You might want to create a RAID-Z configuration as an alternative to a mirrored
storage pool configuration if you need to maximize disk space.
2 Create a pool with a single RAID-Z device consisting of 5 disks. Then, display
information about the storage pool.
# zpool create tank raidz c1t1d0 c2t2d0 c3t3d0 c4t4d0 c5t5d0
# zpool status tank
pool: tank
state: ONLINE
scrub: none requested
config:
Disks can be specified by using their shorthand name or the full path. For
example, /dev/dsk/c4t4d0 is identical to c4t4d0.
It is possible to use disk slices for both mirrored and RAID-Z storage pool
configurations, but these configurations are not recommended for production
environments. For more information about using ZFS in production
environments, go to:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide.
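To make the "maximize disk space" trade-off concrete, here is a rough capacity comparison for the five 36-Gbyte disks used in this lab. The arithmetic ignores pool metadata overhead, and the mirrored figure assumes the disks would instead be arranged as two 2-disk mirror pairs (leaving one disk unused), so treat both numbers as approximations.

```shell
# RAID-Z keeps roughly one disk's worth of parity; mirroring halves
# the raw capacity of the disks it uses.
raidz=$(( (5 - 1) * 36 ))     # 5-disk RAID-Z    -> ~144 GB usable
mirror=$(( (4 / 2) * 36 ))    # two mirror pairs -> ~72 GB usable
echo "RAID-Z: ~${raidz} GB  mirrored: ~${mirror} GB"
```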
M O D U L E 4
Userland Consolidations
Objectives
The objective of this module is to introduce you to the userland consolidations of
OpenSolaris. In general, you can think of userland consolidations as existing
outside of the kernel and as components with which users interact. Each of the
following consolidations delivers source files to the opensolaris.org web site or
download center. To access each consolidation, refer to the following URL:
http://opensolaris.org/os/downloads/
Userland Consolidations and Descriptions
Developer Product Tools (DevPro) – The system math library, the media library,
the microtasking library, SCCS and make, and C++ runtime libraries.

Java Platform, Standard Edition (Java SE) – Binaries for the Java Development
Kit (JDK) and Java Runtime Environment (JRE) are available.

SFW (Solaris FreeWare) – Open source software that is bundled with
Solaris/OpenSolaris.

SPARC Graphics Support – The SPARC graphics consolidation has drivers available
in binary form.
M O D U L E 5
Objectives
The objective of this module is to describe the core features of the Solaris OS and
how the features have fundamentally changed operating system computing.
Additional Resources
OpenSolaris Development Process:
http://www.opensolaris.org/os/community/onnv/os_dev_process/
Development Process and Coding Style
The purpose of the Integration phase is to make sure that everything that was
supposed to be done has in fact been done, which means conducting reviews for
code, documentation, and completeness.
➊ The formal process document for OpenSolaris describes the previous steps in
greater detail, with flow charts that illustrate the development phases. That
document also details the following design principles and core values that are to
be applied to source code development for the OpenSolaris project:
■ Reliability – OpenSolaris must perform correctly, providing accurate results
with no data loss or corruption.
■ Availability – Services must be designed to be restartable in the event of an
application failure and OpenSolaris itself must be able to recover from
non-fatal hardware failures.
■ Serviceability – It must be possible to diagnose both fatal and transient issues
and wherever possible, automate the diagnosis.
■ Security – OpenSolaris security must be designed into the operating system,
with mechanisms in place in order to audit changes done to the system and by
whom.
■ Performance – The performance of OpenSolaris must be second to none
when compared to other operating systems running on identical
environments.
■ Manageability – It must allow for the management of individual components,
software or hardware, in a consistent and straightforward manner.
■ Compatibility – New subsystems and interfaces must be extensible and
versioned in order to allow for future enhancements and changes without
sacrificing compatibility.
■ Maintainability – OpenSolaris must be architected so that common
subroutines are combined into libraries or kernel modules that can be used by
an arbitrary number of consumers.
Overview
Now that you have considered the development environment, processes, and
values applied to engineering by OpenSolaris developers, let's discuss in more
depth, features of the operating system that exemplify performance, security,
serviceability, and manageability:
■ Performance
■ FireEngine
■ Nemo
■ Crossbow
■ Security
■ Least Privilege
■ Packet Filtering
■ Zones
■ Branded Zones (BrandZ)
■ Serviceability
■ Predictive Self-Healing
■ Dynamic Tracing Facility (DTrace)
■ Modular Debugger (MDB)
■ Manageability
■ Services Management Facility (SMF)
■ ZFS
FireEngine
The "FireEngine" approach in Solaris 10 merges all protocol layers into one
STREAMS module that is fully multithreaded. Inside the merged module, instead
of per-data structure locks, a per-CPU synchronization mechanism called
"vertical perimeter" is used. The "vertical perimeter" is implemented using a
serialization queue abstraction called "squeue." Each squeue is bound to a CPU,
and each connection is in turn bound to a squeue that provides any
synchronization and mutual exclusion needed for the connection-specific data
structures.
Synchronization
Since the stack is fully multithreaded (barring the per-CPU serialization enforced
by the vertical perimeter), it uses a reference-based scheme to ensure that the
connection instance is available when needed. For an established TCP
connection, three references are guaranteed to be on it. Each protocol layer has a
reference on the instance (one each for TCP and IP) and the classifier itself has a
reference since it is an established connection. Each time a packet arrives for the
connection and the classifier looks up the connection instance, an extra reference
is placed, which is dropped when the protocol layer finishes processing that
packet.
There is a fully multithreaded UDP module running under the same protection
domain as IP. Though UDP and IP are running in the same protection domain,
they are still separate STREAMS modules. Therefore, STREAMS plumbing is
kept unchanged and a UDP module instance is always pushed above IP. The
Solaris 10 platform allows for the following plumbing modes:
■ Normal – IP is opened first, and UDP is then pushed directly on top. This is
the default action when a UDP socket or device is opened.
■ SNMP – UDP is pushed on top of a module other than IP. When this happens,
only SNMP semantics are supported.
GLDv3
Solaris 10 software introduces a new device driver framework called GLDv3
along with the new stack. Most of the major device drivers were ported to this
framework, and all future and 10 Gb device drivers will be based on it. The
framework also provides a STREAMS-based DLPI layer for backward
compatibility (to allow external, non-IP modules to continue to work).
Virtualization
Project Crossbow creates virtual stacks around any service (HTTP, HTTPS, FTP,
NFS, etc.), protocol (TCP, UDP, SCTP, etc.), or Solaris Containers technology.
The virtual stacks are separated by means of a H/W classification engine such that
traffic for one stack does not impact other virtual stacks. Each virtual stack can be
assigned its own priority and bandwidth on a shared NIC without causing
performance degradation to the system or the service/container. The architecture
dynamically manages priority and bandwidth resources, and can provide better
defense against denial-of-service attacks directed at a particular service or
container by isolating the impact to just that service or container.
Least Privilege
UNIX® has historically had an all-or-nothing privilege model that imposes the
following restrictions:
■ No way to limit root user privileges
■ No way for non-root users to perform privileged operations
■ Applications needing only a few privileged operations must run as root
■ Very few users are trusted with root privileges, and virtually no students are
so trusted
Packet Filtering
Solaris IP Filter provides stateful packet filtering and network address translation
(NAT). Solaris IP Filter is derived from the open source IP Filter software. IP
Filter can filter by IP address, port, protocol, or network interface according to
filter rules.
IP Filter
The Packet Filtering Hooks (PFHooks) API was introduced in Solaris 10
Update 4 to replace the STREAMS-based implementation of IP Filter. With the
PFHooks framework, the performance of firewall software such as IP Filter is
significantly improved. PFHooks also provides the ability to intercept loopback
and inter-zone traffic. Third-party firewall software is developed and registered
with the PFHooks API using the net_register_hook() interface.
To use Solaris IP Filter, enter your filter rules in the /etc/ipf/ipf.conf
file. Then enable and restart the svc:/network/ipfilter service by using the
svcadm command.
Note – You can also use the ipf command to work with the rule sets.
Solaris IP Filter can perform network address translation (NAT) for a source
address or a destination address according to NAT rules. Following is an example
of a NAT rule:
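The following is a representative ipnat rule (the interface name hme0 and the address ranges are illustrative); it rewrites the source address of outbound packets from the internal network to the address of the external interface, choosing translated port numbers automatically:

```
map hme0 10.0.0.0/8 -> 0/32 portmap tcp/udp auto
```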
Note – You can also use the ipnat command to work with rule sets.
Block any inbound packets on le0 which are fragmented and too short to do any
meaningful comparison on. This actually only applies to TCP packets, which can
be missing the flags/ports (depending on which part of the fragment you see).
block in log quick on le0 from any to any with short frag
Log all inbound TCP packets with the SYN flag (only) set:
log in on le0 proto tcp from any to any flags S/SA
Note – If an inbound TCP packet with the SYN flag set also had IP options
present, this rule and the rule above would cause it to be logged twice.
Block and log any inbound ICMP unreachable packets on le0:
block in log on le0 proto icmp from any to any icmp-type unreach
Block and log any inbound UDP packets on le0 which are going to port 2049 (the
NFS port).
block in log on le0 proto udp from any to any port = 2049
Block any inbound TCP packets with only the SYN flag set that are destined for
these subnets.
Zones
A zone is a virtual operating system abstraction that provides a protected
environment in which applications run. The applications are protected from each
other to provide software fault isolation. To ease the labor of managing multiple
applications and their environments, they co-exist within one operating system
instance, and are usually managed as one entity.
A small number of applications which are normally run as root or with certain
privileges may not run inside a zone if they rely on being able to access or change
global system state.
■ An application which accesses the network and files, and performs no other
I/O, should work correctly.
■ Applications which require direct access to certain devices, for example, a disk
partition, will usually work if the zone is configured correctly. However, in
some cases this may increase security risks.
■ Applications which require direct access to devices such as /dev/kmem or a
network device may need to be modified to work correctly. Such applications
should instead use one of the many IP services.
➌ Zones can be used as an instructional tool or infrastructure component.
➍ For example, you can allocate each student an IP address and a zone and allow
them all to safely share one machine.
➊–➍ Zones can be combined with the resource management facilities which are
present in OpenSolaris to provide more complete, isolated environments. While
the zone supplies the security, name space and fault isolation, the resource
management facilities can be used to prevent processes in one zone from using
too much of a system resource or to guarantee them a certain service level.
Together, zones and resource management are often referred to as containers.
See http://opensolaris.org/os/community/zones/faq for answers to a large
number of common questions about zones and links to the latest administration
documentation.
Zones provide protected environments for Solaris applications. Separate and
protected run-time environments are also available through the OpenSolaris
project, by using BrandZ.
Zones Networking
Solaris zones can be designated as one of the following:
■ Exclusive-IP zone
■ Shared-IP zone
Shared-IP zones cannot change their network configuration or routing table and
cannot see the configuration of other zones. /dev/ip is not present in the
shared-IP zone. SNMP agents must open /dev/arp instead. Multiple shared-IP
zones can share a broadcast address and may join the same multi-cast group.
Exclusive-IP zones do not have the above limitations, and can change their
network configuration or routing table inside the zone. /dev/ip is present in the
exclusive-IP zone.
By default, all zones see all CPUs. Restricted view is enabled automatically when
resource pools are enabled.
Zones Devices
Each zone has its own devices. Zones see a subset of safe pseudo devices in their
/dev directory. Applications reference the logical path to a device presented in
/dev. The /dev directory exists in non-global zones, the /devices directory does
not. Devices like random, console, and null are safe, but others like /dev/kmem are
not.
Zones can modify the permissions of their devices but cannot issue mknod(2).
Physical device files, like those for raw disks, can be put in a zone with caution.
Devices may be shared among zones, but doing so requires careful consideration
of the security implications.
For example, you might have devices that you want to assign to specific zones.
Allowing unprivileged users to access block devices could permit those devices to
be used to cause system panic, bus resets, or other adverse effects. Placing a
physical device into more than one zone can create a covert channel between
zones. Global zone applications that use such a device risk the possibility of
compromised data or data corruption by a non-global zone.
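Placing a device into a zone is done with the zonecfg command; a representative session follows (the zone name myzone and the device path are illustrative):

```
# zonecfg -z myzone
zonecfg:myzone> add device
zonecfg:myzone:device> set match=/dev/dsk/c1t1d0s6
zonecfg:myzone:device> end
zonecfg:myzone> exit
```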
Predictive Self-Healing
Predictive self-healing was implemented in two ways in the Solaris 10 OS. This
section describes the new Fault Management Architecture and Services
Management Facility that make up the self-healing technology.
Find the DTrace community pages at http://opensolaris.org/os/community/dtrace.
➒ DTrace allows researchers to better and more quickly understand and improve
software systems.
➐–➒ In addition to DTrace, the OpenSolaris project provides debugging facilities
for low-level types of development, for example, device driver development.
➊–➋ In RAID-Z, ZFS uses variable-width RAID stripes so that all writes are full-stripe
writes. This feature is only possible because ZFS integrates filesystem and device
management in such a way that the filesystem's metadata has enough
information about the underlying data replication model to handle
variable-width RAID stripes. RAID-Z is the world's first software-only solution
to the RAID-5 write hole.
M O D U L E 6
Programming Concepts
Objectives
This module provides a high-level description of the fundamental concepts of the
OpenSolaris programming environment, as follows:
■ Process and System Management
■ Threaded Programming
■ Kernel Overview
■ CPU Scheduling
■ Process Debugging
Additional Resources
■ Solaris Internals (2nd Edition), Prentice Hall PTR (May 12, 2006) by Jim
Mauro and Richard McDougall
■ Solaris Systems Programming, Prentice Hall PTR (August 19, 2004), by Rich
Teer
■ Multithreaded Programming Guide. Sun Microsystems, Inc., 2005.
■ STREAMS Programming Guide. Sun Microsystems, Inc., 2005.
■ Solaris 64-bit Developer’s Guide. Sun Microsystems, Inc., 2005.
Process and System Management
Instructor Notes
Threaded Programming
Now that we've learned about processes in the context of tasks, projects, resource
pools, zones, and branded zones, let's discuss processes in the context of threads.
Traditional UNIX already supports the concept of threads; each process contains
at least one thread of control.
The pthread_create() function is called with attr, which holds the attributes
for the new thread, and start_routine, the function with which the new thread
begins execution. When start_routine returns, the thread exits with its exit
status set to the value returned by start_routine. pthread_create() returns
zero when the call completes successfully. Any other return value indicates that
an error occurred. Go to /on/usr/src/lib/libc/spec/threads.spec in
OpenGrok for the complete list of pthread functions and declarations.
Thread synchronization enables you to control program flow and access to
shared data for concurrently executing threads. The four synchronization objects
are mutex locks, read/write locks, condition variables, and semaphores.
■ Mutex locks allow only one thread at a time to execute a specific section of
code, or to access specific data.
■ Read/write locks permit concurrent reads and exclusive writes to a protected
shared resource. To modify a resource, a thread must first acquire the
exclusive write lock. An exclusive write lock is not permitted until all read
locks have been released.
■ Condition variables block threads until a particular condition is true.
■ Counting semaphores typically coordinate access to resources. The count is
the limit on how many threads can have access to a semaphore. When the
count is reached, the thread that is trying to access the resource blocks.
We can use OpenGrok to find libthread in the source code tree, and the second
most relevant result is found in mutex.c, accompanied by the following code
comment excerpt:
Now that you understand a bit about how synchronization objects are defined in
multi-threaded programming, let's learn how these objects are managed by using
scheduling classes.
CPU Scheduling
Processes run in a scheduling class with a separate scheduling policy applied to
each class, as follows:
■ Realtime (RT) – The highest-priority scheduling class provides a policy for
those processes that require fast response and absolute user or application
control of scheduling priorities. RT scheduling can be applied to a whole
process or to one or more lightweight processes (LWPs) in a process. You
must have the proc_priocntl privilege to use the Realtime class. See the
privileges(5) man page for details.
■ System (SYS) – The middle-priority scheduling class, the system class cannot
be applied to a user process.
■ Timeshare (TS) – The lowest-priority scheduling class is TS, which is also the
default class. The TS policy distributes the processing resource fairly among
processes with varying CPU consumption characteristics. Other parts of the
kernel can monopolize the processor for short intervals without degrading the
response time seen by the user.
■ Inter-Active (IA) – The IA policy distributes the processing resource fairly
among processes with varying CPU consumption characteristics, while also
providing good responsiveness for user interaction.
■ Fair Share (FSS) – The FSS policy distributes the processing resource fairly
among projects, independent of the number of processes they own by
specifying shares to control the process entitlement to CPU resources.
Resource usage is remembered over time, so that entitlement is reduced for
heavy usage and increased for light usage with respect to other projects.
■ Fixed-Priority (FX) – The FX policy provides a fixed priority preemptive
scheduling policy for those processes requiring that the scheduling priorities
do not get dynamically adjusted by the system and that the user or application
have control of the scheduling priorities. This class is a useful starting point
for affecting CPU allocation policies.
➊ Typing the man priocntl command in a terminal window shows the details of
each scheduling class and describes attributes and usage. For example:
% man priocntl
Reformatting page. Please Wait... done
NAME
priocntl - display or set scheduling parameters of specified
process(es)
SYNOPSIS
DESCRIPTION
The priocntl command displays or sets scheduling parameters
of the specified process(es). It can also be used to display
the current configuration information for the system’s pro-
cess scheduler or execute a command with specified schedul-
ing parameters.
Kernel Overview
Now that you have a high-level understanding of processes, threads, and
scheduling, let's discuss the kernel and how kernel modules are different from
user programs. The Solaris kernel does the following:
■ Manages the system resources, including file systems, processes, and physical
devices.
■ Provides applications with system services such as I/O management, virtual
memory, and scheduling.
■ Coordinates interactions of all user processes and system resources.
■ Assigns priorities, services resource requests, and services hardware interrupts
and exceptions.
■ Schedules and switches threads, pages memory, and swaps processes.
Process Debugging
Debugging processes at all levels of the development stack is a key part of writing
kernel modules.
➊ Again, use OpenGrok to quickly find the file and view its code comments, as
excerpted here:
A full search for libthread in OpenGrok reveals the following code comments
in the mdb_tdb.c file, which describe the connection between multi-threaded
debugging and how mdb works:
➊ In order to properly debug multi-threaded programs, the proc target
must be able to query and modify information such as a thread’s
register set using either the native LWP services provided by
libproc (if the process is not linked with libthread), or using the
services provided by libthread_db (if the process is linked with
libthread). Additionally, a process may begin life as a
single-threaded process and then later dlopen() libthread, so we
must be prepared to switch modes on-the-fly. There are also two
possible libthread implementations (one in /usr/lib and one in
/usr/lib/lwp) so we cannot link mdb against libthread_db directly;
instead, we must dlopen the appropriate libthread_db on-the-fly
based on which libthread.so the victim process has open. Finally,
mdb is designed so that multiple targets can be active
simultaneously, so we could even have *both* libthread_db’s open at
the same time. This might happen if you were looking at two
multi-threaded user processes inside of a crash dump, one using
/usr/lib/libthread.so and the other using
/usr/lib/lwp/libthread.so. To meet these requirements, we implement
a libthread_db "cache" in this file. The proc target calls
mdb_tdb_load() with the pathname of a libthread_db to load, and if it
is not already open, we dlopen() it, look up the symbols we need to
reference, and fill in an ops vector which we return to the caller.
Once an object is loaded, we don’t bother unloading it unless the
entire cache is explicitly flushed. This mechanism also has the nice
property that we don’t bother loading libthread_db until we need it,
so the debugger starts up faster.
The following mdb commands can be used to access the LWPs of a multi-threaded
program:
■ $l Prints the LWP ID of the representative thread if the target is a user process.
M O D U L E 7
➌ END will fire after all other probes are completed and
can be used to output results.
Objectives
➊–➍ The objective of this lab is to introduce you to DTrace by using a probe
script for a system call.
Additional Resources
■ Solaris Dynamic Tracing Guide. Sun Microsystems, Inc., 2007.
■ DTrace User Guide, Sun Microsystems, Inc., 2006
Enabling Simple DTrace Probes
Instructor Notes
Summary
We're going to start learning DTrace by building some very simple requests using
the probe named BEGIN, which fires once each time you start a new tracing
request. You can use the dtrace(1M) utility's -n option to enable a probe using its
string name.
After a brief pause, you will see dtrace tell you that one probe was enabled and
you will see a line of output indicating that the BEGIN probe fired. Once you see
this output, dtrace remains paused waiting for other probes to fire. Since you
haven't enabled any other probes and BEGIN only fires once, press Control-C in
your shell to exit dtrace and return to your shell prompt:
The output tells you that the probe named BEGIN fired once, and both its name
and integer ID, 1, are printed. Notice that by default, the integer ID of the CPU
on which this probe fired is displayed. In this example, the CPU column indicates
that the dtrace command was executing on CPU 0 when the probe fired.
The END probe fires once when tracing is completed. As you can see, pressing
Control-C to exit DTrace triggers the END probe. DTrace reports this probe
firing before exiting.
Summary
In the preceding examples, you learned to use two simple probes named BEGIN
and END. But where did these probes come from? DTrace probes come from a set
of kernel modules called providers, each of which performs a particular kind of
instrumentation to create probes. For example, the syscall provider provides
probes in every system call and the fbt provider provides probes into every
function in the kernel.
When you use DTrace, each provider is given an opportunity to publish the
probes it can provide to the DTrace framework. You can then enable and bind
your tracing actions to any of the probes that have been published.
The probes that are available on your system are listed with the following five
pieces of data:
■ ID - Internal ID of the probe listed.
■ Provider - Name of the Provider. Providers are used to classify the probes.
This is also the method of instrumentation.
■ Module - The name of the Unix module or application library of the probe.
■ Function - The name of the function in which the probe exists.
■ Name - The name of the probe.
4 Pipe the previous command to wc to find the total number of probes in your
system:
# dtrace -l | wc -l
30122
The number of probes that your system is currently aware of is listed in the
output. The number will vary depending on your system type.
# dtrace -l -P lockstat
ID PROVIDER MODULE FUNCTION NAME
4 lockstat genunix mutex_enter adaptive-acquire
5 lockstat genunix mutex_enter adaptive-block
6 lockstat genunix mutex_enter adaptive-spin
7 lockstat genunix mutex_exit adaptive-release
Only the probes that are available in the lockstat provider are listed in the
output.
# dtrace -l -m ufs
ID PROVIDER MODULE FUNCTION NAME
15 sysinfo ufs ufs_idle_free ufsinopage
16 sysinfo ufs ufs_iget_internal ufsiget
356 fbt ufs allocg entry
Only the probes that are in the UFS module are listed in the output.
# dtrace -l -f open
ID PROVIDER MODULE FUNCTION NAME
4 syscall open entry
5 syscall open return
116 fbt genunix open entry
117 fbt genunix open return
Only the probes with the function name open are listed.
# dtrace -l -n start
ID PROVIDER MODULE FUNCTION NAME
506 proc unix lwp_rtt_initial start
2766 io genunix default_physio start
2768 io genunix aphysio start
5909 io nfs nfs4_bio start
The above command lists all the probes that have the probe name start.
Programming in D
Now that you understand a little bit about naming, enabling, and listing probes,
you're ready to write the DTrace version of everyone's first program, "Hello,
World."
Summary
This lab demonstrates that, in addition to constructing DTrace experiments on
the command line, you can also write them in text files using the D programming
language.
As you can see, dtrace printed the same output as before followed by the text
“hello, world”. Unlike the previous example, you did not have to wait and press
Control-C, either. These changes were the result of the actions you specified for
your BEGIN probe in hello.d. Let's explore the structure of your D program in
more detail in order to understand what happened.
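For reference, a hello.d consistent with the discussion that follows (it matches the canonical example in the Solaris Dynamic Tracing Guide) looks like this:

```
BEGIN
{
        trace("hello, world");
        exit(0);
}
```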
Discussion
Each D program consists of a series of clauses, each clause describing one or more
probes to enable, and an optional set of actions to perform when the probe fires.
The actions are listed as a series of statements enclosed in braces { } following the
probe name. Each statement ends with a semicolon (;).
Your first statement uses the function trace() to indicate that DTrace should
record the specified argument, the string “hello, world”, when the BEGIN probe
fires, and then print it out. The second statement uses the function exit() to
indicate that DTrace should cease tracing and exit the dtrace command.
DTrace provides a set of useful functions like trace() and exit() for you to call
in your D programs. To call a function, you specify its name followed by a
parenthesized list of arguments. The complete set of D functions is described in
Solaris Dynamic Tracing Guide.
M O D U L E 8
Objectives
The objective of this module is to use DTrace to monitor application events.
Additional Resources
Application Packaging Developer’s Guide. Sun Microsystems, Inc., 2005.
Enabling User Mode Probes
Instructor Notes
Enabling User Mode Probes
DTrace allows you to dynamically add probes into user-level functions. The user
code does not need any recompilation, special flags, or even a restart. DTrace
probes can be turned on just by calling the provider.
➊ The pid provider is extremely flexible and allows you to instrument any
instruction in user land, including entry and exit.
➋ The pid provider creates probes on the fly when they are needed. This is why
they do not appear in the dtrace -l listing.
➌ You can use the pid provider to trace function boundaries or any arbitrary
instruction in a given function.
➊–➌ A probe description has the following syntax:
pid:mod:function:name
■ pid – pid followed by the process ID (for example, pid5234)
■ mod – name of the library or a.out (executable)
■ function – name of the function
■ name – entry for function entry, return for function return
DTracing Applications
In this exercise we will learn to use DTrace on user applications.
Summary
This lab builds on the use of a process ID in the probe description to trace the
associated application. The steps increase in complexity to the end of the exercise,
increasing the amount and depth of information about the application behavior
that is output.
To DTrace gcalctool
1 From the Application or Program menu, start the calculator.
This number is the process ID of the calc process; we will call it procid.
3 Follow the steps below to create a D-script that counts the number of times any
function in the gcalctool is called.
c. In the action section, add an aggregate to count the number of times the
function is called using the aggregate statement @[probefunc]=count().
pid$1:::entry
{
@[probefunc]=count();
}
Note – The DTrace script collects data and waits for you to stop the collection by
pressing Control+C. If you do not explicitly print the aggregation you collected,
DTrace prints it for you when the script exits.
b. Press Control+C in the window where you ran the D-script to see the output.
6 Finally, modify the script to find how much time is spent in each function.
d. In the action section of the first probe, save the timestamp in variable ts.
timestamp is a DTrace built-in variable that counts the number of nanoseconds
from some point in the past.
e. In the action section of the second probe calculate nanoseconds that have
passed using the following aggregation:
@[probefunc]=sum(timestamp - ts)
b. Press Control+C in the window where you ran the D-script to see the output.
^C
gdk_xid__equal 2468
_XSetLastRequestRead 2998
_XDeq 3092
...
The left column shows you the name of the function and the right column shows
you the amount of wall clock time that was spent in that function. The time is in
nanoseconds.
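The clauses described in steps d and e can be sketched as the following D fragment (a minimal sketch; a thread-local self->ts keeps concurrent threads from overwriting one another's saved timestamp):

```
pid$1:::entry
{
        self->ts = timestamp;
}

pid$1:::return
/self->ts/
{
        @[probefunc] = sum(timestamp - self->ts);
        self->ts = 0;
}
```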
M O D U L E 9
Objectives
The examples in this module demonstrate the use of DTrace to diagnose C++
application errors. These examples are also used to compare DTrace with other
➊–➍ application debugging tools, including Sun Studio 10 software and mdb.
Using DTrace to Profile and Debug A C++ Program
Instructor Notes
Using DTrace to Profile and Debug A C++ Program
➊ There is no limit (except system resources) on the number of D scripts that can
be run simultaneously.
➋ Different users can debug the system simultaneously without causing data
corruption or collision issues.
A sample program, CCtest, was created to demonstrate an error common to C++
applications -- the memory leak. In many cases, a memory leak occurs when an
object is created but never destroyed, and such is the case with the program
contained in this module.
➊–➋ When debugging a C++ program, you may notice that your compiler converts
some C++ names into mangled, semi-intelligible strings of characters and digits.
This name mangling is an implementation detail required to support C++
function overloading, to provide valid external names for C++ function names
that include special characters, and to distinguish instances of the same name
declared in different namespaces and classes.
For example, using nm to extract the symbol table from a sample program named
CCtest produces the following output:
# /usr/ccs/bin/nm CCtest
...
[61] | 134549248| 53|FUNC |GLOB |0 |9 |__1cJTestClass2T5B6M_v_
[85] | 134549301| 47|FUNC |GLOB |0 |9 |__1cJTestClass2T6M_v_
[76] | 134549136| 37|FUNC |GLOB |0 |9 |__1cJTestClass2t5B6M_v_
[62] | 134549173| 71|FUNC |GLOB |0 |9 |__1cJTestClass2t5B6Mpc_v_
[64] | 134549136| 37|FUNC |GLOB |0 |9 |__1cJTestClass2t6M_v_
[89] | 134549173| 71|FUNC |GLOB |0 |9 |__1cJTestClass2t6Mpc_v_
[80] | 134616000| 16|OBJT |GLOB |0 |18 |__1cJTestClassG__vtbl_
[91] | 134549348| 16|FUNC |GLOB |0 |9 |__1cJTestClassJClassName6kM_pc_
...
Note – Source code and makefile for CCtest are included at the end of this module.
From this output, you may correctly assume that a number of these mangled
symbols are associated with a class named TestClass, but you cannot readily
determine whether these symbols are associated with constructors, destructors,
or class functions.
The Sun Studio compiler includes the following three utilities that can be used to
translate the mangled symbols to their C++ counterparts: nm -C, dem, and
c++filt.
Note – Sun Studio 10 software is used here, but the examples were tested with both
Sun Studio 9 and 10.
If your C++ application was compiled with gcc/g++, you have an additional
choice for demangling your application -- in addition to c++filt, which
recognizes both Sun Studio and GNU mangled names, the open source gc++filt
found in /usr/sfw/bin can be used to demangle the symbols contained in your
g++ application.
# dem `nm CCtest | awk -F\| '{ print $NF; }'` | egrep "new|delete"
__1c2k6Fpv_v_ == void operator delete(void*)
__1c2n6FI_pv_ == void*operator new(unsigned)
#!/usr/sbin/dtrace -s
pid$1::__1c2n6FI_pv_:entry
{
@n[probefunc] = count();
}
pid$1::__1c2k6Fpv_v_:entry
{
@d[probefunc] = count();
}
END
{
printa(@n);
printa(@d);
}
Start the CCtest program in one window, then execute the script we just created
in another window as follows:
The DTrace output is piped through c++filt to demangle the C++ symbols, with
the following caution.
Caution – You can't exit the DTrace script with a ^C as you would normally,
because c++filt would be killed along with dtrace, leaving you with no output.
To display the output of this command, go to another window on your system
and type:
# pkill dtrace
Window 1:
# ./CCtest
Window 2:
Window 3:
# pkill dtrace
The output of our aggregation script in window 2 should look like this:
void*operator new(unsigned) 12
void operator delete(void*) 8
So, we may be on the right track with the theory that we are creating more objects
than we are deleting.
Let's check the memory addresses of our objects and attempt to match the
instances of new() and delete(). The DTrace argument variables are used to
display the addresses associated with our objects. Since a pointer to the object is
contained in the return value of new(), we should see the same pointer value as
arg0 in the call to delete(). With a slight modification to our initial script, we
now have the following script, named CCaddr.d:
# pkill dtrace
Our output looks like a repeating pattern of three calls to new() and two calls to
delete():
As you inspect the repeating output, a pattern emerges. It seems that the first
new() of the repeating pattern does not have a corresponding call to delete(). At
this point we have identified the source of the memory leak!
Let's continue with DTrace and see what else we can learn from this information.
We still do not know what type of class is associated with the object created at
address 809e480. Including a call to ustack() on entry to new() provides a hint.
Here's the modification to our previous script, renamed CCstack.d:
/*
__1c2k6Fpv_v_ == void operator delete(void*)
__1c2n6FI_pv_ == void*operator new(unsigned)
*/
pid$1::__1c2n6FI_pv_:entry
{
ustack();
}
pid$1::__1c2n6FI_pv_:return
{
printf("%s: %x\n", probefunc, arg1);
}
pid$1::__1c2k6Fpv_v_:entry
{
printf("%s: %x\n", probefunc, arg0);
}
libCrun.so.1‘void*operator new(unsigned)
CCtest‘main+0x19
CCtest‘0x8050cda
void*operator new(unsigned): 80a2bd0
libCrun.so.1‘void*operator new(unsigned)
CCtest‘main+0x57
CCtest‘0x8050cda
void*operator new(unsigned): 8068a70
libCrun.so.1‘void*operator new(unsigned)
CCtest‘main+0x9a
CCtest‘0x8050cda
void*operator new(unsigned): 80a2bf0
void operator delete(void*): 8068a70
void operator delete(void*): 80a2bf0
Our constructor is called after the call to new(), at offset main+0x23. So, we have
identified an object created via the constructor __1cJTestClass2t5B6M_v_ that is
never destroyed. Using dem to demangle this symbol produces:
# dem __1cJTestClass2t5B6M_v_
__1cJTestClass2t5B6M_v_ == TestClass::TestClass #Nvariant 1()
Thus, a call to new TestClass() at main+0x19 is the cause of the memory leak.
Examining the CCtest.cc source file reveals:
...
t = new TestClass();
cout << t->ClassName();
delete(t);
delete(tt);
...
It's clear that the first allocation, t = new TestClass();, is overwritten by
the second, t = new TestClass((const char *)"Hello.");, before it is ever
deleted. The memory leak has been identified and a fix can be implemented.
The DTrace pid provider allows you to enable a probe at any instruction
associated with a process that is being examined. This example is intended to
model the DTrace approach to interactive process debugging. DTrace features
used in this example include: aggregations, displaying function arguments and
return values, and viewing the user call stack. The dem and c++filt commands in
Sun Studio software and the gc++filt command in gcc were used to extract the
function probes from the program symbol table and to display the DTrace output
in a source-compatible format. Source files created for this example:
TestClass.h:
class TestClass
{
public:
TestClass();
TestClass(const char *name);
TestClass(int i);
virtual ~TestClass();
virtual char *ClassName() const;
private:
char *str;
};
TestClass.cc:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include "TestClass.h"
TestClass::TestClass() {
str=strdup("empty.");
}
TestClass::TestClass(int i) {
str=(char *)malloc(128);
sprintf(str, "Integer = %d", i);
}
TestClass::~TestClass() {
if ( str )
free(str);
}
CCtest.cc:
#include <iostream.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "TestClass.h"
int main()
{
    TestClass *t, *tt;
    ...
    while (1) {
        t = new TestClass();
        ...
        cout << t->ClassName();
        delete(t);
        delete(tt);
        sleep(1);
    }
}
OBJS=CCtest.o TestClass.o
PROGS=CCtest
CC=CC
all: $(PROGS)
echo "Done."
clean:
rm $(OBJS) $(PROGS)
CCtest: $(OBJS)
$(CC) -o CCtest $(OBJS)
.cc.o:
$(CC) $(CFLAGS) -c $<
Module 10
Objectives
This module will build on what we've learned about using DTrace to observe
processes by examining a page fault. Then, we'll incorporate low-level debugging
with MDB to find the problem in the code.
Additional Resources
Solaris Modular Debugger Guide, Sun Microsystems, Inc., 2007.
Software Memory Management

Instructor Notes
➊ The particular fault shown in this module is a major page fault; that is, it results in I/O to the disk.
➋ By contrast, a minor page fault does not result in I/O.
➌ For example, paging in a page of code for an executable is a major fault.

OpenSolaris memory management uses software constructs called segments to
manage the virtual memory of processes as well as of the kernel itself. Most of
the data structures involved in the software side of memory management are
defined in /usr/include/vm/*.h. In this module, we'll examine the code and data
structures used to handle page faults. ➊–➍
Summary
We'll start with a DTrace script to trace the actions of a single page fault for a
given process. The script prints the user virtual address that caused the fault, and
then traces every function that is called from the time of the fault until the page
fault handler returns. We'll use the output of the script to determine what source
code needs to be examined for more detail.
Note – In this module, we've added text to the extensive code output to guide the
exercise. Look for the <---- symbol to find the associated text in the output.
#!/usr/sbin/dtrace -Fs
pagefault:entry
/execname == $$1/
{
printf("fault occurred on address = %p\n", args[0]);
self->in = 1;
}
pagefault:return
/self->in == 1/
{
self->in = 0;
exit(0);
}
entry
/self->in == 1/
{
}
return
/self->in == 1/
{
}
Note – You need to specify mozilla-bin as the executable name, because mozilla
is not an exact match for the process name. Also, assertions are turned on, so
you'll see various calls to mutex_owner(), for instance, which is used only
with ASSERT(). Assertions are enabled only in debug kernels.
# ./pagefault.d mozilla-bin
dtrace: script './pagefault.d' matched 42626 probes
Remember that the above output has been shortened. At a high level, the
following has happened on the page fault:
4. Use mdb to examine the kernel data structures and locate the page of
physical memory that corresponds to the fault, as follows:
Note – The search for the segment containing the fault address found the
correct segment after examining 8 segments. See the calls to as_segcompar in
the DTrace output above. Using an AVL tree shortens the search!
Note – If you want to follow along, you may want to use: ::log /tmp/logfile
in mdb and then !vi /tmp/logfile to search. Or, you can just run mdb within
an editor buffer.
# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace
ufs ip sctp usba random fctl s1394
nca lofs crypto nfs audiosup sppp cpc fcip ptm ipc ]
> ::ps !grep mozilla-bin <-- find the mozilla-bin process
R 933 919 887 885 100 0x42014000 ffffffff81d6a040 mozilla-bin
> bd62*1000=K <-- multiply page frame number times page size (hex)
bd62000 <-- here is the physical address of the page
> bd62000+ea2,10/ai <-- data looks like code; let's try dumping as code
0xbd62ea2:
0xbd62ea2: pushq %rbp
0xbd62ea3: movl %esp,%ebp
> 0::context
debugger context set to kernel
d. Extra credit: walk the page tables of the process to see how a virtual address
gets translated into a physical one.
Module 11
Objectives
The objective of this module is to learn about how you can use DTrace to debug
your driver development projects by reviewing a case study.
Porting the smbfs Driver from Linux to the Solaris OS
Instructor Notes
First, create an smbfs driver template based on Sun's nfs driver. After the
driver compiles, test that it can be loaded and unloaded. Copy the prototype
driver to /usr/kernel/fs and attempt to modload it by hand:
# modload /usr/kernel/fs/smbfs
can't load module: Out of memory or no room in system tables
Searching the source for the "system call missing" message reveals that it is
printed by the function mod_getsysent() in the file modconf.c, after a failed
call to mod_getsysnum(). Instead of manually tracing the flow of
mod_getsysnum() from source file to source file, here's a simple DTrace script
that enables all entry and return events in the fbt (Function Boundary
Tracing) provider once mod_getsysnum() is entered.
#!/usr/sbin/dtrace -s
fbt::mod_getsysnum:entry
/execname == "modload"/
{
self->follow = 1;
}
fbt::mod_getsysnum:return
{
self->follow = 0;
trace(arg1);
}
fbt:::entry
/self->follow/
{
}
fbt:::return
/self->follow/
{
trace(arg1);
}
Executing this script and running the modload command in another window
produces the following output:
# ./mod_getsysnum.d
dtrace: script './mod_getsysnum.d' matched 35750 probes
CPU FUNCTION
0 -> mod_getsysnum
0 -> find_mbind
0 -> nm_hash
0 <- nm_hash 41
0 -> strcmp
0 <- strcmp 4294967295
0 -> strcmp
0 <- strcmp 7
0 <- find_mbind 0
0 <- mod_getsysnum 4294967295
To view the strings being compared, we add a strcmp() probe to our previous
mod_getsysnum.d script:
fbt::strcmp:entry
{
printf("name:%s, hash:%s", stringof(arg0),
stringof(arg1));
}
Here are the results of our next attempt to load our driver:
# ./mod_getsysnum.d
dtrace: script './mod_getsysnum.d' matched 35751 probes
CPU FUNCTION
0 -> mod_getsysnum
0 -> find_mbind
0 -> nm_hash
0 <- nm_hash 41
0 -> strcmp
0 | strcmp:entry name:smbfs, hash:timer_getoverrun
0 <- strcmp 4294967295
0 -> strcmp
0 | strcmp:entry name:smbfs, hash:lwp_sema_post
0 <- strcmp 7
0 <- find_mbind 0
0 <- mod_getsysnum 4294967295
So we're looking for smbfs in a hash table, and it's not present. How does smbfs
get into this hash table? Let's return to find_mbind() and observe that the hash
table variable sb_hashtab is passed to the failing nm_hash() function.
A quick search of the source code reveals that sb_hashtab is initialized by a
call to read_binding_file(), which takes as its arguments a config file, the
hash table, and a function pointer. A few more clicks in our source code
browser show that the config file involved is /etc/name_to_sysnum. Since
read_binding_file() is read only once, at boot time, we add the following
entry to that file and reboot:
smbfs 177
After the reboot, the modload succeeds:
# modload /usr/kernel/fs/smbfs
Note – Remember that this driver was based on an nfs template, which explains
this output.
# modunload -i 160
can't unload the module: Device busy
This is most likely due to an EBUSY errno return value. But now, since the smbfs
driver is a loaded module, we have access to all of the smbfs functions:
# dtrace -l fbt:smbfs:: | wc -l
1002
This is amazing! Without any special coding, we now have access to 1002 entry
and return probes contained in the driver. These 1002 function probes allow us
to debug our work without a special 'instrumented code' version of the driver!
Let's monitor all smbfs calls when modunload is called, using this simple
DTrace script:
#!/usr/sbin/dtrace -s
fbt:smbfs::entry
{
}
fbt:smbfs::return
{
trace(arg1);
}
It seems that the smbfs code is not being accessed by modunload. So, let's use
DTrace to look at modunload with this script:
#!/usr/sbin/dtrace -s
fbt::modunload:entry
{
self->follow = 1;
trace(execname);
trace(arg0);
}
fbt::modunload:return
{
self->follow = 0;
trace(arg1);
}
fbt:::entry
/self->follow/
{
}
fbt:::return
/self->follow/
{
trace(arg1);
}
# ./modunload.d
dtrace: script './modunload.d' matched 36695 probes
CPU FUNCTION
0 -> modunload modunload 160
0 | modunload:entry
0 -> mod_hold_by_id
0 -> mod_circdep
0 <- mod_circdep 0
0 -> mod_hold_by_modctl
0 <- mod_hold_by_modctl 0
0 <- mod_hold_by_id 3602566648
0 -> moduninstall
0 <- moduninstall 16
0 -> mod_release_mod
0 -> mod_release
0 <- mod_release 3602566648
0 <- mod_release_mod 3602566648
0 <- modunload 16
Observe that the EBUSY return value (16) is coming from moduninstall(). Let's
take a look at the source code for moduninstall(). It returns EBUSY in a few
locations, so let's consider the following possibilities:
1. if (mp->mod_prim || mp->mod_ref || mp->mod_nenabled != 0) return (EBUSY);
2. if (detach_driver(mp->mod_modname) != 0) return (EBUSY);
3. if (kobj_lookup(mp->mod_mp, "_fini") == NULL)
4. A failed call to the smbfs _fini() routine
We can't directly access all of these possibilities, but let's approach them from a
process of elimination. We'll use the following script to display the contents of the
various structures and return values in moduninstall:
#!/usr/sbin/dtrace -s
fbt::moduninstall:entry
{
self->follow = 1;
printf("mod_prim:%d\n", args[0]->mod_prim);
printf("mod_ref:%d\n", args[0]->mod_ref);
}
fbt::moduninstall:return
{
self->follow = 0;
trace(arg1);
}
fbt::kobj_lookup:entry
/self->follow/
{
}
fbt::kobj_lookup:return
/self->follow/
{
trace(arg1);
}
fbt::detach_driver:entry
/self->follow/
{
}
fbt::detach_driver:return
/self->follow/
{
trace(arg1);
}
# ./moduninstall.d
dtrace: script './moduninstall.d' matched 6 probes
CPU FUNCTION
0 -> moduninstall
mod_prim:0
mod_ref:0
Comparing this output with the code tells us that the failure is not due to the
mp structure values or to the return values from detach_driver() or
kobj_lookup().
Thus, by a process of elimination, it must be the status returned via the status =
(*func)(); call, which calls the smbfs _fini() routine. And here's what the
smbfs _fini() routine contains:
int _fini(void)
{
/* don't allow module to be unloaded */
return (EBUSY);
}
Changing the return value to 0 and recompiling results in a driver that we can
now load and unload; thus we have completed the objectives of this exercise.
We've used the Function Boundary Tracing (fbt) provider exclusively in these
examples. Note that fbt is only one of DTrace's many providers.
Appendix A
OpenSolaris Resources
To get more information, support, and training, use the following resources.
■ Community Documentation —
http://opensolaris.org/os/community/documentation
■ Sun Documentation — http://www.sun.com/documentation
■ Sun Support — http://www.sun.com/support
■ Sun Training — Sun offers a complete range of professional Solaris training
and certification options to help you apply this powerful platform for greater
success in your operations. To find out more about Solaris training, please go
to: http://www.sun.com/training/catalog/operating_systems/index.xml