
The Coda File System

Imagine, if you can, a network file system so reliable that the clients keep working even if the server falls over. Imagine using it at home without dialling up, or connecting once a day for a minute to synchronise with the server. That's Coda; not vapourware, but something that exists now. I'll discuss the Coda logo later.

Understanding Coda
Should I use it? How does it work? How do I set it up? How do I manage it?

After reading this, I hope you can answer the question `Should I use Coda?' To help you answer it, I'll try to explain what it does and how it works, what's involved in setting it up, and what's involved in keeping it going. I'll begin with an overview of Coda, continue with a deeper look at certain points, and then give a summary and an evaluation of Coda and its current state of development and usefulness.

Concepts and Features


We'll start with a little history, some concepts, and a quick look at a few big features of Coda.

Network File Systems


NFS; AFS and relatives; Sprite, GFS, Novell, Samba, and others

I've found it surprisingly difficult to pin down the early history of distributed file systems. The first reference I can find to Sun's NFS is a paper presented in 1985, when the software was already working and deployed. A few years later, when we started buying Hewlett Packard workstations in Manchester, Sun was urging other Unix vendors to include NFS with their operating systems, and HP were encouraging their customers to switch to NFS from the HP system, which I never tried myself. (It was called Remote File Access, or something like that.) The success of NFS is one example of the success of open standards in software.

In the same year, 1985, I find the first paper referring to AFS, the immediate ancestor of Coda and several other closely related file systems. AFS was developed (under a different name) at Carnegie Mellon University in Pittsburgh, and soon after 1985 its name was changed to `the Andrew File System' (after Andrew Carnegie and Andrew Mellon, the two patrons of Carnegie Mellon University). CMU spawned the Transarc Company to develop and market AFS.

I don't know exactly when the other network file systems arose, but I doubt any of them were earlier than NFS or AFS; some of them of course arrived on the scene much later.

AFS and its relatives


AFS; Arla; Coda; DFS

Coda is one of several relatives of AFS, which resemble each other in many respects, and which differ from other shared file systems.

AFS itself has been a commercial product since about 1987. It still exists, and it got a new lease on life when IBM bought Transarc. AFS is very well documented, and the documents are all available free on line; it's also discussed in a number of books about file systems. If you're interested in Coda, some of this information may be relevant. I'll say something about Coda's own documentation later.

The Arla project at the Royal Institute of Technology (KTH) in Stockholm is an attempt to create a free clone of AFS that interoperates with it. Arla has concentrated on duplicating the AFS client software and the user interface, though some work is now being done on Arla servers. This project is being encouraged by IBM and Transarc, who have donated some code to it, and some of the early AFS developers are helping out, at least in small ways.

Coda is a research project that began at CMU about 1987. It was designed as an AFS-like file system which supports mobile computing, and which is more robust than AFS when it faces network problems and server failures. A paper which appeared in 1987 says that the design of Coda is complete, but that no code has yet been written. The file system has been developed at CMU since then by the systems group of M. Satyanarayanan.

DFS is a file system that is based on AFS and was originally intended to supersede it. Several major computer manufacturers, including IBM, DEC, and HP, put together a `Distributed Computing Environment' which they hoped would become a sort of industry standard; the `Common Desktop Environment' CDE is perhaps the most successful piece of DCE. DFS itself has never really caught on, partly because of some early difficulties in getting it working, and partly, in my view, because of the excessive licence fees.
I have had some experience with all four of these systems, though I have seen relatively little of DFS, because we in Manchester decided not to buy it. These file systems are quite different from NFS or the other systems I have named. In some ways, the structure of data in AFS and DFS is managed globally, so that all AFS or DFS users in the world see the same file system. If I make a change to my AFS cell, it is visible immediately not only to all the machines in my cell, but to everyone else as well. Coda lacks the global aspect; so far as I can see, I can't make two Coda cells visible on one client (at least not without running two independent Coda client processes for two different mount points). But Coda management is cell-based, like that of AFS and DFS. In other words, if I mount something in the file system, it immediately appears everywhere in the cell.

Kinds of machine
Clients; file and data servers; System Control Machine (SCM); backup machines

A cell contains several kinds of machine, including, of course, the clients. A client is any machine which can see a cell's Coda file system. In addition to clients, there is another kind of machine which we may call a file and data server, since Coda deals with a lot of data besides just files. One of the file and data servers plays a special role in a cell: the System Control Machine, or SCM. It is the master server, and needs to be installed first. And if you plan to back up your Coda system, you will need a backup machine.

A single machine may perform all four functions, so that you can have a cell with only one machine in it. Servers do not need to be clients in a Coda cell, nor do backup machines. But you will need more than one file and data server to take advantage of Coda's best features.

The organisation of a Coda cell has some implications for security: you need to protect the integrity of your file and data servers. In general, a hacker who breaks into a Coda client cannot compromise the security of the servers. Of course, if he lurks undetected, with root access, the blackguard may learn something compromising from another user of the cell.

Clients: the Cache


In (virtual) memory or on disk; 100 - 200 MB?; cache manager (venus)

A client machine has an area called `the cache', where file system data stays temporarily. This cache is normally on disk, though in AFS at least it can be in virtual memory. The advantage of having all (or part) of your cache in memory is that, under optimal conditions, you can read files and directories faster than if they were on local disks. (The Linux buffer cache has something of the same effect.)

The cache needs to be big enough to hold the biggest file you wish to read or write, as well as some other data. So if I have a cache of 200 MB, I won't be able to read a file larger than this. Coda documents recommend a cache of about 20 MB, but I prefer to use a much larger one. It depends on the specific use you make of Coda. AFS and, I think, DFS allow you to cache only parts of a large file, and so get round the size limitation, though with some problems.

A local process called the cache manager manages the data in the cache; on Coda clients it is named `venus'.

Cache manager (venus)


Program calls open(); kernel VFS layer; kernel Coda module; venus process; contact file server; return location of data to kernel

Suppose a program running on my machine tries to open a file in the Coda file system. The program calls the function open(), which contacts the kernel's virtual file system layer. This notices that the file is in the Coda file system, and passes the request to the kernel's Coda module. This contacts the venus process through a character device in /dev.

The venus process is responsible for getting the requested data into the cache, and for making sure it is up-to-date. It may contact the file server to do this, and it may need to flush older data out of the cache to make room for the file. If it already has a copy of the file locally, it may ask the file server whether this is current, to avoid having to recopy it. Files can persist in the cache even across reboots, and of course the cache manager doesn't contact the server during disconnected operation. Incidentally, if you have read or written a file, and you want to be sure it isn't in the cache for security reasons, you can force the cache manager to get rid of it; but the data in the file will probably remain on disk.

When the cache manager has a current copy of the file in the cache, it returns this information to the kernel, which then opens it like any other file.
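
The open path just described can be sketched as a toy model in Python. This is not venus code: the classes `ToyServer` and `ToyCacheManager`, the integer version numbers, and the oldest-first eviction policy are all assumptions made for illustration. The sketch shows only the three decisions from the text: use the cache alone when disconnected, ask the server whether a cached copy is current, and flush older data to make room.

```python
from collections import OrderedDict


class ToyServer:
    """Hypothetical stand-in for a file server: path -> (version, data)."""

    def __init__(self):
        self.files = {}

    def store(self, path, data):
        # Each store bumps the version, so clients can detect stale copies.
        version = self.files.get(path, (0, b""))[0] + 1
        self.files[path] = (version, data)

    def version(self, path):
        return self.files[path][0]

    def fetch(self, path):
        return self.files[path]


class ToyCacheManager:
    """Hypothetical stand-in for the cache manager; not the real venus."""

    def __init__(self, server, capacity):
        self.server = server
        self.capacity = capacity        # cache size in bytes
        self.cache = OrderedDict()      # path -> (version, data), oldest first
        self.connected = True

    def _used(self):
        return sum(len(data) for _, data in self.cache.values())

    def open(self, path):
        cached = self.cache.get(path)
        if not self.connected:
            # Disconnected operation: never contact the server.
            if cached is None:
                raise FileNotFoundError(path)
            return cached[1]
        if cached is not None and cached[0] == self.server.version(path):
            # Cheap currency check instead of recopying the file.
            self.cache.move_to_end(path)
            return cached[1]
        version, data = self.server.fetch(path)
        if len(data) > self.capacity:
            raise OSError("file is larger than the whole cache")
        self.cache.pop(path, None)
        # Flush the oldest entries until the new file fits (assumed policy).
        while self._used() + len(data) > self.capacity:
            self.cache.popitem(last=False)
        self.cache[path] = (version, data)
        return data
```

The size check mirrors the limitation mentioned earlier: a file larger than the cache cannot be opened at all in this model.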

Volumes
Virtual partition; mounted globally; quota; cloning; different kinds of volume

The building block of a Coda file system is the volume. A volume is something like a virtual partition. It may exist physically on one server, or it may have several identical copies on two or more servers. It must be mounted cell-wide if it is to be read or written, but it can be backed up without being mounted. Quota is managed by setting a limit on each volume. A volume may have no quota set, in which case it can grow until one of the physical partitions where it lives fills up. Coda does something to volumes called cloning, which we need to look at more closely. Also there are several different kinds of volume.

Cloning
Like fork: copy on write; occupies same space (initially); like a collection of hard links; one copy is read-only; helps backup; used to replicate (copy) volumes

Cloning does to a volume something like what fork does to a Unix process. The original volume and the clone share the same disk space, which in the clone is read-only, and in the original volume is copy-on-write. The original and its clone are rather like two identical collections of hard links to the same files. When you change a file in the volume, the link to the read-only copy of that file is broken, and a new file is created. So the clone is like a backup of a volume at the time it was taken. It requires very little disk space unless the volume changes substantially.

The clone of a volume can itself serve as a limited backup of the volume. In one of our cells, we clone all user volumes every night, and if you accidentally delete or mangle a file, you can read or copy the file from the clone without bothering the system administrator. When you back up a Coda system, you don't need to stop using it; instead, you clone each volume to be backed up, and you dump the clone to tape, or to a staging disk from which you can do something else with it. Cloning was also used to produce read-only replicated volumes, but Coda seems to be dropping support for these.
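
The hard-link picture of cloning can be made concrete with a small Python model. It is only an illustration of the idea, not of how Coda stores volumes: `ToyVolume` and its methods are hypothetical names, and a Python object reference stands in for a hard link.

```python
class ToyVolume:
    """Hypothetical model of a volume as a table of links to file objects."""

    def __init__(self, files=None, read_only=False):
        self.files = dict(files or {})   # name -> bytes object (the "link")
        self.read_only = read_only

    def clone(self):
        # Copies only the table of links; the file contents are shared.
        return ToyVolume(self.files, read_only=True)

    def write(self, name, data):
        if self.read_only:
            raise PermissionError("clone is read-only")
        # Rebinding the name breaks the link; the clone keeps the old object.
        self.files[name] = data

    def shared_with(self, other):
        """Names whose contents are still the very same object in both."""
        return [name for name in self.files
                if name in other.files
                and self.files[name] is other.files[name]]
```

Right after `clone()` every file is shared, so the clone costs almost nothing; each later write to the original leaves one more old file held only by the clone, which is why a clone grows only as the volume changes.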

Types of volume
Simple (read-write); read-only (e.g., backup); replicated (read-only); replicated (read-write) (Coda only)

We have already seen two types of volume, a simple read-write volume, and a read-only volume such as a clone. When you create a full dump of a Coda volume, you can restore this as a read-only volume. Read-only volumes are also useful because you can make identical copies on several servers, and access to these read-only replicas may be much more efficient than access to a read-write volume. A replicated volume is fully usable as long as at least one running server has a copy of it; this may improve the availability of the data. (Coda seems to be dropping support for read-only replicas. It seems that a slight gain in efficiency is offset by the increased complexity of the code.)

Besides volumes of these kinds, which all Coda's cousins support, Coda also has read-write replicas. These are convenient because they give the advantage of high availability, though writing to them may be less efficient than writing to simple read-write volumes. Coda classes some volumes as read-write replicas even though they exist on only one server. Read-write replicas, whether they live on one server or on several, are the normal kind of volume in Coda cells, and you should plan your cell structure with this in mind.

Replication is something like having a highly configurable, flexible RAID system in software. It is true that there is some management overhead, but we find that hardware RAID systems can be down for a long time if the controller breaks, which has happened to us several times now.

Disconnected operation
Mobile computing; accidental disconnection; high availability; deliberate disconnection; hoarding (selection and priority); reconnecting; fixing conflicts

The idea of disconnected operation is central to Coda, and was part of its design from the beginning; the earliest Coda papers speak of `mobile computing'. A client may lose contact with the cell's servers accidentally, whether through network failure or because servers go down, or deliberately, because you are removing the client from the network, or because networking over a phone line is too expensive to do continuously.

Besides files that may happen to be in the cache, Coda lets you select portions of the file system which the cache manager tries (if possible) to `hoard'; that is, to keep current. You can set priorities to guide the cache manager in this.

Reconnecting is easy if you haven't changed anything. If you created or edited files while the system was disconnected, you need to tell the system to reintegrate your changes. Of course, it is possible, while your system is disconnected, for two people to change the same file or directory independently. In that case, Coda provides a mechanism for comparing the different versions, harmonising them, and resolving the conflict.

Coda Logo

Artist: Gaich Muramatsu

The Japanese artist Gaich Muramatsu has created several entertaining drawings about Coda, which illustrate its various features. The Coda logo derives from a pair of pictures which contrast AFS users, who are unhappy when they lose their network connections, and Coda users, who continue to work without a problem in similar circumstances.

What happens with AFS

What happens with Coda

Supported Platforms
Linux (all platforms?); FreeBSD; NetBSD (all platforms?); Solaris (sparc and i386); Windows 95/98; NT (Coda server only)

Coda runs under Linux on all platforms, though I suspect it has been well tested only on i386 and perhaps sparc Linux. It was developed originally on BSD systems based on the Mach microkernel, so it has been ported to FreeBSD and NetBSD; in the latter case again I suspect it hasn't been tested as thoroughly on some of the less common platforms. Recently Solaris ports were announced, for the sparc and i386 platforms; these are fairly new, and not as well-used as the others. Coda can be compiled in the Cygwin environment from Cygnus, and the resulting client and server do work under Windows 95 and 98, though there are known to be some problems with this. The Coda server works under NT as well, though the Coda client does not.

Features
I'd like to look next at some of Coda's features, particularly those in which it differs significantly from NFS. These cause some slight limitations, which you need to know if you plan to use Coda.

Users and Groups


Internal to Coda; pdbtool; groups have owners; users or groups may own groups; owners should be able to manage their groups, but in Coda this does not work

Coda has its own users and groups. These are internal to Coda; they are not the same as Unix users or groups. Coda users correspond to positive integers, while groups correspond to negative integers. Your life will be easier if most Coda users and Unix users have the same names and ID numbers. The program `pdbtool' manages users and groups. It must be run by root on the System Control Machine.

Coda groups have owners, and both users and groups may own groups. Under AFS and DFS, any user may create a group, delete the groups he owns, and add other users to his groups. None of this works in a Coda cell, and this is in my opinion something that needs to be fixed. At present, a cell with many users will probably be more of a burden to its administrator because of this.

Passwords and Authentication


Choice of three systems: Coda internal system, Kerberos 4, Kerberos 5; MIT Kerberos and KTH Kerberos supported

When you set up a Coda cell, you can choose one of three authentication systems: Coda's internal system, Kerberos 4, or Kerberos 5. Coda's current internal authentication system is very weak and should not be used in a production cell. Coda can use either authentic MIT Kerberos or the versions of Kerberos 4 and 5 available from KTH in Stockholm, the same place where the Arla project is being developed. If you use KTH Kerberos, you need to make some small changes to the Coda source if you compile it yourself. Patches are available for this; you'll need to change a couple of Makefiles and about 11 lines in one source file.

Once you have a Kerberos ticket, the program `kclog' gets a Coda token for you without further interaction, so it could in principle be placed in the profile of a Unix machine which uses Kerberos authentication in its login procedure.

Access Control Lists


On directories, not on files; give permissions to Coda users and groups; effect modified by mode bits on files; hard links allowed only in the same directory; permissions may be positive or negative

A major feature of the way objects are protected in Coda is the use of Access Control Lists on directories. An Access Control List, or ACL, is attached to each directory in Coda, and determines to a considerable extent who can do what in that directory. You cannot attach a Coda ACL to a file, unlike DFS ACLs. Each item in an ACL gives or takes away some permission to or from a Coda user or group. As we shall see, the effect of a directory ACL is modified slightly by the mode bits on files in that directory.

Because the ACL on a directory has much more influence on its contents than Unix directory modes have ordinarily, Coda does not allow you to make a hard link in one directory to a file in another directory. This occasionally may cause problems when you extract tar files inside Coda, or when some Makefile assumes you can create hard links between files in different directories.

Coda ACLs are not the same as POSIX ACLs. Also, some Unix programs may have problems on a Coda client if they make `illegitimate' use of mode bits instead of calling access() to determine what they are able to do.

Access Control
R  Read files in this directory
L  Attach to and list this directory
I  Insert a file into this directory
D  Delete a file from this directory
K  Lock files in this directory (not Coda)
W  Write or append to files in this directory
A  Change ACLs on this directory

These are the seven access permissions which may appear in an ACL. R or read access lets you read files in a directory. L, list, or lookup access lets you attach to a directory and list its contents. Coda does not support `hidden' directories, which a user can access without being able to do a directory listing. None of the other permissions actually works unless you have L access to the directory as well. I or insert access, and D or delete access let you insert or delete files in a directory respectively, or create or delete a subdirectory respectively, or mount or unmount a volume there. The K or lock access is intended to control the ability to lock files in a directory, but this has no effect in a Coda system. W or write access lets you write or append to files in a directory, and A or admin access lets you change a directory's Access Control List.

Mode Bits
Ignored for directories; for files, group and other mode bits ignored; R mode bit must be set to read a file; R and X bits must be set to execute a file; W bit must be set to write to or delete a file; suid and sgid may not do what you expect

In Coda, the Unix mode bits on directories are ignored. On files, the group and other mode bits are ignored; only the user bits have any effect. To read a file you need read and lookup access to the directory through its ACLs, and the file needs its R mode bit set. (While other Unix file systems allow you to read files in `hidden' directories if you know their names, Coda does not.) To execute a file, you need read and lookup ACLs, and you need both R and X mode bits set. (While some other Unix file systems allow you to execute a file which you cannot read, Coda does not.) And you cannot write or delete a file unless its W bit is set.

Coda remembers the Coda user ID of a file, the ID of the Coda user who created it. But the owner and group of a file have no influence on who can access it. The Coda system administrator can make an executable file in Coda SUID, in which case the command gets the effective Unix user ID of the file owner. Note that this does not give the program the access which that Coda user has; it does not give a running SUID program an effective Coda ID. This may cause problems if you want cron jobs to write into Coda. All files and directories in Coda appear to have the Unix group `nogroup', so SGID programs are not useful.
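
The way directory ACLs and file mode bits combine can be summarised in a short Python sketch. It encodes only the rules stated above and is not Coda's implementation. The function names and the user `alice` are hypothetical, and the rule that negative rights are subtracted from the union of positive rights is an assumption; the text says only that an ACL entry may give or take away a permission.

```python
def effective_rights(acl, negative_acl, principals):
    """Union of granted rights minus union of denied rights (assumed rule).

    acl and negative_acl map a Coda user or group name to a string of
    rights letters such as "RL"; principals lists the names that apply
    to the caller (the user plus the groups the user belongs to).
    """
    granted = set()
    denied = set()
    for who in principals:
        granted |= set(acl.get(who, ""))
        denied |= set(negative_acl.get(who, ""))
    return granted - denied


def may(action, rights, user_mode_bits):
    """Decide an action on a file from the directory rights and the
    file's user mode bits, given as a string such as "rw-"."""
    if "L" not in rights:
        return False                     # nothing works without lookup
    needs = {
        "read":    ("R", "r"),           # R right and r bit
        "execute": ("R", "rx"),          # R right, r and x bits
        "write":   ("W", "w"),           # W right and w bit
        "delete":  ("D", "w"),           # D right and w bit
    }
    acl_right, bits = needs[action]
    return acl_right in rights and all(b in user_mode_bits for b in bits)
```

For example, with an ACL of `{"System:AnyUser": "RL", "alice": "RLIDWKA"}` and a negative entry `{"alice": "D"}`, alice can write a file whose mode is `rw-` but cannot delete it, and nobody can execute it until both its r and x bits are set.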

Warnings
Flush() does not send data to fileservers: you must close() file before data gets written. Coda is not suitable for many log files. Commands like `tail -f' may not work. When a process on one machine is writing, it may not be visible on another machine.

Besides the above-mentioned problems related to ACLs and mode bits, you should be aware that a Coda client does not send data to fileservers when a running program calls flush(). You have to close() a file before it gets sent to the fileserver, and even then it doesn't happen if you are disconnected. After disconnected operation, you can't reintegrate without authenticating first. For this reason, Coda may not be a good place to write log files. Commands like `tail -f' don't work unless you are on the same client machine. A file being written by a process on one machine cannot be read on another machine until the writing process closes the file.
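
These write semantics can be illustrated with a toy Python model. The classes `ToyFileServer` and `ToyClientFile` are hypothetical and have nothing to do with Coda's real interfaces; the model only restates the rules above: flush() sends nothing, close() sends the file if the client is connected, and changes made while disconnected wait for reintegration, which needs a token.

```python
class ToyFileServer:
    """Hypothetical server: what other machines can see."""

    def __init__(self):
        self.contents = {}


class ToyClientFile:
    """Hypothetical model of a file opened for writing on one client."""

    def __init__(self, server, path, connected=True):
        self.server = server
        self.path = path
        self.connected = connected
        self.local = b""
        self.pending = False            # True while waiting to reintegrate

    def write(self, data):
        self.local += data

    def flush(self):
        # Sends nothing to the server: the data stays on the client.
        pass

    def close(self):
        if self.connected:
            self.server.contents[self.path] = self.local
        else:
            self.pending = True

    def reintegrate(self, have_token):
        # After disconnection, changes are sent only once authenticated.
        if self.pending and have_token:
            self.server.contents[self.path] = self.local
            self.pending = False
        return not self.pending
```

A log writer that keeps its file open for days never reaches close() in this model, so readers on other machines see nothing, which is the reason for the warning about log files.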

More Warnings
Databases should not have data files in Coda. Data may be lost if the writing process fails. Processes on other machines may not see data. But this warning applies only to databases managed by daemons which leave them open.

For the same reason, Coda is not in general a good place to store databases. If the writing process aborts, or the client reboots, data that has been written may be lost. If you intend to share data across machines, you may have problems while a file is open. Note that this applies only to database files which get left open by their managers; Berkeley DB-style databases are usually OK, as are others which get closed after being changed.

Looking Deeper
I hope by this time you have some idea of what Coda can do and how it works. Now I'd like to look a little more deeply into system administration and disconnected operation.

Internal Data
Files: /vicepa; data about volumes (kind, where, backed up); data about protection; data for encryption (shared keys); data about backup; data about cell; recoverable virtual memory

Coda file and data servers store several kinds of data. The data which appears on clients as files, directories, and so on is stored on the file server, of course, but not in this format. Each Coda file server has at least one physical partition, usually named `/vicepa', `/vicepb', and so forth, where some of the contents of the Coda file system are stored in a binary format. There is approximately one file on the vice partition per Coda file, but they are arranged in a hash tree.

The file servers also store:

information about the cell's volumes (what kind they are, where they are, and so on);
the mapping between Coda users and groups and their ID numbers, the list of groups to which each user belongs, and the list of members for each group;
the shared secrets which allow servers to communicate with each other;
information about backup (whether each volume has been or will be backed up, and when, so that you can do incremental backups);
information about the cell (which machine is the master SCM server, what other servers there are, and so on).

Data on its way to its final home in the vice partitions gets stored in `recoverable virtual memory' (RVM) on that file server, to protect its integrity. Normally RVM requires two raw partitions on each file server, in addition to the partition(s) used for /vicepa (/vicepb, ...). That is, if you plan to make a machine a Coda file server, you should have at least three physical partitions to be devoted to Coda: two raw partitions to be used by RVM, and one file system mounted on /vicepa to contain Coda files.

Servers
codasrv: the workhorse; updateclnt and updatesrv; rpc2portmap; kauth2

Now let's look at the processes that run on a Coda server. A file server always has a process called `codasrv', which is the workhorse: it serves files to Coda clients on other machines, and it keeps the vice partitions up to date. Processes called `updateclnt' and `updatesrv' transfer the other data which Coda keeps between the various servers in a Coda cell. This data is kept in one directory, /vice/db, on each server. Every 30 seconds these processes check to see whether the data is the same on all servers, and if not, they copy it across. This may be very inefficient if you have many volumes or many users, at least if this data changes frequently. The rpc2portmap daemon listens for incoming requests for Coda services and passes them on. kauth2 receives requests for Coda tokens, and passes them out when the client has a valid Kerberos ticket.

Server Security
If you are a hacker who breaks into a Coda client, you can't access protected data in Coda without a Coda token. You can, of course, read the contents of the local cache if you have root access. You can't change Coda files unless they are world writable; that is, unless the Coda group System:AnyUser has write and lookup access to that directory. On the other hand, a hacker who breaks into a server can run pdbtool to create users, delete data, fake a Coda token, and do a great deal of damage. We may say that Coda affords some protection to the clients and the data, but you must take pains to guard the Coda server against attack.

Backing Up
backup program; backup.sh script; requires physical backup partition; well documented (with exceptions); configured by dumplist file

Coda comes with a backup program `backup' and a shell script `backup.sh' which runs it. Together they read a list of volumes to be backed up, and do this (if possible). Backup clones the volume to be backed up, then dumps the clone to a file. It supports both full and incremental backups, and is designed to be run daily or weekly. The script `backup.sh' is designed to be run by cron. It is the script used at CMU, and the installation does not attempt to edit it for your cell. The backup program must back up volumes to a physical partition, since it is intended to be used with some utility like `dump' to copy the whole partition to tape. The backup procedure is fairly well documented, though the documents do not say that you must customise backup.sh to make it work in your cell.

Here are a couple of lines from a sample dumplist file:

    7F000002 IIIFIII coda.proj
    7F000003 IFIIIII coda.user

These mean that the volume with groupID 7F000002, named coda.proj, should have a full dump on Wednesdays and incremental dumps every other day, while the other volume has its full dump on Mondays. Dumplist is updated automatically when you create a read-write replicated volume; otherwise you need to edit it by hand.
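
To check a dumplist before cron runs the backup, a few lines of Python can read it back and say which kind of dump each volume gets on a given date. This is a sketch, not part of Coda: the function names are made up, and the assumption that the seven-letter string starts on Sunday is inferred only from the two sample lines above (F in the fourth position meaning Wednesday, F in the second meaning Monday).

```python
import datetime


def parse_dumplist(text):
    """Return a list of (volume_id, schedule, name) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue                     # skip blank or malformed lines
        volume_id, schedule, name = fields
        if len(schedule) != 7:
            raise ValueError("expected seven schedule letters: " + schedule)
        entries.append((volume_id, schedule, name))
    return entries


def dump_kind(schedule, date):
    """'full' or 'incremental', assuming the schedule starts on Sunday."""
    # date.weekday() is 0 for Monday; shift so that Sunday is 0.
    index = (date.weekday() + 1) % 7
    return "full" if schedule[index] == "F" else "incremental"
```

Calling `dump_kind("IIIFIII", datetime.date(2024, 1, 3))` returns `'full'`, since 3 January 2024 was a Wednesday.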

Restoring from backups


Documentation is weaker than for backup. merge combines full and incremental dumps. No program or script. The procedure produces a read-only volume, a copy of the clone. You must copy files and ACLs manually if you wish to restore a whole volume.

The Coda system for restoring backed-up volumes is less satisfactory than the backup system. In particular the documentation is rather incomplete and contains some mistakes. You need to experiment with restoring volumes to understand exactly what you need to do. The program `merge' combines one full and one incremental dump file to create a single dump file for the volume to be restored. If you have created several incremental dumps, you need to run merge several times. There is no script to restore a volume; you must figure out the correct commands and arguments to use.

This procedure creates a new read-only volume, which you need to mount somewhere. If you wish to restore a whole volume from its backup, you need to create a new read-write volume, and copy the files, mount points, and ACLs from the restored volume manually. The notes say that restoring the whole system after a crash is difficult; it is in fact a very tedious job.

Disconnected operation

Hoard files


priority: 1 -- 1000; optional codes for directories: c, c+, d, d+; volume mounted in hoarded volume does not get hoarded, need explicit reference; spy program

Coda keeps on each client a list of instructions about what (if anything) it should try to cache, and with what priority. You configure this list with the hoard command, which normally takes its instructions from a hoard file. That is, you could give it a lot of commands manually, but it might get rather tedious. Here is a simple example of a hoard file:

    clear
    add /coda/projects 100:c+

It says to clear the stored hoard list, and to add the directory /coda/projects with a priority of 100. Files and directories with larger priority get preference over others. The code `c+' says to include with this directory all the files and directories in it (though not their subdirectories), as well as files and directories that might get added to /coda/projects in the future. The optional directory codes are c for children and d for descendants, where d includes all subdirectories in a volume. A `+' on c or d means include not just the present contents but also future contents. Note that clear and add can be abbreviated to c and a.

The children or descendants of a directory do not include volumes mounted in the directory, since a mount point is something like a symbolic link. If you wish to hoard files or directories which are in another volume, you must specify at least one path to that volume in the hoard file.

Coda comes with a program called spy, which is designed to help you construct hoard files. Suppose you intend to use some program inside Coda which has a lot of subdirectories and files, only some of which you may need; for example, suppose you intend to use emacs. You start spy, redirecting its output to a file, and run emacs, doing the sort of thing you expect to be doing. Spy records all the bits of emacs inside Coda that you have used.
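
A hoard file in the form shown above is easy to check mechanically before handing it to hoard. The following Python sketch is an illustration only: it understands just the `clear` and `add` lines and the `priority:code` form described here (the real hoard command may accept more), and the function name and dictionary keys are made up.

```python
def parse_hoard_file(text):
    """Return the hoard list left after applying clear/add lines in order,
    sorted so that larger priorities come first."""
    hoard = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        command = fields[0]
        if command in ("clear", "c"):
            hoard = []                   # forget everything listed so far
        elif command in ("add", "a"):
            path, spec = fields[1], fields[2]
            priority, _, code = spec.partition(":")
            hoard.append({
                "path": path,
                "priority": int(priority),
                "children": code.startswith("c"),
                "descendants": code.startswith("d"),
                "future": code.endswith("+"),
            })
        else:
            raise ValueError("command not covered by this sketch: " + command)
    return sorted(hoard, key=lambda entry: entry["priority"], reverse=True)
```

Sorting by priority gives a quick view of what the cache manager would prefer to keep if the cache is too small for everything.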

Hoard command
    hoard -f <filename>
    hoard clear
    hoard walk
    hoard list
    hoard off
    hoard on
    hoard delete <filename>

The hoard command configures the way the cache manager keeps files and directories in the cache. The command hoard -f reads a hoard file and sends the commands in it to the cache manager. Venus remembers these commands and checks the cache every 10 minutes to see that these files are there. Note that you cannot tell hoard to save files or directories which you have no right to access; you must have a Coda token to run most hoard commands.

`clear' tells the cache manager to forget its hoard list. `walk' forces venus to check the cache now and add files and directories if necessary. `list' prints a list of the cached objects; it reassures me that hoarding is going on. There are also `off' and `on' commands which let you switch the automatic hoard monitoring function off and back on without actually clearing the hoard list, and a `delete' command which removes a file or directory from the hoard list.

Start disconnected server


venus -h 130.88.200.1 -r coda.root &
IP address of the SCM
name of the Coda root volume, the one to be mounted on /coda

This command starts the cache manager in disconnected mode:

    venus -h 130.88.200.1 -r coda.root &

The arguments on the command line are the IP address of the SCM, which is the main Coda data server, and the name of the volume which is to be mounted as /coda, the `Coda root volume'.

Authenticating
kclog
kclog -tofile tokenfile
kclog -fromfile tokenfile
ctokens
cunlog

By `authenticating' I mean proving to Coda that I am the person who is allowed to do certain things in the file system. The Coda authentication program is `kclog'. Before you run it, you need the Kerberos tickets for a Coda user. On its own, `kclog' or `kclog [username]' checks that your Kerberos tickets are valid, and if so, it gives you a Coda token for a limited period. With the arguments `-tofile [filename]', it also writes the token to a file, so that you can use it in disconnected mode. With the arguments `-fromfile [filename]' it reads and sets the Coda token from a file which you created earlier, without trying to verify your Kerberos credentials. The command `ctokens' lists your Coda tokens, and the command `cunlog' destroys them.

The default time limit for Coda tokens is 25 hours. I find this uselessly short for disconnected operation, but the only way to get a longer token is to edit the source of the kauth2 daemon and recompile it. I think there should be some way to specify the lifetime of a token. Getting a very long token, one that lasts weeks or months, is something of a security risk, as is saving it to a file. On the other hand, it is difficult to see how this can be avoided.

As far as I can tell, Coda associates the user's Coda token with the Unix user ID which requested it, which is not the safest policy. It seems to me that Coda needs some idea like the Program Authentication Group (or PAG) of AFS, which lets you control this a little better.
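To see why a 25-hour token is awkward for disconnected work, a little date arithmetic helps. The Python sketch below uses only the default lifetime quoted above; the function names token_expiry and token_valid_at are hypothetical, and the sketch does no more than add a lifetime to an issue time. It does not read or inspect a real Coda token.

```python
from datetime import datetime, timedelta

# Default Coda token lifetime as stated in the talk.
DEFAULT_TOKEN_LIFETIME = timedelta(hours=25)


def token_expiry(issued_at: datetime,
                 lifetime: timedelta = DEFAULT_TOKEN_LIFETIME) -> datetime:
    """Return the moment at which a token issued at issued_at expires."""
    return issued_at + lifetime


def token_valid_at(issued_at: datetime, moment: datetime,
                   lifetime: timedelta = DEFAULT_TOKEN_LIFETIME) -> bool:
    """True if a token issued at issued_at has not yet expired at moment."""
    return moment < token_expiry(issued_at, lifetime)
```

For example, a token saved with `kclog -tofile' at 18:00 on a Friday expires at 19:00 on Saturday, so token_valid_at returns False for any time on Sunday. A weekend of disconnected work already outlasts the default token.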

Shut down a client


vutil shutdown
umount /coda

venus.init stop

There are two ways to shut down a client safely. One is to use the command `vutil shutdown', then `umount /coda'. The other is to use the supplied script `venus.init' with the `stop' argument. If you don't shut down a client properly, you risk losing data.

Reintegrate
Start venus (venus &)
Get Coda token.
cfs fr /coda/myfiles/mountpoint
Resolve any conflicts.

To reintegrate after disconnected operation, start venus as usual, and get a Coda token, since without a token, you won't be allowed to write changes to the file server. Give the command

    cfs fr /coda/myfiles/mountpoint

(which means, Coda file system, force reintegration on this volume), specifying the mount point of each volume you wish to reintegrate. (This is a minor nuisance if you forget that you have changed some other volume.)

If conflicts are reported, you need to resolve them as soon as you can. Conflicts are handled by a process called `repairing', and there are several commands that help you do this. When you begin a repair, the volume is locked, and the inconsistent files are `spread out' as distinct objects in the file system. You can then edit the files and harmonise them before ending the repair and reintegrating the file system again.
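Because `cfs fr' has to be given once per volume, it can help to keep a list of the mount points you work in and generate the commands from it. The Python sketch below only builds the command lines named in this section and runs nothing. The function names are hypothetical, and it assumes venus has already been started and that plain `kclog' is how you obtain your token.

```python
import shlex
from typing import List


def reintegration_commands(mount_points: List[str]) -> List[List[str]]:
    """Return argument lists for the reintegration steps, in order.

    Assumes venus is already running. Nothing is executed here.
    """
    commands = [["kclog"]]  # get a Coda token first
    for mount_point in mount_points:
        # Force reintegration once for each volume's mount point.
        commands.append(["cfs", "fr", mount_point])
    return commands


def as_shell_lines(mount_points: List[str]) -> List[str]:
    """Render the same steps as shell command lines for review."""
    return [shlex.join(cmd) for cmd in reintegration_commands(mount_points)]
```

Printing the result of as_shell_lines for your usual mount points gives a checklist to work through after reconnecting, which guards against forgetting a volume you changed while disconnected.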

Security
Use Kerberos 5.
Guard your file servers.
Not for highly sensitive data
possible security holes because most of Coda has not been stressed

Now I'd like to make some points about security. I think you should in any case use Kerberos 5 with Coda, because the default internal authentication is so poor. You need to guard your file servers, since they are by far the most vulnerable point in the cell. And there are potential security holes because most of Coda has not been the subject of serious security testing.

Despite this, I suspect that Coda may be more secure than NFS, and certainly not substantially less secure than AFS or DFS, bearing in mind its inadequate encryption. The Coda developers are well aware of the limitations of Coda's encryption, and intend to replace it with something more serious.

Planning and Installing


If you should decide to use Coda, what do you need to know before you install it? How many machines do you need, and what resources should you have on each of them? What should you read to learn more about the system?

Installing Coda
Plan carefully; have one or two trials
Many scripts may need to be edited.
Join the mailing list, or find a friendly expert.
Beware prepackaged binaries.
Read through documentation and look at the mailing list archives.

It's easier to install Coda when you have a little experience in doing so, but that's true of a lot of software. Begin by doing a one-server cell with one additional client. This will give you the feel of the software and remind you of some critical issues. Next try a small, real cell with two or three other users. To learn the backup system, get it working, then do a full backup and two incrementals on successive days, then delete the whole cell and restore it. This will be very painful.

You will probably need to edit many of the scripts that come with Coda; I had to change seven of them for my use, but then I don't like to install every executable in /usr/bin.

If you use Coda, you ought to join the Coda mailing list, or at least find an expert to help you over the rough spots. The mailing list is very friendly and (within reasonable bounds) very tolerant of newcomers.

If I say `beware of prepackaged binaries', I don't mean that you shouldn't use them, but that they may make choices which you ought to change. I suspect most people will find a prepackaged binary is the best way to try Coda for the first time, but I don't think any of them should be used in a production cell.

It is worth reading through the main documentation carefully. Also look at recent archives of the mailing list; the problems you face have probably bothered someone else, and the solutions probably exist already.

Documentation
9 manuals
Coda Administration and User Manual
Notes and Explanation about Coda Security
man pages

The Coda documentation, which incidentally is not included with the Coda source, contains 9 manuals. The most useful of these is the Coda Administration and User Manual, though you should also read the Notes and Explanations about Coda Security. The Administration and User Manual contains man pages, though they are (I think) in HTML or PostScript format rather than in troff source format. (Someone has put out a collection of plain man pages, but, the last time I looked, they were a different version from those in the Admin manual.)

Good Points about Coda Docs


Usually very full
Many clear explanations
Many examples

Coda documentation is, for the most part, comprehensive and clear. It contains many examples, which are quite helpful to a new Coda administrator.

Bad Points about Coda Docs


Out of date (until recently):
    Nothing about new commands or options
Sometimes completely misleading:
    (1) Read-only root volume
    (2) Disconnected operation

Until recently, Coda's documents were seriously out of date. They didn't cover new commands and new options which had been added to the system in the past two years. They referred to and described at length several procedures which depended essentially on commands which no longer exist. And sometimes they were completely misleading. Since I gave the first version of this talk, this has improved substantially; a new version of the Admin manual has been produced which corrects many of these faults.

For example, the Admin manual in several places recommended that you create a replicated read-only root volume for your cell, and gave details of how to do this. The procedure didn't work, and is not supported in the current version of Coda.

Disconnected operation involves running `hoard walk'; this appears in the manual, but not in the man page for hoard, which is quite incomplete. Saving Coda tokens for disconnected operation does not appear in the documentation, nor are there instructions for starting venus in disconnected mode. The cfs subcommand `fr' or `forcereintegrate' does not appear in the documentation.

In short, the documentation has flaws and omissions; it still needs some work.

Summary
To sum up, I shall give my evaluation of some of Coda's functionality, and suggest work I think it is ready to do, though with some cautions and qualifications. I shall also suggest possible future developments.

Core Functions
File serving
Mobile operation
Resilience
High availability

The main thing Coda does is to serve files. The Coda file server works extremely well under stress and does not seem to fail. Coda is resilient, whether the network fails, the file server goes down, or you do something unexpected. It is reliable. If you are looking for high availability, Coda is certainly a serious option.

Attractive Applications
Web server
Shared files: in, e.g., Beowulf cluster or across an enterprise
    especially where NFS is inadequate
mobile/disconnected operation

There are some applications which Coda is particularly suited to support. One of those is web serving. If you need to share web pages across several servers, as in a Linux Virtual Server cluster, or if you need extra security, Coda may be just what you need, and its caching could have been designed with web serving in mind. For the same reasons Coda's cousin AFS got its lifetime extended and found new customers.

Coda may be a good choice wherever you need to share files across several machines, such as in a Beowulf cluster, or across a department or a larger enterprise; this is especially the case where NFS is not an appropriate solution because of its poor functionality and limited performance.

Coda was designed with mobile, disconnected operation in mind. It works well with slow connections: it allows you to connect briefly and update your hoard, then to disconnect and work for hours, then to reconnect briefly and reintegrate.

User Issues
Few users = few problems
    especially for system administrators
There is not much for users to learn
Beware of risky acts by users on clients.

The difficulties of users of Coda will depend on how many of them there are, and on how complex their usage is. If you have many users, especially if they want their own groups, if their file store has to be backed up, and so on, it would be quite a headache for the system administrator.

Because Coda's file semantics differ slightly from those of plain Unix file systems, you might think users need to learn a lot before they are turned loose on the system. But this is not so: beginners do not need to learn everything, unless perhaps they want to use disconnected operation. The number of utilities which will have problems because of Coda's nonclassical-Unix semantics is surprisingly small.

Stored Data
Not too much
    hard to administer
    hard to restore from backups
No sensitive data
    limited security, limited stress

The problem of adapting to large scale usage affects the amount of data as well as the number of users. Coda's protection and volume databases will not work well for very large cells in their present form. But the larger problem lies in administering, backing up, and restoring all that data, and not in Coda's internal functions.

Also, I have said that Coda is not suited to holding really sensitive data, because of its limited security at present; that is, because its internal encryption routines are crippled, and because things may remain in a client's cache for a long time. Moreover, Coda's constituent parts have not had a serious security review, or been exposed yet to systematic attack or exploitation attempts. On the other hand, you should probably not serve secure data using NFS or any other shared file system. Of course, you can always use encrypted loopback on Coda files on a Linux box.

Needs Work
Documentation
Better administration
Remove dependence on SCM

Coda needs a lot of work. The documents need to be brought up to date and expanded, especially to explain things clearly to ordinary users.