
Personal Statement and Outline of Proposed Research to support a PhD application

Stephen Kell March 2006

1 Interests

1.1 General Interests

My primary research interests lie in the fields of operating systems, distributed systems and programming languages. I also have interests in software engineering, networks, continuous media applications, sentient environments and human-computer interaction. I intend to pursue a career in research, and during 2005-6 have been a Research Assistant at the Computer Laboratory of the University of Cambridge. Further details may be found in my CV.

1.2 Immediate Interests

I am particularly interested in the ways in which the design and implementation of operating systems and programming languages affect the development and use of applications. This includes, for example, supporting application-level quality-of-service, reliability and security guarantees, and facilitating the adaptation, re-use and redeployment of software components. In today's increasingly complex, mobile and distributed systems, these properties are ever more important. Their absence in real systems leads to recurrent problems in system development [14], deployment [15] and evolution; their importance in achieving reliability [16, 3], quality-of-service [4] and security [3] is also well-known.

Modularity is a structural property which I will define loosely (for the moment) as the ability to adapt, combine, re-use and redeploy components of a software system. These abilities are not only useful in their own right, for instance in reducing development and administration costs, but can contribute to the provision of quality-of-service, reliability, verifiability and security, among others. A modular approach can enable the structural enforcement of these properties across the entire system.

In contrast to traditional software engineering work, my focus is on dynamic modularity, by which I mean modularity among an open-ended set of independently-developed components. This kind of modularity is dependent on the support of the operating system's linker and system call interface, since these enforce the boundaries between components. For my PhD I intend to explore techniques for providing modularity within such systems.

2 Problems

Conventional programming models make it hard to write highly modular applications. For example, the Unix-like model espoused by most programming languages' standard libraries is inadequate in the following ways.

There is insufficient indirection when accessing services external to a particular component. Here a component is a run-time instance of any unit of code: the state corresponding to a single source file, a library, an executable or some other grouping. Conventionally, a variety of low-level interfaces provide IPC, I/O and intra-process linkage; in the case of Unix, these are devices, files, sockets, processes and the linker. Each of these has its own namespace and is restricted to a particular set of implementations: Unix's heavyweight processes, and the sets of devices, VFS implementations and socket types known to the operating system. The application programmer must commit to these implementations at development time.

The low-level nature of these interfaces forces applications to layer their own abstractions on top. This leads to many similar but mutually incompatible conventions for data encoding, type systems, procedure call, resource management, communication protocols and so on. This precludes direct interoperability between components which do not use the same conventions. For example, it is hard for an application written in one language to make use of a library written in another. Likewise, an application targeting one networking stack or file format cannot be made to use another, even when the application uses only those abstract features which are common to both (e.g. IPv6 versus IPv4 [15]).

As a result of the above, software at higher layers of a system is tightly coupled to particular lower-layer implementations. This is witnessed in deployment problems (e.g. IPv6) and in code duplication.
Even within the open-source community, where code is freely available, it is common for libraries to be duplicated in different languages, or applications to be dependent on a particular windowing toolkit or run-time system. To overcome these dense inter-module dependencies may require relinking, recompilation or ad-hoc coding effort [17].

The implementation-specific nature of these interfaces leads programmers to make assumptions which subsequently limit re-use and scalability. For example, distributed filesystems on Unix are notoriously problematic because programmers conventionally assume access to a local file to be fast, and ignore partial failure modes [18]. Similarly, shared libraries frequently expose data structures directly, assuming that clients reside in the same address space.

The only provision for access control, quality of service, reliability and other cross-cutting concerns is implementation-specific. For example, the Unix file system provides access control at the API level, but for network communication this must be implemented in the application. Use of a shared library precludes memory-based fault isolation since it implies a shared address space. There is frequently no means of propagating quality-of-service requirements to foreign modules.

Lack of a pervasive type system limits the potential for static analysis across module boundaries, leading to undetected bugs and security exploits.

Existing middlewares [19], virtual machines [20] and component systems [5] attempt to solve problems of dynamic modularity, with some success. However, they are typically mutually incompatible, have high compulsory overheads [21], and still carry dependencies on particular network protocols, programming paradigms or other implementation details. Moreover, since middlewares are perceived as (and often focussed towards) tackling only problems of distribution rather than the more general modularity, they are not popular among software developers, except where the need for both modularity and distribution is obvious from the design stages. Since modularity's primary goals of reusability, adaptability and replaceability all address problems caused by a lack of foresight among developers, there is a strong argument that modularity should be naturally provided by default, within the most basic programming models targeted by application developers, and without requiring pre-commitment to particular implementations.
Accordingly, I intend to research ways of supporting modularity at the operating system level, by devising new programming models which lead naturally to modular applications. This should include demonstrable improvements to both internal (reusability, replaceability, adaptability) and one or more external (quality-of-service, security, reliability) characteristics, and be accessible to all kinds of applications and supporting user-space code.

3 Ideas and Approach

3.1 Foundations

My approach will be based on the following principles, motivated by well-known existing research.

1. Separation of interface from implementation: this principle, introduced by Parnas as information hiding [9], is now well-accepted. It is embodied in many programming languages [22] and operating systems [6, 23]. Nemesis provides a particularly strong separation in its programming model [1], and is a useful starting point. However, it does not solve the interface mismatch problem, since it employs direct linking to pre-defined interfaces specified using an IDL.

2. Correct placement of abstraction: Engler et al. [10] argue that operating systems should not include compulsory abstractions, since they limit application-level flexibility and, ultimately, compromise performance and reliability. This can be inverted: applications should not build in abstractions themselves, since this hinders flexibility, reusability and portability. This argument motivates a three-layered approach, where a middle layer of abstraction implementations sits underneath applications. In this model, operating system services are directed towards the middle layer, rather than towards applications themselves.

3. First-class consideration of connectivity, separate from the components themselves: this principle follows naturally from point 2. Shaw [8] approaches the same issue, albeit from a static closed-project perspective, and covers many directly relevant problems. In summary, she concludes that a software component should contain as little specification as possible of how it connects to others, but that these connections should be formalised in a separate domain, supporting abstractions analogous to (but distinct from) those in component languages. I add that to allow dynamic modularity, this formalisation must be supported by the run-time system, specifically the operating system.

4. Unification of interfaces and namespaces: Unix [11] achieved much of its power and elegance by partially unifying the programming interfaces used to access files, devices and communication streams. Many later developments, including the VFS interface [12] and Plan 9 [7], extended this idea by noting that the abstract data type exposed by a Unix file is very general.
Use of a single API also enforces a consistent interface to access control. However, extending this simple storage-oriented interface too far causes problems: programmers may make incorrect assumptions (e.g. about distribution-hiding interfaces [18]) and all interactions must somehow be characterised as read or write operations (a problem first acknowledged, but hardly solved, by Unix's ioctl()).¹ It is worth exploring the trade-off here: increased unification of programming interfaces may offer better modularity, but the resulting interfaces may also be more difficult to use, since they allow fewer assumptions on the part of the programmer.

¹ In fact, the Unix filesystem API is often found inadequate even for storage applications. Policroniades [24] presents one argument.

5. The benefits of reflection: reflection [25] is a technique by which the internal workings of a system are rendered tractable from within that system's computational processes. This has been applied to programming languages [13] and middlewares [4] to enable run-time extension, adaptation and other aspects of dynamic modularity. Reflection is often realised as a unified programming interface (referring back to point 4): for example, consider Java's fixed set of interfaces for manipulation of the JVM's run-time type metadata.

6. The importance of names: influential papers by Saltzer [2], Needham [26] and others [27] motivate the importance of naming within systems. Names are crucial to both sharing and protection, since any component may only access that which it can name. As the fundamental mechanism for indirection, names are also crucial to abstraction: using a more abstract namespace removes dependency on particular implementations. Delaying binding until link-, load- or run-time is a common technique for adding flexibility: examples include dynamic linking, virtual functions in C++ [22], environment variables in Unix [11], the Internet's domain name system and countless others. Naming is a particularly deep area, and there is a rich taxonomy of names: pure or impure; structured or flat; well-known or secret.

7. The benefits of type systems: static type-checking is well known as a useful way to detect and avoid bugs during software development. However, retaining typing information at run-time is also useful in any application where logic dealing in higher-level semantic concerns may be replaced or extended at run-time. This includes security policies [3], application scripting, dynamic extensibility and adaptation [28, 4]. Additionally, a pervasive type system aids verification across module boundaries and at run-time: static analysis, perhaps augmented by trusted toolchains and proof-carrying code, can be used to guarantee correctness and reliability properties without the need for heavyweight run-time checks [28, 20, 29].
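The late binding described under the importance of names can be illustrated with a minimal sketch. The `Context` class, the two store functions and the name "store" are all hypothetical, invented here for illustration; they are not part of any existing system. The client names an abstract service and the binding is resolved on each use, so the implementation can be replaced at run time without touching the client.

```python
from typing import Callable, Dict


class Context:
    """A flat namespace mapping names to implementations (hypothetical)."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Callable[[str], str]] = {}

    def bind(self, name: str, impl: Callable[[str], str]) -> None:
        self._bindings[name] = impl

    def resolve(self, name: str) -> Callable[[str], str]:
        # Resolution happens on every use, so bindings may change at run time.
        return self._bindings[name]


def local_store(key: str) -> str:
    return f"local:{key}"


def remote_store(key: str) -> str:
    return f"remote:{key}"


def client(ctx: Context, key: str) -> str:
    # The client names an abstract service, never a concrete implementation.
    return ctx.resolve("store")(key)


ctx = Context()
ctx.bind("store", local_store)
first = client(ctx, "report")
ctx.bind("store", remote_store)  # rebinding; the client code is unchanged
second = client(ctx, "report")
print(first, second)
```

The same indirection underlies dynamic linking and environment variables; the sketch only makes the namespace explicit.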

3.2 Novel Contributions

I contribute two possibly-novel suggestions.

Firstly, I propose that to counter interface mismatch problems, we begin by admitting defeat. Political solutions (i.e. standardisation) cannot succeed in ensuring interface matching when there is no common administration between component developers. Instead, I propose a technical solution which makes explicit provision for mismatched interfaces. Developers should not need, and in fact should not attempt, to target their code at existing concrete interfaces. Rather, they should devise their own abstract interface, and rely on the run-time support of the operating system to allow this to be joined together with the interfaces exposed by supporting components. Selection of these components should be left until run-time.

Under such a system, each module exposes its own interface to higher-layer components, by exporting a namespace of typed objects. To these, the run-time system will glue the abstract interfaces targeted by client components. This embodies point 3 above: inter-module connectivity is given first-class consideration and run-time support. Making this sufficiently powerful and efficient will be a substantial part of the proposed research.

Secondly, I present a generalisation of the familiar concept of name to a naming expression. Names are typically a subcategory of expressions in formal languages: while expressions are tree-structured entities evaluated against an environment by some well-defined reduction process, names are atomic or linear-structured objects resolved against a context by some well-defined resolution algorithm. By introducing an expression-like name, subscribing roughly to descriptivist theories of naming,² the dynamism and flexibility provided by names (as outlined in point 6 above) can be applied to the problems of inter-module connectivity and, particularly, interface mismatch. One approach is described in the following section.
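The contrast between an atomic name and a tree-structured naming expression can be sketched as follows. This is only an illustration under assumptions: the `Name` and `Apply` node types, the `evaluate` function and the example environment are hypothetical and do not represent the proposed naming language, whose design is left open.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple, Union

Value = Union[FrozenSet[str], Callable[..., FrozenSet[str]]]
Env = Dict[str, Value]


@dataclass(frozen=True)
class Name:
    """An atomic name, resolved by direct lookup in an environment."""
    ident: str


@dataclass(frozen=True)
class Apply:
    """A function application: a tree node whose children are expressions."""
    func: "Expr"
    args: Tuple["Expr", ...]


Expr = Union[Name, Apply]


def evaluate(expr: Expr, env: Env) -> Value:
    """Reduce a naming expression against an environment."""
    if isinstance(expr, Name):
        return env[expr.ident]          # plain resolution of an atomic name
    func = evaluate(expr.func, env)     # otherwise evaluate the subtrees...
    args = [evaluate(a, env) for a in expr.args]
    return func(*args)                  # ...and apply the resulting function


# Hypothetical environment: two sets of objects and one combinator.
env: Env = {
    "docs": frozenset({"a.txt", "b.txt"}),
    "mail": frozenset({"b.txt", "c.txt"}),
    "union": lambda x, y: x | y,
}

plain = evaluate(Name("docs"), env)
combined = evaluate(Apply(Name("union"), (Name("docs"), Name("mail"))), env)
print(sorted(plain), sorted(combined))
```

An atomic name is the degenerate case of an expression, so existing names remain valid while expressions add the ability to compute the thing being named.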

3.3 Outline of Proposed System

Consider, for example, embedding features of a functional language such as ML [30] within the name service of a file system. Instead of supplying the name of a pre-existing directory to a call such as opendir(), a program might supply a function application expression, whose evaluation yields the set of objects to open. In this way, the program's logic may be applied to a set of objects which do not reside in a physical directory on disk. In other words, by increasing the expressivity of names, we have removed some implementation dependency and hence improved the program's flexibility.

With a suitable language design, this technique may be extended to provide arbitrary transformations of all kinds of interfaces, not just filesystems, albeit possibly requiring complex naming expressions. The key idea is that the naming expression supplied by the user specifies how to adapt the foreign module's exported interface into the abstract one targeted by the local module. Since adaptation and glue code is most easily specified in functional or scripting-oriented languages, this is a convenient approach. Crucially, the necessary connection logic can be supplied at run time, allowing components to be replaced without the need for recompilation or relinking.³

Design of the naming language, including its type system and computational power (e.g. ability to express recursion), is a matter for research. Some other remaining questions concern how to integrate this system with the programming languages used to write components. These questions include the following.

What does a naming expression denote? This question effectively asks what primitives the model should include. In the spirit of information hiding, it is expected that the model will be oriented around functions (or sets of functions) rather than plain data. There must also be some notion of environment, i.e. a function mapping from names to interfaces, where environment is itself an interface similar to Nemesis's Context.

How does code get hold of a name? Some names will be explicitly represented in input data; others, for example command-line arguments or Unix's standard I/O streams, must be assumed by the program and are bound at run time, often implicitly. These implicit bindings effectively bootstrap a component, by defining the initial space of nameable interfaces. One implementation could involve some sort of hereditary environment, similar to the Unix environment or Nemesis's per-domain contexts.

How is a name resolved by a component's code? A call analogous to open() is a possibility; rather than specifying an access mode and (implicitly) calling identity, as in Unix, the caller should specify an authority and an interface type.⁴ Dynamic type checking can confirm whether the name resolves to an object exporting the specified interface, hence allowing some level of type-safety guarantee.

How are foreign objects accessed by the component code? An open() call must return some kind of reference to the foreign interface. This would probably include an unforgeable token identifying the interface reference to the system, i.e. a capability, or the analogue of a file handle. It must then be possible to invoke() named operations, and close() the interface. This raises many questions about how arguments and return values are represented, what other operations are required, and how component languages might abstract away from these basic operations.

How does a component export its own interfaces? In general this is done by updating some naming environment which is accessible to potential clients. A writable environment might be a subtype of the basic environment, supporting bind() and (optionally) unbind() operations. These would allow a local object to be exported into some widely-accessible environment.

² For an example see Russell, On Denoting, Mind, 1905.

³ This does not preclude grouping of particularly useful transformations into libraries, where each would appear as a named function.
How are the type systems of the naming language and component language resolved? Clearly, a correspondence must be known for each component language, and the naming language's types must have some run-time representation within the component language. Note that the run-time system need only be ported once per language, and from that point will allow free interoperability between that language and all other supported languages. By contrast, current systems typically involve binding effort per library as well as per language, although the necessary code can sometimes be autogenerated by tools.
⁴ Note that this is a natural generalisation of Unix's open(), where the programmer specifies an access mode but must assume other interface characteristics, such as support for particular ioctl() or seek() operations. These unstated assumptions force the programmer to handle additional error cases; making the assumptions explicit removes the need for this.

How is resource management performed? The open() and close() pattern puts bounds on the period of an object's use by each client. These calls themselves may be hidden by the component language's usual resource management constructs, e.g. scope-based as in C++, or collector-based as in Java. The usual problems of partial failure and resource leakage will be subjects of research.

How are access control and quality-of-service features integrated into the model? Use of interface references naturally suggests a capability-based approach to access control; possible implementations will be a subject of research. Quality-of-service information might be integrated into the notion of interface: when specifying a set of operations which the named entity must perform, a client may annotate these with service-level requirements, enabling admission control to be performed during the call to open() analogously with type-checking.
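The open(), invoke(), close(), bind() and unbind() operations discussed above can be mocked up in a few lines. This is a toy sketch under stated assumptions, not the proposed design: the `Environment` class is hypothetical, an interface type is reduced to a set of operation names, the authority argument is omitted, and the capability is modelled as a random token that is merely hard to guess rather than enforced by an operating system.

```python
import secrets
from typing import Any, Callable, Dict, FrozenSet

Obj = Dict[str, Callable[..., Any]]  # an exported object: named operations


class Environment:
    """A writable naming environment (hypothetical mock-up)."""

    def __init__(self) -> None:
        self._objects: Dict[str, Obj] = {}
        self._handles: Dict[str, Obj] = {}

    def bind(self, name: str, obj: Obj) -> None:
        self._objects[name] = obj          # export a local object

    def unbind(self, name: str) -> None:
        del self._objects[name]

    def open(self, name: str, interface: FrozenSet[str]) -> str:
        obj = self._objects[name]
        # Dynamic check: the object must export every requested operation.
        missing = interface.difference(obj)
        if missing:
            raise TypeError(f"{name!r} lacks operations: {sorted(missing)}")
        handle = secrets.token_hex(16)     # stands in for a capability
        # The handle exposes only the operations that were asked for.
        self._handles[handle] = {op: obj[op] for op in interface}
        return handle

    def invoke(self, handle: str, op: str, *args: Any) -> Any:
        return self._handles[handle][op](*args)

    def close(self, handle: str) -> None:
        del self._handles[handle]          # bounds the period of use


lines: list = []
env = Environment()
env.bind("log", {"append": lines.append, "count": lambda: len(lines)})

handle = env.open("log", frozenset({"append"}))
env.invoke(handle, "append", "hello")
env.close(handle)
print(lines)
```

Checking the requested interface inside open() moves the failure to a single, early point, which is the effect footnote 4 argues for; service-level annotations could in principle be checked at the same place.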

4 Implementation and Evaluation

The discussion so far has identified the following requirements.

The programming model must be naturally modular.
Applications developed against the model should not be subject to significant performance penalties, relative to applications developed conventionally.
There must be demonstrable, quantifiable improvements to modularity among applications developed using the model, compared to the nearest equivalent under conventional models.
There should be demonstrable, quantifiable improvements to one or more externally-visible characteristics, i.e. reliability, security or provision for quality of service, among applications developed using the model.

In addition to these, I suggest the following practical constraints.

The model must be developed for an existing widely-used system, most likely GNU/Linux.
It must be implemented so as to provide backwards compatibility and interoperability with legacy (i.e. conventionally-developed) applications on the same system.
It must be possible to show a transition path towards the new model for existing applications.

In order to achieve these, the following approaches may be helpful.

To prototype the model, a mock-up could be produced in a high-level language, or perhaps as a C library. Toy bindings for a variety of languages could then be created as proof of concept. Note that this approach will not provide actual modularity until dependency on the underlying system call interface is removed.

A new class of process could be added to Linux, with a new set of system calls. These calls should functionally (but not syntactically) subsume all previous interfaces, allowing communication with legacy processes but offering improved modularity. This may be achieved through elimination of implementation-specific interfaces, and unification of naming. However, it may not be possible to offer high performance without making extensive changes to the Linux kernel.

A tool could be developed which splits an existing program, say a monolithic application written in C, into a set of modules. This could be done by static analysis on the dependencies between object files. Note that this does not address inter-process module boundaries, and reconstruction of abstract, strongly-typed interfaces between components would be extremely difficult. However, it may be feasible in some limited cases, and is worthy of research. Some existing work on modularising monolithic code may be helpful [33, 34].

For evaluation purposes, some or all of the following will also be required.

A rigorous definition of the kind of modularity under consideration, and one or more corresponding measures. This is remarkably difficult, and is not attempted here. Existing software measurement work, such as that of Fenton [31, 32], provides a useful starting point.
Tools or methods to evaluate the measure on real software.
Suitable measures for the chosen external characteristics, and the tools to measure them.
Empirical data on the modularity (and other characteristics) of software developed using the new model, either from deliberate reimplementations of existing software or (preferably) experiments conducted on real programmers asked to develop a piece of software using the new model and a set of existing components.

5 Afterword

I am currently working on a more detailed proposal, entitled "Operating System Support for Application Modularity", which I will forward to supplement my application in due course.

References
[1] T. Roscoe, The Structure of a Multi-Service Operating System, PhD thesis, University of Cambridge Computer Laboratory, April 1995.
[2] J.H. Saltzer, Naming and Binding of Objects, Lecture Notes in Computer Science, vol. 60, pp. 99-208, 1978.
[3] T.A. Linden, Operating System Structures to Support Security and Reliable Software, ACM Computing Surveys, 8(4), pp. 409-445, 1976.
[4] G.S. Blair, G. Coulson, P. Robin, M. Papathomas, An Architecture For Next Generation Middleware, Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing, 1998.
[5] W. Emmerich, Distributed Component Technologies and their Software Engineering Implications, Proceedings of the 24th International Conference on Software Engineering, 2002.
[6] I.M. Leslie, D. McAuley, R. Black, T. Roscoe, P. Barham, D. Evers, R. Fairbairns and E. Hyden, The Design and Implementation of an Operating System to Support Distributed Multimedia Applications, IEEE Journal on Selected Areas in Communications, 1996.
[7] R. Pike, D. Presotto, S. Dorward, R. Flandrena, K. Thompson, H. Trickey, P. Winterbottom, Plan 9 From Bell Labs, Computing Systems, 1995.
[8] M. Shaw, Procedure Calls Are the Assembly Language of Software Interconnection: Connectors Deserve First-Class Status, ICSE Workshop on Studies of Software Design, 1993.
[9] D.L. Parnas, On the criteria to be used in decomposing systems into modules, Communications of the ACM, 15(12), pp. 1053-1058, December 1972.
[10] D.R. Engler, M.F. Kaashoek, Exterminate All Operating System Abstractions, Proceedings of the 5th IEEE Workshop on Hot Topics in Operating Systems, 1995.
[11] D.M. Ritchie, K. Thompson, The Unix Time-Sharing System, Communications of the ACM, 17(7), July 1974.
[12] S.R. Kleiman, Vnodes: An Architecture for Multiple File System Types in Sun UNIX, USENIX Association Summer Conference Proceedings, Atlanta, 1986.
[13] J. Gosling, B. Joy, G. Steele, G. Bracha, The Java Language Specification, second edition, Addison Wesley, 2000.

[14] D. Garlan, R. Allen, J. Ockerbloom, Architectural Mismatch or Why It's Hard to Build Systems out of Existing Parts, Proceedings of the 17th International Conference on Software Engineering, pp. 179-185, Seattle, Washington, April 1995.
[15] H. Afifi, L. Toutain, Methods for IPv4-IPv6 transition, Proceedings of the Fourth IEEE Symposium on Computers and Communications, p. 478, 1999.
[16] M.M. Swift, B.N. Bershad, H.M. Levy, Improving the Reliability of Commodity Operating Systems, in Proc. 19th Symp. on Operating Systems Principles (SOSP), October 2003.
[17] A. Fraser, Orion: Named Flows With Access Control, invited talk, University of Cambridge Computer Laboratory, November 2005.
[18] J. Waldo, G. Wyant, A. Wollrath, S. Kendall, A Note On Distributed Computing, Sun Microsystems Technical Report SMLI TR-94-29, November 1994.
[19] Object Management Group, The Common Object Request Broker: Architecture and Specification, OMG TC Document Number 91.12.1, Revision 1.1, December 1991.
[20] T. Lindholm, F. Yellin, The Java Virtual Machine Specification, Addison Wesley, September 1996.
[21] S. Lakin, S. Mount, R.M. Newman, Communication in ad hoc networks or: CORBA considered harmful, Workshop on Building Software for Pervasive Computing at the 19th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA '04), Vancouver, Canada, October 2004.
[22] B. Stroustrup, The Design and Evolution of C++, Addison Wesley, 1994.
[23] D.A. Solomon, H. Custer, Inside Windows NT, second edition, Microsoft Press, 1998.
[24] C. Policroniades, Datom: A Proposal for an Alternative Storage System API, invited talk, University of Cambridge Computer Laboratory, August 2005.
[25] B.C. Smith, Procedural Reflection in Programming Languages, PhD thesis, Mass. Inst. of Technology, January 1982.
[26] R.M. Needham, Names, chapter in S. Mullender (ed.) Distributed Systems, pp. 315-327, Addison Wesley, 1993.
[27] R. Pike, P. Weinberger, The Hideous Name, USENIX Summer Conference Proceedings 1985, pp. 563-568.

[28] G. Hunt, J. Larus, M. Abadi, M. Aiken, P. Barham, M. Fähndrich, C. Hawblitzel, O. Hodson, S. Levi, N. Murphy, B. Steensgaard, D. Tarditi, T. Wobber, B. Zill, An Overview of the Singularity Project, Microsoft Research Technical Report MSR-TR-2005-135, October 2005.
[29] B.N. Bershad, S. Savage, P. Pardyak, E.G. Sirer, M.E. Fiuczynski, D. Becker, C. Chambers, S. Eggers, Extensibility, safety and performance in the SPIN operating system, Proceedings of the fifteenth ACM Symposium on Operating Systems Principles, 1995.
[30] R. Milner, M. Tofte, R. Harper, D. MacQueen, The Definition of Standard ML, MIT Press, revised 1997.
[31] N. Fenton, Software Measurement: A Necessary Scientific Basis, IEEE Transactions on Software Engineering, vol. 20, issue 3, pp. 199-206, March 1994.
[32] N. Fenton, A. Melton, Deriving structurally based software measures, Journal of Systems and Software, vol. 12, issue 3, pp. 177-187, July 1990.
[33] L. Deri, Droplets: Breaking Monolithic Applications Apart, IBM Research Report RZ 2799, September 1995.
[34] R. Schwanke, An intelligent tool for re-engineering software modularity, Proceedings of the 13th International Conference on Software Engineering, pp. 83-92, May 1991.
