
Understandability of JBoss Application Server

MEASUREMENT OF IDENTIFIER QUALITY


Benjamin Udink ten Cate, Hogeschool van Amsterdam

Amsterdam, The Netherlands

ben@effacts.com, student no: 500160185

August 2010

Keywords: Understandability, Identifier quality, Identifier flaws, Code metrics

Abstract

Identifier names are an important source of information for programmers trying to understand a program. Identifier names are expected to change over time as a program becomes more complex because code is added and maintenance is done to its source code. I found that the number of unique identifiers rarely changes over time.

This paper shows that the identifier flaws in the 4.2 version line of JBoss Application Server are not resolved, even though a modern IDE like Eclipse or IntelliJ supports the renaming of all instances of an identifier.

Because the number of unique identifiers remains the same and identifier flaws are not resolved, the quality of identifiers remains the same.

This article covers the 4.2 version line of the JBoss Application Server. In this small part the readability, and thus the understandability, of the software does not change during two years of evolution.

1. Introduction

JBoss AS is a free open source application server. It is Java-based software developed with the “professional open source” model. Professional open source means the core developers get a salary and offer their services to the community. JBoss AS is the most used middleware server today and competes with WebLogic and WebSphere.

The Java code of JBoss AS is written by open source developers. They have to write understandable source code because an entire community of open source developers contributes to the JBoss AS project.

I want to research whether the quality of identifiers used in JBoss AS changes over time.

Identifiers are vital to developers when they want to understand a part of the program by reading its source code. I wonder if the quality of identifiers changes over time when different programmers contribute to the software. JBoss Application Server is very interesting since it is open source software. This means many programmers with different education levels and different approaches to object-oriented Java software development contribute to the software.

Identifiers have been the subject of programming guidelines since the first days of programming. For the Java language an extensive set of guidelines is available [3]. Most of these guidelines are based on guidelines for older programming languages. Most large software projects have their own set of programming guidelines. JBoss also has such guidelines, but they do not define identifier usage.

I want to measure software quality by looking at the understandability quality aspect. There are different views on software quality characteristics. Understandability is often paired with usability. This is because understandability applies to an entire system and all of its documentation, while usability applies to just the final software product. Because there is no standard or generally accepted collection of software quality characteristics, I will use a set of three definitions [5] for understandability:

- The degree to which the meaning of a software component is clear to a user.
- (Clarity, Self-Descriptiveness) Ease of comprehending the meaning of the software (opposite of complexity).
- Low complexity and Documentation.

Understandability is hard to define and measure. It is an external attribute that relates to how developers experience the software and its documentation. It is affected by
subjective factors such as user experience, education, the amount of time spent contributing to the software, and the choice of IDE and other development tools. The only way to measure this quality is to assume it has software attributes related to it. Examining the definitions of understandability and the studies related to this subject yields several possible attributes related to this quality. Understandability is influenced by the size, low complexity and readability of the software and all documentation related to it.

The scope of this article is the readability aspect of understandability. Readability is determined for the most part by the length of sentences, whitespace and the identifiers used in the source code. Readability is a human judgement on how easy it is to understand a text [2]. I will focus on identifiers and how they relate to the readability of the source code if they contain flaws.

2. JBoss Application Server 4.2 core

JBoss AS is a Java application and will run on any machine which supports version 1.4 of the Java Virtual Machine or higher. JBoss AS is developed by JBoss.org. They pioneered the Professional Open Source Model, which means they have thirty paid core developers. Besides the core team there is a community of over a quarter of a million developers and about a hundred committers.

Version 4.2 of JBoss AS was the first to offer full J2EE 5 support. To this day it is the most downloaded version of JBoss AS. Version 4.2 was released in May 2007 and followed by three small releases until it was stabilised with the release of 4.2.3 a year later in July 2008.

The part of JBoss AS which I am interested in is the library jboss.jar. This library is in the default JBoss AS configuration and contains services and XML definitions required to run J2EE 5 applications.

3. Methodology

To measure identifier quality I looked at flawed identifiers. I measured two identifier flaws over the period in which JBoss AS 4.2 was stabilised. SVN was used to gather six revisions of the 4.2 branch, spread equally over two years. The flaws measured are two flawed identifier naming styles: “capitalisation anomalies” and “non-dictionary words”.

Identifier name style flaws are part of a set of style guidelines composed by Butler et al. [1] to measure identifier quality. The guidelines are based on best practice in software development over the past decades.

Identifier flaws were measured using a tool developed by Butler. The tool measures the twelve identifier flaws that are used in his research into identifier quality. The tool outputs an XML file with a summary for the entire set of files. It outputs the details for each Java class and its methods in a separate file.

A Capitalisation Anomaly is a violation of camelCase or the inappropriate use of capitals in abbreviations, e.g. HTMLEditorKit, pagecounter or fooBAR. It is measured by testing the initial letter of each component word for all identifiers which are not constants.

Non-dictionary words are components of identifiers that are not in the English dictionary and are neither common abbreviations nor words specific to Java, for example strlen or appCnt. This flaw is measured by checking whether at least one component of an identifier is a non-dictionary word.

StatSVN is an extension to the open source program StatCVS. StatSVN is a Java program which generates HTML pages, statistics and graphs about the files stored in a working copy of a source control repository. I use this program to measure the number of developers and the LOC changed for a given revision. StatSVN is still a beta project and cannot walk through source trees like most modern SVN tools can.

4. Results

StatSVN was applied to the JBoss AS 4.2 branch. A set of arguments was specified for StatSVN to measure the six different revisions of the branch selected for this article. StatSVN was used to measure both the number of changes between each selected revision and the number of people working on the revision.

In Figure 1 the activity in the branch is shown by both the Lines of Code (LOC) changed and the number of identifiers added to the source code. Even though this illustrates that significant changes were made to the source code, the total number of identifiers increased by only 0.53% over two years. The total number of non-unique identifiers increased from 134,649 to 135,369.
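The reported growth of 0.53% can be checked from the two identifier totals (134,649 and 135,369) given in the Results. The following Python snippet is only a minimal sketch of that arithmetic; the function name `relative_growth_percent` is hypothetical and is not part of StatSVN or of Butler's tool.

```python
def relative_growth_percent(before: int, after: int) -> float:
    """Return the relative change from `before` to `after` as a percentage."""
    return (after - before) / before * 100.0


# Totals of non-unique identifiers at the first and last measured revision.
growth = relative_growth_percent(134_649, 135_369)
print(f"{growth:.2f}%")  # prints 0.53%
```

The difference is 720 identifiers on a base of 134,649, which is consistent with the percentage stated in the text.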
Figure 1. Changes to the source code. [Chart: series "LOC changed" and "Identifiers added", April 2007 to June 2008 in two-month steps; vertical axis 0 to 4000.]

The results in Figure 2 show that the number of identifier flaws rarely changes over time. This is the case for both the Capitalisation Anomaly and the Non-dictionary words identifier flaws.

Figure 2. Amount of identifier flaws for each type. [Chart: series "Capitalisation" and "Non-dictionary words"; vertical axis 0 to 2500.]

5. Discussion

The lack of change in the number of identifier flaws is probably caused by the location of the measured JBoss files in a branch. Branches are often made from the trunk with the aim of stabilising the software and releasing it to the public. You don’t stabilise code by refactoring it unnecessarily. Refactoring the code will introduce changes to the source code and the overall software structure, and thus also new flaws.

The lack of change could also be because the number of unique identifiers hardly changes during development. This is reflected in research by Antoniol et al. [4] into the lexicon of a software project and its changes over time. They discovered that the lexicon hardly changes after it has been established.

The results show only a small part of the entire JBoss Application Server. To draw a better conclusion a larger set of data should be used. Creating a larger data set for a single project might be quite hard. SVN was only released ten years ago and a lot of projects didn’t start using it until 2003. This means that SVN data can be retrieved for a period of no more than seven years. There have also been many changes in the use of SVN over the years.

Even though the identifier flaws used here can indicate possibly low quality source code, there have been no changes in the number of identifier flaws. This could mean flawed source code is not often changed during branch stabilisation. It could be interesting to take a larger time span and see if flawed identifiers are renamed in the trunk, where the source code does not have to be stabilised to be released.

Identifier flaws could quite easily be resolved in Java programs. An advanced IDE like Eclipse or IntelliJ supports the large-scale renaming of identifiers. It is questionable whether renaming the identifiers actually improves the understandability of source code. It could very well decrease understandability if changes to the lexicon are made, especially in the short term, because software developers have to learn the words and what they refer to all over again.
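To make the two flaw types counted in this study more concrete, the following Python sketch shows one way such checks could be written. It is not Butler's tool and does not reproduce its rules: the word list `DICTIONARY`, the splitting regular expression and all function names are assumptions made for illustration only.

```python
import re

# Hypothetical stand-in for a real English/Java word list (assumption).
DICTIONARY = {"page", "counter", "app", "editor", "kit", "html",
              "foo", "bar", "get", "user", "name"}


def is_constant(identifier: str) -> bool:
    """Java constants are conventionally written in UPPER_CASE."""
    return re.fullmatch(r"[A-Z][A-Z0-9_]*", identifier) is not None


def split_components(identifier: str) -> list[str]:
    """Split on underscores and camelCase boundaries; keep acronym runs whole."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", identifier)


def has_capitalisation_anomaly(identifier: str) -> bool:
    """Flag all-capital components and missing capitals between known words."""
    if is_constant(identifier):
        return False
    for part in split_components(identifier):
        if len(part) > 1 and part.isupper():
            return True  # e.g. HTML in HTMLEditorKit, BAR in fooBAR
        word = part.lower()
        if word not in DICTIONARY and any(
            word[:i] in DICTIONARY and word[i:] in DICTIONARY
            for i in range(1, len(word))
        ):
            return True  # e.g. pagecounter = page + counter
    return False


def has_non_dictionary_word(identifier: str) -> bool:
    """True if at least one alphabetic component is not in the word list."""
    return any(
        part.lower() not in DICTIONARY
        for part in split_components(identifier)
        if part.isalpha()
    )


for name in ["HTMLEditorKit", "pagecounter", "fooBAR", "strlen", "appCnt"]:
    print(name, has_capitalisation_anomaly(name), has_non_dictionary_word(name))
```

With this toy word list, the examples from the Methodology (HTMLEditorKit, pagecounter and fooBAR) are flagged as capitalisation anomalies, and strlen and appCnt are flagged as containing non-dictionary words. A real measurement would need a full dictionary plus lists of common abbreviations and Java-specific terms.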


6. Conclusion

Based on the results, I conclude that the number of flawed identifiers rarely changed over two years even though there were a lot of changes to the source code. The flaws continue to exist even though their number is significant.

Although earlier research on identifiers didn’t measure the quality of identifiers, it concluded that the lexicon more or less remains the same. This means identifiers hardly change during the software development process once they have been chosen in the early development stages.

This could mean that identifier flaws will continue to exist in a project because developers will never change a flawed lexicon.

7. Acknowledgements

I want to thank Simon Butler for his assistance with my article. He advised me on papers to read and helped me a great deal by measuring the JBoss AS source code with his identifier quality metric tool.

8. References

[1] S. Butler, M. Wermelinger, Y. Yu, and H. Sharp, “Exploring the influence of identifier names on code quality: An empirical study,” in 14th European Conference on Software Maintenance and Reengineering, 15-18 March 2010, Madrid, Spain.

[2] D. Lawrie, C. Morrell, H. Feild, and D. Binkley, “What’s in a name? A study of identifiers,” in 14th IEEE Int’l Conf. on Program Comprehension. IEEE, 2006, pp. 3–12.

[3] Sun Microsystems, “Code conventions for the Java programming language,” 1999. http://java.sun.com/docs/codeconv

[4] G. Antoniol, Y.-G. Gueheneuc, E. Merlo, and P. Tonella, “Mining the lexicon used by programmers during software evolution,” in Proc. of Int’l Conf. on Software Maintenance. IEEE, Oct. 2007, pp. 14–23.

[5] W. J. Salamon and D. R. Wallace, “Quality characteristics and metrics for reusable software (preliminary report),” Technical report, National Institute of Standards and Technology, May 1994. http://hissa.ncsl.nist.gov/HHRFdata/Artifacts/ITLdoc/5459/metrics.html
