
Static Source Code Security Analyzers comparison

Gisela P. Petrini
Khu Technologies S.A, Buenos Aires, Argentina
gpetrini@khutech.com.ar

ABSTRACT
Application security encompasses countermeasures to prevent vulnerabilities or weaknesses throughout the Software Development Life Cycle (SDLC). The challenge is how to efficiently find and avoid these vulnerabilities. One of the most common methods is the use of static code analysis tools.

The objective of this work is to assess the efficiency and quality of some of these tools. Test cases were selected from the Juliet Test Suite for the Java language, which contains known, intentional flaws. Several static source code analyzers were tested, both open source and commercial. The main conclusion is that automatic static source code analysis tools help a lot but are still not completely reliable. Detailed results and conclusions are reported.

Keywords
Application security, static source code analysis, software quality

1. INTRODUCTION
Nowadays, application security is becoming mandatory in organizations. Software is everywhere: when we buy things, pay bills, transfer money, use the TV remote control, close electronic doors, control cars, when surgeons operate on patients, etc., and attacks are increasing in number and complexity.

New agile software development approaches demand short and iterative times to test applications. Therefore, the challenge today is to know how to avoid or reduce vulnerabilities in the early steps of the SDLC (Software Development Life Cycle) in order to produce more secure final software products. As the OWASP1 documentation states, "code review is probably the single-most effective technique for identifying security flaws." Moreover, a well-known security fundamental advises that this also minimizes the attack surface area. There are several static code scanners that automatically search for and find security flaws in the source code of software products. The purpose of this article is to test some popular source code analyzers in order to measure their accuracy and compare their results. This is done by using an already public test suite which contains known security weaknesses.

This work is structured as follows. Section 2 gives an overview of related work. Section 3 introduces the selected method and tools. Section 4 includes the analysis results. The conclusions are presented in Section 5. Section 6 describes future work.

2. RELATED WORK
There are a number of already published articles in which the authors compared security source code analyzer tools. All of them used the NSA's Juliet Test Suite2 as test suite: (Gotz, 2013), (Wagner & Sametinger, 2014) and (Delgado, 2015). (Gotz, 2013), as part of his work, evaluated the following open source static analysis tools: Checkstyle3, Findbugs4, Lapse5 and PMD6. (Wagner & Sametinger, 2014) automated the test suite scanning process and compared the generated results. Again, this study was focused only on freely available source code scanners such as PMD (Java), Findbugs (Java), Jlint (Java)7, Cppcheck (C/C++)8 and Visual Studio (C/C++)9. (Delgado, 2015) presented an overview of a mix of open source and commercial static code analysis tools such as Sonarqube10, VisualCode Grepper11 and Fortify HP on Demand12.

1 https://www.owasp.org/index.php/Code_Review_Introduction
2 https://samate.nist.gov/SRD/testsuite.php
3 http://checkstyle.sourceforge.net/
4 http://findbugs.sourceforge.net/
5 https://www.owasp.org/index.php/OWASP_LAPSE_Project
6 https://pmd.github.io/
7 http://jlint.sourceforge.net/
8 http://cppcheck.sourceforge.net/
9 https://msdn.microsoft.com/vstudio
10 http://www.sonarqube.org/
11 http://sourceforge.net/projects/visualcodegrepp/
12 http://www8.hp.com/ar/es/software-solutions/application-security-testing/

3. METHOD AND TOOLS DESCRIPTION
The objective of the present work was to compare the quality of several static source code analysis tools in order to determine which one provides better accuracy. To this end, some test cases selected from the Juliet Test Suite for the Java language and the different scanning tools explained in this section were used.

In this study we selected the same six main test cases as (Delgado, 2015), because they are related to the OWASP Top 10. They are:
CWE89_SQL_Injection
CWE80_XSS
CWE81_XSS_Error_Message
CWE256_Plaintext_Storage_of_Password
CWE259_Hard_Coded_Password
CWE327_Use_Broken_Crypto
This selection comprises 5,555 individual test cases and 882,299 lines of code.

3.1. Juliet Test Suite
The Juliet Test Suite was developed by the Center for Assured Software (CAS) of the US National Security Agency (NSA), specifically for assessing the capabilities of static analysis tools. It covers vulnerabilities for the Java and C/C++ languages. In the present work only Juliet Test Suite v1.2 for Java was used; it contains 25,477 test cases and 4,565,713 lines of code. The test cases are artificial source code programs with intentionally added security flaws.

Each test case matches exactly one security flaw type; nevertheless, some other unrelated flaws may be found, and these are indicated as incidental flaws. The main vulnerability source of the test cases is CWE (MITRE's Common Weakness Enumeration)13, and each test case has a unique CWE identifier as part of its name. CWE is a community-developed dictionary of software weakness types, where each entry describes a class of security weakness. The Juliet test cases cover 112 different CWEs, including entries from the 2011 CWE/SANS Top 25 Most Dangerous Software Errors.

The intentional flaws were coded in methods whose names contain the word "bad" (such as bad(), badSource(), or badSink()) or in classes whose names contain the word "bad" (such as CWE581_Object_Model_Violation__hashCode_01_bad). When one of the tested tools reports any of these intentional flaws as a weakness, this is counted as a True Positive. If the tool does not report that flaw, it is counted as a False Negative.

On the other hand, methods or classes with the word "good" in their names (like CWE563_Unused_Variable__unused_public_member_variable_01_good1.java, or the goodG2B(), goodG2B1(), goodG2B2() and goodG2B3() methods) are intentionally secure and have no security flaws. Thus, if one of the tested tools reports any of these as a security flaw or weakness, it is counted as a False Positive. "Good" methods and classes are intended to prove the quality of the tools.

These methods and classes contain tags, such as POTENTIAL FLAW or FIX, which identify a possible weakness or the correction of a previously found vulnerability. FIX tags appear in "good" methods or classes and amend a weakness. POTENTIAL FLAW tags appear in either "good" or "bad" methods or classes; when a POTENTIAL FLAW tag is included in a "bad" method, it marks an exploitable vulnerability.
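To make this structure concrete, the following is a simplified, hypothetical sketch in the spirit of a Juliet test case for CWE89 (SQL injection). The class name, helper code and comments are illustrative only and do not reproduce an actual file from the suite.

```java
// Hypothetical, simplified illustration of the Juliet test case layout
// (bad/good methods plus POTENTIAL FLAW and FIX tags); not taken from the suite.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CWE89_SQL_Injection__illustrative_01 {

    // "bad" method: contains an exploitable weakness (a True Positive when reported).
    public ResultSet bad(Connection connection, String userName) throws SQLException {
        Statement statement = connection.createStatement();
        /* POTENTIAL FLAW: the query is built by concatenating untrusted input */
        return statement.executeQuery(
                "SELECT * FROM users WHERE name = '" + userName + "'");
    }

    // "good" method: the same functionality with the weakness corrected
    // (a False Positive if a tool still reports it).
    public ResultSet good(Connection connection, String userName) throws SQLException {
        /* FIX: use a parameterized query instead of string concatenation */
        PreparedStatement statement =
                connection.prepareStatement("SELECT * FROM users WHERE name = ?");
        statement.setString(1, userName);
        return statement.executeQuery();
    }
}
```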

3.2. Static Code Analysis Tools
There are different static code analyzers depending on the programming language and the scanning method. OWASP defines Static Code Analysis as the running of tools that "attempt to highlight possible vulnerabilities within 'static' (non-running) source code by using techniques such as Taint Analysis and Data Flow Analysis". A large list of popular tools of this kind is available at https://samate.nist.gov/index.php/Source_Code_Security_Analyzers.html.

As part of the present work, the following tools were selected:

Name                            Owner        License
Lapse+                          OWASP        Open source
VisualCode Grepper              N1ckDunn     Open source
Sonarqube (Findbugs and PMD)    Sonarsource  Open source
Fortify on Demand               HP           Commercial
Code Advisor On Demand          Coverity     Commercial

3.2.1. LAPSE+
LAPSE+ is a security scanner for Java EE applications and is part of the OWASP LAPSE project. LAPSE+ is based on static code analysis in order to detect the source and the sink14 of vulnerabilities. It covers the following vulnerability categories: Parameter Tampering, URL Tampering, Header Manipulation, Cookie Poisoning, SQL Injection, Cross-site Scripting (XSS), HTTP Response Splitting, Command Injection, Path Traversal, XPath Injection, XML Injection and LDAP Injection.

LAPSE+ follows three steps for the detection of these vulnerabilities: 1. Vulnerability Source (it detects the points of the code that can be the source of an attack); 2. Vulnerability Sink (it identifies the points that can propagate the attack); and 3. Provenance Tracker (it checks whether it is possible to reach a Vulnerability Source from a Vulnerability Sink).
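As an illustration of this source/sink model (a hypothetical servlet, not an example shipped with LAPSE+), a taint-tracking tool would mark request.getParameter as a vulnerability source, the println call on the response writer as a cross-site scripting sink, and would then try to connect the two through the comment variable:

```java
// Hypothetical servlet used only to illustrate the source/sink model
// that taint-analysis tools such as LAPSE+ work with.
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CommentServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        // Vulnerability source: untrusted data enters the application here.
        String comment = request.getParameter("comment");

        response.setContentType("text/html");
        // Vulnerability sink: the tainted value reaches the HTML output
        // without encoding, so the tool reports a potential XSS flaw.
        response.getWriter().println("<p>" + comment + "</p>");
    }
}
```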

3.2.2. Sonarqube
Sonarqube is an open source project: an open platform to manage code quality. It covers several languages, uses rules and computes advanced metrics, and it also covers security vulnerability detection through the following plugins: Findbugs and PMD. In this analysis, Sonarqube version 4.5.6 was used, with Findbugs 3.3 and PMD 2.5.

3.2.3. Fortify on Demand
HP Fortify on Demand is a Software as a Service (SaaS) offering. It combines dynamic and static testing. It supports different languages such as ABAP/BSP, COBOL, Python, ASP.NET, JavaScript/Ajax, Ruby, Java (with Android), T-SQL, etc.

3.2.4. Coverity Code Advisor On Demand15
Coverity Code Advisor On Demand detects security and quality defects. It finds issues such as API usage errors, best practice coding errors, buffer overflows, control flow issues, cross-site scripting (XSS), cross-site request forgery (CSRF), deadlocks, error handling issues, hard-coded credentials, integer overflows, memory corruptions, illegal memory accesses, path manipulation, security best practices violations, security misconfigurations, SQL injection, etc. Code Advisor On Demand supports Java, C, C++, and C#.

3.2.5. VisualCode Grepper (VCG)
Visual Code Grepper is an automated code security review tool intended to identify bad or insecure code. VCG supports C++, C#, VB, PHP, Java and PL/SQL.

13 https://cwe.mitre.org/
14 Vulnerability sink: a point that can propagate the attack and manipulate the behaviour of the application.
15 https://ondemand.coverity.com/

4. RESULTS
In this section we present the metrics, some constraints, and then the results of the analysis.

4.1. Estimators
The results were evaluated according to the following two estimators, one based on the True Positives and the other based on the False Positives relative to the Lines of Code.

#1: Tool efficiency
This estimator compares the total number of intentional flaws present in each selected test case (the "Real Bads", calculated by counting each bad method and bad class per test case) with the True Positives reported by each tool per test case. This metric represents tool efficiency.

#2: Scan quality
This estimator is the total number of False Positives reported by each tool compared with the Lines of Code in each test case. This metric represents the quality of the scans.
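As a minimal sketch of how the two estimators are computed, the snippet below applies the two ratios to hypothetical counts; the numbers are for illustration only and are not taken from the measured results.

```java
// Minimal sketch of the two estimators with hypothetical counts;
// the numbers are for illustration only.
public class EstimatorExample {
    public static void main(String[] args) {
        // Hypothetical test case: 124 bad methods/classes (Real Bads),
        // of which the tool reported 30 as weaknesses (True Positives).
        int realBads = 124;
        int truePositives = 30;
        double toolEfficiency = 100.0 * truePositives / realBads;   // ~24.2%

        // Hypothetical scan: 12 findings in good methods/classes (False
        // Positives) over 150,000 lines of code in the test case.
        int falsePositives = 12;
        int linesOfCode = 150_000;
        double scanQuality = 100.0 * falsePositives / linesOfCode;  // 0.008%

        System.out.printf("Tool efficiency: %.3f%%%n", toolEfficiency);
        System.out.printf("Scan quality:    %.3f%%%n", scanQuality);
    }
}
```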

4.2. Constraints and considerations
LAPSE+ doesn't cover cryptographic weaknesses. Therefore, the vulnerabilities related to these weaknesses weren't tested with this tool (see the illustrative fragment below).

VCG doesn't recognize specific vulnerabilities. For instance, in order to find XSS, the tool reports tags such as "ObjectInputStream", "FileInputStream" and "Poor Input Validation". These tags were evaluated in order to detect XSS, especially on methods like getParameter, getCookies and getQueryString.

Sometimes tools identify weaknesses or incorrect use of best coding practices that are not related to the test cases' vulnerabilities. For that reason, they are not considered in this work. For example, Sonarqube reports "Classes should not be loaded dynamically" or "Throwable and Error should not be caught".
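For context on the cryptography-related cases mentioned in the first constraint, the CWE259 and CWE327 test cases exercise patterns like the following hypothetical fragment (not taken from the suite), where a password is hard-coded and a broken cipher such as DES is requested:

```java
// Hypothetical fragment illustrating the kind of weaknesses exercised by the
// CWE259 (hard-coded password) and CWE327 (broken crypto) test cases.
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.DESKeySpec;

public class WeakCryptoExample {
    // CWE259-style flaw: credential embedded directly in the source code.
    private static final String DB_PASSWORD = "s3cr3tpassw0rd";

    public static byte[] encrypt(byte[] plaintext) throws Exception {
        // CWE327-style flaw: DES is a broken algorithm; scanners typically
        // flag the Cipher.getInstance("DES...") call.
        SecretKey key = SecretKeyFactory.getInstance("DES")
                .generateSecret(new DESKeySpec(DB_PASSWORD.getBytes()));
        Cipher cipher = Cipher.getInstance("DES/ECB/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher.doFinal(plaintext);
    }
}
```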

4.3. Values
The estimators were calculated for each test case and for each tested tool (where applicable). The following tables show the results. For comparison purposes, the "perfect tool" was introduced: an ideal, theoretical tool that detects 100% of the security bugs with 0% False Positives.

#1: Tool efficiency
Tool efficiency = (Number of True Positives / Number of Real Bads) %

Tools                             CWE89     CWE81     CWE80     CWE256    CWE259    CWE327
Lapse+                            59.677%   0.000%    59.677%   -         -         -
VCG                               24.946%   30.755%   19.892%   0.000%    0.000%    100.000%
Sonarqube                         10.753%   0.000%    0.000%    0.000%    0.000%    100.000%
Coverity Code Advisor On Demand   24.194%   0.000%    32.258%   0.000%    48.387%   100.000%
Fortify on Demand                 9.946%    0.000%    0.000%    0.000%    14.516%   50.000%

#2: Scan quality
Scan quality = (Number of False Positives / Lines of Code) %

Tools                             CWE89     CWE81     CWE80     CWE256    CWE259    CWE327
Lapse+                            0.408%    0.000%    1.350%    -         -         -
VCG                               0.425%    1.536%    1.870%    0.000%    0.000%    0.000%
Sonarqube                         0.228%    0.000%    0.000%    0.000%    0.000%    0.000%
Coverity Code Advisor On Demand   0.007%    0.000%    0.029%    1.758%    0.042%    2.629%
Fortify on Demand                 0.003%    0.000%    0.000%    0.000%    0.000%    1.786%

* Dashes indicate combinations that were not tested due to the constraint explained before (LAPSE+ does not cover cryptographic weaknesses).

In order to provide a better understanding of the results, a graphical comparison of the tools is depicted in Figure 1 (Static Source Code Security Analyzers comparison: Tornado chart).

5. CONCLUSIONS
In this work, six test cases of the Java version of the Juliet Test Suite's intentionally vulnerable source code were analyzed. The test cases were selected according to the OWASP Top Ten and (Delgado, 2015)'s article. Five different static source code analyzers were tested: three of them open source and two commercial (in the commercial cases, the on-demand web or trial versions were used). The results showed that:

Taking into account the tool efficiencies, the best performing tool only reaches 34% of the perfect tool. The open source tools give an acceptable starting point, but they report many false positives. Lapse+ showed better measured efficiency for test cases CWE89 and CWE80; nevertheless, it cannot detect cryptographic weaknesses.

Commercial tools detected most of the errors with fewer false positives. Coverity Code Advisor On Demand identified more weaknesses than Fortify on Demand.

Taking into account the measurement results, it can easily be seen that there is no single tool or method that guarantees an application to be absolutely vulnerability-free.

So, supporting the conclusion of (Wagner & Sametinger, 2014), a combination of different scanning tools, including open source and commercial ones, is highly recommended. Moreover, even the final tool reports should be manually verified in a human code review, as (Delgado, 2015) also concluded.

6. FUTURE WORK
An improved analysis may involve the use of the whole Juliet Test Suite in order to thoroughly evaluate the static code analyzer tools. Currently, the Juliet test cases cover 112 CWE weaknesses for Java. Only six of them were tested in this work, representing 22% of the complete Juliet Suite in terms of test cases.

Additionally, in order to appraise the intelligence of each tool, it would be useful to evaluate whether a tool detects weaknesses tagged as POTENTIAL FLAW contained in "good" methods. As explained before, such a tag does not mean an exploitable vulnerability.

7. REFERENCES
Delgado, J. P. (2015). Análisis de seguridad y calidad de aplicaciones (Sonarqube). Manizales, Colombia: Universitat Oberta de Catalunya, Ingeniería, Departamento de Informática.
Gotz, C. (2013). Vulnerability identification in web applications through static analysis. Technische Universität München.
Meade, G. G. (2012). Juliet Test Suite v1.2 for Java - User Guide. Center for Assured Software, National Security Agency.
OWASP LAPSE Project. (n.d.). Retrieved from https://www.owasp.org/index.php/OWASP_LAPSE_Project
Wagner, A., & Sametinger, J. (2014). Using the Juliet Test Suite to compare Static Security Scanners. 11th International Conference on Security and Cryptography. Vienna, Austria.

[Figure 1 - Static Source Code Security Analyzers comparison: Tornado chart (panels: Efficiency, Quality). Note: the two sides use different scales.]
