


White Paper: Evaluating Big Data Analytical Capabilities For Government Use
March 2012
A White Paper providing context and guidance you can use


Contents: The Big Data Tool Landscape | Big Data Tool Evaluation Criteria | More Resources


Evaluating Big Data Analytical Capabilities for Government

This paper, produced by the analysts and researchers of CTOlabs.com, proposes ten criteria for evaluating analytical tools, focused on capabilities in the emerging Big Data space. The methods and models here can help you select the best capability for your mission needs.

Executive Summary
The need for sensemaking across large and growing data stores has given rise to new approaches to data infrastructure, including capabilities like Apache Hadoop. Hadoop overcomes traditional limitations of storage and compute by delivering capabilities that run on commodity hardware and can leverage any data type. Hadoop scales to the largest of data sets in a very cost-effective way, making it the infrastructure of choice for organizations seeking to make sense of their growing data stores. Its ability to store data without a predefined data model means information can be leveraged without knowing in advance what questions will be asked of it, making it a system with far more agility than legacy databases.

The core capability of Hadoop has grown into a full framework of tools, including a data warehouse infrastructure (Hive), a high-level language for parallel computation (Pig), a scalable distributed database able to store large tables (HBase), a scalable distributed file system (HDFS), and tools for importing and managing data and coordinating the infrastructure (such as Sqoop, Flume, Oozie and ZooKeeper). This framework has given rise to a new wave of innovation in sensemaking over large quantities of data and has laid the foundation for dramatic growth in analytical tools that operate over these Big Data infrastructures.

Until recently, organizations that wanted to leverage the Hadoop framework wrote their own analytical capabilities to ride on top of the infrastructure. Now a new trend has emerged: organizations can turn to commercial vendors who offer analytical packages that ride on top of the Hadoop framework. This positive trend makes it easier to deliver advanced Big Data solutions to end users. The right tool can enable more agile use of your organization's data stores, and can do so quickly. The right tool can also make Big Data analytics so easy that end users can form their own queries and generate their own responses. This development is particularly exciting to knowledge-based government organizations seeking to empower their workforce with up-to-date insights.
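The MapReduce pattern at the heart of Hadoop can be illustrated without a cluster. The sketch below is plain Python with no Hadoop libraries (the function names and sample data are our own, chosen for illustration): a map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group, exactly the division of labor Hadoop distributes across commodity machines.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group the emitted pairs by key (the word)."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (key, [count for _, count in group])

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    for key, counts in grouped:
        yield (key, sum(counts))

lines = ["big data big insight", "big mission"]
counts = dict(reduce_phase(shuffle_phase(map_phase(lines))))
print(counts)  # {'big': 3, 'data': 1, 'insight': 1, 'mission': 1}
```

In a real Hadoop deployment the map and reduce steps run in parallel on the nodes holding the data, and the framework handles the shuffle, scheduling and fault tolerance.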

A White Paper for the Government IT Community

This paper provides a framework meant to help in your evaluation of Big Data analytical tools. We review ten factors we believe should be paramount in your evaluations of Big Data analytical software packages. We present these factors in a way you should find easily tailorable to your organizational needs.

Ten Evaluation Factors

The ten factors we believe should be at the forefront of your decision are:

1. Mission Functionality/Capability
2. Ease of Use/Interface
3. Architecture Approach
4. Data Architecture
5. Analytical Models
6. Licensing
7. Security and Enterprise Governance
8. Partner Ecosystem
9. Deployment Models
10. Health of the Firm

We expand on these factors below.

Mission Functionality/Capability: This may be the most important factor in deciding which Big Data analytical tools to leverage in your infrastructure. If the analytical package you select does not have the capability you expect and need, then no other factor matters. The weight of this factor means you should have a well-thought-out vision you can articulate for your desired capability. For example, do you need a system that can analyze all types of unstructured and structured data? Do you need a solution that enables collaboration between analysts, or one focused on extracting knowledge from existing data stores? Do you want a system that works only in the back office of an IT shop, or one that supports missions by empowering end users?

Ease of Use/Interface: One of the first questions to ask when choosing Big Data analytical tools is who will use them. Do you want to enable your data scientists to dig deeper into new questions? Do you want to increase the power of your analysts? Or are you hoping to push analytical capabilities out to your entire workforce? Giving more of your enterprise access to Big Data solutions leads to a more informed and agile workforce and reduces the IT bottleneck. The same tools that help intelligence analysts map networks can help web developers evaluate guest activity on a website, help the citizen-facing parts of your organization understand citizen requirements and trends, and help HR keep track of work flows and loads. But these capabilities only help if your workforce is willing and able to use them. A powerful tool that requires specialized degrees or specific expertise such as SQL will have limited impact and limited usage; more broadly, many non-IT professionals demand walk-up usability from their information management software. Often entire departments use only a small fraction of the capabilities that powerful analytics provide, because those capabilities are intimidating or hard to access. Interface matters as much for specialists as it does for your less tech-savvy employees. Your analytics should be able to pose and answer questions across all data quickly and organically, so that they become an extension of the analyst's thought process. A natural interface can be more important than any individual functionality. With smooth and efficient interactions between tools and users, analysts and decision-makers make more and better decisions faster, which is the ultimate goal of analytics.

Architecture Approach: Some solutions require you to establish entire architectures just to support them. This is not a good approach. Other solutions are stand-alone islands that expect you to move all data into their closed system before they can do analysis. That might be acceptable for some missions, but in most cases you will want systems that work with your existing enterprise architecture and can securely move data in and out of the analytical tool. Your architecture should also help drive the interface into the capability: in most cases, every user in your organization already has a browser on their device. Shouldn't that be the interface into all your new analytical capabilities as well? Bottom line: the solution you choose should work with your architecture and should not force you to re-engineer. Expect the new solution to integrate well with what you already have.

Data Architecture: Common data standards are already key foundational components of most organizations' IT strategies. But integration of new tools can be complicated, requiring extensive set-up and configuration to extract, transform and load (ETL) data from multiple sources. Tools that require large teams of programmers to build ETL accesses into existing data stores will not have the agility required to take advantage of new data sources or to accommodate shifting mission needs or new


business plans. Look for Big Data analytical tools that do not require complex data mappings and schema development, which are time-consuming and lock your architecture into a fixed way of working. Look for tools designed to work with any type of data (they should be data-source agnostic). Systems that force data to be collected again and imported into a local store, in set formats and indices designed only for that system's use, are sub-optimal and will limit your ability to perform your mission with the flexibility you want. Seek a capability designed to add new data fast, without a need for engineers to design and activate each new data feed. Demand integration without limits.

Analytical Models: Analytical systems designed to help with complex issues use ontologies: ways of representing associations and meanings. Ontologies are sometimes called the world views of an organization, since they reflect concepts in the environment the group is dealing with. Simple systems use a single ontology; these are fine as long as the problem you analyze never changes. Multi-ontology systems enable you to see different perspectives and manage policy by namespace, and they better enable discovery of new conclusions. The ability to support multiple models allows multiple issues to be worked, and multiple organizations can make use of the same tool; this lowers overall cost and speeds return on investment. Bottom line: do not select a tool that forces you to lock in on a particular analytical model.

Licensing: User organizations should, to the greatest extent possible, push for licensing that is economical, flexible and predictable. For many analytical tools, a license based on the number of users is a common approach, but if the mission team must be drastically expanded in a short period of time, acquiring more licenses may slow the project down, and user licenses are, in most cases, acquired for longer periods of time than the mission requires. Some tools license based on the number of processors, servers or cores, so you can be stuck with a high cost even if no one is using the tool. You want systems from companies that are motivated to serve users, so licenses that reflect actual analytics used, regardless of processors or users, are the most flexible and generally the best for this type of tool; when you compare options, this sort of choice can be significantly lower cost to start and to maintain. You should also watch for other licenses that are hidden when you buy a Big Data tool. For example, are you also required to buy an Oracle or Sybase license?

Security and Governance: Enterprises require authentication, authorization, auditing and other governance of tools for effective oversight of mission support and for ensured reliability. Expect the capability you select to offer options for LDAP/Active Directory integration, role-based access with delegation, integrated encryption and strong audit capabilities. Tools working with Hadoop clusters should be able to run in the secure areas of your network that hold the Hadoop master and slave nodes.

Partner and Legacy Ecosystem: Your legacy IT infrastructure comes from a wide range of firms. Any organization of size will have software that operates over data stores from companies like Oracle, Microsoft, Sybase, MySQL, IBM, Cloudera and countless others, plus analytical tools from a wide range of vendors. This means any Big Data capability you pick should have great flexibility in working with others in the ecosystem; your Big Data solution must be able to work with anyone. The capabilities you pick should therefore be designed for customization and extension, including the ability to change ontologies, interfaces, data sources and the other tools it interfaces with.

Deployment Models: The capabilities you acquire should be able to run without a large contractor staff. Specialists are frequently required to install a capability, and some level of services and support to your team can be expected, but if you must buy a large number of engineers to keep the Big Data tools running, then you have not really bought a solution: you have bought a capability plus engineers, and that cost will eat you alive. If you are told that engineers are required, it should raise other alarms. Will there always have to be a wizard behind the curtain?

Health of the Firm: Who are you buying your capability from? Are they a user-focused organization that cares and will be with you long term? This can be hard to evaluate, but it is worth some homework. What if the firm you are dealing with has the pre-crash reputation of an Enron? How would you know, as a potential user, whether the firm has the ethics and abilities you require? Is the firm having trouble staying afloat? If you are relying on the company for support, you may lose your investment if it closes its doors.
This is why the Federal Acquisition Regulation mandates market research. Never skip that step! Research both the capability itself and the firm you are doing business with.
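The governance expectations described under Security and Governance above, role-based authorization plus a strong audit trail, can be sketched in a few lines. This is a minimal illustration of the concept, not a real access-control product; the role names, permissions and users are hypothetical.

```python
import datetime

# Hypothetical role-to-permission mapping (illustrative only).
ROLES = {
    "analyst": {"query"},
    "admin":   {"query", "manage_users", "view_audit"},
}

audit_log = []  # every access decision is recorded for later review

def authorize(user, role, action):
    """Return True if the role permits the action; audit either way."""
    allowed = action in ROLES.get(role, set())
    audit_log.append({
        "when": datetime.datetime.utcnow().isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "allowed": allowed,
    })
    return allowed

assert authorize("alice", "analyst", "query") is True
assert authorize("alice", "analyst", "manage_users") is False
print(len(audit_log))  # 2 -- denied attempts are audited too
```

In an enterprise deployment the role lookup would come from LDAP/Active Directory and the audit records would flow to a protected store, but the evaluation question is the same: does the tool make these checks and records for you, or leave them to custom engineering?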

Concluding Thoughts
There are many other criteria you may want to consider when evaluating Big Data analytical tools, but the ten above are key for ensuring long-term mission success. We also believe it is important to speak with others who have used the tools you are evaluating, to benefit from their lessons learned. This is especially important in the current budget environment.


More Reading
For more on federal Big Data technology and policy issues, visit:

CTOvision.com - A blog for enterprise technologists with a special focus on Big Data.
CTOlabs.com - A reference for research and reporting on all IT issues.
Carahsoft.com - Offering Big Data solutions for Government.

About the Authors

Ryan Kamauff is the lead technology research analyst at Crucial Point LLC, focusing on disruptive technologies of interest to enterprise technologists. He is also a writer at CTOvision.com. Contact Ryan at Ryan@crucialpointllc.com.

Bob Gourley is CTO and founder of Crucial Point LLC and editor-in-chief of CTOvision.com. He is a former federal CTO. Contact Bob at bob@crucialpointllc.com.

For More Information

If you have questions or would like to discuss this report, please contact me. As an advocate for better IT in government, I am committed to keeping the dialogue open on technologies, processes and best practices that will keep us moving forward.

Contact: Bob Gourley, bob@crucialpointllc.com, 703-994-0549

All information/data © 2011 CTOLabs.com.