WHITE PAPER

Using Big Data technologies to enable social media analytics

Abstract
In this white paper, Impetus discusses the need to build a social analytics platform based on Big Data technologies for better business insight. The paper also explains why social media analytics is important in today's world and how 3-D data sources (that is, internal, external and social data) can be used to build a data warehouse based on Big Data technologies. Impetus also shares its recommended solution and shows how Big Data technologies can be used to optimize costs and handle exponential increases in data over time.

Impetus Technologies Inc. www.impetus.com February 2012

Table of Contents
Introduction
The benefits of Social Analytics
Data sources that facilitate Social Media Analytics
Technical tenets of Social Media Analytics
Using Big Data technologies to enable Social Media Analytics
Building a Big Data warehouse
A step-by-step approach to creating the Big Data EDW
The Impetus solution
  The iLaDaP high level architecture
Summary

Introduction
Social Media Analytics is a discipline that helps organizations measure, assess and explain the performance of their social media initiatives. Analyzing social media data involves four stages:

Step 1: collecting the data. This facilitates the compiling of reports and statistics that are to be shared with management or with internal and external stakeholders.
Step 2: measuring the data. This helps in Sentiment Analysis and in gauging which products are well received in the marketplace.
Step 3: analysis. Here, data is presented in a visual and interactive manner to management, as well as to the sales and marketing teams, to provide better insights.
Step 4: innovation. Based on the insights and analysis, organizations determine the new products and ideas they are going to pursue in response to customer requirements. Innovation also helps unearth cross-sell or up-sell opportunities that were not visible before.

Social Analytics opens up a host of new opportunities and perspectives. Category-wise analysis of customer data, for instance, enables demographic profiling of customers and helps determine their usage patterns. Similarly, with Feature Analysis, it is possible to figure out which forums, platforms or sources of data are more active than others. Product Growth Analysis, which focuses on the data generated for a specific product, helps understand the response of users to that product. There is also a Recommendation Engine, which helps zero in on what is missing or lacking in a product range.

Finally, Social Analytics enables Third Party Analysis, which focuses purely on what users of public social media platforms, such as Twitter, Facebook and MySpace, have to say about the product.

The benefits of Social Analytics


Social Analytics is an outcome-based approach, one that creates visible Return on Investment (RoI).
It helps organizations retain customers by addressing their concerns upfront, rather than being slaves to processes. The results of the analytics help organizations retain brand preference in a fickle consumer world.
It improves customer service and brings down the cost of operations.
It enables organizations to add new customers by understanding and addressing their requirements.
Social Analytics helps companies keep an eye on their competition. With easy access to social media data, it is simple to track and counter the moves of competitors.
It helps companies remain proactive. The turnaround time for gathering customer feedback is reduced drastically. Moreover, the reactions of customers and their subsequent actions can be predicted more accurately, enabling organizations to take appropriate measures.

Social Media Analytics effectively brings together on-site, social media and third-party data to extract useful information. Considering these factors, and the fact that it enables enterprises to leverage the colossal volume of data that is continuously generated through social media interactions, Social Media Analytics should be made an integral part of the marketing and research strategies of enterprises.

Data sources that facilitate Social Media Analytics


Data sources include internal data, such as the purchase history of customers, their transactions, and profiles in the enterprise database. It also encompasses website traffic analysis, covering internal CSR logs, customer queries, automated agent discussions, complaints and resolutions, and employee insights. Data sources can also be the social activities and profile updates of customers on public social media platforms such as Twitter, Facebook, Myspace, LinkedIn, etc. External data sources can additionally be used, and customers analyzed by factoring in industry sources of information and market research reports.

Technical tenets of Social Media Analytics


Here's a look at what Social Media Analytics entails and enables:
Clustering: Clustering is about capturing and analyzing the various comments, demands, and questions that customers share with like-minded friends and groups over social media platforms. It helps identify the appropriate response and behavioral anomalies.
Classification: Having captured data on the activities of customers and their comments, it is possible to perform natural language processing on it to evolve patterns. These patterns can then be categorized and understood for appropriate responses. Organizations can use Classification to address the concerns of customers and approach them with products and offerings that really meet their needs.
Sequential classification: This enables organizations to identify the subsequent steps and actions that customers might take, based on their recent experiences.

Entity Extraction: Organizations can identify the concerns and issues that dissatisfied customers are struggling with through Entity Extraction. They can then take appropriate measures to ease the situation and retain customers on the verge of switching to other suppliers or vendors.
Event Extraction: Event Extraction enables companies to unearth the sequence of events leading up to customer defections, or why people moved on to other providers.
Communication Graphs: Once organizations have all the data nicely sliced and diced, they can draw Communication Graphs. These graphs can help analyze and identify the top influencers and active members in various groups. They can also help companies gain a better understanding of where the messages originate and how they travel through the network. Knowing this, organizations can target the top influencers and most active members in the network, projecting a positive image of the brand or product in the community.
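
As an illustration of the Communication Graphs idea, the sketch below ranks users by how many distinct people replied to or reshared them. It is a minimal, plain-Python example under stated assumptions: the (sender, recipient) pair format, the sample user names, and the function name top_influencers are all hypothetical, and a production system would likely use a richer influence measure than this simple reach count.

```python
def top_influencers(interactions, k=3):
    """Rank users by the number of distinct users who replied to or reshared them.

    interactions: iterable of (sender, recipient) pairs (hypothetical format).
    """
    audiences = {}
    for sender, recipient in interactions:
        if sender == recipient:
            continue  # ignore self-replies
        audiences.setdefault(recipient, set()).add(sender)
    reach = {user: len(fans) for user, fans in audiences.items()}
    # Sort by reach (descending), then by name so the order is deterministic.
    return sorted(reach.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical sample data: who replied to whom.
interactions = [
    ("bob", "alice"), ("carol", "alice"), ("dave", "alice"),
    ("alice", "bob"), ("carol", "bob"), ("bob", "alice"),
]
print(top_influencers(interactions, k=2))  # [('alice', 3), ('bob', 2)]
```

Counting distinct senders rather than raw messages keeps one very chatty follower from inflating a user's apparent influence.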

Using Big Data technologies to enable Social Media Analytics


One of the biggest challenges that organizations face with their social media data is its sheer size. Existing Enterprise Data Warehousing (EDW) environments, designed decades ago, simply lack the ability to capture and process social media data within a reasonable time. Moreover, these traditional EDWs have limited capabilities when it comes to analyzing the behavioral data of users. Traditional solutions can neither help companies manage the complex, unstructured data generated by social media interactions nor handle multimedia data.

Using Big Data technologies is their best bet in this scenario. Big Data technologies can help organizations handle large volumes of complex, unstructured data from social sources, of the order of terabytes and petabytes, gain insights into customers and trends, store images and videos, and save hundreds of thousands of dollars per terabyte per year.

Take the instance of a Big Data Social Analytics Platform that has to deal with information from various data sources, such as social media sites and Web 2.0 enabled websites. The Platform can also pull historical bulk data lying around in existing systems using appropriate connectors. The connectors enable the conversion of data from all kinds of data sources into a Hadoop-based data warehouse. After this data is collected, Apache Mahout, a scalable machine learning and data mining solution, can be used to categorize the data and store it in accordance with the categories for later use. It is also possible to run Map-Reduce jobs that use the Natural Language Toolkit (NLTK) to perform natural language processing of the comments and feedback from the social data sources. The aptly massaged and categorized data can then be used to draw graphs and analyze market sentiment about a product. With Sqoop, the data can also be exported for MIS and for the regulatory reports that need to be produced on a regular basis.

Since the Big Data Social Analytics Platform is powered by Hadoop, it can scale linearly to thousands of nodes using commodity hardware. This spells a significant cost advantage for organizations in the long run. Since it is important for businesses to track down and take advantage of opportunities quickly, this platform can enable them to react to events as they happen.
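
To make the Map-Reduce step more concrete, here is a minimal sketch of the map and reduce phases written in plain Python. It does not use the Hadoop API and only simulates the job in a single process. The brand keywords, record layout, and function names are hypothetical, and the naive whitespace split stands in for the NLTK tokenization that the text describes.

```python
from collections import defaultdict

BRANDS = {"acme", "globex"}  # hypothetical brand keywords to track


def map_comment(record):
    """Mapper: emit (brand, 1) once for each tracked brand mentioned in a comment."""
    words = {word.strip(".,!?").lower() for word in record["text"].split()}
    for brand in sorted(words & BRANDS):
        yield brand, 1


def reduce_counts(pairs):
    """Reducer: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(records):
    """Chain the map and reduce phases in-process (no cluster involved)."""
    pairs = (pair for record in records for pair in map_comment(record))
    return reduce_counts(pairs)


# Hypothetical sample records from social sources.
records = [
    {"text": "Love the new Acme phone!", "source": "Twitter"},
    {"text": "Acme or Globex? Hard choice.", "source": "Facebook"},
    {"text": "Free offer on gift cards", "source": "Twitter"},
]
print(run_job(records))  # {'acme': 2, 'globex': 1}
```

Because the mapper looks at one record at a time and the reducer only sums values per key, the same logic can be spread across many nodes, which is what allows this style of job to scale with the cluster.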

Building a Big Data warehouse


In order to build a Big Data warehouse that extracts data from the sources discussed earlier and draws pertinent insights from it, organizations must begin by grabbing social media data from various public social media platforms. The historical master data and transactional data about customers can be taken from existing systems. Sqoop can come in handy for pulling this data out of the RDBMS systems that are already in place.

Text         User        Location   Source
Gift card    TweetUser   USA, NY    Twitter
Free offer   FaceUser    USA, GA    Facebook

For natural language processing, NLTK is a good Open Source option. Data preparation/Mashups can be accomplished by running Map-Reduce jobs over the collected data and massaging it. Apache Mahout's k-means algorithm can be used for clustering, while its Naive Bayes algorithm can be used for classification/sentiment analysis, using the comments and tweets from social media data sources to identify patterns. The item-based similarity algorithm of Mahout can be used for collaborative filtering and recommendations. When the data is ready for analytical reporting and deep mining, Hive or Pig can be used.
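
The Naive Bayes classification mentioned above can be illustrated with a small, self-contained sketch. This is not Mahout's implementation: it is a plain-Python multinomial Naive Bayes with add-one (Laplace) smoothing, and the training sentences, labels, and function names are hypothetical examples chosen only to show the idea.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Naive tokenizer: lowercase words with surrounding punctuation stripped."""
    return [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]


def train(examples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return {"labels": label_counts, "words": word_counts, "vocab": vocabulary}


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["labels"].items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(model["words"][label].values())
        for word in tokenize(text):
            if word not in model["vocab"]:
                continue  # skip words never seen in training
            count = model["words"][label][word]
            # Add-one smoothing avoids zero probabilities for unseen pairs.
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data.
training_examples = [
    ("love the free gift card", "positive"),
    ("great offer thanks", "positive"),
    ("terrible support never again", "negative"),
    ("card was declined awful service", "negative"),
]
model = train(training_examples)
print(classify(model, "awful support"))
print(classify(model, "great gift"))
```

With so little training data the result is only illustrative. In practice the quality of the sentiment labels depends heavily on the size and relevance of the training set.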

A step-by-step approach to creating the Big Data EDW


Step 1: The first step is to create and run training data through Mahout to help it understand how to classify social data feeds. Next, the feeds have to be collected from public social media platforms. This can be accomplished by performing keyword-based searches and streaming in the result sets on a continuous basis. It is now possible to search on the basis of a brand name, product make and model, category, industry terminology, product segment, special offers and marketing buzzwords, using the various APIs offered by social media platforms. This classified data can then be dumped into an HBase-based data warehouse constantly and continuously. The data from existing systems can also be imported into the HBase-based Big Data warehouse. Online content can be crawled and dumped into the HBase database. Connectors are available for classification of online pages. Lucene and Solr are very suitable for this purpose.

Step 2: At this stage, quantitative analytics can be performed on the collected data. It is possible to draw comparisons between "Total tweets" and "Our product-specific tweets". This is accomplished by using Mahout algorithms over a Hadoop cluster. Organizations can also publish a daily trend watch. This may contain the total number of comments about the products of their competitors versus the total number of comments about their own products. With customers increasingly using devices for connecting to social media platforms, it is now possible to perform location-based trend analysis. Classification and clustering are performed using Mahout/NLTK processed data. Organizations can run the training data through Mahout/NLTK to build trained models. After that, it is possible to run the tweets and feeds from other social media platforms through the trained models and have the tweets and comments classified. This provides a clear picture of the sentiments prevailing in the marketplace for the products of organizations as well as their competitors.
Companies can come up with recommendations by running the data through Mahout. These recommendations can then be factored into future product design and rollouts.

Step 3: This step is about using customer data to recommend new and related products. Once companies have data from their existing systems as well as social sources, they can prepare the mock customer data for Social ID mapping and run Item or User based recommendations on this data using Mahout. At this stage, it is possible to produce Analytical Reports on data generated by Mahout. This can be accomplished by generating reports using a traditional Reporting product or framework. The nicely sliced and diced reporting data can be dumped into a MySQL database or some other SQL database with the help of Sqoop. This SQL database can be used to meet the regular downstream reporting requirements of organizations. This will enable them to use their existing investments in reporting tools, as well as provide drill-down reports for use by the management and the Sales and Marketing departments.

Alongside social media, this Big Data Media Analytics platform can be used to address other large data analytics requirements. The platform can give companies a head start in putting together the pieces of their Big Data strategy and provide them with an asymmetric advantage over the competition.
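
The Item-based recommendations in Step 3 can be sketched as follows. This is a simplified, plain-Python illustration rather than Mahout's recommender: it assumes purchase data is available as a mapping from user to the set of items bought, and it uses cosine similarity over those sets as the similarity measure. The sample users, items, and function names are hypothetical.

```python
import math
from collections import defaultdict


def item_similarities(purchases):
    """Cosine similarity between items, based on which users bought them."""
    buyers = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            buyers[item].add(user)
    sims = defaultdict(dict)
    for a in buyers:
        for b in buyers:
            if a == b:
                continue
            shared = len(buyers[a] & buyers[b])
            if shared:
                sims[a][b] = shared / math.sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def recommend(purchases, user, k=2):
    """Suggest up to k items the user does not own, ranked by summed similarity."""
    sims = item_similarities(purchases)
    owned = purchases[user]
    scores = defaultdict(float)
    for item in owned:
        for other, similarity in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += similarity
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical purchase history keyed by customer ID.
purchases = {
    "u1": {"phone", "case"},
    "u2": {"phone", "case", "charger"},
    "u3": {"phone", "charger"},
    "u4": {"tablet"},
}
print(recommend(purchases, "u1"))  # ['charger']
```

A customer such as u4, whose only item was bought by nobody else, gets an empty list. This is the usual cold-start limitation of similarity-based recommenders.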

The Impetus solution


Impetus has used this approach and these technologies to build a platform for Social Media Analytics. Impetus, an established thought leader in the Big Data space, has conceptualized, architected and built this platform based on the experience and expertise that it has gained through its client engagements.

The iLaDaP high level architecture


The Large Data Analytics Platform developed by Impetus (iLaDaP) is built on a Service Oriented Architecture (SOA) and incorporates all the key characteristics of an ideal Big Data Analytics Platform. The iLaDaP is designed to derive intelligence from, and operate on, huge datasets collected from numerous data sources in multiple data formats.

It is powered by Hadoop, and therefore, can linearly scale up to thousands of nodes using commodity hardware. This spells a significant cost advantage in the long run. iLaDaP also comes with a set of pre-canned and customized reports. Businesses that need to track down and take advantage of opportunities as they happen can use the Impetus platform to react to events. The iLaDaP is also capable of collecting data from a range of disparate sources. This unstructured data can be transformed and utilized for strategic business decisions. Furthermore, organizations can deploy the solution on-premise, as well as in a Cloud supported setup. iLaDaP can be seamlessly integrated with the current platforms of companies, without making any major changes.

Summary
Traditional Enterprise Data Warehouses do not have the ability to keep up with rapidly increasing social media data. The need of the hour is to effectively strategize and build a Big Data Analytics Platform to manage, store and derive insights from this digital data. A single vendor's technology may not be sufficient to undertake this task, and it is recommended that organizations go for Open Source options to build a Social Media Analytics Platform using Big Data technologies. The fact is that the success of a Big Data platform depends entirely on the tools that are used. Organizations, therefore, need to use discretion and select the most appropriate tools from the available options. Companies can also re-use existing EDW investments for their Big Data Analytics Platform.

About Impetus
Impetus Technologies offers Product Engineering and Technology R&D services for software product development. With ongoing investments in research and application of emerging technology areas, innovative business models, and an agile approach, we partner with our client base comprising large scale ISVs and technology innovators to deliver cutting-edge software products. Our expertise spans the domains of Big Data, SaaS, Cloud Computing, Mobility Solutions, Test Engineering, Performance Engineering, and Social Media among others.

Impetus Technologies, Inc.
5300 Stevens Creek Boulevard, Suite 450, San Jose, CA 95129, USA
Tel: 408.252.7111 | Email: inquiry@impetus.com
Regional Development Centers - INDIA: New Delhi, Bangalore, Indore, Hyderabad
To know more visit: www.impetus.com

Disclaimers
The information contained in this document is the proprietary and exclusive property of Impetus Technologies Inc. except as otherwise indicated. No part of this document, in whole or in part, may be reproduced, stored, transmitted, or used for design purposes without the prior written permission of Impetus Technologies Inc.
