Learning Social Media Analytics with R


722 pages
7 hours
May 26, 2017


About This Book
  • A practical guide written to help leverage the power of the R ecosystem to extract, process, analyze, visualize, and model social media data
  • Learn about data access, retrieval, cleaning, and curation methods for data originating from various social media platforms.
  • Visualize and analyze data from social media platforms to understand and model complex relationships using various concepts and techniques such as Sentiment Analysis, Topic Modeling, Text Summarization, Recommendation Systems, Social Network Analysis, Classification, and Clustering.
Who This Book Is For

This book is aimed at IT professionals, data scientists, analysts, developers, machine learning enthusiasts, social media marketers, and anyone with a keen interest in data, analytics, and generating insights from social data. Some background experience in R would be helpful but is not necessary, since the book is written with readers of varying levels of expertise in mind.


Learning Social Media Analytics with R - Dipanjan Sarkar

Table of Contents

Learning Social Media Analytics with R


About the Author

About the Reviewer


eBooks, discount offers, and more

Why subscribe?

Customer Feedback


What this book covers

What you need for this book

Who this book is for


Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book




1. Getting Started with R and Social Media Analytics

Understanding social media

Advantages and significance

Disadvantages and pitfalls

Social media analytics

A typical social media analytics workflow

Data access

Data processing and normalization

Data analysis




Getting started with R

Environment setup

Data types

Data structures







Built-in functions

User-defined functions

Controlling code flow

Looping constructs

Conditional constructs

Advanced operations






Visualizing data

Next steps

Getting help

Managing packages

Data analytics

Analytics workflow

Machine learning

Machine learning techniques

Supervised learning

Unsupervised learning

Text analytics


2. Twitter – What's Happening with 140 Characters

Understanding Twitter


Registering an application

Connecting to Twitter using R

Extracting sample Tweets

Revisiting analytics workflow

Trend analysis

Sentiment analysis

Key concepts of sentiment analysis


Sentiment polarity

Opinion summarization


Sentiment analysis in R

Follower graph analysis



3. Analyzing Social Networks and Brand Engagements with Facebook

Accessing Facebook data

Understanding the Graph API

Understanding Rfacebook

Understanding Netvizz

Data access challenges

Analyzing your personal social network

Basic descriptive statistics

Analyzing mutual interests

Build your friend network graph

Visualizing your friend network graph

Analyzing node properties




Analyzing network communities



Analyzing an English football social network

Basic descriptive statistics

Visualizing the network

Analyzing network properties


Page distances




Analyzing node properties




Visualizing correlation among centrality measures

Eigenvector centrality


HITS authority score

Page neighbours

Analyzing network communities



Analyzing English Football Club's brand page engagements

Getting the data

Curating the data

Visualizing post counts per page

Visualizing post counts by post type per page

Visualizing average likes by post type per page

Visualizing average shares by post type per page

Visualizing page engagement over time

Visualizing user engagement with page over time

Trending posts by user likes per page

Trending posts by user shares per page

Top influential users on popular page posts


4. Foursquare – Are You Checked in Yet?

Foursquare – the app and data

Foursquare APIs – show me the data

Creating an application – let me in

Data access – the twist in the story

Handling JSON in R – the hidden art

Getting category data – introduction to JSON parsing and data extraction

Revisiting the analytics workflow

Category trend analysis

Getting the data – the usual hurdle

The required end point

Getting data for a city – geometry to the rescue

Analysis – the fun part

Basic descriptive statistics – the usual

Recommendation engine – let's open a restaurant

Recommendation engine – the clichés

Framing the recommendation problem

Building our restaurant recommender

The sentimental rankings

Extracting tips data – the go to step

The actual data

Analysis of tips

Basic descriptive statistics

The final rankings

Venue graph – where do people go next?

Challenges for Foursquare data analysis


5. Analyzing Software Collaboration Trends I – Social Coding with GitHub

Environment setup

Understanding GitHub

Accessing GitHub data

Using the rgithub package for data access

Registering an application on GitHub

Accessing data using the GitHub API

Analyzing repository activity

Analyzing weekly commit frequency

Analyzing commit frequency distribution versus day of the week

Analyzing daily commit frequency

Analyzing weekly commit frequency comparison

Analyzing weekly code modification history

Retrieving trending repositories

Analyzing repository trends

Analyzing trending repositories created over time

Analyzing trending repositories updated over time

Analyzing repository metrics

Visualizing repository metric distributions

Analyzing repository metric correlations

Analyzing relationship between stargazer and repository counts

Analyzing relationship between stargazer and fork counts

Analyzing relationship between total forks, repository count, and health

Analyzing language trends

Visualizing top trending languages

Visualizing top trending languages over time

Analyzing languages with the most open issues

Analyzing languages with the most open issues over time

Analyzing languages with the most helpful repositories

Analyzing languages with the highest popularity score

Analyzing language correlations

Analyzing user trends

Visualizing top contributing users

Analyzing user activity metrics


6. Analyzing Software Collaboration Trends II - Answering Your Questions with StackExchange

Understanding StackExchange

Data access

The StackExchange data dump

Accessing data dumps

Contents of data dumps

Quick overview of the data in data dumps



Getting started with data dumps

Data Science and StackExchange

Demographics and data science



7. Believe What You See – Flickr Data Analysis

A Flickr-ing world

Accessing Flickr's data

Creating the Flickr app

Connecting to R

Getting started with Flickr data

Understanding Flickr data

Understanding more about EXIF

Understanding interestingness – similarities

Finding K

Elbow method

Silhouette method

Are your photos interesting?

Preparing the data

Building the classifier



8. News – The Collective Social Media!

News data – news is everywhere

Accessing news data

Creating applications for data access

Data extraction – not just an API call

The API call and JSON monster

HTML scraping from the links – the bigger monster

Sentiment trend analysis

Getting the data – not again

Basic descriptive statistics – the usual

Numerical sentiment trends

Emotion-based sentiment trends

Topic modeling

Getting to the data

Basic descriptive analysis

Topic modeling for Mr. Trump's phases

Cleaning the data

Pre-processing the data

The modeling part

Analysis of topics

Summarizing news articles

Document summarization

Understanding LexRank

Summarizing articles with lexRankr

Challenges to news data analysis



Learning Social Media Analytics with R


Copyright © 2017 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: May 2017

Production reference: 1220517

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78712-752-4




Raghav Bali

Dipanjan Sarkar

Tushar Sharma


Karthik Ganapathy

Commissioning Editor

Amey Varangaonkar

Acquisition Editor

Tushar Gupta

Content Development Editor

Amrita Noronha

Technical Editor

Akash Patel

Copy Editors

Vikrant Phadkay

Safis Editing

Project Coordinator

Shweta H Birwatkar


Safis Editing


Pratik Shirodkar


Tania Dutta

Production Coordinator

Shantanu Zagade

Cover Work

Shantanu Zagade

About the Author

Raghav Bali has a master's degree (gold medalist) in information technology from International Institute of Information Technology, Bangalore. He is a data scientist at Intel, the world's largest silicon company, where he works on analytics, business intelligence, and application development to develop scalable machine learning-based solutions. He has worked as an analyst and developer in domains such as ERP, finance, and BI with some of the top companies of the world.

Raghav is a technology enthusiast who loves reading and playing around with new gadgets and technologies. He recently co-authored a book on machine learning titled R Machine Learning by Example, Packt Publishing. He is a shutterbug, capturing moments when he isn't busy solving problems.

I would like to express my gratitude to my family, teachers, friends, colleagues and mentors who have encouraged, supported and taught me over the years. I would also like to take this opportunity to thank my co-authors and good friends Dipanjan Sarkar and Tushar Sharma, who made this project a memorable and enjoyable one.

I would like to thank Tushar Gupta, Amrita Noronha, Akash Patel, and Packt for the opportunity and their support throughout this journey. Last but not least, thanks to the R community for the amazing stuff that they do!

Dipanjan Sarkar is a data scientist at Intel, the world's largest silicon company, on a mission to make the world more connected and productive. He primarily works on data science, analytics, business intelligence, application development, and building large-scale intelligent systems. He holds a master of technology degree in information technology with specializations in data science and software engineering from the International Institute of Information Technology, Bangalore.

Dipanjan has been an analytics practitioner for over 5 years now, specializing in statistical, predictive, and text analytics. He has also authored several books on machine learning and analytics including R Machine Learning by Example and What you need to know about R, Packt. Besides this, he occasionally spends time reviewing technical books and courses. Dipanjan's interests include learning about new technology, financial markets, disruptive start-ups and data science. In his spare time he loves reading, gaming, watching popular sitcoms and football.

I am indebted to my parents, partner, friends, and well-wishers for always standing by my side and supporting me in all my endeavors. Your support keeps me going day in and day out to take on new challenges! I would also like to thank my good friends and fellow colleagues, Raghav Bali and Tushar Sharma, for co-authoring and making the experience more enjoyable. Last but never the least, I would like to thank Tushar Gupta, Amrita Noronha, Akash Patel, and Packt for giving me this wonderful opportunity to share my knowledge and experiences with analytics and R enthusiasts out there who are doing truly amazing things every day. And a big thumbs up to the R community for building an excellent analytics ecosystem.

Tushar Sharma has a master's degree specializing in data science from the International Institute of Information Technology, Bangalore. He works as a data scientist with Intel. In his previous job he used to work as a research engineer for a financial consultancy firm. His work involves handling big data at scale generated by the massive infrastructure at Intel. He engineers and delivers end to end solutions on this data using the latest machine learning tools and frameworks. He is proficient in R, Python, Spark, and mathematical aspects of machine learning among other things.

Tushar has a keen interest in everything related to technology. He likes to read a wide array of books ranging from history to philosophy and beyond. He is a running enthusiast and likes to play badminton and tennis.

I would like to express my gratitude to my family, teachers and friends who have encouraged, supported and taught me over the years. Special thanks to my classmates, friends, and colleagues, Dipanjan Sarkar and Raghav Bali for co-authoring and making this journey wonderful through their input and eye for detail.

I would like to thank Tushar Gupta, Amrita Noronha, and Packt for the opportunity and their support throughout the journey.

About the Reviewer

Karthik Ganapathy is an analytics professional with over 12 years of professional experience in analytics, predictive modeling, and project management. He has worked with several Fortune 500 clients and helped them derive business value using data.

I would like to thank my wife Sudharsana and my daughter Amrita for being a great support during the period I was reviewing the content.


eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.


Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1787127524. If you'd like to join our team of regular reviewers, you can e-mail us at customerreviews@packtpub.com. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!


The Internet has grown enormously, especially in the last decade, with the rise of various forms of social media that give users a platform to express themselves and to communicate and collaborate with each other. The current social media landscape is a complex mesh of social network platforms and applications, catering to specific audiences with unique as well as overlapping features. Each of these social networks is a potential gold mine of data that is being (and can be) used to study, leverage, and improve our understanding of demographics, behaviors, collaboration, user engagement, branding, and so on across different domains and spheres of our lives.

This book will help readers understand the current social media landscape and how analytics and machine learning can be leveraged to derive insights from social media data. It will enable readers to use R and its ecosystem to visualize and analyze data from different social networks. The book also applies machine learning, data science, and other advanced concepts and techniques to solve real-world use cases spread across diverse social network domains including Twitter, Facebook, GitHub, Foursquare, StackExchange, Flickr, and more.

What this book covers

Chapter 1, Getting Started with R and Social Media Analytics, builds on foundations related to social media platforms and analyzing data relevant to social media. A concise introduction to R is given, including coverage of R syntax, data constructs, and functions. Basic concepts from machine learning, data analytics, and text analytics are also covered, setting the tone for the content in subsequent chapters.

Chapter 2, Twitter – What's Happening with 140 Characters, sets the theme for social media analytics with a focus on Twitter. It leverages R packages to extract and analyze Twitter data to uncover interesting insights through multiple use-cases, involving machine learning techniques such as trend analysis, sentiment analysis, clustering, and social graph analysis.

Chapter 3, Analyzing Social Networks and Brand Engagements with Facebook, focuses on analyzing data from perhaps the most popular social network in the world—Facebook! Readers will learn how to use the Graph API to retrieve data as well as use frameworks such as Netvizz to extract brand page data. Techniques to analyze personal social networks will be covered in detail. Besides this, readers will gain conceptual knowledge about social network analysis and graph theory. This knowledge will be used in action by analyzing a huge network of football brand pages to understand relationships, page engagement, and popularity.

Chapter 4, Foursquare – Are You Checked in Yet?, targets the popular social media channel Foursquare. Readers will learn how to collect this data using the Foursquare APIs. Steps for visualizing and analyzing this data will be depicted to uncover insights into user behavior. This data will be used to define and solve some analytics use-cases, which include sentiment analysis, graph analytics, and much more.

Chapter 5, Analyzing Software Collaboration Trends I – Social Coding with GitHub, introduces the popular social coding and collaboration platform GitHub for analyzing software collaboration trends. Readers will gain insights into using the GitHub API from R to extract useful data pertaining to users and repositories. Detailed analyses of repository activity, repository trends, language trends, and user trends will be presented with real-world examples.

Chapter 6, Analyzing Software Collaboration Trends II – Answering Your Questions with StackExchange, introduces the StackExchange platform through its data organization and access methods. Readers learn and uncover interesting collaboration, demographic, and other patterns through use cases which leverage visualizations and different analysis techniques learned in previous chapters.

Chapter 7, Believe What You See – Flickr Data Analysis, presents Flickr through its APIs and uses packages such as pipeR and dplyr to extract data and insights from some complex data formats. The chapter also leverages machine learning concepts such as clustering and classification to better understand Flickr.

Chapter 8, News – The Collective Social Media!, deals with analysis of free and unstructured text. Readers will learn how to collect news data from web sources using methodologies like scraping. The basic analysis on the textual data will consist of various statistical measures. Readers will also gain hands-on knowledge on advanced analysis like sentiment analysis, topic modeling, and text summarization on news data based on some interesting use cases.

What you need for this book

Who this book is for

This book is for IT professionals, data scientists, analysts, developers, machine learning enthusiasts, social media marketers, and anyone with a keen interest in data, analytics, and generating insights from social data. Some background experience in R would be helpful but is not necessary. The book has been written keeping in mind the varying levels of expertise of its readers. It also includes links, pointers, and exercises for intermediate to advanced readers to explore further.


In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: We can include other contexts through the use of the include directive.

A block of code is set as follows:

# create data frame
df <- data.frame(
  name = c("Wade", "Steve", "Slade", "Bruce"),
  age = c(28, 85, 55, 45),
  job = c("IT", "HR", "HR", "CS")
)

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: selecting them from the Add filters... option box.


Warnings or important notes appear in a box like this.


Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to feedback@packtpub.com, and mention the book title via the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

1. Log in or register on our website using your e-mail address and password.

2. Hover the mouse pointer on the SUPPORT tab at the top.

3. Click on Code Downloads & Errata.

4. Enter the name of the book in the Search box.

5. Select the book for which you're looking to download the code files.

6. Choose from the drop-down menu where you purchased this book from.

7. Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learning-Social-Media-Analytics-with-R. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/LearningSocialMediaAnalyticswithR_ColorImages.pdf.


Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.


Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.


You can contact us at <questions@packtpub.com> if you are having a problem with any aspect of the book, and we will do our best to address it.

Chapter 1. Getting Started with R and Social Media Analytics

The invention of computers, digital electronics, social media, and the Internet has truly ushered us from the industrial age into the information age. The Internet, and more specifically the invention of the World Wide Web in the early 1990s, helped people build an interconnected universal platform where information can be stored, shared, and consumed by anyone with an electronic device capable of connecting to the Web. This has led to the creation of vast amounts of information, ideas, and opinions that people, brands, organizations, and businesses want to share with everyone around the world. Thus social media was born, providing interactive platforms to post content and share ideas, messages, and opinions about everything under the sun.

This book will take you on a journey through various popular social media, analyzing the rich data they generate and gaining valuable insights from it. We will focus on social media that cater to audiences in different forms, such as micro-blogging, social networking, software collaboration, news, and media sharing platforms. The main objective is to use standardized data access and retrieval techniques, based on social media application programming interfaces (APIs), to gather data from these websites, and then to apply data mining, statistical and machine learning, and natural language processing techniques to that data using the R programming language. This book will provide you with the tools, techniques, and approaches to do so. This introductory chapter covers several important concepts that will give you a jumpstart on social media analytics:

Social media – significance and pitfalls

Social media analytics – opportunities and challenges

Getting started with R

Data analytics

Machine learning

Text analytics

We will look at social media, the various forms of social media that exist today, and how they have impacted our society. This will help us understand the scope of social media analytics and the opportunities it presents for consumers as well as businesses and brands. Concepts related to analytics, machine learning, and text analytics, coupled with hands-on examples depicting various features of the R programming language, will help you get a grip on the essentials needed for the rest of this book. Without further delay, let's get started!
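As a small taste of the workflow this chapter describes (data access, processing and normalization, then analysis), here is a minimal sketch in base R. The `posts` data frame is invented for illustration and stands in for data that would normally be retrieved through a social media API; the user names and texts are assumptions, not real data.

```r
# Hypothetical sample data; in practice this would come from a social media API.
posts <- data.frame(
  user = c("alice", "bob", "alice", "carol"),
  text = c("Loving the new R release!", "R is GREAT for analytics",
           "Plots in R are fun", "Learning text analytics"),
  stringsAsFactors = FALSE
)

# Data processing and normalization: lowercase the text and strip punctuation.
posts$clean <- gsub("[[:punct:]]", "", tolower(posts$text))

# Data analysis: count posts per user and compute word frequencies.
posts_per_user <- table(posts$user)
words <- unlist(strsplit(posts$clean, "\\s+"))
word_freq <- sort(table(words), decreasing = TRUE)

print(posts_per_user)
print(head(word_freq, 3))
```

Only base R functions are used here, so the sketch runs without installing any packages; real analyses would typically swap the hand-made data frame for API results.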

Understanding social media

The Internet and the information age have revolutionized the way we humans interact with each other in the 21st century. Almost everyone uses some form of electronic communication device, be it a laptop, tablet, smartphone, or personal computer. Social media is built upon the concept of platforms where people use computer-mediated communication (CMC) methods to communicate with others. This can range from instant messaging, emails, and chat rooms to social forums and social networking. To understand social media, you need to understand the origins of legacy or traditional media, which gradually evolved into social media. Television, newspapers, radio, movies, books, and magazines are all ways of sharing and consuming information, ideas, and opinions. It's important to remember that social media has not replaced the older legacy media; they coexist as we use and consume both in our day-to-day lives.

Legacy media typically follow a one-way communication system. For instance, I can always read a magazine, watch a show on television, or get the news from newspapers, but I cannot instantly voice my opinions or share my ideas using the same media. The communication mechanism in the various forms of social media is a two-way street: audiences can share information and ideas, and others can consume them, voice their own ideas, opinions, and feedback, and even share their own content based on what they see. Legacy media, like radio or television, now use social media to provide a two-way mechanism that supports their communications, but this is much more seamless in social media itself, where anyone and everyone can share content, communicate with others, and freely voice their ideas and opinions on a huge scale.

We can now formally define social media as interactive applications or platforms based on the principles of Web 2.0 and computer-mediated communication, which enable users to be publishers as well as consumers, to create and share ideas, opinions, information, emotions and expressions in various forms. While different and diverse forms of social media exist, they have several key features in common which are mentioned briefly as follows:

Web 2.0 Internet based applications or platforms

Content is created as well as consumed by users

Profiles give users their own distinct and unique identity

Social networks help connect different users, similar to communities
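The features listed above (users with profiles, connected through a social network) map naturally onto simple R data structures. The following is a minimal sketch in base R; the user names, bios, and "follows" relationships are invented for illustration and do not come from any real platform.

```r
# Hypothetical users with profiles (names and bios invented for illustration).
profiles <- data.frame(
  user = c("alice", "bob", "carol"),
  bio  = c("R enthusiast", "Photographer", "Football fan"),
  stringsAsFactors = FALSE
)

# Connections ("who follows whom") stored as an edge list.
follows <- data.frame(
  from = c("alice", "bob", "carol", "carol"),
  to   = c("bob", "alice", "alice", "bob"),
  stringsAsFactors = FALSE
)

# Build an adjacency matrix in which rows follow columns.
adj <- matrix(0L, nrow = nrow(profiles), ncol = nrow(profiles),
              dimnames = list(profiles$user, profiles$user))
adj[cbind(follows$from, follows$to)] <- 1L

followers <- colSums(adj)  # number of users following each user
following <- rowSums(adj)  # number of users each user follows

print(adj)
print(followers)
```

An edge list is compact and easy to collect, while the adjacency matrix makes counts such as followers per user a one-line column sum; which form is preferable depends on the size of the network.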

Indeed social media give users their own unique identity and the freedom to express themselves in their own user profiles. These profiles are maintained as accounts by social media companies. Features like what you see is what you get (WYSIWYG) editors, emoticons, photos and videos help users in creating and sharing rich
