
How To Identify Fumbling To Secure A Network
Tricks To Try Out On Thunderbird And SeaMonkey

ISSN-2456-4885
Volume: 06 | Issue: 04 | Pages: 108 | January 2018 | ` 120

Storage Solutions For Securing Data
The Best Tools For Backing Up Enterprise Data
Build Your Own Cloud Storage System Using OSS
Open Source Storage Solutions You Can Depend On

My Journey: Niyam Bhushan
One Of India's Most Passionate OSS Evangelists
CONTENTS  January 2018  ISSN-2456-4885

FOR U & ME
25  Tricks to Try Out on Thunderbird and SeaMonkey

ADMIN
40  DevOps Series: Deploying Graylog Using Ansible
45  Analysing Big Data with Hadoop
49  Getting Past the Hype Around Hadoop
52  Open Source Storage Solutions You Can Depend On
56  Build Your Own Cloud Storage System Using OSS
59  A Quick Look at Cloonix, the Network Simulator
69  Use These Python Based Tools for Secured Backup and Recovery of Data
73  Encrypting Partitions Using LUKS

DEVELOPERS
82  Machines Learn in Many Different Ways
85  Regular Expressions in Programming Languages: Java for You
90  Explore Twitter Data Using R
99  Demystifying Blockchains

COLUMNS
77  CodeSport
80  Exploring Software: Python is Still Special

REGULAR FEATURES
07  FossBytes
20  New Products
104 Tips & Tricks

Featured:
A Hands-on Guide on Virtualisation with VirtualBox
Top Three Open Source Data Backup Tools
The Best Tools for Backing Up Enterprise Data

4 | January 2018 | OPEN SOURCE FOR YOU | www.OpenSourceForU.com


Featured:
22  How to Identify Fumbling to Keep a Network Secure
101 Using Jenkins to Create a Pipeline for Android Applications

EDITOR
RAHUL CHOPRA

EDITORIAL, SUBSCRIPTIONS & ADVERTISING
DELHI (HQ): D-87/1, Okhla Industrial Area, Phase I, New Delhi 110020
Ph: (011) 26810602, 26810603; Fax: 26817563
E-mail: info@efy.in

MISSING ISSUES
E-mail: support@efy.in

BACK ISSUES
Kits 'n' Spares, New Delhi 110020
Ph: (011) 26371661, 26371662
E-mail: info@kitsnspares.com

NEWSSTAND DISTRIBUTION
Ph: 011-40596600
E-mail: efycirc@efy.in

ADVERTISEMENTS
MUMBAI: Ph: (022) 24950047, 24928520; E-mail: efymum@efy.in
BENGALURU: Ph: (080) 25260394, 25260023; E-mail: efyblr@efy.in
PUNE: Ph: 08800295610/ 09870682995; E-mail: efypune@efy.in
GUJARAT: Ph: (079) 61344948; E-mail: efyahd@efy.in
JAPAN: Tandem Inc., Ph: 81-3-3541-4166; E-mail: japan@efy.in
SINGAPORE: Publicitas Singapore Pte Ltd, Ph: +65-6836 2272; E-mail: singapore@efy.in
TAIWAN: J.K. Media, Ph: 886-2-87726780 ext. 10; E-mail: taiwan@efy.in
UNITED STATES: E & Tech Media, Ph: +1 860 536 6677; E-mail: usa@efy.in

Featured:
"My Love Affair with Freedom": Niyam Bhushan, who has kickstarted a revolution in UX design in India
Get Familiar with the Basics of R

Printed, published and owned by Ramesh Chopra. Printed at Tara Art Printers Pvt Ltd, A-46,47, Sec-5, Noida, on 28th of the previous month, and published from D-87/1, Okhla Industrial Area, Phase I, New Delhi 110020. Copyright © 2018. All articles in this issue, except for interviews, verbatim quotes, or unless otherwise explicitly mentioned, will be released under Creative Commons Attribution-NonCommercial 3.0 Unported License a month after the date of publication. Refer to http://creativecommons.org/licenses/by-nc/3.0/ for a copy of the licence. Although every effort is made to ensure accuracy, no responsibility whatsoever is taken for any loss due to publishing errors. Articles that cannot be used are returned to the authors if accompanied by a self-addressed and sufficiently stamped envelope. But no responsibility is taken for any loss or delay in returning the material. Disputes, if any, will be settled in a New Delhi court only.

DVD OF THE MONTH
The latest, stable Linux for your desktop. January 2018

• Ubuntu Desktop 17.10 (64-bit, Live): Ubuntu comes with everything you need to run your organisation, school, home or enterprise.
• Fedora Workstation 27: Fedora Workstation is a polished, easy-to-use operating system for laptop and desktop computers, with a complete set of tools for developers and makers of all kinds.
• MX Linux 17: MX Linux is a cooperative venture between the antiX and former MEPIS communities, which uses the best tools and talent from each distro.

Recommended system requirements: P4, 1GB RAM, DVD-ROM drive.
In case this DVD does not work properly, write to us at support@efy.in for a free replacement.
Note: Any objectionable material, if found, is unintended and should be attributed to the complex nature of Internet data.
CD Team e-mail: cdteam@efy.in

SUBSCRIPTION RATES
Year   Newsstand Price (`)   You Pay (`)   Overseas
Five   7200                  4320          -
Three  4320                  3030          -
One    1440                  1150          US$ 120

Kindly add ` 50/- for outside Delhi cheques. Please send payments only in favour of EFY Enterprises Pvt Ltd.
Non-receipt of copies may be reported to support@efy.in; do mention your subscription number.


FOSSBYTES
Compiled By: OSFY Bureau

Juniper Networks reinforces longstanding commitment to open source
During the recently organised annual NXTWORK user conference, Juniper Networks announced its intent to move the code base of OpenContrail, an open source network virtualisation platform for the cloud, to the Linux Foundation. OpenContrail is a scalable, network virtualisation control plane. It provides both feature-rich software-defined networking (SDN) and strong security.
Juniper first open sourced its Contrail products in 2013, and built a vibrant user and developer community around this project. In early 2017, it expanded the project's governance, creating an even more open, community-led effort to strengthen the project for its next growth phase.
Adding its code base to the Linux Foundation's networking projects will further Juniper's objective to grow the use of open source platforms in cloud ecosystems. OpenContrail has been deployed by various organisations, including cloud providers, telecom operators and enterprises, to simplify operational complexities and automate workload management across diverse cloud environments, including multi-clouds.
Arpit Joshipura, vice president of networking and orchestration at the Linux Foundation, said, "We are excited at the prospect of our growing global community being able to broadly adopt, manage and integrate OpenContrail's code base to manage and secure diverse cloud environments. Having this addition to our open source projects will be instrumental in achieving the level of technology advancements our community has become known for."
Once the Linux Foundation takes over the governance of OpenContrail's code base, Juniper's mission to ensure the project truly remains community-led will be fulfilled. As a result, this will accelerate pioneering advances and community adoption, as well as enable easier and secure migration to multi-cloud environments.

Heptio and Microsoft join the effort to bring Heptio Ark to Azure
The new collaboration between Heptio and Microsoft aims to ensure Heptio Ark delivers a strong Kubernetes disaster-recovery solution for customers who want to use it on Azure. The companies will also work together to make the Ark project an efficient solution to move Kubernetes applications across on-premise computing environments and Azure, and to ensure that Azure-hosted backups are secure.
The Ark project provides a simple, configurable and operationally robust way to back up and restore applications and persistent volumes from a series of checkpoints. With the Heptio-Microsoft collaboration, the two firms will ensure that organisations are not only able to back up and restore content into Azure Container Service (AKS), but that snapshots created using Ark are persisted in Azure and are encrypted at rest.
"I'm excited to see Heptio and Microsoft deliver a compelling solution that satisfies an important and unmet need in the Kubernetes ecosystem," said Brendan Burns, distinguished engineer at Microsoft and co-creator of Kubernetes.
It will also help manage disaster recovery for Kubernetes cluster resources and persistent volumes.

GIMP 2.9.8 image editor now comes with better PSD support and on-canvas gradient editing
The latest release of GIMP, the popular open source image editor, introduces on-canvas gradient editing and various other enhancements while focusing on bug-fixing and stability. You can now create and delete colour stops, select and shift them, assign colours to colour stops, and change blending and colouring for segments between colour stops from mid-points.
"Now, when you try to change an existing gradient from a system folder, GIMP will create a copy of it, call it a 'custom gradient' and preserve it across sessions. Unless, of course, you edit another 'system' gradient, in which case it will become the new custom gradient," said Alexandre Prokoudine in the release announcement. He added, "Since this feature is useful for more than just gradients, it was made generic enough to be used for brushes and other types of resources in the future. We expect to revisit this in future releases of GIMP."
The release announcement also states that the PSD plug-in has been fixed to properly handle Photoshop files with deeply nested layer groups, and to preserve the expanded state of groups for both importing and exporting. Additional changes fix the mask position and improve layer opacity for importing/exporting.

Google's AI division releases open source update to DeepVariant
Google has released the open source version of DeepVariant, a deep learning technology that reconstructs the true genome sequence from HTS sequencer data with significantly greater accuracy than previous classical methods. This work is the product of more than two years of research by the Google Brain team, in collaboration with Verily Life Sciences.
The Brain team programmed it in TensorFlow, a library of open source programming code for numerical computation that is popular for deep learning applications. The technology works well on data from all types of sequencers and eases the process of transitioning to new sequencers.
DeepVariant is being released as open source software to encourage collaboration and to accelerate the use of this technology in solving real-world problems. The software will make it easier to receive input from researchers about the use cases they are interested in. This is part of a broader effort to make genomics data compatible with the way deep learning machinery works. The move will also bring Google technologies to healthcare and other scientific applications, and make the results of these efforts broadly accessible.

India's first FIWARE Lab node to be operational from April 2018
Addressing the growing demand for smart city applications across India, NEC Corporation and NEC Technologies India Private Limited (NECTI) will soon establish a FIWARE Lab node in India. Having a FIWARE Lab node within India will encourage more participation from Asian countries, as they can keep all experimental and research data within the boundaries of the region.
FIWARE is an open source platform which enables real-time smart services through data sharing across verticals and agencies via open standards based APIs. It focuses on specifications for common context information APIs, data publication platforms and standard data models in order to achieve and improve cross-sector interoperability for smart applications, with FIWARE NGSI as a starting point. The technology is in use in more than 100 cities in 23 countries in Europe and other regions. It is royalty-free and avoids any vendor lock-in.
The Lab node in India will help to foster a culture of collaboration between various participating entities and promote their solutions in the FIWARE community.
"The FIWARE Foundation welcomes the new FIWARE Lab node starting in India. FIWARE is used by an increasing number of cities in Europe and other regions and I wish this new FIWARE Lab node will trigger the adoption of FIWARE both in India and other APAC countries," said Ulrich Ahle, CEO of the FIWARE Foundation. "It is also our pleasure to have the commitment of the NEC Technologies India team to contribute to the FIWARE community, which will strengthen the FIWARE technology as well as its globalisation as a smart city platform," he added.
The facility is expected to start operations from April 2018, and is endorsed by the FIWARE Foundation. Organisations, entrepreneurs and individuals can use this lab to learn FIWARE, as well as to test their applications while capitalising on open data published by cities and other organisations.




FOSSBYTES

Red Hat OpenShift Building secure container infrastructure with Kata Containers
Container Platform 3.7 The OpenStack Foundation has announced a new open source project—Kata
released Containers, which aims to unite the security advantages of virtual machines
Red Hat has launched the latest (VMs) with the speed and manageability of container technologies. The
OpenShift Container Platform 3.7, project is designed to be hardware agnostic and compatible with Open
the version of Red Hat’s enterprise- Container Initiative (OCI)
grade Kubernetes container specifications, as well as the
application platform. As application container runtime interface
complexity and cloud incompatibility (CRI) for Kubernetes.
increase, Red Hat OpenShift Intel is contributing
Container Platform 3.7 will help IT its open source Intel Clear
organisations to build and manage Containers project and
applications that use services from the Hyper is contributing
data centre to the public cloud. its runV technology to
The latest version of the industry’s initiate the project. Besides
most comprehensive enterprise Intel and Hyper, 99cloud,
Kubernetes platform includes native AWcloud, Canonical, China Mobile, City Network, CoreOS, Dell/EMC,
integrations with Amazon Web Services EasyStack, Fiberhome, Google, Huawei, JD.com, Mirantis, NetApp, Red
(AWS) service brokers, which enable Hat, SUSE, Tencent, Ucloud, UnitedStack, and ZTE are also supporting the
developers to bind services across AWS project’s launch.
and on-premise resources to create The Kata Containers project will initially comprise six components, which
modern applications while providing include the agent, runtime, proxy, Shim, kernel and packaging of QEMU 2.9.
a consistent, open standards-based It is designed to be architecture agnostic and to run on multiple hypervisors.
foundation to drive business evolution. Kata Containers offers the ability to run container management tools directly
“We are excited about our on bare metal.
collaboration with Red Hat and “The Kata Containers Project is an exciting addition to the OpenStack
the general availability of the first Foundation family of projects. Lighter, faster, more secure VM technology fits
AWS service brokers in Red Hat perfectly into the OpenStack Foundation family and aligns well with Canonical’s
OpenShift. The ability to seamlessly data centre efficiency initiatives. Like Clear Containers and Hyper.sh previously,
configure and deploy a range of AWS Kata Container users will find their hypervisor and guests well supported on
services from within OpenShift will Ubuntu,” said Dustin Kirkland, vice president, product, Canonical.
allow our customers to benefit from
AWS’s rapid pace of innovation, Fedora 27 released
both on-premises and in the cloud,” The Fedora Project, a Red Hat sponsored and community-driven open source
said Matt Yanchyshyn, director, collaboration, has announced the general availability of Fedora 27. All
partner solution architecture, editions of Fedora 27 are
Amazon Web Services, Inc. built from a common set
Red Hat OpenShift Container of base packages and,
Platform 3.7 will ship with the as with all new Fedora
OpenShift template broker, which releases, these packages
turns any OpenShift template into a have seen numerous
discoverable service for application tweaks, incremental
developers using OpenShift. improvements and new
OpenShift templates are lists additions. For Fedora
of OpenShift objects that can 27, this includes the
be implemented within specific GNU C Library 2.26 and RPM 4.14.
parameters, making it easier for IT “Building and supporting the next generation of applications remains a critical
organisations to deploy reusable, focus for the Fedora community, showcased in Fedora 27 by our continued
composite applications comprising support and refinement of system containers and containerised services like
microservices. Kubernetes and Flannel. More traditional developers and end users will be pleased

10 | JANUARY 2018 | OPEN SOURCE FOR YOU | www.OpenSourceForU.com


www.opensourceforu.com

Had
oop

Open Apache

Source Post id
Is Hot gres Andro

In The MySQ
L

IT World Open
OSS

Stac PEARL
k

la PHP
Joom Drupal

THE COMPLETE MAGAZINE


ON OPEN SOURCE

To find dealers to buy copy from news-stand, visit: www.ezine.lfymag.com


To subscribe, visit: pay.efyindia.com

To buy an ezine edition, visit: www.magzter.com choose Open Source For You
FOSSBYTES

with the additions brought about by GNOME 3.26 to Fedora 27 Workstation,


Canonical and Rancher making it easier to build applications and improving the overall desktop
Labs announce experience,” said Matthew Miller, a Fedora project leader.
Kubernetes cloud native
platform Canon joins the Open Invention Network community
Canonical, in partnership with Open Invention Network (OIN), the largest patent non-aggression community
Rancher Labs, has announced in history, has announced that Canon has joined as a community member.
a turnkey application delivery As a global leader in such fields as professional and consumer imaging and
platform built on Ubuntu, printing systems and solutions, and having expanded its medical and industrial
Kubernetes and Rancher 2.0. equipment businesses, Canon is demonstrating its commitment to open source
The new cloud native platform software as an enabler of innovation across a wide spectrum of industries.
will make it easy for users to “Open source technology, especially Linux, has led to profound increases
deploy, manage and operate in capabilities across a number of key industries, while increasing overall
containers on Kubernetes product and service efficiency,” said Hideki Sanatake, an executive officer,
through a single workflow as well as deputy group executive of corporate intellectual properties and
management portal—from legal headquarters at Canon. “By joining Open Invention Network, we are
development-and-testing to demonstrating our continued commitment to innovation, and supporting it with
production environments. Built patent non-aggression in Linux.”
on Canonical’s distribution of OIN’s community practices patent non-aggression in core Linux and
Kubernetes and Rancher 2.0, adjacent open source technologies by cross-licensing Linux System patents to
the cloud native platform will one another on a royalty-free basis.
simplify enterprise usage of
Kubernetes with seamless user GTech partners with Red Hat to catalyse open source
management, access control adoption in Kerala
and cluster administration. The Kerala government’s IT policy
encourages the adoption of open source
and open technologies in the public
domain. Hence, the Group of Technology
Companies (GTech), the industry body for IT companies in Kerala, has
recently signed an MoU with Red Hat.
The partnership aims to create enhanced awareness on various open source
technologies amongst IT professionals in the state. The MoU will facilitate
partnerships between Red Hat and GTech member companies. The efforts
“Our partnership with will focus on research and product development in open source software
Rancher provides end-to-end technologies.
workflow automation for The state government has also emphasised the need to promote open
the enterprise development source among SMEs. According to the terms of the MoU, Red Hat will
and operations team on organise events in IT parks across the state. These events were kickstarted in
Canonical’s distribution November 2017, and include lectures, seminars and presentations spanning the
of Kubernetes,” said Mark Internet of Things (IoT), artificial intelligence, analytics, development tools,
Shuttleworth, CEO of content management systems, desktop publishing and other connected topics.
Canonical. “Ubuntu has
long been the platform of Amazon extends support to Facebook and Microsoft
choice for developers driving Amazon has announced its ONNX-MXNet
innovation with containers. Python package to import Open Neural
Canonical’s Kubernetes Network Exchange (ONNX) deep learning
offerings include consulting, models into Apache MXNet. This move
integration and fully-managed indicates the company’s support for Facebook and Microsoft in their efforts to
Kubernetes services on- open source artificial intelligence (AI).
premises and on-cloud,” With this package, developers running models based on open source ONNX
Shuttleworth added. will be able to run them on Apache MXNet. Basically, this allows AI developers
to keep models but switch networks, as opposed to starting from scratch.

12 | JANUARY 2018 | OPEN SOURCE FOR YOU | www.OpenSourceForU.com


It has become increasingly evident that the future of AI needs more than just ethical direction and government oversight. It would be comforting to know that the tech giants are on the same page too. The machines, and the humans who will rely on them, need the biggest companies building AI to take on a fair share of responsibility for the future.

Four tech giants using Linux change their open source licensing policies
The GNU General Public License version 2 (GPLv2) is arguably the most important open source licence for one reason: Linux uses it. On November 27, 2017, three tech powerhouses that use Linux (Facebook, Google and IBM), as well as the major Linux distributor Red Hat, announced they would extend additional rights to help companies that have made GPLv2 open source licence compliance errors and mistakes.
The GPLv2 and its close relative, the GNU Lesser General Public License (LGPL), are widely used open source software licences. When GPL version 3 (GPLv3) was released, it came with an express termination approach. This termination policy in GPLv3 provided a way for companies to correct licensing errors and mistakes. This approach allows licence compliance enforcement that is consistent with community norms.

Microsoft launches Azure location based services
Addressing a gathering at Automobility LA 2017 in Los Angeles, California, Sam George, director, Azure IoT, Microsoft, said, "Microsoft is making an effort to solve mobility challenges and bring government bodies, private companies and automotive OEMs together, using Microsoft's intelligent cloud platform."
The new location capabilities will provide cloud developers critical geographical data to power smart cities and Internet of Things (IoT) solutions across industries, including manufacturing, automotive, logistics, urban planning and retail.
TomTom Telematics will be the first official partner for the service, supplying critical location and real-time traffic data, and providing Microsoft customers with advanced location and mapping capabilities.
Microsoft's Azure location based services will offer enterprise customers location capabilities integrated in the cloud to help any industry improve traffic flow. Microsoft also announced that Azure LBS will be launched in 2018, and will be available globally in more than 30 languages.

FreeNAS 11.1 provides greater performance and cloud integration
FreeNAS 11.1 adds cloud integration and OpenZFS performance improvements, including the ability to prioritise 'resilvering' operations, and preliminary Docker support, to the world's most popular software-defined storage operating system.
It also adds a cloud sync (data import/export to the cloud) feature, which lets you sync (similar to back up), move (erase from source) or copy (only changed data) data to and from public cloud providers, including Amazon S3 (Simple Storage Services), Backblaze B2 Cloud, Google Cloud and Microsoft Azure.
OpenZFS has noticeable performance improvements for handling multiple snapshots and large files. Resilver Priority has been added to the 'Storage' screen of the graphical user interface, allowing you to configure 'resilvering' at a higher priority at specific times. This helps to mitigate the inherent challenges and risks associated with storage array rebuilds on very large capacity drives.
The latest release includes an updated preview of the beta version of the new administrator graphical user interface, including the ability to select display themes. It can be downloaded from freenas.org/download.

For more news, visit www.opensourceforu.com
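The three cloud sync transfer modes that FreeNAS describes (sync, move and copy) can be made concrete with a short sketch. This is not FreeNAS code: it is a hypothetical, purely local Python analogue in which ordinary directories stand in for a cloud bucket, written only to illustrate how the three modes differ.

```python
import shutil
from pathlib import Path

def cloud_sync(src: Path, dst: Path, mode: str) -> None:
    """Local analogue of the three FreeNAS cloud sync modes.

    "sync": transfer everything, like a backup of src into dst
    "move": transfer everything, then erase the files from src
    "copy": transfer only files that are new or changed in dst
    """
    dst.mkdir(parents=True, exist_ok=True)
    for item in src.iterdir():
        if not item.is_file():
            continue
        target = dst / item.name
        # In "copy" mode, skip files whose content is already up to date.
        if mode == "copy" and target.exists() \
                and target.read_bytes() == item.read_bytes():
            continue
        shutil.copy2(item, target)
    if mode == "move":
        # "move" erases the source once the transfer is complete.
        for item in src.iterdir():
            if item.is_file():
                item.unlink()
```

A real cloud sync task would talk to a provider such as Amazon S3 or Backblaze B2 rather than a local directory, but the sync/move/copy distinction works the same way.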



Reasons Why You Should NOT Attend IEW 2018


India’s Mega Tech Conference
The EFY Conference (EFYCON) started out as a tiny 900-footfall community conference in 2012, going by the name of Electronics Rocks. Within four years, it grew into ‘India’s largest, most exciting engineering conference,’ and was ranked ‘the most important IoT global event in 2016’ by Postscapes.
In 2017, 11 independent conferences covering IoT, artificial intelligence, cyber security, data analytics, cloud technologies, LED lighting, SMT manufacturing, PCB manufacturing, etc, were held together over three days, as part of EFY Conferences.

Key themes of the conferences and workshops in 2018
• Profit from IoT: How suppliers can make money and customers save it by using IoT
• IT and telecom tech trends that enable IoT development
• Electronics tech trends that enable IoT development
• Artificial intelligence and IoT
• Cyber security and IoT
• The latest trends in test and measurement equipment
• What’s new in desktop manufacturing
• The latest in rapid prototyping and production equipment

Who should attend
• Investors and entrepreneurs in tech
• Technical decision makers and influencers
• R&D professionals
• Design engineers
• IoT solutions developers
• Systems integrators
• IT managers

SPECIAL PACKAGES FOR
• Academicians • Defence personnel • Bulk/Group bookings

We spoke to a few members of the tech community to understand why they had not attended earlier editions of India Electronics Week (IEW). Our aim was to identify the most common reasons and share them with you, so that if you too had similar reasons, you may choose not to attend IEW 2018. This is what they shared…

#1. Technologies like IoT, AI and embedded systems have no future
Frankly, I have NO interest in new technologies like Internet of Things (IoT), artificial intelligence, etc. I don’t think these will ever take off, or become critical enough to affect my organisation or my career.

#2. I see no point in attending tech events
What’s the point in investing energy and resources to attend such events? I would rather wait and watch—let others take the lead. Why take the initiative to understand new technologies, their impact and business models?

#3. My boss does not like me
My boss is not fond of me and doesn’t really want me to grow professionally. And when she came to know that IEW 2018 is an event that can help me advance my career, she cancelled my application to attend it. Thankfully, she is attending the event! Look forward to a holiday at work.

#4. I hate innovators!
Oh my! Indian startups are planning to give LIVE demonstrations at IEW 2018? I find that hard to believe. Worse, if my boss sees these, he will expect me to create innovative stuff too. I better find a way to keep him from attending.

#5. I am way too BUSY
I am just too busy with my ongoing projects. They just don’t seem to be getting over. Once I catch up, I’ll invest some time in enhancing my knowledge and skills, and figure out how to meet my deadlines.

#6. I only like attending vendor events
Can you imagine an event where most of the speakers are not vendors? Where most talks will not be by people trying to sell their products? How boring! I can’t imagine why anyone would want to attend such an event. I love sales talks, and I am sure everybody else does too. So IEW is a big ‘no-no’ for me.

#7. I don’t think I need hands-on knowledge
I don’t see any value in the tech workshops being organised at IEW. Why would anyone want hands-on knowledge? Isn’t browsing the Net and watching YouTube videos a better alternative?

#8. I love my office!
Why do people leave the comfort of their offices and weave through that terrible traffic to attend a technical event? They must be crazy. What’s the big deal in listening to experts or networking with peers? I’d rather enjoy the coffee and the cool comfort of my office, and learn everything by browsing the Net!

#9. I prefer foreign events
While IEW’s IoTshow.in was voted the ‘World’s No.1 IoT event’ on Postscapes.com, I don’t see much value in attending such an event in India—and that, too, one that’s being put together by an Indian organiser. Naah! I would rather attend such an event in Europe.

Hope we’ve managed to convince you NOT to attend IEW 2018! Frankly, we too have NO clue why 10,000-plus techies attended IEW in March 2017. Perhaps there’s something about the event that we’ve not figured out yet. But, if we haven’t been able to dissuade you from attending IEW 2018, then you may register at http://register.efy.in.

Conference Pass Pricing
• One day pass: INR 1999
• PRO pass: INR 7999

Special privileges and packages for...
• Defence and defence electronics personnel
• Academicians
• Group and bulk bookings


The themes
• Profit from IoT
• Table top manufacturing
• Rapid prototyping and production
• LEDs and LED lighting

The co-located shows

Why exhibit at IEW 2018?
• More technology decision makers and influencers attend IEW than any other event
• India’s only test and measurement show is also a part of IEW
• Bag year-end orders; meet prospects in early February and get orders before the FY ends
• It’s a technology-centric show and not just a B2B event
• 360-degree promotions via the event, publications and online!
• The world’s No.1 IoT show is a part of IEW and IoT is driving growth
• Over 3,000 visitors are conference delegates
• The only show in Bengaluru in the FY 2017-18
• It’s an Electronics For You Group property
• Besides purchase orders, you can bag ‘Design Ins’ and ‘Design-Wins’ too
• Your brand and solutions will reach an audience of over 500,000 relevant and interested people
• IEW is being held at a venue (KTPO) that’s closer to where all the tech firms are
• Co-located events offer cross-pollination of business and networking opportunities
• IEW connects you with customers before the event, at the event, and even after the event
• Special packages for ‘Make in India’, ‘Design in India’, ‘Start-up India’ and ‘LED Lighting’ exhibitors

Why you should risk being an early bird
1. The best locations sell out first
2. The earlier you book—the better the rates; and the more the deliverables
3. We might just run out of space this year!

To get more details on how exhibiting at IEW 2018 can help you achieve your sales and marketing goals, contact us at +91-9811155335 or write to us at growmybiz@efy.in


EFY Enterprises Pvt Ltd | D-87/1, Okhla Industrial Area, Phase -1, New Delhi– 110020
Overview For U & Me

Top Tech Trends to Watch Out For in 2018

The technologies that will dominate the tech world in the new year are based on open source software.

At the start of a brand new year, I looked into the crystal ball to figure out the areas that no technologist can afford to ignore. Here is my rundown on some of the top trends that will define 2018.

Automation and artificial intelligence
Two of the most talked about trends are increasingly utilising open source. Companies including Google, Amazon and Microsoft have released the code for Open Network Automation Platform (ONAP) software frameworks that are designed to help developers build powerful AI applications.
In fact, Gartner says that artificial intelligence is going to widen its net to include data preparation, integration, algorithm selection, training methodology selection, and model creation. I can point out many examples right now, such as chatbots, autonomous vehicles and drones, and video games, as well as other real-life scenarios such as design, training and visualisation processes.

Open source containers are no longer orphans
DevOps ecosystems are now seeing the widespread adoption of containers like Docker for open source e-commerce development.
Containers are one of the hottest tickets in open source technology. You can imagine them as a lightweight packaging of application software that has all its dependencies bundled for easy portability. This removes a lot of hassles for enterprises as they cut down on costs and time.
According to 451 Research, the market is expected to grow by more than 250 per cent between 2016 and 2020. Microsoft recently contributed to the mix by launching its Virtual Kubelet connector for Azure, streamlining the whole container management process.

Blockchain finds its footing
As Bitcoin is likely to hit the US$ 20,000 mark, we’re all in awe of the blockchain technology behind all the crypto currencies. Other industries are expected to follow suit, such as supply chain, healthcare, government services, etc.
The fact that it’s not controlled by any single authority and has no single point of failure makes it a very robust, transparent and incorruptible technology. Russia has also become one of the first countries to embrace the technology by piloting its banking industry’s first ever payment transaction. Sberbank, Russia’s biggest bank by assets, has executed a realtime money transfer over an IBM-built blockchain based on the Hyperledger open source collaborative project.
One more case in point is a consortium comprising more than a dozen food companies and retailers—including Walmart, Nestle and Tyson Foods—dedicated to using blockchain technology to gather better information on the origin and state of food.

IoT-related open source tools/libraries
IoT has already made its presence felt. Various open source tools that are a perfect match for IoT challenges are available now, such as Arduino, Home Assistant, Zetta, Device Hive, and ThingSpeak. Open source has already served as the foundation for IoT’s growth till now and will continue to do so.

OpenStack to gain more acceptance
OpenStack has enjoyed tremendous success since the beginning, with its exciting and creative ways to utilise the cloud. But it lags behind when it comes to adoption, partly due to its complex structure and dependence on virtualisation, servers and extensive networking resources.
But new fixes are in the works as several big software development and hosting companies work overtime to resolve the underlying challenges. In fact, OpenStack has now expanded its scope to include containers with the recent launch of the Kata Containers project.
Open source is evolving at a great pace, which presents tremendous opportunities for enterprises to grow bigger and better. Today, the cloud also shares a close bond with open source apps, with the services of various big cloud companies like AWS, Google Cloud and Microsoft Azure being quite open source-friendly.
I can think of no better way to say this: open source is poised to be the driver behind various innovations. I’d love to hear your thoughts on other trends that will dominate 2018. Do drop me a line (dinesh@railsfactory.com).

By: Dinesh Kumar
The author is the CEO of Sedin Technologies and the co-founder of RailsFactory. He is a passionate proponent of open source and keenly observes the trends in this space. In this new column, he digs into his experience of servicing over 200 major global clients across the USA, UK, Australia, Canada and India.

www.OpenSourceForU.com | OPEN SOURCE FOR YOU | JANUARY 2018 | 19


NEW PRODUCTS
Fitness smartwatch with the ‘tap to pay’ feature from Garmin
Price: ₹24,990
Innovative GPS technology firm, Garmin, has unveiled its latest smartwatch – the Vivoactive 3 – in India. It is the first smartwatch from the company to offer a feature like ‘tap and pay’. The device offers contactless payments and has 15 preloaded sports apps along with inbuilt GPS functionality.
With the company’s in-house chroma display with LED backlighting, the smartwatch features a 3.04cm (1.2 inch) screen with a 240 x 240 pixel screen resolution. Its display is protected by Corning Gorilla Glass 3 with a stainless steel bezel, and the case is made of fibre-reinforced polymer. The device offers 11 hours of battery life in GPS mode, company sources claim, and seven days in smartwatch mode.
Compatible with all Android and iOS devices, the Garmin Vivoactive smartwatch is available in black and white colours via selected retail and online stores.

Address: Garmin India, D186, 2nd Floor, Yakult Building, Okhla Industrial Area, Phase 1, New Delhi – 110020; Ph: 09716661666

Travel- and pocket-friendly leather headset from Astrum
Price: ₹4,990
Leading ‘new technology’ brand Astrum has unveiled a travel-friendly headset – the HT600. The affordable yet stylish headphones come in a lightweight, compact design with no wires.
The headset’s twist-folding design allows compact storage, making it easily portable. Packed with its own hard case and pouch, the leather headband ensures a perfect fit with foam earcups.
The headphones come with noise-cancelling technology and 3.5mm drivers, delivering the full range of deep bass and clear high notes. These also come with handy controls, conveniently placed on the outside of the device for users to easily pause, play or rewind the music.
The HT600 supports Bluetooth version 4.0, and can be paired with two smartphones on the go. It comes with a built-in microphone to make or receive calls. With NFC technology, a user can simply pair the headphones by touch for non-stop music. The headphones offer up to 96 hours of standby and eight hours of call and playback time on a single charge, company sources claim.
They are available in black colour online and at retail stores.

Address: Astrum India, 3rd Floor, Plot No. 79, Sector-44, Gurugram, Haryana – 122003; Ph: 09711118615

This activity tracker from Timex has an SOS trigger
Price: ₹4,459 for the leather strap and ₹4,995 for bracelet style variants
Timex, the manufacturer of watches and accessories, has recently expanded its product portfolio by launching its latest activity tracker in India – the Timex Blink.
The device is a blend of a traditional watch and a fitness tracker, and uses Bluetooth to connect with a smartphone. Apart from just being a funky watch, the device is designed to track all day-to-day activities such as calories burnt, distance covered, hours of sleep, etc. The tracker is capable of sending instant mails and SMSs with the GPS location of the user in case of an emergency.
Compatible with both Android and iOS, the Blink activity tracker comes with a 2.28cm (0.9 inch) OLED touchscreen display with a stainless steel 304L case, six-axis motion sensors, an SOS trigger and a Nordic nRF52832 CPU. Backed with a 90mAh battery, the device supposedly offers 10 days of battery backup on a single charge.
The Timex Blink is available in two variants – with a leather strap and as a bracelet – at retail stores.

Address: Timex Group India Ltd, Tower B, Plot No. B37, Sector 1, Near GAIL Building, Noida, Uttar Pradesh – 201301



Speakers from Harman Kardon with Amazon Alexa support now in India
Price: ₹22,490
Well-known manufacturer of audio equipment, Harman Kardon, recently launched its latest premium speakers – the Harman Kardon Allure. The highlight of the speakers is the support for Amazon’s Alexa – a proprietary voice technology assistant, which can help users manage day-to-day tasks, such as playing music, making purchases, reading out the news, etc.
With a built-in four-microphone array and the latest voice technology, the speaker is capable of responding to commands even in noisy environments. It supports up to 24-bit/96kHz HD audio streaming and delivers 360-degree sound with its transducers and built-in sub-woofers. The multi-coloured lighting on the top of the device adapts to the surrounding environment to help the speakers blend in.
The device supports Bluetooth v4.2, aux and WPS/Wi-Fi for connecting to all Android, iOS and Windows smartphones, laptops and TVs.
The Harman Kardon Allure is available by invitation only at Amazon.in.

Address: Harman International, A-11, Jawahar Park, Devli Road, New Delhi – 110062; Ph: 011-29552509

Feature loaded mid-range smartwatch collection from Misfit
Price: ₹14,495
Misfit, the consumer electronics company owned by the Fossil group, has recently unveiled its much-awaited smartwatch collection called Vapor.
The all-in-one smartwatch collection comes with a stunning 3.53cm (1.39 inch) full round AMOLED display with a vibrant colour palette in 326 ppi (pixels per inch), and is designed with a 44mm satin-finish stainless steel upper casing. Vapor is powered by Android Wear 2.0, the latest wearable operating system. The ‘OK Google’ feature enables access to hundreds of top rated apps to get things done.
The devices’ many features include a useful customised watch face, an enhanced fitness experience, on-board music functionality, the Google assistant, limitless apps, etc. The smartwatches come with the Qualcomm Snapdragon Wear 2100 processor and 4GB memory with Bluetooth and Wi-Fi connectivity. Water-resistant to 50 metres, the Vapor range allows users to browse the menu of applications to respond to notifications easily.
Fitness features include a calorie counter, as well as a distance and heart rate monitor. The smartwatches also offer sensors such as an accelerometer, gyroscope, etc, and can be paired with any device running Android 4.3 or iOS 9 and above.
The Misfit Vapor collection can be purchased from Flipkart.

Address: Fossil Group, 621, 12th Main Road, HAL 2nd Stage, Indiranagar, Bengaluru, Karnataka 560008

The prices, features and specifications are based on information provided to us, or as available on various websites and portals. OSFY cannot vouch for their accuracy. Compiled by: Aashima Sharma



For U & Me Open Journey

“My Love Affair with Freedom”
Wearing geeky eyewear,
this dimple-chinned man
looks content with his
life. When asked about
his sun sign, he mimes
the sun with its rays,
but does not reveal his
zodiac sign. Yes, this is
the creative and very witty
Niyam Bhushan, who has
kickstarted a revolution in
UX design in India through
the workshops conducted
by his venture DesignRev.
in. In a tete-a-tete with
Syeda Beenish of OSFY,
this industry veteran, who
has spent 30 odd years
in understanding and
sharing the value of open
source with the masses,
speaks passionately about
the essence of open
source. Excerpts:

Your definition of open source: Muft and mukt is a state of mind, not software
Favourite book: ‘The Cathedral and the Bazaar’ by Eric S. Raymond
Pastime: Tasting the timeless through meditation
Favourite movie: ‘Snowden’ by Oliver Stone
Dream destination: Bhutan, birthplace of ‘Schumacher Economics’ that gives a more holistic vision to the open source philosophy
Idol: Osho, a visionary who talked about true freedom and how to exercise your individual freedom in your society

Discovering Ghostscript back in 1988/89
Being a graphics designer, I came across Ghostscript circa 1988 or 1989. It was a muft and mukt alternative to Postscript. What intrigued me most about it was the licence—GPL, which got me started. Those were the days when people were curious to understand the difference between freeware, shareware, crippleware and adware. But it was the GPL that made me realise this was a powerful hack of an idea that could transform the IT industry. I was excited from my first encounter and eventually devoted 14 years exclusively to the FOSS movement and its offshoots, most notably, Creative Commons.

The journey
From 1982 to 1985, I was busy learning how to program in machine code on Zilog chips, and later in COBOL and BASIC on DEC mini computers (PDP 11/70). But in a few years, I realised the game would be in digital graphics and design. So, I started with pioneering many techniques and workflows with digital graphics design, typography, and imaging in publishing. Eventually, I started consulting for the best IT companies like Apple, Adobe and Xerox in this field, and also with the advertising, publishing and even textile-printing industry. Concurrently, I focused on what was then called human computer interaction (HCI) and is now more popularly known as user-interface design and UX. This is the ultimate love affair between intuition and engineering. The huge impact of the computer industry on billions of people directly can be attributed to this synergy.
I’ve brought tens of thousands of people into the free and open source movement in India. How? By writing extensively about it in mainstream newspapers as well as in tech magazines, and by conducting countless seminars and public talks for the industry, government, academia, and the community. Besides, I was a core member of the Freed.in event, and helped to set up several chapters of Linux user groups across India. I ventured into consultancy, and guided companies on free and open source software. During my journey, I also contributed extensively to bug reports of a few GPL software in the graphics design space.

Establishing ILUG-D
I still remember one cool evening back in 1995, when a couple of us hackers were huddled around an assembled PC. Somebody was strumming a badly-tuned guitar, an excited pet dog was barking at new guests… This was the founding of the Indian Linux Users Group, Delhi (ILUG-D). This was also the first official meet at the home of the late Raj Mathur, founding member of the ILUG-D. That meeting shaped free and open source software as a movement, and not just a licence. Everyone knows what happened over the next decade-and-a-half.

The reality of open source adoption in India
Today, it is all about free and open source software, and of open knowledge, which for me is way beyond Linux. Honestly, I am not happy with the way open source adoption has happened in India. In this vast country, there is one and only one challenge—the mindset of people towards open source. What’s happening in India is ‘digital colonialism’, as our minds are still ruled by proprietary software, proprietary services and a lack of understanding of privacy. We lack the understanding of our ‘digital sovereignty’. To address this mindset, I wrote two whitepapers and published them on my website, www.niyam.com, which became very popular. The first was ‘Seven Steps to Software Samadhi: How to migrate from Windows to GNU/Linux for the Non-techie in a Hurry’. Published under the FDL licence, this initiative acquired a life of its own among the community. The second one was ‘Guerilla Warfare for Gyaan’, which was about bringing in free knowledge, especially in academia. Both were received well




by the community, but we are yet to unlock the true potential of open source in the country.
How many of us really know that the highly sophisticated computer in our pocket is running Linux! Apple Macintosh and the iOS are based on the MACH kernel, Windows on BSD, and all of these are open source kernels. On a positive note, I would say that it is impressive to see the adoption of Android but, at other levels, the real potential of open source is yet to be realised by Indians.

What open source can do for:
An organisation
Free and open source software (FOSS) is a wild dragon-child that can transform any organisation into a Daenerys Targaryen. But like her, you need to know how to tame this dragon, and where and when to use this effectively. Otherwise, its fire can and will consume you instead.
An individual (home user)
Whatever software a home user adopts (including proprietary and commercial software), open source offers fierce competition to push costs down, keep it free, enhance its performance, make it secure, or honour your privacy better. Hence, open source browsers are free. Home users get operating systems for free or a token fee. The latest Firefox outclasses even Google Chrome, while Telegram messenger and Signal outshine WhatsApp with their privacy and security.
Techie home user
That’s like singing to the choir. For the techie home user, open source is the best way to tinker and hack and, hopefully, also build the next billion-dollar unicorn in your barsaati.

Survival
You may wonder, “How did Niyam Bhushan survive and continue giving to the industry?” One should always remember that any community-building needs your time and effort, but gradually, it will start giving you returns in the most unexpected manner. This was not the real driving force for me. I love people and I love ideas. Sharing your knowledge and experiences in return brings you commercial opportunities, as well as a plethora of ideas that further enhance your understanding. My intention was never to be a multi-billionaire, but to earn more than comfortably for myself while following my passion. I wanted to touch the lives of as many people as possible and enrich my life with knowledge-sharing whenever and wherever possible.
The beauty of the community is that it seems like it is taking your time and effort, but it opens doors to lucrative opportunities as well. The community will continue to evolve around specific value-based pillars. For instance, in the vibrant startup communities of India, open source is fuelling a gold rush, propelling India to becoming a creator of wealth in the world. In academia, it is the highly local and focused communities that deepen learning and exploration. In the government and the public sector, their internal communities orient, adopt, collaborate and formulate policies.

Dos and Don’ts for developers
I insist that people should read their employment contract carefully. In most cases in India, I’ve noticed developers have signed away their rights to their contributions to FOSS in the name of the company, which may even keep them a trade secret, and may even threaten employees from using their own code ever again. Even if the software is under a free, muft and mukt licence, please carefully consider whom you want to assign the copyright of your work—to yourself, or your organisation.
Check with the legal department about policies on the use of code marked as open source. Often, violations occur when developers help themselves to code without bothering to check the implications of its licence.

A ray of hope
Unfortunately, people in India are not yet sensitised enough to the issue of digital privacy. If this sleeping giant wakes up to the importance of digital privacy, the adoption of open source will naturally become pervasive. IoT will provide the next push for open source across India, invisibly. Startups and entrepreneurs are and will continue to set up sophisticated cloud-based services deployed on free and open source software. So, here’s the magic bullet: sell your value-proposition, not your open source philosophy, and the market will adopt it in droves. Beyond software, I see open source licences being adopted directly in agriculture, health, pharma and education, creating an exponentially larger impact than they could ever create as just software licences.
To conclude, I would say that we’ve managed to discover the magic formula for the adoption of free and open source software in India. Just make it invisible, and people will adopt it — hence the exponential growth in the adoption of Android in India. Arduino projects bring FOSS to kids. But for me, adoption of open source is successful when people start the relationship with it after understanding its true philosophy. This is one love affair with freedom!



Let’s Try For U & Me

Tricks to Try Out on Thunderbird and SeaMonkey
Learn to use and store email messages offline with
Thunderbird and SeaMonkey.

In 2004, Google introduced its Gmail service with a 1GB mailbox and free POP access. This was at a time when most people had email accounts with their ISP or had free Web mail accounts with Hotmail or Yahoo. Mailbox storage was limited to measly amounts such as 5MB or 10MB. If you did not regularly purge old messages, then your incoming mail would bounce with the dreaded ‘Inbox full’ error. Hence, it was a standard practice to store email ‘offline’ using an email client. Each year now, a new generation of young people (mostly students) discover the Internet and they start with Web mail straight away. As popular Web mail services integrate online chatting as well, they prefer to use a Web browser rather than a desktop mail client to access email. This is sad because desktop email clients represent one of those rare Internet technologies that can claim to have achieved perfection. This article will bring readers up to speed on Thunderbird, the most popular FOSS email client.

Why use a desktop email client?
With an email client, you store emails offline. After the email application connects to your mail server and downloads new mail, it instructs the server to delete those messages from your mailbox (unless configured otherwise). This has several advantages.
• If your account gets hacked, the hacker will not get your archived messages. This also limits the fallout on your other accounts such as those of online banking.
• Web mail providers such as Gmail read your messages to display ‘relevant’ advertisements. This is creepy, even if it is software-driven.
• Email clients let you read and compose messages offline. A working Net connection is not required. Web mail requires you to log in first.
• Web mail providers such as Gmail automatically tell your contacts whether you are online or if your camera is on. Email clients do not do this.
• Modern Web browsers take many liberties without asking. Chrome, by default, listens to your microphone and uploads conversations to Google servers (for your convenience, of course). Email clients are not like that.
• Searching archived messages is extremely powerful on desktop mail clients. There is no paging of the results.
• When popular Web mail providers offer free POP access, why suffer the slowness of the Web?

POP or IMAP access to email
Email clients use two protocols, POP and IMAP, to receive mail. POP is ideal if you want to download and delete mail. IMAP is best if you need access on multiple devices or at different locations. POP is more prevalent than IMAP. For offline storage, POP is the best. Popular Web mail providers provide both POP and IMAP access. Before you can use an email client, you will have to log in to your Web mail provider in a browser, check the settings and activate POP/IMAP access for incoming mail. Email clients use the SMTP protocol for outgoing mail. In Thunderbird/SeaMonkey, you may have to add SMTP server settings separately for each email account.
If you have lots of email already online, then it may not be possible to make your email client create an offline copy in one go. Each time you choose to receive messages, the mail client will download a few hundred of your old messages. After it has downloaded all your old archived messages, the mail client will then settle down to downloading only your newest messages.
The settings for some popular Web mail services are as follows:
• Hotmail/Live/Outlook




  • POP: pop-mail.outlook.com
  • SMTP: smtp-mail.outlook.com
• Gmail
  • POP: pop.gmail.com
  • SMTP: smtp.gmail.com
• Yahoo
  • POP: pop.mail.yahoo.com
  • SMTP: smtp.mail.yahoo.com
The following settings are common for them:
• POP
  • Connection security/Encryption method: SSL
  • Port: 995
• SMTP
  • Connection security/Encryption method: SSL/TLS/STARTTLS
  • Port: 465/587
Some ISPs and hosting providers provide unencrypted mail access. Here, the connection security method will be ‘None’, and the ports are set to 110 for POP and 25 for SMTP. However, please be aware that most ISPs block Port 25, and many mail servers block mail originating from that port.

Figure 1: Live off the grid with no mail online. To get this Gmail note, you will have to empty the Inbox and Trash, and also delete all archived messages.

Even on a desktop screen, space may be at a premium. Currently, Thunderbird and SeaMonkey do not provide an easy way to customise the date columns. I use this trick in the launcher command to fix it (the en_DK locale renders dates in the ISO 8601 YYYY-MM-DD style):

export LC_TIME=en_DK.UTF-8 && seamonkey -mail
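If an account refuses to connect, it helps to establish whether the problem lies with these host/port combinations or elsewhere. As a rough sketch (not part of the original article: the table below simply restates the settings above, and `check_pop` is a hypothetical helper, so substitute your own address and password), Python's standard poplib module can log in to the same POP3-over-SSL endpoints:

```python
import poplib

# POP and SMTP endpoints restated from the settings above (all over SSL).
SETTINGS = {
    "outlook": {"pop": ("pop-mail.outlook.com", 995), "smtp": ("smtp-mail.outlook.com", 465)},
    "gmail":   {"pop": ("pop.gmail.com", 995),        "smtp": ("smtp.gmail.com", 465)},
    "yahoo":   {"pop": ("pop.mail.yahoo.com", 995),   "smtp": ("smtp.mail.yahoo.com", 465)},
}

def pop_server(provider):
    """Return the (host, port) pair for a provider's POP3-over-SSL endpoint."""
    return SETTINGS[provider]["pop"]

def check_pop(provider, user, password):
    """Log in over POP3/SSL and return the number of messages waiting."""
    host, port = pop_server(provider)
    conn = poplib.POP3_SSL(host, port)   # port 995, as listed above
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()       # (message count, mailbox size in bytes)
        return count
    finally:
        conn.quit()
```

A call such as `check_pop("gmail", "you@gmail.com", "password")` should return a message count once POP access has been switched on in the Web interface; an authentication or connection error points back at the settings, or at POP access still being disabled.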

Thunderbird and SeaMonkey


Popular email clients today are Microsoft Outlook and Mozilla Thunderbird, the latter being the obvious FOSS option. Like the Firefox browser, Thunderbird is modern software and supports many extensions or add-ons. Unlike Outlook (which uses Microsoft Word as the HTML formatting engine), Thunderbird has better CSS support, as it renders HTML messages using the Gecko engine (like the Firefox browser).

Figure 2: Changing the format of the date columns requires a hack
The SeaMonkey Internet suite bundles both the Firefox-based browser and the Thunderbird-based mail client, in addition to an IRC client and a Web page designer. SeaMonkey follows the philosophy of the old Netscape Communicator suite, in which the browser component was known as Netscape Navigator and the mail client as Netscape Messenger. Because of certain trademark objections from Mozilla, some GNU/Linux distributions used to bundle Firefox and Thunderbird as IceWeasel and IceDove; SeaMonkey became IceApe. This was resolved in 2016.
If you have already opened the SeaMonkey browser, then the SeaMonkey mail client can be opened in a flash, and the reverse is also true. This is very useful because website links in SeaMonkey mails are opened in the SeaMonkey browser. Firefox is a separate application from Thunderbird and does not have the same advantage. For this reason, I use SeaMonkey instead of Thunderbird. SeaMonkey is available at https://www.seamonkey-project.org/.
By default, SeaMonkey looks like Firefox or Thunderbird. I prefer to change its appearance using the Modern theme, as it makes it look like the old Netscape 6, and also because I need the browser to look different from regular Firefox. To enable this theme, go to Tools » Add-Ons » Appearance » SeaMonkey Modern.

Figure 3: Configure your own mail filters

Email providers today do a good job of filtering junk mail. You can do a better job with your own mail filters (Tools » Message Filters). You can choose to move or delete messages based on the occurrences of certain words in the From, To or Subject headers of the email.


Figure 4: Thunderbird is also an RSS feed reader

Apart from email, Thunderbird can also display content from RSS feeds (as shown in Figure 4) and Usenet forums (as shown in Figure 5).
Usenet newsgroups predate the World Wide Web. They are like an online discussion forum organised into several hierarchical groups. Forum participants post messages in the form of an email addressed to a newsgroup (say, comp.lang.javascript), and the NNTP client threads the discussions based on the subject line (Google Groups is a Web based interface into the world of Usenet).

SeaMonkey ChatZilla
Apart from the Firefox-based browser and the Thunderbird-based email client, SeaMonkey also bundles an IRC chat client. IRC is yet another Internet-based communication protocol that does not use the World Wide Web. It is the preferred medium of communication for hackers. Here is a link for starters: irc://chat.freenode.net/.

Figure 5: A newsgroup user sends an email message

Email backup
When you store email offline, the burden of doing regular backups falls on you. You also need to ensure that your computer is not vulnerable to malware such as email viruses. Web mail providers do a good job of eliminating email-borne malware, but malware can still arrive from other sources. Windows computers are particularly vulnerable to malware spread by USB drives and browser toolbars and extensions. In Windows, simply creating a directory named 'autorun.inf' at the root level stops most USB drive infections.
SeaMonkey stores all its data (email messages and accounts, RSS feeds, website usernames, passwords, preferences, etc.) in the ~/.mozilla/Seamonkey directory. For backup, just zip this directory regularly. If you move to a new GNU/Linux system, restore the backed-up directory to your new ~/.mozilla directory.

By: V. Subhash
The author is a writer, illustrator, programmer and FOSS fan. His website is at www.vsubhash.com. You can contact him at tech.writer@outlook.com.
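The 'zip this directory regularly' advice in the Email backup section above can be turned into a small script. This is only a sketch: the profile and backup paths are illustrative, and SeaMonkey should be closed while it runs.

```shell
#!/bin/sh
# Archive the SeaMonkey profile directory into a date-stamped tarball.
PROFILE_BASE="$HOME/.mozilla"     # Thunderbird keeps its profiles elsewhere
PROFILE_NAME="Seamonkey"
BACKUP_DIR="$HOME/mail-backups"   # illustrative destination

if [ -d "$PROFILE_BASE/$PROFILE_NAME" ]; then
    mkdir -p "$BACKUP_DIR"
    tar -czf "$BACKUP_DIR/$PROFILE_NAME-$(date +%Y%m%d).tar.gz" \
        -C "$PROFILE_BASE" "$PROFILE_NAME"
    echo "Backed up to $BACKUP_DIR"
else
    echo "No $PROFILE_NAME profile found under $PROFILE_BASE"
fi
```

Run it from cron (or a systemd timer) for the regular backups the article recommends.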


Admin How To

A Hands-on Guide on
Virtualisation with VirtualBox
Virtualisation is the process of creating software based (or virtual) representation
of a resource rather than a physical one. Virtualisation is applicable at the compute,
storage or network levels. In this article we will discuss compute level virtualisation,
which is commonly referred to as server virtualisation.

Server virtualisation (henceforth referred to as virtualisation) allows us to run multiple instances of operating systems (OS) simultaneously on a single server. These OSs can be of the same or of different types. For instance, you can run Windows as well as Linux OS on the same server simultaneously. Virtualisation adds a software layer on top of the hardware, which allows users to share physical hardware (memory, CPU, network, storage and so on) with multiple OSs. This virtualisation layer is called the virtual machine manager (VMM) or a hypervisor. There are two types of hypervisors.
Bare metal hypervisors: These are also known as Type-1 hypervisors and are directly installed on hardware. This enables the sharing of hardware resources with a guest OS (henceforth referred to as 'guest') running on top of them. Each guest runs in an isolated environment without interfering with other guests. ESXi, Xen, Hyper-V and KVM are examples of bare metal hypervisors.
Hosted hypervisors: These are also known as Type-2 hypervisors. They cannot be installed directly on hardware. They run as applications and hence require an OS to run them. Similar to bare metal hypervisors, they are able to share physical resources among multiple guests and the physical host on which they are running. VMware Workstation and Oracle VM VirtualBox (hereafter referred to as VirtualBox) are examples of hosted hypervisors.

An introduction to VirtualBox
VirtualBox is cross-platform virtualisation software. It is available on a wide range of platforms like Windows, Linux, Solaris, and so on. It extends the functionality of the existing OS and allows us to run multiple guests simultaneously along with the host's other applications.

VirtualBox terminology
To get a better understanding of VirtualBox, let's get familiar with its terminology.
1) Host OS: This is a physical or virtual machine on which VirtualBox is installed.
2) Virtual machine: This is the virtual environment created to run the guest OS. All its resources, like the CPU, memory, storage, network devices, etc., are virtual.
3) Guest OS: This is the OS running inside VirtualBox. VirtualBox supports a wide range of guests like Windows, Solaris, Linux, Apple, and so on.
4) Guest additions: These are additional software bundles


installed inside a guest to improve its performance and extend its functionality. For instance, these allow us to share folders between the host and guest, and provide drag and drop functionality.

Features of VirtualBox
Let us discuss some important features of VirtualBox.
1) Portability: VirtualBox is highly portable. It is available on a wide range of platforms and its functionality remains identical on each of those platforms. It uses the same file and image format for VMs on all platforms. Because of this, a VM created on one platform can be easily migrated to another. In addition, VirtualBox supports the Open Virtualisation Format (OVF), which enables VM import and export functionality.
2) Commodity hardware: VirtualBox can be used on a CPU that doesn't support hardware virtualisation instructions, like Intel's VT-x or AMD-V.
3) Guest additions: As stated earlier, these software bundles are installed inside a guest, and enable advanced features like shared folders, seamless windows and 3D virtualisation.
4) Snapshot: VirtualBox allows the user to take consistent snapshots of the guest. It records the current state of the guest and stores it on disk. It allows the user to go back in time and revert the machine to an older configuration.
5) VM groups: VirtualBox allows the creation of a group of VMs and represents them as a single entity. We can perform various operations on that group like Start, Stop, Pause, Reset, and so on.

Getting started with VirtualBox

System requirements
VirtualBox runs as an application on the host machine and, for it to work properly, the host must meet the following hardware and software requirements:
1) An Intel or AMD CPU
2) A 64-bit processor with hardware virtualisation, which is required to run 64-bit guests
3) 1GB of physical memory
4) A Windows, OS X, Linux or Solaris host OS

Downloading and installation
To download VirtualBox, visit https://www.virtualbox.org/wiki/Downloads. It provides software packages for Windows, OS X, Linux and Solaris hosts. In this column, I'll be demonstrating VirtualBox on Mint Linux. Refer to the official documentation if you wish to install it on other platforms.
For Debian based Linux, it provides the '.deb' package. Its format is virtualbox-xx_xx-yy-zz.deb, where xx_xx and yy are the version and build number respectively, and zz is the host OS's name and platform. For instance, in the case of a Debian based 64-bit host, the package name is virtualbox-5.2_5.2.0-118431-Ubuntu-xenial_amd64.deb. To begin installation, execute the command given below in a terminal and follow the on-screen instructions:

$ sudo dpkg -i virtualbox-5.2_5.2.0-118431-Ubuntu-xenial_amd64.deb

Using VirtualBox
After successfully installing VirtualBox, let us get our hands dirty by first starting VirtualBox from the desktop environment. It will launch the VirtualBox Manager window as shown in Figure 1.

Figure 1: VirtualBox manager

This is the main window from which you can manage your VMs. It allows you to perform various actions on VMs like Create, Import, Start, Stop, Reset and so on. At this moment, we haven't created any VMs; hence, the left pane is empty. Otherwise, a list of VMs is displayed there.

Creating a new VM
Let us create a new VM from scratch. Follow the instructions given below to create a virtual environment for OS installation.
1) Click the 'New' button on the toolbar.
2) Enter the guest's name, its type and version, and click the 'Next' button to continue.
3) Select the amount of memory to be allocated to the guest and click the 'Next' button.
4) From this window we can provide storage to the VM. It allows us to create a new virtual hard disk or use an existing one.
4a) To create a new virtual hard disk, select the 'Create a virtual hard disk now' option and click the 'Create' button.
4b) Select the VDI disk format and click on 'Continue'.
4c) On this page, we can choose between a storage policy that is either dynamically allocated or of a fixed size:
i) As the name suggests, a dynamically allocated disk will grow on demand up to the maximum provided size.
ii) A fixed size allocation will reserve the required storage upfront. If you are concerned about performance, then go with a fixed size allocation.
4d) Click the 'Next' button.
5) Provide the virtual hard disk's name, location and size before clicking on the 'Create' button.
This will show a newly created VM on the left pane, as seen in Figure 2.
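The same VM can also be created from the command line with VBoxManage, which is covered later in this article. The following dry-run sketch prints the commands instead of executing them (drop the echo in run() to execute them for real); the VM name, OS type, memory and disk size are illustrative values, using VirtualBox 5.x syntax:

```shell
VM="Mint-18-cli"                          # illustrative VM name
DISK="$HOME/VirtualBox VMs/$VM/$VM.vdi"
run() { echo "VBoxManage $*"; }           # dry-run: print instead of execute

run createvm --name "$VM" --ostype Ubuntu_64 --register
run modifyvm "$VM" --memory 1024
run createmedium disk --filename "$DISK" --size 10240   # size in MB
run storagectl "$VM" --name SATA --add sata
run storageattach "$VM" --storagectl SATA --port 0 --device 0 --type hdd --medium "$DISK"
```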


Installing a guest OS
To begin OS installation, we need to attach an ISO image
to the VM. Follow the steps given below to begin OS
installation:
1) Select the newly created VM.
2) Click the ‘Settings’ button on the toolbar.
3) Select the storage option from the left pane.
4) Select the optical disk drive from the storage devices.
5) Provide the path of the ISO image and click the ‘OK’
button. Figure 3 depicts the first five steps.
6) Select the VM from the left pane. Click the ‘Start’ button
on the toolbar. Follow the on-screen instructions to
complete OS installation.
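If you prefer the command line, the ISO attach and boot steps above map onto VBoxManage calls. This dry-run sketch only prints the commands (remove the echo to execute them); the VM name, controller name ('IDE') and ISO path are illustrative:

```shell
VM="Mint-18"
ISO="$HOME/isos/linuxmint-18.iso"
echo VBoxManage storageattach "$VM" --storagectl IDE --port 0 --device 0 \
     --type dvddrive --medium "$ISO"
echo VBoxManage startvm "$VM" --type gui
```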

VM power actions
Let us understand VM power actions in detail.

Figure 2: Creating a VM
1) Power On: As the name suggests, this starts the VM at
the state it was powered off or saved in. To start the VM,
right-click on it and select the ‘Start’ option.
2) Pause: In this state, the guest releases the CPU but not
the memory. As a result, the contents of the memory are
preserved when the VM is resumed. To pause the VM,
right-click on it and select the ‘Pause’ option.
3) Save: This action saves the current VM state and releases
the CPU as well as the memory. The saved machine can
be started again in the same state. To save the VM, right-
click on it and select the ‘Close->Save State’ option.
4) Shutdown: This is a graceful turn-off operation. In this
case, the shutdown signal is sent to the guest. To shut
down the VM, right-click on it and select the ‘Close-
>ACPI Shutdown’ option.
Figure 3: Installing the OS

5) Poweroff: This is a non-graceful turn-off operation. It can cause data loss. To power off the VM, right-click on it and select the 'Close->Poweroff' option.
6) Reset: The Reset option will turn off and turn on the
VM, respectively. It is different from Restart, which is a
graceful turn-off operation. To reset the VM, right-click
on it and select the ‘Reset’ option.
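For reference, the same power actions are available from the command line via 'VBoxManage controlvm' and 'VBoxManage startvm' (the VBoxManage section later in this article introduces the CLI). A dry-run sketch, printing the commands instead of executing them, with an illustrative VM name:

```shell
VM="Mint-18"
echo VBoxManage startvm "$VM"                    # Power On
echo VBoxManage controlvm "$VM" pause            # Pause
echo VBoxManage controlvm "$VM" resume           # resume a paused VM
echo VBoxManage controlvm "$VM" savestate        # Save
echo VBoxManage controlvm "$VM" acpipowerbutton  # ACPI Shutdown
echo VBoxManage controlvm "$VM" poweroff         # Poweroff
echo VBoxManage controlvm "$VM" reset            # Reset
```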

Removing the VM
Let us explore the steps we need to take to remove a VM. The
remove operation can be broken up into two parts.
1) Unregister VM: This removes the VM from the library,
i.e., it will just unregister the VM from VirtualBox
so that it won’t be visible in VirtualBox Manager. To
unregister a VM, right-click on it, select the ‘Remove’
option and click the ‘Remove Only’ option. You can
re-register this VM by navigating to the 'Machine->Add' option from VirtualBox Manager.

Figure 4: Starting the VM
2) Delete VM: This action is used to delete the VM permanently. VirtualBox—beyond the basics
It will delete the VM’s configuration files and virtual hard Beginners will get a fair idea about virtualisation and
disks. Once performed, this action cannot be undone. To VirtualBox by referring to the first few sections of this article.
remove a VM permanently, right-click on it, select the However, VirtualBox is a feature-rich product; this section
‘Remove’ option and click the ‘Delete all files’ option. describes its more advanced features.


Export appliance
We can export a VM as an appliance in the Open Virtualisation Format (OVF). It comes in a two-file format.
1) OVF file format: In this format, several VM related files will be generated. For instance, there will be separate files for virtual hard disks, configurations and so on.
2) OVA file format: In this format, all VM related files will be archived into a single file and the .ova extension will be assigned to it.
By leveraging this feature, we can create a Golden Image of a VM and deploy multiple instances of it. OVF is a platform-independent, efficient, extensible and open packaging and distribution format for VMs. As it is platform-independent, it allows OVF virtual machines exported from VirtualBox to be imported into VMware Workstation Player, and vice versa.
To export a VM, perform the steps listed below:
1) Select a VM from VirtualBox Manager. Navigate to the 'File->Export Appliance' option.
2) Select the VMs to be exported, and click the 'Next' button.
3) Provide the directory's location and OVF format version.
4) Provide the appliance settings and click the 'Export' button.

Import appliance
To import a VM, perform the steps given below:
1) Open VirtualBox Manager and navigate to the 'File->Import Appliance' option.
2) Select 'Virtual appliance' and click on the 'Next' button.
3) Verify the appliance settings and click on the 'Import' button.
You will see that a new VM appears in VirtualBox Manager's left pane.

Cloning a VM
VirtualBox also provides an option to clone existing VMs. As the name suggests, this creates an exact copy of the VM. It supports the following two types of clones.
1) Full clone: In the case of a full clone, it will duplicate all the VM's files. As this is a totally separate VM copy, we can easily move this VM to another host.
2) Linked clone: In the case of a linked clone, it will not copy the virtual hard disks; instead, it will take a snapshot of the original VM. It will create a new VM, but this one will refer to the virtual hard disks of the original VM. This is a space-efficient clone operation, but the downside is that you cannot move the VM to another host, as the original VM and the cloned one share the same virtual hard disks.
To create a clone, perform the steps given below:
1) Select the VM from VirtualBox Manager. Right-click the VM and select the 'Clone' option.
2) Provide the name of the clone VM and click the 'Next' button.
3) Select the clone type and click the 'Clone' button.
You will see that a new cloned VM appears in VirtualBox Manager's left pane.

Group VMs
VirtualBox allows you to create groups of VMs, and to manage and perform actions on them as a single entity. You can perform various actions on them like expanding/shrinking the group, renaming the group, or Start, Stop, Reset and Pause actions on the group of VMs.
To create a VM group, perform the following steps:
1) Select multiple VMs from VirtualBox Manager. Hold the 'Ctrl' key for multiple selections.
2) Right-click it and select the 'Group' option.
This will create a VM group called 'New group' as shown in Figure 5.
If you right-click on the group, it will show various options like 'Add VM to group', 'Rename group', 'Ungroup', 'Start' and so on. To remove a VM from the group, just drag and drop that particular VM outside the group.

Snapshots
With snapshots, you can save a particular state of a VM for later use, at which point you can revert to that state.
To take a snapshot, perform the following steps:
1) Select a VM from VirtualBox Manager.
2) Click the 'Machine Tools' drop-down arrow on the toolbar and select the 'Snapshots' option.
3) Click the 'Take' button.
4) Enter the snapshot's name and description before clicking on the 'OK' button. Figure 6 depicts the above steps.
This window provides various options related to snapshots like Delete, Clone, Restore and so on. Click on the 'Properties' button to see more details about the selected snapshot.

Shared folders
Shared folders enable data sharing between the guest and host OS. They require the VirtualBox guest additions to be installed inside the guest. This section describes the installation of guest additions along with the shared folder feature.
To enable the shared folder feature, perform the following steps:
1) Start the VM from VirtualBox Manager.
2) Go to the Devices->Insert Guest Additions CD image option. Follow the on-screen instructions to perform the guest additions installation. Figure 7 depicts the first two steps.
3) Navigate to Devices->Shared Folders->Shared Folder Settings.
4) Click the 'Add new shared folder' button. Enter the folder's name, its path and select 'Permissions'. Click the 'OK' button. Figure 8 illustrates the above steps.
You can mount the shared folder from the guest in the same way as an ordinary network share. Given below is the syntax for that:

mount -t vboxsf [-o OPTIONS] <sharename> <mountpoint>

Understanding virtual networking
This section delves deep into the aspects of VirtualBox's networking, which supports the network modes


Figure 5: VM groups
Figure 6: Snapshot VM
Figure 7: Guest addition installation
Figure 8: Shared folder

Not Attached, NAT, bridged adapters, internal networks and host-only adapters.
Perform the steps given below to view/manipulate the current network settings:
1) Select the VM from VirtualBox Manager.
2) Click the 'Settings' button on the toolbar.
3) Select the 'Network' option from the left pane.
4) Select the adapter. The current networking mode will be displayed under the 'Attached to' drop-down box.
5) To change the mode, select the required network mode from the drop-down box and click the 'OK' button.
Figure 9 illustrates the above steps.

VirtualBox network modes
Let us discuss each network mode briefly.
1) Not Attached: In this mode, VirtualBox reports to the guest that the network card is installed but it is not connected. As a result of this, networking is not possible in this mode. If you want to compare this scenario with a physical machine, then it is similar to the Ethernet card being present but the cable not being connected to it.
2) NAT: This stands for Network Address Translation and it is the default mode. If you want to access external networks from the guest, then this will serve your purpose. It is similar to a physical system connected to an external network via the router.
3) Bridged adapter: In this mode, VirtualBox connects to one of your installed network cards and exchanges network packets directly, circumventing the host operating system's network stack.
4) Internal: In this mode, communication is allowed between a selected group of VMs only. Communication with the host is not possible.
5) Host only: In this mode, communication is allowed between a selected group of VMs and the host. A physical Ethernet card is not required; instead, a virtual network interface (similar to a loopback interface) is created on the host.

An introduction to VBoxManage
VBoxManage is the command line interface (CLI) of VirtualBox. You can manage VirtualBox from your host via these commands. It supports all the features that are supported by the GUI. It gets installed by default when the VirtualBox package is installed. Let us look at some of its basic commands.
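The GUI modes listed above also correspond to 'VBoxManage modifyvm --nic<N>' options. This dry-run sketch prints the commands instead of executing them; the VM name, host interface (eth0), internal network name and host-only adapter name are all illustrative:

```shell
VM="Mint-18"
echo VBoxManage modifyvm "$VM" --nic1 null                                  # Not Attached
echo VBoxManage modifyvm "$VM" --nic1 nat                                   # NAT
echo VBoxManage modifyvm "$VM" --nic1 bridged --bridgeadapter1 eth0         # Bridged adapter
echo VBoxManage modifyvm "$VM" --nic1 intnet --intnet1 intnet0              # Internal
echo VBoxManage modifyvm "$VM" --nic1 hostonly --hostonlyadapter1 vboxnet0  # Host only
```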


Figure 9: Network modes

To list VMs
Execute the command given below in a terminal to list all the registered VMs:

$ VBoxManage list vms
“Mint-18” {e54feffd-50ed-4880-8f81-b6deae19110d}
“VM-1” {37a25c9a-c6fb-4d08-a11e-234717261abc}
“VM-2” {03b39a35-1954-4778-a261-ceeddc677e65}
“VM-3” {875be4d5-3fbf-4d06-815d-6cecfb2c2304}

To list groups
We can also list VM groups using the following command:

$ VBoxManage list groups
“/”
“/VM Group”

To show VM information
We can use the showvminfo command to display details about a VM. For instance, the command given below provides detailed information about the VM. It accepts the VM's name as an argument.

$ VBoxManage showvminfo Mint-18
Name: Mint-18
Groups: /
Guest OS: Ubuntu (64-bit)
UUID: e54feffd-50ed-4880-8f81-b6deae19110d
Config file: /home/groot/VirtualBox VMs/Mint-18/Mint-18.vbox
Snapshot folder: /home/groot/VirtualBox VMs/Mint-18/Snapshots
Log folder: /home/groot/VirtualBox VMs/Mint-18/Logs
Hardware UUID: e54feffd-50ed-4880-8f81-b6deae19110d
Memory size: 1024MB
Page Fusion: off
VRAM size: 16MB

Note: The remaining output is not shown here, in order to save space.

To turn on the VM
VBoxManage provides a simple command to start the VM. It accepts the VM name as an argument.

$ VBoxManage startvm Mint-18
Waiting for VM “Mint-18” to power on...
VM “Mint-18” has been successfully started.

To turn off the VM
The controlvm option supports various actions like pause, reset, power-off, shutdown and so on. To power off the VM, execute the command given below at a terminal. It accepts the VM name as an argument.

$ VBoxManage controlvm “Mint-18” poweroff
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%

To unregister VM
The command given below can be used to unregister a VM. It accepts the VM's name as an argument.

$ VBoxManage unregistervm “Mint-18”

To register VM
The command given below can be used to register a VM. It accepts the VM's file name as an argument.

$ VBoxManage registervm “/home/groot/VirtualBox VMs/Mint-18/Mint-18.vbox”

To delete VM
To delete a VM permanently, use the --delete option with the unregistervm command. For instance, the following command will delete the VM permanently.

$ VBoxManage unregistervm “VM-1” --delete
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%

VBoxManage provides many more commands, and covering them all is beyond the scope of this tutorial. You can always dig deeper into this topic by referring to VirtualBox's official guide. To view all supported commands and their options, execute the following command in a terminal:

$ VBoxManage --help

By: Narendra K.
The author is a FOSS enthusiast. He can be reached at narendra0002017@gmail.com.


DevOps Series
Deploying Graylog Using Ansible
This 11th article in the DevOps series is a tutorial on installing
Graylog software using Ansible.

Graylog is a free and open source log management software that allows you to store and analyse all your logs from a central location. It requires MongoDB (a document-oriented, NoSQL database) to store meta information and configuration information. The actual log messages are stored in Elasticsearch. It is written using the Java programming language and released under the GNU General Public License (GPL) v3.0.
Access control management is built into the software, and you can create roles and user accounts with different permissions. If you already have an LDAP server, its user accounts can be used with the Graylog software. It also provides a REST API, which allows you to fetch data to build your own dashboards. You can create alerts to take actions based on the log messages, and also forward the log data to other output streams. In this article, we will install the Graylog software and its dependencies using Ansible.

GNU/Linux
An Ubuntu 16.04.3 LTS guest virtual machine (VM) instance will be used to set up Graylog using KVM/QEMU. The host system is a Parabola GNU/Linux-libre x86_64 system. Ansible is installed on the host system using the distribution package manager. The version of Ansible used is:

$ ansible --version
ansible 2.4.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/shakthi/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.14 (default, Sep 20 2017, 01:25:59) [GCC 7.2.0]

Add an entry to the /etc/hosts file for the guest 'ubuntu' VM as indicated below:

192.168.122.25 ubuntu

On the host system, let's create a project directory structure to store the Ansible playbooks:

ansible/inventory/kvm/
       /playbooks/configuration/
       /playbooks/admin/

An 'inventory' file is created inside the inventory/kvm folder that contains the following code:

ubuntu ansible_host=192.168.122.25 ansible_connection=ssh ansible_user=ubuntu ansible_password=password

You should be able to issue commands using Ansible to the guest OS. For example:

$ ansible -i inventory/kvm/inventory ubuntu -m ping


ubuntu | SUCCESS => {
    "changed": false,
    "failed": false,
    "ping": "pong"
}

Pre-requisites
The Graylog software has a few dependency packages that need to be installed as pre-requisites. The APT package repository is updated and upgraded before installing the pre-requisite software packages.

---
- name: Pre-requisites
  hosts: ubuntu
  become: yes
  become_method: sudo
  gather_facts: true
  tags: [prerequisite]

  tasks:
    - name: Update the software package repository
      apt:
        update_cache: yes

    - name: Update all the packages
      apt:
        upgrade: dist

    - name: Install pre-requisite packages
      package:
        name: "{{ item }}"
        state: latest
      with_items:
        - apt-transport-https
        - openjdk-8-jre-headless
        - uuid-runtime
        - pwgen

The above playbook can be invoked as follows:

$ ansible-playbook -i inventory/kvm/inventory playbooks/configuration/graylog.yml --tags prerequisite -K

The '-K' option prompts for the sudo password for the 'ubuntu' user. You can append multiple '-v' to the end of the playbook invocation to get a more verbose output.

MongoDB
Graylog uses MongoDB to store meta information and configuration changes. The MongoDB software package that ships with Ubuntu 16.04 is supported by the latest Graylog software. The Ansible playbook to install the same is as follows:

- name: Install Mongodb
  hosts: ubuntu
  become: yes
  become_method: sudo
  gather_facts: true
  tags: [mongodb]

  tasks:
    - name: Install MongoDB
      package:
        name: mongodb-server
        state: latest

    - name: Start the server
      service:
        name: mongodb
        state: started

    - wait_for:
        port: 27017

The Ubuntu software package for MongoDB is called 'mongodb-server'. It is installed, and the database server is started. The Ansible playbook waits for the MongoDB server to start and listen on the default port 27017. The above playbook can be invoked using the following command:

$ ansible-playbook -i inventory/kvm/inventory playbooks/configuration/graylog.yml --tags mongodb -K

Elasticsearch
Elasticsearch is a search engine written in Java and released under the Apache licence. It is based on Lucene (an information retrieval software library) and provides a full-text search feature. The elastic.co website provides .deb packages that can be used to install it on Ubuntu. The Ansible playbook for this is provided below:

- name: Install Elasticsearch
  hosts: ubuntu
  become: yes
  become_method: sudo
  gather_facts: true
  tags: [elastic]

  tasks:
    - name: Add key
      apt_key:
        url: https://artifacts.elastic.co/GPG-KEY-elasticsearch
        state: present

    - name: Add elastic deb sources
      lineinfile:


        path: /etc/apt/sources.list.d/elastic-5.x.list
        create: yes
        line: 'deb https://artifacts.elastic.co/packages/5.x/apt stable main'

    - name: Update the software package repository
      apt:
        update_cache: yes

    - name: Install Elasticsearch
      package:
        name: elasticsearch
        state: latest

    - name: Update cluster name
      lineinfile:
        path: /etc/elasticsearch/elasticsearch.yml
        create: yes
        regexp: '^#cluster.name: my-application'
        line: 'cluster.name: graylog'

    - name: Daemon reload
      systemd: daemon_reload=yes

    - name: Start elasticsearch service
      service:
        name: elasticsearch.service
        state: started

    - wait_for:
        port: 9200

    - name: Test Curl query
      shell: curl -XGET 'localhost:9200/?pretty'

The stable elastic.co repository package is installed before installing Elasticsearch. The cluster name is then updated in the /etc/elasticsearch/elasticsearch.yml configuration file. The system daemon services are reloaded, and the Elasticsearch service is started. The Ansible playbook waits for the service to run and listen on port 9200.
The above playbook can be invoked as follows:

$ ansible-playbook -i inventory/kvm/inventory playbooks/configuration/graylog.yml --tags elastic -K

You can perform a manual query to verify that Elasticsearch is running, using the following Curl command:

$ curl -XGET 'localhost:9200/?pretty'

{
  "name" : "cFn-3YD",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "nuBTSlFBTk6PDGyrfDCr3A",
  "version" : {
    "number" : "5.6.5",
    "build_hash" : "6a37571",
    "build_date" : "2017-12-04T07:50:10.466Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}

Graylog
The final step is to install Graylog itself. The .deb package available from the graylog2.org website is installed, and then the actual ‘graylog-server’ package is installed. The configuration file is updated with credentials for the ‘admin’ user, with a hashed string for the password ‘osfy’. The Web interface is also enabled with the default IP address of the guest VM. The Graylog service is finally started. The Ansible playbook to install Graylog is as follows:

- name: Install Graylog
  hosts: ubuntu
  become: yes
  become_method: sudo
  gather_facts: true
  tags: [graylog]

  tasks:
    - name: Install Graylog repo deb
      apt:
        deb: https://packages.graylog2.org/repo/packages/graylog-2.3-repository_latest.deb

    - name: Update the software package repository
      apt:
        update_cache: yes

    - name: Install Graylog
      package:
        name: graylog-server
        state: latest

    - name: Update database credentials in the file
      replace:
        dest: "/etc/graylog/server/server.conf"
        regexp: "{{ item.regexp }}"
        replace: "{{ item.replace }}"
      with_items:
        - { regexp: 'password_secret =', replace: 'password_secret = QXHg3EqvsuPmFxUY2aKlgimUF05plMPXQHy1stUiQ1uaxgIG27K3t2MviRiFLNot09U1akoT30njK3G69KIzqIoYqdY3oLUP' }
        - { regexp: '#root_username = admin', replace: 'root_username = admin' }
        - { regexp: 'root_password_sha2 =', replace: 'root_password_sha2 = eabb9bb2efa089223d4f54d55bf2333ebf04a29094bff00753536d7488629399' }
        - { regexp: '#web_enable = false', replace: 'web_enable = true' }
        - { regexp: '#web_listen_uri = http://127.0.0.1:9000/', replace: "web_listen_uri = http://{{ ansible_default_ipv4.address }}:9000/" }
        - { regexp: 'rest_listen_uri = http://127.0.0.1:9000/api/', replace: "rest_listen_uri = http://{{ ansible_default_ipv4.address }}:9000/api/" }

    - name: Start graylog service
      service:
        name: graylog-server.service
        state: started

The above playbook can be run using the following command:

$ ansible-playbook -i inventory/kvm/inventory playbooks/configuration/graylog.yml --tags graylog -K

Web interface
You can now open the URL http://192.168.122.25:9000 in a browser on the host system to see the default Graylog login page as shown in Figure 1.

Figure 1: Graylog login page

The user name is ‘admin’ and the password is ‘osfy’. You will then be taken to the Graylog home page as shown in Figure 2.

Figure 2: Graylog home page

The guest VM is a single node, and hence if you traverse to System -> Nodes, you will see this node information as illustrated in Figure 3.

Figure 3: Graylog node activated

You can now test the Graylog installation by adding a data source as input by traversing System -> Input in the Web interface. The ‘random HTTP message generator’ is used as a local input, as shown in Figure 4.

Figure 4: Random HTTP message generator
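Incidentally, the root_password_sha2 value written into server.conf above is the SHA-256 digest of the admin password (‘osfy’ here). If you pick a different password, a few lines of Python will generate the matching hash (the equivalent of echo -n yourpassword | sha256sum):

```python
import hashlib

def graylog_password_hash(password):
    # Return the SHA-256 hex digest that Graylog expects in root_password_sha2
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

print(graylog_password_hash("osfy"))
```

Paste the printed digest into server.conf (or into the replace item in the playbook above) in place of the existing hash.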


The newly created input source is now running and visible as a local input in the Web page as shown in Figure 5.

Figure 5: Graylog input random HTTP message generator

After a few minutes, you can observe the created messages in the Search link as shown in Figure 6.

Figure 6: Graylog random HTTP messages

Uninstalling Graylog
An Ansible playbook to stop the different services, and to uninstall Graylog and its dependency software packages, is given below for reference:

---
- name: Uninstall Graylog
  hosts: ubuntu
  become: yes
  become_method: sudo
  gather_facts: true
  tags: [uninstall]

  tasks:
    - name: Stop the graylog service
      service:
        name: graylog-server.service
        state: stopped

    - name: Uninstall graylog server
      package:
        name: graylog-server
        state: absent

    - name: Stop the Elasticsearch server
      service:
        name: elasticsearch.service
        state: stopped

    - name: Uninstall Elasticsearch
      package:
        name: elasticsearch
        state: absent

    - name: Stop the MongoDB server
      service:
        name: mongodb
        state: stopped

    - name: Uninstall MongoDB
      package:
        name: mongodb-server
        state: absent

    - name: Uninstall pre-requisites
      package:
        name: "{{ item }}"
        state: absent
      with_items:
        - pwgen
        - uuid-runtime
        - openjdk-8-jre-headless
        - apt-transport-https

The above playbook can be invoked using:

$ ansible-playbook -i inventory/kvm/inventory playbooks/admin/uninstall-graylog.yml -K

By: Shakthi Kannan
The author is a free software enthusiast and blogs at shakthimaan.com.
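As a closing aside, the JSON returned by the Elasticsearch health check earlier is easy to consume from a script as well. A small sketch using only Python’s standard library (the response text here is the sample curl output shown above):

```python
import json

# The sample response from "curl -XGET 'localhost:9200/?pretty'" shown earlier
response_text = '''
{
  "name" : "cFn-3YD",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "5.6.5",
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}
'''

info = json.loads(response_text)
print(info["cluster_name"], info["version"]["number"])  # elasticsearch 5.6.5
```

The same parsing works on a live response fetched with urllib against port 9200, which is handy for wiring the check into a monitoring script.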

Analysing Big Data with Hadoop


Big Data is unwieldy because of its vast size, and needs tools to efficiently
process and extract meaningful results from it. Hadoop is an open source
software framework and platform for storing, analysing and processing data. This
article is a beginner’s guide to how Hadoop can help in the analysis of Big Data.

Big Data is a term used to refer to a huge collection of data that comprises both structured data found in traditional databases and unstructured data like text documents, video and audio. Big Data is not merely data but also a collection of various tools, techniques, frameworks and platforms. Transport data, search data, stock exchange data, social media data, etc, all come under Big Data.
Technically, Big Data refers to a large set of data that can be analysed by means of computational techniques to draw patterns and reveal the common or recurring points that would help to predict the next step—especially human behaviour, like future consumer actions based on an analysis of past purchase patterns.
Big Data is not about the volume of the data, but more about what people use it for. Many organisations like business corporations and educational institutions are using this data to analyse and predict the consequences of certain actions. After collecting the data, it can be used for several functions like:
• Cost reduction
• The development of new products
• Making faster and smarter decisions
• Detecting faults
Today, Big Data is used by almost all sectors including banking, government, manufacturing, airlines and hospitality.
There are many open source software frameworks for storing and managing data, and Hadoop is one of them. It has a huge capacity to store data, efficient data processing power and the capability to do countless jobs. It is a Java based programming framework, developed by Apache. There are many organisations using Hadoop — Amazon Web Services, Intel, Cloudera, Microsoft, MapR Technologies, Teradata, etc.

The history of Hadoop
Doug Cutting and Mike Cafarella are two important people in the history of Hadoop. They wanted to invent a way to return Web search results faster by distributing the data over several machines and making calculations, so that several jobs could be performed at the same time. At that time, they were working on an open source search engine project called Nutch. But, at the same time, the Google search engine project was also in progress. So, Nutch was divided into two parts—one of the parts dealt with the processing of data, which the duo named Hadoop after the toy elephant that belonged to Cutting’s son. Hadoop was released as an open source project in 2008 by Yahoo. Today, the Apache Software Foundation maintains the Hadoop ecosystem.

Prerequisites for using Hadoop
Linux based operating systems like Ubuntu or Debian are preferred for setting up Hadoop. Basic knowledge of Linux commands is helpful. Besides, Java plays an important role in the use of Hadoop, but people can use their preferred languages like Python or Perl to write the methods or functions.
There are four main libraries in Hadoop.
1. Hadoop Common: This provides utilities used by all other modules in Hadoop.
2. Hadoop MapReduce: This works as a parallel framework for scheduling and processing the data.
3. Hadoop YARN: This is an acronym for Yet Another Resource Negotiator. It is an improved version of MapReduce and is used for processes running over Hadoop.
4. Hadoop Distributed File System (HDFS): This stores data and maintains records over various machines or clusters. It also allows the data to be stored in an accessible format.
HDFS sends data to the server once and uses it as many times as it wants. When a query is raised, the NameNode manages all the DataNode slave nodes that serve the given query. Hadoop MapReduce performs all the jobs assigned sequentially. Instead of raw MapReduce, Pig and Hive are often used for better performance.
Other packages that can support Hadoop are listed below.
• Apache Oozie: A scheduling system that manages processes taking place in Hadoop
• Apache Pig: A platform to run programs made on Hadoop
• Cloudera Impala: A processing database for Hadoop. Originally it was created by the software organisation Cloudera, but was later released as open source software
• Apache HBase: A non-relational database for Hadoop
• Apache Phoenix: A relational database based on Apache HBase
• Apache Hive: A data warehouse used for summarisation, querying and the analysis of data
• Apache Sqoop: Used to transfer data between Hadoop and structured data sources
• Apache Flume: A tool used to move data to HDFS
• Cassandra: A scalable multi-master database system

The importance of Hadoop
Hadoop is capable of storing and processing large amounts of data of various kinds. There is no need to preprocess the data before storing it. Hadoop is highly scalable as it can store and distribute large data sets over several machines running in parallel. This framework is free and uses cost-efficient methods.
Hadoop is used for:
• Machine learning
• Processing of text documents
• Image processing
• Processing of XML messages
• Web crawling
• Data analysis
• Analysis in the marketing field
• Study of statistical data

Challenges when using Hadoop
Hadoop does not provide easy tools for removing noise from the data; hence, maintaining that data is a challenge. It has many data security issues like encryption problems. Streaming jobs and batch jobs are not performed efficiently. MapReduce programming is inefficient for jobs involving highly analytical skills. It is a distributed system with low level APIs, and some APIs are not useful to developers.
But there are benefits too. Hadoop has many useful functions like data warehousing, fraud detection and marketing campaign analysis. These are helpful to get useful information from the collected data. Hadoop also has the ability to duplicate data automatically, so multiple copies of data are used as a backup to prevent loss of data.

Frameworks similar to Hadoop
Any discussion on Big Data is never complete without a mention of Hadoop. But like with other technologies, a variety of frameworks that are similar to Hadoop have been developed. Other frameworks used widely are Ceph, Apache Storm, Apache Spark, DataTorrent RTS, Google BigQuery, Samza, Flink and Hydra.
MapReduce requires a lot of time to perform assigned tasks. Spark can fix this issue by doing in-memory processing of data. Flink is another framework that works faster than Hadoop and Spark. Hadoop is not efficient for real-time processing of data. Apache Spark uses stream processing of data, where continuous input and output of data happens. Apache Flink also provides a single runtime for the streaming of data and batch processing.
However, Hadoop is the preferred platform for Big Data analytics because of its scalability, low cost and flexibility. It offers an array of tools that data scientists need. Apache Hadoop with YARN transforms a large set of raw data into a feature matrix which is easily consumed. Hadoop makes machine learning algorithms easier.

By: Jameer Babu
The author is a FOSS enthusiast and is interested in competitive programming and problem solving. He can be contacted at jameer.jb@gmail.com.
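As noted above, MapReduce jobs need not be written in Java: with Hadoop Streaming, the mapper and reducer can be ordinary Python scripts that read and write text streams. The sketch below simulates the two phases of a word count in-process; on a real cluster, Hadoop runs them as separate scripts and sorts the mapper output between the phases:

```python
from itertools import groupby

def mapper(lines):
    # Map phase: emit a (word, 1) pair for every word seen
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    # Reduce phase: input arrives sorted by key; sum the counts per word
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

data = ["Hadoop stores Big Data", "Hadoop processes Big Data"]
counts = dict(reducer(sorted(mapper(data))))
print(counts)  # {'big': 2, 'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

The sorted() call stands in for Hadoop’s shuffle-and-sort step, which guarantees that all pairs with the same key reach the same reducer together.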

Top Three Open Source


DATA BACKUP TOOLS
This article examines three open source data backup solutions
that are the best among the many available.

Open source data backup software has become quite popular in recent times. One of the main reasons for this is that users have access to the code, which allows them to tweak the product. Open source tools are now being used in data centre environments because they are low cost and provide flexibility.
Let’s take a look at three open source backup software packages that I consider the best. All three provide support for UNIX, Linux, Windows and Mac OS.

Amanda
This is one of the oldest open source backup software packages. It gets its name from the University of Maryland, where it was originally conceived. Amanda stands for the Advanced Maryland Automatic Network Disk Archiver.
Amanda is a scheduling, automation and tracking program wrapped around native backup tools like tar (for UNIX/Linux) and zip (for Windows). The database that tracks all backups allows you to restore any file from a previous version of that file that was backed up by Amanda. This reliance on native backup tools comes with advantages and disadvantages. The biggest advantage, of course, is that you will never have a problem reading an Amanda tape on any platform. The formats Amanda uses are easily available on any open-systems platform. The biggest disadvantage is that some of these tools have limitations (e.g., path length) and Amanda will inherit those limitations.
On another level, Amanda is a sophisticated program that has a number of enterprise-level features, like automatically determining when to run your full backups,

instead of having you schedule them. It’s also the only open source package to have database agents for SQL Server, Exchange, SharePoint and Oracle, as well as the only backup package to have an agent for MySQL and Ingres.
Amanda is now backed by Zmanda, and this company has put its development into overdrive. Just a few months after beginning operations, Zmanda addressed major limitations in the product that had hindered it for years. Since then, it has been responsible for the addition of a lot of functionality, including those database agents.

Figure 1: Selecting files and folders for file system backup

Bacula
Bacula was originally written by Kern Sibbald, who chose a very different path from Amanda by writing a custom backup format designed to overcome the limitations of the native tools. Sibbald’s original goal was to write a tool that could take the place of the enterprise tools he saw in the data centre.
Bacula also has scheduling, automation and tracking of all backups, allowing you to easily restore any file (or files) from a previous version. Like Amanda, it also has media management features that allow you to use automated tape libraries and perform disk-to-disk backups.

Figure 2: Bacula admin page

As of this writing, Bacula is only a file backup product and does not provide any database agents. You can shut a database down and back up its files, but this is not a viable backup method for some databases.

BackupPC
Both Amanda and Bacula feel and behave like conventional backup products. They have support for both disk and tape, scheduled full and incremental backups, and they come in a ‘backup format’. BackupPC, on the other hand, is a disk-only backup tool that forever performs incremental backups, and stores those backups in their native format in a snapshot-like tree structure that is available via a GUI. Like Bacula, it’s a file-only backup tool and its incremental nature might be hampered by backing up large database files. However, it’s a really interesting alternative for file data. BackupPC’s single most imposing feature is that it does file-level de-duplication. If you have a file duplicated anywhere in your environment, it will find that duplicate and replace it with a link to the original file.

Figure 3: BackupPC server status

Which one should you use?
Choosing a data backup tool entirely depends on the purpose. If you want the least proprietary backup format, go for BackupPC. If database agents are a big driver, choose Amanda. Or if you want a product that’s designed like a typical commercial backup application, opt for Bacula. One more important aspect: both BackupPC and Amanda need a Linux server to control backups, while Bacula can also run its server on Windows.
All three products are very popular. Which one you choose depends on what you need. The really nice thing about all three tools is that they can be downloaded free of cost. So you can decide which one is better for you after trying out all three.

By Neetesh Mehrotra
The author works at TCS as a systems engineer, and his areas of interest are Java development and automation testing. For any queries, do contact him at mehrotra.neetesh@gmail.com.
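BackupPC’s file-level de-duplication, described above, amounts to indexing files by a hash of their contents and replacing any repeat with a link to the first copy. A toy sketch of the idea in Python (BackupPC’s real pool and linking scheme are more elaborate):

```python
import hashlib
import os
import tempfile

def deduplicate(directory):
    # Map each content hash to the first file seen with it; hard-link repeats
    pool = {}
    replaced = 0
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if digest in pool:
                os.remove(path)              # drop the duplicate copy...
                os.link(pool[digest], path)  # ...and link it to the pooled one
                replaced += 1
            else:
                pool[digest] = path
    return replaced

# Demo: two files with identical contents and one different file
tmp = tempfile.mkdtemp()
for name, text in [("a.txt", "same bytes"), ("b.txt", "same bytes"), ("c.txt", "other")]:
    with open(os.path.join(tmp, name), "w") as f:
        f.write(text)

replaced = deduplicate(tmp)
print(replaced)  # 1
```

After the run, the two identical files share a single copy on disk via a hard link, which is exactly how duplicated data stops costing duplicated space.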

Getting Past the Hype


Around Hadoop
The term Big Data and the name Hadoop are bandied about freely in computer circles.
In this article, the author attempts to explain them in very simple terms.

Imagine this scenario: You have 1GB of data that you need to process. The data is stored in a relational database in your desktop computer which has no problem managing the load. Your company soon starts growing very rapidly, and the data generated grows to 10GB, and then 100GB. You start to reach the limits of what your current desktop computer can handle. So what do you do? You scale up by investing in a larger computer, and you are then alright for a few more months. When your data grows from 1TB to 10TB, and then to 100TB, you are again quickly approaching the limits of that computer. Besides, you are now asked to feed your application with unstructured data coming from sources like Facebook, Twitter, RFID readers, sensors, and so on. Your managers want to derive information from both the relational data and the unstructured data, and they want this information as soon as possible. What should you do?
Hadoop may be the answer. Hadoop is an open source project of the Apache Foundation. It is a framework written in Java, originally developed by Doug Cutting, who named it after his son’s toy elephant!
Hadoop uses Google’s MapReduce technology as its foundation. It is optimised to handle massive quantities of data which could be structured, unstructured or semi-structured, using commodity hardware, i.e., relatively inexpensive computers. This massive parallel processing is done with great efficiency. However, handling massive amounts of data is a batch operation, so the response time is not immediate. Importantly, Hadoop replicates its data across different computers, so that if one goes down, the data is processed on one of the replicated computers.

Big Data
Hadoop is used for Big Data. Now what exactly is Big Data? With all the devices available today to collect data, such as RFID readers, microphones, cameras, sensors, and so on, we are seeing an explosion of data being collected worldwide.
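The replication behaviour described above (each piece of data kept on more than one machine, so a single failure loses nothing) can be sketched in a few lines of Python. The node names, block count and replication factor here are illustrative only; HDFS’s actual placement policy is rack-aware, with a default factor of three:

```python
def place_blocks(num_blocks, nodes, replication=3):
    # Assign each block to `replication` distinct nodes, round-robin style
    return {block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
            for block in range(num_blocks)}

def survives_failure(placement, failed_node):
    # True if every block still has at least one copy after a node fails
    return all(any(n != failed_node for n in replicas)
               for replicas in placement.values())

nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(num_blocks=10, nodes=nodes, replication=3)
print(all(survives_failure(placement, n) for n in nodes))  # True
```

Because every block lives on three of the four nodes, taking any single node down still leaves a copy of each block available for processing.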

Figure 1: High level architecture
Figure 2: Hadoop architecture

Big Data is a term used to describe large collections of data (also known as data sets) that may be unstructured, and grow so large and so quickly that it is difficult to manage with regular database or statistical tools.
In terms of numbers, what are we looking at? How big is Big Data? Well, there are more than 3.2 billion Internet users, and active cell phones have crossed the 7.6 billion mark. There are now more in-use cell phones than there are people on the planet (7.4 billion). Twitter processes 7TB of data every day, and 600TB of data is processed by Facebook daily. Interestingly, about 80 per cent of this data is unstructured. With this massive amount of data, businesses need fast, reliable, deeper data insight. Therefore, Big Data solutions based on Hadoop and other analytic software are becoming more and more relevant.

Open source projects related to Hadoop
Here is a list of some other open source projects related to Hadoop:
• Eclipse is a popular IDE donated by IBM to the open source community.
• Lucene is a text search engine library written in Java.
• Hbase is a Hadoop database.
• Hive provides data warehousing tools to extract, transform and load (ETL) data, and query this data stored in Hadoop files.
• Pig is a high-level language that generates MapReduce code to analyse large data sets.
• Spark is a cluster computing framework.
• ZooKeeper is a centralised configuration service and naming registry for large distributed systems.
• Ambari manages and monitors Hadoop clusters through an intuitive Web UI.
• Avro is a data serialisation system.
• UIMA is the architecture used for the analysis of unstructured data.
• Yarn is a large scale operating system for Big Data applications.
• MapReduce is a software framework for easily writing applications that process vast amounts of data.

Hadoop architecture
Before we examine Hadoop’s components and architecture, let’s review some of the terms that are used in this discussion. A node is simply a computer. It is typically non-enterprise, commodity hardware that contains data. We can keep adding nodes, such as Node 2, Node 3, and so on. A rack is a collection of 30 or 40 nodes that are physically stored close together and are all connected to the same network switch. A Hadoop cluster (or just a ‘cluster’ from now on) is a collection of racks.
Now, let’s examine Hadoop’s architecture—it has two major components.
1. The distributed file system component: The main example of this is the Hadoop distributed file system (HDFS), though other file systems, like IBM Spectrum Scale, are also supported.
2. The MapReduce component: This is a framework for performing calculations on the data in the distributed file system.
HDFS runs on top of the existing file systems on each node in a Hadoop cluster. It is designed to tolerate a high component failure rate through the replication of the data. A file on HDFS is split into multiple blocks, and each is replicated within the Hadoop cluster. A block on HDFS is a blob of data within the underlying file system (see Figure 1).
The Hadoop distributed file system (HDFS) stores the application data and file system metadata separately on dedicated servers. NameNode and DataNode are the two critical components of the HDFS architecture. Application data is stored on servers referred to as DataNodes, and file system metadata is stored on servers referred to as NameNodes. HDFS replicates the file’s contents on multiple DataNodes, based on the replication factor, to ensure the reliability of data. The NameNode and DataNode communicate with each other using TCP based protocols.
The heart of the Hadoop distributed computation platform is the Java-based programming paradigm MapReduce. MapReduce is a special type of directed acyclic graph that can be applied to a wide range of business use cases. The Map


function transforms a piece of data into key-value pairs; then the keys are sorted, where a Reduce function is applied to merge the values (based on the key) into a single output.

Figure 3: Resource Manager and Node Manager

Resource Manager and Node Manager
The Resource Manager and the Node Manager form the data computation framework. The Resource Manager is the ultimate authority that arbitrates resources among all the applications in the system. The Node Manager is the per-machine framework agent that is responsible for containers, monitoring their resource usage (CPU, memory, disk and network), and reporting this data to the Resource Manager/Scheduler.

Why Hadoop?
The problem with a relational database management system (RDBMS) is that it cannot process semi-structured data. It can only work with structured data. The RDBMS architecture with the ER model is unable to deliver fast results with vertical scalability by adding CPU or more storage, and it becomes unreliable if the main server is down. On the other hand, the Hadoop system manages effectively with large-sized structured and unstructured data in different formats such as XML, JSON and text, at high fault tolerance. With clusters of many servers providing horizontal scalability, Hadoop’s performance is superior. It provides faster results from Big Data and unstructured data, and its architecture is based on open source.

What Hadoop can’t do
Hadoop is not suitable for online transaction processing workloads where data is randomly accessed on structured data like a relational database. Also, Hadoop is not suitable for online analytical processing or decision support system workloads, where data is sequentially accessed on structured data like a relational database, to generate reports that provide business intelligence. Nor would Hadoop be optimal for structured data sets that require very nominal latency, like when a website is served up by a MySQL database in a typical LAMP stack—that’s a speed requirement that Hadoop would not serve well.

By: Neetesh Mehrotra
The author works at TCS as a systems engineer. His areas of interest are Java development and automation testing. He can be contacted at mehrotra.neetesh@gmail.com.
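The Map, sort and Reduce phases described above can be simulated in ordinary Python. The records below are made-up sample data; the Reduce step merges each key’s values into a single output (here, the maximum reading per city):

```python
from itertools import groupby

# Raw input records: "city,temperature"
records = ["delhi,31", "chennai,36", "delhi,42", "chennai,33", "delhi,38"]

# Map: transform each record into a (key, value) pair
pairs = [(rec.split(",")[0], int(rec.split(",")[1])) for rec in records]

# Sort: order the pairs by key, so equal keys sit together
pairs.sort(key=lambda kv: kv[0])

# Reduce: merge each key's values into a single output (the maximum)
result = {city: max(t for _, t in group)
          for city, group in groupby(pairs, key=lambda kv: kv[0])}

print(result)  # {'chennai': 36, 'delhi': 42}
```

On a real cluster, the Map and Reduce steps run in parallel across many nodes, with Hadoop performing the sort (shuffle) in between; the data flow, however, is exactly this.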

OSFY Magazine Attractions During 2017-18


MONTH THEME
March 2017 Open Source Firewall, Network security and Monitoring

April 2017 Databases management and Optimisation

May 2017 Open Source Programming (Languages and tools)

June 2017 Open Source and IoT

July 2017 Mobile App Development and Optimisation

August 2017 Docker and Containers

September 2017 Web and desktop app Development

October 2017 Artificial Intelligence, Deep learning and Machine Learning

November 2017 Open Source on Windows

December 2017 BigData, Hadoop, PaaS, SaaS, Iaas and Cloud

January 2018 Data Security, Storage and Backup

February 2018 Best in the world of Open Source (Tools and Services)


Open Source Storage Solutions


You Can Depend On
Storage space is at a premium with petabytes and terabytes of data being generated
almost on a daily basis due to modern day living. Open source storage solutions can
help mitigate the storage problems of individuals as well as small and large scale
enterprises. Let’s take a look at some of the best solutions and what they offer.

We have all been observing a sudden surge in the production of data in the recent past and this will undoubtedly increase in the years ahead. Almost all the applications on our smartphones (like Facebook, Instagram, WhatsApp, Ola, etc) generate data in different forms like text and images, or depend on data to work upon. With around 2.32 billion smartphone users across the globe (as per the latest data from statista.com) having installed multiple applications, it certainly adds up to a really huge amount of data, daily. Apart from this, there are other sources of data as well like different Web applications, sensors and actuators used in IoT devices, process automation plants, etc. All this creates a really big challenge to store such massive amounts of data in a manner that can be used as and when needed.
We all know that our businesses cannot get by without storing our data. Sooner or later, even small businesses need space for data storage—for documents, presentations, e-mails, image graphics, audio files, databases, spreadsheets, etc, which act as the lifeblood for most companies. Besides, many organisations also have some confidential information that must not be leaked or accessed by anyone, in which case, security becomes one of the most important aspects of any data storage solution. In critical healthcare applications, an organisation cannot afford to run out of memory, so data needs to be monitored at each and every second.
Storing different kinds of data and managing its storage is critical to any company’s behind-the-scenes success. When we look for a solution that covers all our storage needs, the possibilities seem quite endless, and many of them are likely to consume our precious IT budgets. This is why we cannot afford to overlook open source data storage solutions. Once you dive into the open source world, you will find a huge array of solutions for almost every problem or purpose, which includes storage as well.

Reasons for the growth in the data storage solutions segment
Let’s check out some of the reasons for this:
1. Various recent government regulations, like Sarbanes-Oxley, ask businesses to maintain and keep a backup of different types of data which they might have otherwise deleted.
2. Many small businesses have now started archiving different e-mail messages, even those dating back five or more years, for various legal reasons.
3. The pervasiveness of spyware and viruses requires backups, and that again requires more storage capacity.
4. There has been a growing need to back up and store large media files, such as video, MP3, etc, and make them available to users on a specific network. This is again generating a demand for large storage solutions.
5. Each newer version of any software application or operating system demands more space and memory than its predecessor, which is another reason driving the demand for large storage solutions.

Different types of storage options
There are different types of storage solutions that can be used based on individual requirements, as listed below.
Flash memory thumb drives: These drives are particularly useful to mobile professionals since they consume little power, are small enough to even fit on a keychain and have almost no moving parts. You can connect any Flash memory thumb drive to your laptop’s Universal Serial Bus (USB) port and back up different files on the system. Some USB thumb drives also provide encryption to protect files in case the drive gets lost or is stolen. Flash memory thumb drives also let us store our Outlook data (like recent e-mails or calendar items), bookmarks from Internet Explorer, and even some desktop applications. That way, you can leave your laptop at home and just plug the USB drive into any borrowed computer to access all your data elsewhere.
External hard drives: An inexpensive and relatively simple way to add more storage is to connect an external hard drive to your computer. External hard disk drives that are directly connected to PCs have several disadvantages, though. Any file stored only on the drive, and nowhere else, still needs to be backed up. Also, if you travel somewhere for work and need access to some of the files on an external drive, you will have to take the drive with you or remember to copy the required files to your laptop’s internal drive, a USB thumb drive, a CD or some other storage medium. Finally, in case of a fire or other catastrophe at your place of business, your data will not be completely protected if it’s stored only on an external hard drive.
Online storage: There are different services which provide remote storage and backup over the Internet. All such services offer businesses a number of benefits. By backing up your most important files to a highly secure remote server, you are actually protecting the data stored at your place of business. You can also easily share large files with your clients, partners or others by providing them with password-protected access to your online storage service, hence eliminating the need to send those large files by e-mail. And in most cases, you can log into your account from any system using a Web browser, which is one of the great ways to retrieve files when you are away from your PC. Remote storage can be a bit slow, especially during an initial backup session, and is only as fast as your network’s access to that storage. For extremely large files, you may require higher speed network access.
Network attached storage: Network attached storage (NAS) provides fast, reliable and simple access to data in any IP networking environment. Such solutions are quite suitable for small or mid-sized businesses that require large volumes of economical storage which can be shared by multiple users over a network. Given that many small businesses lack IT departments, this storage solution is easy to deploy, and can be managed and consolidated centrally. This type of storage solution can be as simple as a single hard drive with an Ethernet port or even built-in Wi-Fi connectivity.
More sophisticated NAS solutions can also provide additional USB as well as FireWire ports, enabling you to connect external hard drives to scale up the overall storage capacity. A NAS storage solution can also offer print-server capabilities, which let multiple users easily share a single printer. A NAS solution may also include multiple hard drives in a Redundant Array of Independent Disks (RAID) Level 1 array. This storage system contains two or more equivalent hard drives (such as two 250GB drives) in a single network-connected device. Files written to the first (main) drive are automatically written to the second drive as well. This kind of automated redundancy in NAS solutions means that if the first hard drive dies, we will still have access to all the applications and files present on the second drive. Such solutions can also help in offloading files being served by other servers on your network, which increases performance.
A NAS system allows you to consolidate storage, hence increasing efficiency and reducing costs. It simplifies storage administration, data backup and recovery, and also allows for easy scaling to meet growing storage needs.

Choosing the right storage solution
There are a number of storage solutions available in the market, which meet diverse requirements. At times, you could get confused while trying to choose the right one. Let’s get rid of that confusion by considering some of the important aspects of a storage solution.
Scalability: This is one of the important factors to be considered while looking for any storage solution. In different distributed storage systems, storage capacity can be added in two ways. The first way involves adding disks


Figure 1: Qualities of NAS solutions (Image source: googleimages.com)

or replacing the existing disks with ones that have higher storage capacity (also called ‘scaling up’). The other method involves adding nodes for ‘scale out’ capacity. Whenever you add hardware, you increase the whole system’s performance as well as its capacity.
Performance: This is what we look for while choosing any storage solution. One cannot afford to compromise on the performance of any storage solution, as this may directly impact the performance of the application that uses the given storage solution. Flexible scalability allows users to increase capacity and performance independently, as per their needs and budget.
Reliability: We all look for resources that can be relied upon for a long period of time, and this is the case even when searching for a storage solution.
Affordability: Since budget and pricing are important, an open source storage solution is a good option because it is available free of cost. This is an important factor for small businesses that cannot afford to spend much just on storage solutions.
Availability: Sometimes, data stored in a storage solution is not available when being fetched by an application. This can occur because of a disk failure. We all want to avoid such circumstances, which may lead to unavailability of data. Data should be easily available when it’s being accessed.
Simplicity: Even the most advanced storage solutions come with management interfaces that are as good as or better than those of traditional storage units. All such interfaces show details about each node, capacity allocation, alerts, overall performance, etc. This is a significant factor to be considered while choosing a storage solution.
Support: Last but not the least, there should be support from the manufacturer or from a group of developers, including support for applications. Support is quite essential if you plan on installing your database, virtual server farm, email or other critical information on the storage solution. You must make sure that the manufacturer offers the level of support you require.

Some of the available open source storage solutions
Here’s a glance at some of the good open source solutions available.
OpenStack: OpenStack is basically a cloud operating system which controls large pools of networking, computation and storage resources throughout a data centre, all of which are managed using a dashboard that gives administrators control while empowering users to provision resources through a Web interface. The OpenStack Object Storage service provides software that stores and retrieves data over HTTP. Objects (also referred to as blobs of data) are stored in an organisational hierarchy which offers anonymous read-only access, ACL defined access, or even temporary access. This type of object storage supports multiple token-based authentication mechanisms implemented via middleware.

Figure 2: Main services and components of OpenStack (Image source: googleimages.com)

Ceph: This is a distributed object storage and file system designed to provide high performance, scalability and reliability. It is built on the Reliable Autonomic Distributed Object Store (RADOS), and allows enterprises to build their own economical storage devices using commodity hardware. It has been maintained by Red Hat since its acquisition of InkTank in April 2014. It is capable of storing blocks, files and objects as well. It is scale-out, which means that multiple Ceph storage nodes are present on a single storage system that easily handles many petabytes of data, and simultaneously increases performance and capacity. Ceph has many of the basic enterprise storage features, which include replication, thin provisioning, snapshots, auto-tiering and self-healing capabilities.

Figure 3: Architecture for the Ceph storage solution (Image source: googleimages.com)

RockStor: This is a free and open source NAS solution. The Personal Cloud Server present in it is a very powerful


local alternative to public cloud storage, which mitigates the cost and risks associated with public cloud storage. This network attached and cloud storage platform is quite suitable for small to medium businesses as well as home users who do not have much IT experience but need to scale up to terabytes of data storage. For users who are interested in Linux and Btrfs, it is a great alternative to FreeNAS. This cloud storage platform can be managed even within a LAN or over the Web using a very simple and intuitive user interface. And with the inclusion of add-ons (named ‘Rockons’), you can extend the feature set to include new applications, servers and services.
Kinetic Open Storage: Backed by companies like Seagate, EMC, Toshiba, Cisco, Red Hat, NetApp, Dell, etc, Kinetic is a Linux Foundation project dedicated to establishing standards for a new kind of object storage architecture. It is designed especially to meet the need for scale-out storage of unstructured data. Kinetic is basically a way for storage applications to communicate directly with storage devices over Ethernet. Most of the storage use cases targeted by Kinetic involve unstructured data like Hadoop, NoSQL and other distributed file systems, as well as object stores in the cloud such as Amazon S3, Basho’s Riak and OpenStack Swift.
Storj DriveShare and MetaDisk: Storj is a new type of cloud storage which is built on peer-to-peer and blockchain technology. It offers decentralised and end-to-end encrypted cloud storage. The DriveShare application allows users to rent out their unused hard drive space so that it can be used by the service. The MetaDisk Web application allows users to save all their files to the service securely. The core protocol handles peer-to-peer negotiation and verification of the storage contracts. Providers of storage are usually referred to as ‘farmers’ and those using the storage are called ‘renters’. Renters can periodically audit whether the farmers are still keeping their files secure and safe. Conversely, farmers can decide to stop storing a specific file if its owners do not pay and audit their services on time. Files are cut up into smaller pieces called ‘shards’ and are then stored three times redundantly, by default. The network can automatically determine a new farmer and move data if copies become unavailable. The system puts different measures in place to prevent renters and farmers from cheating each other—for instance, by manipulating the auditing process. Storj offers several advantages over many traditional cloud based storage solutions. As the data is encrypted and cut into shards at the source, there is almost no chance for any unauthorised third party to access it. And because data storage is distributed, availability and download speed increase.

Figure 4: Ten year data centre revenue forecast (Image source: googleimages.com)

By: Vivek Ratan
The author has completed his B.Tech in electronics and instrumentation engineering. He is currently working as an automation test engineer at Infosys, Pune and as a freelance educator at LearnerKul, Pune. He can be reached at ratanvivek14@gmail.com.
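The shard-and-audit scheme described for Storj above can be mimicked in miniature with standard tools: cut a file into shards, keep a manifest of per-shard SHA-256 hashes to audit against, and reassemble on demand. This sketch leaves out the encryption and peer-to-peer distribution that Storj adds; the file name and the 16-byte shard size are arbitrary choices for illustration:

```shell
# A file is cut into 'shards'; a checksum manifest lets the owner audit them
printf 'files are cut up into smaller pieces called shards' > file.bin

split -b 16 file.bin shard.            # fixed-size shards: shard.aa, shard.ab, ...
sha256sum shard.* > manifest.sha256    # what a renter audits farmers against

sha256sum -c manifest.sha256           # the audit: every shard must check out
cat shard.* > rebuilt.bin              # shards concatenate back in name order
cmp file.bin rebuilt.bin && echo "file recovered intact"
```

If a farmer returned a tampered shard, its line in the `sha256sum -c` audit would fail, which is the property the renter-side audits rely on.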


Build Your Own Cloud Storage System Using OSS
The real threats to stored data are breaches which, of late, have been affecting
many cloud service providers. Security vulnerabilities that enable breaches result
in a loss of millions of user credentials. In this article, we explore the prospects of
setting up a personal data store or even a private cloud.

The European Organisation for Nuclear Research (CERN), a research collaboration of over 20 countries, has a unique problem—it has way more data than it is possible to store! We’re talking about petabytes of data per year, where one petabyte equals a million gigabytes. There are entire departments of scientists working on a subject termed DAQ (data acquisition and filtering), simply to filter out 95 per cent of the experiment-generated data and store only the useful 5 per cent. In fact, it has been estimated that data in the digital universe will amount to 40 zettabytes by 2020, which is about 5,000 gigabytes of data per person.
With the recent spate of breaches affecting cloud service providers, setting up a personal data store or even a private cloud becomes an attractive prospect.
Data storage infrastructure is broadly classified into object-based storage, block storage and file systems, each with its own set of features.

Object-based storage
This construct manages data as objects instead of treating it as a hierarchy of files or blocks. Each object is associated with a unique identifier and comprises not only the data but also, in some cases, the metadata. This storage pattern seeks to enable capabilities such as application programming interfaces and data management features such as replication at object scale. It is often used to allow for the retention of massive amounts of data. Examples include the storage of photos, songs and files on a massive scale by Facebook, Spotify and Dropbox, respectively.
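As a toy illustration of this model (data fetched by a unique identifier from a flat pool, rather than by a path in a hierarchy), the sketch below stores each blob under the SHA-256 hash of its content. The helper names are invented for this sketch and are not any real product’s API:

```shell
# Toy object store: objects live in one flat pool, addressed by a unique
# identifier (here, the SHA-256 of the content) instead of a file path.
mkdir -p objects

put_object() {   # usage: put_object <file>; prints the object's identifier
    id=$(sha256sum "$1" | cut -d' ' -f1)
    cp "$1" "objects/$id"
    echo "$id"
}

get_object() {   # usage: get_object <id>; writes the object's bytes to stdout
    cat "objects/$1"
}

echo "a photo, a song, anything" > blob.txt
ID=$(put_object blob.txt)
get_object "$ID"    # prints: a photo, a song, anything
```

Real object stores such as OpenStack Swift or Ceph’s RADOS add metadata, replication and HTTP APIs on top of this basic put/get-by-identifier idea.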


Figure 1: Hacked credentials
Figure 2: Object storage, file systems and block storage [Source: ubuntu.com]
Figure 3: Selecting the boot partition [Source: freenas.org]
Figure 4: FreeNAS GUI [Source: freenas.org]

Block storage
Data is stored as a sequence of bytes, termed a physical record. This so-called ‘block’ of data comprises a whole number of records. The process of putting data into blocks is termed blocking, while the reverse is called deblocking. Blocking is widely employed when storing data on certain types of magnetic tape, in Flash memory and on rotating media.

File systems
These data storage structures follow a hierarchy, which controls how data is stored and retrieved. In the absence of a file system, information would simply be a large body of data with no way to isolate individual pieces of information from the whole. A file system encapsulates the complete set of rules and logic used to manage sets of data. File systems can be used on a variety of storage media, most commonly hard disk drives (HDDs), magnetic tapes and optical discs.

Building open source storage

Software
Network Attached Storage (NAS) provides a stable and widely employed alternative for data storage and sharing across a network. It provides a centralised repository of data that can be accessed by different members within the organisation. Variations include complete software and hardware packages serving as out-of-the-box alternatives. These include software and file systems such as Gluster, Ceph, NAS4Free, FreeNAS and others. As an example, we will look into the general steps involved in deploying such a system by taking the case of a popular representative of the set.

FreeNAS
With enterprise-grade features, richly supported plugins and an enterprise-ready ZFS file system, it is easy to see why FreeNAS is one of the most popular operating systems in the market for data storage.
Let’s take a deeper look at file systems, since they are widely used in setting up storage networks today. Building your own data storage using FreeNAS involves a few simple steps:
1. You will need to download the disk image suitable for your architecture and burn it onto either a USB stick or a CD-ROM, as per your preference.
2. Since you will be booting your new disk or machine with FreeNAS, you will need to open the BIOS settings on booting it, and set the boot preference to USB so that your system first tries to boot from the USB and, if not found, then from other attached media.
3. Once you have created the storage media with the required software, you can boot up your system and install FreeNAS in the designated partition.
4. Having set the root password, when you boot into it after installation, you will have the option of using the Web GUI to log into the system. For some users, it might be much more intuitive to use this option as compared to the console-based login.
5. Using the GUI or console, you can configure and manage


your storage options, depending on your application(s).

Figure 5: Configuring storage options [Source: freenas.org]

Private cloud storage
Another recent trend is cloud storage, given the sudden reduction in free cloud storage offered by providers like Microsoft and Dropbox. Public clouds have multi-tenancy infrastructure and allow for great scalability and flexibility, abstracting away the complexities associated with deploying and maintaining hardware. For instance, the creators of Gluster recently came out with an open source project called Minio to provide this functionality to users. One of the services we will look at is ownCloud, a Dropbox alternative that offers similar functionality, along with the advantage of being open source.

Figure 6: Setting up a private cloud [Source: owncloud.org]

1. In order to build a private cloud, you require a server running an operating system such as Linux or Windows. ownCloud allows clients to be installed on such a Linux server.
2. While installing and running an Apache server on Linux, the upload_max_filesize and post_max_size limits need to be updated to higher values than the default (2MB).
3. The system is required to have MySQL, PHP (5.4+), Apache, GD and cURL installed before proceeding with the ownCloud installation. Further, a database must be created, with privileges granted to a new user.
4. Once the system is set up, proceed with downloading the ownCloud files and extract them to /var/www/ownCloud.
5. Change the Apache virtual host to point to this ownCloud directory by modifying the document root in /etc/apache2/sites-available/000-default.conf to /var/www/ownCloud.
6. Finally, type the IP address of the server into your browser and you should arrive at the login screen.

Figure 7: Editing the document root in the configuration files [Source: ittutorials.net]
Image 8: Final configuration for ownCloud [Source: ittutorials.net]

While there are trade-offs between cloud-based storage and traditional means of storage, the former is a highly flexible, simplified and secure model of data storage. And with providers offering more control over deployments, private clouds may well be the main file storage option in the near future!

By: Swapneel Mehta
The author has worked with Microsoft Research, CERN and startups in AI and cyber security. An open source enthusiast, he enjoys spending his time organising software development workshops for school and college students. You can connect with him at https://www.linkedin.com/in/swapneelm and find out more at https://github.com/SwapneelM.
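Steps 2 and 5 of the ownCloud setup above come down to two small configuration edits. The fragments below are a sketch assuming Debian/Ubuntu default paths; note that the stock PHP directive pair is upload_max_filesize and post_max_size, and that the php.ini path varies with the PHP version installed:

```
; /etc/php/7.0/apache2/php.ini  (path varies with the PHP version)
upload_max_filesize = 512M
post_max_size = 512M

# /etc/apache2/sites-available/000-default.conf
<VirtualHost *:80>
    DocumentRoot /var/www/ownCloud
</VirtualHost>
```

Reload Apache after both edits (for example, sudo service apache2 reload) so that the new upload limits and document root take effect.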


A Quick Look at Cloonix, the Network Simulator
Cloonix is a Linux router and host simulation platform. It fully encapsulates
applications, hosts and the network. Simulators like Cloonix offer students
and researchers scope for research into various Internet technologies like the
Domain Name System (DNS).

Cloonix is a network simulator based on KVM or UML. It is basically a Linux router and host simulation platform. You can simulate a network with multiple reconfigurable VMs on a single PC. The VMs may be different Linux distributions. You can also monitor the network’s activities through Wireshark. Cloonix can be installed on Arch, CentOS, Debian, Fedora, OpenSUSE and their derivative distros.
The main features of Cloonix are:
• GUI based NS tool
• KVM based VMs
• VMs and clients are Linux based
• Spice server as the front-end for VMs
• Network activity monitoring by Wireshark
The system requirements are:
• 32/64-bit Linux OS (tested on Ubuntu 16.04 64-bit)
• Wireshark
• Cloonix package: http://cloonix.fr/source_stored/cloonix-37-01.tar.gz
• VM images: http://cloonix.fr/bulk_stored/
To set it up, download the Cloonix package and extract it. I am assuming that Cloonix is extracted in the $HOME directory.
The directory structure of Cloonix is as follows:

cloonix
├── allclean
├── build
├── cloonix
│   ├── client
│   ├── cloonix_cli
│   ├── cloonix_config
│   ├── cloonix_gui
│   ├── cloonix_net
│   ├── cloonix_ocp
│   ├── cloonix_osh
│   ├── cloonix_scp
│   ├── cloonix_ssh
│   ├── cloonix_zor
│   ├── common
│   ├── id_rsa
│   ├── id_rsa.pub
│   ├── LICENCE
│   └── server
├── doitall
├── install_cloonix
├── install_depends
├── pack
└── README


5 directories, 19 files

To install Cloonix, run the following commands, which will install all the packages required, except Wireshark:

$cd $HOME/cloonix
$sudo ./install_depends build

The following command will install and configure Cloonix on your system:

$sudo ./doinstall

The command given below will install Wireshark:

$sudo apt-get install wireshark

You have to download the VMs into $HOME/cloonix_data/bulk, as shown below:

bulk
├── batman.qcow2
├── bind9.qcow2
├── centos-7.qcow2
├── coreos.qcow2
├── ethereum.qcow2
├── jessie.qcow2
├── mpls.qcow2
├── stretch.qcow2
└── zesty.qcow2

To simulate networks, you can download the ready-to-demo scripts available at http://cloonix.fr/demo_stored/v-37-01/cloonix_demo_all.tar.gz. Shown below are the demo scripts available:

├── batman
├── cisco
├── dns
├── dyn_dns
├── eap_802_1x
├── ethereum
├── fwmark2mpls
├── mpls
├── mplsflow
├── netem
├── ntp
├── olsr
├── openvswitch
├── ospf
├── ping
├── strongswan
└── unix2inet

To run any demo, for instance the ping demo, just go to the ping directory and run the following code:

$./ping.sh

This will create all the required VM(s) and network components. You can also monitor traffic by using Wireshark.
Cloonix is a good tool to run network simulations. All the VMs are basically Linux VMs, which you can easily reconfigure.

Figure 1: Ping simulation demo
Figure 2: Dynamic DNS demo

By: Kousik Maiti
The author is a senior technical officer at ICT & Services, CDAC, Kolkata. He has over ten years of industry experience, and his areas of interest include Linux system administration, mobile forensics and Big Data. He can be reached at kousikster@gmail.com.


The Best Tools for Backing Up Enterprise Data
To prevent a disastrous loss of data, regular backups are not only recommended but are de rigueur. From the many open source tools available in the market for this purpose, this article helps systems administrators decide which one is best for their systems.

Before discussing the need for backup software, some knowledge of the brief history of storage is recommended. In 1953, IBM recognised the importance and immediate application of what it called the ‘random access file’. The company then went on to describe this as having high capacity with rapid random access to files. This led to the invention of what subsequently became the hard disk drive; IBM’s San Jose, California laboratory invented the HDD. This disk drive created a new level in the computer data hierarchy, then termed random access storage but today known as secondary storage.
The commercial use of hard disk drives began in 1957, with the shipment of an IBM 305 RAMAC system including IBM Model 350 disk storage, for which US Patent No. 3,503,060 was issued on March 24, 1970.
The year 2016 marked the 60th anniversary of the venerable hard disk drive (HDD). Nowadays, new computers are increasingly adopting SSDs (solid-state drives) for main storage, but HDDs still remain the champions of low cost and very high capacity data storage.
The cost per GB of data has come down significantly over the years because of a number of innovations and advanced techniques developed in manufacturing HDDs. The graph in Figure 1 gives a glimpse of this.
The general assumption is that this cost will reduce further. Now, since storing data is not at all costly compared to what it was in the 1970s and ’80s, why should one take a backup of data when it is so cheap to buy new storage? What are the advantages of having a backup of data?
Today, we are generating a lot of data by using various gadgets like mobiles, tablets, laptops, handheld computers, servers, etc. When we exceed the allowed storage capacity of these devices, we tend to push this data to the cloud or take a backup to avoid any future disastrous events. Many


Figure 1: Hard drive costs per GB of data (Source: http://www.mkomo.com/cost-per-gigabyte)
Figure 2: Ceph adoption rate (Source: https://sanenthusiast.com/top-5-storage-data-center-tech-predictions-2016/)

corporates and enterprise level customers are generating huge volumes of data, and having backups is critical for them.
Backing up data is very important. After taking a backup, we also have to make sure that this data is secure and manageable, and that the data’s integrity is not compromised. Keeping these aspects in mind, many open source backup software packages have been developed over the years.
Data backup comes in different flavours: individual files and folders, whole drives or partitions, or full system backups. Nowadays, we also have the ‘smart’ method, which automatically backs up files in commonly used locations (syncing), and we have the option of using cloud storage. Backups can be scheduled, running as incremental, differential or full backups, as required.
For organisations and large enterprises that are planning on selecting backup software tools and technologies, this article reviews the best open source tools. Before choosing the best software or tool, users should evaluate the features provided, with reference to stability and open source community support.
Advanced open source storage software like Ceph, Gluster, ZFS and Lustre can be integrated with some of the popular backup tools like Bareos, Bacula, AMANDA and CloneZilla; each of these is described in detail in the following sections.

Ceph
Ceph is one of the leading choices in open source software for storage and backup. Ceph provides object storage, block storage and file system storage features. It is very popular because of its CRUSH algorithm, which liberates storage clusters from the scalability and performance limitations imposed by centralised data table mapping. Ceph eliminates many tedious tasks for administrators by replicating and rebalancing data within the cluster, and delivers high performance and infinite scalability.
Ceph also has RADOS (the reliable autonomic distributed object store), which provides the object, block and file system storage described earlier in a single unified storage cluster. The Ceph RBD backup script, ceph_rbd_bck.sh (v0.1.1), creates the backup solution for Ceph. This script helps in backing up Ceph pools. It was developed keeping in mind the backing up of specified storage pools, and not only individual images; it also allows retention of dates and implements a synthetic full backup schedule if needed.
Many organisations are now moving towards large scale object storage and take backups regularly. Ceph is the ultimate solution, as it provides object storage management along with state-of-the-art backup. It also provides integration into private cloud solutions like OpenStack, which helps in managing backups of data in the cloud.
The Ceph script can also archive data, remove all the old files and purge all snapshots. This triggers the creation of a new, full and initial snapshot.
OpenStack has a built-in Ceph backup driver, which is an intelligent solution for VM volume backup and maintenance. This helps in taking regular and incremental backups of volumes to maintain consistency of data. Along with Ceph backup, one can use a tool called CloudBerry for versatile control over Ceph based backup and recovery mechanisms.
Ceph also has good support from the community and from large organisations, many of which have adopted it for storage and backup management and in turn contribute back to the community.
A lot of development and enhancement is happening on a continuous basis with Ceph. A number of research organisations have predicted that Ceph’s adoption rate will increase in the future. Ceph also has certain cost advantages in comparison with other software products.
More information about the Ceph RBD script can be found at http://obsidiancreeper.com/2017/04/03/Updated-Ceph-Backup/.

Gluster
Red Hat’s Gluster is another open source, software-defined, scale-out backup and storage solution, also called RHGS (Red Hat Gluster Storage). It helps in managing unstructured data for physical, virtual and cloud environments. The advantages of Gluster are its cost effectiveness and highly available storage that does not compromise on scale or performance.


Figure 3: Gluster Storage cost effectiveness (Source: IDC, 2016; https://redhatstorage.redhat.com/2016/11/03/idc-the-economics-of-software-defined-storage/)

RGHS has a great feature called ‘snapshotting’, Figure 4: AMANDA architecture


which helps in taking ‘point-in-time’ copies of Red Hat
Gluster Storage server volumes. This helps administrators
in easily reverting back to previous states of data in case
of any mishap.
Some of the benefits of the snapshot feature are:
ƒ Allows file and volume restoration with a point-in-time
copy of Red Hat Gluster Storage volume(s)
ƒ Has little to no impact on the user or applications,
regardless of the size of the volume when
snapshots are taken
ƒ Supports up to 256 snapshots per volume, providing
flexibility in data backup to meet production environment
recovery point objectives
ƒ Creates a read-only volume that is a point-in-time copy of
the original volume, which users can use to recover files
ƒ Allows administrators to create scripts to take snapshots of a supported number of volumes in a scheduled fashion

Figure 5: Bareos architecture
ƒ Provides a restore feature that helps the administrator return to any previous point-in-time copy
ƒ Allows the instant creation of a clone or a writable snapshot, which is a space-efficient clone that shares the back-end logical volume manager (LVM) with the snapshot
BareOS configured on GlusterFS has the advantage of being able to take incremental backups. One can create a ‘glusterfind’ session to remember the time when it was last synched or when processing was completed. For example, your backup application (BareOS) can run every day and get incremental results at each run.
More details on the RHGS snapshot feature can be found at https://www.redhat.com/cms/managed-files/st-gluster-storage-snapshot-technology-overview-inc0407879-201606-en.pdf.

The best open source backup software tools

AMANDA open source backup software
Amanda or Advanced Maryland Automatic Network Disk Archive (https://amanda.zmanda.com/) is a popular, enterprise grade open source backup and recovery software. According to the disclosure made by AMANDA, it runs on servers and desktop systems running Linux, UNIX, BSD, Mac OS X and MS Windows.
AMANDA comes as both an enterprise edition and an open source edition (though the latter may need some customisation). The latest version of the AMANDA Enterprise edition is release 3.3.5.
It is one of the key backup software tools implemented in government, database, healthcare and cloud based organisations across the globe.
AMANDA has a number of good features to tackle explosive data growth and provide high data availability. It helps in replacing complex and expensive backup and recovery software products.
Some of its advantages and features are:
ƒ Centralised management for heterogeneous environments (involving multiple OSs and platforms)
ƒ Powerful protection with simple administration
ƒ Wide platform and application support


ƒ Industry standard open source support and data formats
ƒ Low cost of ownership

Bareos (Backup Archiving Recovery Open Sourced)
Bareos offers high data security and reliability along with cross-network open source software for backups. Now being actively developed, it emerged from the Bacula Project in 2010.
Bareos supports Linux/UNIX, Mac and Windows based OS platforms, along with both a Web GUI and a CLI.

Clonezilla
Clonezilla is a partition and disk imaging/cloning program. It is similar to many variants available in the market like Norton Ghost and True Image. It has features like bare metal backup recovery, and supports massive cloning with high efficiency in multi-cluster node environments.
Clonezilla comes in two variants: Clonezilla Live and Clonezilla SE (Server Edition). Clonezilla Live is suitable for single machine backup and restore, and Clonezilla SE for massive deployment. The latter can clone many (40 plus) computers simultaneously.

Duplicati
Designed to be used in a cloud computing environment, Duplicati is a client application for creating encrypted, incremental, compressed backups to be stored on a server. It works with public clouds like Amazon, Google Drive and Rackspace, as well as private clouds and networked file servers. Operating systems that it is compatible with include Windows, Linux and Mac OS X.

FOG
Like Clonezilla, FOG is a disk imaging and cloning tool that can aid with both backup and deployment. It’s easy to use, supports networks of all sizes, and includes other features like virus scanning, memory testing, disk wiping, disk testing and file recovery. Operating systems compatible with it include Linux and Windows.

References
[1] To know more about the history of HDDs: https://www.pcworld.com/article/127105/article.html
[2] http://clonezilla.org/
[3] https://amanda.zmanda.com/amanda-enterprise-edition.html
[4] http://ceph.com/ceph-storage/
[5] http://www.mkomo.com/cost-per-gigabyte
[6] https://en.wikipedia.org/wiki/History_of_hard_disk_drives

By: Shashidhar Soppin
The author is a senior architect with 16+ years of experience in the IT industry, and has expertise in virtualisation, cloud, Docker, open source, ML, deep learning and OpenStack. He is part of the PES team at Wipro. You can contact him at shashi.soppin@gmail.com.


How to Identify Fumbling to Keep a Network Secure
Repeated systematic failed attempts by a host to access resources like a URL, an IP address or an email address are known as fumbling. Erroneous attempts to access resources by legitimate users must not be confused with fumbling. Let’s look at how we can design an effective system for identifying network fumbling, to help keep our networks secure.

Network security implementation mainly depends on exploratory data analysis (EDA) and visualisation. EDA provides a mechanism to examine a data set without preconceived assumptions about the data and its behaviour. The behaviour of the Internet and of attackers is dynamic, and EDA is a continuous process to help identify all the phenomena that are cause for an alarm, and to help detect anomalies in access to resources.
Fumbling is a general term for repeated systematic failed attempts by a host to access resources. For example, legitimate users of a service should have a valid email ID or user identification. So if there are numerous attempts by a user from a different location to target the users of this service with different email identifications, then there is a chance that this is an attack from that location. From the data analysis point of view, we say a fumbling condition has happened. This indicates that the user does not have access to that system and is exploring different possibilities to break the security of the target. It is the task of the security personnel to identify the pattern of the attack and the mistakes committed, to differentiate them from innocent errors. Let’s now discuss a few examples to identify a fumbling condition.
In a nutshell, fumbling is a type of Internet attack, which is characterised by failing to connect to one location with a systematic attack from one or more locations. After a brief discussion of this type of network intrusion, let’s consider a problem of network data analysis using R, which is a good choice as it provides powerful statistical data analysis tools together with a graphical visualisation opportunity for a better understanding of the data.

Fumbling of the network and services
In case of TCP fumbling, a host fails to reach a target port of a host, whereas in the case of HTTP fumbling, hackers fail to access a target URL. All fumbling is not a network attack, but most suspicious attacks appear as fumbling.


The most common reason for fumbling is lookup failure, which happens mainly due to misaddressing, the movement of the host, or the non-existence of a resource. Other than this, an automated search of destination targets, and scanning of addresses and their ports, are possible causes of fumbling. Sometimes, to search a target host, automated measures are taken to check whether the target is up and running. These types of failed attempts are generally mistaken for network attacks, though lookup failure happens either due to misconfiguration of DNS, a faulty redirection on the Web server, or email with a wrong URL. Similarly, SMTP communication uses an automated network traffic control scheme for its destination address search.
The most serious cause of fumbling is repeated scanning by attackers. Attackers scan the entire address-port combination matrix either in vertical or in horizontal directions. Generally, attackers explore horizontally, as they are most interested in exploring potential vulnerabilities. Vertical search is basically a defensive approach to identify an attack on an open port address. As an alternative to scanning, at times attackers use a hit-list to explore a vulnerable system. For example, to identify an SSH host, attackers may use a blind scan and then start a password attack.

Identifying fumbling
Identifying malicious fumbling is not a trivial task, as it requires demarcating innocuous fumbling from the malevolent kind. Primarily, the task of assessing failed accesses to a resource is to identify whether the failure is consistent or transient. To explore TCP fumbling, look into all TCP communication flags, payload size and packet count. In TCP communication, the client sends an ACK flag only after receiving the SYN+ACK signal from the server. If there is no ACK after a SYN from the server, then that indicates a fumbling. Another possible way to locate a malicious attack is to count the number of packets of a flow. A legitimate TCP flow requires at least three packets of overhead before it considers transmitting data. Most retries require three to five packets, and TCP flows having five packets or less are likely to be fumbles.
Since, during a failed connection, the host sends the same SYN packet options repeatedly, a ratio of packet size to packet number is also a good measure for identifying TCP flow fumbling.
ICMP informs a user about why a connection failed. It is also possible to look into the ICMP response traffic to identify fumbling. If there is a sudden spike in messages originating from a router, then there is a good chance that an attacker is probing the router’s network. A proper forensic investigation can identify a possible attacking host.
Since UDP does not follow a strict communication protocol like TCP, the easiest way to identify UDP fumbling is by exploring network mapping and ICMP traffic.
Identifying service level fumbling is comparatively easier than communication level fumbling, as in most cases exhaustive logs record each access and malfunction. For example, HTTP returns three-digit 4xx status codes for every client-side error. Among the different codes, 404 and 401 are the most common, for unavailability of resources and unauthorised access, respectively. Most 404 errors are innocuous, as they occur due to misconfiguration of the URL or the internal vulnerabilities of different services of the HTTP server. But if it is a 404 scan, then it may be malicious traffic, and there may be a chance that attackers are trying to guess the object in order to reach the vulnerable target. Web server authentication is rarely used by modern Web servers; in case of discovering any log entry with a 401 error, proper steps should be taken to remove the source from the server.
Another common service level vulnerability comes from the mail service protocol, SMTP. When a host sends a mail to a non-existent address, the server either rejects the mail or bounces it back to the source. Sometimes it also directs the mail to a catch-all account. In all these three cases, the routing SMTP server keeps a record of the mail delivery status. But the main hurdle in identifying SMTP fumbling comes from spam. It’s hard to differentiate SMTP fumbling from spam, as spammers send mail to every conceivable address. SMTP fumblers also send mails to target addresses to verify whether an address exists, for possible scouting out of the target.

Designing a fumbling identification system
From the above discussion, it is apparent that identifying fumbling is more subjective than objective. Designing a fumbling identification and alarm system requires in-depth knowledge of the network and its traffic pattern. There are several network tools, but here we will cover some basic system utilities so that readers can explore the infinite possibilities of designing network intrusion detection and prevention systems of their own.
In order to separate malicious from innocuous fumbling, the analyst should mark the targets to determine whether the attackers are reaching the goal and exploring the target. This step reduces the bulk of data to a manageable state and makes the task easier. After fixing the target, it is necessary to examine the traffic to study the failure pattern. If it is TCP fumbling, as mentioned earlier, this can be detected by finding traffic without the ACK flag. In case of HTTP scanning, examination of the HTTP server log for 404 or 401 entries is done to find the malicious fumbling. Similarly, the SMTP server log helps us find doubtful emails to identify the attacking hosts.
If scouting happens in a dark space of a network, then the chance of malicious attack is high. Similarly, if a scanner scans more than one port in a given time frame, the chance of intrusion is high. A malicious attack can be confirmed by examining the conversation between the attacker and the target. Suspicious conversations can be subsequent transfers of files or communication using odd ports.

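The two TCP heuristics described above, a client that never sends an ACK and a flow of five packets or fewer, can be sketched as a small classifier over flow records. The record fields and the sample flows below are illustrative assumptions, not a real flow-export format:

```python
def is_likely_fumble(flow):
    """Apply the heuristics from the text: a flow whose client never
    sent an ACK (the handshake never completed), or a flow carrying
    five packets or fewer, is a likely fumble."""
    no_ack = "A" not in flow["flags"]   # 'A' marks an ACK seen from the client
    too_short = flow["packets"] <= 5    # retries typically run to 3-5 packets
    return no_ack or too_short

flows = [
    {"src": "10.0.0.5", "flags": "S",  "packets": 3},   # repeated SYNs only
    {"src": "10.0.0.9", "flags": "SA", "packets": 42},  # established session
]
fumbles = [f["src"] for f in flows if is_likely_fumble(f)]
print(fumbles)  # ['10.0.0.5']
```

In practice such a check would run over flow records exported by a collector, with thresholds tuned to the local network.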

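The 404-scanning pattern described above can likewise be prototyped over ordinary access-log records: a source probing many different non-existent URLs looks like a scanner, while a single broken link hit repeatedly does not. The (ip, url, status) tuples and the distinct-URL threshold are illustrative:

```python
from collections import defaultdict

def find_404_scanners(log_entries, threshold=3):
    """Return source IPs that received 404s for at least `threshold`
    distinct URLs, a crude signature of object-guessing scans."""
    misses = defaultdict(set)
    for ip, url, status in log_entries:
        if status == 404:
            misses[ip].add(url)
    return {ip for ip, urls in misses.items() if len(urls) >= threshold}

log = [
    ("203.0.113.7", "/admin.php", 404),
    ("203.0.113.7", "/wp-login.php", 404),
    ("203.0.113.7", "/phpmyadmin/", 404),
    ("198.51.100.2", "/old-page.html", 404),  # one stale bookmark
    ("198.51.100.2", "/old-page.html", 404),
]
print(find_404_scanners(log))  # {'203.0.113.7'}
```

The same structure applies to 401 entries, with a lower threshold, since failed authentication attempts deserve quicker attention.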

Some statistical techniques are also available to find the expected number of hosts of a target network that would be explored by a user, or to compute the likelihood of a fumbling attack test that could either pass or fail.

Capturing TCP flags
In a UNIX environment, the de facto packet-capturing tool is tcpdump. It is powerful as well as flexible. As a UNIX tool, a powerful shell script is also applicable over the outputs of tcpdump and can produce a filtered report as desired. The underlying packet-capturing library of tcpdump is libpcap, and it provides the source, destination, IP address, port and IP protocol over the target network interface for each network protocol. For example, to capture TCP SYN packets over the eth0 interface, we can use the following command:

$ tcpdump -i eth0 "tcp[tcpflags] & (tcp-syn) != 0" -nn -v

Similarly, TCP ACK packets can be captured by issuing the command given below:

$ tcpdump -i eth0 "tcp[tcpflags] & (tcp-ack) != 0" -nn -v

To have a combined capture report of SYN and ACK, both the flags can be combined as follows:

$ tcpdump -i eth0 "tcp[tcpflags] & (tcp-syn|tcp-ack) != 0" -nn -v

Getting network information
In this regard, netstat is a useful tool to get network connections, routing tables, interface statistics, masquerade connections and multicast memberships. It provides a detailed view of the network to diagnose network problems. In our case, we can use this to identify ports that are listening. For example, to know about listening TCP services such as HTTP and HTTPS, we can use the following command with the -l (report listening sockets), -p (report the PID and name of the owning program) and -t (TCP only) options:

$ netstat -tlp

Data analysis
Now, let’s discuss a network data analysis example on netstat command outcomes. This will help you to understand the network traffic to carry out intrusion detection and prevention. Let’s say we have a csv file from the netstat command, as shown below:

> rfi <- read.csv(“rficsv.csv”,header=TRUE, sep=”,”)

…where the dimension, columns and object class are:

> dim(rfi)
[1] 302 11

> names(rfi)
[1] “ccadmin” “pts.0” “X” “ipaddress” “Mon” “Oct”
[7] “X30” “X17.25” “X.1” “still” “logged.in”
> class(rfi)
[1] “data.frame”

To make the relevant column headings meaningful, the first and fourth column headings are changed to:

> colnames(rfi)[1]=’user’
> colnames(rfi)[4]=’ipaddress’

If we consider a few selective columns of the data frame, as shown here:

> c = c(colnames(rfi)[1],colnames(rfi)[2],colnames(rfi)[4],colnames(rfi)[5],colnames(rfi)[6],colnames(rfi)[7],colnames(rfi)[8])

…then the first ten rows can be displayed to have a view of the table structure, as shown in Figure 1.

Figure 1: Histogram of IP addresses of netstat (x-axis: IP Address; y-axis: Frequency)

> x = rfi[, c, drop=F]
> head(x,10)
   user     pts.0   ipaddress        Mon  Oct  X30  X17.25
1  root     pts/1   172.16.7.226     Mon  Oct  30   12:48
2  ccadmin  pts/0   172.16.5.230     Mon  Oct  30   12:30
3  ccadmin  pts/0   172.16.5.230     Wed  Oct  25   10:22
4  root     pts/1   172.16.7.226     Tue  Oct  24   11:54
5  ccadmin  pts/0   172.16.5.230     Tue  Oct  24   11:53
6  (unknown :0      :0               Thu  Oct  12   12:57
7  root     pts/0   :0               Thu  Oct  12   12:57


8  root     :0      :0               Wed  Oct  11   12:56
9  (unknown :0      :0               Wed  Oct  11   12:55
10 reboot   system  3.10.0-123.el7.x Thu  Oct  12   12:37

The data shows that the data frame is not in a uniform table format and fields of records are separated by a tab character. This requires some amount of filtering of data in the table to extract relevant rows for further processing. Since I will be demonstrating the distribution of IP addresses within a system, only the IP address and other related fields are kept for histogram plotting.
To have a statistical evaluation of this data, it is worth removing all the irrelevant fields from the data frame:

drops = c(colnames(rfi)[2],colnames(rfi)[3],colnames(rfi)[5],colnames(rfi)[6],colnames(rfi)[7],colnames(rfi)[8],colnames(rfi)[9],colnames(rfi)[10],colnames(rfi)[11])
d = rfi[ , !(names(rfi) %in% drops)]

Then, for simplicity, extract all IP addresses attached to the user ‘ccadmin’ which start with ‘172’:

u = d[like(d$user,’ccadmin’) & like(d$ipaddress,’172’),]

Now the data is ready for analysis. The R summary command will show the count of elements of each field, whereas the count command will show the frequency distribution of the IP address, as shown below:

> summary(u)
      user          ipaddress
 ccadmin :34   172.16.6.252:21
         : 0   172.16.7.155: 6
 (unknown: 0   172.16.5.230: 3
 backup_u: 0   172.16.11.95: 2
 reboot  : 0   172.16.4.66 : 1
 root    : 0   172.16.5.132: 1
 (Other) : 0   (Other)     : 0

and

> count(u)
  user    ipaddress    freq
1 ccadmin 172.16.11.95  2
2 ccadmin 172.16.4.66   1
3 ccadmin 172.16.5.132  1
4 ccadmin 172.16.5.230  3
5 ccadmin 172.16.6.252 21
6 ccadmin 172.16.7.155  6

For better visualisation, this frequency distribution of the IP address can be depicted using a histogram, as follows:

qplot(u$ipaddress, main=’IP histogram’, xlab=’IP Address’, ylab=’Frequency’)

By: Dipankar Ray
The author is a member of IEEE and IET, and has more than 20 years of experience in open source versions of UNIX operating systems and Sun Solaris. He is presently working on data analysis and machine learning using a neural network as well as on different statistical tools. He has also jointly authored a textbook called ‘MATLAB for Engineering and Science’. He can be reached at dipankarray@ieee.org.

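As a footnote to the analysis above, the per-IP frequency count produced by summary() and count() can be reproduced in plain Python with a Counter; the records below are hypothetical stand-ins for the parsed login data:

```python
from collections import Counter

records = [
    ("ccadmin", "172.16.6.252"), ("ccadmin", "172.16.6.252"),
    ("ccadmin", "172.16.7.155"), ("ccadmin", "172.16.5.230"),
    ("root", "172.16.7.226"),  # filtered out below, as in the R example
]

# Keep only ccadmin logins from 172.* addresses, mirroring the R filter
freq = Counter(ip for user, ip in records
               if user == "ccadmin" and ip.startswith("172"))
for ip, n in freq.most_common():
    print(ip, n)
```

A sudden change in this distribution, such as one address dominating the counts, is exactly the kind of anomaly the histogram makes visible.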


Use These Python Based Tools for Secured Backup and Recovery of Data
Python, the versatile programming environment, has a
variety of uses. This article will familiarise the reader with
a few Python based tools that can be used for secured
backup and recovery of data.

We keep data on portable hard disks, memory cards, USB Flash drives or other such similar media. Ensuring the long term preservation of this data with timely backups is very important. Many times, these memory drives get corrupted because of malicious programs or viruses; so they should be protected by using secure backup and recovery tools.

Popular tools for secured backup and recovery
For secured backup and recovery of data, it is always preferable to use performance-aware software tools and technologies, which can protect the data against any malicious or unauthenticated access. A few free and open source software tools which can be used for secured backup and recovery of data in multiple formats are: AMANDA, Bacula, Bareos, Clonezilla, FOG, Rsync, BURP, Duplicati, BackupPC, Mondo Rescue, GRSync, Areca Backup, etc.

Python as a high performance programming environment
Python is a widely used programming environment for almost every application domain including Big Data analytics, wireless networks, cloud computing, the Internet of Things (IoT), security tools, parallel computing, machine learning, knowledge discovery, deep learning, NoSQL databases and many others. Python is a free and open source programming language which is equipped with in-built features of system programming, a high level programming environment and network compatibility. In addition, the interfacing of Python can be done with any channel, whether it is live streaming on social media or in real-time via satellite. A number of other programming languages influenced by Python have been developed, including Boo, Cobra, Go, Groovy, Julia, OCaml, Swift, ECMAScript and CoffeeScript. There are other programming environments with the base code and programming paradigm of Python under development.
Python is rich in maintaining the repository of packages for big applications and domains including image processing, text mining, systems administration, Web scraping, Big Data analysis, database applications, automation tools, networking, video processing, satellite imaging, multimedia and many others.

Python Package Index (PyPI): https://pypi.python.org/pypi
The Python Package Index (PyPI), which is also known as the Cheese Shop, is the repository of Python packages for different software modules and plugins developed as add-ons to Python. Till September 2017, there were more


than 117,000 packages for different functionalities and applications in PyPI. This escalated to 123,086 packages by November 30, 2017.
The table in Figure 1 gives the statistics fetched from ModuleCounts.com, which maintains data about modules, plugins and software tools.

Date:             Nov-24   Nov-25   Nov-26   Nov-27   Nov-28   Nov-29   Nov-30
Packages in PyPI: 122,619  122,669  122,723  122,808  122,918  123,008  123,086

Figure 1: Statistics of modules and packages in PyPI in the last week of November (Source: http://www.modulecounts.com/)

Python based packages for secured backup and recovery
As Python has assorted tools and packages for diversified applications, security and backup tools with tremendous functionalities are also integrated in PyPI. Descriptions of key Python based tools that offer security and integrity during backup follow.

Rotate-Backups
Rotate-Backups is a simplified command line tool that is used for backup rotation. It has multiple features including flexible rotations on particular timestamps and schedules.
The installation process is quite simple. Give the following command:

$ pip install rotate-backups

The usage is as follows (the table at the bottom of this page lists the options):

$ rotate-backups [Options]

The rotation approach in Rotate-Backups can be customised as strict rotation (enforcement of the time window) or relaxed rotation (no enforcement of time windows).
After installation, the two files ~/.rotate-backups.ini and /etc/rotate-backups.ini are used by default. This default setting can be changed using the command line option --config.
The timeline and schedules of the backup can be specified in the configuration file as follows:

# /etc/rotate-backups.ini:
[/backups/mylaptop]
hourly = 24
daily = 7
weekly = 4
monthly = 12
yearly = always
ionice = idle

[/backups/myserver]
daily = 7 * 2
weekly = 4 * 2
monthly = 12 * 4
yearly = always
ionice = idle

[/backups/myregion]
daily = 7
weekly = 4
monthly = 2
ionice = idle

[/backups/myxbmc]
daily = 7
weekly = 4
monthly = 2
ionice = idle

Option Description
-M, --minutely=COUNT Number of backups per minute
-H, --hourly=COUNT Number of hourly backups
-d, --daily=COUNT Number of daily backups
-w, --weekly=COUNT Number of weekly backups
-m, --monthly=COUNT Number of monthly backups
-y, --yearly=COUNT Number of yearly backups
-I, --include=PATTERN Matching the shell patterns
-x, --exclude=PATTERN Do not process backups that match the shell pattern
-j, --parallel One backup at a time, no parallel backup
-p, --prefer-recent Ordering or preferences
-r, --relaxed Relaxed rotation (no enforcement of the time window for each rotation scheme)
-i, --ionice=CLASS Input-output scheduling and priorities
-c, --config=PATH Configuration path
-u, --use-sudo Enabling the use of ‘sudo’
-n, --dry-run No changes, display the output
-v, --verbose Increase logging verbosity
-q, --quiet Decrease logging verbosity
-h, --help Messages and documentation

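The daily/weekly/monthly retention counts used in the configuration above can be modelled in a few lines of Python. This is a simplified sketch of the idea, keeping the newest N calendar days plus one representative per recent ISO week, and not rotate-backups' actual windowing algorithm:

```python
from datetime import date, timedelta

def select_kept(backups, daily=7, weekly=4):
    """Return the backups to keep: the newest `daily` dates, plus one
    representative (the newest) for each of the newest `weekly` ISO weeks."""
    newest_first = sorted(backups, reverse=True)
    keep = set(newest_first[:daily])            # recent daily backups
    seen_weeks, weekly_keep = set(), []
    for d in newest_first:
        wk = d.isocalendar()[:2]                # (ISO year, ISO week)
        if wk not in seen_weeks:
            seen_weeks.add(wk)
            weekly_keep.append(d)
        if len(weekly_keep) == weekly:
            break
    return keep | set(weekly_keep)

start = date(2017, 11, 1)
backups = [start + timedelta(days=i) for i in range(30)]  # one per day
kept = select_kept(backups)
print(len(kept), "of", len(backups), "backups kept")
```

Everything not returned by such a selection would be the rotation's candidates for deletion, which is why a dry-run option (-n in the table above) is valuable.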

Bakthat
Bakthat is a command line tool with the functionalities of cloud based backups. It has excellent features to compress, encrypt and upload files with a higher degree of integrity and security. Bakthat has many features of data backup with security, including compression with tarfiles, encryption using beefish, uploading of data to S3 and Glacier, local backups to an SQLite database, sync based backups and many others.
Installation is as follows:

$ pip install bakthat

For source based installation, give the following commands:

$ git clone https://github.com/tsileo/bakthat.git
$ cd bakthat
$ sudo python setup.py install

For configuration with the options of security and cloud setup, give the command:

$ bakthat configure

Usage is as follows:

$ bakthat backup mydirectory

To set up a password, give the following command:

$ BAKTHAT_PASSWORD=mysecuritypassword bakthat backup mydocument

You can restore the backup as follows:

$ bakthat restore mybackup
$ bakthat restore mybackup.tgz.enc

For backing up a single file, type:

$ bakthat backup /home/mylocation/myfile.txt

To back up to Glacier on the cloud, type:

$ bakthat backup myfile -d glacier

To disable the password prompt, give the following command:

$ bakthat backup myfile --prompt no

BorgBackup
BorgBackup (or just Borg, for short) refers to a deduplicating backup tool developed in Python, which can be used in software frameworks or independently. It provides an effective way for secured backup and recovery of data.
The key features of BorgBackup include the following:
ƒ Space efficiency
ƒ Higher speed and minimum delays
ƒ Data encryption using 256-bit AES
ƒ Dynamic compression
ƒ Off-site backups
ƒ Backups can be mounted as a file system
ƒ Compatible with multiple platforms
The commands that need to be given in different distributions to install BorgBackup are given below.

Distribution    Command
Ubuntu          sudo apt-get install borgbackup
Arch Linux      pacman -S borg
Debian          apt install borgbackup
Gentoo          emerge borgbackup
GNU Guix        guix package --install borg
Fedora/RHEL     dnf install borgbackup
FreeBSD         cd /usr/ports/archivers/py-borgbackup && make install clean
Mageia          urpmi borgbackup
NetBSD          pkg_add py-borgbackup
OpenBSD         pkg_add borgbackup
OpenIndiana     pkg install borg
openSUSE        zypper in borgbackup
OS X            brew cask install borgbackup
Raspbian        apt install borgbackup

To initialise a new backup repository, use the following command:

$ borg init -e repokey /PathRepository

To create a backup archive, use the command given below:

$ borg create /PathRepository::Saturday1 ~/MyDocuments

For another backup with deduplication, use the following code:

$ borg create -v --stats /path/to/repo::Saturday2 ~/Documents
---------------------------------------------------------
Archive name: MyArchive


Archive fingerprint: 612b7c35c...
Time (start): Sat, 2017-11-27 14:48:13
Time (end): Sat, 2017-11-27 14:48:14
Duration: 0.98 seconds
Number of files: 903
---------------------------------------------------------
               Original size   Compressed size   Deduplicated size
This archive:    6.85 MB          6.85 MB            30.79 kB
All archives:   13.69 MB         13.71 MB             6.88 MB

               Unique chunks   Total chunks
Chunk index:        167             330
---------------------------------------------------------

MongoDB Backup
In MongoDB NoSQL, the backup of databases and collections can be retrieved using MongoDB Backup without any issues of size. The connection to port 27017 of MongoDB can be directly created for the backup of instances and clusters.
Installation is as follows:

$ pip install mongodb-backup

The documentation and help files help keep track of the commands with the options that can be integrated with MongoDB Backup:

$ mongodbbackup --help

To take a backup of a single, standalone MongoDB instance, type:

$ mongodbbackup -p <port> --primary-ok <Backup-Directory>

To take a backup of a cluster, config server and shards, use the following command:

$ mongodbbackup --ms-url <MongoS-URL> -p <port> <Backup-Directory>

You can use any of these reliable packages available in Python to secure data and back it up, depending on the data that needs to be protected.

By: Dr Gaurav Kumar
The author is the MD of Magma Research and Consultancy Pvt Ltd, Ambala. He delivers expert lectures and conducts workshops on the latest technologies and tools. He can be contacted at kumargaurav.in@gmail.com. His personal website is www.gauravkumarindia.com.

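As a closing aside, the 'deduplicated size' figures in the borg --stats output above come from storing each unique chunk only once. A toy fixed-size chunk store illustrates the principle; Borg itself uses content-defined chunking, compression and encryption on top of this idea:

```python
import hashlib

class ChunkStore:
    """Store fixed-size chunks keyed by SHA-256; duplicates cost nothing."""
    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes

    def add(self, data):
        """Split `data` into chunks, store each unique chunk once,
        and return the list of chunk digests referencing it."""
        ids = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # stored only once
            ids.append(digest)
        return ids

    def deduplicated_size(self):
        return sum(len(c) for c in self.chunks.values())

store = ChunkStore()
store.add(b"ABCDABCDABCD")        # three identical 4-byte chunks
print(store.deduplicated_size())  # 4: a single unique chunk is kept
```

Repeated full backups of largely unchanged data therefore add very little to the repository, which is what makes the 'All archives' deduplicated size so small.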


Encrypting Partitions Using LUKS
Sensitive data needs total protection. And there’s no better way of protecting
your sensitive data than by encrypting it. This article is a tutorial on how to
encrypt your laptop or server partitions using LUKS.

Sensitive data on mobile systems such as laptops can get compromised if they get lost, but this risk can be mitigated if the data is encrypted. Red Hat Linux supports partition encryption through the Linux Unified Key Setup (LUKS) on-disk-format technology. Encrypting partitions is easiest during installation, but LUKS can also be configured post installation.

Encryption during installation
When carrying out an interactive installation, tick the Encrypt checkbox while creating the partition to encrypt it. When this option is selected, the system will prompt users for a passphrase to be used for decrypting the partition. The passphrase needs to be manually entered every time the system boots.
When performing automated installations, Kickstart can create encrypted partitions. Use the --encrypted and --passphrase options to encrypt each partition. For example, the following line will encrypt the /home partition:

# part /home --fstype=ext4 --size=10000 --onpart=vda2 --encrypted --passphrase=PASSPHRASE

Note that the passphrase, PASSPHRASE, is stored in the Kickstart profile in plain text, so this profile must be secured. Omitting the --passphrase= option will cause the installer to pause and ask for the passphrase during installation.

Encryption post installation
Listed below are the steps needed to create an encrypted volume:
1. Create either a physical disk partition or a new logical volume.
2. Encrypt the block device and designate a passphrase, by using the following command:


# cryptsetup luksFormat /dev/vdb1

3. Unlock the encrypted volume and assign it a name, as follows:

# cryptsetup luksOpen /dev/vdb1 name

4. Create a file system in the decrypted volume, using the following command:

# mkfs -t ext4 /dev/mapper/name

Figure 1: An encrypted partition with an ext4 file system

As shown in Figure 1, the partition has been encrypted and opened and, finally, a file system is associated with the partition.
5. Create a mount point for the file system, mount it, and then access its contents as follows:

# mkdir /secret
# mount /dev/mapper/name /secret

We can verify the mounted partition using the df -h command, as shown in Figure 2.
6. When finished, unmount the file system and then lock the encrypted volume, as follows:

# umount /secret
# cryptsetup luksClose name

Figure 2: The encrypted partition has been locked and verified

Note: The directory should be unmounted before closing the LUKS partition. After the partition has been closed, it will be locked. This can be verified using the df -h command, as shown in Figure 2.

How to persistently mount encrypted partitions
If a LUKS partition is created during installation, normal system operation prompts the user for the LUKS passphrase at boot time. This is fine for a laptop, but not for servers that may need to be able to reboot unattended.
To boot a server with an encrypted volume unattended, a file must be created with a LUKS key that will unlock the encrypted volume. This file must reside on an unencrypted file system on the disk. Of course, this presents a security risk if that file system is on the same disk as the encrypted volume, because theft of the disk would include the key needed to unlock the encrypted volume. Typically, the file with the key is stored on removable media such as a USB drive.
Here are the steps to be taken to configure a system to persistently mount an encrypted volume without human intervention.
1. First, locate or generate a key file. This is typically created with random data on the server and kept on a separate storage device. Here, the key file takes random input from /dev/urandom and generates the output /root/key.txt with a block size of 4096 bytes and a single count of random data:

# dd if=/dev/urandom of=/root/key.txt bs=4096 count=1

Make sure it is owned by the root user and its mode is 600, as follows:

# chmod 600 /root/key.txt

Add the key file for LUKS using the following command:

# cryptsetup luksAddKey /dev/vdb1 /root/key.txt

Figure 3: A key file has been generated and added to the LUKS partition

Provide the passphrase used to unlock the encrypted volume when prompted.
2. Create an /etc/crypttab entry for the volume. /etc/crypttab




contains a list of devices to be unlocked during system boot:

# echo "name /dev/vdb1 /root/key.txt" >> /etc/crypttab

It lists one device per line with the following space-separated fields:
ƒ The device mapper name used for the device
ƒ The underlying locked device
ƒ The absolute pathname to the password file used to unlock the device (if this field is left blank, or set to none, the user will be prompted for the encryption password during system boot)
3. Create an entry in /etc/fstab as shown below. After making the entry in /etc/fstab, if we open the partition using the key file, the command will be:

# cryptsetup luksOpen /dev/vdb1 --key-file /root/key.txt name

As shown in the entry of the fstab file, if the device to be mounted is named, then the file system on which the encrypted partition should be permanently mounted is in the other entries. Also, no passphrase is asked for separately now, as we have supplied the key file, which has already been added to the partition. The partition can now be mounted using the mount -a command, after which the mounted partition can be verified upon reboot by using the df -h command. All the steps are clearly described in Figure 4.

Figure 4: Decryption of a persistent encrypted partition using the key file

Note: The device listed in the first field of /etc/fstab must match the name chosen for the local name to map in /etc/crypttab. This is a common configuration error.

Attaching a key file to the desired slot
LUKS offers a total of eight key slots for encrypted devices (0-7). If other keys or a passphrase exist, they can be used to open the partition. We can check all the available slots by using the luksDump command, as shown below:

# cryptsetup luksDump /dev/vdb1

Figure 5: Available slots for an encrypted partition are shown

As can be seen in Figure 5, Slot0 and Slot1 are enabled. So the key file we supplied manually moves, by default, to Slot1, which we can use for decrypting the partition. Slot0 carries the master key, which is supplied while creating the encrypted partition.
Now we will add a key file to Slot3. For this, we have to generate a key file of random numbers using /dev/urandom, after which we will add it to Slot3 as shown below. The passphrase of the encrypted partition must be supplied in order to add any secondary key to the encrypted volume.

# dd if=/dev/urandom of=/root/key2.txt bs=4096 count=1
# cryptsetup luksAddKey /dev/vdb1 --key-slot 3 /root/key2.txt

Figure 6: Secondary key file key2.txt has been added at Slot3
Figure 7: Slot3 enabled successfully

After adding the secondary key, again run the luksDump command to verify whether the key file has been added to Slot3 or not. As shown in Figure 7, the key file has been added to Slot3, as Slot2 remains disabled and Slot3 has been




Figure 8: Passphrase has been changed

enabled with the key file supplied. Now Slot3 can also be used to decrypt the partition.

Restoring LUKS headers
For some commonly encountered LUKS issues, LUKS header backups can mean the difference between a simple administrative fix and permanently unrecoverable data. Therefore, administrators of LUKS encrypted volumes should engage in the good practice of routinely backing up their headers. In addition, they should be familiar with the procedures for restoring the headers from backup, should the need arise.

LUKS header backup
LUKS header backups are performed using the cryptsetup tool in conjunction with the luksHeaderBackup sub-command. The location of the backup is specified with the --header-backup-file option. So by using the command given below, we can create the backup of any LUKS header:

# cryptsetup luksHeaderBackup /dev/vdb1 --header-backup-file /root/backup

As with all systems administration tasks, a LUKS header backup should be done before every administrative task performed on a LUKS-encrypted volume. LUKS stores a metadata header and key slots at the beginning of each encrypted device, so corruption of the LUKS header can render the encrypted data inaccessible. If a backup of the corrupted LUKS header exists, the issue can be resolved by restoring the header from this backup.

Testing and recovering LUKS headers
If an encrypted volume's LUKS header has been backed up, the backups can be restored to the volume to overcome issues such as forgotten passwords or corrupted headers. If multiple backups exist for an encrypted volume, an administrator needs to identify the proper one to restore. The header can be restored using the following command:

# cryptsetup luksHeaderRestore /dev/vdb1 --header-backup-file /root/backup

Now, let's suppose someone has changed the password of the encrypted partition /dev/vdb1 using luksChangeKey, and the new password is unknown. The only option is to restore the header from the backup that we created above, so that we can decrypt the partition with the previous passphrase. The backup also helps when an admin forgets the passphrase.
In Figure 8, a backup of /dev/vdb1 has been taken initially, and its passphrase has been subsequently changed by someone, without our knowledge.
Before closing a partition, we have to unmount it. After closing the partition, trying to open it by using the previously set passphrase will throw the error 'No key available with this passphrase', because the passphrase has been changed (Figure 9).

Figure 9: Decrypting a partition with the passphrase supplied initially

But as the backup has already been taken by us, we just need to restore the LUKS header from the backup file which was created earlier. As shown in Figure 10, the header has been restored.

Figure 10: Header is restored from the backup file

Now we can open the partition with the passphrase that was set earlier. Therefore, it is always beneficial for administrators to create a backup of their header, so that they can restore it if somehow the existing header gets corrupted or a password is changed.

By: Kshitij Upadhyay
The author is RHCSA and RHCE certified, and loves to write about new technologies. He can be reached at upadhyayk04@gmail.com.



CODE SPORT
Sandya Mannarswamy

In this month's column, we continue our discussion on detecting duplicate questions in community question answering forums.

Based on our readers' requests to take up a real-life ML/NLP problem with a sufficiently large data set, we had started on the problem of detecting duplicate questions in community question answering (CQA) forums using the Quora Question Pair Dataset.
Let's first define our task as follows: Given a pair of questions <Q1, Q2>, the task is to identify whether Q2 is a duplicate of Q1, in the sense that, will the informational needs expressed in Q1 satisfy the informational needs of Q2? In simpler terms, we can say that Q1 and Q2 are duplicates from a lay person's perspective if both of them are asking the same thing in different surface forms.
An alternative definition is to consider that Q1 and Q2 are duplicates if the answer to Q1 will also provide the answer to Q2. However, we will not consider the second definition since we are concerned only with analysing the informational needs expressed in the questions themselves and have no access to answer text. Therefore, let's define our task as a binary classification problem, where one of the two labels (duplicate or non-duplicate) needs to be predicted for each given question pair, with the restriction that only the question text is available for the task and not the answer text.
As I pointed out in last month's column, a number of NLP problems are closely related to duplicate question detection. The general consensus is that duplicate question detection can be solved as a by-product by using these techniques themselves. Detecting semantic text similarity and recognising textual entailment are the closest in nature to duplicate question detection. However, given that the goal of each of these problems is distinct from that of duplicate question detection, they fail to solve the latter problem adequately. Let me illustrate this with a few example question pairs.
Example 1
Q1a: What are the ways of investing in the share market?
Q1b: What are the ways of investing in the share market in India?
One of the state-of-the-art tools available online for detecting semantic text similarity is SEMILAR (http://www.semanticsimilarity.org/). A freely available state-of-the-art tool for entailment recognition is Excitement Open Platform or EOP (http://hlt-services4.fbk.eu/eop/index.php). SEMILAR gave a semantic similarity score of 0.95 for the above pair, whereas EOP reported it as textual entailment. However, these two questions have different information needs and hence they are not duplicates of each other.
Example 2
Q2a: In which year did McEnroe beat Becker, who went on to become the youngest winner of the Wimbledon finals?
Q2b: In which year did Becker beat McEnroe and go on to become the youngest winner in the finals at Wimbledon?
SEMILAR reported a similarity score of 0.972 and EOP marked this question pair as entailment, indicating that Q2b is entailed from Q2a. Again, these two questions are about two entirely different events, and hence are not duplicates. We hypothesise that humans are quick to see the difference by extracting the relations that are being sought in the two questions. In Q2a, the relational event is <McEnroe (subject), beat (predicate), Becker (object)>, whereas in Q2b, the relational event is <Becker (subject), beat (predicate), McEnroe (object)>, which is a different relation from that in Q2a. By quickly scanning for a relational match/mismatch at the cross-sentence level, humans quickly mark this as non-duplicate,



CodeSport Guest Column

even though there is considerable textual similarity across the text pair. It is also possible that the entailment system gets confused due to sub-clauses being entailed across the two questions (namely, the clause, "Becker went on to become youngest winner"). This lends weight to our claim that while semantic similarity matching and textual entailment are closely related problems to the duplicate question detection task, they cannot be used directly as solutions for the duplicate detection problem.
There are subtle but important differences in the relations of entities and in cross-sentence word level interactions between two sentences, which mark them as non-duplicates when examined by humans. We can hypothesise that humans use these additional checks on top of the coarse-grained similarity comparison they do in their minds when they look at these questions in isolation, and then arrive at the decision of whether they are duplicates or not. If we consider the example we discussed in Q2a and Q2b, the fact is that the relation between the entities in Question 2a does not hold good in Question 2b; hence, if these cross-sentence level semantic relations are checked, it would be possible to determine that this pair is not a duplicate. It is also important to note that not all mismatches are equally important. Let us consider another example.
Example 3
Q3a: Do omega-3 fatty acids, normally available as fish oil supplements, help prevent cancer?
Q3b: Do omega-3 fatty acids help prevent cancer?
Though Q3b does not mention the fact that omega-3 fatty acids are typically available as fish oil supplements, its information needs are satisfied by the answer to Q3a, and hence these two questions are duplicates. From a human perspective, we hypothesise that the word fragment "normally available as fish oil supplements" is not seen as essential to the overall semantic compositional meaning of Q3a; so we can quickly discard this information when we refine the overall representation of the first question while doing a pass over the second question. Also, we can hypothesise that humans use cross-sentence word level interactions to quickly check whether similar information needs are being met in the two questions.
Example 4
Q4a: How old was Becker when he won the first time at Wimbledon?
Q4b: What was Becker's age when he was crowned as the youngest winner at Wimbledon?
Though the surface forms of the two questions are quite dissimilar, humans tend to compare cross-sentence word level interactions such as (<old, age>, <won, crowned>) in the context of the entity in question, namely Becker, to conclude that these two questions are duplicates. Hence, any system which attempts to solve the task of duplicate question detection should not depend blindly on a single aggregated coarse-grained similarity measure to compare the sentences, but instead should consider the following:
ƒ Do relations that exist in the first question hold true for the second question?
ƒ Are there word level interactions across the two questions which cause them to have different informational needs (even if the rest of the question is pretty much identical across the two sentences)?
Now that we have a good idea of the requirements for a reasonable duplicate question detection system, let's look at how we can start implementing this solution. For the sake of simplicity, let us assume that our data set consists of single sentence questions. Our system for duplicate detection first needs to create a representation for each input sentence and then feed the representations for each of the two questions to a classifier, which will decide whether they are duplicates or not by comparing the representations. The high-level block diagram of such a system is shown in Figure 1.
First, we need to create an input representation for each question sentence. We have a number of choices for this module. As is common in most neural network based approaches, we use word embeddings to create a sentence representation. We can either use pre-trained word embeddings such as Word2Vec or GloVe embeddings, or we can train our own word embeddings using the training data as our corpus. For each word in a sentence, we look up its corresponding word embedding vector and form the sentence matrix. Thus, each question (sentence) is represented by its sentence matrix (a matrix whose rows represent each word in the sentence, and hence each row is the word-embedding vector for that word). We now need to convert the sentence-embedding matrix into a fixed length input representation vector.
One of the popular ways of representing an input sentence is by creating a sequence-to-sequence representation using recurrent neural networks. Given a sequence of input words (this constitutes the sentence), we pass this sequence through a recurrent neural network (RNN) and create an output sequence. While the RNN generates an output for each input in the sequence, we are only interested in the final aggregated representation of the input sequence. Hence, we take the output of the last unit of the RNN and use it as our sentence representation. We can use vanilla RNNs, gated recurrent units (GRU), or long short term memory (LSTM) units for creating a fixed length representation from a given input sequence. Given that LSTMs have been quite successfully used in many NLP tasks, we decided to use LSTMs to create the fixed length representation of the question.
The last stage output from each of the two LSTMs (one LSTM for each of the two questions) represents the input question representation. We then feed the two representations to a multi-layer perceptron (MLP) classifier. An MLP classifier is nothing but a fully connected multi-layer feed forward neural network. Given that we have




a two-class prediction problem, the last stage of the MLP classifier is a two-unit softmax, the output of which gives the probabilities for each of the two output classes. This is shown in the overall block diagram in Figure 1.

Figure 1: Block diagram for duplicate question detection system (Question 1 and Question 2 each pass through an LSTM sentence representation, and the two representations feed an MLP classifier that produces the output)
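As a minimal illustration of this pipeline, the sketch below substitutes a plain numpy vanilla RNN cell for the LSTM and uses random, untrained weights over a toy vocabulary. Every name, dimension and data value here is illustrative only, and not the actual implementation discussed in this column:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and embedding table; in practice these would be
# pre-trained Word2Vec/GloVe vectors of 100-300 dimensions.
vocab = {"how": 0, "old": 1, "was": 2, "becker": 3, "what": 4,
         "his": 5, "age": 6, "at": 7, "wimbledon": 8}
EMB_DIM, HID_DIM = 8, 6
embeddings = rng.normal(size=(len(vocab), EMB_DIM))

def sentence_matrix(words):
    """Look up each word's embedding to form the sentence matrix."""
    return embeddings[[vocab[w] for w in words]]

def rnn_last_state(sent_mat, Wx, Wh):
    """Vanilla RNN over the word sequence; only the final hidden state
    is kept as the fixed-length sentence representation (the role the
    LSTM plays in the architecture described above)."""
    h = np.zeros(HID_DIM)
    for x in sent_mat:          # one recurrent step per word
        h = np.tanh(Wx @ x + Wh @ h)
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Shared RNN weights for both questions (random, untrained).
Wx = rng.normal(size=(HID_DIM, EMB_DIM))
Wh = rng.normal(size=(HID_DIM, HID_DIM))
# Single-layer "MLP": concatenated representations -> 2-unit softmax.
Wmlp = rng.normal(size=(2, 2 * HID_DIM))

q1 = ["how", "old", "was", "becker", "at", "wimbledon"]
q2 = ["what", "was", "his", "age", "at", "wimbledon"]
r1 = rnn_last_state(sentence_matrix(q1), Wx, Wh)
r2 = rnn_last_state(sentence_matrix(q2), Wx, Wh)
probs = softmax(Wmlp @ np.concatenate([r1, r2]))
print(probs)  # P(non-duplicate), P(duplicate) - arbitrary until trained
```

Training the weights (and replacing the vanilla RNN with an LSTM) is exactly the part left to the deep learning library.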

Given that we discussed the overall structure of our implementation, I request our readers to implement this using a deep learning library of their choice. I would recommend using Tensorflow, PyTorch or Keras. We will discuss the Tensorflow code for this problem in next month's column. Here are a few questions for our readers to consider in their implementation:
ƒ How would you handle 'out of vocabulary' words in the test data? Basically, if there are words which do not have embeddings in either Word2vec/Glove, or even in the case of corpus-trained embeddings, how would you represent them?
ƒ Given that question sentences can be of different lengths, how would you handle the variable length sentences?
ƒ On what basis would you decide how many hidden layers should be present in the MLP classifier, and the number of hidden units in each layer?
I suggest that our readers (specifically those who have just started exploring ML and NLP) try implementing the solution and share the results in a Python Jupyter notebook. Please do send me the pointer to your notebook and we can discuss it later in this column.
If you have any favourite programming questions/software topics that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com. Wishing all our readers a very happy and prosperous new year!

By: Sandya Mannarswamy
The author is an expert in systems software and is currently working as a research scientist at Conduent Labs India (formerly Xerox India Research Centre). Her interests include compilers, programming languages, file systems and natural language processing. If you are preparing for systems software interviews, you may find it useful to visit Sandya's LinkedIn group 'Computer Science Interview Training (India)' at http://www.linkedin.com/groups?home=&gid=2339182.


Exploring Software Guest Column

Python is Still Special


Anil Seth

The author takes a good look at Python and discovers


that he is as partial to it after years of using it, as when
he first discovered it. He shares his reasons with readers.

It was in the year 2000 that I had first come across Python in the Linux Journal, a magazine that's no longer published. I read about it in a review titled 'Why Python' by Eric Raymond. I had loved the idea of a language that enforced indentation, for obvious reasons. It was a pain to keep requesting colleagues to indent the code. IDEs were primitive then—not even as good as a simple text editor today.
However, one of Raymond's statements that stayed in my mind was, "I was generating working code nearly as fast as I could type."
It is hard to explain but somehow the syntax of Python offers minimal resistance!
The significance of Python even today is underlined by the fact that Uber has just open sourced its AI tool Pyro, which aims at '…deep universal probabilistic programming with Python and PyTorch' (https://eng.uber.com/pyro/).
Mozilla's DeepSpeech open source speech recognition model includes pre-built packages for Python (https://goo.gl/nxXz2Y).

Passing a function as a parameter
Years ago, after coding a number of forms, it was obvious that handling user interface forms required the same logic, except for validations. You could code a common validations routine, which used a form identifier to execute the required code. However, as the number of forms increased, it was obviously a messy solution. The ability to pass a function as a parameter in Pascal simplified the code a lot.
So, the fact that Python can do it as well is nothing special. However, examine the simple example that follows. There should be no difficulty in reading the code and understanding its intent.

>>> def add(x,y):
...     return x+y
...
>>> def prod(x,y):
...     return x*y
...
>>> def op(fn,x,y):
...     return fn(x,y)
...
>>> op(add,4,5)
9
>>> op(prod,4,5)
20
>>>

All too often, the method required is determined by the data. For example, a form-ID is used to call an appropriate validation method. This, in turn, results in a set of conditional statements which obscure the code. Consider the following illustration:

>>> def op2(fname,x,y):
...     fn = eval(fname)
...     return fn(x,y)
...
>>> op2('add',4,5)
9
>>> op2('prod',4,5)
20
>>>

The eval function allows you to convert a string into code. This eliminates the need for the conditional expressions discussed above.
Now, consider the following addition:

>>> newfn = """def div(x,y):
...     return x/y"""
>>> exec(newfn)
>>> div(6,2)
3
>>> op(div,6,2)
3
>>> op2('div',6,2)



3
>>>

In the example above, a function has been added to the application at runtime. Again, the emphasis is not on the fact that this can be done, but consider the simplicity and readability of the code. A person does not have to even know Python to get the idea of what the code intends to do.
Prof. Dijkstra wrote, "If you want more effective programmers, you will discover that they should not waste their time debugging; they should not introduce the bugs to start with." (https://en.wikipedia.org/wiki/Edsger_W._Dijkstra)
This is where Python appears to do well. It appears to allow you to program fairly complex algorithms concisely and retain readability. Hence, the likelihood of introducing bugs to start with is minimised.
On the importance of which programming languages to teach, Prof. Dijkstra wrote, "It is not only the violin that shapes the violinist; we are all shaped by the tools we train ourselves to use, and in this respect, programming languages have a devious influence: they shape our thinking habits." (http://chrisdone.com/posts/dijkstra-haskell-java)
It is not surprising that Python is widely used for AI. It is easy to integrate Python with C/C++, and it has a wide range of inbuilt libraries. But most of all, it is easy to experiment with new ideas and explore prototypes in Python.
It is definitely a language all programmers should have in their toolkits.

By: Dr Anil Seth
The author has earned the right to do what interests him. You can find him online at http://sethanil.com, http://sethanil.blogspot.com, and reach him via email at anil@sethanil.com.


Developers Insight

Machines Learn in Many Different Ways

This article gives the reader a bird's eye view of machine learning models, and solves a use case through SFrames and Python.

'Data is the new oil'—and this is not an empty expression doing the rounds within the tech industry. Nowadays, the strength of a company is also measured by the amount of data it has. Facebook and Google offer their services free in lieu of the vast amount of data they get from their users. These companies analyse the data to extract useful information. For instance, Amazon keeps suggesting products based on your buying trends, and Facebook always suggests friends and posts in which you might be interested. Data in its raw form is like crude oil—you need to refine crude oil to make petrol and diesel. Similarly, you need to process data to get useful insights, and this is where machine learning comes in handy.
Machine learning has different models such as regression, classification, clustering and similarity, matrix factorisation, deep learning, etc. In this article, I will briefly describe these models and also solve a use case using Python.
Linear regression: Linear regression is studied as a model to understand the relationship between input and output numerical values. The representation is a linear equation that combines a specific set of input values (x), the solution to which is the predicted output for that set of input values. It helps in estimating the values of the coefficients used in the representation from the data that we have available. For example, in a simple regression problem (a single x and a single y), the form of the model is:

y = B0 + B1*x

Using this model, the price of a house can be predicted based on the data available on nearby homes.
Classification model: The classification model helps identify the sentiments of a particular post. For example, a user review can be classified as positive or negative based on the words used in the comments. Given one or more inputs, a classification model will try to predict the value of one or more outcomes. Outcomes are labels that can be applied to a data set. Emails can be categorised as spam or not, based on these models.
Clustering and similarity: This model helps when we are trying to find similar objects. For example, if I am interested in reading articles about football, this model will search for documents with certain high-priority words and suggest articles about football. It will also find articles on Messi or Ronaldo, as they are involved with football. TF-IDF (term frequency - inverse document frequency) is used to evaluate this model.
Deep learning: This is also known as deep structured learning or hierarchical learning. It is used for product recommendations and image comparison based on pixels.
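The coefficients B0 and B1 of the linear regression model above can be estimated with ordinary least squares. Here is a minimal sketch using made-up house-size/price data (the numbers are purely illustrative):

```python
# Ordinary least squares for y = B0 + B1*x, on made-up data
# (sizes in square feet, prices in lakhs - purely illustrative).
xs = [1000, 1500, 2000, 2500, 3000]
ys = [50.0, 70.0, 90.0, 110.0, 130.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# B1 = covariance(x, y) / variance(x); B0 follows from the means.
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
     sum((x - mean_x) ** 2 for x in xs)
b0 = mean_y - b1 * mean_x

print(b0, b1)          # this toy data is exactly linear: 10.0, 0.04
print(b0 + b1 * 1800)  # predicted price for an 1800 sq ft house: 82.0
```

With real, noisy data the fit would not be exact, but the same two formulas give the best-fitting line.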




Now, let's explore the concept of clustering and similarity, and try to find the documents of our interest. Let's assume that we want to read an article on soccer. We like an article and would like to retrieve another article that we may be interested in reading. The question is how do we do this? In the market, there are lots and lots of articles that we may or may not be interested in. We have to think of a mechanism that suggests articles that interest us. One of the ways is to have a word count of the article, and suggest articles that have the highest number of similar words. But there is a problem with this model, as the document length can be excessive, and other unrelated documents can also be fetched as they might have many similar words. For example, articles on football players' lives may also get suggested, which we are not interested in. To solve this, the TF-IDF model comes in. In this model, the words are prioritised to find the related articles.
Let's get hands-on for the document retrieval. The first thing you need to do is to install GraphLab Create, on which Python commands can be run. GraphLab Create can be downloaded from https://turi.com/ by filling in a simple form, which asks for a few details such as your name, email id, etc. GraphLab Create has the IPython notebook, which is used to write the Python commands. The IPython notebook is similar to any other notebook, with the advantage that it can display the graphs on its console.
Open the IPython notebook, which runs in the browser at http://localhost:8888/. Import GraphLab using the Python command:

import graphlab

Next, import the data into an SFrame using the following command:

people = graphlab.SFrame('people_wiki.gl/')

To view the data, use the command:

people.head()

This displays the top few rows in the console.

Figure 1: The people data loaded in SFrames

The details of the data are the URL, the name of the people and the text from Wikipedia.
I will now list some of the Python commands that can be used to search for related articles on US ex-President Barack Obama.
1. To explore the entry for Obama, use the command:

obama = people[people['name'] == 'Barack Obama']

Figure 2: Data generated for the Obama article

2. Now, sort the word counts for the Obama article. To turn the dictionary of word counts into a table, give the following command:

obama_word_count_table = obama[['word_count']].stack('word_count', new_column_name = ['word','count'])

3. To sort the word counts to show the most common words at the top, type:

obama_word_count_table.head()

Figure 3: Sorting the word count


Figure 4: Compute TF-IDF for the corpus
Figure 5: TF-IDF for the Obama article

4. Next, compute the TF-IDF for the corpus. To give more weight to informative words, we evaluate them based on their TF-IDF scores, as follows:

people['word_count'] = graphlab.text_analytics.count_words(people['text'])
people['tfidf'] = graphlab.text_analytics.tf_idf(people['word_count'])
people.head()

5. To examine the TF-IDF for the Obama article, give the following commands:

obama = people[people['name'] == 'Barack Obama']
obama[['tfidf']].stack('tfidf', new_column_name=['word','tfidf']).sort('tfidf', ascending=False)

Words with the highest TF-IDF are much more informative. The TF-IDF of the Obama article brings up similar articles that are related to it, like Iraq, Control, etc.

Machine learning is not a new technology. It's been around for years, but is gaining popularity only now as many companies have started using it.

By: Ashish Sinha
The author is a software engineer based in Bengaluru. A software enthusiast at heart, he is passionate about using open source technology and sharing it with the world. He can be reached at ashi.sinha.87@gmail.com. Twitter handle: @sinha_tweet.



Regular Expressions in Programming Languages: Java for You

This is the sixth and final part of a series of articles on regular expressions in programming languages. In this article, we will discuss the use of regular expressions in Java, a very powerful programming language.

Java is an object-oriented general-purpose programming language. Java applications are initially compiled to bytecode, which can then be run on a Java virtual machine (JVM), independent of the underlying computer architecture. According to Wikipedia, "A Java virtual machine is an abstract computing machine that enables a computer to run a Java program." Don't get confused with this complicated definition; just imagine that the JVM acts as software capable of running Java bytecode. The JVM acts as an interpreter for Java bytecode. This is the reason why Java is often called a compiled and interpreted language. The development of Java, initially called Oak, was started in 1991 by James Gosling, Mike Sheridan and Patrick Naughton. The first public implementation of Java was released as Java 1.0 in 1996 by Sun Microsystems. Currently, Oracle Corporation owns Sun Microsystems. Unlike many other programming languages, Java has a mascot called Duke (shown in Figure 1).

Figure 1: Duke, the mascot of Java

As with previous articles in this series, I really wanted to begin with a brief discussion about the history of Java by describing the different platforms and versions of Java. But here I am at a loss. The availability of a large number of Java platforms and the complicated version numbering scheme followed by Sun Microsystems make such a discussion difficult. For example, in order to explain terms like Java 2, Java SE, Core Java, JDK, Java EE, etc, in detail, a series of articles might be required. Such a discussion about the history of Java might be a worthy pursuit for another time, but definitely not for this article. So, all I am going to do is explain a few key points regarding the various Java implementations.

First of all, Java Card, Java ME (Micro Edition), Java SE (Standard Edition) and Java EE (Enterprise Edition) are all different Java platforms that target different classes of devices and application domains. For example, Java SE is customised for general-purpose use on desktop PCs, servers and similar devices. Another important question that requires an answer is, 'What is the difference between Java SE and Java 2?' Books like 'Learn Java 2 in 48 Hours' or 'Learn Java SE in Two Days' can confuse beginners a lot while making a choice. In a nutshell, there is no difference between the two. All this confusion arises due to the complicated naming convention followed by Sun Microsystems.

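Given this tangle of version names, it can help to check which Java version a program is actually running on. The snippet below is my own addition (not one of the article's downloadable examples); it reads the standard java.version system property:

```java
// Prints the version of the Java runtime executing this program.
public class VersionCheck
{
    public static void main(String[] args)
    {
        // "java.version" is a standard system property defined by the platform,
        // e.g. "1.8.0_151" for Java 8 or "9.0.1" for Java 9.
        System.out.println("Running on Java " + System.getProperty("java.version"));
    }
}
```

Compile and run it with javac VersionCheck.java followed by java VersionCheck, and compare the printed value with the naming scheme described here.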

The December 1998 release of Java was called Java 2, and the version name J2SE 1.2 was given to JDK 1.2 to distinguish it from the other platforms of Java. Again, J2SE 1.5 (JDK 1.5) was renamed J2SE 5.0 and later Java SE 5, citing the maturity of J2SE over the years as the reason for this name change. The latest version of Java is Java SE 9, which was released in September 2017. But actually, when you say Java 9, you mean JDK 1.9. So, keep in mind that Java SE was formerly known as Java 2 Platform, Standard Edition or J2SE.

The Java Development Kit (JDK) is an implementation of one of the Java platforms, Standard Edition, Enterprise Edition or Micro Edition, in the form of a binary product. The JDK includes the JVM and a few other tools like the compiler (javac), debugger (jdb), applet viewer, etc, which are required for the development of Java applications and applets. The latest version of the JDK is JDK 9.0.1, released in October 2017. OpenJDK is a free and open source implementation of Java SE. The OpenJDK implementation is licensed under the GNU General Public License (GNU GPL). The Java Class Library (JCL) is a set of dynamically loadable libraries that Java applications can call at run time. The JCL contains a number of packages, and each of them contains a number of classes to provide various functionalities. Some of the packages in the JCL include java.lang, java.io, java.net, java.util, etc.

The 'Hello World' program in Java
Other than console based Java application programs, special classes like the applet, servlet, swing, etc, are used to develop Java programs for a variety of tasks. For example, Java applets are programs that are embedded in other applications, typically in a Web page displayed in a browser. Regular expressions can be used in Java application programs and in programs based on other classes like the applet, swing, servlet, etc, without making any changes. Since there is no difference in the use of regular expressions, all our discussions are based on simple Java application programs. But before exploring Java programs using regular expressions, let us build our muscles by executing a simple 'Hello World' program in Java. The code given below shows the program HelloWorld.java.

public class HelloWorld
{
    public static void main(String[ ] args)
    {
        System.out.println("Hello World");
    }
}

To execute the Java source file HelloWorld.java, open a terminal in the directory containing the file and execute the command:

javac HelloWorld.java

Now a Java class file called HelloWorld.class containing the Java bytecode is created in the directory. The JVM can be invoked to execute this bytecode by passing the class name (not the .class file name) to the command:

java HelloWorld

The message 'Hello World' is displayed on the terminal. Figure 2 shows the execution and output of the Java program HelloWorld.java. The program contains a special method named main( ), the starting point of this program, which will be identified and executed by the JVM. Remember that a method in the object oriented programming paradigm is nothing but a function in the procedural programming paradigm. The main( ) method contains the following line of code, which prints the message 'Hello World' on the terminal:

System.out.println("Hello World");

The program HelloWorld.java and all the other programs discussed in this article can be downloaded from opensourceforu.com/article_source_code/January18javaforyou.zip.

Figure 2: Hello World program in Java

Regular expressions in Java
Now coming down to business, let us discuss regular expressions in Java. The first question to be answered is 'What flavour of regular expression is being used in Java?' Well, Java uses PCRE (Perl Compatible Regular Expressions). So, all the regular expressions we have developed in the previous articles describing regular expressions in Python, Perl and PHP will work in Java without any modifications, because Python, Perl and PHP also use the PCRE flavour of regular expressions.

Since we have already covered much of the syntax of PCRE in the previous articles on Python, Perl and PHP, I am not going to reintroduce it here. But I would like to point out a few minor differences between classic PCRE and the PCRE standard tailor-made for Java. For example, the regular expressions in Java lack the embedded comment syntax available in programming languages like Perl. Another difference concerns the quantifiers used in regular expressions in Java and other PCRE based programming languages. Quantifiers allow you to specify the number of occurrences of a character to match against a string. Almost all the PCRE flavours have a greedy quantifier and a reluctant quantifier. In addition to these two, the regular expression syntax of Java has a possessive quantifier also.
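The behaviour of the three quantifier types discussed below can be observed with a small self-contained sketch of my own (separate from the article's Greedy.java, Reluctant.java and Possessive.java examples):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compares greedy ("a+a"), reluctant ("a+?a") and possessive ("a++a")
// quantifiers against the string "aaaaaa".
public class QuantifierDemo
{
    // Returns the first match of regex in input, or "(no match)".
    static String firstMatch(String regex, String input)
    {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : "(no match)";
    }

    public static void main(String[] args)
    {
        String s = "aaaaaa";
        System.out.println("a+a  -> " + firstMatch("a+a", s));  // greedy: "aaaaaa"
        System.out.println("a+?a -> " + firstMatch("a+?a", s)); // reluctant: "aa"
        System.out.println("a++a -> " + firstMatch("a++a", s)); // possessive: no match
    }
}
```

The greedy pattern backtracks to leave one character for the trailing 'a', the reluctant one consumes as little as possible, and the possessive one refuses to give back what 'a++' has consumed, so no match is found.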

To differentiate between these three quantifiers, consider the string aaaaaa. The regular expression pattern 'a+a' involves a greedy quantifier by default. This pattern will result in a greedy match of the whole string aaaaaa because the pattern 'a+' will match only the string aaaaa. Now consider the reluctant quantifier 'a+?a'. This pattern will only result in a match for the string aa since the pattern 'a+?' will only match the single character string a. Now let us see the effect of the Java specific possessive quantifier denoted by the pattern 'a++a'. This pattern will not result in any match because the possessive quantifier behaves like a greedy quantifier, except that it is possessive. So, the pattern 'a++' itself will possessively match the whole string aaaaaa, and the last character a in the regular expression pattern 'a++a' will not have a match. So, a possessive quantifier will match greedily, and after a match it will never give away a character.

You can download and test the three example Java files Greedy.java, Reluctant.java and Possessive.java for a better understanding of these concepts. In Java, regular expression processing is enabled with the help of the package java.util.regex. This package was included in the Java Class Library (JCL) by J2SE 1.4 (JDK 1.4). So, if you are going to use regular expressions in Java, make sure that you have JDK 1.4 or later installed on your system. Execute the command:

java -version

…at the terminal to find the particular version of Java installed on your system. The later versions of Java have fixed many bugs and added support for features like named capture and Unicode based regular expression processing. There are also some third party packages that support regular expression processing in Java, but our discussion strictly covers the classes offered by the package java.util.regex, which is standard and part of the JCL. The package java.util.regex offers two classes called Pattern and Matcher that are used jointly for regular expression processing. The Pattern class enables us to define a regular expression pattern. The Matcher class helps us match a regular expression pattern with the contents of a string.

Java programs using regular expressions
Let us now execute and analyse a simple Java program using regular expressions. The code given below shows the program Regex1.java.

import java.util.regex.*;

class Regex1
{
    public static void main(String args[ ])
    {
        Pattern pat = Pattern.compile("Open Source");
        Matcher mat = pat.matcher("Magazine Open Source For You");
        if(mat.matches( ))
        {
            System.out.println("Match from " + (mat.start( )+1) + " to " + (mat.end( )));
        }
        else
        {
            System.out.println("No Match Found");
        }
    }
}

Open a terminal in the same directory containing the file Regex1.java and execute the following commands to view the output:

javac Regex1.java
java Regex1

You will be surprised to see the message 'No Match Found' displayed in the terminal. Let us analyse the code in detail to understand the reason for this output. The first line of code:

import java.util.regex.*;

…imports the classes Pattern and Matcher from the package java.util.regex. The line of code:

Pattern pat = Pattern.compile("Open Source");

…generates the regular expression pattern with the help of the method compile( ) provided by the Pattern class. The Pattern object thus generated is stored in the object pat. A PatternSyntaxException is thrown if the regular expression syntax is invalid. The line of code:

Matcher mat = pat.matcher("Magazine Open Source For You");

…uses the matcher( ) method of the Pattern class to generate a Matcher object, because the Matcher class does not have a constructor. The Matcher object thus generated is stored in the object mat. The line of code:

if(mat.matches( ))

…uses the method matches( ) provided by the class Matcher to perform a matching between the regular expression pattern 'Open Source' and the string 'Magazine Open Source For You'. The method matches( ) returns True if there is a match and returns False if there is no match. But the important thing to remember is that the method matches( ) returns True only if the

pattern matches the whole string. In this case, the string 'Open Source' is just a substring of the string 'Magazine Open Source For You' and since there is no match, the method matches( ) returns False, and the if statement displays the message 'No Match Found' on the terminal.

If you replace the line of code:

Pattern pat = Pattern.compile("Open Source");

…with the line of code:

Pattern pat = Pattern.compile("Magazine Open Source For You");

…then you will get a match and the matches( ) method will return True. The file with this modification, Regex2.java, is also available for download. The line of code:

System.out.println("Match from " + (mat.start( )+1) + " to " + (mat.end( )));

…uses two methods provided by the Matcher class, start( ) and end( ). The method start( ) returns the start index of the previous match and the method end( ) returns the offset after the last character matched. So, the output of the program Regex2.java will be 'Match from 1 to 28'. Figure 3 shows the output of Regex1.java and Regex2.java. An important point to remember is that the indexing starts at 0, and that is the reason why 1 is added to the value returned by the method start( ) as (mat.start( )+1). Since the method end( ) returns the index immediately after the last matched character, nothing needs to be done there.

Figure 3: Output of Regex1.java and Regex2.java

The matches( ) method with this sort of comparison is almost useless. But many other useful methods are provided by the class Matcher to carry out different types of comparisons. The method find( ) provided by the class Matcher is useful if you want to find a substring match. Replace the line of code:

if(mat.matches( ))

…in Regex1.java with the line of code:

if(mat.find( ))

…to obtain the program Regex3.java. On execution, Regex3.java will display the message 'Match from 10 to 20' on the terminal. This is due to the fact that the substring 'Open Source' appears from the 10th character to the 20th character in the string 'Magazine Open Source For You'. The method find( ) also returns True in case of a match and False if there is no match. The method find( ) can be used repeatedly to find all the matching substrings present in a string. Consider the program Regex4.java shown below.

import java.util.regex.*;

class Regex4
{
    public static void main(String args[])
    {
        Pattern pat = Pattern.compile("abc");
        String str = "abcdabcdabcd";
        Matcher mat = pat.matcher(str);
        while(mat.find( ))
        {
            System.out.println("Match from " + (mat.start( )+1) + " to " + (mat.end( )));
        }
    }
}

In this case, the method find( ) will search the whole string and find matches at positions starting at the first, fifth and ninth characters. The line of code:

String str = "abcdabcdabcd";

…is used to store the string to be searched, and in the line of code:

Matcher mat = pat.matcher(str);

…this string is used by the method matcher( ) for further processing. Figure 4 shows the output of the programs Regex3.java and Regex4.java.

Figure 4: Output of Regex3.java and Regex4.java

Now, what if you want the matched string displayed instead of the index at which a match is obtained? Well, then you have to use the method group( ) provided by the class Matcher. Consider the program Regex5.java shown below:

import java.util.regex.*;


class Regex5
{
    public static void main(String args[])
    {
        Pattern pat = Pattern.compile("S.*r");
        String str = "Sachin Tendulkar Hits a Sixer";
        Matcher mat = pat.matcher(str);
        int i=1;
        while(mat.find( ))
        {
            System.out.println("Matched String " + i + " : " + mat.group( ));
            i++;
        }
    }
}

On execution, the program Regex5.java displays the message 'Matched String 1 : Sachin Tendulkar Hits a Sixer' on the terminal. What is the reason for matching the whole string? Because the pattern 'S.*r' searches for a string starting with S, followed by zero or more occurrences of any character, and finally ending with an r. Since the pattern '.*' results in a greedy match, the whole string is matched.

Now replace the line of code:

Pattern pat = Pattern.compile("S.*r");

…in Regex5.java with the line:

Pattern pat = Pattern.compile("S.*?r");

…to get Regex6.java. What will be the output of Regex6.java? Since this is the last article of this series on regular expressions, I request you to try your best to find the answer before proceeding any further. Figure 5 shows the output of Regex5.java and Regex6.java. But what is the reason for the output shown by Regex6.java? Again, I request you to ponder over the problem for some time and find out the answer. If you don't get the answer, download the file Regex6.java from the link shown earlier; in that file, I have given the explanation as a comment.

Figure 5: Output of Regex5.java and Regex6.java

So, with that example, let us wind up our discussion about regular expressions in Java. Java is a very powerful programming language and the effective use of regular expressions will make it even more powerful. The basic stuff discussed here will definitely kick-start your journey towards the efficient use of regular expressions in Java. And now it is time to say farewell.

In this series we have discussed regular expression processing in six different programming languages. Four of these (Python, Perl, PHP and Java) use a regular expression style called PCRE (Perl Compatible Regular Expressions). The other two programming languages we discussed in this series, C++ and JavaScript, use a style known as the ECMAScript regular expression style. The articles in this series were never intended to describe the complexities of intricate regular expressions in detail. Instead, I tried to focus on the different flavours of regular expressions and how they can be used in various programming languages. Any decent textbook on regular expressions will give a language-agnostic discussion of regular expressions, but we were more worried about the actual execution of regular expressions in programming languages.

Before concluding this series, I would like to go over the important takeaways. First, always remember the fact that there are many different regular expression flavours. The differences between many of them are subtle, yet they can cause havoc if used indiscreetly. Second, the style of regular expression used in a programming language depends on the flavour of the regular expression implemented by the language's regular expression engine. Due to this reason, a single programming language may support multiple regular expression styles with the help of different regular expression engines and library functions. Third, the way different languages support regular expressions differs. In some languages the support for regular expressions is part of the language core. An example of such a language is Perl. In some other languages regular expressions are supported with the help of library functions. C++ is a programming language in which regular expressions are implemented using library functions. Due to this, all the versions and standards of some programming languages may not support the use of regular expressions. For example, in C++, the support for regular expressions starts with the C++11 standard. For the same reason, the different versions of a particular programming language itself might support different regular expression styles. You must be very careful about these important points while developing programs using regular expressions to avoid dangerous pitfalls.

So, finally, we are at the end of a long journey of learning regular expressions. But an even longer and far more exciting journey of practising and developing regular expressions lies ahead. Good luck!

By: Deepu Benson
The author is a free software enthusiast whose area of interest is theoretical computer science. He maintains a technical blog at www.computingforbeginners.blogspot.in and can be reached at deepumb@hotmail.com.


Explore Twitter Data Using R

As of August 2017, Twitter had 328 million active users, with 500 million tweets being sent every day. Let's look at how the open source R programming language can be used to analyse the tremendous amount of data created by this very popular social media tool.

Social networking websites are ideal sources of Big Data, which has many applications in the real world. These sites contain both structured and unstructured data, and are perfect platforms for data mining and subsequent knowledge discovery from the source. Twitter is a popular source of text data for data mining. Huge volumes of Twitter data contain many varieties of topics, which can be analysed to study the trends of different current subjects, like market economics or a wide variety of social issues. Accessing Twitter data is easy, as open APIs are available to transfer and arrange data in JSON and ATOM formats.

In this article, we will look at an R programming implementation for Twitter data analysis and visualisation. This will give readers an idea of how to use R to analyse Big Data. As a micro blogging network for the exchange and sharing of short public messages, Twitter provides a rich repository of different hyperlinks, multimedia and hashtags, depicting the contemporary social scenario in a geolocation. From the originating tweets and the responses to them, as well as the retweets by other users, it is possible to implement opinion mining over a subject of interest in a geopolitical location. By analysing the favourite counts and the information about the popularity of users in their followers' count, it is also possible to make a weighted statistical analysis of the data.

Start your exploration
Exploring Twitter data using R requires some preparation. First, you need to have a Twitter account. Using that account, register an application in your Twitter account from the https://apps.twitter.com/ site. The registration process requires basic personal information and produces four keys for R application and Twitter application connectivity. For example, an application myapptwitterR1 may be created as shown in Figure 1. In turn, this will create your application settings, as shown in Figure 2.

A consumer key, a consumer secret, an access token and the access token secret combination forms the final authentication using the setup_twitter_oauth() function:

>setup_twitter_oauth(consumerKey, consumerSecret, AccessToken, AccessTokenSecret)

It is also necessary to create an object to save the authentication for future use. This is done by OAuthFactory$new() as follows:

credential <- OAuthFactory$new(consumerKey, consumerSecret, requestURL, accessURL, authURL)

Figure 1: Twitter application settings
Figure 2: Histogram of created time tag

Here, requestURL, accessURL and authURL are available from the application settings on the https://apps.twitter.com/ site.

Connect to Twitter
This exercise requires R to have a few packages for calling all the Twitter related functions. Here is an R script to start the Twitter data analysis task. To access Twitter data through the just created application myapptwitterR, one needs to load the twitteR, ROAuth and httr packages.

>setwd('d:\\r\\twitter')

>install.packages("twitteR")
>install.packages("ROAuth")
>install.packages("httr")

>library("twitteR")
>library("ROAuth")
>library("httr")

To test this on the MS Windows platform, load Curl into the current workspace, as follows:

>download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

Before the final connectivity to the Twitter application, save all the necessary key values to suitable variables:

>consumerKey='HTgXiD3kqncGM93bxlBczTfhR'
>consumerSecret='djgP2zhAWKbGAgiEd4R6DXujipXRq1aTSdoD9yaHSA8q97G8Oe'
>requestURL='https://api.twitter.com/oauth/request_token'
>accessURL='https://api.twitter.com/oauth/access_token'
>authURL='https://api.twitter.com/oauth/authorize'

With these preparations, one can now create the required connectivity object:

>cred <- OAuthFactory$new(consumerKey, consumerSecret, requestURL, accessURL, authURL)
>cred$handshake(cainfo="cacert.pem")

Authentication to the Twitter application is done by the function setup_twitter_oauth() with the stored key values:

>setup_twitter_oauth(consumerKey, consumerSecret, AccessToken, AccessTokenSecret)

With all this done successfully, we are ready to access Twitter data. As an example of data analysis, let us consider the simple problem of opinion mining.

Data analysis
To demonstrate how data analysis is done, let's get some data from Twitter. The twitteR package provides the function searchTwitter() to retrieve tweets based on the keywords searched for. Twitter organises tweets using hashtags. With the help of a hashtag, you can expose your message to an audience interested in only some specific subject. If the hashtag is a popular keyword related to your business, it can act to increase your brand's awareness levels. The use of popular hashtags helps one to get noticed. Analysis of hashtag appearances in tweets or Instagram posts can reveal different trends of what people are thinking about the hashtag keyword. So this can be a good starting point for deciding your business strategy.

To demonstrate hashtag analysis using R, we have picked the number one hashtag keyword #love for this study. Other than this search keyword, the searchTwitter() function also requires the maximum number of tweets that the function call will return. For this discussion, let us consider the maximum number as 500. Depending upon

the speed of your Internet and the traffic on the Twitter server, you will get a response within a few minutes, as an R list class object.

>tweetList <- searchTwitter("#love", n=500)
>mode(tweetList)
[1] "list"
>length(tweetList)
[1] 500

In R, a list object is a compound data structure and can contain all types of R objects, including other lists. For further analysis, it is necessary to investigate its structure. Since it is an object of 500 list items, the structure of the first item is sufficient to understand the schema of the set of records.

>str(head(tweetList,1))
List of 1
$ :Reference class 'status' [package "twitteR"] with 20 fields
..$ text : chr "https://t.co/L8dGustBQX #SavOne #LLOVE #GotItWrong #JCole #Drake #Love #F4F #follow #follow4follow #Repost #followback"
..$ favorited : logi FALSE
..$ favoriteCount : num 0
..$ replyToSN : chr(0)
..$ created : POSIXct[1:1], format: "2017-10-04 06:11:03"
..$ truncated : logi FALSE
..$ replyToSID : chr(0)
..$ id : chr "915459228004892672"
..$ replyToUID : chr(0)
..$ statusSource : chr "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>"
..$ screenName : chr "Lezzardman"
..$ retweetCount : num 0
..$ isRetweet : logi FALSE
..$ retweeted : logi FALSE
..$ longitude : chr(0)
..$ latitude : chr(0)
..$ location : chr "Bay Area, CA, #CLGWORLDWIDE"
..$ language : chr "en"
..$ profileImageURL: chr "http://pbs.twimg.com/profile_images/444325116407603200/XmZ92DvB_normal.jpeg"
..$ urls :'data.frame': 1 obs. of 5 variables:
.. ..$ url : chr "https://t.co/L8dGustBQX"
.. ..$ expanded_url: chr "http://cdbaby.com/cd/savone"
.. ..$ display_url : chr "cdbaby.com/cd/savone"
.. ..$ start_index : num 0
.. ..$ stop_index : num 23
..and 59 methods, of which 45 are possibly relevant:
.. getCreated, getFavoriteCount, getFavorited, getId, getIsRetweet,
.. getLanguage, getLatitude, getLocation, getLongitude, getProfileImageURL,
.. getReplyToSID, getReplyToSN, getReplyToUID, getRetweetCount,
.. getRetweeted, getRetweeters, getRetweets, getScreenName, getStatusSource,
.. getText, getTruncated, getUrls, initialize, setCreated, setFavoriteCount,
.. setFavorited, setId, setIsRetweet, setLanguage, setLatitude, setLocation,
.. setLongitude, setProfileImageURL, setReplyToSID, setReplyToSN,
.. setReplyToUID, setRetweetCount, setRetweeted, setScreenName,
.. setStatusSource, setText, setTruncated, setUrls, toDataFrame,
.. toDataFrame#twitterObj
>

The structure shows that there are 20 fields in each list item, and the fields contain information and data related to the tweets.

Since the data frame is the most efficient structure for processing records, it is now necessary to convert each list item to a data frame and bind these row-by-row into a single frame. This can be done in an elegant way using the do.call() function call, as shown here:

loveDF <- do.call("rbind", lapply(tweetList, as.data.frame))

Function lapply() will first convert each list item to a data frame, then do.call() will bind these, one by one. Now we have a set of records with 19 fields (one less than the list!) in a regular format, ready for analysis. Here, we shall mainly consider the 'created' field to study the distribution pattern of the arrival of tweets.

>length(head(loveDF,1))
[1] 19
>str(head(loveDF,1))
'data.frame' : 1 obs. of 19 variables:
$ text : chr "https://t.co/L8dGustBQX #SavOne #LLOVE #GotItWrong #JCole #Drake #Love #F4F #follow #follow4follow #Repost #followback"
$ favorited : logi FALSE
$ favoriteCount : num 0
$ replyToSN : chr NA
$ created : POSIXct, format: "2017-10-04 06:11:03"
$ truncated : logi FALSE
$ replyToSID : chr NA
$ id : chr "915459228004892672"
$ replyToUID : chr NA
$ statusSource : chr "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>"
92 | JANUARY 2018 | OPEN SOURCE FOR YOU | www.OpenSourceForU.com



Figure 3: Histogram of ordered created time tag (frequency vs. created-time)
Figure 4: Cumulative frequency distribution (frequency vs. cumulative-time-interval)

$ screenName     : chr "Lezzardman"
$ retweetCount   : num 0
$ isRetweet      : logi FALSE
$ retweeted      : logi FALSE
$ longitude      : chr NA
$ latitude       : chr NA
$ location       : chr "Bay Area, CA, #CLGWORLDWIDE <ed><U+00A0><U+00BD><ed><U+00B2><U+00AF>"
$ language       : chr "en"
$ profileImageURL: chr "http://pbs.twimg.com/profile_images/444325116407603200/XmZ92DvB_normal.jpeg"
>

The fifth column field is 'created'; we shall try to explore the different statistical characteristics of this field.

>attach(loveDF) # attach the frame for further processing.
>head(loveDF['created'],2) # first 2 record set items for demo.
created
1 2017-10-04 06:11:03
2 2017-10-04 06:10:55

Twitter follows the Coordinated Universal Time tag as the time-stamp to record the tweet's time of creation. This helps to maintain a normalised time frame for all records, and it becomes easy to draw a frequency histogram of the 'created' time tag.

>hist(created,breaks=15,freq=TRUE,main="Histogram of created time tag")

If we want to study the pattern of how the word 'love' appears in the data set, we can take the differences of consecutive time elements of the vector 'created'. R function diff() can do this. It returns iterative lagged differences of the elements of an integer vector. In this case, we need the lag and differences arguments as one. To have a time series from the 'created' vector, it first needs to be converted to an integer; here, we have done it before creating the series, as follows:

>detach(loveDF)
>sortloveDF<-loveDF[order(as.integer(created)),]
>attach(sortloveDF)
>hist(as.integer(abs(diff(created))))

This distribution shows that the majority of tweets in this group come within the first few seconds, and a much smaller number of tweets arrive in subsequent time intervals. From the distribution, it's apparent that the arrival time distribution follows a Poisson Distribution pattern, and it is now possible to model the number of times an event occurs in a given time interval.
Let's check the cumulative distribution pattern, and the number of tweets arriving within a time interval. For this we have to write a short R function to get the cumulative values within each interval. Here is the demo script and the graph plot:

countarrival <- function(created)
{
i=1
s <- seq(1,15,1)
for(t in seq(1,15,1))




{
s[i] <- sum((as.integer(abs(diff(created))))<t)/500
i=i+1
}
return(s)
}

To create a cumulative value of the arriving tweets within a given interval, countarrival() uses the sum() function over the diff() function after converting the values into integers.

>s <- countarrival(created)
>x<-seq(1,15,1)
>y<-s
>lo<- loess(y~x)
>plot(x,y)
>lines(predict(lo), col='red', lwd=2)
# sum((as.integer(abs(diff(created))))<t)/500

To have a smooth time series curve, the loess() function has been used with the predict() function. Predicted values based on the local regression model, as provided by loess(), are plotted along with the x-y frequency values.
This is a classic example of a probability distribution of arrival probabilities. The pattern in Figure 5 shows a cumulative Poisson Distribution, and can be used to model the number of events occurring within a given time interval. The X-axis contains one-second time intervals. Since this is a cumulative probability plot, the likelihood of the next tweet arriving corresponds to the X-axis value or less than that. For instance, since 4 on the X-axis approximately corresponds to 60 per cent on the Y-axis, the next tweet will arrive in 4 seconds or less. In conclusion, we can say that all the events are mutually independent and occur at a known and constant rate per unit time interval.
This data analysis and visualisation shows that the arrival pattern is random and follows the Poisson Distribution. The reader may test the arrival pattern with a different keyword too.

By: Dipankar Ray
The author is a member of IEEE and IET, with more than 20 years of experience in open source versions of UNIX operating systems and Sun Solaris. He is presently working on data analysis and machine learning using a neural network and different statistical tools. He has also jointly authored a textbook called 'MATLAB for Engineering and Science'. He can be reached at dipankarray@ieee.org.
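A follow-up exercise (not part of the article): if the arrival process really is Poisson, the gaps between consecutive tweets should be exponentially distributed, so the empirical cumulative curve computed by countarrival() can be compared against R's built-in pexp(). The snippet below is a sketch under that assumption, reusing the 'created' vector from the listings above.

```r
# Empirical vs. theoretical cumulative distribution of inter-arrival gaps
gaps <- as.integer(abs(diff(created)))            # gaps between tweets, in seconds
rate <- 1 / mean(gaps)                            # maximum-likelihood exponential rate
emp  <- sapply(1:15, function(t) mean(gaps < t))  # same quantity countarrival() builds
theo <- pexp(1:15, rate)                          # theoretical cumulative probabilities
plot(1:15, emp)                                   # points: observed
lines(1:15, theo, col = "blue")                   # curve: exponential model
```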




Using Jenkins to Create a Pipeline for Android Applications
This article is a tutorial on how to create a pipeline that performs code analysis
using Lint and creates an APK file for Android applications.

Continuous integration is a practice that requires developers to integrate code into a shared repository such as GitHub, GitLab, SVN, etc, at regular intervals. This concept was meant to avoid the hassle of finding problems later in the build life cycle. Continuous integration requires developers to have frequent builds. The common practice is that whenever a code commit occurs, a build should be triggered. However, sometimes the build process is also scheduled in a way that too many builds are avoided. Jenkins is one of the most popular continuous integration tools.
Jenkins was earlier known as a continuous integration server. However, the Jenkins 2.0 announcement made it clear that, going forward, the focus would not only be on continuous integration but on continuous delivery too. Hence, 'automation server' is the term used more often after Jenkins 2.0 was released. Jenkins was initially developed by Kohsuke Kawaguchi in 2004, and is an automation server that helps to speed up different DevOps practices such as continuous integration, continuous testing, continuous delivery, continuous deployment, continuous notifications, and orchestration using a build pipeline or Pipeline as a Code.
Jenkins helps to manage different application lifecycle management activities. Users can map continuous integration with build, unit test execution and static code analysis; continuous testing with functional testing, load testing and security testing; and continuous delivery and deployment with automated deployment into different environments, and so on. Jenkins provides easier ways to configure DevOps practices.
The Jenkins package has two release lines:
• LTS (long term support): Releases are selected every 12 weeks from the stream of regular releases, ensuring a stable release.




• Weekly: A new release is available every week to fix bugs and provide features to the community.
LTS and weekly releases are available in different flavours such as .war files (Jenkins is written in Java), native packages for the operating systems, installers and Docker containers.
The current LTS version is Jenkins 2.73.3. This version comes with a very useful option, called Deploy to Azure. Yes, we can deploy Jenkins to the Microsoft Azure public cloud within minutes. Of course, you need a Microsoft Azure subscription to utilise this option. Jenkins can be installed and used on Docker, FreeBSD, Gentoo, Mac OS X, OpenBSD, openSUSE, Red Hat/Fedora/CentOS, Ubuntu/Debian and Windows.
The features of Jenkins are:
• Support for SCM tools such as Git, Subversion, StarTeam, CVS, AccuRev, etc.
• Extensible architecture using plugins: The plugins available are for Android development, iOS development, .NET development, Ruby development, library plugins, source code management, build tools, build triggers, build notifiers, build reports, UI plugins, authentication and user management, etc.
• It has the 'Pipeline as a Code' feature, which uses a domain-specific language (DSL) to create a pipeline to manage the application's lifecycle.
• The master-agent architecture supports distributed builds.
To install Jenkins, the minimum hardware requirements are 256MB of RAM and 1GB of drive space. The recommended hardware configuration for a small team is 1GB+ of RAM and 50GB+ of drive space. You need to have Java 8 - either the Java Runtime Environment (JRE) or a Java Development Kit (JDK).
The easiest way to run Jenkins is to download and run its latest stable WAR file version. Download the jenkins.war file, go to that directory and execute the following command:

java -jar jenkins.war

Next, go to http://<localhost|IP address>:8080 and wait until the 'Unlock Jenkins' page appears. Follow the wizard instructions and install the plugins after providing proxy details (if you are configuring Jenkins behind a proxy).

Configuration
• To install plugins, go to Jenkins Dashboard > Manage Jenkins > Manage Plugins. Verify the updates as well as the available and the installed tabs. For the HTTP proxy configuration, go to the Advanced tab.
• To manually upload plugins, go to Jenkins Dashboard > Manage Jenkins > Manage Plugins > Advanced > Upload Plugin.
• To configure security, go to Jenkins Dashboard > Manage Jenkins > Configure Global Security. You can configure authentication using Active Directory, Jenkins' own user database, and LDAP. You can also configure authorisation using Matrix-based security or the project-based Matrix authorisation strategy.
• To configure environment variables (such as ANDROID_HOME), tool locations, SonarQube servers, Jenkins location, Quality Gates - Sonarqube, e-mail notification, and so on, go to Jenkins Dashboard > Manage Jenkins > Configure System.
• To configure Git, JDK, Gradle, and so on, go to Jenkins Dashboard > Manage Jenkins > Global Tool Configuration.

Figure 1: Global tool configuration
Figure 2: Gradle installation

Creating a pipeline for Android applications
We have the following prerequisites:
• A sample Android application on GitHub, GitLab, SVN or a file system.
• Download the Gradle installation package or configure it to install automatically from the Jenkins Dashboard.
• Download the Android SDK.
• Install plugins in Jenkins such as the Gradle plugin, the Android Lint plugin, the Build Pipeline plugin, etc.
Now, let's look at how to create a pipeline using the Build Pipeline plugin so we can achieve the following tasks:
• Perform code analysis for Android application code using Android Lint.
• Create an APK file.




• Create a pipeline so that code analysis is performed first and, on its successful completion, another build job is executed to create the APK file.
Now, let's perform each step in sequence.
Configuring Git, Java and Gradle: To execute the build pipeline, it is necessary to take code from a shared repository. As shown below, Git is configured to go further. The same configuration is applied to whichever version control system is to be set up. The path in Jenkins is Home > Global tool configuration > Version control / Git.

In an Android project, the main part is Gradle, which is used to build our source code and download all the necessary dependencies required for the project. In the name field, users can enter Gradle with its version for better readability. The next field is Gradle Home, which is the same as the environment variable in our system. Copy your path to Gradle and paste it here. There is one more option, 'Install automatically', which installs Gradle's latest version if the user does not have it.
Configuring the ANDROID_HOME environment variable: The next step is to configure the SDK for the Android project, which contains all the platform tools and other tools as well. Here, the user has to provide the path to the SDK file, which is present in the system. The path in Jenkins is Home > Configuration > SDK.

Figure 3: ANDROID_HOME environment variable

Creating a Freestyle project to perform Lint analysis for the Android application: The basic setup is ready, so let's start our project. The first step is to enter a proper name (AndroidApp-CA) for the project. Then select 'Freestyle project' under Category, and click on OK. Your project file structure is ready to use.
The user can customise all the configuration steps to show a neat and clean function. As shown in Figure 4, in the general configuration, the 'Discard old build' option discards all your old builds and keeps only as many builds as the user wants. The path in Jenkins is Home > #your_project# > General Setting.

Figure 4: Source code management

In the last step, we configure Git as version control to pull the latest code for the Build Pipeline. Select the Git option and provide the repository's URL and its credentials. Users can also mention from which branch they want to take the code; as shown in Figure 5, the 'Master' branch is applied here. Then click the Apply and Save buttons to save all your configuration steps.

Figure 5: Lint configuration
Figure 6: Publish Android Lint results

The next step is to add Gradle to the build, as well as add Lint to do static code analysis. Lint is a tool that performs code analysis for Android applications, just as Sonarqube does for Java applications. To add the Lint task to the configuration, the user has to write Lint options in the




build.gradle file in the Android project.
The Android Lint plugin offers a feature to examine the XML output produced by the Android Lint tool and provides the results on the build page for analysis. It does not run Lint, but the Lint results in XML format must be generated and available in the workspace.
Creating a Freestyle project to build the APK file for the Android application: After completing the analysis of the code, the next step is to build the Android project and create an APK file to execute further.
Creating a Freestyle project with the name AndroidApp-APK: In the build actions, select Invoke Gradle script. Archive the build artefacts such as JAR, WAR, APK or IPA files so that they can be downloaded later. Click on Save.

Figure 7: Gradle build
Figure 8: Archive Artifact

After executing all the jobs, the pipeline can be pictorially represented by using the Build Pipeline plugin. After installing that plugin, users have to give the start, middle and end points to show the build jobs in that sequence. They can configure upstream and downstream jobs to build the pipeline.
To show all the build jobs, click on the '+' sign at the top right-hand side of the screen. Select build pipeline view on the screen that comes up after clicking on this sign. How a build pipeline view is configured can be decided by the user, as per requirements. Select AndroidApp-CA as the initial job. There are multiple options like the Trigger Option, Display Option, Pipeline Flow, etc.

Figure 9: Downstream job
Figure 10: Build Pipeline flow

As configured earlier, the pipeline starts by clicking on the Run button and is refreshed periodically. Upstream and downstream, the job execution will take place as per the configuration.
After completing all the processes, you can see the visualisation shown in Figure 11. Green colour indicates the successful execution of a pipeline, whereas red indicates an unsuccessful build.

Figure 11: Successful execution of the Build Pipeline

By: Bhagyashri Jain
The author is a systems engineer and loves Android development. She likes to read and share daily news on her blog at http://bjlittlethings.wordpress.com.
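An addendum beyond the article: the 'Pipeline as a Code' feature mentioned earlier can express the same two jobs as a declarative Jenkinsfile. The sketch below is hypothetical; the stage names and Gradle tasks (lint, assembleDebug) are illustrative assumptions, not taken from the job configuration above.

```groovy
// Hypothetical declarative Jenkinsfile for the lint + APK pipeline
pipeline {
    agent any
    stages {
        stage('Code Analysis') {
            steps {
                sh './gradlew lint'   // generates the Lint XML report
            }
        }
        stage('Build APK') {
            steps {
                sh './gradlew assembleDebug'
                archiveArtifacts artifacts: '**/*.apk'   // keep the APK for download
            }
        }
    }
}
```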




Demystifying Blockchains
A blockchain is a continuously growing list of records, called blocks, which are
linked and secured using cryptography to ensure data security.

Data security is of paramount importance to corporations. Enterprises need to establish high levels of trust and offer guarantees on the security of the data being shared with them while interacting with other enterprises. The major concern of any enterprise about data security is data integrity. What many in the enterprise domain worry about is, "Is my data accurate?"
Data integrity ensures that the data is accurate, untampered with and consistent across the life cycle of any transaction. Enterprises share data like invoices, orders, etc. The integrity of this data is the pillar on which their businesses are built.

Blockchain
A blockchain is a distributed public ledger of transactions that no person or company owns or controls. Instead, every user can access the entire blockchain, and every transaction from any account to any other account, as it is recorded in a secure and verifiable form using algorithms of cryptography. In short, a blockchain ensures data integrity.
A blockchain provides data integrity due to its unique and significant features. Some of these are listed below.
Timeless validation for a transaction: Each transaction in a blockchain has a signature digest attached to it which depends on all the previous transactions, without an expiration date. Due to this, each transaction can be validated at any point in time by anyone, without the risk of the data being altered or tampered with.
Highly scalable and portable: A blockchain is a decentralised ledger distributed across the globe, and it ensures very high availability and resilience against disaster.
Tamper-proof: A blockchain uses asymmetric or elliptic curve cryptography under the hood. Besides, each transaction gets added to the blockchain only after validation, and each transaction also depends on the previous transaction. Hence, a blockchain is nearly impossible to tamper with without anyone noticing.

Demystifying blockchains
A blockchain, in itself, is a distributed ledger and an interconnected chain of individual blocks of data, where each block can be a transaction, or a group of transactions.
In order to explain the concepts of the blockchain, let's look at a code example in JavaScript. The link to the GitHub repository can be found at https://github.com/abhiit89/AngCoins. Do check the GitHub repo and go through the 'README', as it contains the instructions on how to run the code locally.
Block: A block in a blockchain is a combination of the transaction data along with the hash of the previous block. For example:

class Block {
  constructor(blockId, dateTimeStamp, transactionData, previousTransactionHash) {
    this.blockId = blockId;
    this.dateTimeStamp = dateTimeStamp;
    this.transactionData = transactionData;
    this.previousTransactionHash = previousTransactionHash;
    this.currentTransactionHash = this.calculateBlockDigest();
  }

The definition of the block, inside a blockchain, is presented in the above example. It consists of the data (which includes blockId, dateTimeStamp, transactionData, previousTransactionHash, nonce), the hash of the data (currentTransactionHash) and the hash of the previous transaction data.
Genesis block: A genesis block is the first block to be created at the beginning of the blockchain. For example:

new Block(0, new Date().getTime().valueOf(), 'First Block', '0');

Adding a block to the blockchain
In order to add blocks or transactions to the blockchain, we have to create a new block with a set of transactions, and add it to the blockchain as explained in the code example below:

addNewTransactionBlockToTransactionChain(currentBlock) {
  currentBlock.previousTransactionHash = this.




returnLatestBlock().currentTransactionHash;
  currentBlock.currentTransactionHash = currentBlock.calculateBlockDigest();
  this.transactionChain.push(currentBlock);
}

In the above code example, we calculate the hash of the previous transaction and the hash of the current transaction before pushing the new block to the blockchain. We also validate the new block before adding it to the blockchain, using the method described below.

Validating the blockchain
Each block needs to be validated before it gets added to the blockchain. The validation we used in our implementation is described below:

isBlockChainValid() {
  for (let blockCount = 1; blockCount < this.transactionChain.length; blockCount++) {
    const currentBlockInBlockChain = this.transactionChain[blockCount];
    const previousBlockInBlockChain = this.transactionChain[blockCount - 1];
    if (currentBlockInBlockChain.currentTransactionHash !== currentBlockInBlockChain.calculateBlockDigest()) {
      return false;
    }
    if (currentBlockInBlockChain.previousTransactionHash !== previousBlockInBlockChain.currentTransactionHash) {
      return false;
    }
  }
  return true;
}

In this implementation, there are a lot of features missing as of now, like validation of the funds, the rollback feature in case the newly added block corrupts the blockchain, etc. If anyone is interested in tackling fund validation, the rollback or any other issue they find, please go to my GitHub repository, create an issue and the fix for it, and send me a pull request, or just fork the repository and use it in whichever way this code suits your requirements.
A point to be noted here is that in this implementation, there are numerous ways to tamper with the blockchain. One way is to tamper with the data alone. The implementation for that is done in the branch https://github.com/abhiit89/AngCoins/tree/tampering_data.
Another way is to not only change the data but also update the hash. Even then, the current implementation can invalidate it. The code for it is available in the branch https://github.com/abhiit89/AngCoins/tree/tampering_data_with_updated_hash.

Proof of work
With the current implementation, it is still possible that someone can spam the blockchain by changing the data in one block and updating the hash in all the following blocks in the blockchain. In order to prevent that, the concept of 'proof of work' suggests a difficulty or condition that each block that is generated has to meet before getting added to the blockchain. This difficulty prevents very frequent generation of blocks, as the hashing algorithm used to generate the block is not under the control of the person creating the block. In this way, it becomes a game of hit and miss to try to generate a block that meets the required conditions.
For our implementation, we have set the difficulty such that each block generated must have two zeros ('00') at the beginning of its hash in order to be added to the blockchain. For example, we can modify the function that adds a new block to include the difficulty, as given below:

addNewTransactionBlockToTransactionChain(currentBlock) {
  currentBlock.previousTransactionHash = this.returnLatestBlock().currentTransactionHash;
  currentBlock.mineNewBlock(this.difficulty);
  this.transactionChain.push(currentBlock);
}

This calls the mining function (which validates the difficulty conditions):

mineNewBlock(difficulty) {
  while(this.currentTransactionHash.substring(0, difficulty) !== Array(difficulty + 1).join('0')) {
    this.nonce++;
    this.currentTransactionHash = this.calculateBlockDigest();
  }
  console.log('New Block Mined --> ' + this.currentTransactionHash);
}

The complete code for this implementation can be seen in the branch https://github.com/abhiit89/AngCoins/tree/block_chain_mining.

Blockchain providers
Blockchain technology, with its unprecedented way of managing trust and data and of executing procedures, can transform businesses. Here are some open source blockchain platforms.
Continued on page 103...
100 | JANUARY 2018 | OPEN SOURCE FOR YOU | www.OpenSourceForU.com

Get Familiar with the Basics of R


This article tells readers how to get their systems ready for R—how to
install it and how to use a few basic commands.

R is an open source programming language and environment for data analysis and visualisation, and is widely used by statisticians and analysts. It is a GNU package written mostly in C, Fortran and R itself.

Installing R
Installing R is very easy. Navigate the browser to www.r-project.org and click on CRAN in the Download section (Figure 1).
This will open the CRAN mirrors. Select the appropriate mirror and it will take you to the Download section, as shown in Figure 2.

Figure 1: R Project website
Figure 2: R Project download page

Grab the version which is appropriate for your system and install R. After the installation, you can see the R icon on the menu/desktop, as seen in Figure 3.
You can start using R by double-clicking on the icon, but there is a better way available. You can install R Studio, an IDE (integrated development environment) that makes things very easy. It's a free and open source integrated environment for R.
Download R Studio from https://www.rstudio.com/products/rstudio/. Use the open source edition, which is free to use. Once installed, open R Studio by double-clicking on its icon, which will look like what's shown in Figure 4.
The default screen of R Studio is divided into three sections, as shown in Figure 5. The section marked '1' is the main console window where we will execute the R commands. Section 2 shows the environment and history. The former will show all the available variables for the console and their values, while 'history' stores all the commands' history. Section 3 shows the file explorer, help viewer and the tab for visualisation.
Clicking on the Packages tab in Section 3 will list all the packages available in R Studio, as shown in Figure 6.
Using R is very straightforward. On the console area, type '2 + 2' and you will get '4' as the output. Refer to Figure 7.
The R console supports all the basic math operations, so one can think of it as a calculator. You can try more calculations on the console.
Creating a variable is very straightforward too. To assign '2' to variable 'x', use any of the following ways:

> x <- 2
OR
> x = 2
OR
> assign("x",2)
OR
> x <- y <- 2

One can see that there is no concept of data type declaration. The data type is assumed according to the value assigned to the variable.
As we assign the value, we can also see the Environment panel display the variable and its value, as shown in Figure 8. The rm command is used to remove a variable.
R supports basic data types; to find the type of data in a variable, use the class function, as shown below:

> x <- 2
> class(x)
[1] "numeric"

The four major data types in R are numeric, character,




date and logical. The following code shows how to use the various data types:

> x<-"data"
> class(x)
[1] "character"
> nchar(x)
[1] 4
> d<-as.Date("2017-12-01")
> d
[1] "2017-12-01"
> class(d)
[1] "Date"
> b<-TRUE
> class(b)
[1] "logical"

Figure 3: R icon after installation
Figure 4: R Studio icon
Figure 5: R Studio default screen
Figure 6: Packages in R Studio
Figure 7: Using the console in R Studio

Apart from basic data types, R supports data structures or objects like vectors, lists, arrays, matrices and data frames. These are the key objects or data structures in R.
A vector stores data of the same type. It can be thought of as a standard array in most programming languages. The c function is used to create a vector ('c' stands for 'combine').
The following code snippet shows the creation of a vector:

> v <- c(10,20,30,40)
> v
[1] 10 20 30 40

The most interesting thing about a vector is that any operation applied on it will be applied to its individual elements. For example, 'v + 10' will increase the value of each element of the vector by 10.

> v + 10
[1] 20 30 40 50

This concept is difficult to digest for some, but it's a very powerful concept in R. A vector has no dimensions; it is simply a vector, and is not to be confused with vectors in mathematics, which have dimensions. A vector can also be created by using the ':' sign with start and end values; for example, to create a vector with values 1 to 10, use 1:10.

> a <- 1:10
> a
[1] 1 2 3 4 5 6 7 8 9 10

It is also possible to do some basic operations on vectors, but do remember that any operation applied on a vector is applied on its individual elements. For example, if the addition operation is applied on two vectors, the individual elements of the vectors will be added:

> a<-1:5
> b<-21:25
> a+b
[1] 22 24 26 28 30




> a-b
[1] -20 -20 -20 -20 -20
> a*b
[1] 21 44 69 96 125

A list is like a vector, but can store arbitrary or any type of data. To create a list, the 'list' function is used, as follows:

> l <- list(1,2,3,"ABC")
> l
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] "ABC"

A list can be used to hold different types of objects. It can be used to store a vector, a list, a data frame or anything else.
An array is nothing but a multi-dimensional vector that can store data in rows and columns. The array function is used to create an array.

> arr <- array(21:24, dim=c(2,2))
> arr
     [,1] [,2]
[1,]   21   23
[2,]   22   24

Figure 8: R Studio Environment and console

A data frame and a matrix are used to hold tabular data. They can be thought of as an Excel sheet with rows and columns. The only difference between a data frame and a matrix is that in the latter, every element should be of the same type. The following code shows how to create a data frame:

> x<-1:5
> y<-c("ABC", "DEF", "GHI", "JKL", "MNO")
> z<-c(25,65,33,77,11)
> d <- data.frame(SrNo=x, Name=y, Percentage=z)
> d
  SrNo Name Percentage
1    1  ABC         25
2    2  DEF         65
3    3  GHI         33
4    4  JKL         77
5    5  MNO         11

So a data frame is nothing but vectors combined in column format.
This article gives a basic idea of how data is handled by R. I leave the rest for you to explore.

By: Ashish Singh Bhatia
The author is a technology enthusiast and a FOSS fan. He loves to explore new technology and to work on Python, Java and Android. He can be reached at ast.bhatia@gmail.com. He blogs at https://openfreeidea.wordpress.com/ and http://etutorialsworld.com/.
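A small addendum beyond the article: the columns and rows of the data frame d built above can be picked out with $ and bracket indexing, which is the usual next step when working with tabular data in R.

```r
# Accessing parts of the data frame 'd' created above (output omitted)
d$Name                   # one column, as a vector
d[2, ]                   # the second row
d[d$Percentage > 50, ]   # rows matching a condition
```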

Continued from page 100...


HyperLedger: Hyperledger nurtures and endorses a wide array of businesses around blockchain technologies, including distributed ledgers, smart contracts, etc. Hyperledger encourages the re-use of common building blocks and enables the speedy invention of distributed ledger technology components.
Project link: https://hyperledger.org/projects
Project GitHub link: https://github.com/hyperledger
Openchain: Openchain is an open source distributed ledger technology. It is ideal for enterprises, and deals in issuing and managing digital assets in a robust, secure and scalable way.
Project link: https://www.openchain.org/
Ethereum project: This is a distributed framework that runs smart contracts: applications that run exactly as programmed in a secured virtual environment, without downtime or the possibility of tampering, as this platform leverages a custom-built blockchain.
Project link: https://www.ethereum.org/
There are some more blockchain projects, links to which can be found in the References section.

Reference
[1] https://lightrains.com/blogs/opensource-blockchain-platforms

By: Abhinav Nath Gupta
The author is a software development engineer at Cleo Software India Pvt Ltd, Bengaluru. He is interested in cryptography, data security, cryptocurrency and cloud computing. He can be reached at abhi.aec89@gmail.com.
as programmed in a secured virtual environment without

www.OpenSourceForU.com | OPEN SOURCE FOR YOU | JANUARY 2018 | 103


TIPS & TRICKS

Clearing the terminal screen
Enter 'clear' without quotes in the terminal and hit the Enter button. This causes the screen to be cleared, making it look like a new terminal.

—Abhinay B, ohyesabhi2393@gmail.com

Convert image formats from the command line in Ubuntu
'convert' is a command line tool that works very well in many Linux based OSs. The 'convert' program is a part of the ImageMagick suite of tools and is available for all major Linux based operating systems. If it is not on your computer, you can install it using your package manager. It can convert between image formats, and can also resize, blur, crop, dither, draw on, flip, join and resample images from your command line.
The syntax is as follows:

convert [input options] input file [output options] output file

For example, we can convert a PNG image to GIF by giving the following command:

convert image.png image.gif

To convert a JPG image to BMP, you can give the following command:

convert image.jpg image.bmp

The tool can also be used to resize an image, for which the syntax is shown below:

convert [nameofimage.jpg] -resize [dimensions] [newnameofimage.jpg]

For example, to resize an image to 800 x 600, the command would be as follows:

convert image1.jpg -resize 800x600 newimage1.jpg

—Anirudh K, anirudh.3194@gmail.com

Passwordless SSH to remote machine
It can be really annoying (mostly in the enterprise environment) when you have to enter a password each time you SSH to a remote machine. So, our aim here is to do a passwordless SSH from one machine (let's call it host A/User a) to another (host B/User b).
Now, on host A, if a pair of authentication keys has not been generated for User a, generate one with the following command (do not enter a passphrase):

a@A:~> ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/a/.ssh/id_rsa):
Created directory '/home/a/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/a/.ssh/id_rsa.
Your public key has been saved in /home/a/.ssh/id_rsa.pub.
The key fingerprint is:
3e:4f:05:79:3a:9f:96:7c:3b:ad:e9:58:37:bc:37:e4 a@A

This will generate a public key in /home/a/.ssh/id_rsa.pub.
On host B as User b, create the ~/.ssh directory (if not already present) as follows:

a@A:~> ssh b@B mkdir -p .ssh
b@B's password:

Finally, append User a's new public key to b@B:.ssh/authorized_keys and enter User b's password one last time:

a@A:~> cat .ssh/id_rsa.pub | ssh b@B 'cat >> .ssh/authorized_keys'
b@B's password:
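If you prefer to script the key-generation step instead of answering prompts, ssh-keygen can be run non-interactively. The flags below are standard OpenSSH options, but the output path is just an example for this sketch:

```shell
# Generate an RSA key pair with an empty passphrase, without any prompts.
# -q: quiet, -t: key type, -N '': empty passphrase, -f: output file path
keyfile=$(mktemp -u)            # example path only; ssh-keygen creates the files
ssh-keygen -q -t rsa -N '' -f "$keyfile"
cat "${keyfile}.pub"            # this is the line that gets appended to authorized_keys
```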



From now on, you can log into host B as User b from host A as User a without a password:

a@A:~> ssh b@B

—Ashay Shirwadkar, ashayshirwadkar12@gmail.com

Performance analysis of code
In order to check the performance of the code you have written, you can use a simple tool called 'perf'. Just run the following command:

$sudo apt-get install linux-tools-common linux-tools-generic

The above command will install the 'perf' tool on Ubuntu or a similar operating system.

$perf list

The above command lists all the events that 'perf' can report. For example, to analyse the performance of a C program and find the number of cache misses, the command is as follows:

$perf stat -e cache-misses ./a.out

If you want to measure more than one event at a time, give the following command:

$perf stat -e cache-misses,cache-references ./a.out

—Gunasekar Duraisamy, dg.gunasekar@gmail.com

Create a QR code from the command line
QR code (abbreviated from Quick Response Code) is a type of matrix bar code (or two-dimensional bar code) first designed for the automotive industry. There are many online websites that help you create a QR code of your choice. Here is a method that generates QR codes for a string or URL from the Linux command line:

$echo "Tips and Tricks" | curl -F-=\<- qrenco.de

To generate the QR code for a domain, use the following command:

$echo "http://opensourceforu.com" | curl -F-=\<- qrenco.de

Note: You need a working Internet connection on your computer.

—Remin Raphael, remin13@gmail.com

Replace all occurrences of a string with a new line
Often, we might need to replace all occurrences of a string with a new line in a file. We can use the 'sed' command for this:

$sed 's/@@/\n/g' file1.txt > file2.txt

The above command replaces the string '@@' in 'file1.txt' with a newline character and writes the modified lines to 'file2.txt'. sed is a very powerful tool; you can read its manual for more details.

—Nagaraju Dhulipalla, nagarajunice@gmail.com

Git: Know about modified files in changeset
Running the plain old 'git log' spews out a whole lot of details about each commit. How about extracting just the names of the files (with their paths relative to the root of the Git repository)? Here is a handy command for that:

git log -m -1 --name-only --pretty="format:" HEAD

Changing HEAD to a different SHA1 commit ID will fetch the names of the files in that commit only. This can come in handy while tooling the CI environment.

Note: This will return empty on merge commits.
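The behaviour is easy to verify in a throwaway repository; the directory, file name and commit message below are made up purely for this demo:

```shell
# Create a throwaway repo with a single commit that touches one file.
demo=$(mktemp -d) && cd "$demo"
git init -q
echo "hello" > notes.txt
git add notes.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add notes"

# Print only the names of the files touched by the given commit; HEAD is
# used here, but any SHA1 from 'git log --oneline' works the same way.
git log -m -1 --name-only --pretty="format:" HEAD
```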

—Ramanathan M, rus.cahimb@gmail.com

Figure 1: Generated QR code

Share Your Open Source Recipes!
The joy of using open source software is in finding ways to get around problems—take them head on, defeat them! We invite you to share your tips and tricks with us for publication in OSFY so that they can reach a wider audience. Your tips could be related to administration, programming, troubleshooting or general tweaking. Submit them at www.opensourceforu.com. The sender of each published tip will get a T-shirt.



OSFY DVD

DVD OF THE MONTH

The latest, stable Linux for your desktop.

Ubuntu Desktop 17.10 (Live)
Ubuntu comes with everything you need to run your organisation, school, home or enterprise. All the essential applications, like an office suite, browsers, email and media apps, come pre-installed, and thousands more games and applications are available in the Ubuntu Software Centre. Ubuntu 17.10, codenamed Artful Aardvark, is the first release to include the new shell, so it's a great way to preview the future of Ubuntu. You can try it live from the bundled DVD.

Fedora Workstation 27
Fedora Workstation is a polished, easy-to-use operating system for laptop and desktop computers, with a complete set of tools for developers and makers of all kinds. It comes with a sleek user interface and the complete open source toolbox. Previous releases of Fedora have included Yumex-DNF as a graphical user interface for package management. Yumex-DNF is no longer under active development; it has been replaced in Fedora 27 by dnfdragora, a new DNF front-end that is written in Python 3 and uses libYui, the widget abstraction library written by SUSE, so that it can run using Qt 5, GTK+ 3 or ncurses interfaces. The ISO image can be found in the other_isos folder on the root of the DVD.

MX Linux 17
MX Linux is a cooperative venture between the antiX and former MEPIS communities, which uses the best tools and talent from each distro. It is a mid-weight OS designed to combine an elegant and efficient desktop with simple configuration, high stability, solid performance and a medium-sized footprint. MX Linux is a great no-fuss system for all types of users and applications. The ISO image can be found in the other_isos folder on the root of the DVD.

Recommended Systems Requirements: P4, 1GB RAM, DVD-ROM Drive
In case this DVD does not work properly, write to us at support@efy.in for a free replacement.
Note: Any objectionable material, if found on this disc, is unintended and should be attributed to the complex nature of Internet data.
CD Team e-mail: cdteam@efy.in
January 2018

What is a live DVD?


A live CD/DVD or live disk contains a bootable operating
system, the core program of any computer, which is
designed to run all your programs and manage all your
hardware and software.
Live CDs/DVDs have the ability to run a complete,
modern OS on a computer even without secondary
storage, such as a hard disk drive. The CD/DVD directly
runs the OS and other applications from the DVD drive
itself. Thus, a live disk allows you to try the OS before
you install it, without erasing or installing anything on
your current system. Such disks are used to demonstrate
features or try out a release. They are also used for
testing hardware functionality, before actual installation.
To run a live DVD, you need to boot your computer
using the disk in the ROM drive. To know how to set
a boot device in BIOS, please refer to the hardware
documentation for your computer/laptop.



