DATA SCIENCE

Data science is a field closely associated with Big Data that seeks to extract meaningful
information from large amounts of complex data. It combines methods from statistics
and computation in order to interpret data for the purpose of decision making. Data
science is the study of where information comes from, what it represents and how it
can be turned into a valuable resource in the creation of business and IT strategies.
Mining large amounts of structured and unstructured data to identify patterns can help
an organization rein in costs, increase efficiency, recognize new market
opportunities and strengthen its competitive advantage. It is a
multidisciplinary field of study whose goal is to address the challenges of big data.

TELECOMMUNICATIONS SECTOR
The Telecommunications Sector comprises companies that make communication
possible on a global scale, whether it is through the phone or the Internet. These
companies created the infrastructure that allows data to be sent anywhere in the world.
The largest companies in the sector are wireless operators, satellite companies, cable
companies and Internet service providers.

Evolution

The telecommunications sector evolved from the telegraph, where communication
took days, to modern mobile technology, where large amounts of data can be sent in
seconds. These shifts were driven by technology, and they changed how people live and do
business. At one time, telecommunications required physical wires connecting homes
and businesses. In modern society, this is changing, with mobile and wireless
technology becoming the primary forms of communication.

The sector's structure has also changed from a few large players to a more
decentralized system with decreased regulation and lower barriers to entry. Besides the
service providers, smaller companies in the telecommunications sector sell and service
the equipment, such as routers, switches and infrastructure, that enables this
communication. For growth investors, these companies may offer the best opportunities
for share price appreciation. In contrast, larger companies tend to be havens for
conservative, income-focused investors.

However, the main driver of growth is wireless Internet. This part of the
industry is expected to be the keystone for the continued global expansion of the
telecommunications sector. Furthermore, the industry has experienced a shift in focus
from voice calls to video and data. There is increasing demand for faster data
connectivity, higher resolution, quicker video streaming and richer multimedia
applications.
Segments Within the Telecommunications Sector
The major segments within the telecommunications sector are wireless
communications, communications equipment, processing systems and products, long-
distance carriers, domestic telecom services, foreign telecom services and diversified
communication services. The fastest growth area within the sector is wireless
communications, as more and more communications and computing shift to mobile
devices. Looking forward, the sector's biggest challenge is to keep up with people's
demand for faster connections as they consume and create content, which requires
significant capital expenditures. Companies that can meet these needs thrive.

Outlook for the Telecommunications Sector

Analysts expect product innovation and an increase in mergers and acquisitions to
support the continued growth and success of the telecommunications industry. There
are many opportunities for investors, and increased investment should benefit the
sector further.

The long-term historical growth rate of the telecommunications sector averages a
fairly stable rate of approximately 3% per year. The stability of the sector's growth,
even during recessions, means that it is considered a solid defensive investment,
while maintaining its appeal to growth investors. Even during uncertain and volatile
economic times, the steady demand for voice and data services, along with extensive
subscription plans, provides a stable source of revenue for major telecom firms such as
Verizon.

Telecommunications has become an increasingly important basic industry, which
bodes well for its future prospects and continued growth. The continuing growth in
high-speed mobile services and Internet connectivity between devices keeps driving
innovation and competition within the sector. Much of the industry focus is on
providing faster data services, especially in the area of high-resolution video.
Essentially, the driving forces are toward quicker and clearer services, increased
connectivity, and multi-application usage.

How can data science be used in the telecom sector?

In the past, the business drove the technology. Now, with an explosion of data that
resembles a tsunami, data is driving the business.

Analytics in action helps to make sense of the near-real-time flow of information,
to understand insights and to act on them. For example, insights from a
telecommunications network mean that a promotion plan can be applied in near real
time. Imagine a customer who is about to churn (move to another operator) receiving
an attractive campaign that reduces the chance of churning. The tools that enable
this kind of analysis and action are part of data science.
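
The churn example above can be sketched as a scoring rule that fires a retention offer when a customer's estimated churn probability crosses a threshold. This is a minimal sketch under stated assumptions, not a trained model: the features (dropped calls, complaints, months left on contract), the weights of the logistic function and the 0.5 threshold are all hypothetical illustrative values.

```java
/** Minimal, hypothetical churn-risk scorer that triggers a retention offer. */
class ChurnTrigger {
    // Hypothetical weights for illustration only; a real model would be trained on historical data.
    static final double BIAS = -2.0;
    static final double W_DROPPED_CALLS = 0.30;  // per dropped call this month
    static final double W_COMPLAINTS = 0.80;     // per support complaint this month
    static final double W_MONTHS_LEFT = -0.15;   // per month remaining on the contract

    static final double OFFER_THRESHOLD = 0.5;   // hypothetical cut-off

    /** Logistic (sigmoid) score in the range (0, 1). */
    static double churnProbability(int droppedCalls, int complaints, int monthsLeftOnContract) {
        double z = BIAS
                + W_DROPPED_CALLS * droppedCalls
                + W_COMPLAINTS * complaints
                + W_MONTHS_LEFT * monthsLeftOnContract;
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** True when the customer should receive a retention campaign in near real time. */
    static boolean shouldSendOffer(int droppedCalls, int complaints, int monthsLeftOnContract) {
        return churnProbability(droppedCalls, complaints, monthsLeftOnContract) >= OFFER_THRESHOLD;
    }

    public static void main(String[] args) {
        // A frustrated customer close to the end of their contract.
        System.out.println(shouldSendOffer(6, 2, 1));   // true
        // A satisfied customer with a long contract remaining.
        System.out.println(shouldSendOffer(0, 0, 18));  // false
    }
}
```

In a real deployment the score would be computed as usage events arrive, so the offer reaches the customer before they leave.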

Data Science can also help with Network Planning so that it has the right capacity at
the right time (elastic scaling).
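
Elastic scaling can be reduced to a capacity calculation: given a demand forecast for the next period, provision enough capacity units to keep utilization below a target, and scale in when demand falls. This sketch assumes a hypothetical capacity unit with a fixed throughput (for example one carrier or one server), a hypothetical 70% target utilization and a floor of one unit; real network planning also accounts for coverage, spectrum and site constraints.

```java
/** Minimal sketch of elastic capacity planning with hypothetical parameters. */
class CapacityPlanner {
    /**
     * Units needed so that forecast demand stays within targetUtilization of provisioned capacity.
     *
     * @param forecastDemandMbps forecast peak demand for the next period
     * @param unitCapacityMbps   throughput of one capacity unit (hypothetical)
     * @param targetUtilization  fraction of capacity we are willing to use, e.g. 0.7
     */
    static int unitsNeeded(double forecastDemandMbps, double unitCapacityMbps, double targetUtilization) {
        if (unitCapacityMbps <= 0 || targetUtilization <= 0 || targetUtilization > 1) {
            throw new IllegalArgumentException("invalid capacity parameters");
        }
        double usablePerUnit = unitCapacityMbps * targetUtilization;
        int units = (int) Math.ceil(forecastDemandMbps / usablePerUnit);
        return Math.max(1, units); // always keep at least one unit running
    }

    /** Positive result = scale out by that many units; negative = scale in. */
    static int scalingDelta(int currentUnits, double forecastDemandMbps,
                            double unitCapacityMbps, double targetUtilization) {
        return unitsNeeded(forecastDemandMbps, unitCapacityMbps, targetUtilization) - currentUnits;
    }

    public static void main(String[] args) {
        // Evening peak: 2,000 Mbps forecast, 300 Mbps per unit, 70% target utilization.
        System.out.println(unitsNeeded(2000, 300, 0.7));     // 10
        // Night-time trough while 10 units are running.
        System.out.println(scalingDelta(10, 400, 300, 0.7)); // -8
    }
}
```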

Telecommunications is a complex business. As users, we mostly face one
question: how do we get the best deal out of the many telecom players in the market? But
as an analyst in a telecom company, one may face one or more of the
following questions:

• How many network towers should be installed in a particular area to meet
rising demand, given population growth, the growing need for data connectivity and
competition?
• What should the optimum call rate be to maintain revenue margins in the face
of severe price competition among telcos?
• How do we track call data to minimize fraud?
• How do we encourage higher and broader data usage among non-users to increase
revenue?

Sources of big data in telecommunications include phone calls, emails, messages,
transactions, log data, social media usage, geo-spatial information, downloads, digital
media, data from sensors, and more.

Broadly, there are three types of big data generated in any telecommunications business:

• Call detail data
• Network data
• Customer data

Big data analytics in telecommunications is used to analyze customer preferences
and network efficiency in real time. The use of advanced analytics
directly impacts:

• Customer experience
• Data-driven efficiency
• Data-driven growth

Broadly, the uses of data science in telecom can be categorized as follows.

According to Mind Commerce, big data can be used to improve business processes
and provide insights into a telecom company’s customer base in near real time, using
both structured and unstructured data. Some key areas include:

• Monitoring network traffic to improve service
• Analyzing call data records to identify fraudulent activity
• Customizing call plans based on usage patterns
• Using data from social networks to optimize marketing campaigns
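
Analyzing call data records for fraud often starts with simple anomaly detection. The sketch below flags subscribers whose number of calls in a time window lies far above the population average, using a z-score; the subscriber IDs, the call counts and the 2.5 threshold are hypothetical, and production fraud systems combine many more signals (destinations, call durations, SIM changes, locations).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal z-score anomaly detector over per-subscriber call counts (hypothetical data). */
class CdrFraudCheck {
    /** Flags subscribers whose call count is more than zThreshold standard deviations above the mean. */
    static List<String> flagOutliers(Map<String, Integer> callsPerSubscriber, double zThreshold) {
        List<String> flagged = new ArrayList<>();
        int n = callsPerSubscriber.size();
        if (n == 0) {
            return flagged;
        }
        double mean = 0;
        for (int calls : callsPerSubscriber.values()) {
            mean += calls;
        }
        mean /= n;
        double variance = 0;
        for (int calls : callsPerSubscriber.values()) {
            variance += (calls - mean) * (calls - mean);
        }
        double std = Math.sqrt(variance / n); // population standard deviation
        if (std == 0) {
            return flagged; // everyone behaves identically: nothing stands out
        }
        for (Map.Entry<String, Integer> e : callsPerSubscriber.entrySet()) {
            double z = (e.getValue() - mean) / std;
            if (z > zThreshold) {
                flagged.add(e.getKey());
            }
        }
        Collections.sort(flagged);
        return flagged;
    }

    public static void main(String[] args) {
        // Hypothetical call counts for one hour, aggregated from CDRs.
        Map<String, Integer> calls = new LinkedHashMap<>();
        calls.put("sub-01", 12);
        calls.put("sub-02", 15);
        calls.put("sub-03", 9);
        calls.put("sub-04", 14);
        calls.put("sub-05", 11);
        calls.put("sub-06", 13);
        calls.put("sub-07", 10);
        calls.put("sub-08", 240); // unusual burst of calls
        System.out.println(flagOutliers(calls, 2.5)); // [sub-08]
    }
}
```

With only eight subscribers the largest possible z-score is about 2.65 (the square root of 7), which is why the threshold here is as low as 2.5; real systems compute these statistics over far larger populations or use more robust statistics.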

Telecom companies now wish to move beyond traditional descriptive and exploratory
analytics, which were mainly used for post-mortems of business decisions, to advanced
analytics and automated decision making driven by machine learning. These new big data
analytics platforms are improving personalization at a transformational
scale by allowing telecom companies to manage customer expectations at the
moment of truth.

Other ways in which data science benefits telecom companies

Telecom organizations can use data science to gain insights into customer priorities, discover
faults in their systems, optimize their processes, unlock new revenue streams, predict
customer churn, and launch targeted advertising.

While data science is a useful tool for telecom companies, they can face certain
challenges when adopting it. These include difficulty integrating data in different
formats from various sources, a limited number of skilled data scientists, the
perishable nature of insights derived from the data, and the high costs associated with
advanced analytics.
Advancements in technology can help enterprises address these challenges. For example,
firms can use the services of data exchanges or marketplaces, which help aggregate
data from various sources.

Three Big Data Trends Driving the Telecom Sector

Big data is transforming the telecom sector, as it has several other industries. According to
a telecom evangelist and director of product marketing, the amount and quality of data
collected in the telecom sector, and the need to turn this data into insights, are massive.
He believes telecom operators can capitalize on big data to move up the value chain.

The key trends in big data and analytics (BDA) for the telecom market in 2016,
focusing on the areas that are gaining momentum, are:

1. Real-time

As competition intensifies to provide the best customer experience, service providers
require real-time data to deliver the digital experience that today’s consumers
demand. Customers want interactions that are more personalized, more contextual
and more relevant. They want more social, mobile and online service and expect faster
responses. This is where real-time or streaming analytics can help, by providing the
most up-to-date information to the service provider, or even directly to the customer.
With the advent of solutions such as Spark and other streaming analytics engines, this is
now possible, helping service providers deliver value by solving a critical pain point
of customer experience: the direct touch-point.
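
Streaming analytics engines such as Spark process events continuously instead of in nightly batches. The core idea can be sketched without any framework as a sliding-window average over the most recent events; this is plain Java, not Spark's API, and the window size, the latency metric and the alert threshold are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Plain-Java sketch of a count-based sliding window over a stream of measurements. */
class SlidingWindowMonitor {
    private final int windowSize;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    SlidingWindowMonitor(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    /** Adds one event and returns the average of the last windowSize events. */
    double add(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > windowSize) {
            sum -= window.removeFirst(); // evict the oldest event
        }
        return sum / window.size();
    }

    public static void main(String[] args) {
        SlidingWindowMonitor latency = new SlidingWindowMonitor(3);
        double alertMs = 100.0;                   // hypothetical service-quality threshold
        double[] stream = {40, 50, 60, 150, 200}; // per-session latency in ms
        for (double ms : stream) {
            double avg = latency.add(ms);
            if (avg > alertMs) {
                // Fires only once here, when the rolling average reaches about 136.7 ms.
                System.out.println("ALERT: rolling latency " + avg + " ms");
            }
        }
    }
}
```

The running sum keeps each update O(1), which matters when events arrive continuously.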

2. IoT (Internet of Things)

The IoT is gaining momentum. Although many providers still struggle with finding
the optimal strategy around IoT, it’s definitely becoming a reality. The nature of IoT –
involving data feeds from large numbers of sensors of various types, each of which
needs to be monitored and analyzed – presents an inherent need for automation and
analytics for extracting value out of the internet-of-everything.

3. Skilled resources

While the promise of big data and analytics is real, we still see that a lack of skilled
resources is a major obstacle to successful big data analytics implementations. Data
scientists, particularly those with experience in telco data, continue to be in strong
demand, as shown by the fact that data scientist salaries are the fastest-growing
category. IT operators today understand that analytics requires an intimate understanding
of the telecom domain. While many data analysis algorithms are quite common,
telecommunications data and business processes have their own unique characteristics. As
a result, a successful analytics implementation requires expertise in both data analysis
and the specific business processes of telecom service providers.
Big Data Use Cases In Telecom
Telecommunication companies collect massive amounts of data from call detail
records, mobile phone usage, network equipment, server logs, billing, and social
networks, providing lots of information about their customers and network, but how
can telecom companies use this data to improve their business?

Java
Java is an extremely popular, general-purpose language that runs on the Java Virtual
Machine (JVM), an abstract computing machine that enables seamless portability
between platforms. Java is currently supported by Oracle Corporation.

Java is powerful, portable, and scalable, which makes the platform perfect for building
enterprise-scale applications and supporting rapid growth. Java also includes many
tools, collectively known as the Java Platform. This robust, open-source development
environment includes libraries, frameworks, APIs, the Java Runtime Environment,
Java plug-ins, and the Java Virtual Machine (JVM). Taken together, these tools
simplify coding with Java and support development at every level, giving developers
everything they need to build Java web systems and applications.

Java’s speed allows it to outperform many other languages and frameworks, which is a big
part of why it’s so well suited to large-scale applications. These performance gains are
what prompted Twitter to shift its search engine to Java from Ruby on Rails and move
more of its back-end stack to the Java Virtual Machine.

Another key component of Java is that it is strongly object-oriented (only its primitive
types, such as int, are not objects). With that come all the benefits of object-oriented
programming, from ease of development to modular software to flexibility and
extensibility. Because Java is one of the most widely known programming languages, it’s easy to
find and hire talented developers. What’s more, Java’s massive community of
developers means that there’s lots of excellent documentation around.

License
Version 8: free. Legacy versions: proprietary.

Pros

• Ubiquity. Many modern systems and applications are built upon a Java back end.
The ability to integrate data science methods directly into the existing
codebase is a powerful one to have.
• Strongly typed. Java is no-nonsense when it comes to ensuring type safety. For
mission-critical big data applications, this is invaluable.
• Java is a high-performance, general-purpose, compiled language. This makes it
suitable for writing efficient ETL production code and computationally
intensive machine learning algorithms.
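
The points about strong typing and ETL code can be illustrated with a small extract-transform-load step that parses raw text records into typed values and rejects malformed rows instead of passing them downstream. The record layout (subscriber ID, ISO date, minutes used) and the class names UsageRecord and UsageEtl are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical typed usage record produced by the transform step. */
final class UsageRecord {
    final String subscriberId;
    final LocalDate date;
    final int minutes;

    UsageRecord(String subscriberId, LocalDate date, int minutes) {
        this.subscriberId = subscriberId;
        this.date = date;
        this.minutes = minutes;
    }
}

/** Minimal ETL sketch: parse "id,yyyy-mm-dd,minutes" lines into typed records. */
class UsageEtl {
    static List<UsageRecord> parse(List<String> rawLines, List<String> rejected) {
        List<UsageRecord> out = new ArrayList<>();
        for (String line : rawLines) {
            String[] f = line.split(",");
            if (f.length != 3) {
                rejected.add(line); // wrong number of fields
                continue;
            }
            try {
                int minutes = Integer.parseInt(f[2].trim());
                if (minutes < 0) {
                    rejected.add(line); // negative usage makes no sense
                    continue;
                }
                out.add(new UsageRecord(f[0].trim(), LocalDate.parse(f[1].trim()), minutes));
            } catch (NumberFormatException | DateTimeParseException e) {
                rejected.add(line); // malformed field: keep it out of the typed data set
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "sub-01,2016-03-01,42",
                "sub-02,2016-03-01,abc",  // bad number
                "sub-03,2016-13-01,10",   // bad month
                "sub-04,2016-03-02");     // missing field
        List<String> rejected = new ArrayList<>();
        List<UsageRecord> good = parse(raw, rejected);
        System.out.println(good.size() + " loaded, " + rejected.size() + " rejected"); // 1 loaded, 3 rejected
    }
}
```

Once a line has become a UsageRecord, every later stage can rely on the compiler to enforce that minutes is an int and date is a LocalDate.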

Cons

• For ad-hoc analyses and more dedicated statistical applications, Java’s
verbosity makes it an unlikely first choice. Dynamically typed scripting
languages such as R and Python lend themselves to much greater productivity.
• Compared to domain-specific languages like R, Java has relatively few
libraries for advanced statistical methods.
Reasons why data scientists need to learn Java
Python and R have long been said to dominate the data science world, but that’s not
to say they’re the only languages worth using for data science. Java is, in fact, a great
language for doing data science. Here are some reasons why:

1. Old is gold: Java is one of the oldest languages used for enterprise
development, and it’s quite likely that the organization you’re working in
also has a major part of its infrastructure based on Java.
2. First-Class Citizen: Many of the popular big data frameworks and tools, such as
Hadoop, Hive and Flink, are written in Java, and Spark (written mainly in Scala) runs
on the JVM. It’s easier to find a Java developer who’s comfortable working with
Hadoop and Hive than one who isn’t familiar with Java and the
stack.
3. Great Toolset: Java has a great number of libraries and tools for
machine learning and data science, such as Weka, Java-ML,
MLlib and Deeplearning4j, which can solve most of your ML or data
science problems.
4. Lambdas and REPL: Java 8 introduced lambdas, which removed much
of Java’s verbosity, making it less painful to develop large
enterprise and data science projects. Java 9 then added the
much-missed REPL (JShell), which facilitates iterative development.
5. Java Virtual Machine: The JVM is one of the best platforms, enabling
you to write code once and run it unchanged on multiple platforms. The JVM
allows developers to create custom tools quickly. Moreover, Java has a
wealth of IDEs that improve developers’ productivity.
6. Java is Strongly Typed: Not to be confused with static typing, strong
typing helps when working with large data applications, and type safety
is a feature worth having. Java ensures programmers are explicit about
the types of data and variables they deal with. This makes it much easier to
maintain the code base, and you can safely avoid writing trivial unit tests
for your applications.
7. JVM has Scala: Although this is somewhat of a next step, it’s worth
learning Scala to do some heavy data science, and it gets easier if you
already know how to code in Java. Scala offers excellent support for data
science, and several powerful frameworks like Spark are built on top of
Scala.
8. The Job Scene: Setting SQL aside, Java is a clear
winner in the job space. You are more likely to be hired by an
organization if you have Java as one of your skills.
9. Scalability: Java is excellent when it comes to scaling your
applications. This makes it a great choice when you’re thinking of
building larger and more complex ML/AI applications. If you’re
building your application from the ground up, it’s a good idea to
choose Java as your programming language.
10. Java is Fast: Unlike some of the other widely used languages for data
science, Java is fast. Speed is critical for building large-scale applications, and
Java is well suited for this. Multinational companies such as Twitter, Facebook and LinkedIn
rely on Java for data engineering efforts.
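
The Java 8 lambdas mentioned in point 4 make typical aggregation code much shorter. The sketch below uses the standard Stream API (Collectors.groupingBy and Collectors.summingInt) to compute total call minutes per subscriber and the average call length; the call data and the class names Call and CallStats are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical call event: who called and for how long. */
final class Call {
    final String subscriberId;
    final int minutes;

    Call(String subscriberId, int minutes) {
        this.subscriberId = subscriberId;
        this.minutes = minutes;
    }
}

class CallStats {
    /** Total minutes per subscriber, sorted by subscriber ID (TreeMap). */
    static Map<String, Integer> minutesPerSubscriber(List<Call> calls) {
        return calls.stream()
                .collect(Collectors.groupingBy(c -> c.subscriberId,
                        TreeMap::new,
                        Collectors.summingInt(c -> c.minutes)));
    }

    /** Average call length across all calls, or 0.0 if there are none. */
    static double averageMinutes(List<Call> calls) {
        return calls.stream().mapToInt(c -> c.minutes).average().orElse(0.0);
    }

    public static void main(String[] args) {
        List<Call> calls = Arrays.asList(
                new Call("sub-02", 5), new Call("sub-01", 3),
                new Call("sub-02", 7), new Call("sub-01", 1));
        System.out.println(minutesPerSubscriber(calls)); // {sub-01=4, sub-02=12}
        System.out.println(averageMinutes(calls));       // 4.0
    }
}
```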

What is Java used for in data science?

Right now, data science is growing explosively and a lot of new technologies
are being added. The challenges for companies getting into data science are

a) picking the right stack of technologies

b) finding developers who can build the platform using the selected technologies

Java programmers are a popular choice because

a) Java can be used to build virtually anything, and is particularly suited to building
scalable, multi-threaded platforms.
b) It's easier for a decent Java developer to learn technologies that require grid
computing.
c) It's just plain easier to hire a Java developer. Say you pick Hadoop + Hive: you
can either spend 3 months looking for someone who knows Hadoop + Hive, or spend 1 month
hiring a Java developer who has worked on the server side and 2 months training them on
Hadoop.

In a nutshell, Java is popular in data science not because it is the "best" language
for data science, but because Java developers tend to have a good grounding in
concepts that many data science applications are built on.

Back-end web services usually run on Java, and in data science, as you may have
experienced, only about 10% of the work is the algorithms; the remaining 90% is getting and
cleaning the data. In enterprises, this code has often already been written in Java, so they prefer
their entire technology stack to stay with Java. That said, Ruby on Rails (RoR) and Python web
services are also gaining immense popularity.

Scala and its associated Apache Spark stack have yet to mature to enterprise levels:
everyone says they will scale well, but few have actually seen them deployed as widely as Hadoop
(Yahoo and Facebook reportedly crunch petabytes a day with it!).

Finally, Java, like any programming language, is just a way to express your ideas as
code, and at the end of the day it is among the very few languages that have stood the
test of time in the enterprise, and it has done pretty well so far.
Features Of Java
Java has many features, which are also known as the Java buzzwords.
1. Object-Oriented
2. Platform independent
3. Simple
4. Secured
5. Robust
6. Architecture neutral
7. Portable
8. Dynamic
9. Interpreted
10. High Performance
11. Multithreaded
12. Distributed

1. Object Oriented:-
An object is a real-world entity such as a pen, chair or table. Object-oriented
programming is a methodology or paradigm for designing a program using classes and
objects. It simplifies software development and maintenance by providing the following
concepts:-
i) Object
ii) Class
iii) Inheritance
iv) Polymorphism
v) Abstraction
vi) Encapsulation
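
The six concepts listed above can be seen together in a few lines. In this illustrative sketch (Shape, Circle and Square are hypothetical example classes), Shape is an abstract class (abstraction), Circle and Square inherit from it (inheritance), each overrides area() (polymorphism), the fields are private and reached only through methods (encapsulation), and the circle and square are objects created from classes.

```java
import java.util.Arrays;
import java.util.List;

/** Abstraction: callers know every Shape has an area, not how it is computed. */
abstract class Shape {
    private final String name; // encapsulation: state is private

    Shape(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    abstract double area();
}

/** Inheritance: a Circle is a Shape. */
class Circle extends Shape {
    private final double radius;

    Circle(double radius) {
        super("circle");
        this.radius = radius;
    }

    @Override
    double area() { // polymorphism: Circle's own implementation
        return Math.PI * radius * radius;
    }
}

class Square extends Shape {
    private final double side;

    Square(double side) {
        super("square");
        this.side = side;
    }

    @Override
    double area() {
        return side * side;
    }
}

class OopDemo {
    public static void main(String[] args) {
        // Objects: concrete instances created from classes.
        List<Shape> shapes = Arrays.asList(new Circle(1.0), new Square(2.0));
        for (Shape s : shapes) {
            // The same call dispatches to a different area() at run time.
            System.out.println(s.getName() + ": " + s.area());
        }
    }
}
```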

2. Platform independent:-
Java runs on a variety of platforms, such as Windows, Mac OS, and the various
versions of UNIX.

3. Simple:-
Java was designed to be easy for the professional programmer to learn and use
effectively. If you already understand the basic concepts of object-oriented
programming, learning Java will be even easier.
Best of all, if you are an experienced C++ programmer, moving to Java will require
very little effort. Because Java inherits the C/C++ syntax and many of the object-
oriented features of C++, most programmers have little trouble learning Java.

4. Secured:-
Java is secure because:
i) There are no explicit pointers.
ii) Programs run inside a virtual machine sandbox.

5. Robust:-
Robust simply means strong. Java uses strong memory management. The lack of explicit
pointers avoids security problems. There is automatic garbage collection in Java.
There are exception handling and type checking mechanisms in Java. All these points
make Java robust.
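
Two of the robustness points above, run-time checking and exception handling, are visible in ordinary code: an out-of-range array index raises an exception instead of reading arbitrary memory, and the program can catch it and continue. A minimal sketch; the helper names safeGet and parseOrDefault are hypothetical.

```java
/** Illustrates run-time bounds checking and exception handling in Java. */
class RobustnessDemo {
    /** Returns values[index], or fallback if the index is out of range. */
    static int safeGet(int[] values, int index, int fallback) {
        try {
            return values[index]; // the JVM checks the index on every array access
        } catch (ArrayIndexOutOfBoundsException e) {
            return fallback;      // handled: no memory corruption, no crash
        }
    }

    /** Parses an integer, returning fallback for malformed input. */
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(safeGet(data, 1, -1));     // 20
        System.out.println(safeGet(data, 5, -1));     // -1
        System.out.println(parseOrDefault("42", 0));  // 42
        System.out.println(parseOrDefault("4x2", 0)); // 0
    }
}
```

In production code you would normally check the index before the access; catching the exception here simply makes the JVM's run-time check visible.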

6. Architecture-neutral:-
There are no implementation-dependent features; for example, the sizes of primitive types are fixed.
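
Because the Java Language Specification fixes the size of each primitive type, the program below prints the same values on every JVM, whatever the underlying CPU or operating system (unlike C, where the sizes of int and long can vary between platforms). It uses only the standard SIZE constants of the wrapper classes.

```java
/** Prints the fixed bit widths of Java's numeric primitive types. */
class PrimitiveSizes {
    public static void main(String[] args) {
        System.out.println("byte:   " + Byte.SIZE + " bits");      // 8
        System.out.println("short:  " + Short.SIZE + " bits");     // 16
        System.out.println("char:   " + Character.SIZE + " bits"); // 16
        System.out.println("int:    " + Integer.SIZE + " bits");   // 32
        System.out.println("long:   " + Long.SIZE + " bits");      // 64
        System.out.println("float:  " + Float.SIZE + " bits");     // 32
        System.out.println("double: " + Double.SIZE + " bits");    // 64
        // Overflow behaviour is specified too: int arithmetic wraps around.
        System.out.println(Integer.MAX_VALUE + 1);                 // -2147483648
    }
}
```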

7. Portable:-
We may carry the java bytecode to any platform.

8. Dynamic:-
Java programs carry with them substantial amounts of run-time type information that
is used to verify and resolve accesses to objects at run time. This makes it possible to
dynamically link code in a safe and expedient manner. This is crucial to the robustness
of the Java environment, in which small fragments of bytecode may be dynamically
updated on a running system.
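
The run-time type information described above is what makes dynamic class loading and reflection possible: a program can load a class whose name is known only at run time, check its type, and use it. A minimal sketch using only standard library calls (Class.forName and getDeclaredConstructor().newInstance()); the class name passed in is just an example, and in a real plug-in system it might come from a configuration file.

```java
import java.util.List;

/** Loads a class by name at run time and instantiates it via reflection. */
class DynamicLoadingDemo {
    /** Creates a new instance of the named class using its no-argument constructor. */
    static Object newInstanceOf(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className); // located and linked at run time
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // The class name could come from a config file or user input.
        Object obj = newInstanceOf("java.util.ArrayList");
        System.out.println(obj.getClass().getName()); // java.util.ArrayList
        System.out.println(obj instanceof List);      // true: type checked at run time

        @SuppressWarnings("unchecked")
        List<String> list = (List<String>) obj;
        list.add("loaded dynamically");
        System.out.println(list);                     // [loaded dynamically]
    }
}
```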

9. Interpreted:-
As described earlier, Java enables the creation of cross-platform programs by
compiling into an intermediate representation called Java bytecode. This code can be
executed on any system that implements the Java Virtual Machine. Most previous
attempts at cross-platform solutions have done so at the expense of performance.

10. High Performance:-
Java is faster than traditional interpretation because bytecode is "close" to native code,
although it is still somewhat slower than a natively compiled language (e.g., C++).
11. Multi-threaded:-
A thread is like a separate program, executing concurrently. We can write Java
programs that deal with many tasks at once by defining multiple threads. The main
advantage of multi-threading is that threads share the same memory. Threads are important
for multimedia, web applications, etc.
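
Defining multiple threads that share memory can be sketched with the standard java.util.concurrent API: several pool threads each sum a slice of one shared array, and the partial results are then combined. The array contents and the choice of 4 threads are arbitrary illustrative values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sums a shared array in parallel: each thread reads its own slice of the same memory. */
class ParallelSum {
    static long sum(int[] data, int threads) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (data.length + threads - 1) / threads; // ceiling division
            for (int t = 0; t < threads; t++) {
                final int from = t * chunk;
                final int to = Math.min(data.length, from + chunk);
                // Each task runs concurrently on a pool thread and reads the shared array.
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = from; i < to; i++) {
                        s += data[i];
                    }
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) {
                total += f.get(); // waits for that task to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data, 4)); // 500500
    }
}
```

Because the threads only read the shared array and each returns its own partial sum, no locking is needed.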

12. Distributed:-
We can create distributed applications in java. RMI and EJB are used for creating
distributed applications. We may access files by calling the methods from any
machine on the internet.

Java Variables
The Java programming language defines the following kinds of variables:

i) Instance Variables (Non-Static Fields)
ii) Class Variables (Static Fields)
iii) Local Variables

i) Instance Variables (Non-Static Fields):- Technically speaking, objects store their
individual states in "non-static fields", that is, fields declared without the static
keyword.

Non-static fields are also known as instance variables because their values are unique
to each instance of a class (to each object, in other words); the currentSpeed of one
bicycle is independent of the currentSpeed of another.

ii) Class Variables (Static Fields):- A class variable is any field declared with the
static modifier; this tells the compiler that there is exactly one copy of this variable in
existence, regardless of how many times the class has been instantiated.

A field defining the number of gears for a particular kind of bicycle could be marked
as static since conceptually the same number of gears will apply to all instances. The
code static int numGears = 6; would create such a static field. Additionally, the
keyword final could be added to indicate that the number of gears will never change.

iii) Local Variables:- Similar to how an object stores its state in fields, a method will
often store its temporary state in local variables. The syntax for declaring a local
variable is similar to declaring a field (for example, int count = 0;).

There is no special keyword designating a variable as local; that determination comes
entirely from the location in which the variable is declared, which is between the
opening and closing braces of a method.

As such, local variables are only visible to the methods in which they are declared;
they are not accessible from the rest of the class.
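
The bicycle example can be written out to show all three kinds of variable in one class. The field names currentSpeed and numGears follow the text; the constructor, the speedUp method and the bicyclesCreated counter are hypothetical additions for illustration.

```java
/** Shows instance, class (static) and local variables side by side. */
class Bicycle {
    // Class variable: one copy shared by all bicycles; final means it never changes.
    static final int numGears = 6;

    // Class variable that can change: counts how many bicycles were created.
    static int bicyclesCreated = 0;

    // Instance variable: each Bicycle object has its own value.
    int currentSpeed;

    Bicycle(int startSpeed) {
        currentSpeed = startSpeed;
        bicyclesCreated++;
    }

    void speedUp(int times, int increment) {
        // Local variable: exists only while this method runs.
        int count = 0;
        while (count < times) {
            currentSpeed += increment;
            count++;
        }
    }

    public static void main(String[] args) {
        Bicycle a = new Bicycle(10);
        Bicycle b = new Bicycle(0);
        a.speedUp(2, 5);
        System.out.println(a.currentSpeed);           // 20: a's own instance variable
        System.out.println(b.currentSpeed);           // 0: independent of a
        System.out.println(Bicycle.numGears);         // 6: shared by every bicycle
        System.out.println(Bicycle.bicyclesCreated);  // 2
    }
}
```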

Java Data Types

The main purpose of data types in Java is to determine what kind of value can be
stored in a variable.
Ex: int x = 12;
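
Java has eight primitive data types, plus reference types such as String and arrays. The sketch below declares one variable of each primitive type with an illustrative value; the variable names and values are arbitrary. Assigning a value of the wrong type (for example a String to an int) is rejected at compile time.

```java
/** One variable of each of Java's eight primitive types, plus a reference type. */
class DataTypesDemo {
    public static void main(String[] args) {
        byte smallNumber = 100;             // 8-bit signed integer
        short mediumNumber = 30_000;        // 16-bit signed integer
        int x = 12;                         // 32-bit signed integer (the example from the text)
        long subscribers = 7_000_000_000L;  // 64-bit signed integer; note the L suffix
        float ratio = 0.75f;                // 32-bit floating point; note the f suffix
        double average = 3.14159;           // 64-bit floating point
        char grade = 'A';                   // 16-bit Unicode character
        boolean active = true;              // true or false

        String name = "Java";               // reference type, not a primitive

        // int y = "12";                    // would not compile: incompatible types

        System.out.println(smallNumber + " " + mediumNumber + " " + x + " " + subscribers);
        System.out.println(ratio + " " + average + " " + grade + " " + active + " " + name);
    }
}
```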
Advantages of Java
• Java offers high cross-platform functionality and portability, as programs written on
one platform can run across desktops, mobiles and embedded systems.
• Java is free, simple, object-oriented and distributed, supports multithreading and
offers multimedia and network support.
• Java is a mature language, and therefore more stable and predictable. The Java
Class Library enables cross-platform development.
• Being highly popular at the enterprise, embedded and network levels, Java has a
large, active user community and plenty of support available.
• Unlike C and C++ programs, Java programs are compiled into platform-independent
bytecode, which allows the same program to run on any machine that
has a JVM installed.
• Java has powerful development tools like the Eclipse SDK and NetBeans, which
have debugging capability and offer integrated development environments.
• Increasing language diversity on the JVM, evidenced by the compatibility of Java with Scala,
Groovy, JRuby and Clojure.
• Relatively seamless compatibility from one version to the next.

REAL WORLD JAVA APPLICATIONS

There are many places where Java is used in the real world, from e-commerce
websites to Android apps, from scientific applications to financial
applications like electronic trading systems, from games like Minecraft to desktop
applications like Eclipse, NetBeans and IntelliJ, and from open source libraries to J2ME
apps.

1) Android Apps
If you want to see where Java is used, you do not have to look far. Open your Android
phone: many of its apps are written in the Java programming language, using
Google's Android API, which is similar to the JDK. A couple of years back, Android
provided a much-needed boost, and today many Java programmers are Android app
developers. Android uses a different virtual machine and different packaging, but the
code is still written in Java.

2) Server Apps at Financial Services Industry
Java is very big in financial services. Many global investment banks like Goldman
Sachs, Citigroup, Barclays, Standard Chartered and others use Java for writing
front and back office electronic trading systems, settlement and confirmation
systems, data processing projects and more. Java is mostly used to write
server-side applications, usually without any front end, which receive data from one
server (upstream), process it and send it to another process (downstream). Java Swing was
also popular for creating thick client GUIs for traders, but C# is now quickly gaining
market share in that space and Swing is running out of breath.

3) Java Web applications
Java is also big in the e-commerce and web application space. A lot of RESTful
services are created using Spring MVC, Struts 2.0 and similar frameworks.
Even simple Servlet, JSP and Struts-based web applications are quite popular in
various government projects. Many government, healthcare, insurance, education,
defense and other departments have their web applications built in Java.

4) Software Tools
Many useful software and development tools are written in Java, e.g.
Eclipse, IntelliJ IDEA and NetBeans IDE. I think they are also the most used desktop
applications written in Java. There was a time when Swing was very popular for writing
thick clients, mostly in the financial services sector and investment banks. Nowadays,
JavaFX is gaining popularity, but it is still not a replacement for Swing, and C# has
almost replaced Swing in the finance domain.

5) Trading Application
Third-party trading applications, which are also part of the wider financial services
industry, use Java too. Popular trading applications like Murex, which is used in many
banks for front-to-back connectivity, are also written in Java.

6) J2ME Apps
Although the advent of iOS and Android almost killed the J2ME market, there is still a
large market of low-end Nokia and Samsung handsets that use J2ME. There was a time
when almost all games and applications that are now available on Android were written
using MIDP and CLDC, part of the J2ME platform. J2ME is still popular in products like
Blu-ray players, cards and set-top boxes. One of the reasons WhatsApp became so popular is
that it is also available in J2ME for all those Nokia handsets, which are still a big market.

7) Embedded Space
Java is also big in the embedded space, which shows how capable the platform is: you only
need 130 KB to be able to use Java technology (on a smart card or sensor). Java was
originally designed for embedded devices. In fact, this is one area that was part
of Java's initial "write once, run anywhere" campaign, and it looks like it is paying off
now.

8) Big Data technologies
Hadoop and other big data technologies also use Java in one way or another, e.g.
Apache's Java-based HBase and Accumulo (open source), and Elasticsearch as well.
By the way, Java is not dominating this space, as there are technologies like MongoDB,
which is written in C++. Java has the potential to get a major share of this growing space if
Hadoop or Elasticsearch goes big.

9) High Frequency Trading Space
The Java platform has improved its performance characteristics a lot, and with modern
JITs it is capable of delivering performance at the C++ level. For this reason, Java is
also popular for writing high-performance systems: although performance is
a little lower than with native languages, you gain safety, portability and
maintainability in exchange, and it only takes one inexperienced C++ programmer
to make an application slow and unreliable.

10) Scientific Applications
Nowadays Java is often the default choice for scientific applications, including natural
language processing. The main reason is that Java is safer, more portable and more
maintainable, and comes with better high-level concurrency tools than C++.

In the 1990s Java was quite big on the Internet due to applets, but over the years applets lost
their popularity, mainly due to various security issues with the applet sandboxing model.
Today desktop Java and applets are almost dead. Java remains the software industry's
favorite application development language, and given its heavy usage in the financial
services industry, investment banks and the e-commerce web application space, anyone
learning Java has a bright future ahead. Java 8 has only reinforced the belief that
Java will continue dominating the software development space for years to come.