Cyber-bullying Detection Using smSDA

INOV TECHNOLOGY
COMPANY PROFILE

Contents
WHY USE INOV TECHNOLOGY?

MISSION & OBJECTIVES


CORPORATE POLICY

BACKGROUND

PRODUCTS

SOFTWARE DEVELOPMENT

CONSULTING SERVICES

CONTACT DETAILS

WHY USE INOV TECHNOLOGY?

Inov Technology has been a long-term trusted partner for clients in the Mysuru region

and beyond since 2016; working closely with our clients allows us to act in their best interest

over the long term.

Our clients benefit from:

• An established team of 8 staff, most with 5+ years of experience.

• A proven client service model: locally owned and operated

• Considerable range of combined technical knowledge and experience

• Timely response to any issues

• Cost efficient services

• Fixed costs for easier budgeting

MISSION

To emerge as a leader in the Indian “Information Technology” industry - through total customer

satisfaction & employee motivation.

OBJECTIVES

 To provide efficient and cost-effective solutions to complex information management

requirements through innovative application of the latest in technology.

 To provide an informal yet highly professional environment to our workforce and nurture

them towards identifying the organization’s goals as their personal targets.

 To achieve excellence in every sphere of operation.

STRENGTHS (moving to the future)

 Deliver secure, reliable and scalable applications that help businesses excel in today's

rapidly evolving economy.

 Gain clients’ satisfaction by fully understanding and addressing our clients’ requirements within

the minimal time-to-market.

 Assure steady quality by guaranteeing the high quality of our deliverables.

 Maintain continuous improvement by promoting a learning environment, and ensuring

that our employees are exposed to, and trained on state-of-the-art technologies. Our

software development process is continuously monitored and improved to maximize

productivity.

 Utilize object-oriented analysis, design, and testing methodologies; ensure seamless

integration and traceability between the project’s requirements, design, development,

quality assurance, and delivery.

 Understand clients’ business requirements in depth.

 Deliver world-class end-to-end IT solutions.

 Implement cutting-edge, innovative solutions.

 Guarantee the high quality of our deliverables.

 Seek continuous improvement.

 Adopt international and best-practice standards.

CORPORATE PHILOSOPHY

Our corporate philosophy establishes the fundamental principles of our management system. Our

worldwide operational and performance standards translate the corporate values into specific

management expectations. We preserve a high level of business ethics characterized by integrity

and honesty in all our business actions.

Corporate Values

INOV TECHNOLOGY commits to:

 adopting high standards of ethics in all its business actions and practices;

 providing its customers with high quality services, tailor-made to their needs and

expectations;

 guaranteeing highly competitive services to its clients;

 engaging a highly skilled personnel supported by an effective organizational structure;

 implementing an ‘equal opportunities’ and ‘environmentally friendly’ policy; and

 increasing shareholders’ value.

QUALITY POLICY

Inov Technology is committed to achieving customer delight through cost-effective and

customer-centric Quality I.T. Solutions, that are innovative and continuously upgraded in

keeping with emerging technology trends by a motivated workforce, on time, all the time;

resulting in maximizing stakeholder value.

ENVIRONMENT POLICY

As a responsible corporate entity, INOV TECHNOLOGY is

committed to protecting the environment in compliance with the environmental laws and the

practices of the communities where it operates. While pursuing our activities we endeavor to

minimize any adverse impact on air, water and land by means of pollution prevention and energy

and water conservation. By doing so, we achieve cost savings, an increased operational

efficiency, improved quality of products and services and ultimately, a safe environment for the

community as a whole and a healthy workplace for our employees.

Our commitment is summarized in the following principles:

 application of good environmental practices globally;

 consideration of the environmental impact within the processes of development and

engineering of our services/products;

 prevention of pollution by responsible management of materials, reduction of emissions

and waste and efficient use of energy and natural resources;

 promotion of the idea of environmental responsibility among our employees; they are

trained in managing their environmental responsibilities, dealing with day-to-day actions

to help in preserving a healthy environment and reacting to environmental emergencies;

 monitoring of our environmental performance and setting measurable objectives and

targets for achieving sustainable improvement;

 communicating with our employees and local communities regarding our environmental

commitment and performance; and

 regular audits of our procedures to ensure conformance to our policy.

BACKGROUND

Established in 2016 as an ERP Software Products Company, Inov Technology is today a

mature and fast growing company committed to providing reliable and cost-effective I.T.

solutions to organizations. Emphasis on quality, world-class human resources and cutting edge

solutions drive its commitment.

With a successful track record of serving the most demanding customers, Inov

Technology can bring you the benefits of working with a partner with software skills,

networking expertise, project management experience and domain knowledge in every aspect of

Information Technology.

Inov Technology has the customer’s needs at its core and ERP as a core competence,

which is supported by several other software products and technologies that help deliver a

complete solution through its Consulting Services, Software Products & Services and Managed

Services.

A brief note on the services and products offered by Inov Technology is as follows:

PRODUCTS

Inov Technology spotted the potential of the Indian Software Industry in its teething stages and

developed various enterprise applications and off the shelf software products, which were

focused around the customer’s business processes. Such products include:

 IN-SMS: provides user-friendly dashboards with login access for teachers, non-teaching

staff, students, parents and management personnel of your institution. The various

modules available in this will facilitate all the processes of your institution, from

admission of new students to generating transfer certificates when students complete their

studies. It has modules to manage Timetable, Attendance, Examinations, Campus News,

Hostel, Library, Transportation, School Calendar, Events and many more. It has a fully-

fledged Human Resource module to manage the payroll and employee pay slips.

The Finance module helps you to plan and allot different fee structures to

students. There is an internal messaging system within this system but you can also

integrate it with external communication tools like email and texting.

 eCampusManager: IN-CMS is an advanced web based college management system

offering high flexibility and an abundance of features to collaboratively create a quality

education system. Its main feature is to provide a seamlessly networked campus and a

paperless administration. IN-CMS is the best solution for the centralized management of

academic data, and this application platform provides the right communication link between

faculty, parents and students so that a good feedback system and a knowledge-rich

environment can be created to improve the education system. It also contains an exam

management system that provides access to results, statistics and customized report

generation.

The software entitled "IN-CMS" is an application for systematically, logically and

efficiently managing the functioning of sectors such as group of educational institutions,

schools, colleges, universities, etc. IN-CMS is a product developed for institutions to manage

their working in a distributed environment with role-based access control.

This system works across the Internet as well as the organization’s intranet and

extranet. IN-CMS provides a framework with which all members of an institution can

access, view and manage their account. The product is software which provides a

systematic approach to control, describe, store, retrieve and share information contained.

 IN-HelpDesk: Help desk or service desk software works to automate the service

management and support function. Typical support desk purposes include helping users

retrieve lost passwords, helping customers troubleshoot product issues, assisting

employees with hardware and software technical problems, and more. There are a number

of service and support solutions available that offer rich and robust functionality for

optimizing the help desk management process.

 IN-TravelPro: IN-TravelPro is the leading Accounting & Business Management

Software for Travel Agents. It helps you with all your accounting requirements like

invoicing, credit notes, receipts & payments, service tax & TDS calculations, outstanding

statements, cash & bank reports, etc. The software also gives you exhaustive MIS reports

like sales & purchase statements, invoice-wise outstanding report, income-expense

report, etc. All reports can be exported to MS-Excel (CSV format), giving you the

flexibility to work on them or upload them into other software such as Tally.

For ease of travel agency operations, we have also integrated Movement Chart

Report into the software. This report gives you details about upcoming travel activities of

your clients along with supplier contact details. So reconfirmation of the services with

your suppliers becomes hassle-free & you can ensure better travel experience for your

clients.

SOFTWARE DEVELOPMENT

Inov Technology has a Customized Software Development Division, which provides

development services on web-based and client-server technologies. With Inov Technology’s

complement of software specialists, the company responds to needs, to opportunities and to

challenges, providing a growing ability to support operations either on an on-site or off-shore

basis in the following areas:

 New product development

 Customized products

 Product enhancement

 Modification, conversion & migration of existing applications.

Skills towards the use of powerful computers, advanced equipment, sophisticated software &

systems development methodologies and the latest productivity tools are available with our

software group to provide high-quality services in the above areas. Regular walk-throughs ensure

the functionality, reliability and maintainability of software developed by us. Specialized

applications can be developed for Local Area Networks as well as Enterprise Wide Networks.

We have honed our skills to the vast requirements of web-related software development

and e-commerce applications. For this purpose, we have in place a focused group of software

professionals specifically for this activity.

CONSULTING SERVICES

Inov Technology offers expert consultancy to organizations embarking on

computerization or upgrading their networks. We can evaluate various software and hardware

products available in order to provide our clients with a technically appropriate and cost effective

solution. Our vast experience in ERP and business applications gives us the cutting edge

required of a business consulting organization, since we have the requisite expertise in

functional areas like finance, manufacturing, sales & distribution, human resources management,

supply chain management, customer relationship management, fleet management etc.

Our technology services in the realm of IT infrastructure, security, data mining,

networking, etc. give us the extra edge required of a technology-oriented business consulting

firm.

Technical assistance from us could also cover feasibility studies, staffing requirements,

training on systems & applications, environmental engineering of computer sites, systems audit

and systems integration, involving interconnection of heterogeneous systems, merging hardware,

software and communications products of diverse origins to form a single source solution.

Clients may avail of one or more of our specialized services, or retain the expertise of a

multidisciplinary team for an entire project cycle of inter-related specialist services.

CONTACT DETAILS

Registered Office
#897/3, ch-1/3, Narayana Shastry Road,
Laxmipuram, Chamaraja Mohalla,
Mysuru-570024
Mob – 9591104342
Email – info@inov-tec.com

Chapter 1

INTRODUCTION

1.1 Overview

Social media is defined as a group of Internet-based applications that build on the

ideological and technological foundations of Web 2.0, and that allow the creation and exchange

of user-generated content. Using social media, people can enjoy an enormous amount of information,

a convenient communication experience, and so on. However, social media may have some side

effects such as cyber-bullying, which may have negative impacts on people’s lives, especially those of

children and teenagers. Cyber-bullying can be defined as aggressive, intentional actions

performed by an individual or a group of people against a victim through digital communication

methods such as sending messages and posting comments.

Unlike traditional bullying, which usually occurs at school during face-to-face

communication, cyber-bullying on social media can take place anywhere at any time. Bullies

are free to hurt their peers’ feelings because they do not need to face anyone and can hide

behind the Internet. Victims who are constantly connected to the Internet or social media are

easily exposed to harassment. The reported cyber-bullying victimization rate ranges from

10% to 40%. In the United States, approximately 43% of teenagers have at some point been

bullied on social media. Like traditional bullying, cyber-bullying has negative, insidious and

sweeping impacts on children. The outcomes for victims of cyber-bullying may even be tragic,

such as self-injurious behavior or suicide.

One way to address the cyber-bullying problem is to automatically detect and promptly

report bullying messages so that measures can be taken to prevent possible tragedies. Cyber-bullying

detection can be formulated as a supervised learning problem. A classifier is first trained on a

cyber-bullying corpus labeled by humans, and the learned classifier is then used to recognize

bullying messages. Three kinds of information, namely text, user demography, and social

network features, are often used in cyber-bullying detection [9]. Since the text content is the most

reliable, the work here focuses on text-based cyber-bullying detection.
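
The supervised formulation described above can be sketched in a few lines. The following is a minimal illustration only, assuming a Naive Bayes classifier over whitespace tokens; the classifier choice, the function names (tokenize, train_naive_bayes, predict) and the four-message corpus are hypothetical and are not taken from this report.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase a message and split it on whitespace."""
    return text.lower().split()

def train_naive_bayes(messages, labels):
    """Estimate per-class word counts and class counts from a labeled corpus."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return 1 (bullying) or 0 (normal) using add-one smoothed log-probabilities."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Toy hand-labeled corpus (hypothetical data): 1 = bullying, 0 = normal.
train_messages = [
    "you are stupid and ugly",
    "nobody likes you loser",
    "see you at practice tomorrow",
    "great game today well done",
]
train_labels = [1, 1, 0, 0]
model = train_naive_bayes(train_messages, train_labels)

print(predict(model, "you stupid loser"))   # classified as bullying
print(predict(model, "well done today"))    # classified as normal
```

A real system would need a far larger labeled corpus; the point here is only the two-phase structure of training on human labels and then classifying unseen messages.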

According to Belsey (2004) "cyberbullying involves the use of information and

communication technologies to support deliberate, repeated, and hostile behavior by an

individual or group that is intended to harm others" (Belsey, 2004). In 2006, the National Crime

Prevention Council worked with Harris Interactive Inc., to create a study on cyberbullying. The

study found that 43% of the 824 middle school and high school-aged students surveyed in the

United States had been cyberbullied in the past year (cited in Moessner, 2007). The Pew Internet

and American Life Project on cyberbullying conducted a similar study in 2006 which found that

one out of three teens have experienced online harassment (cited in Lenhart, 2007). Pew also

found that the most prevalent form of cyberbullying was making private information public;

which included e-mails, text messages, and pictures (cited in Lenhart, 2007). The findings of the

Pew research also indicated that girls are more likely to be part of cyberbullying than boys. Older

girls, between the ages of 15 and 17, are the most likely to be involved in some form of

cyberbullying, with 41% of those surveyed indicating that they have been involved in some type

of cyberbullying (cited in Lenhart, 2007). Cyberbullying is different from traditional bullying

due to the anonymity that the Internet can provide. Cyberbullies do not have to own their actions

due to the anonymity and cyberbullying is often outside of the legal reach of schools and school

boards since it often happens outside of the school (Belsey, 2004). According to Willard (2006),

there are different forms of cyberbullying. These forms include flaming, harassment, denigration,

impersonation, outing, trickery, exclusion, cyberstalking, and cyberthreats.

As previously mentioned, cyberbullies often believe they are anonymous to the victim and

therefore tend to say more hurtful things to the victims than they would if they were face-to-face

(Juvonen & Gross, 2008). However, Juvonen and Gross (2008) found that 73% of the

respondents to their study were "pretty sure" or "totally sure" of the identity of the cyberbully.

Cyberbullying is more likely than other forms of bullying to go unreported to parents and

administrators. This is due to victims feeling they needed to learn to deal with it themselves and

also being afraid that if they tell their parents, their internet privileges will be reduced or taken

away. It has been found that 90% of respondents in the Juvonen and Gross study (2008) reported

not telling adults about cyberbullying incidents due to these reasons. Victims of cyberbullying

may experience stress, low self-esteem, and depression. It has been found that cyberbullying can

also have extreme repercussions such as suicide and violence. Marr and Field (2001) referred to

suicide brought on by bullying as "bullycide" (Marr & Field, 2001, p. 1). A particular victim of

cyberbullying that led to "bullycide" is Megan Meier. Megan was a 13-year-old female from

Missouri who was cyberbullied to the point that she hanged herself in her closet in October of 2006

(Pokin, 2007). Megan thought that she was talking with a 16-year-old boy named Josh on

MySpace. During the six weeks they were talking, Megan's mom kept a close eye on the

conversations. On October 15th, 2006, Megan received a message on MySpace from Josh which

said, "I don't know if I want to be friends with you anymore because I've heard that you are not

very nice to your friends." The next day, students were posting bulletins about Megan and Josh

had sent her another message which read, "Everybody in O'Fallon knows how you are. You are a

bad person and everybody hates you. Have a shitty rest of your life. The world would be a better

place without you." That day, Megan's parents found her hanging in her closet and rushed her to

the hospital, where she died the following day (Pokin, 2007). Although Megan's parents did

know about Josh and what he had been saying to her, there was no way of knowing that these

messages would lead to her suicide. It was found that Lori Drew, the mother of one of Megan's

former friends, had created the fake MySpace account with her daughter. Drew was convicted of

three misdemeanor charges of computer fraud for her involvement in creating the phony account

which tricked Megan, who later committed suicide. This conviction, handed down on November 26th,

2008, was the country's first cyberbullying verdict (Steinhauer, 2008). On July

2nd, 2009, federal judge George H. Wu threw out the conviction. Judge Wu tentatively acquitted

Drew of the previously mentioned misdemeanor charges, stating that the federal statute under

which Drew was convicted is too "vague" when applied in this particular case. He further stated

that if he were to allow Drew's conviction to stand, "one could literally prosecute anyone who

violates a terms of service agreement" in any way (Cathcart, 2009). This study examines ways in

which schools can prevent cyberbullying and, when necessary, intervene when cyberbullying

does occur. If a possible solution to cyberbullying is found, victims will feel safer, not only in

their homes, but at school as well.

In text-based cyber-bullying detection, the critical step is learning a numerical

representation for text messages. Representation learning of text is extensively studied in text

mining, information retrieval and natural language processing (NLP). The bag-of-words (BoW)

model is one commonly used model, in which each dimension corresponds to a term. Latent Semantic

Analysis (LSA) and topic models are other popular text representation models, both of which are

based on BoW models. By mapping text units into fixed-length vectors, the learned

representation can be further processed for numerous language processing tasks. Therefore, a

useful representation should discover the meaning behind text units. In cyber-bullying detection,

the numerical representation for Internet messages should be robust and discriminative. Since

messages on social media are often very short and contain a lot of informal language and

misspellings, robust representations for these messages are required to reduce their ambiguity.

Even worse, the lack of sufficient high-quality training data, i.e., data sparsity, makes the issue

more challenging. Firstly, labeling data is labor-intensive and time-consuming. Secondly,

cyber-bullying is hard to describe and judge from a third-party view due to its intrinsic ambiguities.

Thirdly, due to protection of Internet users and privacy issues, only a small portion of messages

are left on the Internet, and bullying posts are deleted. As a result, the trained classifier may not

generalize well on testing messages that contain non-activated but discriminative features.
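
As a minimal sketch of the BoW representation discussed above, the following maps each message to a fixed-length term-count vector. The helper names (build_vocabulary, bow_vector) and the example messages are hypothetical, and whitespace tokenization is an assumption made for brevity.

```python
from collections import Counter

def build_vocabulary(messages):
    """Assign each distinct term a fixed column index (sorted for determinism)."""
    terms = sorted({tok for msg in messages for tok in msg.lower().split()})
    return {term: idx for idx, term in enumerate(terms)}

def bow_vector(message, vocabulary):
    """Map a message to a fixed-length term-count vector; unseen terms are dropped."""
    counts = Counter(message.lower().split())
    vector = [0] * len(vocabulary)
    for term, count in counts.items():
        if term in vocabulary:
            vector[vocabulary[term]] = count
    return vector

corpus = ["you are so dumb", "you are so kind"]
vocabulary = build_vocabulary(corpus)   # columns: are, dumb, kind, so, you
vectors = [bow_vector(msg, vocabulary) for msg in corpus]

# A misspelled variant of the first message activates almost none of its features.
misspelled = bow_vector("u r so dum", vocabulary)
print(vectors[0])    # [1, 1, 0, 1, 1]
print(misspelled)    # [0, 0, 0, 1, 0]
```

The misspelled message shares only one active dimension with the original, which illustrates why a plain BoW representation is not robust to the informal language and misspellings mentioned above.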

Some approaches have been proposed to solve these problems by incorporating expert

knowledge into feature learning. Yin et al. proposed combining BoW features, sentiment features

and contextual features to train a support vector machine for online harassment detection [10].

Dinakar et al. utilized label-specific features to extend the general features; the label-specific

features are learned by Linear Discriminative Analysis [11]. In addition, common sense

knowledge was also applied. Nahar et al. presented a weighted TF-IDF scheme that scales

bullying-like features by a factor of two [12]. Besides content-based information, Maral et al.

proposed applying users’ information, such as gender and message history, and context

information as extra features [13], [14]. A major limitation of these approaches is that the

learned feature space still relies on the BoW assumption and may not be robust. In addition, the

performance of these approaches relies on the quality of hand-crafted features, which require

extensive domain knowledge.
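
The idea of a weighted TF-IDF scheme that doubles bullying-like features can be illustrated as follows. This is a sketch under assumptions: the exact TF-IDF variant used in [12] is not given here, so plain raw-count TF times log(N / document frequency) is assumed, and the function name weighted_tfidf, the word list and the messages are hypothetical.

```python
import math
from collections import Counter

def weighted_tfidf(messages, bullying_terms, scale=2.0):
    """Return one {term: weight} dict per message.

    Weights are plain TF-IDF (raw count times log(N / document frequency));
    terms in `bullying_terms` are then multiplied by `scale`.
    """
    tokenized = [msg.lower().split() for msg in messages]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weighted = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = {}
        for term, tf in counts.items():
            weight = tf * math.log(n_docs / doc_freq[term])
            if term in bullying_terms:
                weight *= scale  # emphasize bullying-like features
            row[term] = weight
        weighted.append(row)
    return weighted

messages = ["you idiot", "you rock", "nice work"]
rows = weighted_tfidf(messages, bullying_terms={"idiot"})
```

In this toy corpus "idiot" and "rock" have the same document frequency, so the bullying term ends up with exactly twice the weight of the neutral one. The scheme still depends on a hand-built word list, which is the limitation noted above.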

1.2 Existing system

Cyber-bullying detection is not a new concept. Developers have been working to find

the most convenient yet cost-effective way to implement this concept. The datasets that have

been used with this technology come from Twitter and MySpace. Twitter is “a real-time information

network that connects you to the latest stories, ideas, opinions and news about what you find

interesting”. Registered users can read and post tweets, which are defined as the messages

posted on Twitter with a maximum length of 140 characters. MySpace is another Web 2.0 social

networking website. Registered accounts are allowed to view pictures, read chat and check

other people’s profile information. The existing system analyzes the bullying percentage on

Twitter and MySpace, but the posted bullying words are not blocked.

Cyberbullying is a new form of bullying that follows students from the hallways of their

schools to the privacy of their homes. Many victims of cyberbullying are bullied from the

moment they wake up and check their cell phone or e-mail, to the time they go to bed and shut

off their computer or cell phone.

1.3 Objectives

• Develop a semantic-enhanced marginalized denoising autoencoder as a specialized

representation learning model for cyberbullying detection.

• Use word embeddings to automatically expand and refine bullying word lists that are

initialized by domain knowledge.

• Automatic detection of cyber-bullying.

• Reduction of cyber harassment.

• A healthy and safe social media environment.
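
The second objective, expanding a seed list of bullying words with word embeddings, can be sketched as a nearest-neighbor search by cosine similarity. This is an illustration under assumptions: the similarity threshold, the function names (cosine, expand_word_list) and the hand-made 2-D embedding values are hypothetical, and real embeddings would come from a trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_word_list(seeds, embeddings, threshold=0.8):
    """Add every word whose cosine similarity to some seed word is >= threshold."""
    expanded = set(seeds)
    known_seeds = [s for s in seeds if s in embeddings]
    for word, vec in embeddings.items():
        if word in expanded:
            continue
        if any(cosine(vec, embeddings[s]) >= threshold for s in known_seeds):
            expanded.add(word)
    return expanded

# Toy 2-D embeddings (hypothetical values chosen by hand for illustration).
embeddings = {
    "idiot": [1.0, 0.1],
    "moron": [0.9, 0.2],
    "hello": [0.1, 1.0],
}
expanded = expand_word_list({"idiot"}, embeddings)
print(sorted(expanded))   # ['idiot', 'moron']
```

The seed list still comes from domain knowledge; the embeddings only reduce the manual effort of enumerating variants and near-synonyms.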

1.4 Scope of work

This work investigates a deep learning method named the stacked denoising autoencoder (SDA).

SDA stacks several denoising autoencoders and concatenates the output of each layer as the

learned representation. Each denoising autoencoder in SDA is trained to recover the input data

from a corrupted version of it. The input is corrupted by randomly setting some of the input to

zero, which is called dropout noise. The denoising process helps the autoencoders to learn robust

representation. In addition, each autoencoder layer is intended to learn an increasingly abstract

representation of the input.
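
The dropout corruption described above can be sketched with NumPy. This shows only the noise step, not the autoencoder training; the function name dropout_corrupt and the example vector are hypothetical.

```python
import numpy as np

def dropout_corrupt(x, p, rng):
    """Return a copy of x in which each entry is set to zero with probability p."""
    keep_mask = rng.random(x.shape) >= p
    return x * keep_mask

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 0.0, 3.0, 1.0, 4.0])   # e.g. a BoW count vector
x_corrupted = dropout_corrupt(x, p=0.5, rng=rng)
```

A denoising autoencoder is then trained to map x_corrupted back to the clean x, so every surviving entry must carry information about the entries that were zeroed out.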

A new text representation model is developed based on a variant of SDA: marginalized

stacked denoising autoencoders (mSDA), which adopts linear instead of nonlinear projection to

accelerate training and marginalizes out the noise distribution, as if training on infinitely many

corrupted copies, in order to learn more robust representations. Semantic information is then

utilized to extend mSDA into Semantic-enhanced Marginalized Stacked Denoising Autoencoders

(smSDA). The semantic information consists of bullying words. An automatic extraction of bullying

words based on word embeddings is proposed so that the involved human labor can be reduced.

During training of smSDA, we attempt to reconstruct bullying features from other normal words by

discovering the latent structure, i.e. correlation, between bullying and normal words. The intuition

behind this idea is that some bullying messages do not contain bullying words. The correlation

information discovered by smSDA helps to reconstruct bullying features from normal words, and this

in turn facilitates detection of bullying messages that do not contain bullying words.
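
Because mSDA uses a linear projection, each layer has a closed-form solution. The sketch below follows the closed form commonly given for marginalized denoising autoencoders in the mSDA literature, which is an assumption here since this report does not state it. It covers plain mSDA only: the semantic dropout noise and L1 regularization that distinguish smSDA are omitted, and the function names (mda_layer, msda) and the small ridge term reg are illustrative choices.

```python
import numpy as np

def mda_layer(X, p, reg=1e-5):
    """One marginalized denoising layer.

    X has shape (d, n): d features, n messages. Returns the (d, d + 1)
    mapping W and the hidden representation tanh(W @ [X; 1]).
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])      # append a constant bias feature
    q = np.append(np.full(d, 1.0 - p), 1.0)   # survival probability per feature
    S = Xb @ Xb.T                             # scatter matrix of the clean data
    Q = S * np.outer(q, q)                    # expected corrupted scatter, off-diagonal
    np.fill_diagonal(Q, q * np.diag(S))       # diagonal uses q_i rather than q_i ** 2
    P = S[:d, :] * q                          # expected clean-by-corrupted scatter
    # W = P Q^{-1}; Q is symmetric, so solve Q W^T = P^T instead of inverting.
    W = np.linalg.solve(Q + reg * np.eye(d + 1), P.T).T
    return W, np.tanh(W @ Xb)

def msda(X, p, n_layers):
    """Stack layers and concatenate the output of each layer."""
    outputs = [X]
    for _ in range(n_layers):
        _, h = mda_layer(outputs[-1], p)
        outputs.append(h)
    return np.vstack(outputs[1:])

# Usage on random toy data (hypothetical): 4 features, 10 messages.
X = np.random.default_rng(0).random((4, 10))
features = msda(X, p=0.5, n_layers=2)   # shape (8, 10)
```

No corrupted copies are ever sampled: the expectation over the dropout noise is folded into Q and P, which is what makes training fast compared with a nonlinear SDA.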

For example, there is a strong correlation between the bullying word fuck and the normal word

off since they often occur together. If a bullying message does not contain such obvious bullying

features, for instance because fuck is misspelled as fck, the correlation may help to reconstruct the

bullying features from normal ones so that the bullying message can be detected. It should be

noted that introducing dropout noise has the effect of enlarging the size of the dataset, including

the training data size, which helps alleviate the data sparsity problem. In addition, L1 regularization of the

projection matrix is added to the objective function of each auto-encoder layer in our model to

enforce the sparsity of projection matrix, and this in turn facilitates the discovery of the most

relevant terms for reconstructing bullying terms.

1.5 Organization of the report

The project report is organized as follows:

Chapter 1: Gives a brief introduction to cyber-bullying detection and the main objective of

the project. It outlines the existing system and its drawbacks, and states the proposed

system that overcomes those drawbacks.

Chapter 2: Presents the literature survey of the project.

Chapter 3: It specifies Software Requirements Specifications (SRS), which includes functional

requirements, non-functional requirements and system requirements such as hardware and

software specifications.

Chapter 4: It specifies the design phase of the project which includes System Architecture,

modules, low level and high level design consisting of sequence, flow chart and block diagrams

for each module.

Chapter 5: It gives the detailed implementation with pseudo code, algorithm and flow chart.

Chapter 6: Describes the testing techniques with sample test cases, followed by

snapshots.

Chapter 2

LITERATURE SURVEY
[1] T. K. Landauer, P. W. Foltz, and D. Laham, “An introduction to latent semantic analysis,”

Discourse Processes.

Latent Semantic Analysis (LSA) is a theory and method for extracting and representing

the contextual-usage meaning of words by statistical computations applied to a large corpus of

text (Landauer and Dumais, 1997). The underlying idea is that the aggregate of all the word

contexts in which a given word does and does not appear provides a set of mutual constraints

that largely determines the similarity of meaning of words and sets of words to each other. The

adequacy of LSA’s reflection of human knowledge has been established in a variety of ways. For

example, its scores overlap those of humans on standard vocabulary and subject matter tests; it

mimics human word sorting and category judgments; it simulates word–word and passage–word

lexical priming data; and, as reported in 3 following articles in this issue, it accurately estimates

passage coherence, learnability of passages by individual students, and the quality and quantity

of knowledge contained in an essay. LSA can be construed in two ways: (1) simply as a practical

expedient for obtaining approximate estimates of the contextual usage substitutability of words

in larger text segments, and of the kinds of—as yet incompletely specified— meaning

similarities among words and text segments that such relations may reflect, or (2) as a model of

the computational processes and representations underlying substantial portions of the

acquisition and utilization of knowledge.
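
As a rough illustration of how LSA derives similarity from shared contexts, the sketch below applies a truncated singular value decomposition to a small term-by-document count matrix. The matrix, the term list and the function names (lsa, cosine) are hypothetical, and practical LSA typically also applies a weighting step to the counts before the decomposition, which is skipped here.

```python
import numpy as np

def lsa(term_doc, k):
    """Project terms and documents into a k-dimensional latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    term_vectors = U[:, :k] * s[:k]       # one row per term
    doc_vectors = Vt[:k, :].T * s[:k]     # one row per document
    return term_vectors, doc_vectors

def cosine(u, v):
    """Cosine similarity between two NumPy vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy term-by-document count matrix (hypothetical data).
terms = ["car", "automobile", "engine", "flower", "petal"]
term_doc = np.array([
    [1, 0, 1, 0, 0],   # car
    [0, 1, 1, 0, 0],   # automobile
    [1, 1, 1, 0, 0],   # engine
    [0, 0, 0, 1, 1],   # flower
    [0, 0, 0, 1, 0],   # petal
], dtype=float)

term_vectors, doc_vectors = lsa(term_doc, k=2)
sim_raw = cosine(term_doc[0], term_doc[1])                 # car vs automobile, raw counts
sim_related = cosine(term_vectors[0], term_vectors[1])     # car vs automobile, latent space
sim_unrelated = cosine(term_vectors[0], term_vectors[3])   # car vs flower, latent space
```

In the latent space "car" and "automobile" become much more similar than their raw count rows suggest, because both co-occur with "engine", while "car" and "flower" remain unrelated.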

[2] T. L. Griffiths and M. Steyvers, “Finding scientific topics,” Proceedings of the National

Academy of Sciences of the United States of America.

A first step in identifying the content of a document is determining which topics that

document addresses. We describe a generative model for documents, introduced by Blei, Ng, and

Jordan [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3, 993-1022], in

which each document is generated by choosing a distribution over topics and then choosing each

word in the document from a topic selected according to this distribution. We then present a

Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to

analyze abstracts from PNAS by using Bayesian model selection to establish the number of

topics. We show that the extracted topics capture meaningful structure in the data, consistent

with the class designations provided by the authors of the articles, and outline further

applications of this analysis, including identifying “hot topics” by examining temporal dynamics

and tagging abstracts to illustrate semantic content. When scientists decide to write a paper, one of the first things they do is identify an

interesting subset of the many possible topics of scientific investigation. The topics addressed by

a paper are also one of the first pieces of information a person tries to extract when reading a

scientific abstract. Scientific experts know which topics are pursued in their field, and this

information plays a role in their assessments of whether papers are relevant to their interests,

which research areas are rising or falling in popularity, and how papers relate to one another.

Here, we present a statistical method for automatically extracting a representation of documents

that provides a first-order approximation to the kind of knowledge available to domain experts.

Our method discovers a set of topics expressed by documents, providing quantitative measures

that can be used to identify the content of those documents, track changes in content over time,

and express the similarity between documents. We use our method to discover the topics covered

by papers in PNAS in a purely unsupervised fashion and illustrate how these topics can be used

to gain insight into some of the structure of science.
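
The generative process summarised above (choose a distribution over topics for a document, then draw each word from a topic selected according to that distribution) can be illustrated with a short sketch. This is only an illustration under stated assumptions: the two toy topics, the six-word vocabulary, the symmetric Dirichlet parameter alpha, and the function names sample_dirichlet and generate_document are hypothetical and do not come from the cited paper. It shows forward sampling only, not the Markov chain Monte Carlo inference step.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw from a symmetric Dirichlet by normalising independent Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, alpha, doc_length, rng):
    """Sample one document: a topic mixture, then a (topic, word) pair per position."""
    num_topics = len(topic_word)
    vocab_size = len(topic_word[0])
    # Step 1: choose this document's distribution over topics.
    theta = sample_dirichlet(alpha, num_topics, rng)
    topics, words = [], []
    for _ in range(doc_length):
        # Step 2: select a topic for this word position according to theta.
        z = rng.choices(range(num_topics), weights=theta)[0]
        # Step 3: draw the word from the selected topic's word distribution.
        w = rng.choices(range(vocab_size), weights=topic_word[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words


# Hypothetical toy data: two topics over a six-word vocabulary (rows sum to 1).
vocabulary = ["gene", "protein", "cell", "model", "data", "network"]
topic_word = [
    [0.40, 0.30, 0.28, 0.01, 0.005, 0.005],  # a "biology"-like topic
    [0.005, 0.005, 0.01, 0.30, 0.40, 0.28],  # a "modelling"-like topic
]

rng = random.Random(0)  # fixed seed so repeated runs give the same document
theta, topics, words = generate_document(topic_word, alpha=1.0, doc_length=8, rng=rng)
print("topic mixture:", theta)
print("document:", [vocabulary[w] for w in words])
```

Inference runs in the opposite direction: given only the words, it estimates the topic mixture of each document and the word distribution of each topic.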

[3] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine

Learning Research.

We describe latent Dirichlet allocation (LDA), a generative probabilistic model for

collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian

model, in which each item of a collection is modeled as a finite mixture over an underlying set of

topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic

probabilities. In the context of text modeling, the topic probabilities provide an explicit

representation of a document. We present efficient approximate inference techniques based on

variational methods and an EM algorithm for empirical Bayes parameter estimation. We report

results in document modeling, text classification, and collaborative filtering, comparing to a

mixture of unigrams model and the probabilistic LSI model. In this paper we consider the

problem of modeling text corpora and other collections of discrete data. The goal is to find short

descriptions of the members of a collection that enable efficient processing of large collections

while preserving the essential statistical relationships that are useful for basic tasks such as

classification, novelty detection, summarization, and similarity and relevance judgments.

Significant progress has been made on this problem by researchers in the field of information

retrieval (IR). The basic methodology proposed by IR researchers for text corpora—a

methodology successfully deployed in modern Internet search engines—reduces each document

in the corpus to a vector of real numbers, each of which represents ratios of counts. In the popular tf-idf scheme, a basic

vocabulary of “words” or “terms” is chosen.
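
The reduction of each document to a vector of count ratios can be sketched with a tf-idf style weighting. The code below is a minimal illustration, not the cited authors' implementation: the function name tfidf_vectors, the whitespace tokenisation, the three example messages, and the particular weighting (term frequency normalised by document length, multiplied by the logarithm of the inverse document frequency) are assumptions, and other tf-idf variants exist.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Map each document to a vector of tf-idf weights over a shared vocabulary.

    Assumes every document contains at least one token.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # The basic vocabulary of "terms" is every distinct token in the corpus.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vector = []
        for term in vocabulary:
            tf = counts[term] / len(tokens)            # ratio of counts within the document
            idf = math.log(n_docs / doc_freq[term])    # rarer terms receive a larger weight
            vector.append(tf * idf)
        vectors.append(vector)
    return vocabulary, vectors


# Hypothetical example messages.
docs = ["you are a loser", "you are a friend", "nobody likes you"]
vocab, vectors = tfidf_vectors(docs)
for term, weight in zip(vocab, vectors[0]):
    print(f"{term}: {weight:.3f}")
```

A term that occurs in every document, such as "you" in this toy corpus, receives weight zero, while a term that occurs in only one document receives the largest weight.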

[4] J. Juvonen and E. F. Gross, “Extending the school grounds? Bullying experiences in

cyberspace,” Journal of School Health.

Bullying is a national public health problem affecting millions of students. With the rapid

increase in electronic or online communication, bullying is no longer limited to schools. The goal

of the current investigation was to examine the overlap among targets of, and the similarities

between, online and in-school bullying among Internet-using adolescents. Additionally, a

number of common assumptions regarding online or cyberbullying were tested. Within the past

year, 72% of respondents reported at least 1 online incident of bullying, 85% of whom also

experienced bullying in school. The most frequent forms of online and in-school bullying

involved name-calling or insults, and the online incidents most typically took place through

instant messaging. When controlling for Internet use, repeated school-based bullying experiences

increased the likelihood of repeated cyberbullying more than the use of any particular electronic

communication tool. About two thirds of cyberbullying victims reported knowing their

perpetrators, and half of them knew the bully from school. Both in-school and online bullying

experiences were independently associated with increased social anxiety. Ninety percent of the

sample reported they do not tell an adult about cyberbullying, and only a minority of participants

had used digital tools to prevent online incidents. The findings have implications for (1) school

policies about cyberbullying, (2) parent education about the risks associated with online

communication, and (3) youth advice regarding strategies to prevent and deal with cyberbullying

incidents.

[5] M. Fekkes, F. I. Pijpers, A. M. Fredriks, T. Vogels, and S. P. Verloove-Vanhorick, “Do

bullied children get ill, or do ill children get bullied? A prospective cohort study on the

relationship between bullying and health-related symptoms.”

A number of studies have shown that victimization from bullying behavior is associated

with substantial adverse effects on physical and psychological health, but it is unclear which

comes first, the victimization or the health-related symptoms. In our present study, we

investigated whether victimization precedes psychosomatic and psychosocial symptoms or

whether these symptoms precede victimization. Victims of bullying had significantly higher

chances of developing new psychosomatic and psychosocial problems compared with children

who were not bullied. In contrast, some psychosocial, but not physical, health symptoms

preceded bullying victimization. Children with depressive symptoms had a significantly higher

chance of being newly victimized, as did children with anxiety. Many psychosomatic and

psychosocial health problems follow an episode of bullying victimization. These findings stress

the importance for doctors and health practitioners to establish whether bullying plays a

contributing role in the etiology of such symptoms. Furthermore, our results indicate that

children with depressive symptoms and anxiety are at increased risk of being victimized.

Because victimization could have an adverse effect on children's attempts to cope with

depression or anxiety, it is important to consider teaching these children skills that could make

them less vulnerable to bullying behavior.

[6] Cyberbullying involves the use of information and communication technologies to cause

harm to others (Belsey, 2004). According to the National Crime Prevention Council and Harris

Interactive, Inc.'s study in 2006, 43% of the students surveyed had been cyberbullied within the

last year (cited in Moessner, 2007). That same year, the Pew Internet and American Life Project

found that one out of three teens have experienced online harassment (Lenhart, 2007). According

to an article in the NASP Communique (2007), a poll conducted by the Fight Crime: Invest in

Kids group found that more than 13 million children in the United States aged 6 to 17 were

victims of cyberbullying. The poll also found that one-third of teens and one-sixth of primary

school-aged children had reported being cyberbullied (Cook, Williams, Guerra, & Tuthill, 2007).

Forms of cyberbullying go beyond name calling and enter a world of impersonation and

cyberthreats. According to Willard (2006), there are nine main forms of cyberbullying: flaming,

harassment, denigration, impersonation, outing, trickery, exclusion, cyberstalking and

cyberthreats. Flaming is online fights using electronic messages with angry and vulgar language.

Harassment is another form in which the cyberbully repeatedly sends insulting messages via the

Internet. Denigration is "dissing" someone online which can include sending or posting gossip or

rumors about a person that could damage their reputation or friendships. Impersonation is

pretending to be someone else in order to get that person in trouble with other people or to

damage their reputation and friendships. Outing is sharing someone's secrets, embarrassing

information, or photos online without his/her permission. Trickery is similar to outing, in which

the cyberbully will trick the victim to reveal secrets or embarrassing information and then share

it with others online. Exclusion is intentionally excluding someone from an online group.

Cyberstalking is repeated, intense harassment and denigration that includes threats or creates a

significant amount of fear in the victim. Lastly, cyberthreats are defined as either threats or

"distressing material," general statements that make it sound like the writer is emotionally upset

and may be considering harming someone else, themselves, or committing suicide (Willard,

2006). According to Willard, there are three related concerns in addition to the nine forms of

cyberbullying. These are students disclosing massive amounts of personal information via the

Internet, becoming 'addicted' to the Internet to the point where their lives are highly dependent

on their time spent online, and the prevalence of suicide and self-harm communities, which

depressed youths will sometimes access to gain information on suicide and self-harm methods

(Willard, 2006). In Confronting cyber-bullying (2009), Shariff overviews additional concerns

related to cyberbullying. These are anonymity, an infinite audience, prevalent sexual and

homophobic harassment, and permanence of expression. Anonymity refers to the anonymous

nature of cyberspace in which people are able to hide behind screen names that protect their

identity, which was mentioned earlier in this chapter. The online audience is described as being

infinite due to the large number of people that are able to see what is written by the bully and the

tendency of onlookers to support the perpetrators rather than the victim (cited in Shariff, 2009).

Shariff's third concern is the emergence of sexual and homophobic harassment on the Internet,

which she feels may be related to gender differences in the way that males and females use

communication technology. The Internet has a permanence aspect that is difficult to erase

because once anything is posted online, millions of people can download and save it

immediately, and share it with others. Heirman & Walrave (2008) have similar concerns. They

also list anonymity and infinite audiences, although they add other concerns as well. These are

24/7 attainability, the private nature of online communication, and the absence of non-verbal

communication cues. As they describe it, 24/7 attainability refers to the fact that the bullying

follows the victims home and is present online and on the victims' phones at all hours of the day.

The internet never turns off, and therefore the victim can be bullied at any time, anywhere that

they have their computer or phone with them. They can also be bullied when they do not even

know about it. This could happen if the bully posts something online without the victim

knowing about it until hours or days later. In that case, a lot of other people have the opportunity

to view the post or web site and draw their own conclusions. At that point, the damage has been

done (Heirman & Walrave, 2008).

[7]. According to Beale and Hall (2007), the six main ways in which cyberbullying occurs are e-mail, instant messaging, chat

rooms/bash boards, small text messaging, Web sites, and voting booths. E-mail is used to send

harassing and threatening messages to the victims, and although it is possible to trace where the

e-mail was sent from, it is often difficult to prove exactly who sent it. Instant

messaging (IM) allows for 'real time' communication. Although most IM programs allow users

to create a list of screen names that they do not want to contact them, it is easy for bullies to

create new screen names and therefore still be able to contact the victim. Chat rooms or bash

boards are a lot like instant messaging; however, instead of one-on-one real time

communication, there is a group of people who are all talking together at the same time (Beale &

Hall, 2007). "Bash board" is a nickname for an online bulletin board in which students can write

whatever they want, without it being known who they are. Often students will write untrue,

taunting statements about other students for the world to see. Small text messages (SMS) are

text messages that are sent and received via mobile phones. Text messages can include words,

numbers, or an alphanumeric combination. Voting or polling booths are part of Web sites that

are made for the distinct purpose of mocking, antagonizing, and harassing others. These sites

allow the users to vote anonymously online for the "ugliest," "fattest," "dumbest," "biggest slut,"

and so on, boy or girl in their school (Beale & Hall, 2007).

[8]. According to the Pew Internet and American Life Project survey in 2006 about

cyberbullying, girls were more likely than boys to say they have experienced cyberbullying; 38%

of online girls reported being bullied compared to 26% of online boys. Furthermore, girls aged

15 to 17 are the most likely to have experienced cyberbullying, with 41% of respondents from

that group reporting they had been cyberbullied compared to 34% of girls ages 12 to 14. It was

also found that nearly 4 in 10 social network users have been cyberbullied, compared with 22%

of online teens who do not use social networking sites (cited in Lenhart, 2007). From the same

poll, it was found that online rumors tended to target girls as well; 36% of girls compared to 23%

of boys. Online rumors can include someone making a private e-mail, instant message

conversation, text message, or embarrassing photo of the victim public without the victim's

consent. One in eight online teens reported that they had received a threatening e-mail, text

message or instant message. Older teens, especially 15- to 17-year-old girls, were more likely to

report they have been threatened online (cited in Lenhart, 2007). According to a study conducted

in 2008 by Hinduja & Patchin, females are as likely, if not more likely, to be involved in

cyberbullying in their lifetime. However, when students were asked about their recent

experiences of being cyberbullies, males and females responded equally. When asked about

lifetime participation, females reported higher rates of participating in cyberbullying, which

leads one to believe females engage in these activities for a longer period of time. Females tend

to take pictures of victims without their knowledge and post them online more often than males do.

Females also tend to post things online to make fun of someone more often, although males tend

to send emails to make them angry or to make fun of them. Although traditionally males tend to

bully in more outward and public ways, according to this study, females are more likely to

ensure that their victims are embarrassed in front of a larger audience since they use social

networking sites instead of e-mail more often than males do. When it comes to being a victim of

cyberbullying, the results are about the same. Females are more likely to have experienced the

effects of cyberbullying than males, although the difference disappears when they were asked

about the last 30 days. The data shows that females are 6% more likely to have been cyberbullied

in their lifetime than males. Females also have increased rates of being cyberbullied by someone

at their school and having threats made online be carried out at school (Hinduja & Patchin,

2008).

[9]. Hinduja & Patchin (2008) researched the reasons why females participate in and experience

cyberbullying more often than males. They found that due to females being more verbal and

cyberbullying being text based, it is more likely for females to partake in cyberbullying. Females

also tend to bully in more emotional and psychological ways, such as spreading rumors and

gossiping, which is more in line with cyberbullying. Females tend to be less confrontational

when in a face to face situation and therefore the anonymity of the online community may be

more appealing to them. Hinduja & Patchin also state that females are generally culturally and

socially constrained when it comes to using aggression or physical violence but are not

under those constraints while they are online. Females are often more apt to require social

support and in order to gain that, they often gang up against other females. The online

community is an easy and quick way to gang up against other females and to have many people

view it, which adds to the humiliation (Hinduja & Patchin, 2008).

Chapter 3

Area of Domain: Cyber-Bullying

CONCEPT OF CYBER BULLYING

The term Cyber Bullying was coined by Bill Belsey, a Canadian educator. Cyber bullying is

defined as, using both information technology and communication technology beyond the n,

state of mind, or to humiliate a person.1 It is an act by which the person being bullied suffers an

adverse effect. It is a deliberate attempt which can be continuous or one time. The bully can be a

known person or maybe an unknown person or a group. It is done using technologies such as

the internet, chat groups, instant messaging, short message service, web pages, e-mails, etc.

The intention is to harm a person. It is an act of a person who is either physically powerful or

socially powerful over the victim. It can also be in the form of developing a web site and posting

obscene photos or defamatory text on it. Some instances of cyber bullying can be a mere e-mail

to someone who has expressed a wish not to remain in contact, or the posting of pictures and

sexual remarks.2 It is alarming to note that 63% of harassers are reportedly under the age of 18

years. A survey of 1,400 schoolchildren in grades 4 to 8, reported by abcnews.com in

September 2006, produced surprising results: 42% of the children had been victims of cyberbullying,

and one in four had experienced it more than once; 35% had been threatened, and 21% had received mean or threatening e-mails

or messages.

CYBER BULLYING VERSUS BULLYING

Bullying has traditionally been a common part of school and college life. Bullying is now

no longer limited to schoolyards; it has expanded its horizons.3 The increase in cyber bullying

is due to the fact that it allows anonymous comments or posts. Traditional bullying was

confined to playgrounds, took place face to face, and had a limited audience. Cyber bullying, on

the other hand, has enlarged the audience. It has also given bullies a mask, and it has

moved the cruelty of bullying into cyber space and out to the world at large. Cyber bullies have

an advantage over traditional ones, since they can also assume someone else's identity, which

traditional bullies could not. The major difference between traditional bullying and cyber

bullying lies in impact and consequences. The impact of traditional bullying is shorter lived

than that of cyber bullying: it can fade with time and with a change of location. Because the

audience of cyber bullying is at large in cyber space, its impact is long lasting and

independent of location. Face-to-face bullying is limited in when and where it can be done, in

the size of its audience, and in who can carry it out. Cyber bullying can be carried out by

anyone, at any time, from any place, and in front of anyone, since it is online and can be

shared. The effects and consequences of cyber bullying are also more adverse than those of

traditional bullying. Cyber bullying can give rise to legal consequences, whereas in

traditional bullying such consequences are rare. There have been cases where cyber bullying has

resulted in suicide by the victim.4 Examples of traditional bullying are keeping a person out of

a group or teasing them; examples of cyber bullying are misusing someone’s profile to spread

rumors or post nasty comments. The difference is that in cyber bullying the comments remain

online perpetually, whereas in traditional bullying the wounds heal with time and the

memories fade.

CYBER BULLYING VERSUS CYBER STALKING

The difference between cyber stalking and cyber bullying is one of age. When an adolescent is

involved, the term used is cyber bullying, but when an adult is involved, it is cyber

stalking.6 There is no legal distinction between the two other than age. The act in cyber

stalking is the same as in cyber bullying; the only difference is age. Cyber stalking

is a form of cyber bullying.

REASONS FOR CYBER BULLYING

Since cyber bullies are people of tender age, they often lack an understanding of their actions and

of the consequences those actions can have on others. One reason for cyber bullying is therefore

ignorance of the nature and consequences of the act. Other reasons include anger, frustration,

boredom, and a wish for amusement.7 Bullies generally ignore the fact that their actions might

have a long-lasting impact on the person being bullied. The main reasons for cyber bullying

revolve around revenge and power. Bullies may turn to cyber bullying to take revenge. In most

cases, a person who was bullied earlier turns into a bully to satisfy a hunger for revenge. There

are also instances where a person who cannot speak up directly in front of the victim takes

advantage of the anonymity of cyber bullying. Another possible reason is social power: people

may try to demean others simply to become socially powerful. Jealousy can also give rise to

cyber bullying. Since adolescents are easily moved to jealousy, jealous minors are potential

bullies; jealousy over academic excellence or social popularity can turn them into bullies.

Some bullying has no valid reason behind it. Some people turn into bullies for no reason at

all, or just for entertainment and fun. People with low social standing may see bullying as

a way to become popular and raise their self-esteem. Some see it as a way to satisfy their

ego and find happiness in hurting others.

TYPES OF CYBER BULLYING

Cyber bullying can take various forms. Some involve abusing a person's personal

information, such as photos or blogs. Sending viruses to destroy the information of

the other person, abusing a person in a chat room, and sending images or texts through mobile

phones are further types.8 Sending e-mails after being asked not to, or sending vulgar or junk

mail, is another form. Other types include impersonating someone, revealing secret

information shared in confidence, excluding someone from a chat group, exchanging rude comments in the

group, harassing someone continuously, online polling, stealing passwords and misusing them to

reveal information, and telling someone else to bully a person.

MODES OF CYBER BULLYING

Bullying someone on the internet can take place through various methods. The simplest

methods of cyber bullying are sending text messages, e-mails, or instant messages

to someone who has already expressed an intention not to keep any contact with the sender.9

Other methods include threats, ganging up on the victim, defamation, sexual remarks, posting rumors,

hate speech, and creating an online forum against the victim. Still other methods include

impersonation, creating fake accounts, and posts on social media and in video games that portray or

abuse someone.

CONSEQUENCES

The consequences of cyber bullying can be as severe as suicide. One case is that of Ryan

Halligan, a 13-year-old schoolboy who committed suicide after becoming a victim of cyber

bullying. He used AOL Instant Messenger and became friends with a popular girl at his school.

He ended up in an online relationship with her. All this happened during the vacation

before eighth grade. When school resumed and he approached her, she called him a loser in front

of everybody. She told him that everything she had said online was a lie and that she had forwarded the

messages to her friends for a laugh. Besides the psychological harm to the victim, there can be

legal consequences for the bully too.10 The legal consequences can be either criminal or

civil, and the bully might be tried accordingly. As punishment, the bully may face expulsion

or suspension from school or college. The charges can include defamation, threat, theft,

outraging the modesty of a woman, harassment, etc. Most children are scared to tell their

parents that they are being bullied online. According to research conducted by abcNews.com in

September 2006, 58% of kids had not told their parents or an adult about being bullied

online.11 Parents can look for the following signs to know whether their child is a victim of

cyberbullying: fear of leaving the house, lack of appetite, low self-esteem,

secrecy about internet activities, closing computer windows when a parent arrives, behavioral changes,

aggression at home, decreased success, incomplete school work, unexplained pictures on their

computer, crying for no reason, changes in dress and other habits, lack of interest in

attending social gatherings where other students are present, and complaints of sickness before

any event in the community or at school.

ILLUSTRATIVE CASE: CYBER BULLYING

Indian laws have been silent on the problem of cyber bullying and its victims. Instances

of cyber bullying have been increasing over the years and have reached an alarming level, leaving

India in third position worldwide in terms of cyber bullying cases.12 The statute which

addresses computer-related concerns is the Information Technology Act, 2000, along with its

amendment of 2008.13 It is surprising that the IT Act does not touch upon communication-related

threats and offences in cyber space. One of the advantages of the internet and computers is ease

of communication and connectivity. As with everything, boons and banes walk hand in hand:

along with the advantages of communication come possible threats and shortcomings.

There may be instances where a person communicates an untrue statement, or a

statement which may affect someone’s self-respect or society's opinion of them. All such

communications can have an adverse effect on the person concerned. Anything on the internet is

like a permanent scar on a person’s social image. The consequences of such communications are

grave and cannot be compared to a traditional in-person insult. A traditional insult is

limited to a particular area and reaches a limited number of ears, whereas an online insult can reach ears

across the globe, crossing all geographical barriers within a tick of the clock. The potential harm

of an online offense is wide and beyond what many can foresee. Numerous

cybercrimes are possible, but India has placed emphasis on only some of them.

Cyber bullying is one such potential cybercrime. The law relating to communication is restricted to

offensive or obscene messages and images, or messages sent after the recipient has shown disinterest. Cyber bullying is a

wider term. A victim of cyber bullying can approach the court under the IPC or under the IT Act. The IPC

opens its doors to victims of stalking under section 354D, and its ambit covers a part of cyber stalking under section

354D(1)(ii).14 There is also a provision for safeguarding women from harassment. Cyber

bullying is a term which covers both cyber stalking and cyber harassment, along with the many

other forms it may take. The question arises: why define cyber bullying separately when it can

be covered under the IPC and the IT Act? The answer is complex and there are many reasons for such

inclusion, one of them being that cyber bullying covers more than what is defined as

harassment or stalking.

Assume a scenario in which a girl, A, gets into a fight with a boy, B, online. A makes clear her intention

to end her friendship with B and to have no contact in future. B, on the other hand, does not try to

approach A but goes to a mutual friend of theirs and leaks A's secrets by sending screenshots of

the chat. The mutual friend reposts the chats and makes fun of them, and as a consequence

A commits suicide. Under which provision of law will B be punished? Will these

publications be called ‘imputation’ as mentioned in section 499 of the IPC?15 Imputation simply

means something dishonest. But here the statements were not dishonest, which means they fall outside

the scope of defamation. There was no monitoring of A, nor any attempt to

follow A again, which means it was not stalking either. Also, what if B is 11 years old? Although

the crimes of defamation, stalking and harassment are covered under the IPC, section 83 provides that

an act done by a child above 7 and below 12 years of age who lacks sufficient maturity of understanding is not an offense.

The targets of cyber bullying are children of a tender age who lack mental and emotional

stability. The effects of online bullying can be grave and can leave a long-lasting imprint.

A child may be unable to recover from the suffering or to find any escape,

since anything in cyber space spreads far and fast. There is a need to understand the

gravity of the consequences that the various modes of cyber bullying can have, and then

to make law accordingly. If no clear definition of cyber bullying is introduced together with a

punishment, many offenders may escape in the absence of a law, and many victims may

be denied fair justice while searching for relevant sections to apply. There would also be a

possibility of ambiguity in applying the relevant sections. All such cases would again be reduced to

criminal cases under the IPC, with none proceeding under the IT Act, which would further

hamper the development of Internet or cyber law in India. Calling friends names on the

playground is a normal part of growing up, but what if this name-calling moves into cyber

space? The effect is not the same. Making fun of a friend online does not necessarily mean

insulting that person, but it can turn ugly and affect the child's state of mind. Since

cyber bullying mainly involves children, there is a need to understand child

psychology and determine liabilities accordingly. There can be instances where a post is made

public on a social site for fun but turns into an insult; where the post is neither a

false statement nor an obscene picture, such an insult is not penalized under the statute. A child

has a still-developing maturity and capacity for understanding. There is a chance that the child will take it

personally and be offended by it. The need is to cover the different scenarios that do not

fall under the Indian Penal Code or the Information Technology Act, while keeping in mind the

mental stability of the people involved in the act. Cyber bullying has two main elements,

namely the age of those involved and the act done. On the basis of these two elements a distinction can be made

that respects the right to free speech in cyber space. In the meantime, the statutes mentioned above are

competent to address many, but not all, cases of cyber bullying under various provisions.

The following section deals with the provisions that may be referred to when seeking remedies.

REMEDIES

1. Legal Remedy under Information Technology Act

Cyber bullying is an injury which leaves its scars for the rest of a person's life. Coping with it

is difficult. Keeping quiet about it and letting the bullies go

is not a solution, nor does it help the victim to overcome it. Letting

bullies go without reporting them or bringing an action against them can lead to further attacks and

to greater aggression in bullies who see that they face no consequences. There are legal remedies available

against cyber bullying. The remedy can be civil or criminal. After the 2013 amendment to the Indian

Penal Code of 1860, cyber stalking was added as a criminal offense. The remedies

available are set out below. Chapter 11 of the Information Technology Amendment Act deals

with offenses, but contains no clear definition of the offence of cyber bullying. Still, the Act

provides remedies against it under section 66 and section 67. As discussed earlier,

bullying can take place through e-mails, threats, or even the posting of false statements which cause

injury to the victim, not only physically but psychologically. Such acts are punishable. Section

66A provides remedies against offences which involve sending offensive messages through

a communication service.16 It punishes an act which involves sending information

38
Cyber-bullying Detection Using smSDA

which is offensive, or false, or points out the character. It also deals with punishing the minds

whose purpose is to cause danger, insult, injury, enmity, danger, hatred, ill will etc. to the victim,

through a computer resource or by communication devices. Sending any electronic mail or any

attachments with the purpose of creating annoyance, inconvenience, misleading someone about

the origin of the message is an offence under the said section. The punishment can be

imprisonment up to three years and fine. Surprisingly, the section dealt with most of the cases

that could fall under the umbrella of cyber bullying but has been struck down by Supreme Court,

giving freedom to speech due respect. A provision which resolves the friction between cyber

bullying and freedom of speech is the requirement of the hour. Since, section 66A needs to be

redefined in an unambiguous manner; including cyber bullying in the same can solve the

purpose.

Remedies are also available against transmitting material containing an explicitly sexual act. Section 67A provides that publishing, transmitting, or causing the transmission of sexually explicit material shall be punished with imprisonment of up to five years and a fine of up to ten lakh rupees on the first conviction. On a second or subsequent conviction, the punishment can be imprisonment of up to seven years and a fine of up to ten lakh rupees (footnote 19). Section 67B punishes the same act. The difference between the two sections is the age of the people depicted in the material transmitted. Section 67B punishes a person who publishes or transmits material in electronic form which depicts children engaged in a sexually explicit act or conduct, creates text or digital images or collects any material which depicts children in a sexually explicit manner, or facilitates child abuse, etc. The punishment on the first conviction is imprisonment of up to five years and a fine of up to ten lakh rupees, and on a second or subsequent conviction imprisonment of up to seven years and a fine of up to ten lakh rupees (footnote 20). There may be cases where personal information is stored with intermediaries. The duty of such intermediaries is to preserve and retain such information. There is also a chance of the information being leaked. If the leak is intentional, or results from an act known to be likely to cause it, it is punishable with imprisonment of up to three years and a fine under section 67C. Under certain situations, intermediaries may not be liable. Where there is a lawful contract between the persons who have secured access to the personal information and the owner of the information, any breach of the contract by releasing such information is punishable. The disclosure must be without the consent of the owner of the information and made with ill will. The punishment can be imprisonment of up to three years, a fine of up to five lakh rupees, or both. When an offence or bullying takes place as a consequence of conspiracy or instigation, the person instigating or conspiring is also liable to punishment under the Act (footnote 21). When no punishment is explicitly provided for an act violating the rules and regulations of the Information Technology Act, 2000 or the Information Technology (Amendment) Act, 2008, the act is punished under section 45. Under section 45, the offender is liable to pay compensation of not more than twenty-five thousand rupees to the person affected by the violation, or a penalty not exceeding twenty-five thousand rupees, where no separate punishment is available.

International Law

The first case of cyber bullying to draw attention worldwide was that of Ryan Halligan. Ryan was a 13-year-old boy. He had difficulties with speech, language, and motor skills in his early childhood. Having received special education services until fourth grade, he recovered in the fifth grade and no longer needed special attention. In fifth grade he encountered cyber bullying for the first time, targeting his physical and academic weaknesses. Later on, Ryan told his parents about his friendship with the kid who used to bully him in school. Considering him a friend, Ryan told the kid about an embarrassing examination he had needed because of stomach pains. The kid spread a rumor that Ryan was gay. Later, during his summer vacation, Ryan started spending time online. He started talking to a popular girl at school whom he had a crush on. The girl pretended to like him. When he tried to contact her at school, she called him a loser in front of everybody. She had also shared their IM chats with others to laugh at. A constant victim of cyber bullying, Ryan committed suicide by hanging himself. He did not leave a suicide note, but his father discovered the cause by looking at his IM chats. The father knew who was responsible and wanted to file charges against the bullies. The police told him that no criminal law covered the circumstances. All he could do was talk to the bullies and their families. He went on to visit schools to educate them about cyber bullying.

CONCEPT UNDER DIFFERENT COUNTRIES

1. United Kingdom Law

Bullying is not a specific criminal offence in UK law, but there are criminal laws that can be applied to cyber bullying (footnote 32). The Protection from Harassment Act 1997 applies to repeated actions. It prohibits harassment and provides a civil remedy for breach of that prohibition under section 3. Threats are punished under section 4. The Communications Act 2003 covers offensive or obscene communication. The Malicious Communications Act 1988 covers the sending of threatening electronic communication. Other remedies are available under the Public Order Act 1986, the Obscene Publications Act 1959, the Computer Misuse Act 1990, the Defamation Acts of 1952 and 1996, and the Crime and Disorder Act 1998.

2. United States Laws

Nearly all states have passed or amended laws to address it. At the federal level, the Megan Meier Cyberbullying Prevention Act has been proposed (footnote 33).

3. European Law

European data protection legislation is being applied to the issue of cyber bullying, online harassment and identity theft.

4. International Perspective

UNICEF, the Human Rights Commission and the United Nations are calling for a coordinated approach from governments all around the world.

CURRENT SCENARIO

According to research conducted in October 2013 on 400 students aged 11-14 in the Midwest, 97.5% had been online in the previous 30 days, 63% had cell phones, 43% were on Facebook, and 42% were on Instagram. In the previous 30 days, 11.5% had been the target of cyber bullying (6.8% of boys and 16% of girls), and 3.9% had cyber bullied others (0.6% of boys and 6.9% of girls) (footnote 34). Instagram has also become a medium for cyber bullying. It can take place through posting embarrassing photos of a person, adding insulting hashtags, posting defamatory or cruel comments, or creating fake profiles (footnote 35). Today, social media has become a large platform for cyber bullying. Confession pages are new and have caught the attention of most users. A confession page of a community or institute allows people to post anything about anybody without their identity being revealed. The administrators of such pages receive inbox messages, which they post on the page for everybody to read (footnote 36). People who like these pages are connected, remain in that circle, and keep getting notifications of posts on the page. Such pages on Facebook and Twitter are a recent trend. People can send anything to the admin's inbox to be posted. These posts can be confessions about a specific person. Sometimes they include photos that can be humiliating, or secret information about the victim. People post anything, since there is no threat of their identity being revealed.

Chapter 4

DETAILED DESIGN

4.1 Functional and Non functional Requirements

4.1.1 Functional Requirements

• User registration.
• User login.
• User sends friend requests and accepts or ignores friend requests.
• User posts photos and comments.
• User chats with friends.
• Admin checks for bullying words.
• Admin adds bullying words and sequences of words.
• Admin blocks the malicious user.

4.1.2 Non-Functional Requirements

These attributes specify the system characteristics with respect to its functionality.

Reliability: The system has an admin and user authentication process. It is trustworthy and reliable due to its systematic operation.

Availability: The application is available on the Java platform.

Security: The application secures the system through user credentials.

Maintainability: The Java platform is used to support the application, so maintenance is easy and economical.

Interoperability: The ability of systems and organizations to work together (inter-operate).

Extensibility: A system design principle where the implementation takes future growth into consideration.

4.2 System Requirements

To be used efficiently, all computer software needs certain hardware components or other

software resources to be present on a computer.

4.2.1 Hardware Requirements

• Processor: Pentium
• RAM: 1 GB DDR RAM
• Hard disk drive: 50 GB

4.2.2 Software Requirements

• Operating system: Windows/Ubuntu
• Language: Java
• Framework: Java
• Front end: HTML, CSS, JavaScript, jQuery
• Database: PostgreSQL

4.3 System Analysis

The system design process partitions the system into subsystems based on the requirements. The overall system architecture is established; this is concerned with identifying the various components, specifying the relationships among them, specifying the software structure, maintaining a record of design decisions, and providing a blueprint for the implementation phase.

Design consists of architectural design and detailed design. Detailed design is concerned with the details of how to package the processing modules and how to implement the processing algorithms, data structures, and interconnections among modules and data structures. Requirements analysis in systems engineering and software engineering encompasses those tasks that determine the needs or conditions to be met by a new or altered product, taking account of the possibly conflicting requirements of the various stakeholders, such as beneficiaries or users. Requirements must be documented, actionable, measurable, testable, related to identified business needs or opportunities, and defined to a level of detail sufficient for system design. Requirements can be architectural, structural, behavioral, functional, and non-functional. It is much easier to make changes and corrections in the early phases of the software development life cycle than in later phases. For this reason, it is important to make the logical system design specification as complete and as correct as possible.

The Functional Specification produced during System Requirements Analysis is transformed into a physical architecture. System components are distributed across the physical architecture, usable interfaces are designed and prototyped, and Technical Specifications are created for the Application Developers, enabling them to build and test the system. Many organizations look at System Design primarily as the preparation of the system component specifications; however, constructing the various system components is only one of a set of major steps in successfully building a system.

4.4 High Level Design

A high level design provides an overview of a solution, platform, system, product, service, or process. The overview is important in multi-project development to make sure that each supporting component design will be compatible with its neighboring designs.

A high level design mentions every work area briefly, clearly delegating the ownership of more detailed design activity, and encourages effective collaboration between the various project teams. Most high level designs require contributions from a number of experts, representing many distinct professional disciplines.

Figure 3.4.1 Data flow diagram

4.5 Low Level Design

Low-level design (LLD) is a component-level design process that follows a step-by-step refinement process. The process can be used for designing data structures, the required software architecture, source code and, ultimately, performance algorithms. The data organization may be defined during requirement analysis and then refined during data design work. Post-build, each component is specified in detail. The LLD phase is the stage where the actual software components are designed. During the detailed phase, the logical and functional design is carried out.

4.5.1 Use Case Diagrams

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram defined by and created from a use-case analysis. Its purpose is to present a graphical overview of the functionality provided by a system in terms of actors, their goals (represented as use cases), and any dependencies between those use cases. The facilities for drawing use case diagrams are an important part of UML. Use cases are used during the analysis phase of a project to identify and partition system functionality. They separate the system into actors and use cases.

Use Case Diagram Symbols

Symbol names: Actor, Association, Use case

Figure 3.5.1 Use case diagram for user

Use Case for User

Actors: User

Use cases: Register, Login, Search by name, Send request, Send message, Logout

Description of Actors

User: A user registers, and a registered user must log in to use the application.

Description of Use Cases

Register: The user should register.

Login: A registered user must log in using a valid id and password.

Search by name: The user searches for friends by name.

Send request: The user sends a request to a friend.

Send message: The user sends messages to friends.

Logout: The user logs out once the requirement is fulfilled.

Figure 3.5.2 Use case diagram for admin

Use Case for Admin

Actors: Admin

Use cases: Login, Upload words, Upload sequence of words, Check for bad words, Logout

Description of Actor

Admin: The admin uploads the illegal (bullying) words and checks for them.

Description of Use Cases

Login: The admin must log in using a valid id and password.

Upload words: The admin uploads the bullying words.

Upload sequence of words: The admin uploads sequences of words.

Check for bad words: The admin checks for bad words.

Logout: The admin logs out once the requirement is fulfilled.

4.5.2 Sequence diagram

A sequence diagram is an interaction diagram that shows how processes operate with one another

and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows

object interactions arranged in time sequence. It depicts the objects and classes involved in the

scenario and the sequence of messages exchanged between the objects needed to carry out the

functionality of the scenario.

A sequence diagram shows, as parallel vertical lines (lifelines), different processes or

objects that live simultaneously, and, as horizontal arrows, the messages exchanged between

them, in the order in which they occur. This allows the specification of simple runtime scenarios

in a graphical manner.

Figure 3.5.3 Sequence diagram

Figure 3.5.3 depicts the sequence of actions among the user, server and admin.

When a user registers, the user details are sent to the server; the server forwards the details to the admin and sends an acknowledgement of the registration back to the user. The admin uploads the words and the sequences of words to the server, and the server sends an acknowledgement back to the admin. The user can search the friend list, and the server shows the friend list. If the user sends a request to a friend, the server sends an acknowledgement to the user, after which the user can send messages to or chat with that friend. When a message is posted, the server checks each word in the message, and if it contains any bad words, the server sends a notification to the user. If the admin finds additional bad words, the admin adds them to the server, and the server again sends an acknowledgement back to the admin.
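
The server-side word check in this flow can be illustrated with a minimal Java sketch. It is not the project's actual code: the class name BullyingWordFilter, its method names and the tokenization rule (split on anything that is not a letter or digit, compare in lower case) are assumptions made for illustration. Single words are matched token by token, and a sequence of words is flagged only when its words appear next to each other in order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Illustrative word and word-sequence check; all names here are hypothetical. */
class BullyingWordFilter {
    private final Set<String> words = new HashSet<>();
    private final List<List<String>> sequences = new ArrayList<>();

    /** Admin action: add a single bullying word. */
    void addWord(String word) {
        words.add(word.toLowerCase(Locale.ROOT));
    }

    /** Admin action: add a sequence of words, flagged only when it appears contiguously and in order. */
    void addSequence(String phrase) {
        sequences.add(tokenize(phrase));
    }

    /** Splits text into lower-case tokens on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns true when the message contains a listed word or a listed sequence. */
    boolean containsBullying(String message) {
        List<String> tokens = tokenize(message);
        for (String token : tokens) {
            if (words.contains(token)) {
                return true;
            }
        }
        for (List<String> seq : sequences) {
            if (!seq.isEmpty() && Collections.indexOfSubList(tokens, seq) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```

In such a design the server would call containsBullying before storing a post or chat message and send the notification when it returns true. Exact token matching is the simplest choice; it does not catch misspellings or obfuscated words.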

Chapter 5

IMPLEMENTATION

5.1 Pseudo code

5.1.1 Connection Part: Connecting the JDBC driver and MySQL includes 5 major steps:

• Load and register the driver using
  Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
• Establish a connection using the getConnection() method of the DriverManager class
• Create SQL statements using the createStatement() method
• Execute the SQL statement using the executeQuery() method
• Close the connection using the close() method
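
The five steps can be sketched in Java as follows. This is a hedged illustration, not the project's code: the class JdbcConnectionSketch, the table bullying_words and its column word are hypothetical, and the driver class, URL and credentials are passed in by the caller. The JDBC-ODBC bridge class named in the first step was removed in Java 8, and JDBC 4 drivers found on the classpath register themselves, so the explicit Class.forName call is optional here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Sketch of the five JDBC steps; the table and column names are hypothetical. */
class JdbcConnectionSketch {

    /** Reads every entry of an assumed bullying_words(word) table. */
    static List<String> loadBullyingWords(String driverClass, String url, String user, String password)
            throws ClassNotFoundException, SQLException {
        // Step 1: load and register the driver (optional for JDBC 4+ drivers on the classpath).
        if (driverClass != null) {
            Class.forName(driverClass);
        }
        List<String> words = new ArrayList<>();
        // Steps 2 to 4: open the connection, create a statement, execute the query.
        // Step 5: try-with-resources closes the result set, statement and connection.
        try (Connection con = DriverManager.getConnection(url, user, password);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT word FROM bullying_words")) {
            while (rs.next()) {
                words.add(rs.getString("word"));
            }
        }
        return words;
    }
}
```

Using try-with-resources means the connection is closed even when the query throws, which the explicit close() call in the step list would not guarantee on its own.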

5.1.2 Web Part: Web part of the project includes login of the Service Provider (admin) and

Query Server (server). It also includes registration of the mobile user.

a. Login page:

    If the entered email address and password are correct
        login succeeds
    Else
        login fails

b. Registration page:

    Enter details for the required fields
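
The login and registration logic above can be sketched in Java. The sketch below keeps accounts in memory instead of the database, and everything in it is an assumption for illustration: the class AccountService, the simple email pattern, the rule that an email may be registered only once, and the use of an unsalted SHA-256 hash (chosen only to keep the example short).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** In-memory sketch of registration and login; a real system would use the database. */
class AccountService {
    // Deliberately simple email shape check (an assumption, not a full address validator).
    private static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    private final Map<String, byte[]> passwordHashByEmail = new HashMap<>();

    /** Registers a user; returns false for a malformed email, an empty password or a duplicate email. */
    boolean register(String email, String password) {
        if (email == null || password == null || password.isEmpty()) {
            return false;
        }
        if (!EMAIL.matcher(email).matches() || passwordHashByEmail.containsKey(email)) {
            return false;
        }
        passwordHashByEmail.put(email, hash(password));
        return true;
    }

    /** Login succeeds only when the email is registered and the password matches. */
    boolean login(String email, String password) {
        if (email == null || password == null) {
            return false;
        }
        byte[] stored = passwordHashByEmail.get(email);
        return stored != null && MessageDigest.isEqual(stored, hash(password));
    }

    // Unsalted SHA-256 keeps the sketch short; production code should use a salted, slow hash.
    private static byte[] hash(String password) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(password.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}
```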

5.1.3 HTML pages:

function get_html(pagename) {
    // Returns the HTML container page for the given page name.
}

function menu_clicks() {
    // Builds the menu bar: edit profile, find friends, view request,
    // timeline, chat list, logout.
}

function settimeline() {
    // Edits timeline posts and images.
}

function getrequest() {
    // Returns the list of requests.
}

function searchfriend() {
    // Returns the list of friends.
}

function clicks(pagename) {
    // Navigates to the clicked page.
}

function setprofile() {
    // Sets profile fields such as profile name, gender, date of birth,
    // designation and profile photo.
}

function blockuser() {
    // Blocks a user's post that includes bullying text.
}

5.2 Design of Test Cases

The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies, assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of tests. Each test type addresses a specific testing requirement.

5.2.1 Types of Testing

Unit Testing: Unit testing involves the design of test cases that validate that the internal program logic is functioning properly and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application. It is done after the completion of an individual unit and before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.

Integration Testing: Integration tests are designed to test integrated software components to determine whether they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that, although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.

Functional Testing: Functional tests provide systematic demonstrations that the functions tested are available as specified by the business and technical requirements, system documentation, and user manuals.

Functional testing is centered on the following items:

Valid Input: identified classes of valid input must be accepted.

Invalid Input: identified classes of invalid input must be rejected.

Functions: identified functions must be exercised.

Output: identified classes of application outputs must be exercised.

Systems/Procedures: interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key functions, or special test cases. In addition, systematic coverage of identified business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.

System Testing: System testing ensures that the entire integrated software system meets requirements. It tests a configuration to ensure known and predictable results. An example of system testing is the configuration-oriented system integration test. System testing is based on process descriptions and flows, emphasizing pre-driven process links and integration points.

White Box Testing: White box testing is testing in which the software tester has knowledge of the inner workings, structure and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black box level.

Black Box Testing: Black box testing is testing the software without any knowledge of the inner workings, structure or language of the module being tested. Black box tests, like most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document. It is testing in which the software under test is treated as a black box: you cannot “see” into it. The test provides inputs and responds to outputs without considering how the software works.

5.2.2 Test Objectives

All field entries must work properly.

Pages must be activated from the identified link.

The entry screen, messages and responses must not be delayed.

5.2.3 Features to be tested

Verify that the entries are of the correct format

No duplicate entries should be allowed

All links should take the user to the correct page.

5.2.4 Acceptance Testing

User Acceptance Testing is a critical phase of any project and requires significant participation

by the end user. It also ensures that the system meets the functional requirements.

5.2.5 Test Results

All the test cases mentioned above passed successfully. No defects encountered.

5.3 Sample Test Cases

| Test Case ID | Test Case Name | Test Case Description | Step | I/p given | Expected o/p | Actual o/p | Test Status (Pass/Fail) |
|---|---|---|---|---|---|---|---|
| TC01 | Login | To verify that the user has entered valid username & password | Login with username & password | Valid username & password | Login successful | Login successful | Pass |
| | Login | To verify that the user has entered valid username & password | Login with username & password | Invalid username & password | Login successful | Error: Enter valid username & password | Fail |
| TC02 | Registration | To verify that the user has registered by entering valid details | Enter all the valid user details | Valid details | Registered successfully | Registered successfully | Pass |
| | Registration | To verify that the user has registered by entering valid details | Enter all the valid user details | Invalid details | Registered successfully | Not registered successfully | Fail |
| TC03 | Search friend | To search a friend | Check that the list of friends is available | Type a friend name | Expected friend profile | Displays the expected friend profile | Pass |
| | Search friend | To search a friend | Check that the list of friends is available | Type a friend name | Expected friend profile | Displays the wrong friend profile | Fail |
| TC04 | Comment messages | Type a comment | The comment used without bullying words | The comment used without bullying words | Displays the comment | Displays the comment | Pass |
| TC05 | Posting messages | Type a message | The messages used without bullying words | The messages used without bullying words | Post message | Post message | Pass |
| | Posting messages | Type a message | The messages with bullying words | The messages with bullying words | Post message | Block the post that has been sent | Fail |
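
Rows such as TC05 can also be expressed as a table-driven check in Java, where the status is derived by comparing the expected output with the actual output. The sketch below is only an illustration under stated assumptions: the class PostingTestTable, the stand-in postMessage function, the output strings "Posted" and "Blocked", and the sample inputs are all hypothetical and are not taken from the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

/** Table-driven check in the spirit of TC05; the inputs and word list are made up. */
class PostingTestTable {

    /** One row of the table: id, input given and expected output. */
    static final class TestCase {
        final String id;
        final String input;
        final String expected;

        TestCase(String id, String input, String expected) {
            this.id = id;
            this.input = input;
            this.expected = expected;
        }
    }

    /** Stand-in for the system under test: posts a message unless it contains a listed word. */
    static String postMessage(String message, Set<String> bullyingWords) {
        for (String token : message.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (bullyingWords.contains(token)) {
                return "Blocked";
            }
        }
        return "Posted";
    }

    /** Runs every case and returns lines of the form "id: Pass" or "id: Fail". */
    static List<String> run(List<TestCase> cases, Function<String, String> systemUnderTest) {
        List<String> report = new ArrayList<>();
        for (TestCase tc : cases) {
            String actual = systemUnderTest.apply(tc.input);
            report.add(tc.id + ": " + (tc.expected.equals(actual) ? "Pass" : "Fail"));
        }
        return report;
    }
}
```

As in the table, a case whose expected output is that the message is posted is reported as Fail when the message is blocked instead.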

RESULT AND CONCLUSION

Snapshots

Snapshot 1: Registration page of the user.

This snapshot depicts the registration page, where users fill in their details such as name, email address, date of birth, gender and so on.

Snapshot 2: Login page of the user.

This snapshot depicts the login page, where the user enters a valid email address and password.

Snapshot 3: Home page of the user.

This snapshot depicts the home page of the user, where timeline posts and the friend list are displayed.

Snapshot 4: Friends list of the user.

This snapshot depicts the list of friends with their profile pictures and timeline posts.

Snapshot 5: Chat box of the user.

This snapshot depicts the chat box, where users can chat with their friends and can also send images.

Snapshot 6: Friend list.

This snapshot depicts searching for friends by typing the required friend's name, after which the user can send a request.

Snapshot 7: Friend profile page.

This snapshot depicts friend details like name, date of birth, school name, college name, employment details, gender and so on.

Snapshot 8: User profile updating page.

This snapshot depicts updating the user profile, where the user name, date of birth, school and college name, and employment designation can be edited.

Snapshot 9: Timeline photos and commenting page of the user.

This snapshot depicts the field where the user can upload photos with a caption; these can be viewed by the user's friends.

Snapshot 10: Messages received by the user.

This snapshot depicts the messages received by the user, where the user can reply to the sender or delete the message.

Snapshot 11: Admin login page.

This snapshot depicts the login page, where the admin enters a valid email address and password.

Snapshot 12: Blocked user detail page.

This snapshot depicts the details of blocked users who have used bullying words either in the comment field or in the chat field.

Snapshot 13: Malicious user detail page for the admin.

This snapshot depicts the details of malicious users of the social media site.

Snapshot 14: Bullying words adding page of the admin.

This snapshot depicts the admin adding new bullying words which may be used by the users.

CONCLUSION

This work addresses the text-based cyber bullying detection problem, where robust and discriminative representations of messages are critical for an effective detection system. By designing semantic dropout noise and enforcing sparsity, we have developed the semantic-enhanced marginalized denoising autoencoder as a specialized representation learning model for cyber bullying detection. In addition, word embeddings have been used to automatically expand and refine bullying word lists that are initialized by domain knowledge. The performance of our approaches has been experimentally verified on cyber bullying corpora from social media. As a next step, we plan to further improve the robustness of the learned representation by considering word order in messages.
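
One common way to expand a seed word list with word embeddings is to add every vocabulary word whose vector is close to a seed word. The Java sketch below illustrates that idea only; the text does not state which similarity measure or cut-off was used, so cosine similarity, the threshold parameter, the class WordListExpansion and the toy vectors in any usage are assumptions.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of seed-list expansion by cosine similarity; vectors and threshold are hypothetical. */
class WordListExpansion {

    /** Cosine similarity of two equal-length vectors; returns 0 when either has zero norm. */
    static double cosine(double[] a, double[] b) {
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Returns the seeds plus every vocabulary word whose similarity to some seed reaches the threshold. */
    static Set<String> expand(Set<String> seeds, Map<String, double[]> embeddings, double threshold) {
        Set<String> expanded = new LinkedHashSet<>(seeds);
        for (Map.Entry<String, double[]> candidate : embeddings.entrySet()) {
            for (String seed : seeds) {
                double[] seedVector = embeddings.get(seed);
                if (seedVector != null && cosine(seedVector, candidate.getValue()) >= threshold) {
                    expanded.add(candidate.getKey());
                    break;
                }
            }
        }
        return expanded;
    }
}
```

A higher threshold keeps the expanded list closer to the seed words, while a lower one adds more candidates that then need refinement.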

FUTURE ENHANCEMENT

Future enhancements include further improvement of the learned representation by considering word order in messages; cyber bullying image recognition through image processing and bullying voice note recognition through audio processing; cyber bullying detection in videos through video processing; and cyber bullying detection for different usages of words with different patterns. The percentage of cyber bullying by each person could also be analyzed and recorded to reduce harassment.

REFERENCES

[1] A. M. Kaplan and M. Haenlein, “Users of the world, unite! the challenges and opportunities of social

media,” Business horizons, vol. 53, no. 1, pp. 59–68, 2010.

[2] R. M. Kowalski, G. W. Giumetti, A. N. Schroeder, and M. R. Lattanner, “Bullying in the digital age: A critical review and meta-analysis of cyber bullying research among youth,” 2014.

[3] M. Ybarra, “Trends in technology-based sexual and non-sexual aggression over time and linkages to

nontechnology aggression,” National Summit on Interpersonal Violence and Abuse Across the Lifespan:

Forging a Shared Agenda, 2010.

[4] B. K. Biggs, J. M. Nelson, and M. L. Sampilo, “Peer relations in the anxiety–depression link: Test of a

mediation model,” Anxiety, Stress, & Coping, vol. 23, no. 4, pp. 431–447, 2010.

[5] S. R. Jimerson, S. M. Swearer, and D. L. Espelage, Handbook of bullying in schools: An international

perspective. Routledge/Taylor & Francis Group, 2010.

[6] G. Gini and T. Pozzoli, “Association between bullying and psychosomatic problems: A meta-

analysis,” Pediatrics, vol. 123, no. 3, pp. 1059–1065, 2009.

[7] A. Kontostathis, L. Edwards, and A. Leatherman, “Text mining and cybercrime,” Text Mining:

Applications and Theory. John Wiley & Sons, Ltd, Chichester, UK, 2010.

[8] J.-M. Xu, K.-S. Jun, X. Zhu, and A. Bellmore, “Learning from bullying traces in social media,” in

Proceedings of the 2012 conference of the North American chapter of the association for computational

linguistics: Human language technologies. Association for Computational Linguistics, 2012, pp. 656–666.

[9] Q. Huang, V. K. Singh, and P. K. Atrey, “Cyber bullying detection using social and textual analysis,”

in Proceedings of the 3rd International Workshop on Socially-Aware Multimedia. ACM, 2014, pp. 3–6.



[10] D. Yin, Z. Xue, L. Hong, B. D. Davison, A. Kontostathis, and L. Edwards, “Detection of harassment

on web 2.0,” Proceedings of the Content Analysis in the WEB, vol. 2, pp. 1–7, 2009.

[11] K. Dinakar, R. Reichart, and H. Lieberman, “Modeling the detection of textual cyber bullying.” in

The Social Mobile Web, 2011.

[12] V. Nahar, X. Li, and C. Pang, “An effective approach for cyber bullying detection,” Communications

in Information Science and Management Engineering, 2012.

[13] M. Dadvar, F. de Jong, R. Ordelman, and R. Trieschnigg, “Improved cyber bullying detection using

gender information,” in Proceedings of the 12th Dutch-Belgian Information Retrieval Workshop

(DIR2012). Ghent, Belgium: ACM, 2012.

[14] M. Dadvar, D. Trieschnigg, R. Ordelman, and F. de Jong, “Improving cyber bullying detection with

user context,” in Advances in Information Retrieval. Springer, 2013, pp. 693–696.

[15] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising

autoencoders: Learning useful representations in a deep network with a local denoising criterion,” The

Journal of Machine Learning Research, vol. 11, pp. 3371–3408, 2010.

[16] P. Baldi, “Autoencoders, unsupervised learning, and deep architectures,” Unsupervised and Transfer

Learning Challenges in Machine Learning, Volume 7, p. 43, 2012.


[17] M. Chen, Z. Xu, K. Weinberger, and F. Sha, “Marginalized denoising autoencoders for domain

adaptation,” arXiv preprint arXiv:1206.4683, 2012.

[18] T. K. Landauer, P. W. Foltz, and D. Laham, “An introduction to latent semantic analysis,” Discourse

processes, vol. 25, no. 2-3, pp. 259–284, 1998.

[19] T. L. Griffiths and M. Steyvers, “Finding scientific topics,” Proceedings of the National academy of

Sciences of the United States of America, vol. 101, no. Suppl 1, pp. 5228–5235, 2004.

[20] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” the Journal of machine

Learning research, vol. 3, pp. 993–1022, 2003.

[21] T. Hofmann, “Unsupervised learning by probabilistic latent semantic analysis,” Machine learning,

vol. 42, no. 1-2, pp. 177–196, 2001.

[22] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,”

Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 35, no. 8, pp. 1798–1828, 2013.

[23] B. L. McLaughlin, A. A. Braga, C. V. Petrie, M. H. Moore et al., Deadly Lessons: Understanding

Lethal School Violence. National Academies Press, 2002.

[24] J. Juvonen and E. F. Gross, “Extending the school grounds? bullying experiences in cyberspace,”

Journal of School health, vol. 78, no. 9, pp. 496–505, 2008.

[25] M. Fekkes, F. I. Pijpers, A. M. Fredriks, T. Vogels, and S. P. Verloove-Vanhorick, “Do bullied

children get ill, or do ill children get bullied? a prospective cohort study on the relationship between

bullying and health-related symptoms,” Pediatrics, vol. 117, no. 5, pp. 1568–1574, 2006.

[26] M. Ptaszynski, F. Masui, Y. Kimura, R. Rzepka, and K. Araki, “Brute force works best against

bullying,” in Proceedings of IJCAI 2015 Joint Workshop on Constraints and Preferences for

Configuration and Recommendation and Intelligent Techniques for Web Personalization. ACM, 2015.


[27] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical

Society. Series B (Methodological), pp. 267–288, 1996.

[28] C. C. Paige and M. A. Saunders, “Lsqr: An algorithm for sparse linear equations and sparse least

squares,” ACM Transactions on Mathematical Software (TOMS), vol. 8, no. 1, pp. 43–71, 1982.

[29] M. A. Saunders et al., “Cholesky-based methods for sparse least squares: The benefits of

regularization,” Linear and Nonlinear Conjugate Gradient-Related Methods, pp. 92–100, 1996.

[30] J. Fan and R. Li, “Variable selection via nonconcave penalized likelihood and its oracle properties,”

Journal of the American statistical Association, vol. 96, no. 456, pp. 1348–1360, 2001.

[31] C. Vogel, Computational Methods for Inverse Problems. Society for Industrial and Applied

Mathematics, 2002. [Online]. Available: http://epubs.siam.org/doi/abs/10.1137/1.9780898717570

[32] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in

vector space,” arXiv preprint arXiv:1301.3781, 2013.

[33] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words

and phrases and their compositionality,” in Advances in neural information processing systems, 2013, pp.

3111–3119.

[34] F. Godin, B. Vandersmissen, W. De Neve, and R. Van de Walle, “Named entity recognition for

twitter microposts using distributed word representations,” in Proceedings of the Workshop on Noisy

User-generated Text. Beijing, China: Association for Computational Linguistics, July 2015, pp. 146–153.

[Online]. Available: http://www.aclweb.org/anthology/W15-4322

[35] T. H. Dat and C. Guan, “Feature selection based on fisher ratio and mutual information analyses for

robust brain computer interface,” in Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE

International Conference on, vol. 1. IEEE, 2007, pp. I–337.


[36] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern classification. John Wiley & Sons, 2012.

[37] C.-C. Chang and C.-J. Lin, “Libsvm: a library for support vector machines,” ACM Transactions on

Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.

[38] J. Sui, “Understanding and fighting bullying with machine learning,” Ph.D. dissertation, The

University of Wisconsin-Madison, 2015.

[39] J. Bayzick, A. Kontostathis, and L. Edwards, “Detecting the presence of cyber bullying using

computer software,” in Proceedings of the ACM WebSci’11. Koblenz, Germany: ACM, June 2011, pp.

1–2.


Appendix A

The Java Programming Language

Based on the enormous amount of press Java is getting and the amount of excitement it has

generated, you may get the impression that Java will save the world, or at least solve all the

problems of the Internet. Not so. Java's hype has run far ahead of its capabilities, and while Java

is indeed new and interesting, it really is another programming language with which you write

programs that run on the Internet. In this respect, Java is closer to popular programming

languages such as C, C++, Visual Basic, or Pascal, than it is to a page description language such

as HTML, or a very simple scripting language such as JavaScript.

More specifically, Java is an object-oriented programming language developed by Sun

Microsystems, a company best known for its high-end UNIX workstations. Modeled after C++,

the Java language was designed to be small, simple, and portable across platforms and operating

systems, both at the source and at the binary level, which means that Java programs (applets and

applications) can run on any machine that has the Java virtual machine installed (you'll learn

more about this later).

Java is usually mentioned in the context of the World Wide Web, where browsers such as

Netscape's Navigator and Microsoft's Internet Explorer claim to be "Java enabled." Java

enabled means that the browser in question can download and play Java programs,

called applets, on the reader's system. Applets appear in a Web page much the same way as

images do, but unlike images, applets are dynamic and interactive. Applets can be used to create

animation, figures, forms that immediately respond to input from the reader, games, or other

interactive effects on the same Web pages among the text and graphics. Figure 1.1 shows an

applet running in Netscape 3.0.


Java's Past, Present, and Future

The Java language was developed at Sun Microsystems in 1991 as part of a research project to

develop software for consumer electronics devices: television sets, VCRs, toasters, and the other

sorts of machines you can buy at any department store. Java's goals at that time were to be small,

fast, efficient, and easily portable to a wide range of hardware devices. Those same goals made

Java an ideal language for distributing executable programs via the World Wide Web and also a

general-purpose programming language for developing programs that are easily usable and

portable across different platforms.

The Java language was used in several projects within Sun (under the name Oak), but did not get

very much commercial attention until it was paired with HotJava. HotJava, an experimental

World Wide Web browser, was written in 1994 in a matter of months, both as a vehicle for

downloading and running applets and also as an example of the sort of complex application that

can be written in Java. Although HotJava got a lot of attention in the Web community, it wasn't

until Netscape incorporated HotJava's ability to play applets into its own browser that Java really

took off and started to generate the excitement that it has both on and off the World Wide Web.

Java has generated so much excitement, in fact, that inside Sun the Java group spun off into its

own subsidiary called JavaSoft.

Versions of Java itself, or, as it's most commonly called, the Java API, correspond to versions of

Sun's Java Developer's Kit, or JDK. As of this writing, the current version of the JDK is 1.0.2.

Previously released versions of the JDK (alphas and betas) did not have all the features or had a

number of security-related bugs. Most Java tools and browsers conform to the features in the

1.0.2 JDK, and all the examples in this book run on that version as well.


The next major release of the JDK and therefore of the Java API will be 1.1, with a prerelease

version available sometime in the later part of 1996. This release will have few changes to the

language, but a number of additional capabilities and features added to the class library.

Throughout this book, if a feature will change or will be enhanced in 1.1, we'll let you know, and

in the last two days of this book you'll find out more about new Java features for 1.1 and for the

future.

Currently, to program in Java, you'll need a Java development environment of some sort for your

platform. Sun's JDK works just fine for this purpose and includes tools for compiling and testing

Java applets and applications. In addition, a wide variety of excellent Java development

environments have been developed, including Sun's own Java Workshop, Symantec's Café,

Microsoft's Visual J++ (which is indeed a Java tool, despite its name), and Natural Intelligence's

Roaster, with more development tools appearing all the time.

To run and view Java applets, you'll need a Java-enabled browser or other tool. As mentioned

before, recent versions of Netscape Navigator (2.0 and higher) and Internet Explorer (3.0) can

both run Java applets. (Note that for Windows you'll need the 32-bit version of Netscape, and for

Macintosh you'll need Netscape 3.0.) You can also use Sun's own HotJava browser to view

applets, as long as you have the 1.0 prebeta version (older versions are not compatible with

newer applets, and vice versa). Even if you don't have a Java-enabled browser, many

development tools provide simple viewers with which you can run your applets. The JDK comes

with one of these; it's called the appletviewer.

What's in store for Java in the future? A number of new developments have been brewing

(pardon the pun):


 Sun is developing a number of new features for the Java environment, including a

number of new class libraries for database integration, multimedia, electronic commerce,

and other uses. Sun also has a Java-based Web server, a Java-based hardware chip (with

which you can write Java-specific systems), and a Java-based operating system. You'll

learn about all these things later in this book. The 1.1 release of the JDK will include

many of these features; others will be released as separate packages.

 Sun is also developing a framework called Java Beans, which will allow the development

of component objects in Java, similar to Microsoft's ActiveX (OLE) technology. These

different components can then be easily combined and interact with each other using

standard component assembly tools. You'll learn more about Java Beans later in this

book.

 Java capabilities will be incorporated into a wide variety of operating systems, including

Solaris, Windows 95, and MacOS. This means that Java applications (as opposed to

applets) can run nearly anywhere without needing additional software to be installed.

 Many companies are working on performance enhancements for Java programs,

including the aforementioned Java chip and what are called just-in-time compilers.

Why Learn Java?

At the moment, probably the most compelling reason to learn Java (and probably the reason you

bought this book) is that applets are written in Java. Even if that were not the case, Java as a

programming language has significant advantages over other languages and other environments

that make it suitable for just about any programming task. This section describes some of those

advantages.


Java Is Platform Independent

Platform independence (that is, the ability of a program to move easily from one computer system

to another) is one of the most significant advantages that Java has over other programming

languages, particularly if your software needs to run on many different platforms. If you're

writing software for the World Wide Web, being able to run the same program on many different

systems is crucial to that program's success. Java is platform independent at both the source and

the binary level.

New Term

Platform independence means that a program can run on any computer

system. Java programs can run on any system for which a Java virtual

machine has been installed.

At the source level, Java's primitive data types have consistent sizes across all development

platforms. Java's foundation class libraries make it easy to write code that can be moved from

platform to platform without the need to rewrite it to work with that platform. When you write a

program in Java, you don't need to rely on features of that particular operating system to

accomplish basic tasks. Platform independence at the source level means that you can move Java

source files from system to system and have them compile and run cleanly on any system.
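A small sketch can make the point about fixed primitive sizes concrete. It reads the `SIZE` constants that the standard wrapper classes expose; the class name `PrimitiveSizes` is invented for this illustration. Because the language fixes these widths, the values do not depend on the machine the program runs on.

```java
import java.util.Arrays;

// Illustrative only: the class name is an assumption, the SIZE constants are standard.
class PrimitiveSizes {

    /** Bit widths of byte, short, char, int, long, float and double, in that order. */
    static int[] sizesInBits() {
        return new int[] {
            Byte.SIZE, Short.SIZE, Character.SIZE,
            Integer.SIZE, Long.SIZE, Float.SIZE, Double.SIZE
        };
    }

    public static void main(String[] args) {
        // The same numbers are printed on every platform with a Java virtual machine.
        System.out.println(Arrays.toString(sizesInBits()));
    }
}
```

In C, by contrast, the width of `int` or `long` is left to the compiler and platform, which is one reason source code there often needs adjusting when it is moved.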

Platform independence in Java doesn't stop at the source level, however. Java compiled binary

files are also platform independent and can run on multiple platforms (if they have a Java virtual

machine available) without the need to recompile the source.

Normally, when you compile a program written in C or in most other languages, the compiler

translates your program into machine code or processor instructions. Those instructions are

specific to the processor your computer is running; so, for example, if you compile your code on


an Intel-based system, the resulting program will run only on other Intel-based systems. If you

want to use the same program on another system, you have to go back to your original source

code, get a compiler for that system, and recompile your code so that you have a program

specific to that system. Figure 1.2 shows the result of this system: multiple executable programs

for multiple systems.

Figure 1.2 : Traditional compiled programs.

Things are different when you write code in Java. The Java development environment actually

has two parts: a Java compiler and a Java interpreter. The Java compiler takes your Java program

and, instead of generating machine codes from your source files, it generates bytecodes.

Bytecodes are instructions that look a lot like machine code, but are not specific to any one

processor.

To execute a Java program, you run a program called a bytecode interpreter, which in turn reads

the bytecodes and executes your Java program (see Figure 1.3). The Java bytecode interpreter is

often also called the Java virtual machine or the Java runtime.

Figure 1.3 : Java programs.

New Term

Java bytecodes are a special set of machine instructions that are not specific

to any one processor or computer system. A platform-specific bytecode

interpreter executes the Java bytecodes. The bytecode interpreter is also

called the Java virtual machine or the Java runtime interpreter.
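The compile-then-interpret cycle described above can be tried with a minimal program. The sketch below is an assumption-laden illustration: the class name `HelloBytecode` is made up, and the commands in the comments assume a standard JDK with `javac` and `java` on the path.

```java
// Illustrative class; the name is hypothetical.
//
// Compile once (produces the platform-neutral file HelloBytecode.class):
//     javac HelloBytecode.java
// Run the same .class file on any system that has a Java virtual machine:
//     java HelloBytecode
class HelloBytecode {

    /** Builds a greeting that names the virtual machine and operating system in use. */
    static String greeting() {
        return "Hello from " + System.getProperty("java.vm.name")
                + " on " + System.getProperty("os.name");
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

Copying the compiled `.class` file to a machine with a different operating system and running it there should change only the reported system names, not the program itself.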

Where do you get the bytecode interpreter? For applets, the bytecode interpreter is built into

every Java-enabled browser, so you don't have to worry about it; Java applets just automatically

run. For more general Java applications, you'll need to have the interpreter installed on your


system in order to run that Java program. Right now, you can get the Java interpreter as part of

your development environment, or if you buy a Java program, you'll get it with that package. In

the future, however, the Java bytecode interpreter will most likely come with every new

operating system: buy a Windows machine, and you'll get Java for free.

Why go through all the trouble of adding this extra layer of the bytecode interpreter? Having

your Java programs in bytecode form means that instead of being specific to any one system,

your programs can be run on any platform and any operating or window system as long as the

Java interpreter is available. This capability of a single binary file to be executable across

platforms is crucial to what makes applets work because the World Wide Web itself is also

platform independent. Just as HTML files can be read on any platform, so can applets be

executed on any platform that has a Java-enabled browser.

The disadvantage of using bytecodes is in execution speed. Because system-specific programs

run directly on the hardware for which they are compiled, they run significantly faster than Java

bytecodes, which must be processed by the interpreter. For many basic Java programs, speed

may not be an issue. If you write programs that require more execution speed than the Java

interpreter can provide, you have several solutions available to you, including being able to link

native code into your Java program or using special tools (called just-in-time compilers) to

convert your Java bytecodes into native code and speed up their execution. Note that linking

native code costs you the portability that Java bytecodes provide, whereas a just-in-time compiler keeps it. You'll learn about

each of these mechanisms on Day 20, "Using Native Methods and Libraries."

Java Is Object Oriented

To some, the object-oriented programming (OOP) technique is merely a way of organizing

programs, and it can be accomplished using any language. Working with a real object-oriented


language and programming environment, however, enables you to take full advantage of object-

oriented methodology and its capabilities for creating flexible, modular programs and reusing

code.

Many of Java's object-oriented concepts are inherited from C++, the language on which it is

based, but it borrows many concepts from other object-oriented languages as well. Like most

object-oriented programming languages, Java includes a set of class libraries that provide basic

data types, system input and output capabilities, and other utility functions. These basic libraries

are part of the standard Java environment, which also includes simple libraries for networking,

common Internet protocols, and user interface toolkit functions. Because these class libraries are

written in Java, they are portable across platforms as all Java applications are.

You'll learn more about object-oriented programming and Java tomorrow.

Java Is Easy to Learn

In addition to its portability and object orientation, one of Java's initial design goals was to be

small and simple, and therefore easier to write, easier to compile, easier to debug, and, best of

all, easy to learn. Keeping the language small also makes it more robust because there are fewer

chances for programmers to make mistakes that are difficult to fix. Despite its size and simple

design, however, Java still has a great deal of power and flexibility.

Java is modeled after C and C++, and much of the syntax and object-oriented structure is

borrowed from the latter. If you are familiar with C++, learning Java will be particularly easy for

you because you have most of the foundation already. (In fact, you may find yourself skipping

through the first week of this book fairly rapidly. Go ahead; I won't mind.)

Although Java looks similar to C and C++, most of the more complex parts of those languages

have been excluded from Java, making the language simpler without sacrificing much of its


power. There are no pointers in Java, nor is there pointer arithmetic. Strings and arrays are real

objects in Java. Memory management is automatic. To an experienced programmer, these

omissions may be difficult to get used to, but to beginners or programmers who have worked in

other languages, they make the Java language far easier to learn.
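The following sketch illustrates these points under simple assumptions: arrays and strings are objects with their own methods, references replace raw pointers, array indexes are checked at run time, and there is no explicit call to free memory. The class and method names are invented for the example.

```java
import java.util.Arrays;

// Illustrative only; names are hypothetical.
class ObjectsNotPointers {

    /** Returns a modified copy; the caller's array is left untouched. */
    static int[] copyWithFirstReplaced(int[] original, int value) {
        int[] copy = original.clone();   // arrays are objects and can be cloned
        copy[0] = value;
        return copy;
    }

    public static void main(String[] args) {
        int[] scores = {1, 2, 3};
        int[] alias = scores;            // a second reference to the same array object
        alias[1] = 20;                   // visible through both references
        System.out.println(Arrays.toString(scores));

        String name = "smSDA";           // strings are objects with methods
        System.out.println(name.length() + " " + name.toUpperCase());

        try {
            scores[3] = 4;               // out of range: checked at run time
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("index rejected: " + e.getMessage());
        }

        scores = null;                   // no free(); the collector reclaims the array
        alias = null;                    // once nothing refers to it any more
    }
}
```

There is no way in this code to compute an address or step past the end of the array, which is the class of mistake the text says Java removes.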

However, while Java's design makes it easier to learn than other programming languages,

working with a programming language is still a great deal more complicated than, say, working

in HTML. If you have no programming language background at all, you may find Java difficult

to understand and to grasp. But don't be discouraged! Learning programming is a valuable skill

for the Web and for computers in general, and Java is a terrific language to start out with.

The Java programming language is a high-level language that can be characterized by all of the

following buzzwords:

 Simple

 Architecture neutral

 Object oriented

 Portable

 Distributed

 High performance

 Interpreted

 Multithreaded

 Robust

 Dynamic

 Secure

With most programming languages, you either compile or interpret a program so that you can


run it on your computer. The Java programming language is unusual in that a program is both

compiled and interpreted. With the compiler, first you translate a program into an intermediate

language called Java byte codes, the platform-independent codes interpreted by the interpreter

on the Java platform. The interpreter parses and runs each Java byte code instruction on the

computer. Compilation happens just once; interpretation occurs each time the program is

executed. The following figure illustrates how this works.

You can think of Java byte codes as the machine code instructions for the Java Virtual Machine

(Java VM). Every Java interpreter, whether it’s a development tool or a Web browser that can

run applets, is an implementation of the Java VM. Java byte codes help make “write once, run

anywhere” possible. You can compile your program into byte codes on any platform that has a

Java compiler. The byte codes can then be run on any implementation of the Java VM. That

means that as long as a computer has a Java VM, the same program written in the Java

programming language can run on Windows 2000, a Solaris workstation, or on an iMac.

The Java Platform

A platform is the hardware or software environment in which a program runs. We’ve already

mentioned some of the most popular platforms like Windows 2000, Linux, Solaris, and Mac OS.

Most platforms can be described as a combination of the operating system and hardware. The

Java platform differs from most other platforms in that it’s a software-only platform that runs on

top of other hardware-based platforms. The Java platform has two components:

• The Java Virtual Machine (Java VM)

• The Java Application Programming Interface (Java API)

The Java API is a large collection of ready-made software components that provide many useful

capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into


libraries of related classes and interfaces; these libraries are known as packages. The next

section, "What Can Java Technology Do?", highlights the functionality that some of the packages in

the Java API provide.

The following figure depicts a program that’s running on the Java platform. As the figure shows,

the Java API and the virtual machine insulate the program from the hardware.

Native code is code that, once compiled, runs only on a specific hardware

platform. As a platform-independent environment, the Java platform can be a bit slower than

native code. However, smart compilers, well-tuned interpreters, and just-in-time byte code

compilers can bring performance close to that of native code without threatening portability. The

Java platform gives you the following features:

 The essentials: Objects, strings, threads, numbers, input and output, data structures,

system properties, date and time, and so on.

 Applets: The set of conventions used by applets.

 Networking: URLs, TCP (Transmission Control Protocol), UDP (User Datagram

Protocol) sockets, and IP (Internet Protocol) addresses.

 Internationalization: Help for writing programs that can be localized for users

worldwide. Programs can automatically adapt to specific locales and be displayed in the

appropriate language.

 Security: Both low level and high level, including electronic signatures, public and

private key management, access control, and certificates.

 Software components: Known as JavaBeans, these can plug into existing component

architectures.

 Object serialization: Allows lightweight persistence and communication via Remote


Method Invocation (RMI).
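To make one item of the list above concrete, here is a minimal sketch of object serialization using the standard `java.io` classes. The `ChatMessage` class and the helper names are hypothetical and chosen only for the illustration; an in-memory byte array stands in for a file or network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical value class used only for this illustration.
class ChatMessage implements Serializable {
    private static final long serialVersionUID = 1L;
    final String author;
    final String text;

    ChatMessage(String author, String text) {
        this.author = author;
        this.text = text;
    }
}

class SerializationSketch {

    /** Writes any serializable object into a byte array. */
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    /** Reads back an object previously written by toBytes. */
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] stored = toBytes(new ChatMessage("alice", "hello"));
        ChatMessage restored = (ChatMessage) fromBytes(stored);
        System.out.println(restored.author + ": " + restored.text);
    }
}
```

The same byte stream could be written to disk for lightweight persistence or sent over a socket, which is the mechanism RMI builds on.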

Swing

The Swing library is the official Java GUI toolkit, released by Sun Microsystems. The

main characteristics of the Swing toolkit are that it is:

• platform independent

• customizable

• extensible

• configurable

• lightweight

Swing is a mature, full-featured toolkit with a rich set of widgets, from basic ones such as

buttons, labels, and scrollbars to advanced ones such as trees and tables. Swing is written

entirely in Java. It is part of the Java Foundation Classes (JFC), a collection of packages for

creating full-featured desktop applications. JFC consists of AWT, Swing, Accessibility, Java 2D,

and Drag and Drop. Swing has been part of the standard Java platform since JDK 1.2. The Java

platform also includes the Java 2D library, which enables developers to create advanced 2D

graphics and imaging. There are basically two types of widget toolkits:

• Lightweight

• Heavyweight

A heavyweight toolkit uses the operating system's API to draw the widgets. For example, Borland's

VCL is a heavyweight toolkit: it depends on the Win32 API, the built-in Windows application

programming interface. On Unix systems there is the GTK+ toolkit, which is built on top of the

X11 library. Swing is a lightweight toolkit: it paints its own widgets. It is in fact the only

lightweight toolkit I know about.


SWT library

There is another GUI library for the Java programming language, called SWT, the Standard Widget

Toolkit. The SWT library was initially developed by IBM. It is now an open source project

supported by IBM. SWT is an example of a heavyweight toolkit: it lets the underlying operating

system create the GUI, and it uses the Java Native Interface to do the job. The main advantages

of SWT are speed and a native look and feel. On the other hand, SWT is more error prone and less

powerful than Swing. It is also a rather Windows-centric library.

Swing is important for developing Java programs with a graphical user interface (GUI). The Swing

toolkit consists of many components for building a GUI, and these components also help make Java

applications interactive. The following components are included in the Swing toolkit:

 list controls

 buttons

 labels

 tree controls

 table controls
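A short sketch shows how a few of the components listed above fit together. It uses only standard `javax.swing` classes (`JLabel`, `JList`, `JButton`, `JScrollPane`, `DefaultListModel`); the class name, method names, and sample strings are assumptions made for the illustration, and the window layout is deliberately minimal.

```java
import java.awt.BorderLayout;
import javax.swing.DefaultListModel;
import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JList;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.SwingUtilities;
import javax.swing.WindowConstants;

// Illustrative only; names and sample data are hypothetical.
class SwingComponentsSketch {

    /** Data shown by the list control, kept separate from the widgets. */
    static DefaultListModel<String> sampleModel() {
        DefaultListModel<String> model = new DefaultListModel<>();
        model.addElement("First message");
        model.addElement("Second message");
        return model;
    }

    /** Combines a label, a list control and a button in one panel. */
    static JPanel buildPanel(DefaultListModel<String> model) {
        JPanel panel = new JPanel(new BorderLayout());
        JLabel label = new JLabel("Messages: " + model.getSize());
        JList<String> list = new JList<>(model);
        JButton clear = new JButton("Clear");
        clear.addActionListener(e -> {
            model.clear();                       // the list repaints itself
            label.setText("Messages: 0");
        });
        panel.add(label, BorderLayout.NORTH);
        panel.add(new JScrollPane(list), BorderLayout.CENTER);
        panel.add(clear, BorderLayout.SOUTH);
        return panel;
    }

    public static void main(String[] args) {
        // Swing components should be created on the event dispatch thread.
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Swing sketch");
            frame.setDefaultCloseOperation(WindowConstants.EXIT_ON_CLOSE);
            frame.add(buildPanel(sampleModel()));
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```

Keeping the data in a `DefaultListModel` means the button handler only changes the model, and the list control updates its own display.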

Swing can handle all of the flexible AWT components, and the Swing toolkit contains far more

components than a simple component toolkit. It differs from other toolkits in that it supports

integrated internationalization, a highly customizable text package, rich undo support, and so

on. You can also create your own look and feel with Swing, in addition to the ones it ships with;

a customized look and feel can be created using Synth, which is designed specifically for this

purpose. Swing also provides basic user interface facilities such as customizable painting,

event handling, and drag and drop.
