INTRODUCTION
ROAD traffic monitoring is of great importance for urban transportation systems. Traffic control agencies and drivers can benefit from timely and accurate road traffic prediction, which makes prompt, or even advance, decisions possible for detecting and avoiding road congestion. Existing methods mainly focus on raw speed sensing data collected from cameras or road sensors, and suffer from a severe data scarcity issue because the installation and maintenance of sensors are very expensive [56]. At the same time, most existing techniques based only on past and current traffic conditions (e.g., [9], [54], [25], [38]) do not fit well when real-world factors such as traffic accidents play a part. To address these issues, in this paper we introduce new types of traffic-related data arising from public services: 1) Social media data, which is posted on social networking websites such as Twitter and Facebook. With the popularization of mobile devices, people are more likely to exchange news and trifles of their lives through social media services, where messages about traffic conditions, such as "Stuck in traffic on E 32nd St. Stay away!", are posted by drivers, passengers and pedestrians, who can be viewed as sensors observing the ongoing
traffic conditions near their physical locations. Meanwhile, traffic authorities register public accounts
and post tweets to inform the public of the traffic status, such as "Slow traffic on I95 SB from Girard Ave to Vine St." posted by a local transportation bureau account. Such text messages describing traffic conditions, some of them tagged with location information, are publicly accessible and can serve as a complementary information source to raw speed sensing data. 2) Map service data: given an origin-destination (OD) pair on a map, such services can recommend the optimal route from the origin to the destination with the least travel time, and trajectories can be collected once drivers use the service to navigate. Here a trajectory is a sequence of links for a given OD pair, and a link is a road segment between neighboring intersections.
Fig. 1 (caption): Our goal is to predict the traffic speed of specific road links, shown with red question marks, given: 1) some speed observations collected by speed sensors, shown in blue; 2) trajectories and travel times of OD pairs, where the speeds of the passed road links are either observed or to be predicted; 3) tweets describing traffic conditions, where the location mentioned by a tweet may be a street covering multiple road links.
Correspondingly, a trajectory travel time is the sum of the link travel times, which are related to the real-time road traffic speeds. A longer trajectory travel time indicates that some of the involved road links may be congested, with lower traffic speed. Trajectory data are useful for a wide range of transportation analyses and applications [49], [9]. Based on the above observations, where traditional traffic sensing data are limited while new types of data from social media and map services are beginning to spring up, our goal is
to predict road-level traffic speed by incorporating the new types of data with traditional speed sensing data. To motivate this scenario, consider the road traffic prediction example depicted in Fig. 1. The links marked with red question marks are not covered by traditional speed sensors, but may be passed by trajectories attached with travel time information, or mentioned in tweets describing traffic conditions, so their speeds can be inferred by fusing multiple cross-domain data sources.
In today's world, a large amount of data is generated and collected daily. Analyzing this data and extracting the important parts of it is difficult, and it is also a pressing need. A huge amount of data is available in the information industry, but this data is of no use until it is converted into useful information. It is therefore necessary to analyze this huge amount of data and extract useful information from it.
Data mining is the natural evolution of information technology. It is the computational process of
discovering patterns in large data sets involving methods at the intersection of artificial
intelligence, machine learning, statistics, and database systems. The overall goal of the data
mining process is to extract information from a data set and transform it into an understandable
structure for further use. Aside from the raw analysis step, it involves database and data
management aspects, data pre-processing, model and inference considerations, interestingness
metrics, complexity considerations, post-processing of discovered structures, visualization, and
online updating. Data mining is the analysis step of the "knowledge discovery in databases"
process.
Extraction of information is not the only process we need to perform; knowledge discovery also involves other processes such as Data Cleaning, Data Integration, Data Transformation, Pattern Evaluation and Data Presentation. Once all these processes are over, we are able to use this information in many applications such as Fraud Detection, Market Analysis, Production Control, Science Exploration, etc.
In the real world, data tend to be incomplete, noisy and inconsistent. Such situations require data preprocessing. Forms of data preprocessing include data cleaning, data integration, data transformation and data reduction. Typically, the process of duplicate detection is preceded by a data preparation stage, during which data entries are stored in a uniform manner in the database.
Data preprocessing is a data mining technique that involves transforming raw data into an
understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in
certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven
method of resolving such issues. Data preprocessing prepares raw data for further processing.
Data Cleaning:
Data cleansing, data cleaning or data scrubbing is the process of detecting and correcting
(or removing) corrupt or inaccurate records from a record set, table, or database. Used
mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate,
irrelevant, etc. parts of the data and then replacing, modifying, or deleting this dirty data
or coarse data. After cleansing, a data set will be consistent with other similar data sets in
the system. The inconsistencies detected or removed may have been originally caused by
user entry errors, by corruption in transmission or storage, or by different data dictionary
definitions of similar entities in different stores.
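The cleaning step described above can be sketched in Java. The record layout and rules here (a blank field, or an age field at index 1 outside a plausible range) are illustrative assumptions, not the project's actual schema:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative data-cleaning sketch: flags records that are incomplete
// (missing fields) or inaccurate (values outside an expected range).
public class DataCleaner {

    // A record is "dirty" if any field is blank, or if the assumed age
    // field at index 1 is not a number between 0 and 130.
    public static boolean isDirty(String[] record) {
        for (String field : record) {
            if (field == null || field.trim().isEmpty()) {
                return true;             // incomplete record
            }
        }
        try {
            int age = Integer.parseInt(record[1].trim());
            return age < 0 || age > 130; // inaccurate value
        } catch (NumberFormatException e) {
            return true;                 // wrong type
        }
    }

    // Removing dirty records leaves a data set consistent with the rules.
    public static List<String[]> clean(List<String[]> records) {
        List<String[]> result = new ArrayList<>();
        for (String[] r : records) {
            if (!isDirty(r)) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[]{"Alice", "34"});
        rows.add(new String[]{"Bob", ""});       // incomplete
        rows.add(new String[]{"Carol", "999"});  // out of range
        System.out.println(clean(rows).size());  // prints 1
    }
}
```

In practice, dirty records may be corrected rather than deleted; the sketch simply drops them to keep the idea visible.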
Data Integration:
Data Integration is a data preprocessing technique that merges data from multiple heterogeneous data sources into a coherent data store. Data integration may involve inconsistent data and therefore needs data cleaning. Data integration primarily supports the analytical processing of large data sets by aligning, combining and presenting each data set from organizational departments and external remote sources to fulfill integrator objectives. Data integration is generally implemented in data warehouses (DW) through specialized software that hosts large data repositories from internal and external resources. Data is extracted, amalgamated and presented in a unified form. For example, a user's complete data set may include extracted and combined data from marketing, sales and operations, which is merged to form a complete report.
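The marketing-plus-sales example above can be sketched as follows. The department names, the customer-id keys and the rule that the second source wins on conflicting fields are all assumptions made for the sketch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative data-integration sketch: records about the same customer id
// from two departmental sources are merged into one coherent store.
public class DataIntegrator {

    // Merge two sources keyed by customer id. Field maps are combined;
    // on a conflicting field, the value from the second source wins.
    public static Map<String, Map<String, String>> integrate(
            Map<String, Map<String, String>> marketing,
            Map<String, Map<String, String>> sales) {
        Map<String, Map<String, String>> store = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> e : marketing.entrySet()) {
            store.put(e.getKey(), new LinkedHashMap<>(e.getValue()));
        }
        for (Map.Entry<String, Map<String, String>> e : sales.entrySet()) {
            store.computeIfAbsent(e.getKey(), k -> new LinkedHashMap<>())
                 .putAll(e.getValue());
        }
        return store;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> mkt = new LinkedHashMap<>();
        Map<String, String> c1 = new LinkedHashMap<>();
        c1.put("name", "Alice");
        mkt.put("c1", c1);

        Map<String, Map<String, String>> sls = new LinkedHashMap<>();
        Map<String, String> c1s = new LinkedHashMap<>();
        c1s.put("city", "Rome");
        sls.put("c1", c1s);

        // c1 now carries both the marketing and the sales fields.
        System.out.println(integrate(mkt, sls).get("c1"));
    }
}
```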
Data Transformation:
Data transformation is the process of converting data or information from one format to
another, usually from the format of a source system into the required format of a new
destination system. The usual process involves converting documents, but data
conversions sometimes involve the conversion of a program from one computer language
to another to enable the program to run on a different platform.
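A minimal sketch of such a format conversion is shown below; the source format ("name,MM/DD/YYYY") and destination format ("name|YYYY-MM-DD") are assumptions chosen for illustration:

```java
// Illustrative data-transformation sketch: converts a record from an
// assumed source format ("name,MM/DD/YYYY") into an assumed destination
// format ("name|YYYY-MM-DD").
public class DataTransformer {

    public static String transform(String sourceRecord) {
        String[] parts = sourceRecord.split(",");   // name, date
        String[] d = parts[1].split("/");           // MM, DD, YYYY
        return parts[0] + "|" + d[2] + "-" + d[0] + "-" + d[1];
    }

    public static void main(String[] args) {
        System.out.println(transform("Alice,03/25/2021")); // Alice|2021-03-25
    }
}
```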
Data Reduction:
Data reduction obtains a reduced representation of the data set that is much smaller in volume, yet produces the same (or almost the same) analytical results. Simple guidelines for presenting reduced data are:
• Round drastically to one, or at most two, effective digits (effective digits are ones that vary in that part of the data).
• Use averages to provide a visual focus as well as a summary.
• Use layout and labeling to guide the eye.
• Give a brief verbal summary.
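The first two guidelines above can be sketched directly: round each value to two significant digits and report an average as the summary. The helper names are assumptions for the sketch:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the presentation guidelines above: values are
// rounded to two significant (effective) digits, and an average is
// reported as a summary figure.
public class DataReducer {

    // Round to two significant digits, e.g. 12345.6 -> 12000.0
    public static double roundTwoDigits(double v) {
        if (v == 0) return 0;
        double scale = Math.pow(10, Math.floor(Math.log10(Math.abs(v))) - 1);
        return Math.round(v / scale) * scale;
    }

    // Average as a single summary value for a column of data.
    public static double average(List<Double> values) {
        double sum = 0;
        for (double v : values) sum += v;
        return sum / values.size();
    }

    public static void main(String[] args) {
        List<Double> column = Arrays.asList(12345.6, 11890.2, 12512.9);
        System.out.println(roundTwoDigits(average(column)));
    }
}
```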
Data mining is widely used in diverse areas. There are a number of commercial data mining systems available today, and yet there are many challenges in this field. Typical application areas include:
• Fraud detection
• Retail Industry
• Telecommunication Industry
• Intrusion Detection
The financial data in the banking and financial industry is generally reliable and of high quality, which facilitates systematic data analysis and data mining. Some typical cases are as follows -
Fraud Detection:
Data mining is also used in the fields of credit card services and telecommunications to detect fraud. Fraud detection systems analyze call records (the destination of the call, the duration of the call, the time of the day or week, and so on) and flag patterns that deviate from expected norms.
Retail Industry:
Data mining has great application in the retail industry because the industry collects large amounts of data on sales, customer purchasing history, goods transportation, consumption and services. It is natural that the quantity of data collected will continue to expand rapidly because of the increasing ease, availability and popularity of the web.
Data mining in retail industry helps in identifying customer buying patterns and trends
that lead to improved quality of customer service and good customer retention and
satisfaction. Here is the list of examples of data mining in the retail industry -
• Design and Construction of data warehouses based on the benefits of data mining.
• Customer Retention.
Telecommunication Industry:
Today the telecommunication industry is one of the most rapidly emerging industries, providing various services such as fax, pager, cellular phone, internet messenger, images, e-mail, web data transmission, etc. Due to the development of new computer and communication technologies, the telecommunication industry is expanding quickly. This is why data mining has become very important in helping to understand the business. Data mining in the telecommunication industry helps in identifying telecommunication patterns, catching fraudulent activities, making better use of resources, and improving quality of service. Here is a list of examples for which data mining improves telecommunication services -
• Multidimensional Analysis of Telecommunication data.
In recent times, we have seen tremendous growth in fields of biology such as genomics, proteomics, functional genomics and biomedical research. Biological data mining is a very important part of bioinformatics. Following are the aspects in which data mining contributes to biological data analysis -
The applications discussed above tend to handle relatively small and homogeneous data sets for which statistical techniques are appropriate. Huge amounts of data have been collected from scientific domains such as the geosciences and astronomy, and large data sets are being generated by fast numerical simulations in fields such as climate and ecosystem modeling, chemical engineering and fluid dynamics. Following are the applications of data mining in the field of scientific applications -
• Graph-based mining.
Intrusion Detection:
Intrusion refers to any kind of action that threatens the integrity, confidentiality, or availability of network resources. In this world of connectivity, security has become a major issue. The increased usage of the internet and the availability of tools and tricks for intruding into and attacking networks have made intrusion detection a critical component of network administration. Here is a list of areas in which data mining technology may be applied for intrusion detection -
2. REQUIREMENT ELICITATION
A Requirement is a feature that the system must have, or a constraint that it must satisfy, to be accepted by the clients. Requirements Engineering aims at defining the requirements of the system under construction. It includes two main activities, namely Requirements Elicitation and Analysis.
Requirements Elicitation focuses on describing the purpose of the system. The Client, the
Developer, and the Users identify a problem area and define a system that addresses the problem.
Such a definition is called Requirements Specification. This specification is structured and
formalized during analysis to produce an Analysis Model. Requirements Elicitation and Analysis focus only on the user's view of the system. Requirements Elicitation includes the following activities.
Identifying Actors:
During this activity, developers identify the different types of users the future system will
support.
Identifying Scenarios:
During this activity, developers observe users and develop a set of detailed scenarios for typical
functionality provided by the future system. Developers use these scenarios to communicate
with the users and deepen their understanding.
Identifying Use Cases:
Once developers and users agree on a set of scenarios, developers derive from the scenarios a set
of use cases that completely represent the future system.
Refining Use Cases:
During this activity, developers ensure that the requirements specification is complete by
detailing each use case and describing the behavior of the system in the presence of errors and
exceptional conditions.
Identifying Relationships Among Use Cases:
During this activity, developers identify dependencies among use cases and also consolidate the
use case model by factoring out common functionality.
Identifying Nonfunctional Requirements:
During this activity, developers, users and clients agree on aspects like performance of system,
documentation, resources security and its quality.
2.2.2 Project objectives:
The system has a clear set of objectives to achieve. They are as follows: after taking the input, it runs both algorithms to find the duplicates in the given dataset.
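The two algorithms used by the system are not detailed at this point; as a minimal illustration of the objective, exact duplicates can be found by normalizing each record and tracking what has already been seen:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Minimal exact-duplicate sketch (not the project's actual algorithms):
// records are normalized (trimmed, lower-cased) and any record seen more
// than once is reported as a duplicate.
public class DuplicateFinder {

    public static List<String> findDuplicates(List<String> records) {
        Set<String> seen = new HashSet<>();
        Set<String> dup = new LinkedHashSet<>();
        for (String r : records) {
            String key = r.trim().toLowerCase();
            if (!seen.add(key)) {
                dup.add(key); // second or later occurrence
            }
        }
        return new ArrayList<>(dup);
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("Alice", "Bob", " alice ", "Carol");
        System.out.println(findDuplicates(data)); // [alice]
    }
}
```

Real duplicate detection usually also has to handle near-duplicates (typos, abbreviations), which is what dedicated algorithms address.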
2.3 FUNCTIONAL REQUIREMENTS:
The functional requirements describe the inputs and outputs of the application. The functional
requirements of this project are as follows:
2.3.1 Actors:
Actors represent external entities that interact with the system. An actor can be a human or an external system. During this activity, developers identify the actors involved in the system. In this project, the actors and their responsibilities are as follows:
Admin:
Finds the duplicates and returns the file containing the duplicates to the user.
User:
Uploads the dataset to be checked and downloads the file returned by the admin.
2.3.2 Use Case:
Use cases are used during requirement elicitation and analysis to represent the functionality of
the system. Use cases focus on the behavior of the system from an external point of view. A use
case describes a function provided by the system that yields a visible result for an actor. An actor
describes any entity that interacts with the system.
The identification of actors and use cases results in the definition of the boundary of the system, that is, in differentiating the tasks accomplished by the system from the tasks accomplished by its environment. The actors are outside the boundary of the system, whereas the use cases are inside it.
Actors are external entities that interact with the system. Use cases describe the behavior of the system as seen from an actor's point of view. Actors initiate a use case to access system functionality. The use case may then initiate other use cases and gather more information from the actors. When actors and use cases exchange information, they are said to communicate. To describe a use case, we use a template composed of six fields:
Use case diagrams include four types of relationships. They are as follows:
Communication relationships
Inclusion relationships
Extension relationships
Inheritance relationships
USE CASE 2: Upload Dataset
Use Case Name: Upload Dataset
Participating Actors: Admin, User
Flow of events: The user uploads the file that contains
duplicates, and the admin uploads the
identified duplicates.
Entry Condition: Data files are available.
Exit Condition: File successfully uploaded.
Table 2.2: Use case table for Upload Dataset
USE CASE 3: Duplicate detection process
Use Case Name: Duplicate detection
Participating Actors: Admin
Flow of events: 1. The client and the admin log into the
system with their respective credentials.
2. The client uploads his file.
3. The admin verifies the details and
approves the respective transaction.
Entry Condition: The uploaded datasets are taken for
applying the algorithms to them.
Exit Condition: Duplicates successfully identified.
Table 2.3: Use case table for Duplicate detection
USE CASE 4: Download Dataset
Use Case Name: Download Dataset
Participating Actors: Admin, User
Flow of events: 1. The client and the admin log into the
system with their respective credentials.
2. The client uploads his file.
3. The admin verifies the details and
downloads the file uploaded by the user.
4. The admin finds the duplicates in the
file and uploads it to the user.
5. The user downloads the file sent by the
admin.
Entry Condition: The client undergoes the authentication
process.
Exit Condition: File successfully downloaded.
Table 2.4: Use case table for Download Dataset
2.3.3. Scenarios:
A use case is an abstraction that describes all possible scenarios involving the described functionality. A scenario is an instance of a use case describing a concrete set of actions. Scenarios are used as examples to illustrate common cases. We describe a scenario using a template with three fields: the name of the scenario, the participating actors, and the flow of events, which describes the sequence of events step by step.
SCENARIO 1: Login
Scenario Name: Login
Participating Actors: Admin, User
Flow of events: The admin and the user log into the system.
Table 2.5: Scenario table for Login
SCENARIO 4: Download Dataset
Scenario Name: Download Dataset
Participating Actors: Admin, User
Flow of events: 1. The user and the admin log into the
system with their respective credentials.
2. The user uploads his file.
3. The admin verifies the details and
downloads the file uploaded by the user.
4. The admin finds the duplicates in the
file and uploads it to the user.
5. The user downloads the file sent by the
admin.
Table 2.8: Scenario table for Download Dataset
2.4 NON-FUNCTIONAL REQUIREMENTS:
Reliability: The system is reliable because its qualities are inherited from the Java platform; code built using Java is highly reliable.
Performance: The system is developed in a high-level language using advanced front-end and back-end technologies, so it responds to the end user on the client system within very little time.
Here we have chosen JAVA as our programming language for the implementation of the system.
Java is platform-independent
One of the most significant advantages of Java is its ability to move easily from one
computer system to another. The ability to run the same program on many different
systems is crucial to World Wide Web software, and Java succeeds at this by being
platform-independent at both the source and binary levels.
Java is secure
Java considers security as part of its design. The Java language, compiler, interpreter, and
runtime environment were each developed with security in mind.
2. Error handling:
Before performing any operations on the dataset, the contents of the dataset must be checked. If any value's format or type does not match, an error message is displayed so that the user can take an appropriate decision. For example, the system does not proceed without input data.
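The check described above can be sketched as follows; the assumption that values must be numeric, and the exact message texts, are illustrative choices rather than the system's actual rules:

```java
// Illustrative input check matching the error-handling rule above: the
// dataset contents are validated before any operation, and a message is
// produced when a value is missing or its type does not match.
public class InputValidator {

    public static String validate(String value) {
        if (value == null || value.trim().isEmpty()) {
            return "Error: no data entered";     // system does not proceed
        }
        try {
            Double.parseDouble(value.trim());    // assumed numeric format
            return "OK";
        } catch (NumberFormatException e) {
            return "Error: value '" + value + "' is not numeric";
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("3.14")); // OK
        System.out.println(validate("abc"));  // Error: value 'abc' is not numeric
    }
}
```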
3. Performance consideration:
The performance of the system is very high when compared to current numerical model
techniques. The training time is less when compared to the current system.
4. Platform:
Monitor : Standard
Keyboard : Standard
Java is the foundation for virtually every type of networked application and is the global standard
for developing and delivering embedded and mobile applications, games, Web-based content,
and enterprise software. With more than 9 million developers worldwide, Java enables you to
efficiently develop, deploy and use exciting applications and services.
History of Java:
Java was originally developed by James Gosling at Sun Microsystems (which has since
been acquired by Oracle Corporation) and released in 1995 as a core component of Sun
Microsystems' Java platform. The language derives much of its syntax from C and C++, but it
has fewer low-level facilities than either of them. The language was initially called Oak and was aimed at embedded consumer devices such as set-top boxes; that market proved unsuccessful, so in 1995 Sun changed the name to Java and modified the language to take advantage of the burgeoning World Wide Web.
Java Features:
Simple:
Secure:
Portable:
Java programs can execute in any environment for which there is a Java run-time system (JVM).
Java programs can run on any platform (Linux, Windows, Mac).
Java programs can be transferred over the World Wide Web (e.g., applets).
Object oriented:
Robust:
Java encourages error-free programming by being strictly typed and performing run-time
checks.
Multi-threaded:
Architecture neutral:
Interpreted:
Java supports cross-platform code through the use of Java byte code.
Byte code can be interpreted on any platform by JVM.
High Performance:
Distributed:
Dynamic:
Java programs carry with them substantial amounts of run-time type information that is
used to verify and resolve accesses to objects at run time.
Java Principles:
There were five primary goals in the creation of the Java language:
It must be "simple, object-oriented, and familiar".
It must be "robust and secure".
It must be "architecture-neutral and portable".
It must execute with "high performance".
It must be "interpreted, threaded, and dynamic".
Overview of OOP Terminology:
Class: A user-defined prototype for an object that defines a set of attributes that
characterize any object of the class. The attributes are data members (class variables
and instance variables) and methods, accessed via dot notation.
Class variable: A variable that is shared by all instances of a class. Class variables are
defined within a class but outside any of the class's methods. Class variables aren't used
as frequently as instance variables are
Data member: A class variable or instance variable that holds data associated with a
class and its objects.
Function overloading: The assignment of more than one behavior to a particular
function. The operation performed varies by the types of objects (arguments) involved.
Instance variable: A variable that is defined inside a class but outside any method, and belongs only to the current instance of the class.
Inheritance: The transfer of the characteristics of a class to other classes that are
derived from it.
Instantiation: The creation of an instance of a class.
Method: A special kind of function that is defined in a class definition.
Object: A unique instance of a data structure that's defined by its class. An object
comprises both data members (class variables and instance variables) and methods.
Operator overloading: The assignment of more than one function to a particular
operator.
Instance: An individual object of a certain class. An object obj that belongs to a class
Circle, for example, is an instance of the class Circle.
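Several of the terms above can be shown in one small Java demo. The `Circle`/`Cylinder` example follows the glossary's own illustration; everything else (the counter, the `describe` methods) is an assumption made for the sketch:

```java
// Small Java demo of the glossary terms: a class variable shared by all
// instances, an instance variable per object, inheritance, and method
// overloading.
public class OopDemo {

    static class Circle {
        static int count = 0;       // class variable: shared by all Circles
        double radius;              // instance variable: one per object

        Circle(double radius) {     // instantiation runs this constructor
            this.radius = radius;
            count++;
        }

        double area() {             // method defined in the class
            return Math.PI * radius * radius;
        }
    }

    static class Cylinder extends Circle {   // inheritance from Circle
        double height;

        Cylinder(double radius, double height) {
            super(radius);
            this.height = height;
        }

        double volume() {
            return area() * height;          // inherited behavior reused
        }
    }

    // Method overloading: same name, different parameter types.
    static String describe(Circle c)   { return "circle"; }
    static String describe(Cylinder c) { return "cylinder"; }

    public static void main(String[] args) {
        Circle a = new Circle(1.0);
        Cylinder b = new Cylinder(1.0, 2.0);
        System.out.println(Circle.count);  // 2: both constructors incremented it
        System.out.println(describe(b));   // cylinder
    }
}
```

Note that Java, unlike some languages the glossary draws on, supports method overloading but not user-defined operator overloading.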
MySQL is a free, open-source database engine available for all major platforms.
(Technically, MySQL is a relational database management system (RDBMS)). MySQL
represents an excellent introduction to modern database technology, as well as being a
reliable mainstream database resource for high-volume applications.
A modern database is an efficient way to organize, and gain access to, large amounts of
data. A relational database is able to create relationships between individual database
elements, to organize data at a higher level than a simple table of records, avoid data
redundancy and enforce relationships that define how the database functions.
A database is a separate application that stores a collection of data. Each database has
one or more distinct APIs for creating, accessing, managing, searching and replicating
the data it holds.
Other kinds of data stores can be used, such as files on the file system or large hash
tables in memory but data fetching and writing would not be so fast and easy with those
types of systems.
RDBMS Terminology:
Before we proceed to explain the MySQL database system, let us review a few definitions related to databases.
Table: A table is a matrix with data. A table in a database looks like a simple
spreadsheet.
Column: One column (data element) contains data of one and the same kind, for
example the column postcode.
Row: A row (= tuple, entry or record) is a group of related data, for example the data of
one subscription.
Primary Key: A primary key is unique. A key value cannot occur twice in one table.
With a key, you can find at most one row.
Foreign Key: A foreign key is the linking pin between two tables.
Compound Key: A compound key (composite key) is a key that consists of multiple
columns, because one column is not sufficiently unique.
Referential Integrity: Referential Integrity makes sure that a foreign key value always
points to an existing row.
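The referential-integrity rule above can be sketched as a plain-Java check over in-memory tables (the database itself enforces this through foreign key constraints). The table names, `orders` and `customers`, are assumptions for the sketch:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative check of referential integrity: every foreign key value in
// the child table must point to an existing primary key in the parent
// table.
public class IntegrityCheck {

    // orders maps order id -> customer id (the foreign key);
    // customers is the set of existing customer primary keys.
    public static boolean referentiallyIntact(Map<Integer, Integer> orders,
                                              Set<Integer> customers) {
        for (int customerId : orders.values()) {
            if (!customers.contains(customerId)) {
                return false; // dangling foreign key
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> orders = new HashMap<>();
        orders.put(1, 100);                     // order 1 belongs to customer 100
        Set<Integer> customers = new HashSet<>(Arrays.asList(100, 101));
        System.out.println(referentiallyIntact(orders, customers)); // true
        orders.put(2, 999);                     // customer 999 does not exist
        System.out.println(referentiallyIntact(orders, customers)); // false
    }
}
```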
MySQL is a fast, easy-to-use RDBMS used by many small and large businesses. MySQL was originally developed, marketed and supported by MySQL AB, a Swedish company (now part of Oracle Corporation). MySQL is popular for many good reasons:
MySQL is released under an open-source license. So you have nothing to pay to use it.
MySQL is a very powerful program in its own right. It handles a large subset of the
functionality of the most expensive and powerful database packages.
MySQL works on many operating systems and with many languages including PHP,
PERL, C, C++, JAVA, etc.
MySQL works very quickly and works well even with large data sets.
MySQL is very friendly to PHP, the most appreciated language for web development.
MySQL supports large databases, up to 50 million rows or more in a table. The default
file size limit for a table is 4GB, but you can increase this (if your operating system can
handle it) to a theoretical limit of 8 million terabytes (TB).
3. ANALYSIS
The analysis object model is represented by class and object diagrams. Analysis focuses on producing a model of the system, called the analysis model, which is correct, complete, consistent, and verifiable. Analysis is different from requirements elicitation in that developers focus on structuring and formalizing the requirements elicited from users. This formalization leads to new insights and the discovery of errors in the requirements. As the analysis model may not be understandable to the users and the client, developers need to update the requirements specification to reflect insights gained during analysis, and then review the changes with the client and users. In the end, the requirements, however large, should be understandable by the client and the users.
The analysis model is composed of three individual models: the Functional Model
represented by use cases and scenarios, the Analysis Object Model, represented by class and
object diagrams, and the Dynamic Model, represented by state chart and sequence diagrams. In
Requirements phase, we gather requirements from the users and represent them as use cases and
scenarios. We refine the functional model and derive the object and the dynamic model. This
leads to a more precise and complete specification as detail is added to the analysis model. We conclude by describing management activities related to analysis.
The analysis model represents the system under development from the user’s point of
view. The analysis object model is a part of the analysis and focuses on the individual concepts
that are manipulated by the system, their properties and their relationships. The analysis object
model, depicted with UML class diagrams, includes classes, attributes, and operations. The
analysis object model is a visual dictionary of the main concepts visible to the user.
3.1. ENTITY OBJECTS:
The analysis object model consists of entity, boundary and control objects. Entity objects represent the persistent information tracked by the system. Participating objects form the basis of the analysis model.
A boundary object is an object used for interaction between the user and the system; it is an interface used to communicate with the system. Boundary objects represent the system's interface with the actors. In each use case, each actor interacts with at least one boundary object. The boundary object collects information from the actor and translates it into a form that can be used by the entity objects as well as the control objects.
The set of Boundary Objects that are involved in the system are as follows:
I) BOUNDARY OBJECTS FOR HOME PAGE OR LOGIN PAGE
Control objects are responsible for coordinating entity objects and boundary objects. A control object is created at the beginning of a use case and ceases to exist at its end. Control objects usually do not have a concrete counterpart in the real world. A control object is responsible for collecting information from the boundary objects and dispatching it to the entity objects.
Here the uploaded data files are taken and processed to identify the duplicates.
Interaction diagrams model the behavior of use cases by describing the way groups of
objects interact to complete the task. The two kinds of interaction diagrams are sequence and
collaboration diagrams. Sequence diagrams generally show the sequence of events that
occur. Sequence diagrams demonstrate the behavior of objects in a use case by describing the
objects and the messages they pass. The diagrams are read left to right and
descending. Following are the Sequence Diagrams for the system under consideration
SEQUENCE DIAGRAM:
State chart diagrams are used to describe the behavior of a system. State diagrams describe all of the possible states of an object as events occur. Each diagram usually represents objects of a single class and tracks the different states of its objects through the system. Not all classes require a state diagram, and state diagrams are not useful for describing the collaboration of all objects in a use case. State diagrams have very few elements: the basic elements are rounded boxes representing the states of the object and arrows indicating the transitions to the next state. The activity section of the state symbol depicts what activities the object will be doing while it is in that state. All state diagrams begin with an initial state of the object, which is the state of the object when it is created. After the initial state, the object begins changing states.
Use Case Diagram: actor System; use cases Data Reading, Preprocessing, Stemming, Prediction.

Sequence Diagram: lifelines System and Dataset; messages 2: Preprocessing, 3: Stemming, 4: Training, 5: Prediction.
Figure 3.2: State chart Diagram 1
Figure 3.3: State chart Diagram 2
4. SYSTEM DESIGN
System Design is the transformation of an analysis model into a system design model. Design is the first step in the development phase, in which techniques and principles are applied for the purpose of defining a device, a process, or a system in sufficient detail to permit its physical realization.
Once the software requirements have been analyzed and specified, software design involves three technical activities, namely design, code generation, and testing, that are required to build and verify the software.
The design activities are of main importance in this phase because, in this activity, decisions that ultimately affect the success of the software implementation and its ease of maintenance are made. These decisions have the final bearing on the reliability and maintainability of the system. Design is the place where quality is fostered in development; software design is a process through which requirements are translated into a representation of the software.
System Design is the transformation of the analysis model into a system design model. Developers define the design goals of the project and decompose the system into smaller subsystems that can be realized by individual teams. Developers also select strategies for building the system, such as the hardware/software platform on which the system will run, the persistent data management strategy, the global control flow, the access control policy and the handling of boundary conditions. The result of system design is a model that includes a clear description of each of these strategies, the subsystem decomposition, and a UML deployment diagram representing the hardware/software mapping of the system.
The Analysis model describes the system completely from the actors' point of view and serves as the basis of communication between the client and the developers. The Analysis model, however, does not contain information about the internal structure of the system, its hardware configuration or, more generally, how the system should be realized. System design is the first step in this direction.
During the system design activities, developers bridge the gap between the requirements specification, produced during requirements elicitation and analysis, and the system that is delivered to the users.
Design goals are the qualities that the system should focus on. Many design goals can be
inferred from the nonfunctional requirements or from the application domain.
Cost:
Java is freely available, so there are no high development or maintenance costs.
Response time:
The system response is based on the length of the training data set.
Portability:
Java has an ability to move easily from one computer system to another. The ability to run
the same program on many different systems is crucial to World Wide Web software, and
Java succeeds at this by being platform-independent at both the source and binary levels.
Usability:
Users capable of handling a simple GUI are able to use the system.
Reliability:
The system is trained with the ID3 and C4.5 algorithms so that it can give accurate results.
- Repository
- Model/View/Controller (MVC)
- Peer-to-Peer
- Client/Server
- Three-tier
The services of a subsystem that are available to other subsystems form the "subsystem interface". The subsystem interface includes the names of the operations, their parameters, their types and their return values.
The subsystems that are factored out of the main system are as follows:
[Figure: Collaboration between two Host objects and the Algorithm subsystem (Give, Apply, Output).]
Control flow is the sequencing of actions in a system. It defines the order of execution of operations. These decisions are based on external events generated by an actor or on the passage of time. There are two possible control flow mechanisms.

Procedure-driven control: operations wait for input whenever they need data from an actor. In the selection of the algorithm, the processing operation waits for the decision maker to choose a data source. Whenever the decision maker gives the input, the preprocessing operation can be executed.

Event-driven control: a main loop waits for an external event. Whenever an event is available, it is dispatched to the appropriate object based on information associated with the event.
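As a minimal sketch of the event-driven mechanism described above (the event types and handler registrations here are illustrative, not part of this system), a main loop can dispatch each event to the object registered for its type:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

public class EventLoop {
    // maps an event type to the object (handler) responsible for it
    private final Map<String, Consumer<String>> handlers = new HashMap<>();
    // queue of pending (type, payload) events
    private final Queue<String[]> events = new ArrayDeque<>();
    final StringBuilder log = new StringBuilder();

    void register(String type, Consumer<String> handler) { handlers.put(type, handler); }
    void post(String type, String payload) { events.add(new String[]{type, payload}); }

    // main loop: take the next event and dispatch it based on its type
    void run() {
        while (!events.isEmpty()) {
            String[] e = events.poll();
            Consumer<String> h = handlers.get(e[0]);
            if (h != null) h.accept(e[1]);
        }
    }

    public static void main(String[] args) {
        EventLoop loop = new EventLoop();
        loop.register("preprocess", p -> loop.log.append("preprocessing " + p + ";"));
        loop.register("train", p -> loop.log.append("training " + p + ";"));
        loop.post("preprocess", "dataset");
        loop.post("train", "model");
        loop.run();
        System.out.println(loop.log);
    }
}
```

Each posted event carries the information (its type) used to select the appropriate handler object, which is exactly the dispatching described above.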
During this activity we review the design decisions we made so far and identify additional
conditions i.e., how the system is started, initialized, shutdown and how to deal with major
failures such as data corruption.
Configure: For each persistent object, we examine in which use cases it is created or destroyed.
Download use case creates the persistent object Download files.
Start up and Shutdown: For each component we add three use cases to start, shut down and configure the component.
Exception Handling: For each type of component failure we decide how the system should
react. In general exception is an event or error that occurs during the execution of the system.
Exceptions are caused by different sources.
Hardware Failure: Hardware ages and fails. For example, failure of the network link, system
can identify it by using connect use case and inform the user.
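A hedged sketch of how such a network-link failure might be detected and reported to the user (the checkLink method, host and port are hypothetical, not taken from the report):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class LinkCheck {
    // attempts a connection; returns a status message instead of crashing
    static String checkLink(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return "link up";
        } catch (IOException e) {
            // hardware/network failure: inform the user rather than aborting
            return "network link failed: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(checkLink("localhost", 9, 200));
    }
}
```

The exception type caught here stands in for the "connect use case" mentioned above: the failure is trapped and translated into a message for the user.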
5. OBJECT DESIGN
Object design closes the gap between the application objects and off-the-shelf components by identifying additional solution objects and refining existing objects. Object design is not sequential: although each group of activities described below addresses a specific object design issue, they usually occur concurrently.
Reuse: Off-the shelf components identified during system design are used to help in the
realization of each subsystem. Class libraries and additional components are selected for
basic data structures and services. Design patterns are selected for solving common
problems and for protecting specific classes from future change.
Interface specification: During this activity, the subsystem services identified during
system design are specified in terms of class interfaces, including operations, arguments,
type signatures, and exceptions.
Object model restructuring: Restructuring activities manipulate the system model to increase code reuse.
Object model optimization: During this activity, object design model is transformed to
address performance criteria such as response time or memory utilization.
Operation parameters and return values are typed in the same way as attributes are. The type constrains the range of values the parameter or the return value can take. The tuple made of the parameter types and the type of the return value is called the signature of the operation. The visibility of an attribute or an operation specifies whether other classes can use it or not; UML defines three levels of visibility. Attributes represent the properties of individual objects; only the attributes relevant to the system should be considered.
CLASS DIAGRAM:
Class diagrams model class structure and contents using design elements such as classes, packages and objects. Class diagrams describe three different perspectives when designing a system: conceptual, specification, and implementation. Classes are composed of three things: a name, attributes, and operations. Class diagrams also display relationships such as containment, inheritance, associations and others. The association relationship is the most common relationship in a class diagram. The association shows the relationship between instances of classes. The multiplicity of the association denotes the number of objects that can participate in the relationship.
Type: The type of an attribute specifies the range of values the attribute can take and the operations that can be applied to the attribute.

Signature: Given an operation, the tuple made out of the types of its parameters and the type of the return value is the signature. Signatures are generally defined for operations.
Type, Signature and Visibility of the classes in this system are as follows:
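Since the system's own class table is a figure not reproduced here, the following hypothetical Java class (all names are illustrative, not from this system) shows how type, signature and UML visibility map onto code:

```java
public class Record {
    // private (-): visible only inside this class; type String
    private String key;
    // protected (#): visible to this class and its subclasses; type int
    protected int length;

    public Record(String key, int length) { this.key = key; this.length = length; }

    // public (+): visible to every class
    public String getKey() { return key; }

    // signature of this operation: parameter types (String, int), return type boolean
    public boolean matches(String prefix, int minLength) {
        return key != null && key.startsWith(prefix) && length >= minLength;
    }
}
```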
5.1.2 Constraints:
We attach constraints to classes and operations to more precisely specify their behavior and
boundary cases. The following are the constraints in this project:
The input file must be in one of the .txt, .doc, .docx, .xls, .xlsx or .csv file formats.
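This file-format constraint can be enforced with a simple extension check; the method and class names in this sketch are my own, not from the report:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class InputConstraint {
    // the file formats permitted by the constraint
    private static final List<String> ALLOWED =
            Arrays.asList(".txt", ".doc", ".docx", ".xls", ".xlsx", ".csv");

    // returns true only when the filename ends with an allowed extension
    static boolean isValidInputFile(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        return ALLOWED.stream().anyMatch(lower::endsWith);
    }

    public static void main(String[] args) {
        System.out.println(isValidInputFile("data.csv"));   // true
        System.out.println(isValidInputFile("slides.ppt")); // false
    }
}
```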
5.1.3 Exceptions:
Exceptional conditions are usually associated with the violation of preconditions. Exceptions can
be found systematically by examining each parameter of the operation.
Preconditions are constraints that the caller needs to satisfy before invoking an operation. When the user inputs a file which does not contain any data, an appropriate message should be displayed.
5.1.4 Associations:
Associations are relationships between classes and represent groups of links. Each end of an association can be labeled by a set of integers indicating the number of links that can legitimately originate from an instance of the class connected to the association end. Associations are used to represent a wide range of connections among a set of objects.
[Association: User 1 -- 1 Input data (one User provides one Input data object).]
5.2. ALGORITHMS:
Algorithm 2:
6. CODING
The goal of the coding or programming phase is to translate the design of the system produced
during the design phase into code in a given programming language, which can be executed by a
computer and that performs the computation specified by the design.
The coding phase affects both testing and maintenance. The goal of coding is not to reduce the implementation cost; the goal should be to reduce the cost of later phases. In other words, the goal is not to simplify the job of the programmer; rather, the goal should be to simplify the job of the tester and the maintainer.
The Bottom-up approach best suits the development of object-oriented systems. During the system design phase, to reduce complexity, we decompose the system into an appropriate number of subsystems, for which objects can be modeled independently. These objects exhibit the way the subsystems perform their operations.

Once the objects have been modeled, they are implemented by means of coding. Even though the objects belong to the same system, they are implemented independently of each other, so the Bottom-up approach is more suitable for coding them. In this approach, we first code the objects independently and then integrate these modules into the one system to which they belong.
This code can detect duplicates from data. First, it takes data as input, then it scans the entire file
and performs the preprocessing step. Through this step the missing values, errors, etc. get
eliminated. Then it generates a unique key for each tuple and sorts the data using the key. And
then it identifies the duplicates and displays the result.
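The steps just described (key generation, sorting by key, then scanning adjacent entries) can be sketched as follows. The key function here simply normalizes the tuple text, which is an assumption on my part, since the report does not specify how the unique key is generated:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DedupSketch {
    // key generation: normalize a tuple so equivalent rows map to the same key
    static String key(String tuple) {
        return tuple.trim().toLowerCase().replaceAll("\\s+", " ");
    }

    // sort the keys, then compare adjacent entries to find duplicates
    static List<String> findDuplicates(List<String> tuples) {
        List<String> keys = new ArrayList<>();
        for (String t : tuples) keys.add(key(t));
        keys.sort(null); // natural ordering puts equal keys next to each other
        List<String> dups = new ArrayList<>();
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i).equals(keys.get(i - 1)) && !dups.contains(keys.get(i))) {
                dups.add(keys.get(i));
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("John  Doe", "jane roe", "john doe");
        System.out.println(findDuplicates(rows)); // prints [john doe]
    }
}
```

Sorting by key makes duplicate detection a single linear scan, which is the point of generating the key before comparing tuples.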
Any software system requires some amount of information during its operation. Selection of appropriate data structures helps us produce code so that the objects of the system can operate better with the available information and with decreased complexity.
In this project, if any of the fields is vacant, the system cannot proceed to further steps and prompts a message saying "input data must be in the specified range". The system will not have any default values.
6.3. PROGRAMMING STYLE:
Programming style deals with a set of rules that a programmer has to follow so that coding characteristics such as traceability, understandability, modifiability, and extensibility can be satisfied. In the current system, we followed the coding rules for naming the variables and methods. As part of coding, internal documentation is also provided to help readers better understand the code.
All the inputs given to the system at various points in the forms are validated while navigating to the next form. The system raises appropriate custom and pre-defined exceptions to alert the user about errors that have occurred or are likely to occur. Validations at the level of individual controls are also applied wherever necessary. The system pops up appropriate and sensible dialogs wherever necessary.
7. TESTING
Testing is the process of finding differences between the expected behavior specified by system models and the observed behavior of the system. Testing plays a critical role in quality assurance and in ensuring reliability: errors made during development will be reflected in the code, so the application should be thoroughly tested and validated.
Unit testing finds the differences between the object design model and its
corresponding components. Structural testing finds differences between the system design model
and a subset of integrated subsystems. Functional testing finds differences between the use case
model and the system.
Testing a large system is a complex activity and, like any complex activity, it has to be broken into smaller activities. Thus incremental testing was performed on the project, i.e., components and subsystems of the system were tested separately before integrating them to form the subsystems for system testing.
Unit Testing:
Unit testing focuses on the building blocks of the software system, that is, the objects and subsystems. There are three motivations behind focusing on components. First, unit testing reduces the complexity of the overall test activities, allowing focus on smaller units of the system. Second, unit testing makes it easier to pinpoint and correct faults, given that few components are involved in each test. Third, unit testing allows parallelism in the testing activities, that is, each component can be tested independently of the others. The following are some unit testing techniques.
1. Equivalence testing: It is a black box testing technique that minimizes the number of test
cases. The possible inputs are partitioned into equivalence classes and a test case is selected for
each class.
2. Boundary testing: It is a special case of equivalence testing and focuses on the conditions at
the boundary of the equivalence classes. Boundary testing requires that the elements be selected
from the edges of the equivalence classes.
3. Path testing: It is a white box testing technique that identifies faults in the implementation of the component. The assumption here is that, by exercising all possible paths through the code at least once, most faults will trigger a failure. This requires knowledge of the source code.
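As a small illustration of equivalence and boundary testing (the inRange function and its limits are invented for this example, not taken from the system), one representative is picked from each equivalence class and further cases are taken at the class edges:

```java
public class BoundaryDemo {
    // unit under test: accepts values in the closed range [1, 100]
    static boolean inRange(int v) { return v >= 1 && v <= 100; }

    public static void main(String[] args) {
        // equivalence testing: one representative per class
        assert !inRange(-50); // class: below the range
        assert inRange(50);   // class: inside the range
        assert !inRange(500); // class: above the range
        // boundary testing: elements at the edges of the classes
        assert !inRange(0);
        assert inRange(1);
        assert inRange(100);
        assert !inRange(101);
        System.out.println("all cases passed");
    }
}
```

Run with `java -ea BoundaryDemo` so that the assertions are enabled.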
Integration Testing:
Integration testing detects faults that have not been detected during unit testing, by focusing on small groups of components. Two or more components are integrated and tested, and once the tests do not reveal any new faults, additional components are added to the group. This procedure allows testing of increasingly complex parts of the system while keeping the location of potential faults relatively small. I have used the following approach to implement integration testing.

The top-down testing strategy unit tests the components of the top layer and then integrates the components of the next layer down. When all components of the new layer have been tested together, the next layer is selected. This is repeated until all layers are combined and involved in the test.
Validation Testing:
Once the system is completely assembled as a package and the interfacing errors have been uncovered and corrected, a final series of software tests, called validation testing, begins. Validation succeeds when the system functions in a manner that can be reasonably expected by the customer. The system validation was done by a series of black-box test methods.
System Testing:
1. System testing ensures that the complete system complies with the functional and non-functional requirements of the system. The following are some system testing activities.
2. Functional testing finds differences between the functional requirements and the system. This is a black-box testing technique. Test cases are derived from the use case model.
3. Performance testing finds differences between the design and the system. The design goals are derived from the functional requirements.
4. In pilot testing, the system is installed and used by a selected set of users. Users exercise the system as if it had been permanently installed.
5. For acceptance testing, I have followed benchmark testing. In benchmark testing the client prepares a set of test cases that represent typical conditions under which the system operates. In our project, there are no existing benchmarks.
6. In installation testing, the system is installed in the target environment.
Test Planning enables a more reliable estimate of the testing effort up front.
It allows the project team time to consider ways to reduce the testing effort without being
under time pressure.
Test Plan helps to identify problem areas and focuses the testing team’s attention on the
critical paths.
Test plan reduces the probability of implementing non-tested components.
7.4. TEST CASE REPORT:
Test Case 1 - Login
Test Case id: 01
Test Case Name: Login
Test Case Type: Black Box Testing

Action | Expected Result | Actual Result | Status
Admin selects 'Login' button | Admin gets the access | If the login id and password are correct then the admin will get access | Successful
Admin enters a wrong password | Admin does not get the access | Admin does not get the access and has to enter the correct password | Successful

Table 7.1: Test case 1
Test Case 2 - File Browse
Test Case id: 02
Test Case Name: Browse the file
Test Case Type: Black Box Testing

Action | Expected Result | Actual Result | Status
User selects 'Upload' button | User uploads the file from the folder | If the file is of a valid file format, the file will be taken | Successful
User selects a file of an invalid file format | The system will not take ppt or pdf files | Only .csv, .txt, .doc, .xlsx, .xls files can be taken from the user | Successful

Table 7.2: Test case 2
8. SCREENS
9. SOURCE CODE
Sample Code
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.util;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import weka.classifiers.Classifier;
import weka.classifiers.lazy.IBk;
import weka.core.Instance;
import weka.core.Instances;
public class KNN {
public static BufferedReader readDataFile(String filename) {
BufferedReader inputReader = null;
try {
inputReader = new BufferedReader(new FileReader(filename));
} catch (FileNotFoundException ex) {
System.err.println("File not found: " + filename);
}
return inputReader;
}
public static void main(String[] args) throws Exception {
BufferedReader datafile = readDataFile("ads.txt");
Instances data = new Instances(datafile);
data.setClassIndex(data.numAttributes() - 1);
//do not use the first and second instances for training
Instance first = data.instance(0);
Instance second = data.instance(1);
data.delete(0);
data.delete(0); // after the first deletion, the former second instance is at index 0
Classifier ibk = new IBk();
ibk.buildClassifier(data);
double class1 = ibk.classifyInstance(first);
double class2 = ibk.classifyInstance(second);
System.out.println("first: " + class1 + "\nsecond: " + class2);
}
}
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.util;
import java.util.Arrays;
import java.util.Random;
public class np {
private static long seed; // seed of the pseudo-random number generator
private static Random random; // pseudo-random number generator
static {
seed = System.currentTimeMillis();
random = new Random(seed);
}
/**
* Sets the seed of the pseudo-random number generator. This method enables
* you to produce the same sequence of "random" number for each execution of
* the program. Ordinarily, you should call this method at most once per
* program.
*
* @param s the seed
*/
public static void setSeed(long s) {
seed = s;
random = new Random(seed);
}
/**
* Returns the seed of the pseudo-random number generator.
*
* @return the seed
*/
public static long getSeed() {
return seed;
}
/**
* Returns a random real number uniformly in [0, 1).
*
* @return a random real number uniformly in [0, 1)
*/
public static double uniform() {
return random.nextDouble();
}
/**
* Returns a random integer uniformly in [0, n).
*
* @param n number of possible integers
* @return a random integer uniformly between 0 (inclusive) and {@code n}
* (exclusive)
* @throws IllegalArgumentException if {@code n <= 0}
*/
public static int uniform(int n) {
if (n <= 0) {
throw new IllegalArgumentException("argument must be positive: " + n);
}
return random.nextInt(n);
}
/**
* Returns a random long integer uniformly in [0, n).
*
* @param n number of possible {@code long} integers
* @return a random long integer uniformly between 0 (inclusive) and
* {@code n} (exclusive)
* @throws IllegalArgumentException if {@code n <= 0}
*/
public static long uniform(long n) {
if (n <= 0L) {
throw new IllegalArgumentException("argument must be positive: " + n);
}
long r = random.nextLong();
long m = n - 1;
// power of two
if ((n & m) == 0L) {
return r & m;
}
// reject over-represented candidates
long u = r >>> 1;
while (u + m - (r = u % n) < 0L) {
u = random.nextLong() >>> 1;
}
return r;
}
/**
* Returns a random integer uniformly in [a, b).
*
* @param a the left endpoint
* @param b the right endpoint
* @return a random integer uniformly in [a, b)
* @throws IllegalArgumentException if {@code b <= a}
* @throws IllegalArgumentException if {@code b - a >= Integer.MAX_VALUE}
*/
public static int uniform(int a, int b) {
if ((b <= a) || ((long) b - a >= Integer.MAX_VALUE)) {
throw new IllegalArgumentException("invalid range: [" + a + ", " + b + ")");
}
return a + uniform(b - a);
}
/**
* Returns a random real number uniformly in [a, b).
*
* @param a the left endpoint
* @param b the right endpoint
* @return a random real number uniformly in [a, b)
* @throws IllegalArgumentException unless {@code a < b}
*/
public static double uniform(double a, double b) {
if (!(a < b)) {
throw new IllegalArgumentException("invalid range: [" + a + ", " + b + ")");
}
return a + uniform() * (b - a);
}
/**
* @param m
* @param n
* @return random m-by-n matrix with values between 0 and 1
*/
public static double[][] random(int m, int n) {
double[][] a = new double[m][n];
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
a[i][j] = uniform(0.0, 1.0);
}
}
return a;
}
/**
* Transpose of a matrix
*
* @param a matrix
* @return b = A^T
*/
public static double[][] T(double[][] a) {
int m = a.length;
int n = a[0].length;
double[][] b = new double[n][m];
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
b[j][i] = a[i][j];
}
}
return b;
}
/**
* @param a matrix
* @param b matrix
* @return c = a + b
*/
public static double[][] add(double[][] a, double[][] b) {
int m = a.length;
int n = a[0].length;
double[][] c = new double[m][n];
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
c[i][j] = a[i][j] + b[i][j];
}
}
return c;
}
/**
* @param a matrix
* @param b matrix
* @return c = a - b
*/
public static double[][] subtract(double[][] a, double[][] b) {
int m = a.length;
int n = a[0].length;
double[][] c = new double[m][n];
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
c[i][j] = a[i][j] - b[i][j];
}
}
return c;
}
/**
* Element wise subtraction
*
* @param a scalar
* @param b matrix
* @return c = a - b
*/
public static double[][] subtract(double a, double[][] b) {
int m = b.length;
int n = b[0].length;
double[][] c = new double[m][n];
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
c[i][j] = a - b[i][j];
}
}
return c;
}
/**
* @param a matrix
* @param b matrix
* @return c = a * b
*/
public static double[][] dot(double[][] a, double[][] b) {
int m1 = a.length;
int n1 = a[0].length;
int m2 = b.length;
int n2 = b[0].length;
if (n1 != m2) {
throw new RuntimeException("Illegal matrix dimensions.");
}
double[][] c = new double[m1][n2];
for (int i = 0; i < m1; i++) {
for (int j = 0; j < n2; j++) {
for (int k = 0; k < n1; k++) {
c[i][j] += a[i][k] * b[k][j];
}
}
}
return c;
}
/**
* Element wise multiplication
*
* @param a matrix
* @param x matrix
* @return y = a * x
*/
public static double[][] multiply(double[][] x, double[][] a) {
int m = a.length;
int n = a[0].length;
if (x.length != m || x[0].length != n) {
throw new RuntimeException("Illegal matrix dimensions.");
}
double[][] y = new double[m][n];
for (int j = 0; j < m; j++) {
for (int i = 0; i < n; i++) {
y[j][i] = a[j][i] * x[j][i];
}
}
return y;
}
/**
* Element wise multiplication
*
* @param a matrix
* @param x scalar
* @return y = a * x
*/
public static double[][] multiply(double x, double[][] a) {
int m = a.length;
int n = a[0].length;
double[][] y = new double[m][n];
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
y[i][j] = x * a[i][j];
}
}
return y;
}
/**
* Element wise power
*
* @param x matrix
* @param a scalar
* @return y
*/
public static double[][] power(double[][] x, int a) {
int m = x.length;
int n = x[0].length;
double[][] y = new double[m][n];
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
y[i][j] = Math.pow(x[i][j], a);
}
}
return y;
}
/**
* @param a matrix
* @return shape of matrix a
*/
public static String shape(double[][] a) {
int m = a.length;
int n = a[0].length;
String Vshape = "(" + m + "," + n + ")";
return Vshape;
}
/**
* @param a matrix
* @return sigmoid of matrix a
*/
public static double[][] sigmoid(double[][] a) {
int m = a.length;
int n = a[0].length;
double[][] z = new double[m][n];
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
z[i][j] = 1.0 / (1.0 + Math.exp(-a[i][j]));
}
}
return z;
}
/**
* Element wise division
*
* @param a scalar
* @param x matrix
* @return x / a
*/
public static double[][] divide(double[][] x, int a) {
int m = x.length;
int n = x[0].length;
double[][] z = new double[m][n];
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
z[i][j] = x[i][j] / a;
}
}
return z;
}
/**
* Cross entropy loss
*
* @param A matrix of predictions
* @param Y matrix of targets
* @param batch_size scalar
* @return loss
*/
public static double cross_entropy(int batch_size, double[][] Y, double[][] A) {
int m = A.length;
int n = A[0].length;
double sum = 0;
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
sum += Y[i][j] * Math.log(A[i][j]); // accumulate Y * log(A)
}
}
return -sum / batch_size;
}
public static double[][] softmax(double[][] z) {
double[][] zout = new double[z.length][z[0].length];
double sum = 0.;
for (int i = 0; i < z.length; i++) {
for (int j = 0; j < z[0].length; j++) {
sum += Math.exp(z[i][j]);
}
}
for (int i = 0; i < z.length; i++) {
for (int j = 0; j < z[0].length; j++) {
zout[i][j] = Math.exp(z[i][j]) / sum;
}
}
return zout;
}
}
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
<!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]> <html class="no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="description" content="company is a free job board template">
<meta name="author" content="">
<div id="preloader">
<div id="status"><h1>Road Traffic Speed Prediction: A Probabilistic Model Fusing Multi-Source
Data </h1></div>
</div>
<!-- Body content -->
<div class="header-connect">
<div class="container">
<div class="row">
<div class="col-md-5 col-sm-8 col-xs-8">
<div class="header-half header-call">
</div>
</div>
<!-- Collect the nav links, forms, and other content for toggling -->
<h2 align="center">Road Traffic Speed Prediction: A Probabilistic Model Fusing Multi-Source
Data </h2>
<div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1">
<center>
<div class="button navbar-right">
<a href="registerfacebook.jsp"><button class="navbar-btn nav-button wow bounceInRight
login" data-wow-delay="0.8s">FaceBook Register</button></a>
<a href="registertwitter.jsp"><button class="navbar-btn nav-button wow fadeInRight" data-
wow-delay="0.6s">Twitter Register</button></a>
</div>
</center>
<!--
<li class="wow fadeInDown" data-wow-delay="0.2s"><a href="alumni.jsp">Admin</a></li>
<li class="wow fadeInDown" data-wow-delay="0.5s"><a href="contact.html">User
Login</a></li>
</ul>
-->
</div>
<div class="content-area">
<hr>
<hr>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script>
<script>window.jQuery || document.write('<script src="js/vendor/jquery-
1.10.2.min.js"><\/script>')</script>
<script src="js/bootstrap.min.js"></script>
<script src="js/owl.carousel.min.js"></script>
<script src="js/wow.js"></script>
<script src="js/main.js"></script>
</div>
</div>
</div>
<tr>
<td>User Name:</td>
<td>
<input type="text" name="username"/>
</td>
</tr>
<tr>
<td>Password:</td>
<td>
<input type="password" name="password"/>
</td>
</tr>
<tr>
<td></td>
<td>
<input type="submit" value="Submit" style="color: #080808"/>
<input type="reset" value="Clear" style="color: #080808"/>
</td>
</tr>
</table>
</form>
</center>
</div>
</div>
</body>
</html>
Conclusion:
This project proposes a novel probabilistic framework to predict road traffic speed with multiple cross-domain data. Existing works are mainly based on speed sensing data, which suffers from data sparsity and low coverage. In our work, we handle the challenges arising from fusing multi-source data, including location uncertainty, language ambiguity and data heterogeneity, using the Location Disaggregation Model, the Traffic Topic Model and the Traffic Speed Gaussian Process Model. Experiments on real data demonstrate the effectiveness and efficiency of our model. For future work, we plan to implement kernel-based and distributed GP, so the traffic prediction framework can be applied to a real-time large traffic network.
References:
[1] B. Abdulhai, H. Porwal, and W. Recker. Short-term traffic flow prediction using neuro-genetic algorithms. ITS Journal: Intelligent Transportation Systems Journal, 7(1):3–41, 2002.
[2] R. Alfelor, H. S. Mahmassani, and J. Dong. Incorporating weather impacts in traffic estimation and prediction systems. Technical report, US Department of Transportation, 2009.
[3] M. T. Asif, N. Mitrovic, L. Garg, and J. Dauwels. Low-dimensional models for missing data imputation in road networks. 32(3):3527–3531, 2013.
[4] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., 2006.
[7] J. Chen, K. H. Low, Y. Yao, and P. Jaillet. Gaussian process decentralized data fusion and active sensing for spatiotemporal traffic modeling and prediction in mobility-on-demand systems. IEEE Transactions on Automation Science and Engineering, 12(3):1–21, 2015.
[8] P.-T. Chen, F. Chen, and Z. Qian. Road traffic congestion monitoring in social media with hinge-loss Markov random fields. In ICDM, pages 80–89. IEEE, 2014.