This project aims to predict how strong a storm is and what impact storms have on the
affected area. At the core of forecasting work is climatology, the study of climates and how they
change. A basic understanding of how storms work relies on the historical weather record, a
good understanding of the time of year when parts of the country are at the greatest risk,
and knowledge of which areas of the middle and southern United States are affected. Usually, when warm,
moist air left over from winter cyclones meets winds from the jet stream, it creates high winds,
tornadoes and dangerous hail.
The analysis starts from understanding the system, its intention and requirements, then moves to
testing the basic functionality and drilling down into the details until all the possible issues are
discovered.
The testing process includes identifying, documenting, reviewing, scripting and executing
tests for predicting the level of impact of a storm.
The accuracy of the existing system is not 100%; it can predict values that are approximately
within 80% of the true values.
In the existing system there are anomalies with respect to null input values: the output is
improper when null, negative or irrational values are supplied for predicting the level of a storm.
The estimated loss in dollars, the number of injuries and the fatalities that can result from the
storm are not displayed.
In the proposed system, the anomalies with respect to null input values are eliminated and the
correct level of storm is predicted. The accuracy of the existing system is enhanced so that the
output is approximately up to 100% of the true values.
The estimated loss in dollars, the number of injuries and the fatalities that can result from the
storm are reported to the user.
This application uses different classification algorithms to predict the level of a storm based on
the predictor values, namely the magnitude, length and width of the storm. It also reports
the accuracy with which the level of storm is predicted.
The classification algorithms used are the Support Vector Classifier, the Random Forest
Classifier and the K-Nearest Neighbors Classifier.
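As a rough illustration (not the project's actual code), the three classifiers can be compared on the three predictors with scikit-learn. The synthetic data below stands in for the real dataset.csv; only the column order (magnitude, length, width) follows the description above.

```python
# Sketch: comparing the project's three classifiers on the three storm
# predictors. Synthetic data stands in for dataset.csv.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.random((200, 3))            # columns: mag, len, wid (scaled)
y = (X[:, 0] * 5).astype(int)       # toy storm level 0-4 driven by magnitude

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "SVC": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=3),
}
scores = {name: accuracy_score(y_test, m.fit(X_train, y_train).predict(X_test))
          for name, m in models.items()}
print(scores)
```

Each model is fit on the same split so the accuracies are directly comparable, mirroring how the project reports per-algorithm accuracy.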
Description
1-(om)
Tornado number – a count of tornadoes during the year. Prior to 2007, these numbers were
assigned to the tornado as the information arrived in the NWS database. Since 2007, the numbers
may have been assigned in sequential (temporal) order after event dates/times are converted to
CST. However, do not use "om" to count the sequence of tornadoes through the year, as
sometimes new entries come in late or corrections are made, and the data are not
re-sequenced.
NOTE: Tornado segments that cross state borders or track through more than 4 counties will have
the same OM number.
2-(yr)
Year, 1950-2009
3-(mo)
Month, 1-12
4-(dy)
Day, 1-31
5-(date)
Date in yyyy-mm-dd format
6-(time)
Time in HH:MM:SS
7-(tz)
Time zone – All times, except for ?=unknown and 9=GMT, were converted to 3=CST. This
should be accounted for when building queries for GMT summaries such as 12Z-12Z.
8-(st)
State two-letter postal abbreviation
9-(stf)
State FIPS number (Note some Puerto Rico codes are incorrect)
10-(stn)
State number – the number of this tornado, in this state, in this year. May not be sequential in
some years.
NOTE: discontinued in 2008. This number can be calculated in a spreadsheet by sorting, after
accounting for border-crossing tornadoes and 4+ county segments.
11-(mag)
F-scale (EF-scale after Jan. 2007): values -9, 0, 1, 2, 3, 4, 5 (-9=unknown). Or, hail size in
inches. Or, wind speed in knots (1 knot = 1.15 mph).
12-(in)
Injuries - when summing for state totals use sn=1, not sg=1
13-(fat)
Fatalities - when summing for state totals use sn=1, not sg=1
14-(loss)
Estimated property loss information – Prior to 1996 this is a categorization of tornado damage by
dollar amount (0 or blank=unknown; 1=<$50; 2=$50-$500; 3=$500-$5,000; 4=$5,000-$50,000;
5=$50,000-$500,000; 6=$500,000-$5,000,000; 7=$5,000,000-$50,000,000; 8=$50,000,000-
$500,000,000; 9=$500,000,000-$5,000,000,000). When summing for state totals use sn=1, not
sg=1. From 1996, this is tornado property damage in millions of dollars. Note: this may change
to whole-dollar amounts in the future. An entry of 0 does not mean $0.
15-(closs)
Estimated crop loss in millions of dollars (started in 2007). Entry of 0 does not mean $0.
16-(slat)
Starting latitude in decimal degrees
17-(slon)
Starting longitude in decimal degrees
18-(elat)
Ending latitude in decimal degrees
19-(elon)
Ending longitude in decimal degrees
20-(len)
Length in miles
21-(wid)
Width in yards
Understanding these fields is critical to counting state tornadoes and totaling state
fatalities/losses.
23-(sn) State Number: 1 or 0 (1 = entire track info in this state)
1, 1, 1 = Entire record for the track of the tornado (unless all 4 fips codes are non-zero)
1, 0, -9 = Continuing county fips code information only from 1, 1, 1 record, above (same om)
2, 0, 1 = A two-state tornado (st=state of touchdown, other fields summarize entire track)
2, 1, 2 = First state segment for a two-state (2, 0, 1) tornado (same state as above, same om)
2, 1, 2 = Second state segment for a two-state (2, 0, 1) tornado (state tracked into, same om)
2, 0, -9 = Continuing county fips for a 2, 1, 2 record that exceeds 4 counties (same om)
3, 0, 1 = A three-state tornado (st=state of touchdown, other fields summarize entire track)
3, 1, 2 = First state segment for a three-state (3, 0, 1) tornado (state same as 3, 0, 1, same om)
3, 1, 2 = Second state segment for a three-state (3, 0, 1) tornado (2nd state tracked into, same om
as the initial 3, 0, 1 record)
3, 1, 2 = Third state segment for a three-state (3, 0, 1) tornado (3rd state tracked into, same om as
the initial 3, 0, 1 record)
28-(f4) 4th County FIPS code – Additional counties will be included in sg=-9 records with same
om number
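The sn=1 caveat noted for injuries, fatalities and loss can be sketched in pandas. The small frame below is hypothetical (one two-state tornado), with column names taken from the field list above: the sg-style summary row repeats the whole-track total, so summing it alongside the per-state rows would double-count.

```python
# Sketch: why sn matters when totaling by state. The sn=0 row summarizes
# the whole (multi-state) track; the sn=1 rows carry per-state totals.
import pandas as pd

records = pd.DataFrame({
    "om":  [10, 10, 10],
    "st":  ["OK", "OK", "KS"],
    "ns":  [2, 2, 2],
    "sn":  [0, 1, 1],    # 0 = entire-track summary row
    "sg":  [1, 2, 2],
    "fat": [5, 3, 2],    # track total 5 = 3 (OK) + 2 (KS)
})

# Keep only the per-state rows before grouping
state_fat = records[records.sn == 1].groupby("st")["fat"].sum()
print(state_fat.to_dict())   # {'KS': 2, 'OK': 3}
```

Summing without the sn filter would report 10 fatalities for a 5-fatality tornado.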
The tornado database file was updated in 2016 to add an "fc" field for estimated F-scale ratings.
It is valid for records altered between 1950 and 1982.
29-(fc)
Between 1953 and 1982, 1864 CONUS tornadoes were coded in the official database with an
F-scale rating of -9 (unknown). The table below explains how these tornado records were
modified to provide an estimated F-scale rating. All changed records are identified in the
database by the "fc" field (fc=1 if the F-scale was changed from -9 to another value; fc=0 for all
unchanged F-scales).
IF property loss equal to:   Then set F-scale equal to:   IF path length <=5 miles, add:   IF path length >5 miles, add:
0,1 (<$50)                   0                            0                                +1
2,3 (up to $5K)              1                            -1                               +1
4,5 (up to $500K)            2                            -1                               +1
6,7 (up to $50M)             3                            -1                               +1
8,9 (up to $5B)*             4                            -1                               +1
F5: None (no adjustment produces an F5 rating)
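The adjustment rule in the table can be expressed as a small function. The name estimate_fscale is illustrative only, not part of the SPC database; the mapping and the path-length adjustments come directly from the table, with results capped at F4 since no estimated record is assigned F5.

```python
# Sketch of the fc estimation rule: map the pre-1996 loss category to a
# base F-scale, then adjust by path length. estimate_fscale is an
# illustrative name, not an SPC field or function.
def estimate_fscale(loss_category, path_length_miles):
    base = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2,
            5: 2, 6: 3, 7: 3, 8: 4, 9: 4}[loss_category]
    if loss_category in (0, 1):
        adjust = 0 if path_length_miles <= 5 else 1
    else:
        adjust = -1 if path_length_miles <= 5 else 1
    # Clamp to the 0-4 range: no estimated record is assigned F5
    return max(0, min(4, base + adjust))

print(estimate_fscale(2, 10))   # loss $50-$500, long track -> F2
```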
30-(imp)
Level of storm
Low/L (mag=1)
Medium/M (mag=2)
Serious/S (mag=3)
High/H (mag=4)
Extreme/E (mag=5)
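The imp levels above can be written as a simple lookup, assuming mag is the 1-5 magnitude code; level_of_storm is an illustrative helper name, not part of the project's listings.

```python
# The imp levels above as a lookup table keyed by the magnitude code.
STORM_LEVEL = {1: "Low/L", 2: "Medium/M", 3: "Serious/S",
               4: "High/H", 5: "Extreme/E"}

def level_of_storm(mag):
    # Return the label for a 1-5 magnitude code, or "Unknown" otherwise
    return STORM_LEVEL.get(mag, "Unknown")

print(level_of_storm(3))   # Serious/S
```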
The Software Requirement Specification (SRS) is the starting point of the software development
activity. As systems grew more complex, it became evident that the goals of an entire system
could not be easily comprehended; hence the need for the requirements phase arose. A software
project is initiated by the client's needs. The SRS is the means of translating the ideas in the
minds of the clients (the input) into a formal document (the output of the requirements phase).
1) Problem/Requirement Analysis:
This process, the harder and more nebulous of the two, deals with understanding the problem, the
goals and the constraints.
2) Requirement Specification:
Here the focus is on specifying what has been found during analysis. Issues such as
representation, specification languages and tools, and checking the specifications are addressed
during this activity.
The requirements phase terminates with the production of the validated SRS document.
Producing the SRS document is the basic goal of this phase.
Role of SRS
The purpose of the Software Requirement Specification is to reduce the communication gap
between the clients and the developers. The SRS is the medium through which the client and
user needs are accurately specified. It forms the basis of software development. A good SRS
should satisfy all the parties involved in the system.
There are a set of guidelines to be followed while preparing the software requirement
specification document. This includes the purpose, scope, functional and nonfunctional
requirements, software and hardware requirements of the project. In addition to this, it also
contains the information about environmental conditions required, safety and security
requirements, software quality attributes of the project etc.
The purpose of SRS (Software Requirement Specification) document is to describe the external
behavior of the application developed or software. It defines the operations, performance and
interfaces and quality assurance requirement of the application or software. The complete
software requirements for the system are captured by the SRS.
This section introduces the requirement specification document for Storm Forecasting using
Machine Learning which enlists functional as well as non-functional requirements.
For documenting the functional requirements, the set of functionalities supported by the system
are to be specified. A function can be specified by identifying the state at which data is to be
input to the system, its input data domain, the output domain, and the type of processing to be
carried on the input data to obtain the output data.
Functional requirements define specific behavior or function of the application. Following are
the functional requirements:
2.1.1.2. This is achieved by creating user-friendly screens for data entry that can handle large
volumes of data. The goal of input design is to make data entry easier and error-free. The data
entry screen is designed in such a way that all the data manipulations can be performed. It also
provides viewing facilities.
2.1.1.3. When data is entered, it is checked for validity. Data can be entered with the help of
screens, and appropriate messages are provided as needed so that the user is never left in a maze.
Thus the objective of input design is to create an input layout that is easy to follow.
A non-functional requirement specifies criteria that can be used to judge the operation of a
system, rather than specific behaviors. In particular, these are the constraints the system must
work within. Following are the non-functional requirements:
Performance:
The performance of the developed application can be measured using the following methods:
Measuring enables you to identify how the performance of your application stands in relation to
your defined performance goals and helps you to identify the bottlenecks that affect your
application performance. It helps you identify whether your application is moving toward or
away from your performance goals. Defining what you will measure, that is, your metrics, and
defining the objectives for each metric is a critical part of your testing plan.
Throughput
Resource utilization
Data source: www.spc.noaa.gov
Different organizations have different phases in STLC however generic Software Test Life Cycle
(STLC) for waterfall development model consists of the following phases.
1. Requirements Analysis
2. Test Planning
3. Test Analysis
4. Test Design
5. Test Construction and Verification
6. Test Execution and Bug Reporting
7. Final Testing and Implementation
8. Post Implementation
In this phase testers analyze the customer requirements and work with developers during the
design phase to see which requirements are testable and how they are going to test those
requirements.
It is very important to start testing activities from the requirements phase itself, because the cost
of fixing a defect is much lower when it is found in the requirements phase rather than in later
phases.
In this phase all the planning about testing is done like what needs to be tested, how the testing
will be done, test strategy to be followed, what will be the test environment, what test
methodologies will be followed, hardware and software availability, resources, risks etc. A high
level test plan document is created which includes all the planning inputs mentioned above and
circulated to the stakeholders.
Usually IEEE 829 test plan template is used for test planning.
3. Test Analysis
After the test planning phase is over, the test analysis phase starts. In this phase we dig deeper
into the project and figure out what testing needs to be carried out in each SDLC phase.
Automation activities are also decided in this phase: whether automation needs to be done for the
software product, how the automation will be done, how much time it will take, and which
features need to be automated.
Non functional testing areas (Stress and performance testing) are also analyzed and defined in
this phase.
In this phase, various black-box and white-box test design techniques are used to design the test
cases. Testers write test cases following those design techniques; if automation testing needs to
be done, the automation scripts also need to be written in this phase.
Testers prepare further test cases keeping in mind positive and negative scenarios, end-user
scenarios, etc. All the test cases and automation scripts need to be completed in this phase and
reviewed by the stakeholders. The test plan document should also be finalized and verified by
reviewers.
Once unit testing is done by the developers and the test team receives the test build, the test
cases are executed and defects are reported in a bug-tracking tool. After test execution is
complete and all the defects are reported, test execution reports are created and circulated to the
project stakeholders.
After developers fix the bugs raised by testers, they give another build with the fixes to the
testers, who perform re-testing and regression testing to ensure that each defect has been fixed
and has not affected any other areas of the software.
Testing is an iterative process: if a defect is found and fixed, testing needs to be repeated after
every defect fix.
After the testers confirm that the defects have been fixed and no more critical defects remain in
the software, the build is given for final testing.
In this phase the final testing is done for the software; non-functional testing such as stress, load
and performance testing is performed. The software is also verified in a production-like
environment. Final test execution reports and documents are prepared in this phase.
In the post-implementation phase the test environment is cleaned up and restored to its default
state, process review meetings are held and lessons learnt are documented. A document is
prepared to help cope with similar problems in future releases.
Phase           Activities                                                               Outcome
Planning        Create high-level test plan                                              Test plan, refined specification
Design          Test cases are revised; select which test cases to automate              Revised test cases, test data sets, risk assessment sheet
Final testing   Execute remaining stress and performance tests; complete documentation   Test results and different metrics on test efforts
Technologies Used
History of Python
Python was developed by Guido van Rossum in the late eighties and early nineties at the
National Research Institute for Mathematics and Computer Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68,
SmallTalk, and Unix shell and other scripting languages.
Python is copyrighted. Like Perl, Python source code is now available under the GNU General
Public License (GPL).
Python is now maintained by a core development team at the institute, although Guido van
Rossum still holds a vital role in directing its progress.
Importance of Python
Python is Interactive − You can actually sit at a Python prompt and interact with the
interpreter directly to write your programs.
Easy-to-learn − Python has few keywords, simple structure, and a clearly defined
syntax. This allows the student to pick up the language quickly.
Easy-to-read − Python code is more clearly defined and visible to the eyes.
A broad standard library − Python's bulk of the library is very portable and cross-
platform compatible on UNIX, Windows, and Macintosh.
Interactive Mode − Python has support for an interactive mode which allows interactive
testing and debugging of snippets of code.
Portable − Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
Extendable − You can add low-level modules to the Python interpreter. These modules
enable programmers to add to or customize their tools to be more efficient.
GUI Programming − Python supports GUI applications that can be created and ported
to many system calls, libraries and windows systems, such as Windows MFC,
Macintosh, and the X Window system of Unix.
Scalable − Python provides a better structure and support for large programs than shell
scripting.
It provides very high-level dynamic data types and supports dynamic type checking.
It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
scikit-learn - the machine learning algorithms used for data analysis and data mining
tasks.
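A tiny sketch of the dynamic typing mentioned above: the same name can be rebound to values of different types, with type checks happening at run time.

```python
# Dynamic typing in action: x is rebound to values of different types,
# and isinstance checks the type at run time.
x = 42
assert isinstance(x, int)
x = "forty-two"          # rebinding to a str is legal
assert isinstance(x, str)
x = [4, 2]               # and to a list
assert isinstance(x, list)
print(type(x).__name__)  # list
```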
Hypertext Markup Language (HTML) is the standard markup language for creating web
pages and web applications. With Cascading Style Sheets (CSS) and JavaScript, it forms a triad
of cornerstone technologies for the World Wide Web. Web browsers receive HTML documents
from a web server or from local storage and render them into multimedia web pages. HTML
describes the structure of a web page semantically and originally included cues for the
appearance of the document.
HTML elements are the building blocks of HTML pages. With HTML constructs, images and
other objects, such as interactive forms, may be embedded into the rendered page. It provides a
means to create structured documents by denoting structural semantics for text such as headings,
paragraphs, lists, links, quotes and other items. HTML elements are delineated by tags, written
using angle brackets. Tags such as <img /> and <input /> introduce content into the page directly.
Others such as <p>...</p> surround and provide information about document text and may include
other tags as sub-elements. Browsers do not display the HTML tags, but use them to interpret the
content of the page.
HTML can embed programs written in a scripting language such as JavaScript which affect the
behavior and content of web pages. Inclusion of CSS defines the look and layout of content.
The World Wide Web Consortium (W3C), maintainer of both the HTML and the CSS standards,
has encouraged the use of CSS over explicit presentational HTML since 1997.
History of HTML
From 1991 to 1999, HTML developed from version 1 to version 4.
In year 2000, the World Wide Web Consortium (W3C) recommended XHTML 1.0. The XHTML
syntax was strict, and the developers were forced to write valid and "well-formed" code.
In 2004, the W3C decided to close down the development of HTML in favor of XHTML.
In 2004 - 2006, the WHATWG gained support by the major browser vendors.
Features of HTML:
Web Workers: Certain web applications use heavy scripts to perform functions. Web
Workers use separate background threads for processing and it does not affect the
performance of a web page.
Video: You can embed video without third-party proprietary plug-ins or codec. Video
becomes as easy as embedding an image.
Canvas: This feature allows a web developer to render graphics on the fly. As with video,
there is no need for a plug in.
Application caches: Web pages will start storing more and more information locally on
the visitor's computer. It works like cookies, but where cookies are small, the new
feature allows for much larger files. Google Gears is an excellent example of this in
action.
Geolocation: Best known for use on mobile devices, geolocation is coming with
HTML5.
Cascading Style Sheets (CSS) is a style sheet language used for describing the presentation of a
document written in a markup language. Although most often used to set the visual style of web
pages and user interfaces written in HTML and XHTML, the language can be applied to
any XML document, including plain XML, SVG and XUL, and is applicable to rendering
in speech, or on other media. Along with HTML and JavaScript, CSS is a cornerstone technology
used by most websites to create visually engaging webpages, user interfaces for web
applications, and user interfaces for many mobile applications.
Features of CSS:
Eventually, CSS 3, along with HTML5, is going to be the future of the web, and you should begin
making your web pages compatible with these specifications. This section explores some of the
new features in CSS 3 that change the way developers who used CSS 2 build websites.
Selectors
In addition to the selectors that were available in CSS 2, CSS 3 introduces some new selectors.
Using these selectors you can choose DOM elements based on their attributes, so you don't need
to specify classes and IDs for every element; instead, you can use the attribute values to style
them.
Rounded Corners
Rounded corner elements can spruce up a website, but creating a rounded corner requires a
designer to write a lot of code. Adjusting the height, width and positioning of these elements
is a never-ending chore because any change in content can break them.
CSS 3 addresses this problem by introducing the border-radius property, which gives you the
same rounded-corner effect without having to write all that code.
Border Image
Box Shadow
A box shadow allows you to create a drop shadow for an element. Usually this effect is achieved
using a repeated image around the element. However, with the property box-shadow this can be
achieved by writing a single line of CSS code.
After previously removing this property from the CSS 3 Backgrounds and Borders Module, the
W3C added it back in the last working draft.
Text Shadow
The new text-shadow property allows you to add drop shadows to the text on a webpage. Prior to
CSS 3, this would be done by either using an image or duplicating a text element and then
positioning it. A similar property called box-shadow is also available in CSS.
Gradient
While the gradient effect is a sleek web design tool, it can be a drain on resources if not
implemented correctly with current CSS techniques. Some designers use a complete image as the
background for the gradient effect, which increases the page load time.
RGBA: Color, Now with Opacity
The RGB property in CSS is used to set colors for different elements. CSS 3 extends it with
opacity manipulation: RGB becomes RGBA (Red, Green, Blue, Alpha channels), which
simplifies how you control the opacity of elements.
Transform (Element Rotation)
CSS 3 also introduces a property called transform, which enables rotating web elements on a
webpage. Previously, if a designer wanted to rotate an element, he or she used JavaScript. Many
JavaScript extensions/plugins are available online for this feature, but they can make the code
cumbersome and, most importantly, consume more resources.
Multicolumn Layout
Almost every webpage today is divided into columns or boxes, and adjusting these boxes so they
display correctly in different browsers takes a toll on web designers. CSS 3 solves this problem
with its multi-column layout properties.
Web Fonts
CSS 3 also facilitates embedding any custom font on a webpage. Fonts are dependent on the
client system and Web pages can render only fonts that are supported by the browser or the client
machine. By using the @font-face property, you can include the font from a remote location and
can then use it.
The standout advantage of CSS is the added design flexibility and interactivity it brings to web
development. Developers have greater control over the layout, allowing them to make precise
section-wise changes.
As customization through CSS is much easier than plain HTML, web developers are able to
create different looks for each page. Complex websites with uniquely presented pages are
feasible thanks to CSS.
Makes Updates Easier and Smoother
CSS works by creating rules. These rules are simultaneously applied to multiple elements within
the site. Eliminating the repetitive coding style of HTML makes development work faster and
less monotonous. Errors are also reduced considerably.
Since the content is completely separated from the design, changes across the website can be
implemented all at once. This reduces delivery times and costs of future edits.
Helps Web Pages Load Faster
Improved website loading is an underrated yet important benefit of CSS. Browsers download the
CSS rules once and cache them for loading all the pages of a website. It makes browsing the
website faster and enhances the overall user experience.
This feature comes in handy in making websites work smoothly at lower internet speeds.
Accessibility on low end devices also improves with better loading speeds.
Browser Dependent
The only major limitation of CSS is that its performance depends largely on browser support.
Besides compatibility, all browsers (and their many versions) function differently. So your CSS
needs to account for all these variations.
However, in case your CSS styling isn’t fully supported by a browser, people will still be able to
experience the HTML functionalities. Therefore, you should always have a well-structured
HTML along with good CSS.
Difficult to retrofit in old websites
The instinctive reaction after learning the many advantages of CSS is to integrate it into your
existing website. Sadly, this isn’t a simple process. CSS style sheets, especially the latest
versions, have to be integrated into the HTML code at the ground level and must also be
compatible with HTML versions. Retrofitting CSS into older websites is a slow tedious process.
There is also the risk of breaking the old HTML code altogether and thus making the site dead.
It’s best to wait till you redesign your website from scratch.
As you can see from above points, the advantages of CSS development outweigh its limitations.
It is a very useful web development tool that every programmer must master along with basic
HTML.
The Unified Modeling Language allows the software engineer to express an analysis model
using a modeling notation governed by a set of syntactic, semantic and pragmatic rules.
A UML system is represented using five different views that describe the system from distinctly
different perspectives. Each view is defined by a set of diagrams, as follows:
i. In this view, the data and functionality are viewed from inside the system.
ii. This view models the static structures.
In this view, the structural and behavioral parts of the system are represented as they are to be
built.
In this view, the structural and behavioral aspects of the environment in which the system is to
be implemented are represented.
To model a system, the most important aspect is to capture its dynamic behavior. To clarify,
dynamic behavior means the behavior of the system while it is running/operating.
Static behavior alone is not sufficient to model a system; dynamic behavior is more important.
In UML there are five diagrams available to model dynamic behavior, and the use case diagram
is one of them. Since the use case diagram is dynamic in nature, there must be some internal or
external factors for making the interactions. These internal and external agents are known as
actors. Use case diagrams therefore consist of actors, use cases and their relationships. The
diagram is used to model the system/subsystem of an application. A single use case diagram
captures a particular functionality of a system, so to model the entire system a number of use
case diagrams are used.
Use case diagrams are used to gather the requirements of a system, including internal and
external influences. These requirements are mostly design requirements. So when a system is
analyzed to gather its functionalities, use cases are prepared and actors are identified.
The aim of a sequence diagram is to define event sequences, which would have a desired
outcome. The focus is more on the order in which messages occur than on the message per se.
However, the majority of sequence diagrams will communicate what messages are sent and the
order in which they tend to occur.
Class roles describe the way an object will behave in context. Use the UML object symbol to
illustrate class roles, but don't list object attributes.
Activation boxes represent the time an object needs to complete a task. When an object is busy
executing a process or waiting for a reply message, use a thin gray rectangle placed vertically on
its lifeline.
Messages
Messages are arrows that represent communication between objects. Use half-arrowed lines to
represent asynchronous messages.
Asynchronous messages are sent from an object that will not wait for a response from the
receiver before continuing its tasks.
Lifelines
Lifelines are vertical dashed lines that indicate the object's presence over time.
Objects can be terminated early using an arrow labeled "<< destroy >>" that points to an X. This
object is removed from memory. When that object's lifeline ends, you can place an X at the end
of its lifeline to denote a destruction occurrence.
Loops
A repetition or loop within a sequence diagram is depicted as a rectangle. Place the condition for
exiting the loop at the bottom left corner in square brackets [ ].
Guards
When modeling object interactions, there will be times when a condition must be met for a
message to be sent to an object. Guards are conditions that need to be used throughout UML
diagrams to control flow.
Step 1:
Step 2:
import pandas as pd
import numpy as np
import warnings
Step 3:
Step 4:
Create two HTML pages, one for input and one for output (input.html and result.html)
Step 5:
Step 6:
Step 7:
Stop
# (restored) imports: the original listing used Flask, render_template
# and request without importing them
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/')
def student():
    return render_template('input.html')

@app.route('/result', methods=['POST'])
def result():
    # Read the three predictor values posted from input.html
    magnitude = request.form['magnitude']
    length = request.form['length']
    width = request.form['width']
    # (restored) the original listing used s1 and a without computing them;
    # Svm is assumed to expose getinput/svmresult like Knn does
    Svm.getinput(magnitude, length, width)
    result1 = Svm.svmresult()
    s1 = result1[0]
    a = result1[1][0]
    if s1 == 0:
        r1 = "VERY LOW"
    elif s1 == 1:
        r1 = "LOW"
    elif s1 == 2:
        r1 = "MEDIUM"
    elif s1 == 3:
        r1 = "STRONG"
    elif s1 == 4:
        r1 = "HIGH"
    else:
        r1 = "VERY HIGH"
    print(r1)
    Knn.getinput(magnitude, length, width)
    result2 = Knn.knnresult()
    s2 = result2[0]
    a2 = result2[1][0]
    if s2 == 0:
        r = "VERY LOW"
    elif s2 == 1:
        r = "LOW"
    elif s2 == 2:
        r = "MEDIUM"
    elif s2 == 3:
        r = "STRONG"
    elif s2 == 4:
        r = "HIGH"
    else:
        r = "VERY HIGH"
    return render_template("result.html", r1=r1, r=r, a=a, a2=a2)

if __name__ == '__main__':
    app.run(debug=True)
<!DOCTYPE html>
<head>
<style>
div{
background-color: #2c198c;
color: white;
font-size:350%;
align:top;
text-align:center;
}
body{
background-color:#FFFFFF;
}
</style>
</head>
<body>
<div>STORM FORECASTING USING MACHINE LEARNING</div>
<!DOCTYPE html>
<html>
<head>
<style>
div{
background-color: #2c198c;
color: white;
font-size:400%;
align:top;
text-align:center;
}
body{
background-color:#FFFFFF;
}
</style>
</head>
<body>
<div>LEVEL OF STORM PREDICTION</div>
</body>
</html>
# (restored) imports: the original listing used these names without
# importing them
import warnings
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

class Svm():
    l1 = []

    # (restored) mirror of Knn.getinput; the Flask code fills Svm.l1 with
    # the user's inputs before calling svmresult
    @staticmethod
    def getinput(mag, len, wid):
        Svm.l1.append(int(mag))
        Svm.l1.append(float(len))
        Svm.l1.append(float(wid))
        print(Svm.l1)

    @staticmethod
    def svmresult():
        print(Svm.l1)
        defadv = pd.read_csv('C:\\Users\\HRIDDHI\\Desktop\\dataset.csv')
        number = LabelEncoder()
        defadv['imp'] = number.fit_transform(defadv['imp'].astype('str'))
        # Drop the columns that are not used for prediction
        for col in ['mo', 'yr', 'dy', 'time', 'tz', 'slat', 'slon', 'elat']:
            defadv.drop([col], axis=1, inplace=True)
        warnings.filterwarnings("ignore")
        df = pd.DataFrame({'Magnitude': defadv.mag, 'injuries': defadv.inj,
                           'fatalities': defadv.fat, 'loss': defadv.loss,
                           'croploss': defadv.closs, 'length': defadv.len,
                           'width': defadv.wid})
        x = defadv[['mag', 'len', 'wid']]
        y = defadv['imp']
        # (restored) the original listing used x_train/x_test/y_train/y_test
        # without creating them
        x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
        svc = SVC(probability=True)
        svc.fit(x_train, y_train)
        ypred = svc.predict(x_test)
        # (restored) collect true and predicted labels for later use
        yt = list(y_test)
        yp = list(ypred)
        dec = svc.decision_function(x)
        acc = accuracy_score(y_test, ypred)
        precision = precision_score(y_test, ypred, average='macro')
        recall = recall_score(y_test, ypred, average='macro')
        f1score = f1_score(y_test, ypred, average='macro')
        l2 = [acc, precision, recall, f1score]
        cm1 = confusion_matrix(y_test, svc.predict(x_test))
        print(cm1)
        print(l2)
        return svc.predict([Svm.l1]), l2, yt, yp
class Knn():
    l1 = []

    @staticmethod
    def getinput(mag, length, wid):
        Knn.l1.append(int(mag))
        Knn.l1.append(float(length))
        Knn.l1.append(float(wid))
        print(Knn.l1)

    @staticmethod
    def knnresult():
        defadv = pd.read_csv('C:\\Users\\HRIDDHI\\Desktop\\dataset.csv')
        number = LabelEncoder()
        defadv['imp'] = number.fit_transform(defadv['imp'].astype('str'))
        # drop the date/time, station and location columns that are not predictors
        defadv.drop(['mo', 'yr', 'dy', 'time', 'tz', 'stn', 'slat', 'slon', 'elat'],
                    axis=1, inplace=True)
        # predictors, target and the train/test split used below
        x = defadv[['mag', 'len', 'wid']]
        y = defadv['imp']
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
        neigh = KNeighborsClassifier(n_neighbors=3)
        neigh.fit(x_train, y_train)
        ypred1 = neigh.predict(x_test)
        print(ypred1)
        cm1 = confusion_matrix(y_test, ypred1)
        print(cm1)
        # metrics list returned alongside the prediction (mirrors Svm.svmresult)
        l2 = [accuracy_score(y_test, ypred1),
              precision_score(y_test, ypred1, average='macro'),
              recall_score(y_test, ypred1, average='macro'),
              f1_score(y_test, ypred1, average='macro')]
        y_prob = neigh.predict_proba(x_test)[:, 1]
        fpr, tpr, thresholds = roc_curve(y_test, y_prob, pos_label=1)
        roc_auc = auc(fpr, tpr)
        plt.title('K-nearest neighbour')
        plt.plot(fpr, tpr, 'b', label='AUC = %s' % round(roc_auc, 4))
        return neigh.predict([Knn.l1]), l2
6. SYSTEM ANALYSIS
Machine learning algorithms used for natural language processing (NLP) currently take too long
to complete their learning function. This slow learning performance tends to make such models
ineffective for the growing demand for real-time applications such as voice transcription,
language translation, text summarization, topic extraction and sentiment analysis.
Libraries imported
The following machine learning libraries are used in this project:
1. sklearn/scikit-learn
Classification report
Classification report is used to evaluate a model’s predictive power. It is one of the most critical
steps in machine learning.
After you have trained and fitted your machine learning model it is important to evaluate the
model’s performance.
One way to do this is by using sklearn’s classification report.
It provides the following metrics that will help in evaluating the model:
Precision
Recall
F1-score
Support
The first step is importing the classification_report library.
from sklearn.metrics import classification_report
Once the library has been imported you can now run the classification report with this Python
command:
print(classification_report(y_test,predictions))
y_test is the dependent variable from your test data set (the train-test split of the data).
predictions is the data output of your model.
Make sure that y_test comes before predictions in the Python call.
If the order is reversed, the true and predicted labels are swapped, which gives a wrong model
performance and leads to a wrong evaluation.
Here is a sample output for classification_report:
You can see here that on average the model has predicted 85% of the classification correctly.
For Class 0.0 it has predicted 86% of the test data correctly.
Classification_report is also useful when comparing two models with different specifications
against each other and determining which model is better to use.
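As a minimal runnable sketch (with toy labels, not values from the storm dataset), the report can be produced like this:

```python
from sklearn.metrics import classification_report

# Toy true labels and predictions (illustrative values only).
y_test = [0, 0, 1, 1, 1, 2, 2, 2]
predictions = [0, 0, 1, 1, 2, 2, 2, 1]

# True labels first, predictions second.
report = classification_report(y_test, predictions)
print(report)
```

Each row of the printed report shows precision, recall, f1-score and support for one class, followed by the averaged rows used for model comparison.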
Confusion matrix
A confusion matrix is a summary of prediction results on a classification problem. The number
of correct and incorrect predictions are summarized with count values and broken down by each
class. This is the key to the confusion matrix.
The confusion matrix shows the ways in which your classification model
is confused when it makes predictions. It gives you insight not only into the errors being made by
your classifier but more importantly the types of errors that are being made.
It is this breakdown that overcomes the limitation of using classification accuracy alone.
1. You need a test dataset or a validation dataset with expected outcome values.
2. Make a prediction for each row in your test dataset.
3. From the expected outcomes and predictions count:
a. The number of correct predictions for each class.
b. The number of incorrect predictions for each class, organized by the class that was predicted.
These numbers are then organized into a table, or a matrix as follows:
Expected down the side: each row of the matrix corresponds to an actual (expected) class.
Predicted across the top: each column of the matrix corresponds to a predicted class.
The counts of correct and incorrect classification are then filled into the table.
The total number of correct predictions for a class goes into the cell where the expected row
and the predicted column are both that class value (the diagonal of the matrix).
The total number of incorrect predictions for a class goes into the expected row for that class
value and the predicted column of the class that was wrongly predicted.
This matrix can be used for 2-class problems where it is very easy to understand, but can easily
be applied to problems with 3 or more class values, by adding more rows and columns to the
confusion matrix.
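A minimal sketch with toy labels (not storm data) shows how the counts land in the matrix:

```python
from sklearn.metrics import confusion_matrix

# Toy two-class example (illustrative values only).
actual    = [1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1]

# Row i = actual class i, column j = predicted class j.
cm = confusion_matrix(actual, predicted)
print(cm)  # [[3 1]
           #  [1 2]]
```

The diagonal (3 and 2) holds the correct predictions; the off-diagonal cells break down the errors by predicted class.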
ROC curve
The ROC curve stands for Receiver Operating Characteristic curve, and is used to visualize
the performance of a classifier. When evaluating a new model performance, accuracy can be
very sensitive to unbalanced class proportions. The ROC curve is insensitive to this lack of
balance in the data set.
On the other hand when using precision and recall, we are using a single discrimination threshold
to compute the confusion matrix. The ROC Curve allows the modeler to look at the performance
of his model across all possible thresholds. To understand the ROC curve we need to understand
the x and y axes used to plot this. On the x axis we have the false positive rate, FPR or fall-out
rate. On the y axis we have the true positive rate, TPR or recall.
1) Import roc_curve and auc from sklearn.metrics.
2) Generate actual and predicted values. First let us use a good prediction probabilities array:
actual = [1,1,1,0,0,0]
predictions = [0.9,0.9,0.9,0.1,0.1,0.1]
3) Then we need to calculate the fpr and tpr for all thresholds of the classification. This is where
the roc_curve call comes into play. In addition we calculate the auc or area under the curve
which is a single summary value in [0,1] that is easier to report and use for other purposes. You
usually want to have a high auc value from your classifier.
4) Finally we plot the fpr vs tpr as well as our auc for our very good classifier.
The figure shows what a perfect classifier's ROC curve looks like:
Figure 8: Graph 1
Here the classifier did not make a single error. The AUC is maximal at 1.00. Let’s see what
happens when we introduce some errors in the prediction.
actual = [1,1,1,0,0,0]
predictions = [0.9,0.9,0.1,0.1,0.1,0.1]
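The steps above can be combined into one runnable sketch, here using the perfect-prediction arrays from the first example (the Agg backend is chosen only so the script runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering, no display required
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# The "good" prediction probabilities from the text above.
actual = [1, 1, 1, 0, 0, 0]
predictions = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1]

# FPR and TPR across all thresholds, then the area under the curve.
fpr, tpr, thresholds = roc_curve(actual, predictions, pos_label=1)
roc_auc = auc(fpr, tpr)
print(roc_auc)  # 1.0 for a perfect classifier

plt.plot(fpr, tpr, 'b', label='AUC = %s' % round(roc_auc, 4))
plt.title('ROC curve')
plt.legend()
plt.savefig('roc.png')
```

Swapping in the error-containing predictions array from above drops the AUC below 1.0.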
The sklearn.preprocessing package provides several common utility functions and transformer
classes to change raw feature vectors into a representation that is more suitable for the
downstream estimators.
In general, learning algorithms benefit from standardization of the data set. If some outliers are
present in the set, robust scalers or transformers are more appropriate. The behavior of the
different scalers, transformers, and normalizers on a dataset containing marginal outliers is
highlighted in Compare the effect of different scalers on data with outliers.
The preprocessing module further provides a utility class StandardScaler that implements the
Transformer API to compute the mean and standard deviation on a training set so as to be able to
later reapply the same transformation on the testing set. This class is hence suitable for use in the
early steps of a sklearn.pipeline.Pipeline:
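A minimal sketch of this fit-then-transform pattern, using tiny made-up matrices rather than the storm data:

```python
from sklearn.preprocessing import StandardScaler

# Tiny made-up training and test matrices (illustrative only).
X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X_test = [[2.0, 20.0]]

# fit() learns the per-column mean and standard deviation from the training set...
scaler = StandardScaler().fit(X_train)
print(scaler.mean_)  # [ 2. 20.]

# ...and transform() reapplies exactly the same scaling to new data.
X_test_scaled = scaler.transform(X_test)
print(X_test_scaled)  # [[0. 0.]]
```

Because the test point equals the training mean in both columns, it scales to zero, which makes the reuse of the training statistics visible.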
Label encoding:
LabelEncoder is a utility class to help normalize labels such that they contain only values
between 0 and n_classes-1. This is sometimes useful for writing efficient Cython routines.
It can also be used to transform non-numerical labels (as long as they are hashable and
comparable) to numerical labels:
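A short sketch with made-up string labels (similar in spirit to how the 'imp' column is encoded in this project):

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# String labels are mapped to integers 0..n_classes-1 in sorted order.
codes = le.fit_transform(["low", "high", "medium", "low"])
print(le.classes_.tolist())  # ['high', 'low', 'medium']
print(codes.tolist())        # [1, 0, 2, 1]

# inverse_transform recovers the original labels from the codes.
print(le.inverse_transform([0, 1]).tolist())  # ['high', 'low']
```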
sklearn.model_selection.train_test_split(*arrays, **options):
Split arrays or matrices into random train and test subsets
Quick utility that wraps input validation and next(ShuffleSplit().split(X, y)) and application to
input data into a single call for splitting (and optionally subsampling) data in a one-liner.
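A minimal sketch with toy arrays; the test_size and random_state values here are illustrative choices:

```python
from sklearn.model_selection import train_test_split

X = list(range(10))
y = [0, 1] * 5

# 70/30 split; random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_test))  # 7 3
```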
Feature selection:
The classes in the sklearn.feature_selection module can be used for feature
selection/dimensionality reduction on sample sets, either to improve estimators’ accuracy scores
or to boost their performance on very high-dimensional datasets.
Removing features with low variance:
VarianceThreshold is a simple baseline approach to feature selection. It removes all features
whose variance doesn’t meet some threshold. By default, it removes all zero-variance features,
i.e. features that have the same value in all samples.
Univariate feature selection works by selecting the best features based on univariate statistical
tests. It can be seen as a preprocessing step to an estimator. Scikit-learn exposes feature selection
routines as objects that implement the transform method.
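A short sketch; the small matrix and its constant middle column are made up for illustration:

```python
from sklearn.feature_selection import VarianceThreshold

# The middle feature is constant across all samples (zero variance).
X = [[0, 1, 0],
     [1, 1, 1],
     [0, 1, 0],
     [1, 1, 1]]

selector = VarianceThreshold()  # default threshold removes zero-variance features
X_reduced = selector.fit_transform(X)
print(X_reduced.shape)  # (4, 2)
```

get_support() reports which columns survived, which is useful for mapping back to feature names.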
2. matplotlib
Matplotlib:
Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety
of hardcopy formats and interactive environments across platforms. Matplotlib can be used in
Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and
four graphical user interface toolkits.
Matplotlib tries to make easy things easy and hard things possible. You can generate plots,
histograms, power spectra, bar charts, errorcharts, scatterplots, etc., with just a few lines of code.
For examples, see the sample plots and thumbnails gallery.
For simple plotting the pyplot module provides a MATLAB-like interface, particularly when
combined with IPython. For the power user, you have full control of line styles, font properties,
axes properties, etc, via an object oriented interface or via a set of functions familiar to
MATLAB users.
Pyplot and pylab:
Pyplot provides the MATLAB-like plotting interface described above. Pylab combines pyplot
with numpy into a single namespace. This is convenient for interactive work, but for
programming it is recommended that the namespaces be kept separate.
3. NumPy
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly
and speedily integrate with a wide variety of databases.
import numpy as np
4. Pandas
import pandas as pd
Library features:
• Tools for reading and writing data between in-memory data structures and different file
formats.
The library is highly optimized for performance, with critical code paths written in Cython or C.
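A short sketch of the read/write round trip; the column names mirror the project's dataset but the values are made up:

```python
import io
import pandas as pd

# Round-trip a small table through CSV text (a stand-in for a file on disk).
csv_text = "mag,len,wid\n3,10.5,200\n4,20.0,350\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 3)

# Writing back out is symmetric.
out = df.to_csv(index=False)
print(out == csv_text)  # True
```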
Support Vector Machines(SVMs) have been extensively researched in the data mining and
machine learning communities for the last decade and actively applied to applications in various
domains. SVMs are typically used for learning classification, regression, or ranking functions,
for which they are called classifying SVM, support vector regression (SVR), or ranking SVM (or
RankSVM) respectively. Two special properties of SVMs are that SVMs achieve (1) high
generalization by maximizing the margin and (2) support an efficient learning of nonlinear
functions by kernel trick.
Working:
• Family of machine-learning algorithms that are used for mathematical and engineering
problems including for example handwriting digit recognition, object recognition, speaker
identification, face detections in images and target detection.
• Task: Assume we are given a set S of points xi ∈ Rn with i = 1, 2, ..., N. Each point xi belongs
to either of two classes and thus is given a label yi ∈ {-1, 1}. The goal is to establish the
equation of a hyperplane that divides S, leaving all the points of the same class on the same side.
• SVM performs classification by constructing an N-dimensional hyperplane that optimally
separates the data into two categories.
Kernel Functions:
• Kernel Function computes the similarity of two data points in the feature space using dot
product.
Identify the right hyper-plane (Scenario-1): Here, we have three hyper-planes (A, B
and C). Now, identify the right hyper-plane to classify star and circle
You need to remember a thumb rule to identify the right hyper-plane: “Select the hyper-
plane which segregates the two classes better”. In this scenario, hyper-plane “B”
has excellently performed this job.
Identify the right hyper-plane (Scenario-2): Here, we have three hyper-planes (A, B
and C) and all are segregating the classes well. Now, How can we identify the right
hyper-plane?
Here, maximizing the distances between nearest data point (either class) and hyper-plane
will help us to decide the right hyper-plane. This distance is called as Margin. Let’s look
at the below snapshot:
Above, you can see that the margin for hyper-plane C is high as compared to both A and
B. Hence, we name the right hyper-plane as C. Another compelling reason for selecting the
hyper-plane with the higher margin is robustness: if we select a hyper-plane having a low
margin, there is a high chance of misclassification.
Identify the right hyper-plane (Scenario-3): Some of you may have selected hyper-plane B as it
has a higher margin compared to A. But here is the catch: SVM selects the hyper-plane which
classifies the classes accurately prior to maximizing the margin. Here, hyper-plane B has a
classification error and A has classified all points correctly. Therefore, the right hyper-plane is A.
Can we classify two classes (Scenario-4)? Below, I am unable to segregate the two
classes using a straight line, as one of the stars lies in the territory of the other (circle) class as
an outlier.
As I have already mentioned, the star at the other end is like an outlier for the star class. SVM
has a feature to ignore outliers and find the hyper-plane that has the maximum margin.
Hence, we can say that SVM is robust to outliers.
SVM can solve this problem. Easily! It solves this problem by introducing additional
feature. Here, we will add a new feature z=x^2+y^2. Now, let’s plot the data points on
axis x and z:
o In the original plot, red circles appear close to the origin of x and y axes, leading
to lower value of z and star relatively away from the origin result to higher value
of z.
In SVM, it is easy to have a linear hyper-plane between these two classes. But another
question arises: do we need to add this feature manually to obtain a hyper-plane? No. SVM
has a technique called the kernel trick. Kernels are functions which take a low-dimensional
input space and transform it into a higher-dimensional space, i.e. they convert a non-separable
problem into a separable problem. This is mostly useful in non-linear separation problems.
Simply put, the kernel does some extremely complex data transformations, then finds out the
process to separate the data based on the labels or outputs you have defined.
When we look at the hyper-plane in original input space it looks like a circle:
Kernel Parameter:
SVM plays an important role in classification. Here different kernel parameters are used as a
tuning parameter to improve the classification accuracy. There are mainly four different types of
kernels (Linear, Polynomial, RBF, and Sigmoid) that are popular in SVM classifier.
Gamma Parameter:
Gamma is the free parameter of a nonlinear support vector machine (SVM) with a Gaussian
radial basis function kernel. A standard SVM seeks to find a margin that separates all positive
and negative examples; gamma controls the width of the Gaussian radial basis function and
hence how far the influence of a single training example reaches.
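As an illustrative sketch of these tuning parameters (XOR-style toy data, not the storm dataset; the gamma and C values are arbitrary choices):

```python
from sklearn.svm import SVC

# XOR-like toy data: no straight line separates the two classes.
X = [[0, 0], [0, 1], [1, 0], [1, 1],
     [0.1, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.9]]
y = [0, 1, 1, 0, 0, 1, 1, 0]

# The RBF kernel with a suitably large gamma fits this non-linear boundary;
# a linear kernel could not separate these classes.
clf = SVC(kernel="rbf", gamma=20.0, C=10.0)
clf.fit(X, y)
print(clf.score(X, y))  # 1.0 on the training data
```

The kernel and gamma arguments are the knobs discussed above: swapping kernel="linear" in here would leave the XOR pattern unseparated.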
2. K-Nearest Neighbour
KNN has no model other than storing the entire training dataset, so no learning is required; it is
as simple as that.
Efficient implementations can store the data using complex data structures like k-d trees to make
look-up and matching of new patterns during prediction efficient.
Because the entire training dataset is stored, you may want to think carefully about the
consistency of your training data. It might be a good idea to curate it, update it often as new data
becomes available and remove erroneous and outlier data.
To determine which of the K instances in the training dataset are most similar to a new input a
distance measure is used. For real-valued input variables, the most popular distance measure
is Euclidean distance.
Euclidean distance is calculated as the square root of the sum of the squared differences between
a new point (x) and an existing point (xi) across all input attributes j.
Euclidean is a good distance measure to use if the input variables are similar in type (e.g. all
measured widths and heights). Manhattan distance is a good measure to use if the input variables
are not similar in type (such as age, gender, height, etc.).
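The two distance measures can be sketched directly from their definitions:

```python
import math

def euclidean(a, b):
    # square root of the sum of squared differences across all attributes
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # sum of absolute differences; often better for mixed attribute types
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```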
The value for K can be found by algorithm tuning. It is a good idea to try many different values
for K (e.g. values from 1 to 21) and see what works best for your problem.
The computational complexity of KNN increases with the size of the training dataset. For very
large training sets, KNN can be made stochastic by taking a sample from the training dataset
from which to calculate the K-most similar instances.
KNN has been around for a long time and has been very well studied. As such, different
disciplines have different names for it, for example:
When KNN is used for regression problems the prediction is based on the mean or the median of
the K-most similar instances.
When KNN is used for classification, the output can be calculated as the class with the highest
frequency from the K-most similar instances. Each instance in essence votes for their class and
the class with the most votes is taken as the prediction.
Class probabilities can be calculated as the normalized frequency of samples that belong to each
class in the set of K most similar instances for a new data instance. For example, in a binary
classification problem (class is 0 or 1):
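The normalized-frequency computation can be sketched with toy neighbour classes:

```python
from collections import Counter

# Classes of the K = 5 most similar training instances (toy values).
neighbor_classes = [1, 0, 1, 1, 0]

counts = Counter(neighbor_classes)
k = len(neighbor_classes)
p0 = counts[0] / k  # normalized frequency of class 0
p1 = counts[1] / k  # normalized frequency of class 1
print(p0, p1)  # 0.4 0.6
```

The class with the higher frequency (here class 1) would also be the majority-vote prediction.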
If you are using K and you have an even number of classes (e.g. 2) it is a good idea to choose a
K value with an odd number to avoid a tie. And the inverse, use an even number for K when you
have an odd number of classes.
Ties can be broken consistently by expanding K by 1 and looking at the class of the next most
similar instance in the training dataset.
From the above table, the efficiency of SVM is greater than that of KNN. Hence, on comparing
the two algorithms, we say that the SVM algorithm is more efficient than KNN. Therefore we
use the SVM algorithm to predict the level of storm for the parameters given by the user.
Test Cases:
The accuracy of the existing system will be improved to approach 100%, so as to provide more
accurate information about the level of storm.
The anomalies with respect to null input values can be eliminated to predict correct level of
storm.
Providing brief data about the amount of estimated loss in dollars and the number of injuries and
fatalities that can take place due to the impact of the storm, to educate users about the
consequences caused by storms.
However, in recent years, with the advancement in technology, it has been possible to forecast
storms correctly using machine learning techniques namely Support Vector Machine (SVM) and
K-Nearest Neighbor (KNN).
In our project, we implemented these algorithms to predict the level of storm based on
parameters like magnitude, length and width of the storm, and determined the ROC curve for
the best-working algorithm.

References:
1. https://www.ncdc.noaa.gov/stormevents/ftp.jsp
2. https://www.kaggle.com/jtennis/spctornado/data
3. https://machinelearningmastery.com/
4. http://stackabuse.com/using-machine-learning-to-predict-the-weather-part-1/
5. http://www.spc.noaa.gov/wcm/data/SPC_severe_database_description.pdf