
Machine Learning based Crop Prediction System Using Multi-Linear Regression
Prof. D.S. Zingade, Omkar Buchade, Nilesh Mehta, Shubham Ghodekar, Chandan Mehta
deeplakshmisach@gmail.com, omkar543210@gmail.com, mehtanilesh13@gmail.com, smghodekar19@gmail.com,
chandanmehtahelpus@gmail.com
B.E. Computer Dept., Aissms Ioit, Kennedy Road, Pune

ABSTRACT
India being an agricultural country, its economy predominantly depends on agricultural yield growth and allied agro-industry products. In India, agriculture is largely influenced by rainwater, which is highly unpredictable. Agricultural growth also depends on diverse soil parameters, namely Nitrogen, Phosphorus, Potassium, crop rotation, soil moisture, pH and surface temperature, and on weather aspects like temperature and rainfall. India is now rapidly progressing towards technical development, so technology can prove beneficial to agriculture by increasing crop productivity and giving farmers better yields. The proposed project provides a solution for Smart Agriculture by monitoring the agricultural field, which can assist farmers in increasing productivity to a great extent. Weather forecast data such as temperature and rainfall, obtained from the IMD (India Meteorological Department), together with a repository of soil parameters, gives insight into which crops are suitable to be cultivated in a particular area. This work presents a system, in the form of an Android-based application and a website, which uses Machine Learning techniques to predict the most profitable crop under the current weather and soil conditions. The proposed system integrates the data obtained from the repository and the weather department and, by applying the Multiple Linear Regression machine learning algorithm, predicts the most suitable crops for the current environmental conditions. This provides the farmer with a variety of crop options that can be cultivated. Thus, the project develops a system combining data integration from various sources, data analytics and prediction analysis, which can improve crop yield productivity and increase the farmer's profit margins over the longer run.

Keywords
Data Analytics, Prediction, Machine learning, Multiple linear regression.

1. INTRODUCTION
Agriculture is one of the most important occupations practiced in our country. It is the broadest economic sector and plays an important role in the overall development of the country. About 60% of the land in the country is used for agriculture in order to meet the needs of 1.2 billion people. Thus, modernization of agriculture is very important and will lead the farmers of our country towards profit. [1] Data analytics (DA) is the process of examining data sets in order to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software. [2] Earlier, yield prediction was performed by relying on the farmer's experience with a particular field and crop. However, as conditions change rapidly from day to day, farmers are forced to cultivate more and more crops. In this situation, many of them do not have enough knowledge about the new crops and are not completely aware of the benefits of farming them. Farm productivity can also be increased by understanding and forecasting crop performance under a variety of environmental conditions. Thus, the proposed system takes the location of the user as its input. From the location, the soil nutrients (such as Nitrogen, Phosphorus and Potassium) and the forecasted weather are obtained. The proposed system applies a Machine Learning prediction algorithm, Multiple Linear Regression, to identify patterns in the data and then processes them according to the input conditions. This in turn proposes the best feasible crops for the given environmental conditions. As past-year production is also taken into account, the prediction will be more accurate. Thus, the system suggests profitable crops, providing the choice directly to the farmer.

2. SYSTEM DESCRIPTION
To our knowledge, there is no existing system which recommends crops based on multiple factors such as the Nitrogen, Phosphorus and Potassium nutrients in the soil, pH, and weather components including temperature and rainfall. The proposed system consists of an Android and a web-based application which can predict the most profitable crop for the farmer. The user's location is identified with the help of GPS. According to the user's location, the feasible crops for that location are identified from the soil, pH and weather database. These crops are then compared with the past-year production database to identify the most profitable crop for the current location. After this processing is done on the server side, the result is sent to the user's Android or web application. Because the previous production of the crops is also taken into account, the crop recommendation is more precise. Location is the only input to the prediction system. The most productive crop is suggested depending on the numerous scenarios and on additional filters chosen according to the user's requirements.

Fig. 1: System Architecture
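The location-driven workflow described above can be sketched in a few lines of Python. The steps are: look up soil and weather for a location, keep the feasible crops, and rank them by past production. This is a minimal illustration under stated assumptions, not the authors' implementation. The record type `CropRequirement`, the functions `is_feasible` and `recommend_crops`, the requirement ranges and the production figures are all hypothetical stand-ins for the soil, pH, weather and past-production databases. The ranking here uses raw past production only.

```python
from dataclasses import dataclass


@dataclass
class CropRequirement:
    """Hypothetical record: acceptable (min, max) range per soil/weather factor for one crop."""
    name: str
    ranges: dict


def is_feasible(env, crop):
    """True if every factor the crop cares about lies inside its acceptable range."""
    return all(lo <= env[factor] <= hi for factor, (lo, hi) in crop.ranges.items())


def recommend_crops(location, soil_weather_db, crops, past_production):
    """Look up the location's soil and weather, keep feasible crops, rank by past production."""
    env = soil_weather_db[location]  # stands in for the soil, pH and weather databases
    feasible = [crop.name for crop in crops if is_feasible(env, crop)]
    # Highest past-year production first; crops with no record count as 0
    return sorted(feasible, key=lambda name: past_production.get((location, name), 0.0), reverse=True)


# Made-up illustrative values, not the paper's data
soil_weather_db = {
    "Maharashtra": {"N": 280, "P": 20, "K": 300, "pH": 6.8, "temperature": 27.0, "rainfall": 900.0},
}
crops = [
    CropRequirement("rice", {"pH": (5.0, 7.5), "temperature": (20, 35), "rainfall": (1000, 2500)}),
    CropRequirement("cotton", {"pH": (6.0, 8.0), "temperature": (21, 30), "rainfall": (500, 1200)}),
    CropRequirement("sugarcane", {"N": (250, 400), "pH": (6.0, 7.5), "rainfall": (750, 1500)}),
]
past_production = {("Maharashtra", "cotton"): 7.0, ("Maharashtra", "sugarcane"): 80.0}

ranking = recommend_crops("Maharashtra", soil_weather_db, crops, past_production)
print(ranking)  # ['sugarcane', 'cotton']
```

Rice is filtered out because the sample rainfall (900) is below its hypothetical minimum. The two feasible crops are then ordered by their recorded production.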

Volume: 3 Issue: 2 April - 2018 31


3. PROPOSED SYSTEM ARCHITECTURE

Fig. 2: Proposed architecture of the system

4. ALGORITHMIC SURVEY
Regression: Regression analysis is a form of predictive modelling technique which investigates the relationship between a dependent variable (target) and one or more independent variables (predictors).

Fig. 3: Regression classification

Linear Regression: Linear regression is a linear approach for modelling the relationship between a scalar dependent variable y and one or more independent variables denoted X. The case of a single independent variable is called simple linear regression.[10]

Non Linear Regression: Nonlinear regression is a form of regression analysis in which observational data are modelled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables. The data are fitted by a method of successive approximations.[9]

Fig. 4: Point plot of non-linear regression
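To make the distinction concrete, the Python sketch below fits two models, assuming NumPy:

- a simple linear regression in closed form;
- a nonlinear model by successive approximations.

The exponential model y = a·exp(b·x), the Gauss-Newton iteration and the starting values are illustrative choices, not taken from the paper. Gauss-Newton is one standard successive-approximation method among several.

```python
import numpy as np


def simple_linear_fit(x, y):
    """Closed-form least squares for the simple linear model y = b0 + b1 * x."""
    x_mean, y_mean = x.mean(), y.mean()
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b0 = y_mean - b1 * x_mean
    return b0, b1


def exponential_fit_gauss_newton(x, y, a, b, max_iter=50, tol=1e-12):
    """Fit the nonlinear model y = a * exp(b * x) by Gauss-Newton iterations.

    Each iteration linearises the model around the current (a, b) and solves a
    small least-squares problem for the update, i.e. a successive approximation.
    """
    for _ in range(max_iter):
        residual = y - a * np.exp(b * x)
        # Jacobian of the model with respect to the parameters (a, b)
        jac = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
        step, *_ = np.linalg.lstsq(jac, residual, rcond=None)
        a, b = a + step[0], b + step[1]
        if np.linalg.norm(step) < tol:
            break
    return a, b


# Illustrative noise-free data, made up for this example
x = np.linspace(0.0, 1.0, 20)
b0, b1 = simple_linear_fit(x, 1.0 + 2.0 * x)
a, b = exponential_fit_gauss_newton(x, 2.0 * np.exp(0.5 * x), a=1.5, b=0.3)
print(b0, b1)  # close to 1.0 and 2.0
print(a, b)    # close to 2.0 and 0.5
```

The linear fit needs no iteration because the model is linear in its parameters. The nonlinear fit needs a starting guess and converges step by step.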



Fig. 5: Point plot of linear regression

5. MULTI-LINEAR REGRESSION
A Linear Regression model that contains more than one predictor variable is called a Multiple Linear Regression model. The following is a Multiple Linear Regression model with two predictor variables, $x_1$ and $x_2$:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$
Where,
$\beta_0, \beta_1, \beta_2$ are the coefficients of Multiple Linear Regression,
$x_1, x_2$ are the independent variables, and $\epsilon$ is the random error term.
The model is linear because it is linear in the parameters $\beta_0$, $\beta_1$ and $\beta_2$. The model describes a plane in the three-dimensional space of $Y$, $x_1$ and $x_2$. The parameter $\beta_0$ is the intercept of this plane. Parameters $\beta_1$ and $\beta_2$ are referred to as partial regression coefficients. Parameter $\beta_1$ represents the change in the mean response corresponding to a unit change in $x_1$ when $x_2$ is held constant. Parameter $\beta_2$ represents the change in the mean response corresponding to a unit change in $x_2$ when $x_1$ is held constant.
2. Regression Coefficients:
To obtain the regression model, $\beta$ must be known. $\beta$ can be estimated by the method of least squares. The least-squares estimate is:
$$\hat{\beta} = (X'X)^{-1} X'Y$$
where $X'$ is the transpose of the design matrix $X$ and $(X'X)^{-1}$ is the inverse of $X'X$. Thus, the fitted model is:
$$\hat{Y} = X\hat{\beta}$$
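The least-squares estimate above can be computed directly with NumPy. This is a minimal sketch, assuming noise-free made-up data for two predictors. The helper names `fit_mlr` and `predict_mlr` are hypothetical. The code solves the normal equations $(X'X)\hat{\beta} = X'Y$ rather than forming the inverse explicitly. This yields the same $\hat{\beta}$ when $X'X$ is invertible and is numerically safer.

```python
import numpy as np


def fit_mlr(predictors, y):
    """Least-squares estimate beta_hat = (X'X)^(-1) X'Y for Y = b0 + b1*x1 + ... + bk*xk."""
    X = np.column_stack([np.ones(len(y)), predictors])  # design matrix with intercept column
    # Solve the normal equations (X'X) beta = X'Y instead of inverting X'X
    return np.linalg.solve(X.T @ X, X.T @ y)


def predict_mlr(beta_hat, predictors):
    """Fitted values Y_hat = X beta_hat."""
    X = np.column_stack([np.ones(len(predictors)), predictors])
    return X @ beta_hat


# Two-predictor example with made-up, noise-free data: y = 1 + 2*x1 + 3*x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

beta_hat = fit_mlr(np.column_stack([x1, x2]), y)
y_hat = predict_mlr(beta_hat, np.column_stack([x1, x2]))
print(beta_hat)  # approximately [1. 2. 3.]
```

The recovered coefficients match $\beta_0 = 1$, $\beta_1 = 2$, $\beta_2 = 3$. Holding $x_2$ fixed and increasing $x_1$ by one therefore raises the prediction by 2.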

3. R-Square:
$R^2$ is the regression sum of squares divided by the total sum of squares:
$$R^2 = \frac{SSR}{SSTO}$$
where $SSR = \sum_i (\hat{y}_i - \bar{y})^2$, $SSE = \sum_i (y_i - \hat{y}_i)^2$ and $SSTO = \sum_i (y_i - \bar{y})^2$. Alternatively, since $SSTO = SSR + SSE$, $R^2$ also equals one minus the ratio of the error sum of squares to the total sum of squares:
$$R^2 = 1 - \frac{SSE}{SSTO}$$
1. Since $R^2$ is a proportion, it is always a number between 0 and 1.
2. If $R^2 = 1$, all of the data points fall perfectly on the regression line.
3. If $R^2 = 0$, the estimated regression line (or plane) is perfectly horizontal.
4. Multiple R:
This is the correlation coefficient between the observed and the fitted values of $Y$. It tells how strong the linear relationship is: a value of 1 means a perfect positive relationship and a value of 0 means no linear relationship at all. It is the square root of $R^2$.
5. Adjusted R-Square:
Adjusted $R^2$ shows how well the data points fit a curve or a line, but adjusts for the number of predictors in the model. If more and more useless variables are added to the model, adjusted $R^2$ will decrease; if more useful variables are added, adjusted $R^2$ will increase.
$$R^2_{adj} = 1 - \frac{(1 - R^2)(N - 1)}{N - K - 1}$$
Where,
N = Number of points in the data sample.
K = Number of independent regressors, i.e. the number of variables in the model, excluding the constant.
6. Standard Error:
The standard error of the regression (S) represents the average distance of the observed values from the regression line. It tells, in the units of the response variable, how wrong the regression model is on average. A smaller value of S is desirable, as it indicates that the observations are close to the regression line. It is calculated by:
$$S = \sqrt{\frac{SSE}{N - K - 1}}$$
7. ANOVA:
ANOVA (analysis of variance) is used to compare differences of means among more than two groups. It does this by looking at the variation in the data and where that variation is found. Specifically, ANOVA compares the amount of variation between groups to the amount of variation within groups. In Multiple Linear Regression, the ANOVA table splits the total variation into a regression part and an error part. It reports three quantities:
- the Degrees of Freedom ($K$, $N - K - 1$ and $N - 1$);
- the Sums of Squares ($SSR$, $SSE$ and $SSTO$);
- the Mean Squares ($MSR = SSR/K$ and $MSE = SSE/(N - K - 1)$).
These are used in the F-test, whose F-ratio is $F = MSR/MSE$.
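All of the goodness-of-fit quantities above can be computed from one least-squares fit. These are $R^2$, Multiple R, adjusted $R^2$, the standard error $S$ and the ANOVA sums of squares, mean squares and F-ratio. The Python sketch below assumes NumPy and an intercept in the model, with N and K as defined above. The helper `regression_summary` and the sample data are made up for illustration.

```python
import numpy as np


def regression_summary(predictors, y):
    """Goodness-of-fit quantities for an ordinary least-squares fit with intercept."""
    predictors = np.asarray(predictors, dtype=float)
    if predictors.ndim == 1:
        predictors = predictors[:, None]
    y = np.asarray(y, dtype=float)
    n, k = predictors.shape  # N data points, K regressors (constant excluded)
    X = np.column_stack([np.ones(n), predictors])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat

    ssto = np.sum((y - y.mean()) ** 2)     # total sum of squares
    sse = np.sum((y - y_hat) ** 2)         # error (residual) sum of squares
    ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares

    r2 = 1.0 - sse / ssto
    df_reg, df_err, df_tot = k, n - k - 1, n - 1
    msr, mse = ssr / df_reg, sse / df_err  # mean squares for the ANOVA table
    return {
        "R2": r2,
        "Multiple R": np.sqrt(r2),
        "Adjusted R2": 1.0 - (1.0 - r2) * df_tot / df_err,
        "Standard Error": np.sqrt(mse),
        "SSR": ssr, "SSE": sse, "SSTO": ssto,
        "df": (df_reg, df_err, df_tot),
        "F": msr / mse,
    }


# Made-up data with two predictors, purely for illustration
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 15.8]
summary = regression_summary(np.column_stack([x1, x2]), y)
for name, value in summary.items():
    print(name, value)
```

The output can be cross-checked against the identities in the text. For example, SSR + SSE equals SSTO, and Multiple R squared equals $R^2$.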



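6. MATHEMATICAL REPRESENTATION OF ALGORITHM FOR PROPOSED SYSTEM
Train data:
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \epsilon_i, \quad i = 1, 2, \dots, n$$
Where, $\beta_0, \beta_1, \dots, \beta_k$ are the coefficients of Multiple Linear Regression and $x_{i1}, \dots, x_{ik}$ are the independent variables:
X = {weather attributes, soil attributes}
Y = {production}

Fig. 7: Database of Crops
Source: https://www.apnikheti.com

Fig. 8: Nutrients of different states in India
Source: Bhimashankar Industry (Sponsoring Firm)

Y: production matrix, X: attributes matrix, $\beta$: partial coefficient matrix, $\epsilon$: error term.
$$\hat{\beta} = (X'X)^{-1} X'Y \qquad \text{(least-squares estimate)}$$
where $X'$ denotes the transpose of $X$ and $(\cdot)^{-1}$ the inverse of a matrix.
Prediction: $\hat{Y} = X\hat{\beta}$
Result: res =
The system of equations involved in a Multiple Linear Regression can be represented as:
$$Y = X\beta + \epsilon$$
where,
$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad \epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$
Matrix X is called the Design Matrix; it contains information about the levels of the predictor variables at which the observations are obtained. Vector $\beta$ represents all the regression coefficients.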
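The train/predict steps of this section map onto a few NumPy operations. This is a sketch under stated assumptions:

- the attribute order in `ATTRIBUTES` (N, P, K, pH, temperature, rainfall) is assumed;
- the helpers `design_matrix`, `train` and `predict` are hypothetical;
- the training data are synthetic, generated from made-up coefficients rather than the paper's datasets.

`np.linalg.lstsq` is used in place of the explicit $(X'X)^{-1}X'Y$. It gives the same estimate when $X'X$ is invertible and is numerically more stable.

```python
import numpy as np

# Hypothetical attribute order: soil attributes followed by weather attributes
ATTRIBUTES = ["N", "P", "K", "pH", "temperature", "rainfall"]


def design_matrix(attrs):
    """Design matrix X: a leading column of ones, then one column per attribute."""
    attrs = np.atleast_2d(np.asarray(attrs, dtype=float))
    return np.hstack([np.ones((attrs.shape[0], 1)), attrs])


def train(attrs, production):
    """Least-squares estimate of beta from the training attributes and production."""
    X = design_matrix(attrs)
    beta_hat, *_ = np.linalg.lstsq(X, np.asarray(production, dtype=float), rcond=None)
    return beta_hat


def predict(beta_hat, attrs):
    """Prediction: Y_hat = X beta_hat."""
    return design_matrix(attrs) @ beta_hat


# Synthetic training data from made-up coefficients (not the paper's datasets)
rng = np.random.default_rng(0)
true_beta = np.array([0.5, 0.01, 0.02, 0.005, 0.3, -0.04, 0.001])
attrs = rng.uniform(low=[100, 10, 100, 5.5, 18, 400],
                    high=[400, 60, 400, 8.0, 35, 2000], size=(50, 6))
production = design_matrix(attrs) @ true_beta + rng.normal(scale=0.01, size=50)

beta_hat = train(attrs, production)
print(dict(zip(["intercept"] + ATTRIBUTES, beta_hat.round(3))))

new_location = [280, 20, 300, 6.8, 27.0, 900.0]  # hypothetical soil/weather values for one location
print(predict(beta_hat, new_location))
```

With small noise, the estimated coefficients land close to the generating ones. The prediction for the hypothetical location is close to the value implied by `true_beta` (7.06).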

7. IMPLEMENTATION
7.1 Datasets used

Fig. 6: Crop production for a sample crop (Arecanut)
Source: https://data.gov.in

Fig. 9: Rainfall & Temperature of states in India
Source: https://data.gov.in

7.2 Programming Environment:
1. Python:
The object-oriented Python programming language is used to implement Multiple Linear Regression in the proposed model, as it offers a number of libraries that make implementing machine learning easier for the developer.
Libraries used:
numpy: Used for working with N-dimensional array objects.
Pandas: Used for data analysis, providing structures such as data frames.
Matplotlib: A 2-D plotting library, used to produce publication-quality graphs that help in the overall analysis of the implemented project.
Scikit-learn: Used for splitting the data into a training set and a testing set. This library is widely used for data analysis and data mining tasks.
2. Node.js
Node.js is an open-source server runtime that executes JavaScript on the server. It uses a single-threaded, non-blocking, asynchronous programming model, which is highly memory efficient. Packages used:
1. body-parser: Parses the body of incoming requests so that the data sent in POST/PUT requests can be accessed.
2. express: Provides a robust set of features for web and mobile applications.
3. python-shell: A simple way to run Python scripts from Node.js, with efficient inter-process communication and error handling.
3. HTML AND CSS:
HTML is the standard markup language for creating web pages; it describes the structure of a web page using markup. CSS (Cascading Style Sheets) describes how HTML elements are displayed on screen, on paper or in other media.
4. JavaScript:
JavaScript is an interpreted, lightweight programming language for network-centric applications. It is complementary to and integrated with HTML, and it supports event-driven, functional and imperative programming styles.
5. Android:
Android is an open-source, Linux-based operating system for mobile devices; its applications are primarily written in the Java programming language. The following two libraries are used for working with the user's location:
Geocoder: Used for reverse geocoding, i.e. retrieving an address from a location on Google Maps.
Google Maps API: Used for displaying the map on the website.

7.3 Farm First application Implementation

Fig. 10: Android application home screen
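On the server side, the Python libraries listed in Section 7.2 could be combined roughly as follows. This is a sketch under stated assumptions, not the authors' code:

- the column names (N, P, K, pH, temperature, rainfall, production) and the 80/20 split are hypothetical;
- the data are synthetic;
- scikit-learn's `LinearRegression` stands in as an off-the-shelf Multiple Linear Regression fit.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a merged soil/weather/production table (made-up values)
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "N": rng.uniform(100, 400, n),
    "P": rng.uniform(10, 60, n),
    "K": rng.uniform(100, 400, n),
    "pH": rng.uniform(5.5, 8.0, n),
    "temperature": rng.uniform(18, 35, n),
    "rainfall": rng.uniform(400, 2000, n),
})
df["production"] = (0.5 + 0.01 * df["N"] + 0.02 * df["P"] + 0.005 * df["K"]
                    + 0.3 * df["pH"] - 0.04 * df["temperature"] + 0.001 * df["rainfall"]
                    + rng.normal(0, 0.05, n))

# Split attributes (X) and target (y), then hold out 20% of rows for testing
X = df.drop(columns="production")
y = df["production"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit Multiple Linear Regression on the training split and predict the test split
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"R^2 on the test split: {model.score(X_test, y_test):.4f}")
```

The held-out predictions `y_pred` are what a test-versus-predicted comparison would be built from.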



Fig. 11: Website home page

Fig. 12: Website result page

7.4 Algorithm’s accuracy test
The following table shows the test values given as input and the corresponding values predicted by the algorithm.

Table 1: Test data for crop
Y_Test     Y_Predicted    100 × (Y_Test - Y_Predicted)
1.12486    1.12176         0.31
0.6761     0.67602         0.008
0.33509    0.317066        1.8024
0.8654     0.864045        0.1355
1.08197    1.07916         0.281
1.93008    1.9326         -0.252
0.91591    0.914217        0.1693
0.9644     0.962379        0.2021
0.87233    0.870926        0.1404
0.76741    0.766713        0.0697

After implementing the algorithm and testing its accuracy, we can see that the deviation between the predicted values and the test values is very low. Relative to Y_Test, the error is below 0.3% for nine of the ten samples and about 5.4% for the remaining one. This makes the algorithm suitable for real-world applications. The figure below shows the deviation graphically.

Fig. 13: Graphical comparison between test and predicted values (Y_Test and Y_Predicted for samples 1 to 10)
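The figures quoted for this test can be re-derived from Table 1 itself. The short Python check below uses only the table's own numbers. It first recomputes the third column as 100 × (Y_Test − Y_Predicted), which reproduces the listed values. It then computes the relative error with respect to Y_Test for comparison.

```python
# Values transcribed from Table 1: (Y_Test, Y_Predicted) for samples 1 to 10
y_test = [1.12486, 0.6761, 0.33509, 0.8654, 1.08197, 1.93008, 0.91591, 0.9644, 0.87233, 0.76741]
y_pred = [1.12176, 0.67602, 0.317066, 0.864045, 1.07916, 1.9326, 0.914217, 0.962379, 0.870926, 0.766713]

# Third column of Table 1: 100 * (Y_Test - Y_Predicted), rounded to 4 decimals
scaled_diff = [round(100 * (t - p), 4) for t, p in zip(y_test, y_pred)]

# Relative error with respect to Y_Test, in percent
relative_pct = [100 * abs(t - p) / t for t, p in zip(y_test, y_pred)]

print(scaled_diff)
print([round(r, 2) for r in relative_pct])
```

Sample 3 (Y_Test = 0.33509) stands out, with a relative error of about 5.4%. Every other sample stays below 0.3%.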



CONCLUSION AND FUTURE SCOPE:
The proposed system lists all the crops that are feasible in a particular area, helping the farmer decide which crop to cultivate. The system carefully examines the data related to soil, weather, pH and past-year production and suggests the most profitable crops that can be cultivated under the given environmental conditions. The system also examines past production data, which helps the farmer gain insight into the market demand and cost of various crops. As most types of crops are covered by this system, the farmer may learn about crops they have never cultivated before.

In the future, IoT may connect all farming devices together via the internet. Different types of sensors deployed in the farm will provide real-time data on farm conditions, and the devices can then be used to adjust moisture, acidity, etc. accordingly. Farm vehicles such as tractors will be connected to the internet and will pass real-time data to the farmer about crop harvesting and about diseases the crops may be suffering from, helping the farmer take appropriate action. Further, the most profitable crop could also be determined in light of monetary and inflation factors.

8. REFERENCES
[1] https://en.wikipedia.org/wiki/Agriculture
[2] https://en.wikipedia.org/wiki/Data_analysis
[3] Jeetendra Shenoy, Yogesh Pingle, “IOT in agriculture”, 2016 IEEE.
[4] M.R. Bendre, R.C. Thool, V.R. Thool, “Big Data in Precision agriculture”, Sept 2015 NGCT.
[5] Monali Paul, Santosh K. Vishwakarma, Ashok Verma, “Analysis of Soil Behavior and Prediction of Crop Yield using Data Mining approach”, 2015 International Conference on Computational Intelligence and Communication Networks.
[6] Abdullah Na, William Isaac, Shashank Varshney, Ekram Khan, “An IoT Based System for Remote Monitoring of Soil Characteristics”, 2016 International Conference of Information Technology.
[7] Dr. N. Suma, Sandra Rhea Samson, S. Saranya, G. Shanmugapriya, R. Subhashri, “IOT Based Smart Agriculture Monitoring System”, Feb 2017 IJRITCC.
[8] N. Heemageetha, “A survey on Application of Data Mining Techniques to Analyze the soil for agricultural purpose”, 2016 IEEE.
[9] https://en.wikipedia.org/wiki/Nonlinear_regression
[10] https://en.wikipedia.org/wiki/Linear_regression
[11] Dhivya B, Manjula, Siva Bharathi, Madhumathi, “A Survey on Crop Yield Prediction based on Agricultural Data”, International Conference in Modern Science and Engineering, March 2017.
[12] Giritharan Ravichandran, Koteeshwari R S, “Agricultural Crop Predictor and Advisor using ANN for Smartphones”, 2016 IEEE.
[13] R. Nagini, Dr. T.V. Rajnikanth, B.V. Kiranmayee, “Agriculture Yield Prediction Using Predictive Analytic Techniques”, 2nd International Conference on Contemporary Computing and Informatics (ic3i), 2016.
[14] Awanit Kumar, Shiv Kumar, “Prediction of production of crops using K-Means and Fuzzy Logic”, IJCSMC, 201
