
Informatics Practices

with PYTHON
for class XII
Krishan Meena PGT IP
Working with NumPy
• NumPy and pandas are very useful and important Python libraries.
• Before we proceed, recall how to start and use NumPy.
• You must first import the library using the following command:
• import numpy as np
What is a NumPy array?
• A named group of elements of the same type.
• For example, students' marks:
• [55,59,62,96,69]
• All the above values are of the same type.
• It looks like a list.
Types of arrays
• Two types:
• 1D arrays, known as vectors (have a single row/column only)
• Multidimensional arrays, known as matrices (can have multiple rows and columns). A 2D array is an example of a multidimensional array.
2D numpy array
• The number of elements in a 2D array can be determined from the array itself.
Accessing array
Anatomy of numpy array
• Axes
• A 1D array has only one axis (a single row)
2D array axes
Multidimensional array axes
shape
• It tells the number of elements along each axis of the array.
Data type
• dtype tells the type of data stored in the ndarray.
Item size
• Size of each element of an ndarray in bytes.
• It always follows the data type: if the data type is int16, the item size is 2 bytes, equal to 16 bits.
• Let us see an example where the data type is int32.
What if the data type is float64?
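The attributes above can be checked directly on an array. This is a minimal sketch; the array `marks` and its values are made up for illustration and are not from the original slides.

```python
import numpy as np

# A made-up 2D array (2 rows, 3 columns) stored as 32-bit integers
marks = np.array([[55, 59, 62], [96, 69, 70]], dtype=np.int32)

print(marks.ndim)      # number of axes: 2
print(marks.shape)     # elements along each axis: (2, 3)
print(marks.size)      # total number of elements: 6
print(marks.dtype)     # int32
print(marks.itemsize)  # 4 bytes = 32 bits

# The same values stored as 64-bit floats need 8 bytes per element
marks_float = marks.astype(np.float64)
print(marks_float.itemsize)  # 8
```

The item size depends only on the data type, never on the values stored.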
NumPy array vs list
• Once created, the size of an array cannot be changed
• Existing elements can be overwritten
• Arrays must contain elements of the same data type
• They support vectorized operations (a function applied to the array is performed on each item)
Creating array
• Using array()
• List=[1,2]
• Arraylist=np.array(List)
• This is useful for creating arrays from existing lists and tuples.
• If you pass another kind of object, such as a dictionary, it will still create an array, but accessing its elements in the usual way will raise an error. Let us see.
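A small sketch of the behaviour described above. The variable names are illustrative only.

```python
import numpy as np

from_list = np.array([1, 2])      # 1D array built from a list
from_tuple = np.array((3, 4, 5))  # a tuple works the same way

# A dictionary is stored as ONE Python object inside a 0-dimensional array
from_dict = np.array({1: 'a', 2: 'b'})
print(from_dict.shape)  # ()
print(from_dict.dtype)  # object

try:
    from_dict[0]        # there is no axis to index along
    index_error_raised = False
except IndexError as err:
    index_error_raised = True
    print("IndexError:", err)
```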
fromiter() function
• It is used to create an array from any iterable sequence (a numeric sequence, a string, a dictionary, etc.)
• It is very useful when you want to create an array from a non-numeric sequence.
• np.fromiter(sequence, data type)
• Let us see
Creating array for individual letters
Count
• The fromiter() function also supports one more argument, count.
• It puts a limit on how many elements from the iterable are picked for the ndarray.
• For example, from a big string you may want to create an ndarray of only 10 letters. Let us see.
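A hedged sketch of `fromiter()` with and without `count`. The sample strings and the dictionary are made up; `'U1'` means a one-character Unicode string.

```python
import numpy as np

# Numeric iterable: every item is converted to the given dtype
squares = np.fromiter((n * n for n in range(5)), dtype=int)
print(squares)    # [ 0  1  4  9 16]

# String: each character becomes one element of the array
letters = np.fromiter("PYTHON", dtype='U1')
print(letters)    # ['P' 'Y' 'T' 'H' 'O' 'N']

# Dictionary: iterating over a dict yields its keys
sample = {1: 'a', 2: 'b', 3: 'c'}
keys = np.fromiter(sample, dtype=int)
print(keys)       # [1 2 3]

# count limits how many items are taken from the iterable
text = "Informatics Practices with PYTHON"
first_ten = np.fromiter(text, dtype='U1', count=10)
print(first_ten)  # the letters of "Informatic"
```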
Quiz
• Only 3 elements are to be taken from the dictionary adict={1:'a',2:'b',3:'c',4:'d',5:'e',6:'f'}
• Create an ndarray from the above dictionary as per this condition.
arange
• We can create an array from a numerical range using arange()
• It is similar to the range() function, but it returns an array, whereas range() returns a range object (a list in Python 2).
• Let us see
Jump or step
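A minimal sketch of `arange()` with and without a step. The start, stop and step values are chosen only for illustration.

```python
import numpy as np

a = np.arange(5)            # 0 up to, but not including, 5
b = np.arange(1, 10, 2)     # start 1, stop 10, step (jump) of 2
c = np.arange(0, 1, 0.25)   # unlike range(), the step may be a float

print(a)  # [0 1 2 3 4]
print(b)  # [1 3 5 7 9]
print(c)  # [0.   0.25 0.5  0.75]
```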
linspace()
• Use it when you want evenly spaced values between two given limits, as in the next example
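A short sketch of `linspace()`. The limits 0 and 1 are arbitrary sample values; note that both limits are included in the result by default.

```python
import numpy as np

# Five evenly spaced values from 0 to 1, both limits included
points = np.linspace(0, 1, 5)
print(points)  # [0.   0.25 0.5  0.75 1.  ]

# retstep=True also returns the spacing between neighbouring values
values, step = np.linspace(0, 1, 5, retstep=True)
print(step)    # 0.25
```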
Quiz
• Create an ndarray with six values falling in the range 2.5 to 5
2D array slices
[Figure: a 2D array with its row and column indices, sliced with [0:3, 3:]]
• Rows start at index 0 and the end index is 3. As per the slicing rule the upper limit is ignored, so the slice starts at row 0 and ends at row 2.
• The same applies to columns: the slice starts at column 3 and no end is defined, so all remaining columns are included in the result.
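The slicing rule can be tried on a small array. The array below is made up for illustration (the slide's own array did not survive extraction).

```python
import numpy as np

# Made-up 4 x 5 array: rows hold 1-5, 6-10, 11-15, 16-20
ary = np.arange(1, 21).reshape(4, 5)

# Rows 0..2 (upper limit 3 is excluded), columns 3 to the end
part = ary[0:3, 3:]
print(part)
# [[ 4  5]
#  [ 9 10]
#  [14 15]]

# A single row still comes back as a 2D array when a slice is used
one_row = ary[2:3, 2:4]
print(one_row)  # [[13 14]]
```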
Quiz
result
• array([[13, 14]])
Quiz
Result
Joining numpy array
• We can join two or more arrays.
• The hstack() function joins arrays horizontally
• The vstack() function joins arrays vertically
• concatenate() can join arrays along axis 0 or axis 1
vstack()
• Ar1=np.array([[0,1,2],[3,4,5],[6,7,8]])
• Ar2=np.array([[10,11,12],[13,14,15],[16,17,18]])
• Ar3=np.vstack((Ar1,Ar2))
• Ar3
• array([[0,1,2],
• [3,4,5],
• [6,7,8],
• [10,11,12],
• [13,14,15],
• [16,17,18]])
hstack()
• Ar1=np.array([[0,1,2],[3,4,5],[6,7,8]])
• Ar2=np.array([[10,11,12],[13,14,15],[16,17,18]])
• Ar3=np.hstack((Ar1,Ar2))
• Ar3
• array([[0,1,2,10,11,12],
• [3,4,5,13,14,15],
• [6,7,8,16,17,18]])
concatenate()
• Jo1=np.concatenate((Ar1,Ar2),axis=0)
• Jo1

• Note: if axis is 0, the shapes of the arrays being joined must match on the column dimension
• And if axis is 1, the shapes of the arrays being joined must match on the row dimension.
[Figure: an array with 3 rows and 3 columns next to an array with 2 rows and 3 columns]
Arrays will join only if their shapes match. Here the columns are the same but the rows are different.
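The shape rule can be sketched with two arrays like the ones in the figure: one 3 x 3 and one 2 x 3. The values themselves are made up.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)              # 3 rows, 3 columns
b = np.array([[10, 11, 12], [13, 14, 15]])  # 2 rows, 3 columns

# axis=0 stacks rows, so only the column counts have to match
joined = np.concatenate((a, b), axis=0)
print(joined.shape)  # (5, 3)

# axis=1 places arrays side by side, so the row counts must match
try:
    np.concatenate((a, b), axis=1)
    axis1_failed = False
except ValueError as err:
    axis1_failed = True
    print("ValueError:", err)
```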
Quiz
• Join=np.concatenate((ar2,ar1),axis=0)
• Join
• Result ?
Result
• Join=np.concatenate((ar3,ar4),axis=0)
• Join

• Join=np.concatenate((ar3,ar4),axis=1)
• Join
Splitting numpy array
• Splitting divides an array into subarrays along a specified axis.
• A NumPy array can be split horizontally or vertically using three functions:
• 1 hsplit()
• 2 vsplit()
• 3 split()
hsplit()
vsplit()
split()
Axis 0 means we split horizontally, along the vertical axis
Axis 1 means we split vertically, along the horizontal axis
Split indices 1,3 give the sections [0:1] [1:3] [3: ]
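A sketch of the three split functions on a made-up 4 x 5 array. The indices [1, 3] produce the sections [0:1], [1:3] and [3:] described above.

```python
import numpy as np

ary = np.arange(20).reshape(4, 5)   # made-up 4 x 5 array

# Split the columns at indices 1 and 3: columns [0:1], [1:3], [3:]
left, middle, right = np.split(ary, [1, 3], axis=1)
print(left.shape, middle.shape, right.shape)  # (4, 1) (4, 2) (4, 2)

# vsplit() cuts along axis 0: two equal blocks of rows
top, bottom = np.vsplit(ary, 2)
print(top.shape, bottom.shape)                # (2, 5) (2, 5)

# hsplit() cuts along axis 1: five single-column arrays
columns = np.hsplit(ary, 5)
print(len(columns))                           # 5
```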
Quiz np.split(ary,[2,5],axis=1)
Extracting
• Extracting a condition-based, non-contiguous subset
• The specified condition is applied to each element of the array
• This is possible with the extract() function
• Let us see
• numpy.extract(condition,array)
• First let us see how to put a condition on an array
Modulus or remainder
Here we need the elements that give remainder 1 when the array is divided by 5
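A hedged sketch of `extract()`. The array is made up, and the condition (remainder 1 after division by 5) is an assumption about what the lost slide image showed.

```python
import numpy as np

ary = np.arange(1, 17).reshape(4, 4)   # made-up 4 x 4 array holding 1..16

# The condition is evaluated for every element and gives a boolean array
condition = np.mod(ary, 5) == 1

# extract() returns the matching elements as a 1D array
picked = np.extract(condition, ary)
print(picked)  # [ 1  6 11 16]
```

The result is always one-dimensional, because the selected elements need not form a rectangular block.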
Arithmetic operation on 2D array
The shapes of both arrays must be the same
Application of numpy arrays
• Covariance
• Correlation
• Linear regression
Descriptive statistics with pandas
• These are also called aggregate functions
• min()
• max()
• mode()
• mean()
• median()
• count()
• sum()
• quantile()
• var()
min()
• It is used to find the minimum values in a data frame
• Structure: dataframe.min()
• dataframe.min(axis=None,skipna=None,numeric_only=None)
If axis=0, it finds the minimum value of each column
If axis=1, it finds the minimum value of each row
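A minimal sketch of `min()` and `max()` along both axes. The DataFrame `marksdf` and its values are made up for illustration.

```python
import pandas as pd

marksdf = pd.DataFrame({'Maths':   [99, 95, 95, 99],
                        'Science': [90, 92, 90, 94]})

col_min = marksdf.min(axis=0)   # one result per column
row_min = marksdf.min(axis=1)   # one result per row
row_max = marksdf.max(axis=1)

print(col_min)  # Maths 95, Science 90
print(row_min)  # 90, 92, 90, 94
print(row_max)  # 99, 95, 95, 99
```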
max()
• It is used to find the maximum values in a data frame
• Structure: dataframe.max()
• dataframe.max(axis=None,skipna=None,numeric_only=None)
Quiz
• 1. salesdf.max(axis=0)
• 2. salesdf.max(axis=1)
quiz
mode()
• It is used to find the values that appear most often in a data frame
• Structure: dataframe.mode()
• dataframe.mode(axis=None,numeric_only=None)
• The default axis is 0.
The most often appearing values are 95 and 99.
If you give the command marksdf.mode(), it first finds the most often appearing values in both columns; the remaining cells are then filled with NaN.
It takes all the modes: 95 and 99.
mode() gets the mode(s) of each element along the axis selected. It adds a row for each mode per label and fills in gaps with NaN.
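The NaN filling can be reproduced with a small example. The column with modes 95 and 99 follows the slide; the second column and the column names are made up.

```python
import pandas as pd

marksdf = pd.DataFrame({'Maths':   [99, 95, 95, 99],   # 95 and 99 both appear twice
                        'Science': [90, 92, 90, 94]})  # only 90 appears twice

modes = marksdf.mode()
print(modes)
# Maths has two modes (95 and 99), Science only one (90),
# so the second row of Science is filled with NaN.
```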
mean() and median()
• They return the mean and median of the values for the requested axis.
• Structure: dataframe.mean()
• By default it skips NaN values when computing the result.
• It works the same way as we saw earlier with covariance.

• Let us see
The following values are added and divided by the number of rows:
99+95+95+99=388
and 388/4=97
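The arithmetic above can be checked with pandas. The Maths column uses the values from the slide; the Science column, including its NaN, is made up to show how missing values are skipped.

```python
import numpy as np
import pandas as pd

marksdf = pd.DataFrame({'Maths':   [99, 95, 95, 99],
                        'Science': [90, np.nan, 92, 94]})

maths_mean = marksdf['Maths'].mean()   # (99+95+95+99)/4 = 97.0
col_means = marksdf.mean()             # NaN is skipped: Science = 276/3 = 92.0
strict = marksdf.mean(skipna=False)    # NaN is kept: Science becomes NaN
row_means = marksdf.mean(axis=1)       # one mean per row
medians = marksdf.median()             # middle value of each column

print(maths_mean, col_means['Science'], medians['Maths'])
```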
Quiz
NaN
axis=1
• axis=0 (index): the result is computed for each column
• axis=1 (columns): the result is computed for each row
median()
• It returns the middle number of a sorted set of numbers (the average of the two middle numbers when the count is even)
• Let us see
They are the middle numbers
count()
• It counts the non-NaN values in each row or column
• Let us see
If NaN
Axis=1
sum()
• It returns the sum of the given values
• Let us see the sum based on axis
quantile()
var() function
• It computes the variance and returns the unbiased variance over the requested axis.
• dataframe.var(axis=None,skipna=None)
• Applying a function on a subset of a DataFrame.
• Applying a function on a column of a DataFrame:
• dataframe[column name].function()

• Let us see
Applying function on multiple columns
• dataframe[[column name, column name,….]].function()
• Let us see
Function on row
• dataframe.loc[row index,:].function()
• Let us see
Function on range of rows
With .loc the upper limit is not ignored (it is included) when applying a function on a range of rows in a DataFrame
Function on subset
• dataframe.loc[start row : end row, start column : end column].function()
• Let us see
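The selection patterns above can be sketched on one small DataFrame. All names and values here are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Maths':   [99, 95, 95, 99],
                   'Science': [90, 92, 90, 94],
                   'English': [80, 85, 88, 91]})

maths_var = df['Maths'].var()               # one column: unbiased variance, 16/3
two_means = df[['Maths', 'Science']].mean() # several columns
row_total = df.loc[1, :].sum()              # a single row
range_total = df.loc[1:2, :].sum()          # rows 1 AND 2: upper limit is included
subset_max = df.loc[0:1, 'Maths':'Science'].max()  # rows 0-1, columns Maths..Science

print(row_total)             # 272
print(range_total.tolist())  # [190, 182, 173]
```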
Advanced operations on DataFrame
• Pivoting
• Sorting
• Aggregation
Pivoting
• It rearranges the data from rows into columns, possibly aggregating the data
• It is based on the following points:
• 1. It summarises extensive data
• 2. It rotates data from rows to columns
• It reshapes data (produces a “pivot” table) based on column values.
• Once the data is represented in a DataFrame we can pivot it using the function pivot()
• Let us see
• dataframe.pivot(index=column name, columns=column name, values=column name)
Pivoting based on
Country
Pivoting based on
tutor
• The result of pivoting has index rows as per the index argument and columns as per the values of the columns argument.
• If an entry has no match, NaN is displayed
• Sometimes it will raise an error
• This happens when the index contains duplicate values
• Let us see
The index values are repeated, so it raises an error
• Let us consider one tutor, say km
• And the entries are like

• Now if you try to create a row for the tutor km from the above data, with country as columns, how would you do it?
• india
• km 12 12
Multiple entries for one column of a single row cause an error in pivoting
pivot_table function
• When you are not able to pivot a dataframe
• because several entries have the same row and column values, you can use another function, pivot_table()

• It aggregates the multiple values using an aggregate function such as sum, count, mean, etc.
• Let us see
Now that we have applied an aggregate function, it produces a result without error.
Note: If no aggregate function is specified, it shows the mean (average) of the multiple values having the same row and column combination.
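A hedged sketch of the difference between `pivot()` and `pivot_table()`. The tutor `km` with two entries of 12 for india follows the slide; the second tutor, the column names and the other values are made up.

```python
import pandas as pd

df = pd.DataFrame({
    'tutor':   ['km', 'km', 'rs'],
    'country': ['india', 'india', 'usa'],
    'classes': [12, 12, 20],
})

# pivot() cannot place two values in the same (tutor, country) cell
try:
    df.pivot(index='tutor', columns='country', values='classes')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot() failed:", err)

# pivot_table() aggregates the duplicates; the default aggregate is the mean
mean_table = df.pivot_table(index='tutor', columns='country', values='classes')
sum_table = df.pivot_table(index='tutor', columns='country', values='classes',
                           aggfunc='sum')
print(mean_table)
print(sum_table)
```

Cells with no matching entry, such as (km, usa), stay NaN in both tables.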
We can also aggregate the result using more than one field.
It will show an error because it is a key for the pivot table
Quiz

1. Compute total passengers per year
2. Compute average passengers per month
Sorting
• Arranging values in particular order.
• df.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
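A minimal sketch of `sort_values()`. The DataFrame is made up and contains one NaN to show the effect of `na_position`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name':  ['a', 'b', 'c', 'd'],
                   'marks': [62, 96, np.nan, 55]})

# Ascending order; NaN goes last by default
by_marks = df.sort_values(by='marks')
print(by_marks['name'].tolist())    # ['d', 'a', 'b', 'c']

# Descending order with NaN placed first
descending = df.sort_values(by='marks', ascending=False, na_position='first')
print(descending['name'].tolist())  # ['c', 'b', 'a', 'd']
```

Because `inplace=False` is the default, `df` itself is left unchanged and a sorted copy is returned.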
Aggregation
We have already learned about aggregate functions in the previous session.
count(): counts the number of elements
sum(): finds the sum of all elements
mean(): finds the average or mean of the items
median(): finds the middle value of the elements
min(): finds the minimum value of the elements
max(): finds the maximum value of the elements
var(): finds the variance of the elements
mad() function
• It is used to calculate the mean absolute deviation of the values for the requested axis.
• For a set of data, it is the average distance between each data value and the mean.
• df.mad(axis=None,skipna=None)
• Let us see
mad function

Krishan meena
PGT Comp. Sci
Mean Absolute Deviation
• The Mean Absolute Deviation (MAD) of a set of
data is the average distance between each data
value and the mean.
• The steps to find the MAD include:
1. find the mean (average)
2. find the difference between each data value
and the mean
3. take the absolute value of each difference
4. find the mean (average) of these differences
• In mathematics, the absolute value or
modulus |x| of a real number x is the non-
negative value of x without regard to its sign.
Namely, |x| = x for a positive x, |x| = −x for a
negative x, and |0| = 0. For example, the
absolute value of 3 is 3, and the absolute
value of −3 is also 3.
• Let us understand with a simple example
• Data Set: {92, 83, 88, 94, 91, 85, 89, 90}
Find the Mean Absolute Deviation.
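Step 1: Take the data set: 92, 83, 88, 94, 91, 85, 89, 90
Step 2: Find the sum: 92+83+88+94+91+85+89+90 = 712
Step 3: Find the mean: 712/8 = 89
Step 4: Find the difference between the mean and each data value:
89-92 = -3
89-83 = 6
89-88 = 1
89-94 = -5
89-91 = -2
89-85 = 4
89-89 = 0
89-90 = -1
Finally, take the absolute values: 3, 6, 1, 5, 2, 4, 0, 1
Step 5
• Find the sum and the average of the absolute values; this is the MAD of the data set:
(3+6+1+5+2+4+0+1)/8 = 22/8 = 2.75
It is the mad() of the data set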
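The steps can be reproduced in pandas. This sketch computes the MAD by hand rather than calling `mad()`, because `DataFrame.mad()` and `Series.mad()` were removed in pandas 2.0; on older versions `data.mad()` should give the same number.

```python
import pandas as pd

data = pd.Series([92, 83, 88, 94, 91, 85, 89, 90])

mean = data.mean()                      # 712 / 8 = 89.0
abs_deviation = (data - mean).abs()     # distance of each value from the mean
mad = abs_deviation.mean()              # 22 / 8 = 2.75

print(mean)  # 89.0
print(mad)   # 2.75
```

For a DataFrame `df` (a hypothetical name), the same idea column by column is `(df - df.mean()).abs().mean()`.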
Quiz
Create data frame and find the mad() of the below data frame?
Quiz-2
Create data frame and find the mad() of the below data frame?
Quiz-3
find the mad() based on axis of the below data frame?
std() function
The Standard Deviation is a measure of how spread out numbers are.
Its symbol is σ (the Greek letter sigma)
The formula is easy: it is the square root of the Variance.
Variance
• The Variance is defined as:
• The average of the squared differences from the Mean.
• To calculate the variance follow these steps:
• Work out the Mean (the simple average of the
numbers)
• Then for each number: subtract the Mean and square
the result (the squared difference).
• Then work out the average of those squared
differences.
• Let see with example
Simple example
Step 1: Take the data set (5 values)
Step 2: Find the mean of the data (here the mean is 3)
Step 3: Subtract the mean from each data value (data - mean), e.g. 1-3 = -2
Step 4: Square each difference, e.g. (-2)*(-2) = 4
Step 5: Subtract 1 from the total number of values: 5-1 = 4
Step 6: Divide the sum of the squared differences, which is 16, by the previous result: 16/4 = 4
Step 7: Finally, take the square root of 4: √4 = 2
This is the standard deviation
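The calculation can be checked with pandas. The slide's data set did not survive extraction, so the five values below are an assumption, chosen so that the mean is 3 and the squared differences add up to 16 as in the steps above.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 1, 3, 5, 5])        # assumed data: mean 3, n = 5

squared_diff = (data - data.mean()) ** 2 # 4, 4, 0, 4, 4
variance = squared_diff.sum() / (len(data) - 1)   # 16 / 4 = 4.0
std_manual = np.sqrt(variance)           # 2.0

# pandas divides by n - 1 by default (ddof=1), matching the steps above
std_pandas = data.std()
# dividing by n instead (ddof=0) gives the population standard deviation
std_population = data.std(ddof=0)

print(std_manual, std_pandas, std_population)
```

Note that dividing by n - 1 gives the sample (unbiased) variance, which is slightly larger than the plain average of the squared differences.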
Histogram
• Histogram: a graphical display of data using bars of different heights.
• A histogram is a plot that lets you discover and show the underlying frequency distribution of a set of continuous data.
• The hist( ) function is used to plot a histogram.
• We can use the PyPlot library to create a histogram.
The above data set contains the ages of 20 people. When we create a histogram there will be no gaps between the bars, because a histogram represents a continuous data set and as such there are no gaps in the data.
structure
• DataFrame.hist(column=None,by=None,grid=True,bins=10)
• Let us see the example
example
• Data=7,7,7,8,8,8,8,8,9,10,10,10,11,11,12,12,12,13
• Let us find the frequency

Frequency of each value:
• 7: 3
• 8: 5 (max frequency)
• 9: 1
• 10: 3
• 11: 2
• 12: 3
• 13: 1
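The frequencies behind the histogram can be counted with NumPy. This sketch uses the data from the slide; the bin edges are an arbitrary choice for illustration.

```python
import numpy as np

data = [7, 7, 7, 8, 8, 8, 8, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 13]

# Frequency of every distinct value
values, counts = np.unique(data, return_counts=True)
frequency = dict(zip(values.tolist(), counts.tolist()))
print(frequency)    # {7: 3, 8: 5, 9: 1, 10: 3, 11: 2, 12: 3, 13: 1}

# Frequencies per interval (bin), which is what a histogram draws
bin_counts, bin_edges = np.histogram(data, bins=[7, 9, 11, 13, 15])
print(bin_counts)   # [8 4 5 1]
```

Passing the same `bins` to `plt.hist(data, bins=...)` would draw bars of these heights.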
Function application
• It can be used to solve many data calculation tasks.
• It can be applied on a whole data frame
• It can be applied row-wise or column-wise
• It can be applied on individual elements.
• Three functions:
• 1) pipe( )
• 2) apply( )
• 3) applymap( )
pipe( )
• DataFrame-wise function application:
• It passes the result of one function as the input to the next function
Example
• power(sqrt(n),2)
• It means
• sqrt(n), then power(2)
• In pipe we make a chain of functions in the order in which they are executed
• This is what we want to execute:
• add(div(power(sqrt(df),2),3),100)
• df.pipe(sqrt).pipe(power,2).pipe(div,3).pipe(add,100)
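A runnable sketch of the chain above, assuming the slide's `sqrt`, `power`, `div` and `add` stand for the NumPy functions `np.sqrt`, `np.power`, `np.divide` and `np.add`. The DataFrame is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [4, 9, 16]})

# Nested form: hard to read, innermost call runs first
nested = np.add(np.divide(np.power(np.sqrt(df), 2), 3), 100)

# pipe() form: each step receives the previous result as its first argument
piped = (df.pipe(np.sqrt)
           .pipe(np.power, 2)
           .pipe(np.divide, 3)
           .pipe(np.add, 100))

print(piped)
```

Both forms give the same DataFrame; `pipe()` simply lists the functions in the order they are executed.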
Quiz
• Using the same data, apply the pipe function with add(),
• followed by multiply(), followed by sqrt() and floor() on the data
• Add 5, divide by 2, then subtract 2
The apply() function
• apply() and applymap() are discussed together because that makes them easier to understand
• apply() is a series function
• It is applied on one row or one column of the data frame at a time
• applymap() is an element function: it is applied on each individual element separately.
By default the axis for apply() is 0

apply() is applied on each column, considering each column as a series
For qtr 1: (12+14+16)/3 = 42/3 = 14
applymap() calculates the result on each individual value without considering any other elements.
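A sketch contrasting the two functions. The first column follows the slide's qtr 1 values; the column names and the doubling function are made up. Recent pandas versions (2.1 and later) rename `applymap()` to `DataFrame.map()`, so the sketch picks whichever is available.

```python
import pandas as pd

df = pd.DataFrame({'qtr1': [12, 14, 16],
                   'qtr2': [14, 16, 18],
                   'qtr3': [11, 22, 33]})

# apply(): the function receives a whole column (axis=0) or row (axis=1)
col_means = df.apply(lambda col: col.mean())          # qtr1: 42/3 = 14.0
row_totals = df.apply(lambda row: row.sum(), axis=1)

# Element-wise application: the function receives one value at a time
elementwise = df.map if hasattr(df, "map") else df.applymap
doubled = elementwise(lambda value: value * 2)

print(col_means.tolist())   # [14.0, 16.0, 22.0]
print(row_totals.tolist())  # [37, 52, 67]
```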
numpy.cumsum()
• It can be combined with the apply() function
• Let see
12 14 11
12+14=26 14+16=30 11+22=33
26+16=42 30+18=48 33+33=66

apply() is applied on columns, but applymap() is applied on each element
• Note:
• For apply(), the function should be an array or series function that works on a series-type object.
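The running totals shown above can be reproduced as follows. The values come from the slide; the column names are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'qtr1': [12, 14, 16],
                   'qtr2': [14, 16, 18],
                   'qtr3': [11, 22, 33]})

# np.cumsum receives one column at a time and returns its running total
running = df.apply(np.cumsum)
print(running)
#    qtr1  qtr2  qtr3
# 0    12    14    11
# 1    26    30    33
# 2    42    48    66
```

Here `df.cumsum()` would give the same table; `apply()` is used only to show how a series function is passed in.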
groupby() function
• Within a data frame we can group data based on a field's values.
groupby() puts the data into groups by name. It does not display any result by itself.

a will be one group
b will be another group
c will be another group
groups shows the data in groups by name, along with the index (which rows fall in each group) and the data type.

get_group('group name') shows all the data falling in that group.
• The groupby command can also be used with an axis; by default it is 0.
• groupbyobject.count(): this function is used to count the non-NaN values in each group.
Without NaN

With NaN

It will not count NaN


groupbyobject.size()
• It returns the number of rows in each group (NaN values are included).
Grouping on multiple columns
• df.groupby(['col1','col2']) groups on more than one column. Let us see
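A hedged sketch of the groupby features described above. The group names a, b and c follow the slides; the other columns and values are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name':    ['a', 'b', 'a', 'c', 'b'],
                   'section': ['x', 'x', 'y', 'x', 'x'],
                   'marks':   [10, 20, np.nan, 40, 50]})

grouped = df.groupby('name')          # nothing is displayed at this point

print(grouped.groups)                 # row labels in each group: a -> 0, 2 ...
group_a = grouped.get_group('a')      # all rows whose name is 'a'

counts = grouped.count()              # non-NaN values per column in each group
sizes = grouped.size()                # rows per group, NaN included

# Grouping on two columns gives one group per (name, section) pair
pair_sizes = df.groupby(['name', 'section']).size()
print(pair_sizes)
```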
Aggregate with groupby
Quiz
transform ()
Transform function
• transform() calculates the same values as the agg function, but it repeats the result for every row of the group, so the output has as many rows as the original data.
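The difference between the two can be seen side by side. The data are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name':  ['a', 'b', 'a', 'c', 'b'],
                   'marks': [10, 20, 30, 40, 50]})

# agg(): one result per group
group_mean = df.groupby('name')['marks'].agg('mean')
print(group_mean.tolist())   # [20.0, 35.0, 40.0]

# transform(): the group result is repeated for every row of that group
row_mean = df.groupby('name')['marks'].transform('mean')
print(row_mean.tolist())     # [20.0, 35.0, 20.0, 40.0, 35.0]
```

Because the transformed result lines up with the original rows, it can be stored as a new column, for example `df['group_mean'] = row_mean`.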
Quiz
Reindexing and altering labels
• When you create a data frame it gets row numbers and column labels automatically.
• You may want to change the row index and column index to your own labels or order:
• 1) rename(): renames the index and column labels
• 2) reindex(): specifies a new order of index and columns
• 3) reindex_like(): creates the index and columns based on another data frame.
rename()
Old column name is mapped to new column name

reindex()
reindex_like()
• The rows and columns should match those of the other data frame; labels with no match are filled with NaN
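A minimal sketch of the three functions. All labels and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'m1': [55, 59, 62], 'm2': [96, 69, 70]},
                  index=['r1', 'r2', 'r3'])

# rename(): old label -> new label, for columns and/or index
renamed = df.rename(columns={'m1': 'Maths', 'm2': 'Science'},
                    index={'r1': 'first'})

# reindex(): put existing labels in a new order
reordered = df.reindex(['r3', 'r1', 'r2'])

# A label that does not exist ('r4') produces a row of NaN
with_missing = df.reindex(['r1', 'r4'])

# reindex_like(): copy the row and column labels of another DataFrame
other = pd.DataFrame(0, index=['r2', 'r3'], columns=['m2', 'm1'])
like_other = df.reindex_like(other)
print(like_other)
```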
Data visualization
• Representing data in a graphical way using charts, graphs, maps, etc.
• We will learn how to plot data using PyPlot, part of the matplotlib library, which allows us to plot 2D figures.
• We have to install Anaconda.
• We have to use Anaconda Navigator.
• Let us experience it
Must import matplotlib.pyplot
Vertical axis

Horizontal axis
• Let us learn how to label the axes.
• xlabel("some values")
• ylabel("some values")
• plot(x,y): all these commands must be executed together in one cell, without pressing Shift+Enter in between.

• Let us see
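A hedged sketch of a labelled line chart. The data and label texts are made up; the `Agg` backend line is an assumption so that the sketch also runs without a display, and it can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: no display available; remove in a notebook
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]        # made-up sample data
y = [1, 4, 9, 16]

plt.figure()
# 'r' = red, dashed line of width 2, circular markers of size 8
line, = plt.plot(x, y, color='r', linestyle='--', linewidth=2,
                 marker='o', markersize=8)
plt.xlabel("some values")
plt.ylabel("squares of the values")
plt.title("A simple line chart")
```

In an interactive session, calling `plt.show()` after these commands displays the chart.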
Quiz
write code for the following line chart
Applying various settings in plot() function
• Color
• Marker size
• Marker type
color
• r for red
• g for green
• b for blue
• m for magenta
• y for yellow
• k for black
• c for cyan
• w for white
Changing line type, size and color
Quiz
Quiz
Scatter chart
• It can be created in two ways
• 1)plot() function
• 2) scatter() function
• Let see
This is a scatter chart
Using .plot() function
Using scatter
Size and marker

For size
Size and marker

For color we can use c
Bar charts
• Graphical display of data using bars.
• Using the bar(x,y) function
• Let us see
Width
• A bar chart uses a default width
• The width can also be user-defined
• It can be applied to all bars together
• It can also be different for different bars
• plt.bar(x,y,width=1/2)
• plt.bar(x,y,width=1/4)
• Let us see
For individual bars
Bar color
Different color for different bars
Creating multiple bar charts
It generates 0, 1, 2
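A sketch of a multiple bar chart, assuming the positions 0, 1, 2 come from `np.arange(3)`. The three data sets, colors and labels are made up; the `Agg` backend line can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: no display available; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np

values = [[5, 25, 45], [10, 30, 50], [15, 35, 55]]  # made-up data sets
x = np.arange(3)        # bar positions 0, 1, 2
width = 0.25

plt.figure()
# Shift each data set by one bar width so the bars sit side by side
plt.bar(x, values[0], width=width, color='r', label='set 1')
plt.bar(x + width, values[1], width=width, color='g', label='set 2')
plt.bar(x + 2 * width, values[2], width=width, color='b', label='set 3')
plt.legend()
```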
quiz
• A list has 3 lists inside it, each containing summarised data. Create a bar chart, keeping the width of each bar as 0.25.
Horizontal Bar

The barh() function is used to create a horizontal bar chart
Anatomy of a chart
• Title: text that appears at the top of the chart.
• Legends: the different colors that identify the different sets of data plotted on the chart.
Adding title
Setting xlimit and ylimit
• The limits are set automatically
• User-defined limits are also possible
• They can be defined using xlim() and ylim()
• Let us see
While setting the limits for the axes, keep in mind that only the data that falls within the limits will be plotted; the rest will not show in the plot.
Beyond the limit
Setting ticks
Adding legends
• When we plot multiple ranges on a single plot, legends are needed to tell them apart.

Labels not
defined
Saving figure

plt.savefig(r'F:\kk\data.pdf')

Figures can also be saved as .pdf and .jpg


Histogram using pyplot
• Using the same library
• import matplotlib.pyplot as plt
• plt.hist(x,bins=None,cumulative=False,histtype='bar',align='mid',orientation='vertical')
Frequency polygon
• The number of observations is marked with a single point at the midpoint of each interval, and straight lines then connect the points.
• 1) plot the histogram
• 2) mark a single point at the midpoint of each interval
• 3) draw straight lines to connect the points
• Remember there is no function that does this directly; it is a manual process.
• Let us see
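The three steps can be sketched as follows. The data set is reused from the histogram example; the bin edges are an arbitrary choice, and the `Agg` backend line can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: no display available; remove in a notebook
import matplotlib.pyplot as plt

data = [7, 7, 7, 8, 8, 8, 8, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 13]

plt.figure()
# 1) plot the histogram; hist() returns the counts and the bin edges
counts, edges, _ = plt.hist(data, bins=[7, 9, 11, 13, 15], histtype='bar')

# 2) the midpoint of each interval is the average of its two edges
midpoints = (edges[:-1] + edges[1:]) / 2

# 3) connect the (midpoint, count) points with straight lines
plt.plot(midpoints, counts, marker='o', color='k')
```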
Box plot

Arranging data in ascending order


Finding median value for notch
Min value for start point
Max value for end point
Ascending order of data y axis
Steps to draw box plot
• The minimum range
• The max range
• The upper quartile
• The lower quartile
• The median
• Let see
data=[2,3,15,6,4,12,1,11]
Data after sorting: [1,2,3,4,6,11,12,15]
Min value: 1
Lower quartile: (2+3)/2=2.5
Notch (median): (4+6)/2=5
Upper quartile: (11+12)/2=11.5
Max value: 15
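The five values can be computed with the median-of-halves method used above. This is a sketch of that hand method only; plotting functions such as `plt.boxplot()` interpolate the quartiles and may show slightly different values.

```python
from statistics import median

data = [2, 3, 15, 6, 4, 12, 1, 11]

ordered = sorted(data)               # [1, 2, 3, 4, 6, 11, 12, 15]
half = len(ordered) // 2
lower_half = ordered[:half]          # values below the median
upper_half = ordered[-half:]         # values above the median

minimum = ordered[0]
lower_quartile = median(lower_half)  # (2 + 3) / 2
notch = median(ordered)              # (4 + 6) / 2
upper_quartile = median(upper_half)  # (11 + 12) / 2
maximum = ordered[-1]

print(minimum, lower_quartile, notch, upper_quartile, maximum)
# 1 2.5 5.0 11.5 15
```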
example 2
Quiz
• ary=[5,20,30,45,60,88,100,140,150,200,240]
• Draw a box plot.
Software
• A set of instructions
• It can be a very simple program
• It can also be a complex program
• Developing large, complex software systems requires an engineering approach.
What is software engineering?
• The structured approach to designing, building and maintaining large software is called software engineering.
• Need for software engineering:
• Correct specifications: they must be captured in a scientific manner
• Scalability: the software must be updatable or editable
• Cost control: it must stay within budget, and maintenance after delivery must be cheap.
• Quality: it must be useful for the user, based on correct specifications.
What is software process?
• A set of logically related activities for delivering software on time.
• 1. Software specification: as per the customer's requirements and expectations.
• 2. Software design and implementation: as per the specification.
• 3. Software verification and validation: checking that it works as per the design and all the specifications.
• 4. Software evolution: it includes scalability after delivery.
Software process model
• 1. the waterfall model
• 2. evolutionary model
• 3. component based model
Waterfall model
• In a waterfall model, each phase must be
completed before the next phase can begin
and there is no overlapping in the phases.
• Waterfall approach was first SDLC Model to be
used widely in Software Engineering to ensure
success of the project.
The sequential phases in Waterfall
model are −
• Requirement Gathering and analysis − All possible requirements of the system to
be developed are captured in this phase and documented in a requirement specification
document.
• System Design − The requirement specifications from first phase are studied in this
phase and the system design is prepared. This system design helps in specifying hardware
and system requirements and helps in defining the overall system architecture.
• Implementation − With inputs from the system design, the system is first developed in
small programs called units, which are integrated in the next phase. Each unit is developed
and tested for its functionality, which is referred to as Unit Testing.
• Integration and Testing − All the units developed in the implementation phase are
integrated into a system after testing of each unit. Post integration the entire system is tested
for any faults and failures.
• Deployment of system − Once the functional and non-functional testing is done; the
product is deployed in the customer environment or released into the market.
• Maintenance − There are some issues which come up in the client environment. To fix
those issues, patches are released. Also to enhance the product some better versions are
released. Maintenance is done to deliver these changes in the customer environment.
Waterfall Model - Advantages
• Simple and easy to understand and use
• Easy to manage due to the rigidity of the model. Each
phase has specific deliverables and a review process.
• Phases are processed and completed one at a time.
• Works well for smaller projects where requirements
are very well understood.
• Clearly defined stages.
• Well understood milestones.
• Easy to arrange tasks.
• Process and results are well documented.
Disadvantage
• No working software is produced until late in the life cycle.
• It is difficult to measure progress within stages.
• It is tough to estimate time and cost.
V model
• The V-model is an SDLC model where
execution of processes happens in a
sequential manner in a V-shape. It is also
known as Verification and Validation model.
• The V-Model is an extension of the waterfall
model
• The following pointers are some of the most suitable
scenarios to use the V-Model application.
• Requirements are well defined, clearly documented
and fixed.
• Product definition is stable.
• Technology is not dynamic and is well understood by
the project team.
• There are no ambiguous or undefined requirements.
• The project is short.
V-Model - Pros and Cons
• Advantages:
• This is a highly disciplined model, and phases are completed one at a time.
• Works well for smaller projects where requirements are very well understood.
• Simple and easy to understand and use.
• Easy to manage due to the rigidity of the model. Each phase has specific deliverables and a review process.
• Disadvantages:
• High risk and uncertainty.
• Not a good model for complex and object-oriented projects.
• Poor model for long and ongoing projects.
• No working software is produced until late in the life cycle.
Evolutionary model
• It is also called the prototype model
• An initial version of the software is developed and given to the user to work with and to give feedback for changes.
• This process is repeated until full-fledged software is developed.
• It is suitable for projects with high technical risk and aggressive timelines.
Advantage and disadvantage
• Advantages:
• It reduces the risk of failure
• User feedback is available at an early stage for better solutions
• Disadvantages:
• Too many suggested changes disturb the development process
• It increases the complexity of development
Types of evolutionary models
• 1. Exploratory programming: explore the customer's requirements and deliver a final system. New features are added to the software based on users' feedback.
• 2. Throwaway prototype: a trial development for user experiments.
Component based model
• It is used after the basic software specification for the new system has been determined.
• It uses the following phases or steps:
• 1) Component analysis: existing software is searched for components that match the specification of the new software.
• 2) Requirements modification: the software requirements are re-analysed and modified.
• 3) System design with reuse: reusable components from existing software are used for the new system.
• 4) Development and integration: in this phase fresh software components are developed where no reusable component exists, and everything is integrated.
Existing software
Advantage and disadvantage

• Advantages:
• It reduces the amount of software development
• Faster delivery of the system
• Disadvantages:
• It may lead to a system that does not meet the needs of users
• Requirements may be compromised


Delivery model
• Once software is delivered, that does not mean it is the final delivery. Changes do occur and are unavoidable.
• Like growth of users
• Policies may change
• Technological changes
• Etc.
• A delivery model iteratively updates the system for changes.
• Two such delivery models:
• 1) Incremental delivery model: it is a development and delivery model. It combines the strengths of the waterfall model and the evolutionary model.
• 2) Spiral development model: it has many cycles. It combines the advantages of the waterfall model and the prototype model. It is suitable for large and complex software.
Agile methods of software engineering
• Agile: quick and well-coordinated movement
• Agile methods emerged in the 1990s; their values were set out in 2001 in a document called the Agile Manifesto.
• 1) Individuals and interactions: self-organization and motivation are important for individuals, and interactions happen with programmers working side by side.
• 2) Working software: compared to documents, working software is more attractive to clients.
• 3) Customer collaboration: requirements cannot be fully collected at the beginning of the software development life cycle (SDLC), so continuous customer involvement is very important.
• 4) Responding to change: it focuses on quick responses to changes.
Pair programming
• An agile practice where two programmers work side by side on one problem at the same computer, discussing the same design, coding and testing.
• The one at the keyboard is called the driver and the other partner is called the navigator.
• Advantages:
• Better code
• Collective code ownership
• Increased discipline
• Team bonding
• Disadvantage:
• Disagreement may occur
• Absence of partners
• Rushing
Scrum
