
INTRODUCTORY DATA SCIENCE AND MACHINE LEARNING


Why Should You Know About Data Science?
Applications of Data Science
• Gaming, Recommendation Systems, Fraud Detection
• Self-Driving Cars
• Virtual Assistants
• YouTube Algorithms
• Cancer Research
Roadmap
■ Learn about Data Science, Machine Learning and Python Libraries
■ Learn a few Machine Learning algorithms
■ Let's do something great together
■ Discussion
What is Data Science?

■ Combines the fields of Statistics and Computer Science
■ Common subdomains and applications of Data Science include ML/AI, Data Mining, Data Warehousing and Big Data.
General tasks in Data Science

■ Data Collection
■ Preprocessing
■ Exploratory Analysis and Visualization
■ Stats/Machine Learning Models
■ Prediction, Correction and Refining
Data Science Tools
Introduction to Machine Learning
Machine Learning Tasks
Supervised Learning

Classification Algorithms
■ K Nearest Neighbors
■ Support Vector Machines
■ Logistic Regression
■ Decision Trees

Regression Algorithms
■ Linear Regression
■ Polynomial Regression
■ Ridge Regression

Many More…
Unsupervised Learning

■ Clustering:
■ K-Means Clustering (works best on roughly spherical clusters)
■ Hierarchical Clustering
■ DBSCAN (density-based clustering)
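A short scikit-learn sketch contrasting two of the clustering algorithms listed above on hypothetical 2-D data made of two well-separated blobs. K-Means needs the number of clusters up front, while DBSCAN finds dense regions and marks sparse points as noise (label -1). The parameter values (`n_clusters=2`, `eps=0.8`, `min_samples=5`) are assumptions tuned to this toy data, not general defaults:

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN

# Hypothetical toy data: two well-separated, roughly spherical blobs in 2-D
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# K-Means: the number of clusters must be chosen in advance
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(X)

# DBSCAN: no cluster count; clusters are dense regions, -1 marks noise points
dbscan = DBSCAN(eps=0.8, min_samples=5)
dbscan_labels = dbscan.fit_predict(X)

print("K-Means centers:\n", kmeans.cluster_centers_.round(2))
print("DBSCAN clusters found:", len(set(dbscan_labels) - {-1}))
```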
K Nearest Neighbors (KNN)
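The idea behind KNN: to classify a new point, find the k training points closest to it and take a majority vote of their labels. A minimal from-scratch sketch, assuming Euclidean distance and a plain majority vote (one common choice; scikit-learn's `KNeighborsClassifier` offers other options). The function `knn_predict` and the toy points are hypothetical:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Hypothetical helper: classify one point by majority vote of its k nearest neighbors."""
    # Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two labelled groups in 2-D
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0])))  # → A
print(knn_predict(X_train, y_train, np.array([4.8, 5.2])))  # → B
```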
Linear Regression
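Linear Regression fits a straight line y = slope·x + intercept by minimising the squared error between predicted and observed values. A minimal NumPy sketch using ordinary least squares (`np.linalg.lstsq`), on hypothetical noise-free data generated from y = 2x + 1 so the recovered coefficients are easy to check:

```python
import numpy as np

# Hypothetical data generated from y = 2x + 1 (no noise, so the fit is exact)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix: a column of x values plus a column of ones for the intercept
X = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: minimise ||X @ [slope, intercept] - y||^2
(slope, intercept), *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # → slope=2.00, intercept=1.00
```

With real, noisy data the same call returns the best-fitting line rather than an exact one.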
Decision Trees
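A decision tree learns a sequence of if/else questions on feature values. A sketch using scikit-learn's `DecisionTreeClassifier` on the Iris dataset, printing the learned rules with `export_text`; `max_depth=2` is an arbitrary choice to keep the printed tree short:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# max_depth=2 is an illustrative choice that keeps the tree small enough to read
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Print the learned if/else splits as indented text
print(export_text(tree, feature_names=list(iris.feature_names)))
```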
What is Deep Learning?

■ Deep Learning makes use of Deep Neural Networks
■ A Neural Network generally contains 3 types of layers: Input Layer, Hidden Layer and Output Layer.
■ A Deep Neural Network contains many hidden layers
■ Nodes are connected by weights, and each node has a bias. A node is activated if its activation function produces a value greater than the threshold value
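To make layers, weights, biases and the threshold concrete, here is a tiny forward pass in NumPy with a step activation: a node outputs 1 when its weighted input plus bias exceeds 0. The 2-2-1 network and its hand-picked weights (which compute XOR) are illustrative assumptions; real networks learn their weights during training and usually use smooth activations such as ReLU or sigmoid rather than a hard threshold:

```python
import numpy as np

def step(z, threshold=0.0):
    # A node "fires" (outputs 1) only when its input exceeds the threshold
    return (z > threshold).astype(int)

# Hand-picked (not learned) weights and biases for a 2-2-1 network computing XOR
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])   # input layer -> hidden layer
b_hidden = np.array([-0.5, -1.5])   # hidden node 1 behaves like OR, node 2 like AND
W_out = np.array([1.0, -2.0])       # hidden layer -> output node
b_out = -0.5

def forward(x):
    hidden = step(x @ W_hidden + b_hidden)  # hidden layer activations
    return step(hidden @ W_out + b_out)     # output layer activation

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(forward(inputs))  # → [0 1 1 0]
```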
PYTHON DATA SCIENCE LIBRARIES
Libraries

■ NumPy – Fast array operations
■ Pandas – Processing tabular, spreadsheet-like data
■ Matplotlib and Seaborn – Data Visualization
■ Sklearn (Scikit-Learn) – Machine Learning Algorithms
Iris Dataset

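■ The "Hello World" of classification problems
■ The task is to classify iris flowers into one of three species (setosa, versicolor, virginica) based on four measurements: sepal length, sepal width, petal length and petal width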
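A sketch of loading the Iris dataset from the copy bundled with scikit-learn (`sklearn.datasets.load_iris`), which avoids downloading a file:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)             # → (150, 4)
print(iris.feature_names)          # the four measurement columns, in cm
print(iris.target_names.tolist())  # → ['setosa', 'versicolor', 'virginica']
print(np.bincount(iris.target))    # → [50 50 50]
```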
NumPy

■ Array computation library
■ A = np.array([1, 2, 3])
■ A.mean(), A.sum(), A.max(), A.min()
■ A.size, A.shape, A.ndim
■ Operations on NumPy arrays are applied element-wise, unlike on Python lists
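A quick sketch of the calls above on a small example array, including the element-wise behaviour that distinguishes NumPy arrays from Python lists:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
print(a.mean(), a.sum(), a.max(), a.min())  # → 2.5 10 4 1
print(a.size, a.shape, a.ndim)              # → 4 (4,) 1

# Arithmetic on arrays is element-wise...
print(a * 2)    # → [2 4 6 8]
print(a + a)    # → [2 4 6 8]

# ...whereas on plain lists the same operators repeat or concatenate
lst = [1, 2, 3, 4]
print(lst * 2)    # → [1, 2, 3, 4, 1, 2, 3, 4]
print(lst + lst)  # → [1, 2, 3, 4, 1, 2, 3, 4]
```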
Pandas

■ Data Processing Library
■ df = pd.DataFrame(dictionary or NumPy array)
■ df.shape, df.ndim, df.size, df.describe()
■ pd.read_csv()
■ df.info()
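A sketch of these Pandas calls on a hypothetical three-row table. `pd.read_csv()` normally takes a file path or URL; here an in-memory string wrapped in `io.StringIO` stands in for a CSV file so the example runs on its own:

```python
import io
import pandas as pd

# Hypothetical rows: build a DataFrame from a dictionary of columns
df = pd.DataFrame({"sepal_length": [5.1, 4.9, 6.3],
                   "species": ["setosa", "setosa", "virginica"]})
print(df.shape, df.ndim, df.size)  # → (3, 2) 2 6
print(df.describe())               # summary statistics of the numeric columns
df.info()                          # prints column dtypes and non-null counts

# read_csv also accepts file-like objects; this string stands in for a CSV file
csv_text = "sepal_length,species\n5.1,setosa\n6.3,virginica\n"
df2 = pd.read_csv(io.StringIO(csv_text))
print(df2.head())
```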
matplotlib

■ Data Visualization Library
■ pyplot.plot(x, y, 'fmt') – fmt combines a color code and a marker/line style, e.g. 'ro' for red circles
■ pyplot.xlabel()
■ pyplot.ylabel()
■ pyplot.title()
■ pyplot.legend()
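A minimal line plot using the pyplot calls listed above, assuming `matplotlib.pyplot` is imported as `plt` (the usual convention). The non-interactive `Agg` backend and the `savefig` call are choices made so the script runs without a display; `plot.png` is a hypothetical output filename:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Format string "ro-": r = red, o = circle markers, - = solid line
plt.plot(x, y, "ro-", label="y = x^2")
plt.xlabel("x")
plt.ylabel("y")
plt.title("A simple line plot")
plt.legend()
plt.savefig("plot.png")  # in an interactive session, plt.show() displays it instead
```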
Seaborn

sns.lmplot(x="col1", y="col2", data=dataset, hue="target", fit_reg=True)
sns.countplot(x="col", data=dataset)
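The two Seaborn calls applied to the Iris data, using the copy bundled with scikit-learn so no download is needed. The column names are those produced by `load_iris(as_frame=True)`; the added `species` column is an assumption for readable labels:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Load Iris as a DataFrame and add a readable species-name column (our own addition)
iris = load_iris(as_frame=True)
dataset = iris.frame
dataset["species"] = iris.target_names[iris.target]

# Scatter plot of two columns, coloured by species, with a fitted line per species
grid = sns.lmplot(x="petal length (cm)", y="petal width (cm)",
                  data=dataset, hue="species", fit_reg=True)

# Bar chart counting the rows in each species (drawn on a fresh figure)
plt.figure()
ax = sns.countplot(x="species", data=dataset)
```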
sklearn

■ model.fit(X, y)
■ model.predict(X)
■ accuracy_score(true_labels, predictions) * 100
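The fit / predict / accuracy_score workflow end to end, using `KNeighborsClassifier` on Iris. The 70/30 train/test split, `random_state=42` and `n_neighbors=5` are arbitrary illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows to measure accuracy on data the model has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)          # learn from the training split
predictions = model.predict(X_test)  # predict labels for the test split

# accuracy_score(true_labels, predicted_labels) returns a fraction; * 100 gives a percentage
print(f"Accuracy: {accuracy_score(y_test, predictions) * 100:.1f}%")
```

Swapping `KNeighborsClassifier` for another estimator such as `DecisionTreeClassifier` leaves the rest of the code unchanged, since scikit-learn estimators share the same `fit`/`predict` interface.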
LET'S IMPLEMENT THESE CONCEPTS PRACTICALLY
Where to get data from?

■ Kaggle.com – Data Science Community (14K Plus)
■ UCI ML Repository (ML-friendly data)
■ Data.world (dataset aggregator)
■ Datasets subreddit
■ https://registry.opendata.aws
■ https://github.com/awesomedata/awesome-public-datasets
■ Making or collecting your own – Web Scraping, Web Crawling, Surveys using Google Forms etc.
Where to learn more? (Good Resources)

■ Books – Machine Learning for Absolute Beginners, Hands-On Machine Learning with Scikit-Learn and TensorFlow, Python Data Science Handbook and more.
■ Websites – kaggle.com, towardsdatascience.com, medium.com, analyticsvidhya.com, kdnuggets.com, machinelearningmastery.com
■ Courses – edx.org, coursera.org, cognitiveclass.ai, datacamp.com (3 free courses and cheat sheets)
■ YouTube – Deep Learning TV, Google's series on ML, Siraj Raval, CS Dojo
