Why Python?
Python is quick to develop in compared with programming languages such as Java and C++, although its runtime speed is usually lower.
Unlike Java or C++, a Python program needs no separate compile step before
running.
The Python interpreter compiles the source to bytecode in the background.
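The background compilation step can be made visible with the standard library. The snippet below is only an illustrative sketch: the two-line program in `source` is made up for the example, and the exact bytecode printed by `dis` varies between Python versions.

```python
import dis

# A tiny made-up program, used only for illustration.
source = "total = 1 + 2\nprint(total)"

# compile() turns source text into a code object holding bytecode;
# the interpreter does the same thing automatically when a script is run.
code_obj = compile(source, "<example>", "exec")

# exec() runs the compiled bytecode in a separate namespace.
namespace = {}
exec(code_obj, namespace)

# dis.dis() lists the bytecode instructions (output differs by version).
dis.dis(code_obj)
```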
Python Libraries for Machine Learning
Some basic Python libraries commonly used in machine learning are:
• pandas
• NumPy
• Matplotlib
• scikit-learn (sklearn)
• NLTK
• OpenCV
What is Machine Learning?
Machine learning is a subset of artificial intelligence in the field of
computer science that often uses statistical techniques to give computers
the ability to "learn" (i.e., progressively improve performance on a specific
task) with data, without being explicitly programmed.
Classification in Supervised Machine Learning
PROJECT
Connecting Hearts
What is Connecting Hearts?
The project analyses data on the basic dating habits of individuals.
The project is data-centric, i.e. all of the analysis,
results and conclusions are based on the provided data.
The responses of everyone involved in the competition help reveal
what people have in mind when searching for a mate
and what the odds are of finding love as desired.
The project was created in 3 phases:
o Data pre-processing
o Analysis
o Visualisation
Dataset
The project "CONNECTING HEARTS" is the product of an analysis of a
dataset compiled by Columbia Business School professors Ray Fisman and
Sheena Iyengar for their paper Gender Differences in Mate Selection:
Evidence from a Speed Dating Experiment.
The dataset gives an overview of what dating is all about.
It was collected through a very large-scale survey in which all the participants
were asked to rate six attributes: attractiveness, sincerity, intelligence,
fun, ambition and shared interests.
VISUALISATION:
Here we analysed how often participants go out.
Some people go out about once a month, some go out several times a week, and some almost never go out.
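A frequency count of this kind can be sketched with the standard library alone. The answers in `go_out_answers` are hypothetical placeholders, not values from the dataset, and the helper name `frequency_table` is introduced here only for illustration.

```python
from collections import Counter

# Hypothetical survey answers; the real dataset's coding may differ.
go_out_answers = [
    "several times a week", "once a month", "several times a week",
    "almost never", "once a month", "several times a week",
]

def frequency_table(answers):
    """Return (answer, count, share) tuples, most common first."""
    counts = Counter(answers)
    total = len(answers)
    return [(answer, n, n / total) for answer, n in counts.most_common()]

for answer, n, share in frequency_table(go_out_answers):
    # A text bar stands in for the chart shown on the slide.
    print(f"{answer:<22} {'#' * n} {n} ({share:.0%})")
```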
IN WHICH ACTIVITIES THEY ARE INTERESTED
Here we analysed which activities our participants are most interested in. Every participant was asked to rate
a set of listed activities out of 10. The ratings are presented as a chart.
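One simple way to compare activities is to average each activity's rating across participants and rank the averages. The sketch below assumes made-up activity names and ratings; the real dataset's columns and values are not given here.

```python
from statistics import mean

# Hypothetical ratings out of 10; activity names are illustrative only.
ratings = [
    {"movies": 8, "sports": 4, "reading": 7},
    {"movies": 9, "sports": 6, "reading": 5},
    {"movies": 7, "sports": 5, "reading": 6},
]

def mean_rating_by_activity(rows):
    """Average each activity's rating across all participants."""
    activities = rows[0].keys()
    return {a: mean(row[a] for row in rows) for a in activities}

averages = mean_rating_by_activity(ratings)

# Rank activities from highest to lowest average rating.
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
for activity, score in ranked:
    print(f"{activity}: {score:.1f}")
```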
Conclusion
In the dataset we found that the typical length of a spam message is 145 to 160
characters, although exceptions were found.
The typical word count of an SMS was found to be 22 to 34 words, although
exceptions were found.
Some common keywords such as Free, Off, Call, Now, Win and many more were
found after applying natural language processing.
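The three signals mentioned above (character length, word count and keywords) can be extracted from a message as follows. This is a minimal sketch, not the project's actual code: the keyword set contains only the five words named above, the tokenising regular expression is an assumption, and the sample message is invented.

```python
import re

# Keywords named in the conclusion; the full list is not given in the source.
KEYWORDS = {"free", "off", "call", "now", "win"}

def message_features(text):
    """Return character length, word count and matched keywords for one SMS."""
    # Assumed tokenisation: lowercase runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {
        "length": len(text),
        "word_count": len(words),
        "keywords": sorted(KEYWORDS.intersection(words)),
    }

sample = "WIN a FREE prize! Call now"  # hypothetical message
print(message_features(sample))
```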
[Figures: Confusion Matrix (By Length); Confusion Matrix (By Word Count); Confusion Matrix (NLTK)]
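For readers unfamiliar with confusion matrices, the sketch below shows how one is counted for a two-class spam filter and how accuracy follows from it. The labels are hypothetical and are not results from the project; the label name "ham" for non-spam messages is an assumption.

```python
def confusion_matrix(actual, predicted, positive="spam"):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts

def accuracy(counts):
    """Share of messages whose predicted label matches the actual label."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())

# Hypothetical labels, not results from the project.
actual    = ["spam", "ham", "ham", "spam", "ham", "spam"]
predicted = ["spam", "ham", "spam", "spam", "ham", "ham"]

cm = confusion_matrix(actual, predicted)
print(cm, accuracy(cm))
```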
Future Scope
In future we will try to connect it with the inboxes of mobile phones and other
messengers.
We will also try to improve the accuracy, which is 89.2% so far.