Expected Outcome
Upon completion of the course, the students will be able to:
Build a sample search engine using available open source tools
Describe the browser security model in web security
Identify the different components of a web page that can be used for mining
Apply machine learning concepts to web content mining
Implement the PageRank algorithm and modify the algorithm for mining information
Design a system to harvest information available on the web to build recommender systems
Analyse social media data using appropriate data/web mining techniques
Modify an existing search engine to make it personalized
7 QUERY PROCESSING: Relevance Feedback and Query Expansion - Automatic Local and Global Analysis - Measuring Effectiveness and Efficiency 3 11
8 Recent Trends 2
7. Implement effective compression schemes that store the data in less
storage space.
A search engine scans every document in the corpus during indexing.
The indexed documents should be compressed effectively.
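One way to approach this task is to compress the postings lists of the index rather than the raw documents. The sketch below is a minimal illustration, assuming sorted integer document IDs; it stores the gaps between neighbouring IDs and encodes each gap with variable-byte encoding. The function names are illustrative and are not prescribed by the course.

```python
def to_gaps(doc_ids):
    """Convert a sorted list of document IDs into gaps between neighbours."""
    gaps = []
    previous = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_gaps(gaps):
    """Rebuild the original document IDs from a list of gaps."""
    doc_ids = []
    total = 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def vb_encode(numbers):
    """Variable-byte encode non-negative integers.

    Each number is split into 7-bit groups, most significant group first.
    The high bit is set only on the final byte of each number.
    """
    out = bytearray()
    for n in numbers:
        chunk = [n % 128]
        n //= 128
        while n > 0:
            chunk.append(n % 128)
            n //= 128
        chunk.reverse()
        chunk[-1] += 128  # mark the last byte of this number
        out.extend(chunk)
    return bytes(out)


def vb_decode(data):
    """Decode a variable-byte encoded byte string back into integers."""
    numbers = []
    n = 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte
        else:
            numbers.append(128 * n + (byte - 128))
            n = 0
    return numbers


postings = [824, 829, 215406]
encoded = vb_encode(to_gaps(postings))
restored = from_gaps(vb_decode(encoded))
```

Small gaps fit in a single byte, so frequent terms with densely packed postings benefit the most from this scheme.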
8. Develop an effective query refinement mechanism based on query
algebra.
Query expansion (QE), or query refinement, is the process of reformulating
a seed query to improve retrieval performance. In the context of search engines, query
expansion involves evaluating the user's input (the words typed into the search
box) and expanding the query so that it matches additional documents.
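As a rough illustration of the idea, the sketch below expands a seed query with related terms taken from a small hand-written table. The table `SYNONYMS` and the functions `expand_query` and `matches` are hypothetical names used only for this example; a real project might derive related terms from a thesaurus or from relevance feedback instead.

```python
# Hypothetical related-term table; in practice this could come from a
# thesaurus or from term co-occurrence statistics.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return the seed query terms followed by their related terms."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for alternative in synonyms.get(term, []):
            if alternative not in expanded:
                expanded.append(alternative)
    return expanded


def matches(document, terms):
    """A document matches if it contains at least one of the terms."""
    words = set(document.lower().split())
    return any(term in words for term in terms)


document = "Affordable automobile for sale"
seed_terms = "cheap car".split()
expanded_terms = expand_query("cheap car")

seed_hit = matches(document, seed_terms)          # seed query misses the document
expanded_hit = matches(document, expanded_terms)  # expanded query finds it
```

The expanded query retrieves a document that shares no words with the seed query, which is the effect query expansion aims for; the trade-off is that loosely related terms can also pull in irrelevant documents.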
9. Personalize the search engine.
A web search engine is a software system designed to search for
information on the World Wide Web. Personalize the search engine, for example for
kids, or to list only research articles, only images, and so on.
Hierarchies are becoming ever more popular for the organization of text
documents, particularly on the Web. Web directories and Wikipedia are two
examples of such hierarchies. Along with their widespread use comes the need for
automated classification of new documents to the categories in the hierarchy. As the
size of the hierarchy grows and the number of documents to be classified increases, a
number of interesting machine learning problems arise. In particular, it is one of the
rare situations where data sparsity remains an issue, despite the vastness of available
data: as more documents become available, more classes are also added to the
hierarchy, and there is a very high imbalance between the classes at different levels
of the hierarchy.
13. Company Web
Given the data related to current employees and their provisioned access, models can
be built that automatically determine access privileges as employees enter and leave
roles within a company. These auto-access models seek to minimize the human
involvement required to grant or revoke employee access. The model will take an
employee's role information and a resource code and will return whether or not
access should be granted.
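A very simple baseline for such a model can be sketched as follows. It assumes the historical data is available as (role, resource code, granted) records and decides by majority vote over past decisions for the same pair. The class name `AccessModel`, the record layout, and the sample roles and resource codes are assumptions made for this example, not part of the project description.

```python
from collections import defaultdict


class AccessModel:
    """Majority-vote baseline over past access decisions (illustrative only)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        # (role, resource) -> [times granted, total decisions seen]
        self.counts = defaultdict(lambda: [0, 0])

    def fit(self, records):
        """Count past decisions for every (role, resource) pair."""
        for role, resource, granted in records:
            entry = self.counts[(role, resource)]
            entry[0] += int(granted)
            entry[1] += 1
        return self

    def predict(self, role, resource):
        """Return True if access should be granted, otherwise False."""
        granted, total = self.counts.get((role, resource), (0, 0))
        if total == 0:
            return False  # unseen pair: deny by default
        return granted / total > self.threshold


# Hypothetical history of provisioned access.
history = [
    ("engineer", "REPO", True),
    ("engineer", "REPO", True),
    ("engineer", "REPO", False),
    ("engineer", "PAYROLL", False),
    ("hr", "PAYROLL", True),
]

model = AccessModel().fit(history)
repo_access = model.predict("engineer", "REPO")
payroll_access = model.predict("engineer", "PAYROLL")
unknown_access = model.predict("intern", "REPO")
```

Denying unseen pairs is a conservative design choice; a learned classifier over role attributes could generalize to new roles, which this baseline cannot.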
Text Books
1. Bing Liu, "Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications)", Springer, 2nd Edition, 2010.
2. Zdravko Markov, Daniel T. Larose, "Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage", John Wiley & Sons, Inc., 2012.
Reference Books
1. Guandong Xu, Yanchun Zhang, Lin Li, "Web Mining and Social Networking: Techniques and Applications", Springer, 1st Edition, 2010.
2. Soumen Chakrabarti, "Mining the Web: Discovering Knowledge from Hypertext Data", Morgan Kaufmann, 2012.
3. Adam Schenker, "Graph-Theoretic Techniques for Web Content Mining", World Scientific Pub Co Inc, 2015.
4. Min Song, Yi Fang and Brook Wu, "Handbook of Research on Text and Web Mining Technologies", IGI Global, Information Science Reference (an imprint of IGI Publishing), 2011.
Web Mining
Knowledge Areas that contain topics and learning outcomes covered in the course
This course is an elective course.
Suitable from 4th semester onwards.
Knowledge of basic mathematics is essential.
This course is designed with 100 minutes of in-classroom sessions per week, 60 minutes of
video/reading instructional material per week, 100 minutes of lab hours per week, as well as
200 minutes of non-contact time spent on implementing a course-related project. Generally, this
course should combine lectures, in-class discussion, case studies, guest lectures,
mandatory off-class reading material, and quizzes.
Additional weightage will be given to students based on their rank in crowd-sourced projects or
Kaggle-like competitions.
Session wise plan
Student Outcomes Covered: 2, 11, 14, 17
45 Hours (3 credit hours/week, 15-week schedule)