
Dept: Computer

Class: BE
Sem: I

ACADEMIC BOOK

Contents:

 High Performance Computing


 Artificial Intelligence and Robotics
 Data Analytics
 Elective I- Data Mining and Warehousing
 Elective II- Mobile Communication

Academic Year: 2019-20


Sir Visvesvaraya Institute of Technology, Nashik

Vision
“To provide quality technical education in a rural area to
create competent human resources.”

Mission
“Committed to producing competent engineers to cater
to the needs of society by imparting skill-based education
through an effective teaching-learning process.”

Computer Engineering Department

Vision
“To develop the Department of Computer Engineering into
a centre of excellence by imparting technical
education of international standards and conducting research in the
field of Computer Engineering.”

Mission
“To provide quality engineering education to
students through state-of-the-art education in Computer
Engineering.”
Pravara Rural Education Society’s
Pravara Technical Education Campus
Sir Visvesvaraya Institute of Technology, Nashik
Academic Calendar SE to BE - 2019-20 (Sem-I)
Events by month:

June 2019:
05: Ramjan Id (holiday)
06-08: Administrative audit (IQAC)
10: Principal, HOD & Deans meeting
11-12: Orientation program by the faculty, department level
14: Orientation program for faculty, institute level
17: Commencement of teaching, SE to BE
17: Project / mini-project / internship presentations by the students of all departments
21: International Yoga Day

July 2019:
05: Earn and Learn student selection
12: Ashadi Ekadashi (holiday)
13: Seminar on rules and regulations for women at the workplace
15-19: 1st Industrial Visit week
20: 1st display & submission of academic & attendance defaulter list to the Dean's office
22-24: Academic audit
22-27: 1st Assignment week
22: Collection of applications for Student Council 2019-20
25: Students' feedback I
27: HR meet (Training & Placement)
27: Principal, HOD & Deans meeting
29: First-year induction program
29 Jul - 02 Aug: Class Test I
31: Student Council 2019-20 selection
31: Mentoring report by the department

August 2019:
01-02: 1st project evaluation
05-10: 2nd Industrial Visit week
08: Display of Class Test I marks
10: Alumni meet
10: Pleasure trip for Pravara Technical Campus staff
12: Bakri Id (holiday)
14: Late Padmashri Dr. Vitthalrao Vikhe Patil Jayanti
15: Independence Day celebration; Student Council meeting; Principal, HOD & Deans meeting
17: Parsi New Year (holiday)
19-24: University in-sem exams, tentative (SE, TE & BE)
20: 2nd display & submission of academic & attendance defaulter list to the Dean's office
22: Bakri Id (holiday)
26-31: 2nd Assignment week
26-31: Foot Prints, sports event
26-31: 1st makeup classes
29-31: Parent-teacher interaction meet
30: 2nd project evaluation
31: Mentoring report by the department

September 2019:
02-12: Ganesh Utsav (Pravarecha Raja)
03: Student Council meeting
05: Teachers' Day celebration & Accolade 2K19
10: Moharum (holiday)
14: Engineers' Day celebration
16-20: Class Test II
20: 3rd display & submission of academic & attendance defaulter list to the Dean's office
23-28: 3rd Industrial Visit week
24: Student Council meeting
26: Display of Class Test II marks
27: Students' feedback II; Student Council meeting
28: Principal, HOD & Deans meeting
29: 3rd project evaluation
30: Mentoring report by the department

October 2019:
01-05: 3rd Assignment week
02: Mahatma Gandhi Jayanti (holiday)
07-12: Preliminary exams (SE to BE)
08: Dasara (holiday)
12: SE final submission
14: TE final submission
15: BE final submission
16: Display of preliminary exam marks
16: Display & submission of academic & attendance defaulter list to the Dean's office
16: Last day of term / conclusion of teaching
18 Oct - 05 Nov: University oral/practical exams (SE to BE)
28: Diwali, Bali Pratipada (holiday)
29: Bhaubij (holiday)

November 2019:
University oral/practical exams continue (SE to BE)
11: Datta Jayanti (holiday)
12: Gurunanak Jayanti (holiday)
14 Nov - 07 Dec: University end-semester examinations (SE to BE)

Total working days: 91

Continuing processes throughout the semester:
Conducting the aptitude test and analysing its results (FE to BE)
Mentor meetings (FE to BE)
Technical expert lectures and soft-skills training (FE to BE)
Conducting technical interviews and personal interviews (BE)

Colour index of the original calendar: working days with activity; working teaching days; university exam days; holidays.

Industrial visits, expert lectures & other activities will be conducted in each month from July to September 2019.
Savitribai Phule Pune University
Faculty of Engineering

Fourth Year of Computer Engineering (2015 Course)
(with effect from 2018-19)

Semester I

Course Code | Course Name | Teaching Scheme (TH / PR hrs per week) | In-Sem | End-Sem | TW | PR | OR/*PRE | Total | Credits (TH/TUT / PR)
410241 | High Performance Computing | 04 / -- | 30 | 70 | -- | -- | -- | 100 | 04 / --
410242 | Artificial Intelligence and Robotics | 03 / -- | 30 | 70 | -- | -- | -- | 100 | 03 / --
410243 | Data Analytics | 03 / -- | 30 | 70 | -- | -- | -- | 100 | 03 / --
410244 | Elective I | 03 / -- | 30 | 70 | -- | -- | -- | 100 | 03 / --
410245 | Elective II | 03 / -- | 30 | 70 | -- | -- | -- | 100 | 03 / --
410246 | Laboratory Practice I | -- / 04 | -- | -- | 50 | 50 | -- | 100 | -- / 02
410247 | Laboratory Practice II | -- / 04 | -- | -- | 50 | -- | *50 | 100 | -- / 02
410248 | Project Work Stage I | -- / 02 | -- | -- | -- | -- | *50 | 50 | -- / 02
Total | | 16 / 10 | 150 | 350 | 100 | 50 | 100 | 750 | 22 (16 TH + 06 PR)
410249 | Audit Course 5 | | | | | | | Grade |
Elective I:
410244 (A) Digital Signal Processing
410244 (B) Software Architecture and Design
410244 (C) Pervasive and Ubiquitous Computing
410244 (D) Data Mining and Warehousing

Elective II:
410245 (A) Distributed Systems
410245 (B) Software Testing and Quality Assurance
410245 (C) Operations Research
410245 (D) Mobile Communication

410249 - Audit Course 5 (AC5) Options:
AC5-I: Entrepreneurship Development
AC5-II: Botnet of Things
AC5-III: 3D Printing
AC5-IV: Industrial Safety and Environment Consciousness
AC5-V: Emotional Intelligence
AC5-VI: MOOC - Learn New Skills
Abbreviations:
TW: Term Work; TH: Theory; OR: Oral; PR: Practical; Sem: Semester; *PRE: Project/Mini-Project Presentation



Pravara Rural Education Society's
Sir Visvesvaraya Institute Of Technology, Nashik
Time Table
Department Of Computer Engineering
B.E. (Odd Semester)
Session: 2019-20

Time slots: 09.30-10.30 | 10.30-11.30 | 11.30-12.30 | 13.15-14.15 | 14.15-15.15 | 15.15-16.15 | 16.15-17.15

MONDAY: 410242 | 410245(D) | APTITUDE | 410243 | 410241; labs: B1-410246, B2-410247, B3-410246, B4-410247
TUESDAY: 410244(D) | 410242 | LIBRARY | 410243 | 410241; labs: B1-410247, B2-410246, B3-410247, B4-410246
WEDNESDAY: 410245(D) | 410244(D) | APTITUDE | 410243 | 410241; labs: B1-410246, B2-410247, B3-410246, B4-410247
THURSDAY: 410245(D) | 410241 | PERSONALITY DEVELOPMENT | 410242 | 410244(D); labs: B1-410247, B2-410246, B3-410247, B4-410246
FRIDAY: GATE TRAINING (all slots)
SATURDAY: LECTURE SERIES | LECTURE SERIES | MENTORING | 410248 | AUDIT COURSE

Subject allocation:
Ms. Prachi S. Tambe: 410241 High Performance Computing
Mr. Devidas S. Thosar: 410242 Artificial Intelligence and Robotics
Mr. Ravindra B. Bhosale: 410243 Data Analytics
Ms. Priyanks S. Hase: 410244 Elective I, Data Mining and Warehousing
Ms. Puja A. Cholke: 410245 Elective II, Mobile Communication
Mr. Devidas S. Thosar (B1, B2) and Ms. Prachi S. Tambe (B3, B4): 410246 Laboratory Practice I
Ms. Puja A. Cholke (B1, B2) and Ms. Priyanks S. Hase (B3, B4): 410247 Laboratory Practice II
Ms. Prachi S. Tambe: 410248 Project Work Stage I
Ms. Puja A. Cholke: 410249 Audit Course 5
Subject – 1
High Performance Computing (410241)
B. E. (Odd Semester), Session 2019-20
Scheme, Syllabus and Evaluation Guidelines of “High Performance
Computing (410241)”

Course Code: 410241; Course Name: High Performance Computing
Lectures Assigned: Theory 4 hrs/week, Practical 4 hrs/week, Tutorial none; Total 8 hrs/week

Examination Evaluation Scheme for 410241 High Performance Computing:
Internal Assessment (30 marks): Class Tests 20 (each test of 20 marks; average of the best two counted), Teacher Assessment 5, Attendance 5
In-Sem Exam: 30 marks
End-Sem Exam: 70 marks
Theory total (university): 100 marks
Prelim / Pre-University Test (internal): 50 marks
Overall total: 150 marks
High Performance Computing
Course Contents
Unit 1: Introduction (09 Hours)
Motivating Parallelism, Scope of Parallel Computing, Parallel Programming Platforms: Implicit
Parallelism, Trends in Microprocessor Architectures, Limitations of Memory System
Performance, Dichotomy of Parallel Computing Platforms, Physical Organization of Parallel
Platforms, Communication Costs in Parallel Machines, Scalable Design Principles, Architectures:
N-wide Superscalar Architectures, Multi-core Architecture.

Unit 2: Parallel Programming (09 Hours)


Principles of Parallel Algorithm Design: Preliminaries, Decomposition Techniques, Characteristics
of Tasks and Interactions, Mapping Techniques for Load Balancing, Methods for Containing
Interaction Overheads, Parallel Algorithm Models, The Age of Parallel Processing, the Rise of
GPU Computing, A Brief History of GPUs, Early GPU.

Unit 3: Basic Communication Operations (09 Hours)

One-to-All Broadcast and All-to-One Reduction, All-to-All Broadcast and Reduction,
All-Reduce and Prefix-Sum Operations, Scatter and Gather, All-to-All Personalized
Communication, Circular Shift, Improving the Speed of Some Communication Operations.

Unit 4: Analytical Models of Parallel Programs (09 Hours)

Analytical Models: Sources of Overhead in Parallel Programs, Performance Metrics for Parallel
Systems, The Effect of Granularity on Performance, Scalability of Parallel Systems, Minimum
Execution Time and Minimum Cost-Optimal Execution Time. Dense Matrix Algorithms:
Matrix-Vector Multiplication, Matrix-Matrix Multiplication.

Unit 5: Parallel Algorithms - Sorting and Graph (09 Hours)

Issues in Sorting on Parallel Computers, Bubble Sort and its Variants, Parallelizing Quicksort,
All-Pairs Shortest Paths, Algorithms for Sparse Graphs, Parallel Depth-First Search, Parallel
Best-First Search.

Unit 6: CUDA Architecture (09 Hours)

CUDA Architecture, Using the CUDA Architecture, Applications of CUDA, Introduction to
CUDA C: writing and launching CUDA C kernels, managing GPU memory, managing
communication and synchronization, parallel programming in CUDA C.

Books:
Text:
1. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, "Introduction to Parallel
Computing", 2nd edition, Addison-Wesley, 2003, ISBN: 0-201-64865-2.
2. Jason Sanders, Edward Kandrot, "CUDA by Example", Addison-Wesley, ISBN-13:
978-0-13-138768-3.

References:
1. Kai Hwang, "Scalable Parallel Computing", McGraw-Hill, 1998, ISBN: 0070317984.
2. Shane Cook, "CUDA Programming: A Developer's Guide to Parallel Computing with
GPUs", Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2013, ISBN:
9780124159884.
3. David Culler, Jaswinder Pal Singh, "Parallel Computer Architecture: A Hardware/Software
Approach", Morgan Kaufmann, 1999, ISBN: 978-1-55860-343-1.
4. Rod Stephens, "Essential Algorithms", Wiley, ISBN: 978-1-118-61210-1.
Evaluation Guidelines:

Internal Assessment (IA): [CT (20 marks) + TA/AT (10 marks)]

Class Test (CT) [20 marks]: Three class tests of 20 marks each will be conducted in a semester;
the average of the best two will be used to calculate the class-test marks. The question paper
format is the same as the university's.

TA [5 marks]: Three or four assignments will be conducted in the semester. Teacher assessment
marks will be based on performance in assignments, class tests, and the pre-university test.

Attendance (AT) [5 marks]: Attendance marks will be given as per university policy.

Paper pattern and marks distribution for class tests:

1. The question paper will have 5 questions. Question 1 is an objective question containing 5
sub-questions carrying 1 mark each.
2. Attempt any 3 of the remaining 4 questions; each carries 5 marks.

In-Semester Exam [30 marks]: As per university guidelines.

Pre-University Test [ 50 Marks]


Paper pattern and marks distribution for PUT: Same as End semester exam

End Semester Examination [ 70 Marks]:


Paper pattern and marks distribution for End Semester Exam: As per university guidelines.
Lecture Plan
High Performance Computing
1 Introduction: Motivating Parallelism
2 Scope of Parallel Computing, Parallel Programming Platforms: Implicit Parallelism
3 Trends in Microprocessor Architectures, Limitations of Memory System Performance
4 Dichotomy of Parallel Computing Platforms
5 Physical Organization of Parallel Platforms
6 Communication Costs in Parallel Machines
7 Scalable design principles
8 Architectures: N-wide superscalar architectures
9 Multi-core architecture
10 Parallel Programming: Principles of Parallel Algorithm Design, Preliminaries
11 Decomposition Techniques
12 Characteristics of Tasks and Interactions
13 Mapping Techniques for Load Balancing
14 Methods for Containing Interaction Overhead
15 Parallel Algorithm Models
16 The Age of Parallel Processing
17 The Rise of GPU Computing
18 A Brief History of GPUs, Early GPU
19 Basic Communication Operations: One-to-All Broadcast
20 All-to-One Reduction
21 All-to-All Broadcast and Reduction
22 All-Reduce and Prefix-Sum Operations
23 Scatter and Gather
24 All-to-All Personalized Communication
25 Circular Shift
26 Improving the Speed of Some Communication Operations
27 Improving the Speed of Some Communication Operations (contd.)
28 Analytical Models of Parallel Programs: Sources of Overhead in Parallel Programs
29 Performance Metrics for Parallel Systems
30 The Effect of Granularity on Performance
31 Scalability of Parallel Systems
32 Minimum Execution Time
33 Minimum Cost-Optimal Execution Time
34 Dense Matrix Algorithms
35 Matrix-Vector Multiplication
36 Matrix-Matrix Multiplication
37 Parallel Algorithms - Sorting and Graph: Issues in Sorting on Parallel Computers
38 Bubble Sort
39 Variants of Bubble Sort
40 Parallelizing Quick sort
41 All-Pairs Shortest Paths
42 Algorithms for Sparse Graphs
43 Parallel Depth-First Search
44 Parallel Best-First Search
45 Parallel Best-First Search (contd.)
46 CUDA Architecture
47 Using the CUDA Architecture
48 Applications of CUDA
49 Introduction to CUDA C
50 Write and launch CUDA C kernels
51 Manage GPU memory
52 Manage communication
53 Manage synchronization
54 Parallel programming in CUDA C
Course Delivery, Objectives, Outcomes
High Performance Computing
Semester – 7
Course Delivery :
The course will be delivered through lectures, assignment/tutorial sessions, class room
interaction, and presentations.

Course Objectives:
 To study parallel computing hardware and programming models
 To be conversant with performance analysis and modeling of parallel programs
 To understand the options available to parallelize programs
 To know the operating system requirements for handling parallelization

Course Outcomes:
On completion of the course, the student will be able to:
 Describe different parallel architectures, interconnect networks, and programming models
 Develop an efficient parallel algorithm to solve a given problem
 Analyze and measure the performance of modern parallel computing systems
 Build the logic to parallelize a programming task

CO-PO Mapping

Course Outcomes vs Programme Outcomes (2 = moderate correlation, 1 = slight correlation):
CO1: PO1 = 2, PO2 = 1
CO2: PO2 = 2, PO3 = 1, PO4 = 1, PO12 = 1
CO3: PO6 = 1, PO7 = 1
CO4: PO1 = 1, PO4 = 2, PO12 = 1

Justification Of CO-PO Mapping

CO1 with PO1 According to CO1 students learn to describe different parallel
architectures, inter-connect networks, programming models. So it is
moderately correlated to PO1.
CO1 with PO2 According to CO1 students learn to describe different parallel
architectures, inter-connect networks, programming models. So it is
slightly correlated to PO2.
CO2 with PO2 According to CO2 students learn to develop an efficient parallel algorithm
to solve a given problem. So it is moderately correlated to PO2.
CO2 with PO3 According to CO2 students learn to develop an efficient parallel algorithm
to solve given problem. So it is slightly correlated to PO3.
CO2 with PO4 According to CO2 students learn to develop an efficient parallel
algorithm to solve given problem. So it is slightly correlated to PO4.
CO2 with PO12 According to CO2 students learn to develop an efficient parallel
algorithm to solve given problem. So it is slightly correlated to PO12.
CO3 with PO6 According to CO3 students get the knowledge to analyze and measure
performance of modern parallel computing systems. So it is slightly
related to PO6.
CO3 with PO7 According to CO3 students get the knowledge to analyze and measure
performance of modern parallel computing systems. So it is slightly
related to PO7.
CO4 with PO1 According to CO4 Students are able to build the logic to parallelize the
programming task. So it is slightly correlated with PO1.
CO4 with PO4 According to CO4 Students are able to build the logic to parallelize the
programming task. So it is moderately correlated with PO4.
CO4 with PO12 According to CO4 Students are able to build the logic to parallelize the
programming task. So it is slightly correlated with PO12.
Assignments

High Performance Computing 410241


Assignment 1
Q.No. | Question | Max. Marks | Unit No. | CO mapped | Bloom's Level
1 | Explain SIMD, MIMD & SIMT architectures. | 04 | 1 | 1 | 2
2 | What is the basic working principle of a VLIW processor? | 04 | 1 | 2 | 1
3 | Define UMA with a diagram. | 02 | 1 | 1 | 1
4 | Explain decomposition, tasks & task-dependency graphs. | 04 | 2 | 2 | 2
5 | Explain granularity, concurrency & task interaction. | 04 | 2 | 1 | 2
6 | What is overhead in networking? | 02 | 2 | 1 | 1

Solution (Assignment 1)

Q1. Explain SIMD, MIMD & SIMT architectures. [05 marks, Unit 1, CO1, Bloom's Level 2]

Answer:
SISD (Single Instruction, Single Data stream)
Single Instruction, Single Data (SISD) refers to an instruction set architecture in which a single
processor (one CPU) executes exactly one instruction stream at a time, and fetches or stores one
item of data at a time, operating on data stored in a single memory unit. Most CPU designs from
the beginning until recent times, based on the von Neumann architecture, have followed the SISD
model. The SISD model is a typical non-pipelined architecture with general-purpose registers as
well as dedicated special registers such as the Program Counter (PC), the Instruction Register (IR),
Memory Address Registers (MAR) and Memory Data Registers (MDR).

SIMD (Single Instruction, Multiple Data streams)


Single Instruction, Multiple Data (SIMD) is an instruction set architecture that has a single
control unit (CU) and more than one processing unit (PU); it operates like a von Neumann
machine by executing a single instruction stream over the PUs, coordinated by the CU. The CU
generates the control signals for all of the PUs, which thereby execute the same operation on
different data streams. The SIMD architecture is thus capable of achieving data-level
parallelism, just like a vector processor.
Some examples of SIMD-based systems include IBM's AltiVec and SPE for PowerPC,
HP's PA-RISC Multimedia Acceleration eXtensions (MAX), Intel's MMX and iwMMXt, SSE,
SSE2, SSE3 and SSSE3, AMD's 3DNow!, etc.

MISD (Multiple Instruction, Single Data stream)

Multiple Instruction, Single Data (MISD) is an instruction set architecture for parallel computing
where many functional units perform different operations by executing different instructions on the
same data set. This type of architecture is used mainly in fault-tolerant computers, which execute
the same instructions redundantly in order to detect and mask errors.

MIMD (Multiple Instruction, Multiple Data streams)

Multiple Instruction, Multiple Data (MIMD) machines have a number of processors that function
asynchronously and independently: at any time, different processors may be executing different
instructions on different pieces of data. Most modern multiprocessor and multi-core systems are
MIMD, organized either around a shared memory or around distributed memory with message
passing.

SIMT (Single Instruction, Multiple Threads)

Single Instruction, Multiple Threads (SIMT) is the execution model used by modern GPUs (for
example, NVIDIA CUDA devices). A group of threads (a warp) executes the same instruction in
lockstep, but each thread operates on its own data and has its own registers, so SIMT combines
SIMD-style execution with hardware multithreading.
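To make SIMT concrete, the following minimal CUDA C sketch (kernel and variable names are illustrative, not from the syllabus) launches many threads that all execute the same kernel body, with each thread selecting its own data element from its block and thread indices.

#include <stdio.h>

// Every thread runs this same function; only the computed index differs.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // unique thread index
    if (i < n)                                       // guard against extra threads
        data[i] = data[i] * factor;                  // same instruction, different data
}

int main(void)
{
    const int n = 1024;
    float *d;
    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);     // 4 blocks of 256 threads
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}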

Q2. What is the basic working principle of a VLIW processor? [05 marks, Unit 1, CO2, Bloom's Level 1]
Answer:

Very long instruction word (VLIW) describes a computer processing architecture in which a
language compiler or pre-processor breaks program instructions down into basic operations that
can be performed by the processor in parallel (that is, at the same time). These operations are put
into a very long instruction word which the processor can then take apart without further analysis,
handing each operation to an appropriate functional unit.
VLIW is sometimes viewed as the next step beyond the reduced instruction set computing (RISC)
architecture, which also works with a limited set of relatively basic instructions and can usually
execute more than one instruction at a time (a characteristic referred to as superscalar). The main
advantage of VLIW processors is that complexity is moved from the hardware to the software,
which means that the hardware can be smaller, cheaper, and require less power to operate. The
challenge is to design a compiler or pre-processor that is intelligent enough to decide how to build
the very long instruction words. If dynamic pre-processing is done as the program is run,
performance may be a concern.
The Crusoe family of processors from Transmeta uses very long instruction words that are
assembled by a pre-processor located in a flash memory chip. Because the processor does
not need the ability to discover and schedule parallel operations, it contains only about a fourth
of the transistors of a regular processor. The lower power requirement enables computers based
on Crusoe technology to be operated by battery almost all day without a recharge. The Crusoe
processors emulate Intel's x86 processor instruction set. Theoretically, pre-processors could be
designed to emulate other processor architectures.

Q3. Define UMA with a diagram. [02 marks, Unit 1, CO1, Bloom's Level 1]
Answer:

A UMA (Uniform Memory Access) system is a shared-memory architecture for multiprocessors.
In this model, a single memory is used and accessed by all the processors present in the
multiprocessor system with the help of an interconnection network. Each processor has equal
memory access time (latency) and access speed. The interconnection network can be a single bus,
multiple buses, or a crossbar switch. Because it provides balanced shared-memory access, such a
machine is also known as an SMP (Symmetric Multiprocessor) system.

Q4. Explain decomposition, tasks & task-dependency graphs. [05 marks, Unit 2, CO2, Bloom's Level 2]
Answer:
The process of dividing a computation into smaller parts, some or all of which may potentially be
executed in parallel, is called decomposition. Tasks are programmer-defined units of computation
into which the main computation is subdivided by means of decomposition. Simultaneous execution
of multiple tasks is the key to reducing the time required to solve the entire problem. Tasks can be
of arbitrary size, but once defined, they are regarded as indivisible units of computation. The tasks
into which a problem is decomposed may not all be of the same size.
Example Dense matrix-vector multiplication
Consider the multiplication of a dense n x n matrix A with a vector b to yield another vector y.
The ith element y[i] of the product vector is the dot-product of the ith row of A with the input
vector b; i.e., y[i] = Σj A[i, j] · b[j], for j = 0, 1, ..., n - 1. The computation of each y[i] can be
regarded as a task. Alternatively, the computation could be decomposed into fewer, say four,
tasks where each task computes roughly n/4 of the entries of the vector y.
Figure: Decomposition of dense matrix-vector multiplication into n tasks, where n is the number of rows in the
matrix. The portions of the matrix and the input and output vectors accessed by Task 1 are highlighted.

Note that all tasks in figure are independent and can be performed all together or in any sequence.
However, in general, some tasks may use data produced by other tasks and thus may need to wait
for these tasks to finish execution. An abstraction used to express such dependencies among tasks
and their relative order of execution is known as a task-dependency graph. A task-dependency
graph is a directed acyclic graph in which the nodes represent tasks and the directed edges indicate
the dependencies amongst them. The task corresponding to a node can be executed when all tasks
connected to this node by incoming edges have completed. Note that task-dependency graphs can
be disconnected and the edge-set of a task-dependency graph can be empty. This is the case for
matrix-vector multiplication, where each task computes a subset of the entries of the product vector.
A more interesting task-dependency graph arises in examples such as database query processing,
where some tasks consume the results produced by other tasks; a code sketch of the independent-task
case follows below.
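As an illustration, the fine-grained decomposition above can be written as a CUDA C kernel in which each thread carries out exactly one task, computing one entry y[i]. Because the task-dependency graph has no edges, the tasks need no synchronization among themselves. This is a hedged sketch; the kernel name and the row-major layout of A are assumptions.

// One thread = one task: thread i computes y[i], the dot-product of row i with b.
__global__ void matvec_tasks(const float *A, const float *b, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // task index = row index
    if (i < n) {
        float dot = 0.0f;
        for (int j = 0; j < n; j++)
            dot += A[i * n + j] * b[j];              // row-major A assumed
        y[i] = dot;                                  // task i produces y[i]
    }
}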

Q5. Explain granularity, concurrency & task interaction. [05 marks, Unit 2, CO1, Bloom's Level 2]
Answer:
The number and size of tasks into which a problem is decomposed determines the granularity of
the decomposition. A decomposition into a large number of small tasks is called fine-grained and a
decomposition into a small number of large tasks is called coarse-grained. For example, the
decomposition for matrix-vector multiplication shown in fig would usually be considered fine-
grained because each of a large number of tasks performs a single dot-product. Figure shows a
coarse-grained decomposition of the same problem into four tasks, where each task computes n/4
of the entries of the output vector of length n.

Figure : Decomposition of dense matrix-vector multiplication into four tasks. The portions of the
matrix and the input and output vectors accessed by Task 1 are highlighted.

A concept related to granularity is that of degree of concurrency. The maximum number of tasks
that can be executed simultaneously in a parallel program at any given time is known as its
maximum degree of concurrency. In most cases, the maximum degree of concurrency is less than
the total number of tasks due to dependencies among the tasks. For example, the maximum degree
of concurrency in the task-graphs of Figures and is four. In these task-graphs, maximum
concurrency is available right at the beginning when tables for Model, Year, Color Green, and
Color White can be computed simultaneously. In general, for task-dependency graphs that are trees,
the maximum degree of concurrency is always equal to the number of leaves in the tree.
A more useful indicator of a parallel program's performance is the average degree of concurrency,
which is the average number of tasks that can run concurrently over the entire duration of execution
of the program.
Both the maximum and the average degrees of concurrency usually increase as the granularity of
tasks becomes smaller (finer). For example, the decomposition of matrix-vector multiplication
shown in fig has a fairly small granularity and a large degree of concurrency. The decomposition
for the same problem shown in fig has a larger granularity and a smaller degree of concurrency.
The degree of concurrency also depends on the shape of the task-dependency graph, and the same
granularity, in general, does not guarantee the same degree of concurrency. For example, consider
two task graphs that are abstractions of the task graphs discussed above, where the number inside
each node represents the amount of work required to complete the task corresponding to that node.
The average degree of concurrency of one task graph is 2.33 and that of the other is 1.88
(Problem 3.1), even though both task-dependency graphs are based on the same decomposition.

Q6. What is overhead in networking? [02 marks, Unit 2, CO1, Bloom's Level 1]
Answer: In computer science, overhead is any combination of excess or indirect computation
time, memory, bandwidth, or other resources that are required to perform a specific task.
Examples of computing overhead may be found in functional programming, data transfer, and
data structures.
Assignment 2
Q.No. | Question | Max. Marks | Unit No. | CO mapped | Bloom's Level
1 | Explain one-to-all broadcast and reduction on a ring. | 04 | 3 | 1 | 2
2 | Explain scatter and gather operations. | 04 | 3 | 2 | 2
3 | What are all-to-all broadcast and reduction? | 02 | 3 | 2 | 1
4 | Write a short note on performance metrics for parallel systems. | 04 | 4 | 2 | 2
5 | Explain dense matrix algorithms with 1-D and 2-D partitioning. | 04 | 4 | 2 | 2
6 | What is Cannon's algorithm? | 02 | 4 | 3 | 1

Solution (Assignment 2)

Q1. Explain one-to-all broadcast and reduction on a ring. [04 marks, Unit 3, CO1, Bloom's Level 2]

Answer:
Parallel algorithms often require a single process to send identical data to all other processes or to a
subset of them. This operation is known as one-to-all broadcast. Initially, only the source process
has the data of size m that needs to be broadcast. At the termination of the procedure, there are p
copies of the initial data - one belonging to each process. The dual of one-to-all broadcast is all-to-
one reduction. In an all-to-one reduction operation, each of the p participating processes starts
with a buffer M containing m words. The data from all processes are combined through an
associative operator and accumulated at a single destination process into one buffer of size m.
Reduction can be used to find the sum, product, maximum, or minimum of sets of numbers: the
ith word of the accumulated M is the sum, product, maximum, or minimum of the ith words of
each of the original buffers. The figure shows one-to-all broadcast and all-to-one reduction among p
processes.

Figure: One-to-all broadcast and all-to-one reduction.

One-to-all broadcast and all-to-one reduction are used in several important parallel algorithms
including matrix-vector multiplication, Gaussian elimination, shortest paths, and vector inner
product. In the following subsections, we consider the implementation of one-to-all broadcast in
detail on a variety of interconnection topologies.
A naive way to perform one-to-all broadcast is to sequentially send p - 1 messages from the source
to the other p - 1 processes. However, this is inefficient because the source process becomes a
bottleneck. Moreover, the communication network is underutilized because only the connection
between a single pair of nodes is used at a time. A better broadcast algorithm can be devised using a
technique commonly known as recursive doubling. The source process first sends the message to
another process. Now both these processes can simultaneously send the message to two other
processes that are still waiting for the message. By continuing this procedure until all the processes
have received the data, the message can be broadcast in log p steps. The steps in a one-to-all
broadcast on an eight-node linear array or ring are shown in the figure. The nodes are labeled from 0 to 7.
Each message transmission step is shown by a numbered, dotted arrow from the source of the
message to its destination. Arrows indicating messages sent during the same time step have the
same number.

Figure: One-to-all broadcast on an eight-node ring. Node 0 is the source of the broadcast. Each message transfer step
is shown by a numbered, dotted arrow from the source of the message to its destination. The number on an arrow
indicates the time step during which the message is transferred.

Note that on a linear array, the destination node to which the message is sent in each step must be
carefully chosen. In fig, the message is first sent to the farthest node (4) from the source (0). In the
second step, the distance between the sending and receiving nodes is halved, and so on. The
message recipients are selected in this manner at each step to avoid congestion on the network. For
example, if node 0 sent the message to node 1 in the first step and then nodes 0 and 1 attempted to
send messages to nodes 2 and 3, respectively, in the second step, the link between nodes 1 and 2
would be congested as it would be a part of the shortest route for both the messages in the second
step. Reduction on a linear array can be performed by simply reversing the direction and the
sequence of communication, as shown in fig. In the first step, each odd numbered node sends its
buffer to the even numbered node just before itself, where the contents of the two buffers are
combined into one. After the first step, there are four buffers left to be reduced on nodes 0, 2, 4, and
6, respectively. In the second step, the contents of the buffers on nodes 0 and 2 are accumulated on
node 0 and those on nodes 6 and 4 are accumulated on node 4. Finally, node 4 sends its buffer to
node 0, which computes the final result of the reduction.
Figure : Reduction on an eight-node ring with node 0 as the destination of the reduction.
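The recursive-doubling schedule described above can be sketched in a few lines of C. This is an illustrative program (not from the text) that prints which node sends to which node in each of the log p steps for the eight-node example.

#include <stdio.h>

/* Prints the recursive-doubling broadcast schedule for p nodes (p a power
   of two), with node 0 as the source: in each step the distance to the
   receiver is halved, exactly as in the eight-node figure above. */
int main(void)
{
    int p = 8;
    int step = 1;
    for (int d = p / 2; d >= 1; d /= 2, step++)     /* log2(p) steps        */
        for (int src = 0; src < p; src += 2 * d)    /* nodes holding a copy */
            printf("step %d: node %d -> node %d\n", step, src, src + d);
    return 0;
}

For p = 8 this prints 0 -> 4 in step 1; 0 -> 2 and 4 -> 6 in step 2; and 0 -> 1, 2 -> 3, 4 -> 5, 6 -> 7 in step 3, matching the figure.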

Q2. Explain scatter and gather operations. [04 marks, Unit 3, CO2, Bloom's Level 2]

Answer:
Gather and scatter operations are used in many domains. However, using these types of functions
on an SIMD architecture creates some programming challenges, because SIMD systems are
optimized to work with memory laid out in a contiguous manner. Whereas a gather operation
reads elements from memory and packs them into an SIMD register, the scatter operation unpacks
the data and then writes it to individual memory locations.
Typical coding of this kind results in non-optimal use of the SIMD instructions on an Intel Xeon
Phi coprocessor. Gathers and scatters result in more work than when the memory being accessed
is laid out contiguously: more cache-line misses occur and more pages in memory have to be
accessed.
On the Intel architecture, using the Streaming SIMD Extensions (SSE) and the Intel Advanced
Vector Extensions (AVX), gather and scatter operations need to be performed with scalar loads
and stores. AVX2 and the Intel Initial Many Core Instructions (IMCI) can also be used.
An example of this use is within the molecular dynamics domain. N-body simulations may use
scatter and gather techniques to optimize the compute-intensive portions of the applications. Using
a number of the techniques mentioned below, a performance gain of 2x was observed on the miniMD
application using Intel Xeon processors or Intel Xeon Phi coprocessors.
A number of optimization techniques can be used for improving gather and scatter operations
(a data-layout sketch follows below):
 Improve temporal and spatial locality.
 Choose the right data layout: Structure of Arrays (SoA) or Array of Structures (AoS).
 Transpose between AoS and SoA.
 Amortize the costs of gather/scatter.
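The data-layout point is the easiest to illustrate in code. Below is a hedged C sketch (the type and field names are made up for illustration) contrasting an Array of Structures, whose strided field accesses force gathers on SIMD hardware, with a Structure of Arrays, whose contiguous field arrays vectorize with plain loads.

#define N 1024

struct ParticleAoS { float x, y, z, m; };   /* AoS: fields of one particle interleaved */

struct ParticlesSoA {                        /* SoA: each field stored contiguously */
    float x[N], y[N], z[N], m[N];
};

/* AoS: successive x values are sizeof(struct ParticleAoS) bytes apart,
   so a SIMD loop needs gather instructions to collect them. */
float sum_x_aos(const struct ParticleAoS *p)
{
    float s = 0.0f;
    for (int i = 0; i < N; i++)
        s += p[i].x;
    return s;
}

/* SoA: x[0..N-1] is one contiguous stream, so the loop vectorizes
   with aligned loads and no gather. */
float sum_x_soa(const struct ParticlesSoA *p)
{
    float s = 0.0f;
    for (int i = 0; i < N; i++)
        s += p->x[i];
    return s;
}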
Q3. What are all-to-all broadcast and reduction? [02 marks, Unit 3, CO2, Bloom's Level 1]
Answer:
All-to-all broadcast is a generalization of one-to-all broadcast in which all p nodes simultaneously
initiate a broadcast. A process sends the same m-word message to every other process, but different
processes may broadcast different messages. All-to-all broadcast is used in matrix operations,
including matrix multiplication and matrix-vector multiplication. The dual of all-to-all broadcast is
all-to-all reduction, in which every node is the destination of an all-to-one reduction.

Q4. Write a short note on performance metrics for parallel systems. [04 marks, Unit 4, CO2, Bloom's Level 2]
Answer:
It is important to study the performance of parallel programs with a view to determining the best
algorithm, evaluating hardware platforms, and examining the benefits from parallelism. A number
of metrics have been used based on the desired outcome of performance analysis.
Execution Time
The serial runtime of a program is the time elapsed between the beginning and the end of its
execution on a sequential computer. The parallel runtime is the time that elapses from the moment
a parallel computation starts to the moment the last processing element finishes execution. We
denote the serial runtime by TS and the parallel runtime by TP.
Total Parallel Overhead
The overheads incurred by a parallel program are encapsulated into a single expression referred to
as the overhead function. We define overhead function or total overhead of a parallel system as
the total time collectively spent by all the processing elements over and above that required by the
fastest known sequential algorithm for solving the same problem on a single processing element.
We denote the overhead function of a parallel system by the symbol To.
The total time spent in solving a problem summed over all processing elements is pTP. TS units
of this time are spent performing useful work, and the remainder is overhead. Therefore, the
overhead function To is given by

To = pTP - TS   (Equation 5.1)
Speedup
When evaluating a parallel system, we are often interested in knowing how much performance gain
is achieved by parallelizing a given application over a sequential implementation. Speedup is a
measure that captures the relative benefit of solving a problem in parallel. It is defined as the ratio
of the time taken to solve a problem on a single processing element to the time required to solve the
same problem on a parallel computer with p identical processing elements. We denote speedup by
the symbol S; that is, S = TS / TP.
Example: Adding n numbers using n processing elements
Consider the problem of adding n numbers by using n processing elements. Initially, each
processing element is assigned one of the numbers to be added and, at the end of the computation,
one of the processing elements stores the sum of all the numbers. Assuming that n is a power of
two, we can perform this operation in log n steps by propagating partial sums up a logical binary
tree of processing elements. Following figure illustrates the procedure for n = 16. The processing
elements are labeled from 0 to 15. Similarly, the 16 numbers to be added are labeled from 0 to 15.
The sum of the numbers with consecutive labels from i to j is denoted by Σ(i..j).
Figure: Computing the global sum of 16 partial sums using 16 processing elements. Σ(i..j) denotes the sum of the
numbers with consecutive labels from i to j.
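The log n summation just described maps directly onto a GPU block. The following hedged CUDA C sketch (kernel name and launch configuration are illustrative) combines n values held by n threads in log2(n) tree steps, mirroring the n = 16 figure; with TS = Θ(n) and TP = Θ(log n), the resulting speedup is S = Θ(n / log n).

// In-block tree reduction: n threads (n a power of two, n <= 1024) sum n values.
__global__ void block_sum(float *data, int n)      // launch with blockDim.x == n
{
    extern __shared__ float buf[];                 // shared buffer of n floats
    int t = threadIdx.x;
    buf[t] = data[t];
    __syncthreads();
    for (int stride = n / 2; stride >= 1; stride /= 2) {   // log2(n) steps
        if (t < stride)
            buf[t] += buf[t + stride];             // pairwise partial sums
        __syncthreads();                           // all threads reach the barrier
    }
    if (t == 0)
        data[0] = buf[0];                          // PE 0 holds the final sum
}
// Example launch: block_sum<<<1, 16, 16 * sizeof(float)>>>(d_data, 16);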

Q5. Explain dense matrix algorithms with 1-D and 2-D partitioning. [04 marks, Unit 4, CO2, Bloom's Level 2]
Answer:
This section addresses the problem of multiplying a dense n x n matrix A with an n x 1 vector
x to yield the n x 1 result vector y. The algorithm below shows a serial algorithm for this problem.
The sequential algorithm requires n² multiplications and additions. Assuming that a multiplication
and addition pair takes unit time, the sequential run time is

W = n²   (Equation 8.1)

At least three distinct parallel formulations of matrix-vector multiplication are possible, depending
on whether rowwise 1-D, columnwise 1-D, or a 2-D partitioning is used.
Algorithm: A serial algorithm for multiplying an n x n matrix A with an n x 1 vector x to
yield an n x 1 product vector y.
1. procedure MAT_VECT ( A, x, y)
2. begin
3. for i := 0 to n - 1 do
4. begin
5. y[i]:=0;
6. for j := 0 to n - 1 do
7. y[i] := y[i] + A[i, j] × x[j];
8. endfor;
9. end MAT_VECT
Rowwise 1-D Partitioning
This section details the parallel algorithm for matrix-vector multiplication using rowwise block 1-D
partitioning. The parallel algorithm for columnwise block 1-D partitioning is similar (Problem 8.2)
and has a similar expression for parallel run time. Figure describes the distribution and movement
of data for matrix-vector multiplication with block 1-D partitioning.
Figure: Multiplication of an n x n matrix with an n x 1 vector using rowwise block 1-D partitioning. For the
one-row-per-process case, p = n.
One Row Per Process
First, consider the case in which the n x n matrix is partitioned among n processes so that each
process stores one complete row of the matrix. The n x 1 vector x is distributed such that each
process owns one of its elements. The initial distribution of the matrix and the vector for rowwise
block 1-D partitioning is shown in fig(a). Process Pi initially owns x[i] and A[i, 0], A[i, 1], ...,
A[i, n-1] and is responsible for computing y[i]. Vector x is multiplied with each row of the matrix
(Algorithm); hence, every process needs the entire vector. Since each process starts with only one
element of x, an all-to-all broadcast is required to distribute all the elements to all the processes.
As shown in Fig(b), process Pi computes y[i] = Σj A[i, j] · x[j] (lines 6 and 7 of the algorithm).
As Fig(b) shows, the result vector y is stored exactly the way the starting vector x was stored.
Parallel Run Time Starting with one vector element per process, the all-to-all broadcast of the
vector elements among n processes requires time Θ(n) on any architecture. The multiplication of
a single row of A with x is also performed by each process in time Θ(n). Thus, the entire
procedure is completed by n processes in time Θ(n), resulting in a process-time product of Θ(n²).
The parallel algorithm is cost-optimal because the complexity of the serial algorithm is Θ(n²).
Using Fewer than n Processes
Consider the case in which p processes are used such that p < n, and the matrix is partitioned
among the processes by using block 1-D partitioning. Each process initially stores n/p complete
rows of the matrix and a portion of the vector of size n/p. Since the vector x must be multiplied
with each row of the matrix, every process needs the entire vector (that is, all the portions belonging
to separate processes). This again requires an all-to-all broadcast as shown in fig(b) and (c). The
all-to-all broadcast takes place among p processes and involves messages of size n/p. After this
communication step, each process multiplies its n/p rows with the vector x to produce n/p
elements of the result vector.Fig(d) shows that the result vector y is distributed in the same format
as that of the starting vector x.
Parallel Run Time According to the table, an all-to-all broadcast of messages of size n/p among p
processes takes time ts log p + tw(n/p)(p - 1). For large p, this can be approximated by
ts log p + tw n. After the communication, each process spends time n²/p multiplying its n/p rows
with the vector. Thus, the parallel run time of this procedure is

TP = n²/p + ts log p + tw n   (Equation 8.2)

The process-time product for this parallel formulation is n² + ts p log p + tw np. The algorithm is
cost-optimal for p = O(n).
Scalability Analysis We now derive the isoefficiency function for matrix-vector multiplication
by considering the terms of the overhead function one at a time. Consider the parallel run time
given by Equation 8.2 for the hypercube architecture. The relation To = pTP - W gives the
following expression for the overhead function of matrix-vector multiplication on a hypercube
with block 1-D partitioning:

To = ts p log p + tw np   (Equation 8.3)

Recall that the central relation that determines the isoefficiency function of a parallel
algorithm is W = KTo, where K = E/(1 - E) and E is the desired efficiency. Rewriting this
relation for matrix-vector multiplication, first with only the ts term of To,

W = K ts p log p   (Equation 8.4)

Equation 8.4 gives the isoefficiency term with respect to message startup time. Similarly, for the
tw term of the overhead function, n² = K tw np, i.e., n = K tw p. Since W = n² (Equation 8.1),
we derive an expression for W in terms of p, K, and tw (that is, the isoefficiency function due
to tw) as follows:

W = K² tw² p²   (Equation 8.5)

Now consider the degree of concurrency of this parallel algorithm. Using 1-D partitioning, a
maximum of n processes can be used to multiply an n x n matrix with an n x 1 vector. In other
words, p is O(n), which yields the following condition:

W = n² = Ω(p²)   (Equation 8.6)

The overall asymptotic isoefficiency function can be determined by comparing Equations 8.4, 8.5,
and 8.6. Among the three, Equations 8.5 and 8.6 give the highest asymptotic rate at which the
problem size must increase with the number of processes to maintain a fixed efficiency. This rate of
Θ(p²) is the asymptotic isoefficiency function of the parallel matrix-vector multiplication algorithm
with 1-D partitioning.
2-D Partitioning
This section discusses parallel matrix-vector multiplication for the case in which the matrix is
distributed among the processes using a block 2-D partitioning. Figure 8.2 shows the distribution
of the matrix and the distribution and movement of vectors among the processes.

Figure: Matrix-vector multiplication with block 2-D partitioning. For the one-element-per-process case, p = n² if the
matrix size is n x n.
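To make Equation 8.2 concrete, the short C program below (an illustrative sketch; the machine parameters ts and tw are assumed values, not measurements) evaluates TP = n²/p + ts log p + tw n for a range of process counts and prints the corresponding speedup S = n²/TP.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double n = 4096.0;            /* matrix dimension                    */
    double ts = 50.0, tw = 2.0;   /* assumed startup and per-word times  */
    for (int p = 2; p <= 256; p *= 2) {
        double tp = n * n / p + ts * log2((double)p) + tw * n;  /* Eq. 8.2 */
        printf("p = %3d   TP = %12.0f   S = %6.2f\n", p, tp, n * n / tp);
    }
    return 0;
}

Note how the tw n term stays constant as p grows; this is why efficiency eventually drops, consistent with the Θ(p²) isoefficiency derived above.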

Q6. What is Cannon's algorithm? [02 marks, Unit 4, CO3, Bloom's Level 1]

Answer:
Cannon's algorithm is a distributed algorithm for matrix multiplication on two-dimensional
meshes. It is especially suitable for computers laid out in an N x N mesh. While Cannon's
algorithm works well on homogeneous 2-D grids, extending it to heterogeneous 2-D grids has been
shown to be difficult. The main advantage of the algorithm is that its storage requirements remain
constant and are independent of the number of processors.
Assignment 3
Q.No. | Question | Max. Marks | Unit No. | CO mapped | Bloom's Level
1 | Explain bitonic sort with an example. | 04 | 5 | 3 | 2
2 | List the issues in sorting on parallel computers. | 02 | 5 | 3 | 1
3 | Explain the working of the parallel quicksort algorithm with an example. | 04 | 5 | 4 | 2
4 | Explain the CUDA architecture with a schematic diagram. | 04 | 6 | 3 | 3
5 | Write a short note on the memory hierarchy. | 04 | 6 | 2 | 3

Solution (Assignment 3)

Q1. Explain bitonic sort with an example. [05 marks, Unit 5, CO3, Bloom's Level 2]

Answer:
Bitonic sort is a parallel sorting algorithm which performs O(n log² n) comparisons. Although the
number of comparisons is higher than in other popular sorting algorithms, it performs better in
parallel implementations because elements are compared in a predefined sequence which does not
depend on the data being sorted. The predefined sequence is called a bitonic sequence.
What is a Bitonic Sequence?
In order to understand bitonic sort, we must understand the bitonic sequence. A bitonic sequence is
one in which the elements first come in increasing order and then start decreasing after some
particular index. An array A[0 ... i ... n-1] is called bitonic if there exists an index i such that

A[0] < A[1] < A[2] < ... < A[i-1] < A[i] > A[i+1] > A[i+2] > A[i+3] > ... > A[n-1]

where 0 <= i <= n-1. A rotation of a bitonic sequence is also bitonic.

How to convert a random sequence to a bitonic sequence?
Consider a sequence A[0 ... n-1] of n elements. Start constructing the bitonic sequence in groups of
4 elements: sort the first 2 elements in ascending order and the last 2 elements in descending order,
and concatenate this pair to form a bitonic sequence of 4 elements. Repeat this process for the
remaining pairs of elements, merging in the same alternating fashion, until the whole array is one
bitonic sequence.
After this step, we get the bitonic sequence for the given example sequence as 2, 10, 20, 30, 5, 5, 4, 3.
Bitonic Sorting:
Bitonic sorting mainly involves the following basic steps.
Form a bitonic sequence from the given random sequence, as in the step above. After this step, we
get a sequence whose first half is sorted in ascending order while the second half is sorted in
descending order.
Compare the first element of the first half with the first element of the second half, then the second
element of the first half with the second element of the second half, and so on. Swap the elements
if the element in the second half is found to be smaller.
After the above step, all the elements in the first half are smaller than all the elements in the second
half. The compare-and-swap step yields two sequences of length n/2 each. Repeat the process
performed in the second step recursively on every sequence until we get a single sorted sequence
of length n.
The whole procedure involved in bitonic sort is described in the figure (omitted here); a code
sketch follows below.
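A minimal, plain-C sketch of this procedure (not from the text; n must be a power of two) is given below. Every compare-swap within a merge stage touches a disjoint pair of indices, which is exactly what makes each stage parallelizable on a sorting network or a GPU.

#include <stdio.h>

/* dir = 1 sorts ascending, dir = 0 descending. */
static void compare_swap(int a[], int i, int j, int dir)
{
    if (dir == (a[i] > a[j])) {            /* out of order for this direction */
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

/* Merge a bitonic sequence a[lo..lo+n-1] into sorted order. */
static void bitonic_merge(int a[], int lo, int n, int dir)
{
    if (n > 1) {
        int k = n / 2;
        for (int i = lo; i < lo + k; i++)  /* k independent compare-swaps */
            compare_swap(a, i, i + k, dir);
        bitonic_merge(a, lo, k, dir);
        bitonic_merge(a, lo + k, k, dir);
    }
}

/* Sort by building a bitonic sequence (ascending half + descending half),
   then merging it, exactly as in the description above. */
static void bitonic_sort(int a[], int lo, int n, int dir)
{
    if (n > 1) {
        int k = n / 2;
        bitonic_sort(a, lo, k, 1);         /* first half ascending   */
        bitonic_sort(a, lo + k, k, 0);     /* second half descending */
        bitonic_merge(a, lo, n, dir);
    }
}

int main(void)
{
    int a[8] = {30, 20, 2, 10, 5, 5, 3, 4};
    bitonic_sort(a, 0, 8, 1);
    for (int i = 0; i < 8; i++)
        printf("%d ", a[i]);               /* prints 2 3 4 5 5 10 20 30 */
    printf("\n");
    return 0;
}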
Q2. List the issues in sorting on parallel computers. [05 marks, Unit 5, CO3, Bloom's Level 1]
Answer:
One Element Per Process
Consider the case in which each process holds only one element of the sequence to be sorted. At
some point in the execution of the algorithm, a pair of processes (Pi, Pj) may need to compare their
elements, ai and aj. After the comparison, Pi will hold the smaller and Pj the larger of {ai, aj}.
We can perform the comparison by having both processes send their elements to each other. Each
process compares the received element with its own and retains the appropriate element. In our
example, Pi will keep the smaller and Pj will keep the larger of {ai, aj}. As in the sequential case,
we refer to this operation as compare-exchange. As fig illustrates, each compare-exchange
operation requires one comparison step and one communication step.

Figure: A parallel compare-exchange operation. Processes Pi and Pj send their elements to each other. Process Pi
keeps min{ai, aj}, and Pj keeps max{ai, aj}.
If we assume that processes Pi and Pj are neighbors, and the communication channels are
bidirectional, then the communication cost of a compare-exchange step is (ts + tw), where ts and
tw are message-startup time and per-word transfer time, respectively. In commercially available
message-passing computers, ts is significantly larger than tw, so the communication time is
dominated by ts. Note that in today's parallel computers it takes more time to send an element from
one process to another than it takes to compare the elements. Consequently, any parallel sorting
formulation that uses as many processes as elements to be sorted will deliver very poor performance
because the overall parallel run time will be dominated by interprocess communication.
More than One Element Per Process
A general-purpose parallel sorting algorithm must be able to sort a large sequence with a relatively
small number of processes. Let p be the number of processes P0, P1, ..., Pp-1, and let n be the
number of elements to be sorted. Each process is assigned a block of n/p elements, and all the
processes cooperate to sort the sequence. Let A0, A1, ... A p-1 be the blocks assigned to
processes P0, P1, ..., Pp-1, respectively. We say that Ai ≤ Aj if every element of Ai is less
than or equal to every element in Aj. When the sorting algorithm finishes, each process Pi holds a
set A'i such that A'i ≤ A'j for i ≤ j, and the union of all the sets A'i is the original sequence.
As in the one-element-per-process case, two processes Pi and Pj may have to redistribute their
blocks of n/p elements so that one of them will get the smaller n/p elements and the other will get
the larger n/p elements. Let Ai and Aj be the blocks stored in processes Pi and Pj. If the block
of n/p elements at each process is already sorted, the redistribution can be done efficiently as
follows. Each process sends its block to the other process. Now, each process merges the two sorted
blocks and retains only the appropriate half of the merged block. We refer to this operation of
comparing and splitting two sorted blocks as compare-split. The compare-split operation is
illustrated in figure.
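The compare-split step can be sketched in plain C as below (an illustrative sketch with an assumed block-size bound; in a message-passing setting each process would first exchange its sorted block with its partner, then both would run the same merge and keep opposite halves).

#include <string.h>

#define MAX_BLOCK 64                      /* assumed upper bound on n/p */

/* Merge two sorted blocks of k elements each; 'lo' keeps the smaller k
   elements, 'hi' keeps the larger k elements. */
static void compare_split(int lo[], int hi[], int k)
{
    int merged[2 * MAX_BLOCK];
    int i = 0, j = 0;
    for (int m = 0; m < 2 * k; m++)       /* standard two-way merge */
        if (j >= k || (i < k && lo[i] <= hi[j]))
            merged[m] = lo[i++];
        else
            merged[m] = hi[j++];
    memcpy(lo, merged, k * sizeof(int));       /* lower half */
    memcpy(hi, merged + k, k * sizeof(int));   /* upper half */
}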

Q3. Explain the working of the parallel quicksort algorithm with an example. [04 marks, Unit 5, CO4, Bloom's Level 2]
Answer:
All the algorithms presented so far have worse sequential complexity than that of the lower bound
for comparison-based sorting, Θ(n log n). This section examines the quicksort algorithm, which
has an average complexity of Θ(n log n). Quicksort is one of the most common sorting algorithms
for sequential computers because of its simplicity, low overhead, and optimal average complexity.
Quicksort is a divide-and-conquer algorithm that sorts a sequence by recursively dividing it into
smaller subsequences. Assume that the n-element sequence to be sorted is stored in the array
A[1...n]. Quicksort consists of two steps: divide and conquer. During the divide step, a sequence
A[q...r] is partitioned (rearranged) into two nonempty subsequences A[q...s] and A[s + 1...r] such
that each element of the first subsequence is smaller than or equal to each element of the second
subsequence. During the conquer step, the subsequences are sorted by recursively applying
quicksort. Since the subsequences A[q...s] and A[s + 1...r] are sorted and the first subsequence has
smaller elements than the second, the entire sequence is sorted.
How is the sequence A[q...r] partitioned into two parts, one with all elements smaller than the
other? This is usually accomplished by selecting one element x from A[q...r] and using this
element to partition the sequence A[q...r] into two parts: one with elements less than or equal to x
and the other with elements greater than x. Element x is called the pivot. The quicksort algorithm
is presented below. This algorithm arbitrarily chooses the first element of the sequence
A[q...r] as the pivot. The operation of quicksort is illustrated in the figure.

Figure: Example of the quicksort algorithm sorting a sequence of size n = 8.

Algorithm The sequential quicksort algorithm.


1. procedure QUICKSORT (A, q, r)
2. begin
3. if q < r then
4. begin
5. x := A[q];
6. s := q;
7. for i := q + 1 to r do
8. if A[i] ≤ x then
9. begin
10. s := s + 1;
11. swap(A[s], A[i]);
12. end if
13. swap(A[q], A[s]);
14. QUICKSORT (A, q, s);
15. QUICKSORT (A, s + 1, r);
16. end if
17. end QUICKSORT
The complexity of partitioning a sequence of size k is Θ(k). Quicksort's performance is greatly
affected by the way it partitions a sequence. Consider the case in which a sequence of size k is
split poorly, into two subsequences of sizes 1 and k - 1. The run time in this case is given by the
recurrence relation T(n) = T(n - 1) + Θ(n), whose solution is T(n) = Θ(n²). Alternatively,
consider the case in which the sequence is split well, into two roughly equal-size subsequences of
n/2 elements each. In this case, the run time is given by the recurrence relation T(n) = 2T(n/2)
+ Θ(n), whose solution is T(n) = Θ(n log n). The second split yields an optimal algorithm.
Although quicksort can have O(n²) worst-case complexity, its average complexity is significantly
better; the average number of compare-exchange operations needed by quicksort for sorting a
randomly-ordered input sequence is 1.4n log n, which is asymptotically optimal. There are several
ways to select pivots. For example, the pivot can be the median of a small number of elements of
the sequence, or it can be an element selected at random. Some pivot selection strategies have
advantages over others for certain input sequences.
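The question asks for the parallel version; one common shared-address-space parallelization is sketched below, under the assumption of an OpenMP-capable C compiler. This is an illustrative sketch, not the formulation from the text: it partitions exactly as in the listing above and then sorts the two subsequences concurrently as independent tasks.

#include <omp.h>

static void swap_int(int A[], int i, int j)
{
    int t = A[i]; A[i] = A[j]; A[j] = t;
}

/* Parallel quicksort sketch: the partition step is sequential, but the two
   recursive calls run as concurrent OpenMP tasks. */
static void quicksort_par(int A[], int q, int r)
{
    if (q < r) {
        int x = A[q];                     /* pivot: first element         */
        int s = q;
        for (int i = q + 1; i <= r; i++)  /* partition, as in the listing */
            if (A[i] <= x)
                swap_int(A, ++s, i);
        swap_int(A, q, s);                /* pivot lands at position s    */
        #pragma omp task shared(A)
        quicksort_par(A, q, s - 1);
        #pragma omp task shared(A)
        quicksort_par(A, s + 1, r);
        #pragma omp taskwait
    }
}

/* Typical call site:
     #pragma omp parallel
     #pragma omp single
     quicksort_par(a, 0, n - 1);
*/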

Q4. Explain the CUDA architecture with a schematic diagram. [04 marks, Unit 6, CO3, Bloom's Level 3]

Answer:

CPUs are designed to process as many sequential instructions as quickly as possible. While most
CPUs support threading, creating a thread is usually an expensive operation, and high-end CPUs can
usually make efficient use of no more than about 12 concurrent threads. GPUs, on the other hand, are
designed to process a small number of parallel instructions on large sets of data as quickly as
possible: for instance, calculating 1 million polygons and determining which to draw on the screen
and where. To do this they rely on many slower processors and inexpensive threads.

Physical Architecture: CUDA-capable GPU cards are composed of one or more Streaming

Multiprocessors (SMs), which are an abstraction of the underlying hardware. Each SM has a set of
Streaming Processors (SPs), also called CUDA cores, which share a cache of shared memory that
is faster than the GPU's global memory but that can only be accessed by the threads running on the
SPs of that SM. These streaming processors are the "cores" that execute instructions.

The numbers of SPs/cores in an SM and the number of SMs depend on your device: see the
Finding your Device Specifications section below for details. It is important to realize, however,
that regardless of GPU model, there are many more CUDA cores in a GPU than in a typical
multicore CPU: hundreds or thousands more. For example, the Kepler Streaming Multiprocessor
design, dubbed SMX, contains 192 single-precision CUDA cores, 64 double-precision units, 32
special function units, and 32 load/store units. (See the Kepler Architecture Whitepaper for a
description and diagram.)

CUDA cores are grouped together to perform instructions in what NVIDIA has termed a warp of
threads. Warp simply means a group of threads that are scheduled together to execute the same
instructions in lockstep. All CUDA cards to date use a warp size of 32. Each SM has at least one
warp scheduler, which is responsible for executing 32 threads. Depending on the model of GPU, the
cores may be double- or quadruple-pumped so that they execute one instruction on two or four
threads in as many clock cycles. For instance, Tesla devices use a group of 8 quad-pumped cores to
execute a single warp. If fewer than 32 threads are scheduled in the warp, it will still take as
long to execute the instructions.
The CUDA programmer is responsible for ensuring that the threads are being assigned efficiently
for code that is designed to run on the GPU. The assignment of threads is done virtually in the code
using what is sometimes referred to as a ‘tiling’ scheme of blocks of threads that form a grid.
Programmers define a kernel function that will be executed on the CUDA card using a particular
tiling scheme.
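As a minimal sketch of this idea (assuming a CUDA-capable device and the standard CUDA runtime API; the kernel and function names are illustrative), the kernel below adds two device vectors, and the <<<grid, block>>> launch expresses the tiling of threads into blocks that form a grid:

#include <cuda_runtime.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    /* Each thread computes one element; blockIdx, blockDim and threadIdx
       locate the thread within the grid-of-blocks tiling. */
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  /* guard: the last block may be partly idle */
        c[i] = a[i] + b[i];
}

/* Host-side launch: n threads arranged as blocks of 256 threads each. */
void launch(const float *a, const float *b, float *c, int n)
{
    int threadsPerBlock = 256;  /* a multiple of the warp size 32 */
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocksPerGrid, threadsPerBlock>>>(a, b, c, n);
    cudaDeviceSynchronize();    /* wait for the kernel to finish */
}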
Virtual Architecture
When programming in CUDA C we work with blocks of threads and grids of blocks. What is the
relationship between this virtual architecture and the CUDA card’s physical architecture?
When kernels are launched, each block in a grid is assigned to a Streaming Multiprocessor. This
allows threads in a block to use __shared__ memory. If a block doesn’t use the full resources of
the SM, then multiple blocks may be assigned at once. If all of the SMs are busy, then the extra
blocks will have to wait until an SM becomes free.
Once a block is assigned to an SM, its threads are split into warps by the warp scheduler and
executed on the CUDA cores. Since the same instructions are executed on each thread in the warp
simultaneously, it is generally a bad idea to have conditionals in kernel code. This type of code is
sometimes called divergent: when some threads in a warp are unable to execute the same
instruction as other threads in a warp, those threads are diverged and do no work.
Because a warp's context (its registers, program counter, etc.) stays on chip for the life of the warp,
there is no additional cost to switching between warps versus executing the next step of a given warp.
This allows the GPU to hide some of its memory latency by switching to a new warp
while it waits for a costly read.
CUDA Memory
CUDA on-chip memory is divided into several different regions.
Registers act the same way that registers on CPUs do; each thread has its own set of
registers. Local Memory holds local variables private to each thread. They are not accessible by other
threads even though they use the same L1 and L2 cache as global memory.
Shared Memory is accessible by all threads in a block. It must be declared using the __shared__
modifier. It has a higher bandwidth and lower latency than global memory. However, if multiple
threads request the same address, the requests are processed serially, which slows down the
application.
Constant Memory is read-accessible by all threads and must be declared with the __constant__
modifier. In newer devices there is a separate read-only constant cache.
Global Memory is accessible by all threads. It is the slowest device memory, but on new cards it is
cached. Memory is pulled in 32-, 64-, or 128-byte memory transactions. Warps executing global
memory accesses attempt to pull all the data from global memory simultaneously, so it is
advantageous to use block sizes that are multiples of 32. If multidimensional arrays are used, it is
also advantageous to have the bounds padded so that they are multiples of 32.
Texture/Surface Memory is read-accessible by all threads, but unlike Constant Memory, it is optimized for 2D
spatial locality, and cache hits pull in surrounding values in both x and y directions.
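To make these memory regions concrete, here is a small illustrative kernel (names chosen for this note) that stages data from slow global memory into fast __shared__ memory before combining it; it assumes it is launched with blockDim.x equal to BLOCK:

#include <cuda_runtime.h>

#define BLOCK 256

__global__ void blockSum(const float *in, float *out, int n)
{
    __shared__ float tile[BLOCK];      /* shared: visible to the whole block */
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    /* Each thread loads one element from global memory into shared memory. */
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                   /* wait until the whole tile is loaded */

    /* Tree reduction inside the block, entirely in shared memory. */
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];     /* one global write per block */
}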
Q.5. Write a short note on Memory Hierarchy. (Max. Marks: 05, Unit: 6, CO: 2, Bloom's Taxonomy Level: 3)

Answer:
The memory in a computer can be divided into five hierarchies based on speed as well as use.
The processor moves from one level to another based on its requirements. The five hierarchies in
the memory are registers, cache, main memory, magnetic discs, and magnetic tapes. The first three
hierarchies are volatile memories, which means that when power is lost they automatically lose
their stored data, whereas the last two hierarchies are non-volatile, which means they store the
data permanently.
A memory element is a set of storage devices that stores binary data in the form of bits. In
general, memory storage can be classified into two categories: volatile and
non-volatile.
Memory Hierarchy in Computer Architecture
The memory hierarchy design in a computer system mainly includes different storage devices.
Most computers are built with extra storage so they can run more powerfully beyond the main
memory capacity. The following memory hierarchy diagram is a hierarchical pyramid for computer
memory. The memory hierarchy design is divided into two types: primary
(internal) memory and secondary (external) memory.
Memory Hierarchy

Primary Memory:
The primary memory is also known as internal memory, and it is directly accessible by the
processor. This memory includes main memory, cache, and CPU registers.
Secondary Memory:
The secondary memory is also known as external memory, and this is accessible by the processor
through an input/output module. This memory includes an optical disk, magnetic disk, and magnetic
tape.

Characteristics of Memory Hierarchy


The memory hierarchy characteristics mainly include the following.
Performance:
Previously, computer systems were designed without a memory hierarchy, and the speed gap
between the main memory and the CPU registers grew because of the huge disparity in
access time, which lowered the performance of the system. Enhancement was therefore
mandatory, and the memory hierarchy model was designed to increase the system's
performance.
Capacity:
The capacity of the memory hierarchy is the total amount of data the memory can store.
Whenever we move from top to bottom inside the memory hierarchy, the capacity increases.
Access Time:
The access time in the memory hierarchy is the interval between a request to read or write and
the moment the data becomes available. Whenever we move from top to bottom inside the
memory hierarchy, the access time increases.

Q.6. What is bubble sort and its variants? (Max. Marks: 02, Unit: 6, CO: 2, Bloom's Taxonomy Level: 1)
Answer:
Bubble sort is a simple comparison-based sorting algorithm. It makes repeated passes over the
sequence, comparing each pair of adjacent elements and swapping them if they are out of order;
after every pass the largest remaining element "bubbles up" to its final position, so sorting n
elements requires n - 1 passes and Θ(n²) comparisons in the worst case. Its main variants are
odd-even transposition sort, which alternates phases of compare-exchanges on (even, odd) and
(odd, even) indexed pairs of neighbours and is therefore well suited to parallelization, and
shellsort, which first compare-exchanges elements that are far apart and progressively reduces
the distance between compared elements.
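A minimal serial C sketch of bubble sort and its odd-even transposition variant (illustrative only; the parallel formulation of the latter assigns the compare-exchanges within one phase to different processes):

#include <stdio.h>

static void cmpSwap(int *a, int *b) { if (*a > *b) { int t = *a; *a = *b; *b = t; } }

/* Classic bubble sort: n - 1 passes of adjacent compare-exchanges. */
void bubbleSort(int A[], int n)
{
    for (int pass = 0; pass < n - 1; pass++)
        for (int i = 0; i < n - 1 - pass; i++)
            cmpSwap(&A[i], &A[i + 1]);
}

/* Odd-even transposition sort: each phase touches disjoint pairs, so
   every compare-exchange within one phase could run in parallel. */
void oddEvenSort(int A[], int n)
{
    for (int phase = 0; phase < n; phase++)
        for (int i = (phase % 2 == 0) ? 0 : 1; i + 1 < n; i += 2)
            cmpSwap(&A[i], &A[i + 1]);
}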
Question Bank
Unit -I
Q.1.What are the applications of Parallel Computing?
Q.2.What is the basic working principle of a VLIW Processor?
Q.3.Explain the control structure of Parallel platforms in detail.
Q.4.Explain the basic working principle of a Superscalar Processor.
Q.5.What are the limitations of Memory System Performance?
Q.6.Explain SIMD, MIMD & SIMT Architecture.
Q.7.What are the types of Dataflow Execution model?
Q.8.Write a short note on UMA, NUMA & Levels of parallelism.
Q.9.Explain cache coherence in multiprocessor systems.
Q.10.Explain N-wide Superscalar Architecture.
Q.11.Explain interconnection networks with their types.
Q.12.Write a short note on Communication Cost in Parallel machines.
Q.13.Compare Write Invalidate and Write Update protocols.

Unit-II
Q.1.Explain decomposition, Tasks & Dependency graphs.
Q.2.Explain Granularity, Concurrency & Task interaction.
Q.3.Explain decomposition techniques with their types.
Q.4.What are the characteristics of Tasks and Interactions?
Q.5.Explain the Mapping techniques in detail.
Q.6.Explain the parallel Algorithm Models.
Q.7.Explain Thread Organization.
Q.8.Write a short note on IBM CBE.
Q.9.Explain the history of GPUs and the NVIDIA Tesla GPU.

Unit-III
Q.1.Explain Broadcast & Reduce operations with the help of a diagram.
Q.2.Explain One-to-all broadcast and reduction on a Ring.
Q.3.Explain the operation of All-to-one broadcast & reduction on a ring.
Q.4.Write pseudo code for the One-to-all broadcast algorithm on a hypercube with different cases.
Q.5.Explain All-to-all broadcast & reduction on Linear array, Mesh and Hypercube
topologies.
Q.6.Explain Scatter and Gather Operations.
Q.7.Write a short note on Circular shift on Mesh and Hypercube.
Q.8.Explain different approaches of Communication operations.
Q.9.Explain all-to-all personalized communication.
Unit-IV

Q.1.Explain the Sources of Overhead in Parallel Programs.

Q.2.Write a short note on Performance Metrics for Parallel Systems.
Q.3.Describe the Effect of Granularity on Performance in Parallel Systems.
Q.4.Write a short note on Scalability in Parallel Systems.
Q.5.Explain Minimum execution time and Minimum cost-optimal execution time.
Q.6.Explain Dense Matrix Algorithms with 1-D and 2-D Partitioning.
Q.7.Explain Cannon's Algorithm.

Unit-V

Q.1. Explain Bitonic sort with an example.

Q.2. Explain Cannon's algorithm to multiply 2 dense matrices.
Q.3. Write a short note on odd-even transposition sort and odd-even shellsort.
Q.4. Write a short note on issues in sorting on parallel computers.
Q.5. Explain the DNS algorithm to multiply 2 dense matrices.
Q.6. Explain the working of the parallel quicksort algorithm with an example.
Q.7. Explain matrix-vector multiplication using row-wise 1-D partitioning.
Q.8. Write a short note on
i) Bandwidth limitations
ii) Latency limitations

Unit-VI

Q.1.Explain CUDA Architecture with a Schematic Diagram.

Q.2.Describe the applications of CUDA.
Q.3.Explain Heterogeneous System Architecture & the Heterogeneous computing paradigm.
Q.4.Explain the Processor Architecture for heterogeneous computing.
Q.5.Explain the CUDA Programming model for HPC Architecture.
Q.6.Write a short note on the GPU Programming Model.
Q.7.Write a short note on Memory Hierarchy.
Q.8.Explain CUDA Memory and Cache architecture.
Q.9.How does parameter passing take place in a CUDA kernel? Explain with code.
Q.10.How is GPU Memory managed?
Q.11.Write a short note on CUDA C Programming.
CLASS TEST- I
(AY 2018-19)
Branch: Computer Engineering (BE) Date:
Semester: V Duration: 1 hour
Subject: High Performance Computing - 410241 Max. Marks: 20M

Note: 1. Attempt all questions in Section A 2. Attempt any 3 questions in Section B


3. All questions are as per course outcomes 4. Assume suitable data wherever is required.

Section A
01. The throughput of a super scalar processor is _______ (Max. Marks: 01, Unit: 01, CO: 02, Bloom's Level: 02)
a) less than 1
b) 1
c) More than 1
d) Not Known
02. When the processor executes multiple instructions at a time it is said to use _______ (Max. Marks: 01, Unit: 01, CO: 02, Bloom's Level: 02)
a) single issue
b) Multiplicity
c) Visualization
d) Multiple issues
03. When the processor executes multiple instructions at a time it is said to use _______ (Max. Marks: 01, Unit: 01, CO: 01, Bloom's Level: 02)
a) single issue
b) Multiplicity
c) Visualization
d) Multiple issues
04. Which of the following is the informal name of the address register for memory operations? (Max. Marks: 01, Unit: 02, CO: 01, Bloom's Level: 01)
1. Storage register
2. Memory address register
3. Instruction register
4. Microinstruction register
05. Which of the following is included in a generic vector operation? (Max. Marks: 02, Unit: 05, CO: 02, Bloom's Level: 02)
1. Arithmetic operation
2. Logical operation
3. Both (1) and (2)
4. None of the above
Section B
01. Explain SIMD, MIMD & SIMT Architecture. (Max. Marks: 05, Unit: 01, CO: CO1, Bloom's Level: 2)
02. Write a short note on UMA and NUMA. (Max. Marks: 05, Unit: 01, CO: CO2, Bloom's Level: 2)
03. What is a Parallel system? What are the sources of overhead in parallel programs? (Max. Marks: 05, Unit: 02, CO: CO1, Bloom's Level: 1)
04. Explain decomposition, Task & Dependency graph. (Max. Marks: 05, Unit: 02, CO: CO2, Bloom's Level: 2)

Section A

Q1.The throughput of a super scalar processor is _______


a) less than 1
b) 1
c) More than 1
d) Not Known
Ans:More than 1

Q2.When the processor executes multiple instructions at a time it is said to use _______
a) single issue
b) Multiplicity
c) Visualization
d) Multiple issues
Ans:Multiple issues

Q3.When the processor executes multiple instructions at a time it is said to use _______
a) single issue
b) Multiplicity
c) Visualization
d) Multiple issues
Ans:Multiple issues

Q4. Which of the following is the informal name of the address register for memory operations?
1. Storage register
2. Memory address register
3. Instruction register
4. Microinstruction register

Ans:Memory address register

Q5. Which of the following is included in a generic vector operation?

1. Arithmetic operation
2. Logical operation
3. Both (1) and (2)
4. None of the above

Ans:Both (1) and (2)


Section B
Q1.Explain SIMD, MIMD & SIMT Architecture.

Ans:SISD (Single Instruction, Single Data stream)

Single Instruction, Single Data (SISD) refers to an Instruction Set Architecture in which a single processor (one
CPU) executes exactly one instruction stream at a time and also fetches or stores one item of data at a time to
operate on data stored in a single memory unit. Most CPU designs, from the beginning until recent times, are
based on the von Neumann architecture and hence on SISD. The SISD model is a typical non-pipelined
architecture with general-purpose registers, as well as dedicated special registers such as the
Program Counter (PC), the Instruction Register (IR), Memory Address Registers (MAR) and Memory Data
Registers (MDR).

SIMD (Single Instruction, Multiple Data streams)

Single Instruction, Multiple Data (SIMD) is an Instruction Set Architecture that has a single control unit (CU)
and more than one processing unit (PU). It operates like a von Neumann machine by executing a single
instruction stream over the PUs, handled through the CU. The CU generates the control signals for all of the
PUs, through which it executes the same operation on different data streams. The SIMD architecture, in effect,
is capable of achieving data-level parallelism, just as with a vector processor.

Some of the examples of the SIMD based systems include IBM's AltiVec and SPE for PowerPC, HP's PA-RISC
Multimedia Acceleration eXtensions (MAX), Intel's MMX and iwMMXt, SSE, SSE2, SSE3 and SSSE3,
AMD's 3DNow! etc.

MISD (Multiple Instruction, Single Data stream)

Multiple Instruction, Single Data (MISD) is an Instruction Set Architecture for parallel computing where many
functional units perform different operations by executing different instructions on the same data set. This type
of architecture is common mainly in fault-tolerant computers executing the same instructions redundantly in
order to detect and mask errors.

MIMD (Multiple Instruction, Multiple Data streams)

Multiple Instruction, Multiple Data (MIMD) machines have a number of processors that function
asynchronously and independently: at any time, different processors may be executing different instructions
on different pieces of data. Shared-memory multiprocessors and distributed-memory clusters are both MIMD
systems.

SIMT (Single Instruction, Multiple Threads)

Single Instruction, Multiple Threads (SIMT) is the execution model used by NVIDIA GPUs, in which a single
instruction is issued to a group (warp) of threads that execute it in lockstep on different data; it combines
SIMD-style execution with the familiar thread-programming model.
Q2.Write a short note on UMA and NUMA

Ans: UMA (Uniform Memory Access) is a shared-memory architecture for multiprocessors. In this
model, a single memory is used and accessed by all the processors present in the multiprocessor system with
the help of the interconnection network. Each processor has equal memory access time (latency) and access
speed. It can employ a single bus, multiple buses, or a crossbar switch. As it provides balanced shared
memory access, it is also known as an SMP (Symmetric Multiprocessor) system.

NUMA (Non-Uniform Memory Access) is a shared-memory architecture in which each processor has a
portion of the shared memory attached locally. Access to a processor's own local memory is fast, while access
to memory attached to other processors goes through the interconnection network and is slower; memory
access time therefore depends on where the data resides.

Q3. What is a Parallel system? What are the sources of overhead in parallel programs?

Ans:
A parallel system is the combination of an algorithm and the parallel architecture on which it is
implemented. The main sources of overhead in parallel programs are:
Interprocess Interaction: Any nontrivial parallel system requires its processing elements
to interact and communicate data (e.g., intermediate results). The time spent communicating data
between processing elements is usually the most significant source of parallel processing overhead.
Idling: Processing elements in a parallel system may become idle due to many reasons such as load
imbalance, synchronization, and presence of serial components in a program.
Excess Computation: The difference in computation performed by the parallel program and the best
serial program is the excess computation overhead incurred by the parallel program.

Q4. Explain decomposition, Task & Dependency graph

Ans:The process of dividing a computation into smaller parts, some or all of which may potentially be
executed in parallel, is called decomposition. Tasks are programmer-defined units of computation
into which the main computation is subdivided by means of decomposition. Simultaneous execution of
multiple tasks is the key to reducing the time required to solve the entire problem. Tasks can be of
arbitrary size, but once defined, they are regarded as indivisible units of computation. The tasks into
which a problem is decomposed may not all be of the same size.

Example Dense matrix-vector multiplication

Consider the multiplication of a dense n x n matrix A with a vector b to yield another vector y.
The ith element y[i] of the product vector is the dot-product of the ith row of A with the input
vector b; i.e., y[i] = Σj A[i, j] · b[j]. As shown later in the figure, the computation of each y[i] can be
regarded as a task. Alternatively, as shown later in the figure, the computation could be decomposed into
fewer, say four, tasks where each task computes roughly n/4 of the entries of the vector y.

Figure: Decomposition of dense matrix-vector multiplication into n tasks, where n is the number of rows in the matrix.
The portions of the matrix and the input and output vectors accessed by Task 1 are highlighted.

Note that all tasks in figure are independent and can be performed all together or in any sequence.
However, in general, some tasks may use data produced by other tasks and thus may need to wait for
these tasks to finish execution. An abstraction used to express such dependencies among tasks and
their relative order of execution is known as a task-dependency graph. A task-dependency graph is a
directed acyclic graph in which the nodes represent tasks and the directed edges indicate the
dependencies amongst them. The task corresponding to a node can be executed when all tasks
connected to this node by incoming edges have completed. Note that task-dependency graphs can be
disconnected and the edge-set of a task-dependency graph can be empty. This is the case for matrix-
vector multiplication, where each task computes a subset of the entries of the product vector.
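To make the n-task decomposition concrete, the loop below shows it as plain C (a serial sketch; the function and variable names are chosen for this note). Each iteration of the outer loop is one task:

/* Task i computes y[i] = dot(row i of A, b). The tasks share read-only
   data (A and b) but write disjoint outputs, so all n tasks are
   independent: the task-dependency graph has an empty edge set. */
void mat_vec_tasks(int n, const double *A, const double *b, double *y)
{
    for (int i = 0; i < n; i++) {      /* each iteration is one task */
        double dot = 0.0;
        for (int j = 0; j < n; j++)
            dot += A[i * n + j] * b[j];
        y[i] = dot;
    }
}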
CLASS TEST- II
(AY 2018-19)
Branch: Computer Engineering (BE) Date:
Semester: V Duration: 1 hour
Subject: High Performance Computing - 410241 Max. Marks: 20M

Note: 1. Attempt all questions in Section A 2. Attempt any 3 questions in Section B


3. All questions are as per course outcomes 4. Assume suitable data wherever is required.

Section A
01. CUDA means ------------ (Max. Marks: 01, Unit: 04, CO: 05, Bloom's Level: 02)
a) Compute Unified Device Architecture
b) Computer Unified Device Architecture
c) Complicated Unified Device Algorithm
d) None of these
02. Data decomposition is used for (Max. Marks: 01, Unit: 04, CO: 02, Bloom's Level: 03)
a) faster speed of operation
b) Parallel computing
c) Recursive function
d) None of the above
03. Dense Matrix Algorithm (Max. Marks: 01, Unit: 05, CO: 04, Bloom's Level: 01)
a) used in run time performance
b) Two-dimensional block
c) None of the mentioned
04. Parallel systems are used for (Max. Marks: 01, Unit: 04, CO: 02, Bloom's Level: 02)
a) small problems
b) larger configuration
c) complex systems
d) none
05. IPC stands for (Max. Marks: 01, Unit: 05, CO: 01, Bloom's Level: 03)
a) inter-process communication
b) Intercommunication
c) In Process handler control
d) none of these
Section B
01. What are the different partitioning techniques used in matrix-vector multiplication? (Max. Marks: 05, Unit: 05, CO: CO3, Bloom's Level: 2)
02. How does the search overhead factor work? (Max. Marks: 05, Unit: 05, CO: CO2, Bloom's Level: 2)
03. What are Green Computing and Optical Computing? (Max. Marks: 04, Unit: 04, CO: CO1, Bloom's Level: 1)
04. Describe Cannon's Algorithm for Matrix multiplication with a suitable example. (Max. Marks: 05, Unit: 05, CO: CO4, Bloom's Level: 2)
Section A

Q1.CUDA means ------------


a) Compute Unified Device Architecture b) Computer Unified Device Architecture
c) Complicated Unified Device Algorithm d) none of these

Ans: Compute Unified Device Architecture

Q2. Data decomposition is used for

a) faster speed of operation
b) Parallel computing
c) Recursive function
d) None of the above
Ans: Parallel computing

Q3.Dense Matrix Algorithm


a) used in run time performance b) Two-dimensional block
c) None of the mentioned
Ans: Two-dimensional block

Q4. Parallel systems are used for


a)for small problems
b)larger configuration
c) complex system
d) none

Ans:complex system

Q5.IPC stands for


a) inter process communication
b) Intercommunication
c) In Process handler control
d) none of these

Ans: inter process communication

Section B
Q1. What are the different partitioning techniques used in matrix-vector multiplication?

Ans: This section addresses the problem of multiplying a dense n x n matrix A with an n x 1 vector
x to yield the n x 1 result vector y. The algorithm below shows a serial algorithm for this problem. The
sequential algorithm requires n² multiplications and additions. Assuming that a multiplication and
addition pair takes unit time, the sequential run time is

W = n²

At least three distinct parallel formulations of matrix-vector multiplication are possible, depending on
whether rowwise 1-D, columnwise 1-D, or 2-D partitioning is used.
Algorithm: A serial algorithm for multiplying an n x n matrix A with an n x 1 vector x to yield
an n x 1 product vector y.

1. procedure MAT_VECT (A, x, y)
2. begin
3. for i := 0 to n - 1 do
4. begin
5. y[i] := 0;
6. for j := 0 to n - 1 do
7. y[i] := y[i] + A[i, j] × x[j];
8. endfor;
9. end MAT_VECT
Rowwise 1-D Partitioning

This section details the parallel algorithm for matrix-vector multiplication using rowwise block 1-D
partitioning. The parallel algorithm for columnwise block 1-D partitioning is similar and
has a similar expression for parallel run time. The figure describes the distribution and movement of data
for matrix-vector multiplication with block 1-D partitioning.

2-D Partitioning

This section discusses parallel matrix-vector multiplication for the case in which the matrix is
distributed among the processes using block 2-D partitioning. The figure shows the distribution of the
matrix and the distribution and movement of vectors among the processes.
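A hedged MPI sketch of the rowwise 1-D formulation (assumptions: n is divisible by the number of processes p, and the buffer and function names are illustrative). Each process stores n/p rows of A and n/p entries of x, gathers the full input vector, and computes its n/p entries of y:

#include <mpi.h>
#include <stdlib.h>

/* local_A: this process's n/p rows of A (row-major);
   local_x: its n/p entries of x; local_y: its n/p entries of the result. */
void mat_vec_rowwise(int n, const double *local_A, const double *local_x,
                     double *local_y, MPI_Comm comm)
{
    int p;
    MPI_Comm_size(comm, &p);
    int nlocal = n / p;

    /* All-to-all broadcast: every process obtains the full vector x. */
    double *x = malloc(n * sizeof(double));
    MPI_Allgather(local_x, nlocal, MPI_DOUBLE, x, nlocal, MPI_DOUBLE, comm);

    /* Each process multiplies its n/p rows with the full vector. */
    for (int i = 0; i < nlocal; i++) {
        local_y[i] = 0.0;
        for (int j = 0; j < n; j++)
            local_y[i] += local_A[i * n + j] * x[j];
    }
    free(x);
}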
Q2. How does the search overhead factor work?

Ans: Parallel search algorithms incur overhead from several sources. These include communication
overhead, idle time due to load imbalance, and contention for shared data structures. Thus, if both the
sequential and parallel formulations of an algorithm do the same amount of work, the speedup of
parallel search on p processors is less than p. However, the amount of work done by a parallel
formulation is often different from that done by the corresponding sequential formulation because they
may explore different parts of the search space.

Let W be the amount of work done by a single processor, and Wp be the total amount of work done by
p processors. The search overhead factor of the parallel system is defined as the ratio of the work done
by the parallel formulation to that done by the sequential formulation, or Wp/W. Thus, the upper bound
on speedup for the parallel system is given by p x (W/Wp). The actual speedup, however, may be less
due to other parallel processing overhead. In most parallel search algorithms, the search overhead
factor is greater than one. However, in some cases, it may be less than one, leading to superlinear
speedup. If the search overhead factor is less than one on the average, then it indicates that the serial
search algorithm is not the fastest algorithm for solving the problem.

To simplify our presentation and analysis, we assume that the time to expand each node is the same,
and W and Wp are the number of nodes expanded by the serial and the parallel formulations,
respectively. If the time for each expansion is tc, then the sequential run time is given by TS = tcW.
In the remainder of the chapter, we assume that tc = 1. Hence, the problem size W and the serial run
time TS become the same.

Q3. What are Green Computing and Optical Computing?


Ans:Green Computing:

The relatively new research field of green computing pursues energy conservation not just as a
commercial advantage, (longer battery life, less weight), but as an environmental goal in itself. Some
of the green computing topics studied at Stanford include long-term trends in energy-efficient
computing, resource management in large multi-core systems, and data center economics and best
practices. Stanford engineers are developing low-power wireless networks, tiny semiconductor lasers
for low-energy data interconnects, nano-sized electromechanical relays for ultra-low power
computation, and an image and signal processor 20 times more power efficient than conventional
processors. They are also working on circuit, architecture and application optimization tools;
nanomaterials for energy-efficient transistors, data storage and integrated circuits; and efficient
networks for homes and offices.

Optical computing: Computers have enhanced human life to a great extent. The speed of conventional
computers is achieved by miniaturizing electronic components to a very small micron-size scale so
that those electrons need to travel only very short distances within a very short time. The goal of
improving computer speed has resulted in the development of Very Large Scale Integration
(VLSI) technology, with smaller device dimensions and greater complexity. The smallest-to-date
dimensions of VLSI reached 0.08 µm, achieved by researchers at Lucent Technology. Whereas VLSI
technology has revolutionized the electronics industry and established the 20th century as the
computer age, increasing usage of the Internet demands better accommodation of a 10 to 15 percent
per month growth rate. Additionally, our daily lives demand solutions to increasingly sophisticated
and complex problems, which requires more speed and better performance of computers.

For these reasons, it is unfortunate that VLSI technology is approaching its fundamental limits in the
sub-micron miniaturization process. It is now possible to fit up to 300 million transistors on a single
silicon chip. It is also estimated that the number of transistor switches that can be put onto a chip
doubles every 18 months. Further miniaturization of lithography introduces several problems such as
dielectric breakdown, hot carriers, and short-channel effects. All of these factors combine to seriously
degrade device reliability. Even if developing technology succeeded in temporarily overcoming these
physical problems, we will continue to face them as long as the demand for higher integration
continues to increase. Therefore, a dramatic solution to the problem is needed, and unless we gear our
thoughts toward a totally different pathway, we will not be able to further improve our computer
performance in the future.

Optical interconnections and optical integrated circuits will provide a way out of these limitations to
computational speed and complexity inherent in conventional electronics. Optical computers will use
photons traveling on optical fibers or thin films instead of electrons to perform the appropriate
functions. In the optical computer of the future, electronic circuits and wires will be replaced by a few
optical fibers and films, making the systems more efficient with no interference, more cost effective,
lighter and more compact. Optical components would not need insulators like those needed
between electronic components because they do not experience crosstalk. Indeed, multiple frequencies
(or different colors) of light can travel through optical components without interfering with each
other, allowing photonic devices to process multiple streams of data simultaneously.

Q4. Describe Cannon's Algorithm for Matrix multiplication with a suitable example.

Ans:Cannon's algorithm is a distributed algorithm for matrix multiplication for two-


dimensional meshes. It is especially suitable for computers laid out in an N × N mesh. While Cannon's
algorithm works well on homogeneous 2D grids, extending it to heterogeneous 2D grids has been
shown to be difficult. The main advantage of the algorithm is that its storage requirements remain
constant and are independent of the number of processors.

Cannon’s algorithm

 Consider two n × n matrices A and B partitioned into p blocks.
 A(i, j) and B(i, j) (0 ≤ i, j < √p) of size (n ∕ √p) × (n ∕ √p) each.
 Process P(i, j) initially stores A(i, j) and B(i, j) and computes block C(i, j) of the result matrix.
 The initial step of the algorithm regards the alignment of the matrices.
 Align the blocks of A and B in such a way that each process can independently start multiplying
its local submatrices.
 This is done by shifting all submatrices A(i, j) to the left (with wraparound) by i steps and all
submatrices B(i, j) up (with wraparound) by j steps.
 Perform local block multiplication.
 Each block of A moves one step left and each block of B moves one step up (again with
wraparound).
 Perform next block multiplication, add to partial result, repeat until all blocks have been
multiplied.
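The communication pattern above can be sketched in MPI as follows. This is a hedged sketch: it assumes a periodic √p × √p Cartesian communicator created with MPI_Cart_create, that √p divides n, and a block-multiply helper matmul_add that is declared but not shown.

#include <mpi.h>

/* Hypothetical helper: c += a * b for nb x nb blocks (not shown here). */
void matmul_add(int nb, const double *a, const double *b, double *c);

/* a, b: local nb x nb blocks of A and B; c: local block of C, pre-zeroed.
   comm2d: periodic sqrt(p) x sqrt(p) grid built with MPI_Cart_create. */
void cannon(double *a, double *b, double *c, int nb, MPI_Comm comm2d)
{
    int coords[2], dims[2], periods[2];
    int left, right, up, down, src, dst;
    MPI_Status st;
    const int cnt = nb * nb;

    MPI_Cart_get(comm2d, 2, dims, periods, coords);

    /* Steady-state neighbours: A moves one step left, B one step up. */
    MPI_Cart_shift(comm2d, 1, -1, &right, &left);
    MPI_Cart_shift(comm2d, 0, -1, &down, &up);

    /* Initial alignment: shift A(i,j) left by i and B(i,j) up by j. */
    MPI_Cart_shift(comm2d, 1, -coords[0], &src, &dst);
    MPI_Sendrecv_replace(a, cnt, MPI_DOUBLE, dst, 1, src, 1, comm2d, &st);
    MPI_Cart_shift(comm2d, 0, -coords[1], &src, &dst);
    MPI_Sendrecv_replace(b, cnt, MPI_DOUBLE, dst, 1, src, 1, comm2d, &st);

    /* sqrt(p) rounds of multiply-then-shift; the final realignment of
       the A and B blocks back to their original owners is omitted. */
    for (int step = 0; step < dims[0]; step++) {
        matmul_add(nb, a, b, c);
        MPI_Sendrecv_replace(a, cnt, MPI_DOUBLE, left, 1, right, 1, comm2d, &st);
        MPI_Sendrecv_replace(b, cnt, MPI_DOUBLE, up, 1, down, 1, comm2d, &st);
    }
}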
Prelim Exam (AY 2018-19)
Branch:BE Date:09/10/2018
Semester: I Duration: 2:30 hour
Subject:High Performance Computing (2012 Pattern)
Max. Marks: 70
Note: (1) Answer Q. 1 or Q. 2, Q. 3 or Q. 4, Q. 5 or Q. 6, Q. 7 or Q. 8, Q. 9 or Q. 10.
(2) Figures to the right indicate full marks.
(3) Neat diagrams must be drawn wherever necessary.
(4) Assume suitable data, if necessary
Questions (Max. Marks, CO Mapped, Bloom's Taxonomy Level)
Q.1 a. Explain SIMD, MIMD and SIMT architecture 05 CO3 4
b. Explain the basic working principle of a Superscalar Processor 05 CO1 1
OR
Q.2 a. State the difference between write-invalidate and write-update Protocols 05 CO2 1
b. Write a S.N. on the Dataflow Model 05 CO1 2
Q.3 a. Explain how pthread_mutex_trylock reduces locking overhead 05 CO1 3
b. Explain graph partitioning with a suitable example 05 CO3 4
OR
Q.4 a. Write a S.N. on the NVIDIA Tesla GPU 05 CO3 3
b. Describe barrier synchronization for the shared address space model 05 CO1 3
Q.5 a. Explain MPI Routines 08 CO3 3
b. Write pseudo code for Parallel Quick Sort 08 CO1 4
OR
Q.6 a. Explain Sorting networks with a suitable example 08 CO1 2
b. Write a S.N. on Topologies and Embedding 08 CO2 3
Q.7 a. Explain non-blocking communication using MPI 4*2 CO2 4
b. Explain Recursive decomposition 08 CO1 2
OR
Q.8 a. Explain parallel depth-first search 08 CO3 1
b. Share thoughts about the MAKE IN INDIA initiative 08 CO1 4
Q.9 a. Define the term HPC and elaborate its use in Indian society 09 CO2 3
b. What is the Search Overhead Factor? 09 CO2 3
OR
Q.10 a. Explain Power-aware Processing 09 CO3 4
b. Explain Quantum Computing with a suitable example 09 CO2 2
Q.1 a. Explain SIMD, MIMD and SIMT architecture [5]
Answer: SISD (Single Instruction, Single Data stream)
Single Instruction, Single Data (SISD) refers to an Instruction Set Architecture in which a single
processor (one CPU) executes exactly one instruction stream at a time and also fetches or stores one item
of data at a time to operate on data stored in a single memory unit. Most CPU designs, from the
beginning until recent times, are based on the von Neumann architecture and hence on SISD. The SISD
model is a typical non-pipelined architecture with general-purpose registers, as well as dedicated special
registers such as the Program Counter (PC), the Instruction Register (IR), Memory Address Registers
(MAR) and Memory Data Registers (MDR).

SIMD (Single Instruction, Multiple Data streams)

Single Instruction, Multiple Data (SIMD) is an Instruction Set Architecture that has a single control unit
(CU) and more than one processing unit (PU). It operates like a von Neumann machine by executing a
single instruction stream over the PUs, handled through the CU. The CU generates the control signals for
all of the PUs, through which it executes the same operation on different data streams. The SIMD
architecture, in effect, is capable of achieving data-level parallelism, just as with a vector processor.

Some of the examples of the SIMD based systems include IBM's AltiVec and SPE for PowerPC, HP's
PA-RISC Multimedia Acceleration eXtensions (MAX), Intel's MMX and iwMMXt, SSE, SSE2, SSE3
and SSSE3, AMD's 3DNow! etc.

MISD (Multiple Instruction, Single Data stream)

Multiple Instruction, Single Data (MISD) is an Instruction Set Architecture for parallel computing where
many functional units perform different operations by executing different instructions on the same data set.
This type of architecture is common mainly in fault-tolerant computers executing the same
instructions redundantly in order to detect and mask errors.

b) Explain the basic working principle of a Superscalar Processor [5]
Answer:Consider a processor with two pipelines and the ability to simultaneously issue two
instructions. These processors are sometimes also referred to as super-pipelined processors. The
ability of a processor to issue multiple instructions in the same cycle is referred to as superscalar
execution. Since the architecture illustrated in the following figure allows two issues per clock
cycle, it is also referred to as two-way superscalar or dual issue execution. Consider the
execution of the first code fragment in the figure for adding four numbers. The first and second
instructions are independent and therefore can be issued concurrently. This is illustrated in the
simultaneous issue of the instructions load R1, @1000 and load R2, @1008 at t = 0. The
instructions are fetched, decoded, and the operands are fetched. The next two instructions, add
R1, @1004 and add R2, @100C are also mutually independent, although they must be executed
after the first two instructions. Consequently, they can be issued concurrently at t = 1 since the
processors are pipelined. These instructions terminate at t = 5. The next two instructions, add R1,
R2 and store R1, @2000 cannot be executed concurrently since the result of the former (contents
of register R1) is used by the latter. Therefore, only the add instruction is issued at t = 2 and the
store instruction at t = 3. Note that the instruction add R1, R2 can be executed only after the
previous two instructions have been executed.

Q.2 a. State the difference between write-invalidate and write-update Protocols
Answer: The performance differences between write update and write invalidate protocols arise from
three characteristics:
1. Multiple writes to the same word with no intervening reads require multiple write broadcasts in an
update protocol, but only one initial invalidation in a write invalidate protocol.
2. With multiword cache blocks, each word written in a cache block requires a write broadcast in an
update protocol, although only the first write to any word in the block needs to generate an invalidate in
an invalidation protocol. An invalidation protocol works on cache blocks, while an update protocol must
work on individual words (or bytes, when bytes are written). It is possible to try to merge writes in a
write broadcast scheme.
3. The delay between writing a word in one processor and reading the written value in another processor
is usually less in a write update scheme, since the written data are immediately updated in the reader's
cache.

b. Write S.N on Dataflow Model


Answer: i) The dataflow model of computation is based on the graphical representation of
programs in which operations are represented by nodes and arcs are used to represent
dependencies.
ii) The parallel processing activities under the dataflow model of computing provide various
properties to solve computing problems in efficient ways. Some of the properties associated
with the dataflow computing model are:
Asynchronous Execution:
The dataflow model of computation is asynchronous by nature. This means an instruction can
only execute if all its required operands are available. This provides implicit
synchronization for parallel activities in the dataflow computing model.
No Sequencing:
In the dataflow model of computing, instructions need not execute in any sequential
fashion. The dataflow model is based only on the data dependencies in the program, and beyond
this no sequencing is expected.
Dataflow Representation:
Since no sequencing is required in the dataflow model of computing, a dataflow
representation of a program becomes possible. The dataflow representation of a program permits
the use of all forms of parallel program execution without the assistance of any explicit tools of
parallel execution.
Q.3 a. Explain how pthread_mutex_trylock reduces locking overhead
Answer:Syntax:

#include <pthread.h>

int pthread_mutex_trylock(pthread_mutex_t *mutex);


The pthread_mutex_trylock() function attempts to acquire ownership of the mutex specified


without blocking the calling thread. If the mutex is currently locked by another thread, the call to
pthread_mutex_trylock() returns an error of EBUSY.

A failure of EDEADLK indicates that the mutex is already held by the calling thread.

Mutex initialization using the PTHREAD_MUTEX_INITIALIZER does not immediately


initialize the mutex. Instead, on first use, pthread_mutex_timedlock_np() or
pthread_mutex_lock() or pthread_mutex_trylock() branches into a slow path and causes the
initialization of the mutex. Because a mutex is not just a simple memory object and requires that
some resources be allocated by the system, an attempt to call pthread_mutex_destroy() or
pthread_mutex_unlock() on a mutex that was statically initialized using
PTHREAD_MUTEX_INITIALIZER and was not yet locked causes an EINVAL error.

The maximum number of recursive locks by the owning thread is 32,767. When this number is
exceeded, attempts to lock the mutex return the ERECURSE error.
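The overhead reduction comes from not blocking: a thread that fails to acquire the lock can do other useful work instead of sleeping through a context switch. A minimal illustrative C sketch (the shared counter and the do_local_work helper are hypothetical):

#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
long shared_counter = 0;

void do_local_work(void);   /* hypothetical: useful work needing no lock */

void *worker(void *arg)
{
    for (int i = 0; i < 1000; i++) {
        if (pthread_mutex_trylock(&lock) == 0) {
            shared_counter++;               /* critical section */
            pthread_mutex_unlock(&lock);
        } else {
            /* EBUSY: another thread holds the mutex. Instead of
               blocking, stay busy and retry this update later. */
            do_local_work();
            i--;
        }
    }
    return NULL;
}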
Basically, the producer produces goods while the consumer consumes the goods and typically
does something with them.

In our case our producer will produce an item and place it in a bound-buffer for the consumer.
Then the consumer will remove the item from the buffer and print it to the screen.

A buffer is a container of sorts; a bound-buffer is a container with a limit. We have to be very
careful in our case that we don't overfill the buffer or remove something that isn't there; in C this
will produce a segmentation fault.

What: A synchronization tool used in concurrent programming

Where: We will use semaphores in any place where we may have concurrency issues. In other
words any place where we feel more than one thread will access the data or structure at any
given time.

When: To help solve concurrency issues when programming with threads.

Why: Think about how registers work in the operating system for a second. Here is an example
of how registers work when you increment a counter-

register1 = counter;

register1 = register1 + 1;

counter = register1;

Now imagine two threads manipulating this same example, but one thread is decrementing:

(T1) register1 = counter; [register1 = 5]

(T1) register1 = register1 + 1;[register1 = 6]

(T2) register2 = counter; [register2 = 5]

(T2) register2 = register2 - 1; [register2 = 4]

(T1) counter = register1; [counter = 6]

(T2) counter = register2;[counter = 4]

Because both threads were allowed to run without synchronization our counter now has a
definitely wrong value. With synchronization the answer should come out to be 5 like it started.

How: We implement a semaphore as an integer value that is only accessible through two atomic
operations wait() and signal(). Defined as follows:

/* The wait operation */
wait(S) {
    while (S <= 0)
        ;        /* busy-wait (no-operation) until S becomes positive */
    S--;
}

/* The signal operation */
signal(S) {
    S++;
}

S: Semaphore

The operation wait() tells the system that we are about to enter a critical section and signal()
notifies that we have left the critical section and it is now accessible to other threads.

Therefore:

wait(mutex);

//critical section where the work is done

signal(mutex)

//continue on with life

Mutex stands for mutual exclusion, meaning only one thread may execute the critical section at a time.

We have an example that demonstrates how semaphores are used in reference to pthreads
coming up right after this problem walk-through.

Basically, we are going to have a program that creates an N number of producer and consumer
threads. The job of the producer will be to generate a random number and place it in a bound-
buffer. The role of the consumer will be to remove items from the bound-buffer and print them to
the screen. Remember the big issue here is concurrency so we will be using semaphores to help
prevent any issues that might occur. To double our efforts we will also be using a pthread mutex
lock to further guarantee synchronization.

The user will pass in three arguments to start the application: <INT, time for the main method to
sleep before termination> <INT, number of producer threads> <INT, number of consumer
threads>

We will then use a function to initialize the data, semaphores, mutex lock, and pthread attributes.

Create the producer threads.

Create the consumer threads.

Put main() to sleep().

Exit the program
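A compact, hedged C sketch of the program just described (fixed buffer size, a single producer and consumer, and error handling omitted for brevity; names are illustrative):

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUF_SIZE 5

int buffer[BUF_SIZE];
int in = 0, out = 0;                 /* insert/remove positions */
sem_t empty;                         /* counts free slots   */
sem_t full;                          /* counts filled slots */
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void *producer(void *arg)
{
    for (;;) {
        int item = rand() % 100;
        sem_wait(&empty);            /* wait for a free slot        */
        pthread_mutex_lock(&mutex);  /* exclusive access to buffer  */
        buffer[in] = item;
        in = (in + 1) % BUF_SIZE;
        pthread_mutex_unlock(&mutex);
        sem_post(&full);             /* signal: one more item ready */
    }
    return NULL;
}

void *consumer(void *arg)
{
    for (;;) {
        sem_wait(&full);             /* wait for an item            */
        pthread_mutex_lock(&mutex);
        int item = buffer[out];
        out = (out + 1) % BUF_SIZE;
        pthread_mutex_unlock(&mutex);
        sem_post(&empty);            /* signal: one more free slot  */
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    sem_init(&empty, 0, BUF_SIZE);
    sem_init(&full, 0, 0);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    sleep(10);                       /* let the threads run briefly */
    return 0;                        /* process exit ends the threads */
}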


b. explain graph partitioning with suitable example
Answer: Graph partitioning divides the vertices of a graph into p roughly equal parts such that the
number of edges crossing between parts is minimized. In parallel computing it is used to decompose
data-dependent computations such as sparse-matrix or mesh-based solvers: each vertex represents a
unit of computation and each edge a data dependency, so an even partition with a small edge cut
balances the load while minimizing interprocess communication. For example, partitioning the mesh
of a physical domain discretized into triangular elements among eight processes assigns each process
a contiguous region containing roughly one-eighth of the elements, and only the elements on region
boundaries require communication with neighbouring processes.

Q.4 a. Write a S.N. on the NVIDIA Tesla GPU

Answer:Powering the world’s leading Supercomputers, Microway designs customized GPU


clusters, servers, and WhisperStations based on NVIDIA Tesla and NVIDIA Quadro® GPUs.
We have been selected as the vendor of choice for a number of NVIDIA GPU Research Centers,
including Carnegie Mellon University, Harvard, Johns Hopkins and Massachusetts General
Hospital.

Unique features available in the latest NVIDIA GPUs include:

 High-speed, on-die GPU memory

 NVLink interconnect speeds up data transfers up to 10X over PCI-Express

 Unified Memory allows applications to directly access the memory of all GPUs and all of
system memory

 Direct CPU-to-GPU NVLink connectivity on OpenPOWER systems supports NVLink


transfers between the CPUs and GPUs

 ECC memory error protection – meets a critical requirement for computing accuracy and
reliability in data centers and supercomputing centers.

 System monitoring features – integrate the GPU subsystem with the host system’s
monitoring and management capabilities such as IPMI. IT staff can manage the GPU
processors in the computing system with widely-used cluster/grid management tools.

b. Describe barrier synchronization for shared address space model

Answer:i) In MPI programming, point to point communication is handled using message


passing whereas global synchronization is done using collective communications.

ii) Important point while using communication is synchronization point among processes.

iii) MPI provides a special function designed and implemented for synchronization,
named MPI_Barrier().

iv) The working of this function is such that no process is allowed to cross the barrier
until all the processes have reached that barrier in their respective codes.

v) Syntax: int MPI_Barrier(MPI_Comm comm)

vi) The argument passed to this function is the communicator. A group of processes need to
be synchronized are defined in communicator. Calling process blocks until all the processes
in the given communicator have called it. This means the call only returns when all processes
have entered the call.
vii) Suppose the MPI_Barrier() function is invoked by process 0.

viii) When process 0 reaches the barrier, it stops and waits for the remaining processes to reach
the barrier point.

ix) After every process reaches the barrier point, execution continues. In this way,
synchronization is achieved using a barrier.
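A minimal usage sketch (illustrative only):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("process %d: before the barrier\n", rank);

    /* No process returns from this call until every process in
       MPI_COMM_WORLD has entered it. */
    MPI_Barrier(MPI_COMM_WORLD);

    printf("process %d: after the barrier\n", rank);
    MPI_Finalize();
    return 0;
}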

Q.5 a. Explain MPI Routines

Answer: MPI programs are built from a small set of library routines. The six most fundamental ones
are MPI_Init (initialize the MPI environment), MPI_Finalize (terminate it), MPI_Comm_size (get the
number of processes in a communicator), MPI_Comm_rank (get the calling process's rank), and
MPI_Send / MPI_Recv for point-to-point message passing. Beyond these, MPI provides collective
communication routines such as MPI_Bcast, MPI_Reduce, MPI_Scatter, MPI_Gather, and
MPI_Barrier, as well as routines for creating communicators and process topologies (e.g.,
MPI_Comm_split, MPI_Cart_create). A minimal program using the six basic routines is sketched
below.
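This sketch assumes the program is run with at least two processes:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, msg;
    MPI_Status status;

    MPI_Init(&argc, &argv);                    /* start MPI            */
    MPI_Comm_size(MPI_COMM_WORLD, &size);      /* how many processes   */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* which one am I       */

    if (rank == 0) {
        msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 of %d received %d\n", size, msg);
    }

    MPI_Finalize();                            /* shut MPI down        */
    return 0;
}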

b. Write pseudo code for Parallel Quick Sort


Answer: The parallel formulation builds on the sequential quicksort below; after each partitioning
step, the two recursive calls (lines 14 and 15) operate on disjoint subsequences and can therefore be
executed concurrently by different processes.
1. procedure QUICKSORT (A, q, r)
2. begin
3. if q < r then
4. begin
5. x := A[q];
6. s := q;
7. for i := q + 1 to r do
8. if A[i] ≤ x then
9. begin
10. s := s + 1;
11. swap(A[s], A[i]);
12. end if
13. swap(A[q], A[s]);
14. QUICKSORT (A, q, s);
15. QUICKSORT (A, s + 1, r);
16. end if
17. end QUICKSORT

Q.6 a. Explain Sorting network with suitable example


Answer: i) In general, a sorting network is made up of a series of columns. Each column contains a
number of comparators that are connected in parallel.

ii) The number of columns present in the network is called the depth of the network.

iii) The comparator plays an important role in the network. It is a device which takes two inputs a
and b, and generates two outputs a' and b'.

iv) There are two types of comparators: increasing and decreasing.

v) In an increasing comparator, a' = min(a, b) and b' = max(a, b).

vi) In a decreasing comparator, a' = max(a, b) and b' = min(a, b).

vii) Each column performs a permutation, and the sorted output is taken from the last column.

Fig: Schematic diagram of a sorting network

b. Write S.N on topologies and Embedding

Answer:MPI views the processes as being arranged in a one-dimensional topology and uses a
linear ordering to number the processes. However, in many parallel programs, processes are
naturally arranged in higher-dimensional topologies (e.g., two- or three-dimensional). In such
programs, both the computation and the set of interacting processes are naturally identified by
their coordinates in that topology. For example, in a parallel program in which the processes are
arranged in a two-dimensional topology, process (i, j) may need to send message to (or receive
message from) process (k, l). To implement these programs in MPI, we need to map each MPI
process to a process in that higher-dimensional topology.

Many such mappings are possible. Figure 6.5 illustrates some possible mappings of eight MPI
processes onto a 4 x 4 two-dimensional topology. For example, for the mapping shown in Figure
6.5(a), an MPI process with rank rank corresponds to process (row, col) in the grid such that
row = rank/4 and col = rank%4 (where '%' is C's modulo operator). As an illustration, the
process with rank 7 is mapped to process (1, 3) in the grid.

Figure 6.5. Different ways to map a set of processes to a two-dimensional grid. (a) and (b) show
a row- and column-wise mapping of these processes, (c) shows a mapping that follows a space-
filling curve (dotted line), and (d) shows a mapping in which neighboring processes are directly
connected in a hypercube.

In general, the goodness of a mapping is determined by the pattern of interaction among the
processes in the higher-dimensional topology, the connectivity of physical processors, and the
mapping of MPI processes to physical processors. For example, consider a program that uses a
two-dimensional topology and each process needs to communicate with its neighboring
processes along the x and y directions of this topology. Now, if the processors of the
underlying parallel system are connected using a hypercube interconnection network, then the
mapping shown in Figure 6.5(d) is better, since neighboring processes in the grid are also
neighboring processors in the hypercube topology.

However, the mechanism used by MPI to assign ranks to the processes in a communication
domain does not use any information about the interconnection network, making it impossible to
perform topology embeddings in an intelligent manner. Furthermore, even if we had that
information, we will need to specify different mappings for different interconnection networks,
diminishing the architecture independent advantages of MPI. A better approach is to let the
library itself compute the most appropriate embedding of a given topology to the processors of
the underlying parallel computer. This is exactly the approach facilitated by MPI. MPI provides a
set of routines that allows the programmer to arrange the processes in different topologies
without having to explicitly specify how these processes are mapped onto the processors. It is up
to the MPI library to find the most appropriate mapping that reduces the cost of sending and
receiving messages.
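For example, the sketch below (illustrative only; it assumes the program is run with exactly 16 processes) builds a 4 x 4 two-dimensional topology and recovers each process's coordinates, leaving the actual embedding to the MPI library:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, coords[2];
    int dims[2]    = {4, 4};   /* 4 x 4 grid: needs 16 processes   */
    int periods[2] = {0, 0};   /* non-periodic in both dimensions  */
    MPI_Comm grid;

    MPI_Init(&argc, &argv);
    /* reorder = 1 lets MPI choose a rank-to-position mapping that
       suits the underlying interconnection network. */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid);
    MPI_Comm_rank(grid, &rank);
    MPI_Cart_coords(grid, rank, 2, coords);
    printf("rank %d -> (%d, %d)\n", rank, coords[0], coords[1]);
    MPI_Comm_free(&grid);
    MPI_Finalize();
    return 0;
}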

Q.7 a. Explain non blocking communication using MPI


Answer: One can improve performance on many systems by overlapping communication and
computation. This is especially true on systems where communication can be executed
autonomously by an intelligent communication controller. Light-weight threads are one
mechanism for achieving such overlap. An alternative mechanism that often leads to better
performance is to use nonblocking communication.

A nonblocking send start call initiates the send operation, but does not complete it. The send start
call will return before the message is copied out of the send buffer. A separate send complete call
is needed to complete the communication, i.e., to verify that the data has been copied out of the
send buffer. With suitable hardware, the transfer of data out of the sender memory may proceed
concurrently with computations done at the sender after the send was initiated and before it
completed.

Similarly, a nonblocking receive start call initiates the receive operation, but does not complete it.
The call will return before a message is stored into the receive buffer. A separate receive complete
call is needed to complete the receive operation and verify that the data has been received into the
receive buffer. With suitable hardware, the transfer of data into the receiver memory may proceed
concurrently with computations done after the receive was initiated and before it completed. The
use of nonblocking receives may also avoid system buffering and memory-to-memory copying, as
information is provided early on the location of the receive buffer.

Nonblocking send start calls can use the same four modes as blocking sends: standard, buffered,
synchronous and ready. These carry the same meaning. Sends of all modes, ready excepted, can
be started whether a matching receive has been posted or not; a nonblocking ready send can be
started only if a matching receive is posted. In all cases, the send start call is local: it returns
immediately, irrespective of the status of other processes. If the call causes some system resource
to be exhausted, then it will fail and return an error code. Quality implementations of MPI should
ensure that this happens only in ``pathological'' cases. That is, an MPI implementation should be
able to support a large number of pending nonblocking operations.

The send-complete call returns when data has been copied out of the send buffer. It may carry
additional meaning, depending on the send mode.

If the send mode is synchronous, then the send can complete only if a matching receive has
started. That is, a receive has been posted, and has been matched with the send. In this case, the
send-complete call is non-local. Note that a synchronous, nonblocking send may complete, if
matched by a nonblocking receive, before the receive complete call occurs. (It can complete as
soon as the sender ``knows'' the transfer will complete, but before the receiver ``knows'' the
transfer will complete.)

If the send mode is buffered then the message must be buffered if there is no pending receive. In
this case, the send-complete call is local, and must succeed irrespective of the status of a
matching receive.

If the send mode is standard then the send-complete call may return before a matching receive
occurred, if the message is buffered. On the other hand, the send-complete may not complete
until a matching receive occurred, and the message was copied into the receive buffer.

Nonblocking sends can be matched with blocking receives, and vice-versa.
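A minimal sketch of overlapping communication with computation using the standard-mode nonblocking calls (buffer and function names are illustrative):

#include <mpi.h>

void exchange_and_compute(double *sendbuf, double *recvbuf, int n,
                          int partner, MPI_Comm comm)
{
    MPI_Request reqs[2];
    MPI_Status  stats[2];

    /* Start both transfers; neither call blocks. */
    MPI_Isend(sendbuf, n, MPI_DOUBLE, partner, 0, comm, &reqs[0]);
    MPI_Irecv(recvbuf, n, MPI_DOUBLE, partner, 0, comm, &reqs[1]);

    /* ... do computation here that touches neither buffer ... */

    /* Complete both operations before using recvbuf or reusing sendbuf. */
    MPI_Waitall(2, reqs, stats);
}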
b. Explain Recursive decomposition
Answer:
Recursive decomposition is a method for inducing concurrency in problems that can be solved
using the divide-and-conquer strategy. In this technique, a problem is solved by first dividing it
into a set of independent subproblems. Each one of these subproblems is solved by recursively
applying a similar division into smaller subproblems followed by a combination of their results.
The divide-and-conquer strategy results in natural concurrency, as different subproblems can be
solved concurrently.

Example Quicksort

Consider the problem of sorting a sequence A of n elements using the commonly used
quicksort algorithm. Quicksort is a divide and conquer algorithm that starts by selecting a pivot
element x and then partitions the sequence A into two subsequences A0 and A1 such that all
the elements in A0 are smaller than x and all the elements in A1 are greater than or equal to
x. This partitioning step forms the divide step of the algorithm. Each one of the subsequences
A0 and A1 is sorted by recursively calling quicksort. Each one of these recursive calls further
partitions the sequences. This is illustrated in fig for a sequence of 12 numbers. The recursion
terminates when each subsequence contains only a single element.

Figure:The quicksort task-dependency graph based on recursive decomposition for sorting a


sequence of 12 numbers.

In fig, we define a task as the work of partitioning a given subsequence. Therefore, fig also
represents the task graph for the problem. Initially, there is only one sequence (i.e., the root of
the tree), and we can use only a single process to partition it. The completion of the root task
results in two subsequences (A0 and A1, corresponding to the two nodes at the first level of the
tree) and each one can be partitioned in parallel. Similarly, the concurrency continues to increase
as we move down the tree.

Sometimes, it is possible to restructure a computation to make it amenable to recursive


decomposition even if the commonly used algorithm for the problem is not based on the divide-
and-conquer strategy. For example, consider the problem of finding the minimum element in an
unordered sequence A of n elements. The serial algorithm for solving this problem scans the
entire sequence A, recording at each step the minimum element found so far as illustrated in
algorithm. It is easy to see that this serial algorithm exhibits no concurrency.
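Restructured recursively, the minimum-finding computation looks like the C sketch below (a hedged illustration); each of the two recursive calls is an independent task that could execute in parallel:

/* Recursive (divide-and-conquer) minimum of A[lo..hi].
   The two halves are independent subproblems. */
int rec_min(const int A[], int lo, int hi)
{
    if (lo == hi)
        return A[lo];
    int mid   = (lo + hi) / 2;
    int left  = rec_min(A, lo, mid);        /* task 1 */
    int right = rec_min(A, mid + 1, hi);    /* task 2: independent of task 1 */
    return (left < right) ? left : right;
}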
Q.8 a. Explain parallel depth -first-search

Answer: i) To solve a discrete optimization problem, depth-first search is used if the problem can be
formulated as a tree-search problem. Depth-first search can be performed in parallel by
partitioning the search space into many small, disjoint parts (subtrees) that can be explored
concurrently. DFS starts with the initial node and generates its successors.

ii) If a node has no successors, it indicates that there is no solution in that path. The search then
backtracks and continues by expanding another node. The following figure gives the DFS
expansion of the 8-puzzle.

iii) The initial configuration is given in (A). There are only two possible moves, blank up or
blank right. Thus from (A) two children or successors are generated. Those are (B) and (C).

iv) This is done in step 1. In step 2, either (B) or (C) is selected. If (B) is selected, then
its successors (D), (E) and (F) are generated. If (C) is selected, then its successors (G), (H)
and (I) are generated.

v) Assuming (B) is selected, (D) is selected in the next step. It is clear that (D) is the same
as (A), so backtracking is necessary. This process is repeated until the required result
is found.

b. Share the thought about initiative MAKE IN INDIA

Answer: Make in India is an initiative launched by the Government of India to encourage multinational
as well as national companies to manufacture their products in India. It was launched by
Prime Minister Narendra Modi on 25 September 2014. After the initiation of the programme in
2015, India emerged as the top destination for foreign direct investment.

ii) Make in India focuses on the following sectors of the economy:


Automobiles

Biotechnology

Construction

Chemicals

Electronic System

Aviation

Mining

Railways

Roads and Highways.

Oil and Gas.

iii) The Make in India mission also includes the development of highly professional, High
Performance Computing (HPC) aware human resources to meet the challenges of developing
these applications. As far as HPC is concerned, the construction of supercomputers is a big
achievement. India has so far developed many supercomputers; among them, 8 are in the list of
the world's best 500 supercomputers.
Q.9 a. Define the term HPC and elaborate its use in Indian society

Answer: High Performance Computing (HPC) refers to the practice of aggregating computing power,
typically as clusters of processors working in parallel, to deliver far higher performance than a typical
desktop computer in order to solve large problems in science, engineering, and business. In Indian
society, HPC is used for monsoon and weather forecasting, climate modelling, drug discovery, oil and
gas exploration, aerospace and defence design, and large-scale data analytics. The PARAM series of
supercomputers developed by C-DAC and the National Supercomputing Mission are major national
HPC initiatives.
b. What is the Search Overhead Factor


Answer:Parallel search algorithms incur overhead from several sources. These include
communication overhead, idle time due to load imbalance, and contention for shared data
structures. Thus, if both the sequential and parallel formulations of an algorithm do the same
amount of work, the speedup of parallel search on p processors is less than p. However, the
amount of work done by a parallel formulation is often different from that done by the
corresponding sequential formulation because they may explore different parts of the search
space.

Let W be the amount of work done by a single processor, and Wp be the total amount of work
done by p processors. The search overhead factor of the parallel system is defined as the ratio of
the work done by the parallel formulation to that done by the sequential formulation, or Wp/W.
Thus, the upper bound on speedup for the parallel system is given by p × (W/Wp). The actual
speedup, however, may be less due to other parallel processing overhead. In most parallel search
algorithms, the search overhead factor is greater than one. However, in some cases, it may be
less than one, leading to superlinear speedup. If the search overhead factor is less than one on the
average, then it indicates that the serial search algorithm is not the fastest algorithm for solving
the problem.

To simplify our presentation and analysis, we assume that the time to expand each node is the
same, and W and Wp are the number of nodes expanded by the serial and the parallel
formulations, respectively. If the time for each expansion is tc, then the sequential run time is
given by TS = tcW. In the remainder of the chapter, we assume that tc = 1. Hence, the problem
size W and the serial run time TS become the same.
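As a worked illustration (with hypothetical numbers, not taken from the question): if a serial search expands W = 10,000 nodes while a parallel search on p = 10 processors expands Wp = 12,500 nodes in total, the search overhead factor is Wp/W = 1.25, and the speedup can be at most p × (W/Wp) = 10 × 0.8 = 8.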

Q.10 a. Explain Power aware Processing

Answer:i) In high-performance systems, power-aware design techniques aim to maximize


performance under power dissipation and power consumption constraints. Along with this, the
low-power design techniques try to reduce power or energy consumption in portable equipment
to meet a desired performance. All the power a system consumes eventually dissipates and
transforms into heat.

b. Explain Quantum Computer with suitable example

Answer: i) Quantum is the minimum amount of any physical entity that is involved in an
interaction.
ii) A computer designed using the principles of quantum physics is called a quantum
computer.
iii) A quantum computer stores information using special types of bits called quantum bits
(qubits), represented as |0> and |1>.
iv) This increases the flexibility of the computations. It performs the calculations based on the
laws of quantum physics.
v) The quantum bits are implemented using the two energy levels of an atom. An excited state
represents | 1> and a ground state represents | 0>.
vi) Quantum gates are used to perform operations on the data. They are very similar to the
traditional logical gates.
vii) Since the quantum gates are reversible, we can generate the original input from the obtained
output as well.
viii) A quantum computer harnesses the power of atoms to perform operations. It is capable of
processing millions of operations in parallel.
BE/Insem./Oct.-583
B. E. (Computer Engineering)
HIGH PERFORMANCE COMPUTING
(2015 Pattern) (Semester – I)
[Time : 1 Hour] Max Marks : 30

HPC MODEL ANSWER NOV – 2018-19

Q 1.a ) Explain with suitable diagram SIMD, MIMD Architecture. [4]

Ans :-


b ) Explain the impact of Memory Latency & Memory Bandwidth on system performance. [6]

Ans:


OR

Q 2.a ) Describe UMA & NUMA with diagram. [6]


Ans :-


b ) Describe the scope of parallel computing. Give application of parallel computing. [4]

Ans :-


Q 3.a ) Explain any three data decomposition techniques with example [6]

Ans :-


b ) Give characteristics of tasks. [4]


Ans :-

OR


Q 4. a) Give Characteristics of GPUs and any 2 applications of GPU Processing. [4]

Ans :-

Characteristics of GPUs :

A Graphics Processing Unit (GPU) is a single-chip processor primarily used to manage and boost the
performance of video and graphics. GPU features include:

 2-D or 3-D graphics


 Digital output to flat panel display monitors
 Texture mapping
 Application support for high-intensity graphics software such as AutoCAD
 Rendering polygons
 Support for YUV color space
 Hardware overlays
 MPEG decoding

These features are designed to lessen the work of the CPU and produce faster video and graphics.
A GPU is not only used in a PC on a video card or motherboard; it is also used in mobile phones,
display adapters, workstations and game consoles.
This term is also known as a visual processing unit (VPU).

Applications of GPUs:

 Bioinformatics
 Computational Finance
 Computational Fluid Dynamics
 Data Science, Analytics, and Databases
 Defense and Intelligence
 Electronic Design Automation
 Imaging and Computer Vision
 Machine Learning
 Materials Science
 Media and Entertainment
 Medical Imaging
 Molecular Dynamics
 Numerical Analytics
 Physics
 Quantum Chemistry
 Oil and Gas/Seismic
 Structural Mechanics
 Visualization and Docking
 Weather and Climate

b ) Explain any three parallel algorithm models with suitable example. [6]


Ans :

1.Data Parallel Model:


In the data-parallel model, tasks are assigned to processes and each task performs similar types of
operations on different data. Data parallelism is a consequence of a single operation being
applied to multiple data items.

Data-parallel model can be applied on shared-address spaces and message-passing paradigms. In data-
parallel model, interaction overheads can be reduced by selecting a locality preserving decomposition,
by using optimized collective interaction routines, or by overlapping computation and interaction.

The primary characteristic of data-parallel problems is that the intensity of data parallelism
increases with the size of the problem, which in turn makes it possible to use more processes to solve
larger problems.

Example − Dense matrix multiplication.
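A minimal sketch of the data-parallel model in C with OpenMP (our own illustration; assumes n × n matrices stored as flat, row-major arrays):

#include <omp.h>

/* Dense matrix multiplication C = A * B: every iteration of the outer loop
   applies the same operation (a dot product) to different data, so the rows
   of C can be computed by different threads in parallel. */
void matmul(const double *A, const double *B, double *C, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}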

2.Task Graph Model :


In the task graph model, parallelism is expressed by a task graph. A task graph can be either trivial or
nontrivial. In this model, the correlation among the tasks is exploited to promote locality or to
minimize interaction costs. This model is used to solve problems in which the quantity of data
associated with the tasks is large compared to the amount of computation associated with them.
The tasks are mapped so as to reduce the cost of data movement among them.

Examples − Parallel quick sort, sparse matrix factorization, and parallel algorithms derived via divide-
and-conquer approach.


Here, problems are divided into atomic tasks and implemented as a graph. Each task is an independent
unit of work that has dependencies on one or more antecedent tasks. After the completion of a task, the
output of an antecedent task is passed to the dependent task. A task starts execution only when all of
its antecedent tasks are completed. The final output of the graph is received when the last dependent
task is completed (Task 6 in the above figure).
3.Master-Slave Model:
In the master-slave model, one or more master processes generate tasks and allocate them to slave
processes. The tasks may be allocated beforehand if −

 the master can estimate the volume of the tasks, or


 a random assignment can do a satisfactory job of load balancing, or
 slaves are assigned smaller pieces of work at different times.
This model is generally equally suitable to shared-address-space or message-passing
paradigms, since the interaction is naturally two ways.

In some cases, a task may need to be completed in phases, and the task in each phase must be
completed before the task in the next phases can be generated. The master-slave model can be
generalized to a hierarchical or multi-level master-slave model in which the top-level master feeds a
large portion of the tasks to second-level masters, which further subdivide the tasks among their own
slaves and may perform a part of the work themselves.


Q 5. a) Explain Broadcast and Reduction example for multiplying matrix with a vector. [6]

Ans :-


b ) Explain Scatter and Gather. [4]

Ans :

In the scatter operation, a single node sends a unique message of size m to every other node. This
operation is also known as one-to-all personalized communication. One-to-all personalized
communication is different from one-to-all broadcast in that the source node starts with p unique
messages, one destined for each node. Unlike one-to-all broadcast, one-to-all personalized
communication does not involve any duplication of data. The dual of one-to-all personalized
communication or the scatter operation is the gather operation, or concatenation, in which a single
node collects a unique message from each node. A gather operation is different from an all-to-one
reduce operation in that it does not involve any combination or reduction of data. The figure below
illustrates the scatter and gather operations.

Scatter and gather operations.

Although the scatter operation is semantically different from one-to-all broadcast, the scatter algorithm
is quite similar to that of the broadcast. The figure shows the communication steps for the scatter
operation on an eight-node hypercube. The communication patterns of one-to-all broadcast and scatter
are identical; only the size and the contents of the messages are different. In the figure, the source
node (node 0) contains all the
messages. The messages are identified by the labels of their destination nodes. In the first
communication step, the source transfers half of the messages to one of its neighbors. In subsequent
steps, each node that has some data transfers half of it to a neighbor that has yet to receive any data.
There is a total of log p communication steps corresponding to the log p dimensions of the hypercube.

The scatter operation on an eight-node hypercube.


The gather operation is simply the reverse of scatter. Each node starts with an m-word message. In the
first step, every odd-numbered node sends its buffer to an even-numbered neighbor behind it, which
concatenates the received message with its own buffer. Only the even-numbered nodes participate in
the next communication step, which results in nodes with labels that are multiples of four gathering
more data and doubling the sizes of their data. The process continues similarly, until node 0 has
gathered the entire data.
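The scatter and gather pair maps directly onto the MPI collectives MPI_Scatter and MPI_Gather; a minimal self-contained sketch (our own illustration, with m = 4 words per node):

#include <mpi.h>
#include <stdlib.h>

#define M 4                            /* words per node (illustrative) */

int main(int argc, char **argv) {
    int rank, p;
    int *all = NULL, part[M];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (rank == 0) {                   /* source node holds p unique messages */
        all = malloc(p * M * sizeof(int));
        for (int i = 0; i < p * M; i++) all[i] = i;
    }

    /* scatter: node 0 sends a distinct block of M words to every node */
    MPI_Scatter(all, M, MPI_INT, part, M, MPI_INT, 0, MPI_COMM_WORLD);

    /* gather (the dual): node 0 concatenates one block from each node */
    MPI_Gather(part, M, MPI_INT, all, M, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) free(all);
    MPI_Finalize();
    return 0;
}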

OR

Q 6.a ) Compare the one-to-all broadcast operation on Ring, Mesh and Hypercube topologies. [6]

Ans :

Sr. No.  Ring                           Mesh                          Hypercube

1        It is 1-dimensional            It is 2-dimensional           It is log n-dimensional (for n nodes)

2        Broadcast time: O(n)           Broadcast time: O(√n)         Broadcast time: O(log n)

3        It performs as a linear array  It performs as a 2-D array    It performs as a multidimensional array

4        Easy to use                    Moderate to use               Difficult to use

5        Fault identification is easy   Difficult compared to ring    Difficult


b ) Explain the prefix-sum operation for an eight-node hypercube. [4]

Ans :

**************** THE END ****************

B. E. (Computer Engineering)
HIGH PERFORMANCE COMPUTING (2015 Pattern) (Semester - I) (410241)
[Time : 2½ Hours] [Max. Marks : 70]
Instructions to the candidates:
1) Answer Q.1 or Q.2, Q.3 or Q.4, Q.5 or Q.6, Q.7 or Q.8.
2) Neat diagrams must be drawn wherever necessary.
3) Figures to the right indicate full marks.
4) Assume suitable data if necessary.

Q1) a) State and explain basic working principle of Super Scalar Processors. [6]
b) Explain basic working of VLIW Processor. [6]
c) Elaborate four subclasses of the Parallel Random Access Machine (PRAM). [8]
Ans:
The Parallel Random Access Machine (PRAM) is a model that is considered for most
of the parallel algorithms. Here, multiple processors are attached to a single block of
memory. A PRAM model contains −
 A set of similar type of processors.

 All the processors share a common memory unit. Processors can communicate
among themselves through the shared memory only.

 A memory access unit (MAU) connects the processors with the single shared
memory.
Here, n number of processors can perform independent operations on n number of
data in a particular unit of time. This may result in simultaneous access of the same
memory location by different processors.
To solve this problem, the following constraints have been enforced on PRAM
model −
 Exclusive Read Exclusive Write (EREW) − Here no two processors are
allowed to read from or write to the same memory location at the same time.
 Exclusive Read Concurrent Write (ERCW) − Here no two processors are
allowed to read from the same memory location at the same time, but are allowed
to write to the same memory location at the same time.
 Concurrent Read Exclusive Write (CREW) − Here all the processors are
allowed to read from the same memory location at the same time, but are not
allowed to write to the same memory location at the same time.
 Concurrent Read Concurrent Write (CRCW) − All the processors are
allowed to read from or write to the same memory location at the same time.
There are many methods to implement the PRAM model, but the most prominent
ones are −
 Shared memory model
 Message passing model
 Data parallel model

OR
Q2) a) Differentiate Static and Dynamic mapping techniques for load balancing.[6]

Answer: Once a computation has been decomposed into tasks, these tasks are mapped
onto processes with the objective that all tasks complete in the shortest amount of
elapsed time. In order to achieve a small execution time, the overheads of executing
the tasks in parallel must be minimized. A good mapping of tasks onto processes must
strive to achieve the twin objectives of (1) reducing the amount of time processes
spend in interacting with each other, and (2) reducing the total amount of time some
processes are idle while the others are engaged in performing some tasks. Mapping
techniques used in parallel algorithms can be broadly classified into two categories:
static and dynamic. The parallel programming paradigm and the characteristics of
tasks and the interactions among them determine whether a static or a dynamic
mapping is more suitable.
• Static Mapping: Static mapping techniques distribute the tasks among processes
prior to the execution of the algorithm. For statically generated tasks, either static or
dynamic mapping can be used. The choice of a good mapping in this case depends on
several factors, including the knowledge of task sizes, the size of data associated with
tasks, the characteristics of inter-task interactions, and even the parallel programming
paradigm. Even when task sizes are known, in general, the problem of obtaining an
optimal mapping is an NP-complete problem for nonuniform tasks. However, for
many practical cases, relatively inexpensive heuristics provide fairly acceptable
approximate solutions to the optimal static mapping problem. Algorithms that make
use of static mapping are in general easier to design and program.
• Dynamic Mapping: Dynamic mapping techniques distribute the work among
processes during the execution of the algorithm. If tasks are generated dynamically,
then they must be mapped dynamically too. If task sizes are unknown, then a static
mapping can potentially lead to serious load-imbalances and dynamic mappings are
usually more effective. If the amount of data associated with tasks is large relative to
the computation, then a dynamic mapping may entail moving this data among
processes. The cost of this data movement may outweigh some other advantages of
dynamic mapping and may render a static mapping more suitable. However, in a
shared-address-space paradigm, dynamic mapping may work well even with large
data associated with tasks if the interaction is read-only. The reader should be aware
that the shared-address-space programming paradigm does not automatically provide
immunity against data-movement costs.

b)Write a short note on All-to-one reduction with suitable example. [6]


Answer: Parallel algorithms often require a single process to send identical data to all
other processes or to a subset of them. This operation is known as one-to-all
broadcast. Initially, only the source process has the data of size m that needs to be
broadcast. At the termination of the procedure, there are p copies of the initial data –
one belonging to each process. The dual of one-to-all broadcast is all-to-one
reduction. In an all-to-one reduction operation, each of the p participating processes
starts with a buffer M containing m words. The data from all processes are combined
through an associative operator and accumulated at a single destination process into
one buffer of size m.
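A minimal MPI sketch of all-to-one reduction (our own illustration; each process contributes a buffer of m = 4 words and the element-wise sums accumulate at process 0):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int M[4] = {rank, rank, rank, rank};    /* each process's buffer of m = 4 words */
    int result[4];

    /* combine the buffers element-wise with the associative operator MPI_SUM
       and accumulate the result at the single destination process 0 */
    MPI_Reduce(M, result, 4, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("result[0] = %d\n", result[0]);  /* equals 0 + 1 + ... + (p - 1) */

    MPI_Finalize();
    return 0;
}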

c)Explain any four methods for containing interaction overheads. [8]


Ans:Maximizing Data Locality
In most nontrivial parallel programs, the tasks executed by different processes require
access to some common data. For example, in sparse matrix-vector multiplication y
= Ab, in which tasks correspond to computing individual elements of vector y, all
elements of the input vector b need to be accessed by multiple tasks. In addition to
sharing the original input data, interaction may result if processes require data
generated by other processes. The interaction overheads can be reduced by using
techniques that promote the use of local data or data that have been recently fetched.
Data locality enhancing techniques encompass a wide range of schemes that try to
minimize the volume of nonlocal data that are accessed, maximize the reuse of
recently accessed data, and minimize the frequency of accesses. In many cases, these
schemes are similar in nature to the data reuse optimizations often performed in
modern cache based microprocessors.
Minimize Volume of Data-Exchange
A fundamental technique for reducing the
interaction overhead is to minimize the overall volume of shared data that needs to be
accessed by concurrent processes. This is akin to maximizing the temporal data
locality, i.e., making as many consecutive references to the same data as
possible. Clearly, performing as much of the computation as possible using locally
available data obviates the need for bringing in more data into local memory or cache
for a process to perform its tasks. As discussed previously, one way of achieving this
is by using appropriate decomposition and mapping schemes. For example, in the case
of matrix multiplication, we saw that by using a two-dimensional mapping of the
computations to the processes we were able to reduce the amount of shared data (i.e.,
matrices A and B) that needs to be accessed by each task to 2n²/√p, as opposed
to the n²/p + n² required by a one-dimensional mapping. In general, using a higher-
dimensional distribution often helps in reducing the volume of nonlocal data that
needs to be accessed.
Another way of decreasing the amount of shared data that are accessed by multiple
processes is to use local data to store intermediate results, and perform the shared data
access to only place the final results of the computation. For example, consider
computing the dot product of two vectors of length n in parallel such that each of
the p tasks multiplies n/p pairs of elements. Rather than adding each individual
product of a pair of numbers to the final result, each task can first create a partial dot
product of its assigned portion of the vectors of length n/p in its own local location,
and only access the final shared location once to add this partial result. This
reduces the number of accesses to the shared location where the result is stored
from n to p.
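A sketch of this idea in C with OpenMP (our own illustration; the names are ours):

#include <omp.h>

/* Each thread accumulates a private partial dot product and touches the
   shared result exactly once, reducing shared accesses from n to p. */
double dot(const double *x, const double *y, int n) {
    double result = 0.0;
    #pragma omp parallel
    {
        double partial = 0.0;               /* local intermediate result */
        #pragma omp for
        for (int i = 0; i < n; i++)
            partial += x[i] * y[i];
        #pragma omp atomic                  /* one shared access per thread */
        result += partial;
    }
    return result;
}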
Minimize Frequency of Interactions
Minimizing interaction frequency is important
in reducing the interaction overheads in parallel programs because there is a relatively
high startup cost associated with each interaction on many architectures. Interaction
frequency can be reduced by restructuring the algorithm such that shared data are
accessed and used in large pieces. Thus, by amortizing the startup cost over large
accesses, we can reduce the overall interaction overhead, even if such restructuring
does not necessarily reduce the overall volume of shared data that need to be
accessed. This is akin to increasing the spatial locality of data access, i.e., ensuring the
proximity of consecutively accessed data locations. On a shared-address-space
architecture, each time a word is accessed, an entire cache line containing many
words is fetched. If the program is structured to have spatial locality, then fewer cache
lines are accessed. On a message-passing system, spatial locality leads to fewer
message-transfers over the network because each message can transfer larger amounts
of useful data. The number of messages can sometimes be reduced further on a
message-passing system by combining messages between the same source-destination
pair into larger messages if the interaction pattern permits and if the data for multiple
messages are available at the same time, albeit in separate data structures.
Sparse matrix-vector multiplication is a problem whose parallel formulation can use
this technique to reduce interaction overhead. In typical applications, repeated sparse
matrix-vector multiplication is performed with matrices of the same nonzero pattern
but different numerical nonzero values. While solving this problem in parallel, a
process interacts with others to access elements of the input vector that it may need
for its local computation. Through a one-time scanning of the nonzero pattern of the
rows of the sparse matrix that a process is responsible for, it can determine exactly
which elements of the input vector it needs and from which processes. Then, before
starting each multiplication, a process can first collect all the nonlocal entries of the
input vector that it requires, and then perform an interaction-free multiplication. This
strategy is far superior to trying to access a nonlocal element of the input vector as
and when it is required in the computation.

Q3) a) Explain Parallel Matrix-Vector Multiplication algorithm with example. [8]


Ans:Matrix-Vector Multiplication
Multiplying a square matrix by a vector
Sequential algorithm
• Simply a series of dot products
Input:  Matrix mat[m][n]
        Vector vec[n]
Output: out[m]

for (i = 0; i < m; i++) {
    out[i] = 0;
    for (j = 0; j < n; j++)
        out[i] += mat[i][j] * vec[j];
}
– Inner loop requires n multiplications and n − 1 additions
– Complexity of inner loop is Θ(n)
– There are a total of m dot products
– Overall complexity: Θ(mn); Θ(n²) for a square matrix
Data decomposition options
• Domain decomposition strategy
• Three options for decomposition
1. Rowwise block striping – Divide matrix elements into groups of rows (same as
Floyd's algorithm) – Each process is responsible for a contiguous group of either ⌊m/p⌋
or ⌈m/p⌉ rows (a sketch of this option follows below)
2. Columnwise block striping – Divide matrix elements into groups of columns – Each
process is responsible for a contiguous group of either ⌊n/p⌋ or ⌈n/p⌉ columns
3. Checkerboard block decomposition – Form a virtual grid – Matrix is divided into
2D blocks aligning with the grid – Let the grid have r rows and c columns – Each
process is responsible for a block of the matrix containing at most ⌈m/r⌉ rows and ⌈n/c⌉
columns
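A minimal sketch of option 1 (rowwise block striping) in C with MPI (our own illustration, under the stated assumptions):

#include <mpi.h>
#include <stdlib.h>

/* Rowwise block-striped matrix-vector multiplication (a sketch; assumes p
   divides m, and that every process already holds its m/p contiguous rows
   in local_rows and a full copy of vec). */
void par_matvec(const double *local_rows, const double *vec,
                double *out, int m, int n, int p) {
    int rows = m / p;
    double *local_out = malloc(rows * sizeof(double));

    for (int i = 0; i < rows; i++) {           /* local dot products */
        local_out[i] = 0.0;
        for (int j = 0; j < n; j++)
            local_out[i] += local_rows[i * n + j] * vec[j];
    }

    /* concatenate the partial result vectors at process 0 */
    MPI_Gather(local_out, rows, MPI_DOUBLE, out, rows, MPI_DOUBLE,
               0, MPI_COMM_WORLD);
    free(local_out);
}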

b) Explain the Performance Metrics for Parallel Systems. [8]


Ans:It is important to study the performance of parallel programs with a view to
determining the best algorithm, evaluating hardware platforms, and examining the
benefits from parallelism. A number of metrics have been used based on the desired
outcome of performance analysis.
Execution Time
The serial runtime of a program is the time elapsed between the beginning and the end
of its execution on a sequential computer. The parallel runtime is the time that
elapses from the moment a parallel computation starts to the moment the last
processing element finishes execution. We denote the serial runtime by TS and the
parallel runtime by TP.
Total Parallel Overhead
The overheads incurred by a parallel program are encapsulated into a single
expression referred to as the overhead function. We define overhead function or
total overhead of a parallel system as the total time collectively spent by all the
processing elements over and above that required by the fastest known sequential
algorithm for solving the same problem on a single processing element. We denote
the overhead function of a parallel system by the symbol To.
The total time spent in solving a problem summed over all processing elements is
pTP . TS units of this time are spent performing useful work, and the remainder is
overhead. Therefore, the overhead function (To) is given by

Equation 5.1:  To = p·TP − TS

Speedup
When evaluating a parallel system, we are often interested in knowing how much
performance gain is achieved by parallelizing a given application over a sequential
implementation. Speedup is a measure that captures the relative benefit of solving a
problem in parallel. It is defined as the ratio of the time taken to solve a problem on a
single processing element to the time required to solve the same problem on a parallel
computer with p identical processing elements. We denote speedup by the symbol
S.
Example :Adding n numbers using n processing elements
Consider the problem of adding n numbers by using n processing elements.
Initially, each processing element is assigned one of the numbers to be added and, at
the end of the computation, one of the processing elements stores the sum of all the
numbers. Assuming that n is a power of two, we can perform this operation in log
n steps by propagating partial sums up a logical binary tree of processing elements.
The following figure illustrates the procedure for n = 16. The processing elements are
labeled from 0 to 15. Similarly, the 16 numbers to be added are labeled from 0 to 15.
The sum of the numbers with consecutive labels from i to j is denoted by Σ(i : j).

Figure. Computing the global sum of 16 partial sums using 16 processing elements. Σ(i : j) denotes
the sum of numbers with consecutive labels from i to j.
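Completing the example: the serial runtime is TS = Θ(n), while the parallel runtime is TP = Θ(log n), so the speedup obtained is S = TS/TP = Θ(n/log n).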

OR
Q4) a) Explain Parallel Matrix-Matrix Multiplication algorithm with an example.[8]
Ans: We start by examining algorithms for various distributions of A, B, and C. We first
consider a one-dimensional, columnwise decomposition in which each task encapsulates
corresponding columns from A, B, and C. One parallel algorithm makes each task responsible
for all computation associated with its columns of C. As shown in Figure 4.10, each task
requires all of matrix A in order to compute its columns of C; N²/P data are required from
each of the P − 1 other tasks, giving the following per-processor communication cost:

(P − 1)(ts + tw · N²/P)

Note that as each task performs O(N³/P) computation, if N ≈ P, then the algorithm will have
to transfer roughly one word of data for each multiplication and addition performed. Hence,
the algorithm can be expected to be efficient only when N is much larger than P or the cost of
computation is much larger than the per-word transfer cost tw.

Figure: Matrix-matrix multiplication A·B = C with matrices A, B, and C decomposed in
two dimensions. The components of A, B, and C allocated to a single task are shaded black.
During execution, this task requires corresponding rows and columns of matrices A and B,
respectively (shown stippled).

Next, we consider a two-dimensional decomposition of A, B, and C. As in
the one-dimensional algorithm, we assume that a task encapsulates corresponding
elements of A, B, and C and that each task is responsible for all computation
associated with its element of C. The computation of a single element Cij requires an entire
row of A and an entire column of B. Hence, as shown in
Figure 4.11, the computation performed within a single task requires the A and B
submatrices allocated to tasks in the same row and column, respectively. This is a
total of 2N²/√P data, considerably less than in the one-dimensional algorithm.


Figure : Matrix-matrix multiplication algorithm based on two-dimensional decompositions. Each
step involves three stages: (a) an A submatrix is broadcast to other tasks in the same row; (b) local
computation is performed; and (c) the B submatrix is rotated upwards within each column.

To complete the second parallel algorithm, we need to design a strategy for
communicating the submatrices between tasks. One approach is for each task to
execute the following logic (Figure 4.12):

set Bloc = the B submatrix initially assigned to this task
for j = 0 to √P − 1
    in each row i, the ((i + j) mod √P)-th task broadcasts its A submatrix to the other tasks in the row
    accumulate Cloc = Cloc + (broadcast A submatrix) · Bloc
    send Bloc to upward neighbor, receive a new Bloc from downward neighbor
endfor
b) Interpret the effect of Granularity on Performance of parallel execution. [8]
Ans: An earlier example illustrated an instance of an algorithm that is not cost-optimal. The algorithm
discussed in this example uses as many processing elements as the number of inputs,
which is excessive in terms of the number of processing elements. In practice, we
assign larger pieces of input data to processing elements. This corresponds to
increasing the granularity of computation on the processing elements. Using fewer
than the maximum possible number of processing elements to execute a parallel
algorithm is called scaling down a parallel system in terms of the number of
processing elements. A naive way to scale down a parallel system is to design a
parallel algorithm for one input element per processing element, and then use fewer
processing elements to simulate a large number of processing elements. If there are n
inputs and only p processing elements (p < n), we can use the parallel algorithm
designed for n processing elements by assuming n virtual processing elements
and having each of the p physical processing elements simulate n/p virtual
processing elements.As the number of processing elements decreases by a factor of
n/p, the computation at each processing element increases by a factor of n/p
because each processing element now performs the work of n/p processing
elements. If virtual processing elements are mapped appropriately onto physical
processing elements, the overall communication time does not grow by more than a
factor of n/p. The total parallel runtime increases, at most, by a factor of n/p, and
the processor-time product does not increase. Therefore, if a parallel system with n
processing elements is cost-optimal, using p processing elements (where p <
n)to simulate n processing elements preserves cost-optimality.
A drawback of this naive method of increasing computational granularity is that if a
parallel system is not cost-optimal to begin with, it may still not be cost-optimal after
the granularity of computation increases. This is illustrated by the following example
for the problem of adding n numbers.
Example 5.9 Adding n numbers on p processing elements
Consider the problem of adding n numbers on p processing elements such that p
< n and both n and p are powers of 2. We use the same algorithm as in the previous
example and simulate n processing elements on p processing elements. The steps
leading to the solution are shown in the figure for n = 16 and p = 4. Virtual
processing element i is simulated by the physical processing element labeled i
mod p; the numbers to be added are distributed similarly. The first log p of the log
n steps of the original algorithm are simulated in (n/p) log p steps on p
processing elements. In the remaining steps, no communication is required because
the processing elements that communicate in the original algorithm are simulated by
the same processing element; hence, the remaining numbers are added locally. The
algorithm takes Θ((n/p) log p) time in the steps that require communication, after
which a single processing element is left with n/p numbers to add, taking time
Θ(n/p). Thus, the overall parallel execution time of this parallel system is Θ((n/p) log
p). Consequently, its cost is Θ(n log p), which is asymptotically higher than the
Θ(n) cost of adding n numbers sequentially. Therefore, the parallel system is not cost-
optimal.
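A standard follow-up (stated here for completeness): a cost-optimal version first lets each processing element add its n/p numbers locally in Θ(n/p) time and then combines the p partial sums in Θ(log p) time. The parallel runtime becomes Θ(n/p + log p), the cost Θ(n + p log p), and the system is cost-optimal whenever n = Ω(p log p).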

Q5) a)Compare an algorithm for sequential and parallel Merge sort. Analyze the
complexity for the same. [8]
Ans:
Sorting is a common and important problem in computing. Given a sequence
of N data elements, we are required to generate an ordered sequence that
contains the same elements. Here, we present a parallel version of the well-known
mergesort algorithm. The algorithm assumes that the sequence to be sorted is
distributed and so generates a distributed sorted sequence. For simplicity, we assume
that N is an integer multiple of P, that the N data are distributed evenly among
P tasks, and that P is an integer power of two. Relaxing these assumptions
does not change the essential character of the algorithm but would complicate the
presentation.

Figure : Mergesort, used here to sort the sequence [6,2,9,5].

The two partition phases each split the input sequence; the two merge phases each
combine two sorted subsequences generated in a previous phase.
The sequential mergesort algorithm is as follows; its execution is illustrated in the figure.
1. If the input sequence has fewer than two elements, return.
2. Partition the input sequence into two halves.
3. Sort the two subsequences using the same algorithm.
4. Merge the two sorted subsequences to form the output sequence.
The merge operation employed in step (4) combines two sorted subsequences to
produce a single sorted sequence. It repeatedly compares the heads of the two
subsequences and outputs the lesser value until no elements remain. Mergesort
requires O(N log N) time to sort N elements, which is the best that can be
achieved (modulo constant factors) unless data are known to have special properties
such as a known distribution or degeneracy.
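For reference, a compact C implementation of the sequential algorithm described above (our own sketch):

#include <stdlib.h>
#include <string.h>

/* Sequential mergesort: O(N log N) comparisons, sorts a[0..n-1]. */
void mergesort(int *a, int n) {
    if (n < 2) return;                       /* step 1: trivial sequence   */
    int half = n / 2;
    mergesort(a, half);                      /* steps 2-3: sort each half  */
    mergesort(a + half, n - half);

    int *tmp = malloc(n * sizeof(int));      /* step 4: merge the halves   */
    int i = 0, j = half, k = 0;
    while (i < half && j < n)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < half) tmp[k++] = a[i++];
    while (j < n)    tmp[k++] = a[j++];
    memcpy(a, tmp, n * sizeof(int));
    free(tmp);
}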
We first describe two algorithms required in the implementation of parallel
mergesort: compare-exchange and parallel merge.
Compare-Exchange.
A compare-exchange operation merges two sorted sequences of length M ,
contained in tasks A and B . Upon completion of the operation, both tasks have
M data, and all elements in task A are less than or equal to all elements in task B
. As illustrated in Figure, each task sends its data to the other task. Task A identifies
the M lowest elements and discards the remainder; this process requires at least
M/2 and at most M comparisons. Similarly, task B identifies the M highest
elements.
Figure : The compare-exchange algorithm, with M=4 . (a) Tasks A and B
exchange their sorted subsequences. (b) They perform a merge operation to identify
the lowest and highest M elements, respectively. (c) Other elements are discarded,
leaving a single sorted sequence partitioned over the two tasks.

Notice that a task may not need all M of its neighbor's data in order to identify the
M lowest (or highest) values. On average, only M/2 values are required. Hence, it
may be more efficient in some situations to require the consumer to request data
explicitly. This approach results in more messages that contain a total of less than M
data, and can at most halve the amount of data transferred.
Figure : The parallel merge operation, performed in hypercubes of dimension one,
two, and three. In a hypercube of dimension d , each task performs d compare-
exchange operations. Arrows point from the ``high'' to the ``low'' task in each
exchange.

Parallel Merge.
A parallel merge algorithm performs a merge operation on two sorted sequences of
length 2^(d-1)·M, each distributed over 2^(d-1) tasks, to produce a single sorted sequence of
length 2^d·M distributed over 2^d tasks. As illustrated in the figure, this is
achieved by using the hypercube communication template. Each of the 2^d tasks
engages in d compare-exchange steps, one with each neighbor. In effect, each
node executes the hypercube template algorithm, applying the following operator at each step.

if ( (myid AND 2^i) > 0 ) then

state = compare_exchange_high(state,message)

else

state = compare_exchange_low(state,message)

endif

In this code fragment, AND is a bitwise logical and operator, used to determine
whether the task is ``high'' or ``low'' in a particular exchange; myid and i are as in
the hypercube template algorithm.
Mergesort.
We next describe the parallel mergesort algorithm proper. Each task in the
computation executes the following logic.
procedure parallel_mergesort(myid, d, data, newdata)
begin
data = sequential_mergesort(data)
for dim = 1 to d
data = parallel_merge(myid, dim, data)
endfor
newdata = data
end
First, each task sorts its local sequence using sequential mergesort. Second, and again
using the hypercube communication structure, each of the 2^d tasks executes
the parallel merge algorithm d times, for subcubes of dimension 1..d. The i-th
parallel merge takes two sequences, each distributed over 2^(i-1) tasks, and generates
a sorted sequence distributed over 2^i tasks. After d such merges, we have a single
sorted list distributed over the 2^d tasks.

b) Modify Depth First Search for parallel execution and analyze its complexity. [8]
Ans:
Two characteristics of parallel DFS are critical to determining its performance. First is
the method for splitting work at a processor, and the second is the scheme to
determine the donor processor when a processor becomes idle.
Work-Splitting Strategies
When work is transferred, the donor's stack is split into two stacks, one of which is
sent to the recipient. In other words, some of the nodes (that is, alternatives) are
removed from the donor's stack and added to the recipient's stack. If too little work is
sent, the recipient quickly becomes idle; if too much, the donor becomes idle. Ideally,
the stack is split into two equal pieces such that the size of the search space
represented by each stack is the same. Such a split is called a half-split. It is difficult
to get a good estimate of the size of the tree rooted at an unexpanded alternative in the
stack. However, the alternatives near the bottom of the stack (that is, close to the
initial node) tend to have bigger trees rooted at them, and alternatives near the top of
the stack tend to have small trees rooted at them. To avoid sending very small
amounts of work, nodes beyond a specified stack depth are not given away. This
depth is called the cutoff depth.
Some possible strategies for splitting the search space are (1) send nodes near the
bottom of the stack, (2) send nodes near the cutoff depth, and (3) send half the nodes
between the bottom of the stack and the cutoff depth. The suitability of a splitting
strategy depends on the nature of the search space. If the search space is uniform, both
strategies 1 and 3 work well. If the search space is highly irregular, strategy 3 usually
works well. If a strong heuristic is available (to order successors so that goal nodes
move to the left of the state-space tree), strategy 2 is likely to perform better, since it
tries to distribute those parts of the search space likely to contain a solution. The cost
of splitting also becomes important if the stacks are deep. For such stacks, strategy 1
has lower cost than strategies 2 and 3.
The figure shows the partitioning of the DFS tree into two subtrees using
strategy 3. Note that the states beyond the cutoff depth are not partitioned. The figure also
shows the representation of the stack corresponding to the two subtrees. The stack
representation used in the figure stores only the unexplored alternatives.
Figure 11.9. Splitting the DFS tree; the two subtrees along with their stack
representations are shown in (a) and (b).

Load-Balancing Schemes
This section discusses three dynamic load-balancing schemes: asynchronous round
robin, global round robin, and random polling. Each of these schemes can be coded
for message passing as well as shared address space machines.
Asynchronous Round Robin In asynchronous round robin (ARR), each processor
maintains an independent variable, target. Whenever a processor runs out of work, it
uses target as the label of a donor processor and attempts to get work from it. The
value of target is incremented (modulo p) each time a work request is sent. The
initial value of target at each processor is set to ((label + 1) modulo p) where
label is the local processor label. Note that work requests are generated
independently by each processor. However, it is possible for two or more processors
to request work from the same donor at nearly the same time.
Global Round Robin Global round robin (GRR) uses a single global variable called
target. This variable can be stored in a globally accessible space in shared address
space machines or at a designated processor in message passing machines. Whenever
a processor needs work, it requests and receives the value of target, either by
locking, reading, and unlocking on shared-address-space machines or by sending a
request message to the designated processor (say P0). The value of target is
incremented (modulo p) before responding to the next request. The recipient
processor then attempts to get work from a donor processor whose label is the value
of target. GRR ensures that successive work requests are distributed evenly over all
processors. A drawback of this scheme is the contention for access to target.
Random Polling Random polling (RP) is the simplest load-balancing scheme. When
a processor becomes idle, it randomly selects a donor. Each processor is selected as a
donor with equal probability, ensuring that work requests are evenly distributed.
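A tiny C sketch of the ARR donor selection described above (our own illustration; the helper name is hypothetical):

/* Asynchronous round robin: each idle processor keeps a private target,
   initialized to (label + 1) mod p, and advances it after every request. */
int next_donor_arr(int *target, int p) {
    int donor = *target;
    *target = (*target + 1) % p;   /* increment modulo p after each request */
    return donor;
}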

OR
Q6) a) Discuss the issues in sorting for parallel computers. [8]
Ans:Parallelizing a sequential sorting algorithm involves distributing the elements to
be sorted onto the available processes. This process raises a number of issues that we
must address in order to make the presentation of parallel sorting algorithms clearer.
Where the Input and Output Sequences are Stored
In sequential sorting algorithms, the input and the sorted sequences are stored in the
process's memory. However, in parallel sorting there are two places where these
sequences can reside. They may be stored on only one of the processes, or they may
be distributed among the processes. The latter approach is particularly useful if
sorting is an intermediate step in another algorithm. In this chapter, we assume that
the input and sorted sequences are distributed among the processes.
Consider the precise distribution of the sorted output sequence among the processes.
A general method of distribution is to enumerate the processes and use this
enumeration to specify a global ordering for the sorted sequence. In other words, the
sequence will be sorted with respect to this process enumeration. For instance, if Pi
comes before Pj in the enumeration, all the elements stored in Pi will be smaller
than those stored in Pj . We can enumerate the processes in many ways. For certain
parallel algorithms and interconnection networks, some enumerations lead to more
efficient parallel formulations than others.
How Comparisons are Performed
A sequential sorting algorithm can easily perform a compare-exchange on two
elements because they are stored locally in the process's memory. In parallel sorting
algorithms, this step is not so easy. If the elements reside on the same process, the
comparison can be done easily. But if the elements reside on different processes, the
situation becomes more complicated.
One Element Per Process
Consider the case in which each process holds only one element of the sequence to be
sorted. At some point in the execution of the algorithm, a pair of processes (Pi, Pj)
may need to compare their elements, ai and aj. After the comparison, Pi will
hold the smaller and Pj the larger of {ai, aj}. We can perform comparison by
having both processes send their elements to each other. Each process compares the
received element with its own and retains the appropriate element. In our example,
Pi will keep the smaller and Pj will keep the larger of {ai, aj}. As in the sequential
case, we refer to this operation as compare-exchange. As fig illustrates, each
compare-exchange operation requires one comparison step and one communication
step.

Figure: A parallel compare-exchange operation. Processes Pi and Pj send their elements to
each other. Process Pi keeps min{ai, aj}, and Pj keeps max{ai, aj}.
If we assume that processes Pi and Pj are neighbors, and the communication
channels are bidirectional, then the communication cost of a compare-exchange step
is (ts + tw), where ts and tw are message-startup time and per-word transfer
time, respectively. In commercially available message-passing computers, ts is
significantly larger than tw, so the communication time is dominated by ts. Note
that in today's parallel computers it takes more time to send an element from one
process to another than it takes to compare the elements. Consequently, any parallel
sorting formulation that uses as many processes as elements to be sorted will deliver
very poor performance because the overall parallel run time will be dominated by
interprocess communication.
More than One Element Per Process
A general-purpose parallel sorting algorithm must be able to sort a large sequence
with a relatively small number of processes. Let p be the number of processes P0,
P1, ..., Pp-1, and let n be the number of elements to be sorted. Each process is
assigned a block of n/p elements, and all the processes cooperate to sort the
sequence. Let A0, A1, ..., Ap-1 be the blocks assigned to processes P0, P1, ...,
Pp-1, respectively. We say that Ai ≤ Aj if every element of Ai is less than or
equal to every element in Aj. When the sorting algorithm finishes, each process Pi
holds a set A'i such that A'i ≤ A'j for i ≤ j, and the union of all the sets A'i is the
original set of elements.
As in the one-element-per-process case, two processes Pi and Pj may have to
redistribute their blocks of n/p elements so that one of them will get the smaller
n/p elements and the other will get the larger n/p elements. Let Ai and Aj be
the blocks stored in processes Pi and Pj. If the block of n/p elements at each
process is already sorted, the redistribution can be done efficiently as follows. Each
process sends its block to the other process. Now, each process merges the two sorted
blocks and retains only the appropriate half of the merged block. We refer to this
operation of comparing and splitting two sorted blocks as compare-split. The
compare-split operation is illustrated in the figure.
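A C sketch of the compare-split step (our own illustration; keep_small selects which half a process retains):

#include <stdlib.h>
#include <string.h>

/* Compare-split (a sketch; names are ours): merge the process's own sorted
   block with the partner's sorted block, both of size k = n/p, and keep
   either the k smallest or the k largest elements of the merged sequence. */
void compare_split(int *own, const int *partner, int k, int keep_small) {
    int *merged = malloc(2 * k * sizeof(int));
    int i = 0, j = 0, m = 0;

    while (i < k && j < k)                      /* standard two-way merge */
        merged[m++] = (own[i] <= partner[j]) ? own[i++] : partner[j++];
    while (i < k) merged[m++] = own[i++];
    while (j < k) merged[m++] = partner[j++];

    /* retain the appropriate half of the merged sequence */
    memcpy(own, keep_small ? merged : merged + k, k * sizeof(int));
    free(merged);
}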

b) Explain Dijkstra’s shortest path algorithm. [8]


Ans:Given a graph and a source vertex in the graph, find shortest paths from source to
all vertices in the given graph.
Dijkstra’s algorithm is very similar to Prim’s algorithm for minimum spanning tree.
Like Prim’s MST, we generate a SPT (shortest path tree) with given source as root.
We maintain two sets, one set contains vertices included in shortest path tree, other set
includes vertices not yet included in shortest path tree. At every step of the algorithm,
we find a vertex which is in the other set (set of not yet included) and has a minimum
distance from the source.
Below are the detailed steps used in Dijkstra’s algorithm to find the shortest path from
a single source vertex to all other vertices in the given graph.
Algorithm
1) Create a set sptSet (shortest path tree set) that keeps track of vertices included
in shortest path tree, i.e., whose minimum distance from source is calculated and
finalized. Initially, this set is empty.
2) Assign a distance value to all vertices in the input graph. Initialize all distance
values as INFINITE. Assign distance value as 0 for the source vertex so that it is
picked first.
3) While sptSet doesn’t include all vertices
….a) Pick a vertex u which is not there in sptSet and has minimum distance value.
….b) Include u to sptSet.
….c) Update distance value of all adjacent vertices of u. To update the distance
values, iterate through all adjacent vertices. For every adjacent vertex v, if sum of
distance value of u (from source) and weight of edge u-v, is less than the distance
value of v, then update the distance value of v.
Let us understand with the following example:
The set sptSet is initially empty and distances assigned to vertices are {0, INF, INF,
INF, INF, INF, INF, INF} where INF indicates infinite. Now pick the vertex with
minimum distance value. The vertex 0 is picked, include it in sptSet. So sptSet
becomes {0}. After including 0 to sptSet, update distance values of its adjacent
vertices. Adjacent vertices of 0 are 1 and 7. The distance values of 1 and 7 are
updated as 4 and 8. The following subgraph shows the vertices and their distance values;
only the vertices with finite distance values are shown. The vertices included in the SPT are
shown in green colour.

Pick the vertex with minimum distance value and not already included in SPT (not in
sptSET). The vertex 1 is picked and added to sptSet. So sptSet now becomes {0, 1}.
Update the distance values of adjacent vertices of 1. The distance value of vertex 2
becomes 12.

Pick the vertex with minimum distance value and not already included in SPT (not in
sptSET). Vertex 7 is picked. So sptSet now becomes {0, 1, 7}. Update the distance
values of adjacent vertices of 7. The distance value of vertex 6 and 8 becomes finite
(15 and 9 respectively).

Pick the vertex with minimum distance value and not already included in SPT (not in
sptSET). Vertex 6 is picked. So sptSet now becomes {0, 1, 7, 6}. Update the distance
values of adjacent vertices of 6. The distance value of vertex 5 and 8 are updated.
We repeat the above steps until sptSet includes all vertices of the given graph.
Finally, we get the following Shortest Path Tree (SPT).
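A compact C implementation of the above steps (our own sketch, using an adjacency-matrix representation where 0 means no edge):

#include <limits.h>

#define V 9   /* number of vertices (illustrative) */

/* pick the not-yet-included vertex with minimum distance value (step 3a) */
static int min_distance(const int dist[], const int in_spt[]) {
    int min = INT_MAX, idx = -1;
    for (int v = 0; v < V; v++)
        if (!in_spt[v] && dist[v] < min) { min = dist[v]; idx = v; }
    return idx;
}

/* Assumes a connected graph, so min_distance always finds a vertex. */
void dijkstra(const int graph[V][V], int src, int dist[V]) {
    int in_spt[V] = {0};                        /* sptSet, initially empty */
    for (int v = 0; v < V; v++) dist[v] = INT_MAX;
    dist[src] = 0;                              /* source picked first (step 2) */

    for (int count = 0; count < V - 1; count++) {
        int u = min_distance(dist, in_spt);
        in_spt[u] = 1;                          /* include u in sptSet (step 3b) */
        for (int v = 0; v < V; v++)             /* relax edges out of u (step 3c) */
            if (!in_spt[v] && graph[u][v] && dist[u] != INT_MAX &&
                dist[u] + graph[u][v] < dist[v])
                dist[v] = dist[u] + graph[u][v];
    }
}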

Q7) a) Explain parallelism in Best First Search algorithm. Give an appropriate example. [8]
Ans: In most parallel formulations of best-first search (BFS), different processors concurrently expand
different nodes from the open list. These formulations differ according to the data
structures they use to implement the open list. Given p processors, the simplest
strategy assigns each processor to work on one of the current best nodes on the open
list. This is called the centralized strategy because each processor gets work from a
single global open list. Since this formulation of parallel BFS expands more than
one node at a time, it may expand nodes that would not be expanded by a sequential
algorithm. Consider the case in which the first node on the open list is a solution.
The parallel formulation still expands the first p nodes on the open list. However,
since it always picks the best p nodes, the amount of extra work is limited. The figure
illustrates this strategy. There are two problems with this approach:
1.The termination criterion of sequential BFS fails for parallel BFS. Since at any
moment, p nodes from the open list are being expanded, it is possible that one of
the nodes may be a solution that does not correspond to the best goal node (or the path
found is not the shortest path). This is because the remaining p - 1 nodes may lead
to search spaces containing better goal nodes. Therefore, if the cost of a solution
found by a processor is c, then this solution is not guaranteed to correspond to the
best goal node until the cost of nodes being searched at other processors is known to
be at least c. The termination criterion must be modified to ensure that termination
occurs only after the best solution has been found.
2.Since the open list is accessed for each node expansion, it must be easily
accessible to all processors, which can severely limit performance. Even on shared-
address-space architectures, contention for the open list limits speedup. Let texp
be the average time to expand a single node, and taccess be the average time to
access the open list for a single-node expansion. If there are n nodes to be
expanded by both the sequential and parallel formulations (assuming that they do an
equal amount of work), then the sequential run time is given by n(taccess + texp).
Assume that it is impossible to parallelize the expansion of individual nodes. Then the
parallel run time will be at least ntaccess, because the open list must be accessed
at least once for each node expanded. Hence, an upper bound on the speedup is
(taccess + texp)/taccess.
Figure: A general schematic for parallel best-first search using a centralized strategy.
The locking operation is used here to serialize queue access by various processors.

b)Design a simple CUDA kernel function to multiply two integers. [6]


Ans:
__global__ void mm_kernel(float *A, float *B, float *C, int n) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) {
        for (int i = 0; i < n; ++i) {
            /* accumulate the dot product; C is assumed zero-initialized */
            C[row * n + col] += A[row * n + i] * B[i * n + col];
        }
    }
}

/* kernel launch from host code */
mm_kernel<<<dimGrid, dimBlock>>>(d_a, d_b, d_c, n);

OR

Q8) a) Describe CUDA Architecture in detail with neat diagram. [8]


Ans:CPUs are designed to process as many sequential instructions as quickly as
possible. While most CPUs support threading, creating a thread is usually an
expensive operation and high-end CPUs can usually make efficient use of no more
than about 12 concurrent threads.GPUs on the other hand are designed to process a
small number of parallel instructions on large sets of data as quickly as possible. For
instance, calculating 1 million polygons and determining which to draw on the screen
and where. To do this they rely on many slower processors and inexpensive threads.

Physical Architecture:CUDA-capable GPU cards are composed of one or more


Streaming Multiprocessors (SMs), which are an abstraction of the underlying
hardware. Each SM has a set of Streaming Processors (SPs), also called CUDA
cores, which share a cache of shared memory that is faster than the GPU’s global
memory but that can only be accessed by the threads running on the SPs of that SM.
These streaming processors are the “cores” that execute instructions.

The number of SPs/cores in an SM and the number of SMs depend on your device:
see the Finding your Device Specifications section below for details. It is important
to realize, however, that regardless of GPU model, there are many more CUDA cores
in a GPU than in a typical multicore CPU: hundreds or thousands more. For example,
the Kepler Streaming Multiprocessor design, dubbed SMX, contains 192 single-
precision CUDA cores, 64 double-precision units, 32 special function units, and 32
load/store units. (See the Kepler Architecture Whitepaper for a description and
diagram.)

CUDA cores are grouped together to perform instructions in what NVIDIA has
termed a warp of threads. Warp simply means a group of threads that are scheduled
together to execute the same instructions in lockstep. All CUDA cards to date use a
warp size of 32. Each SM has at least one warp scheduler, which is responsible for
executing 32 threads. Depending on the model of GPU, the cores may be double or
quadruple pumped so that they execute one instruction on two or four threads in as
many clock cycles. For instance, Tesla devices use a group of 8 quadpumped cores to
execute a single warp. If there are less than 32 threads scheduled in the warp, it will
still take as long to execute the instructions.
The CUDA programmer is responsible for ensuring that the threads are being
assigned efficiently for code that is designed to run on the GPU. The assignment of
threads is done virtually in the code using what is sometimes referred to as a ‘tiling’
scheme of blocks of threads that form a grid. Programmers define a kernel function
that will be executed on the CUDA card using a particular tiling scheme.
Virtual Architecture
When programming in CUDA C we work with blocks of threads and grids of blocks.
What is the relationship between this virtual architecture and the CUDA card’s
physical architecture?
When kernels are launched, each block in a grid is assigned to a Streaming
Multiprocessor. This allows threads in a block to use __shared__ memory. If a
block doesn’t use the full resources of the SM then multiple blocks may be assigned
at once. If all of the SMs are busy then the extra blocks will have to wait until a SM
becomes free.
Once a block is assigned to an SM, its threads are split into warps by the warp
scheduler and executed on the CUDA cores. Since the same instructions are executed
on each thread in the warp simultaneously, it is generally a bad idea to have
conditionals in kernel code. This type of code is sometimes called divergent: when
some threads in a warp are unable to execute the same instruction as other threads in
the warp, those threads are diverged and do no work.
Because a warp’s context (it’s registers, program counter etc.) stays on chip for the
life of the warp, there is no additional cost to switching between warps vs executing
the next step of a given warp. This allows the GPU to switch to hide some of it’s
memory latency by switching to a new warp while it waits for a costly read.
CUDA Memory
CUDA on-chip memory is divided into several different regions.
Registers act the same way that registers on CPUs do; each thread has its own set of registers.
Local Memory holds local variables used by each thread. They are not accessible by other threads, even though they use the same L1 and L2 cache as global memory.
Shared Memory is accessible by all threads in a block. It must be declared using the __shared__ modifier. It has a higher bandwidth and lower latency than global memory. However, if multiple threads request the same address, the requests are processed serially, which slows down the application.
Constant Memory is read-accessible by all threads and must be declared with the __constant__ modifier. In newer devices there is a separate read-only constant cache.
Global Memory is accessible by all threads. It is the slowest device memory, but on new cards it is cached. Memory is pulled in 32-, 64-, or 128-byte memory transactions. Warps executing global memory accesses attempt to pull all the data from global memory simultaneously, so it is advantageous to use block sizes that are multiples of 32. If multidimensional arrays are used, it is also advantageous to have the bounds padded so that they are multiples of 32.
Texture/Surface Memory is read-accessible by all threads, but unlike Constant Memory, it is optimized for 2D spatial locality, and cache hits pull in surrounding values in both x and y directions.
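The virtual-to-physical mapping above can be made concrete with a short kernel. The sketch below is illustrative only and uses the Numba CUDA bindings for Python; CUDA C expresses the same grid/block/shared-memory ideas with __global__ and __shared__. It assumes an NVIDIA GPU with the numba and numpy packages installed, and the names TPB and block_sum are invented for the example.

import numpy as np
from numba import cuda, float32

TPB = 32  # threads per block: a multiple of the warp size (32)

@cuda.jit
def block_sum(x, out):
    # Shared memory: visible only to the threads of this block on its SM.
    tile = cuda.shared.array(shape=TPB, dtype=float32)
    tid = cuda.threadIdx.x
    i = cuda.grid(1)                        # global thread index in the grid
    tile[tid] = x[i] if i < x.size else 0.0
    cuda.syncthreads()                      # wait until the whole tile is loaded
    stride = TPB // 2
    while stride > 0:                       # tree reduction inside the block
        if tid < stride:
            tile[tid] += tile[tid + stride]
        cuda.syncthreads()
        stride //= 2
    if tid == 0:
        out[cuda.blockIdx.x] = tile[0]      # one partial sum per block

x = np.arange(1024, dtype=np.float32)
blocks = (x.size + TPB - 1) // TPB          # grid size: enough blocks to cover x
out = np.zeros(blocks, dtype=np.float32)
block_sum[blocks, TPB](x, out)              # launch: grid of blocks, TPB threads each
print(out.sum(), x.sum())                   # the two sums agree

Each block here lands on one SM, its 32 threads form exactly one warp, and the block size being a multiple of 32 keeps global memory accesses aligned, matching the advice above.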
b) Write advantages and limitations of CUDA. [5]
Ans:
Advantages of CUDA:
 Huge increase in processing power over conventional CPU processing.
 Early reports suggest speed increases of 10x to 200x over CPU processing speed.
 Researchers can use several GPUs to perform the same amount of operations as many servers, in less time, thus saving money, time, and space.
 The C language is widely used, so it is easy for developers to learn how to program for CUDA.
 All graphics cards in the G80 series and beyond support CUDA.
 Harnesses the power of the GPU through parallel processing, running thousands of simultaneous reads instead of single, dual, or quad reads on the CPU.
Disadvantages of CUDA:
 Limited user base: only NVIDIA G80 and onward video cards can use CUDA, thus excluding all ATI users.
 Speeds may be bottlenecked at the bus between CPU and GPU.
 Developers are still sceptical as to whether CUDA will catch on.
 Mainly developed for researchers; not many uses for average users.
 The system is still in development.

c) Give five applications of CUDA. [5]
Ans:
 Dense Linear Algebra
 Sparse Linear Algebra
 Spectral Methods
 N-Body Methods
 Structured Grids
 Unstructured Grids

BE (Computer) Semester -VII
Part A:Scheme, Course Outcomes, Syllabus, and Evaluation guidelines of
Artificial Intelligence and Robotics (410242)

Subject Code: 410242 | Subject Name: Artificial Intelligence and Robotics
Teaching Scheme (Hrs.): Theory: 3 hrs | Practical: -- | Tutorial: --
Credits Assigned: Theory: 3 | Practical: -- | Tutorial: -- | Total: 03

Evaluation Scheme (410242, Artificial Intelligence and Robotics):
Theory: Internal (Insem): 30 | External (Endsem): 70 | Total: 100
Practical: Internal: 00 | External: 00 | Total: 00
Total Credits: 3

(Mr. Devidas S. Thosar) (Mr. Kishor N. Shedge)


Subject In-Charge Head, Comp Engg Dept
Academic Year 2019-20 (SEM-I)
Class : BE Computer Subject: Artificial Intelligence and Robotics (410242)
Course Objectives:
1. To understand the concept of Artificial Intelligence (AI)
2. To learn various peculiar search strategies for AI
3. To acquaint with the fundamentals of mobile robotics
4. To develop a mind to solve real world problems unconventionally with optimality
5. To learn about different sensors used in Robotics & Artificial Neural Network (ANN).
6. To understand the concept of Robots in Practice.

Course Outcomes:
On completion of the course, student will be able to–

1. CO1: Identify and apply suitable intelligent agents for various AI applications.
2. CO2: Design smart systems using different informed/uninformed search or heuristic approaches.
3. CO3: Identify associated knowledge and represent it by ontological engineering to plan a strategy for solving a given problem.
4. CO4: Apply suitable algorithms to solve AI problems.
5. CO5: Identify and use suitable sensors to solve robotics problems.

(Mr. Devidas S. Thosar) (Mr. Kishor N. Shedge)


Subject In-Charge Head, Comp Engg Dept
Syllabus
Sub: Artificial Intelligence and Robotics (410242)
Unit I: Introduction (8h)
Artificial Intelligence: Introduction, Typical Applications. State Space Search: Depth Bounded DFS,
Depth First Iterative Deepening. Heuristic Search: Heuristic Functions, Best First Search, Hill
Climbing, Variable Neighborhood Descent, Beam Search, Tabu Search. Optimal Search: A*
algorithm, Iterative Deepening A*, Recursive Best First Search, Pruning the CLOSED and OPEN
Lists.

Unit II: Problem Decomposition and Planning (8h)


Problem Decomposition: Goal Trees, Rule Based Systems, Rule Based Expert Systems.
Planning: STRIPS, Forward and Backward State Space Planning, Goal Stack Planning, Plan
Space Planning, A Unified Framework For Planning. Constraint Satisfaction : N-Queens,
Constraint Propagation, Scene Labeling, Higher order and Directional Consistencies,
Backtracking and Look ahead Strategies.

Unit III: Logic and Reasoning (8h)


Knowledge Based Reasoning: Agents, Facets of Knowledge. Logic and Inferences: Formal
Logic, Propositional and First Order Logic, Resolution in Propositional and First Order Logic,
Deductive Retrieval, Backward Chaining, Second order Logic. Knowledge Representation:
Conceptual Dependency, Frames, Semantic nets.

Unit IV: Natural Language Processing and ANN (8h)


Natural Language Processing: Introduction, Stages in natural language Processing, Application of
NLP in Machine Translation, Information Retrieval and Big Data Information Retrieval.
Learning: Supervised, Unsupervised and Reinforcement learning. Artificial Neural Networks
(ANNs): Concept, Feed forward and Feedback ANNs, Error Back Propagation, Boltzmann
Machine.
Unit V: Robotics (8h)
Robotics: Fundamentals, path Planning for Point Robot, Sensing and mapping for Point Robot,
Mobile Robot Hardware, Non Visual Sensors like: Contact Sensors, Inertial Sensors, Infrared
Sensors, Sonar, Radar, laser Rangefinders, Biological Sensing. Robot System Control: Horizontal
and Vertical Decomposition, Hybrid Control Architectures, Middleware, High-Level Control,
Human-Robot Interface.

Unit VI: Robots in Practice (8h)


Robot Pose Maintenance and Localization: Simple Landmark Measurement, Servo Control,
Recursive Filtering, Global Localization. Mapping: Sensorial Maps, Topological Maps,
Geometric Maps, Exploration. Robots in Practice: Delivery Robots, Intelligent Vehicles, Mining
Automation, Space Robotics, Autonomous Aircrafts, Agriculture, Forestry, Domestic Robots.

Reference Books:
1. Nilsson Nils J., “Artificial Intelligence: A New Synthesis”, Morgan Kaufmann Publishers Inc., San Francisco, CA, ISBN: 978-1-55-860467-4
2. Patrick Henry Winston, “Artificial Intelligence”, Addison-Wesley Publishing Company, ISBN: 0-201-53377-4
3. Andries P. Engelbrecht, “Computational Intelligence: An Introduction”, 2nd Edition, Wiley India, ISBN: 978-0-470-51250-0
Teaching Plan
Sub: Artificial Intelligence and Robotics (410242)

Bloom Levels (BL) : 1. Remember 2. Understand 3. Apply 4. Create


Lect. No. | Planned Date | Conducted Date | Topics/Sub-Topics | Course Outcome Addressed | BL Level | Reference (Text Book, Website)
1 17/06/19 PO,PSO & CO of Subject -- 1 Blooms taxonomy
2 18/06/19 Unit I: Introduction 1 1 Artificial Intelligence: A
new Synthesis by Nilsson
Artificial Intelligence:
Introduction,
3 20/06/19 Typical Applications. State 1 2 Artificial Intelligence: A
new Synthesis by Nilsson
Space Search
4 24/06/19 Depth Bounded DFS, Depth 1 1 Artificial Intelligence: A
new Synthesis by Nilsson
First Iterative Deepening.
5 25/06/19 Heuristic Search: Heuristic 2 2 Artificial Intelligence: A
new Synthesis by Nilsson
Functions,
6 27/06/19 Best First Search, Hill 2 2 Artificial Intelligence: A
new Synthesis by Nilsson
Climbing, Variable
Neighborhood Descent,
7 01/07/19 Beam Search, Tabu Search. 3 2 Artificial Intelligence: A
new Synthesis by Nilsson
Optimal Search: A* algorithm,
8 02/07/19 Iterative Deepening A*, 2 2 Artificial Intelligence: A
new Synthesis by Nilsson
Recursive Best First Search,
9 04/07/19 Pruning the CLOSED and 2 2 Artificial Intelligence: A
OPEN Lists. new Synthesis by Nilsson
10 08/07/19 Unit II: Problem 3 4 Artificial Intelligence by
Decomposition and Planning Patrick Henry
Problem Decomposition: Goal
Trees,
11 09/07/19 Rule Based Systems, Rule 2 4 Artificial Intelligence by
Patrick Henry
Based Expert Systems.
12 11/07/19 Planning: STRIPS, 2 3 Artificial Intelligence by
Patrick Henry
13 15/07/19 Forward and Backward State 2 2 Artificial Intelligence by
Patrick Henry
Space Planning,
14 16/07/19 Goal Stack Planning, Plan 2 2 Artificial Intelligence by
Patrick Henry
Space Planning,
15 18/07/19 A Unified Framework For 3 2 Artificial Intelligence by
Patrick Henry
Planning. Constraint
Satisfaction : N-Queens,
16 22/07/19 Constraint Propagation, Scene 3 3 Artificial Intelligence by
Patrick Henry
Labeling,
17 23/07/19 Higher order and Directional 3 3 Artificial Intelligence by
Patrick Henry
Consistencies, Backtracking and
Look ahead Strategies.
18 25/07/19 Unit III: Logic and Reasoning 2 3 Computational
Knowledge Based Reasoning: Intelligence: An
Introduction by Andries
Agents
19 29/07/19 Facets of Knowledge. Logic and 2 3 Computational
Intelligence: An
Inferences Introduction by Andries
20 30/07/19 Formal Logic, Propositional and 3 2 Computational
Intelligence: An
First Order Logic, Introduction by Andries
21 01/08/19 Resolution in Propositional and 3 2 Computational
Intelligence: An
First Order Logic, Introduction by Andries
22 05/08/19 Deductive Retrieval, 3 2 Computational
Intelligence: An
Introduction by Andries
23 06/08/19 Backward Chaining, Second 3 3 Computational
Intelligence: An
order Logic. Introduction by Andries
24 08/08/19 Knowledge Representation: 3 3 Computational
Intelligence: An
Conceptual Dependency, Introduction by Andries
25 13/08/19 Frames, Semantic nets. 3 2 Computational
Intelligence: An
Introduction by Andries
26 19/08/19 Unit IV: Natural Language 3 2 Artificial Intelligence: A
new Synthesis by Nilsson
Processing and ANN
Natural Language Processing:.

27 20/08/19 Introduction, Stages in natural 3 2 Artificial Intelligence: A


new Synthesis by Nilsson
language Processing,
28 22/08/19 Application of NLP in Machine 3 2 Artificial Intelligence: A
new Synthesis by Nilsson
Translation,
29 26/08/19 Information Retrieval and Big 3 1 Artificial Intelligence: A
new Synthesis by Nilsson
Data Information Retrieval.
30 27/08/19 Learning: Supervised, 3 2 Artificial Intelligence: A
new Synthesis by Nilsson
Unsupervised and
Reinforcement learning.
31 29/08/19 Artificial Neural Networks 4 1 Artificial Intelligence: A
new Synthesis by Nilsson
(ANNs): Concept,
32 03/09/19 Feed forward and Feedback 4 1 Artificial Intelligence: A
new Synthesis by Nilsson
ANNs,
33 05/08/19 Error Back Propagation, 4 2 Artificial Intelligence: A
new Synthesis by Nilsson
Boltzmann Machine
34 09/08/19 Unit V: Robotics 5 1 https://www.robotics.org
Robotics: Fundamentals,
35 10/08/19 path Planning for Point Robot, 4 2 https://www.robotics.org
Sensing and mapping for Point
Robot,
36 12/08/19 Mobile Robot Hardware, 5 2 https://www.robotics.org
37 16/08/19 Non Visual Sensors like: 5 3 https://www.robotics.org
Contact Sensors, Inertial
Sensors,
38 17/08/19 Infrared Sensors, Sonar, Radar, 5 3 https://www.robotics.org
laser Rangefinders,
39 19/08/19 Biological Sensing. Robot 5 2 https://www.robotics.org
System Control: Horizontal and
Vertical Decomposition,
40 23/08/19 Hybrid Control Architectures, 5 2 https://www.robotics.org
Middleware,
41 24/08/19 High-Level Control, Human- 5 2 https://www.robotics.org
Robot Interface.
42 26/08/19 Unit VI: Robots in Practice 4 3 Artificial Intelligence: A
Robot Pose Maintenance and new Synthesis by Nilsson
Localization: https://www.robotics.org
43 30/08/19 Simple Landmark 2 3 Artificial Intelligence: A
Measurement, new Synthesis by Nilsson
https://www.robotics.org
44 01/09/19 Servo Control, Recursive 3 2 Artificial Intelligence: A
Filtering, new Synthesis by Nilsson
https://www.robotics.org
45 03/09/19 Global Localization. Mapping: 2 2 Artificial Intelligence: A
Sensorial Maps, new Synthesis by Nilsson
https://www.robotics.org
46 07/09/19 Topological Maps, Geometric 3 2 Artificial Intelligence: A
Maps, new Synthesis by Nilsson
https://www.robotics.org
47 10/09/19 Exploration. Robots in Practice: 5 2 Artificial Intelligence: A
Delivery Robots, Intelligent new Synthesis by Nilsson
Vehicles, https://www.robotics.org
48 14/09/19 Mining Automation, Space 5 3 Artificial Intelligence: A
Robotics, Autonomous new Synthesis by Nilsson
Aircrafts, https://www.robotics.org
49 15/09/19 Agriculture, Forestry, Domestic 5 3 Artificial Intelligence: A
Robots new Synthesis by Nilsson
https://www.robotics.org
50 17/09/19 Review
51 21/09/19 Review

(Mr. Devidas S. Thosar) (Mr. Kishor N. Shedge)


Subject In-Charge Head, Comp Engg Dept

Evaluation Guidelines
Internal Assessment (IA):
1. Two class tests and one prelim must be conducted; the average marks will be considered.
2. Three assignments, one on every two units of the syllabus, are to be conducted, and the average of the three is to be considered.
3. Attendance marks are to be considered as per Institute rules.

External Evaluation
Insem Examination:
1. The insem examination will be conducted in mid-semester on the first 03 units and carries 30 marks.
2. The question paper consists of a total of 06 questions, each carrying 10 marks: solve Question No. 1 OR Question No. 2 on Unit No. 1, Question No. 3 OR Question No. 4 on Unit No. 2, and Question No. 5 OR Question No. 6 on Unit No. 3.

End Semester Examination:

1. The end semester examination will be conducted at the end of the semester on the entire syllabus of 06 units and carries 70 marks: 20 marks on the first 03 units and 50 marks on Units No. 4 to 6.
2. The question paper consists of a total of 10 questions: solve Question No. 1 OR Question No. 2 and Question No. 3 OR Question No. 4 on Units No. 1 to 3 for 10 marks each. The remaining questions are Question No. 5 OR Question No. 6 on Unit No. 4, Question No. 7 OR Question No. 8 on Unit No. 5, and Question No. 9 OR Question No. 10 on Unit No. 6.

(Mr. Devidas S. Thosar) (Mr. Kishor N. Shedge)


Subject In-Charge Head, Comp Engg Dept
Part B: Course Delivery, Objectives, Outcomes
Assessment and Evaluation, and CO mapping with the POs
Sub: Artificial Intelligence and Robotics (410242)
Prepared by the Course Coordinator: Mr.Devidas S. Thosar

Pre-requisite:

1. Principles of Programming Languages (210254)

Course Delivery:
The course will be delivered through lectures, class room interaction, and presentations.

Course Objectives:

1. To understand the concept of Artificial Intelligence (AI)


2. To learn various peculiar search strategies for AI
3. To acquaint with the fundamentals of mobile robotics
4. To develop a mind to solve real world problems unconventionally with optimality
5. To learn about different sensors used in Robotics & Artificial Neural Network (ANN).
6. To understand the concept of Robots in Practice.

Course Outcomes:
On completion of the course, student will be able to–

1. Identify [L1: Knowledge] and apply suitable intelligent agents for various AI applications.
2. Design [L2: Analysis] smart systems using different informed/uninformed search or heuristic approaches.
3. Identify [L1: Knowledge] associated knowledge and represent it by ontological engineering to plan a strategy for solving a given problem.
4. Apply [L3: Application] suitable algorithms to solve AI problems.

*Level of Bloom’s Taxonomy to be met:
L1: Knowledge, L2: Comprehension = Level 1
L3: Application, L4: Analysis = Level 2
L5: Synthesis, L6: Evaluation = Level 3
Contribution to outcomes will be achieved through content delivery:

Modes of Content Delivery:

i Class Room Teaching v Self Learning Online Resources ix Industry Visit


ii Tutorial/MAB vi Slides x Group Discussion
iii Remedial Coaching vii Simulations/Demonstrations xi Seminar/Oral
iv Lab Experiment viii Expert Lecture xii Case Study

Modes of delivery used for this course:

i Class Room Teaching vi Expert Lecture
ii Remedial Coaching vii Industry Visit
iii Lab Experiment viii Group Discussion
iv Self Learning Online Resources ix Oral
v Slides

CO Mapping with Content Delivery


S. No. Course Mode of Delivery
Outcome
i ii iii iv v vi vii viii
1 CO410242.1 X X X X X
2 CO410242.2 X X X X
3 CO410242.3 X X X X X
4 CO410242.4 X X X X X X X

Program Outcomes (POs) & Program Specific Outcomes (PSOs) of Computer Engineering

Programs Outcomes (POs):

Engineering Graduates will be able to:


1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
2. Problem analysis: To analyze a problem by finding its domain and applying domain-specific skills.
3. Design/development of solutions: To understand the design issues of the product/software and develop effective solutions with appropriate consideration for public health and safety, and cultural, societal, and environmental considerations.
4. Conduct investigations of complex problems: To find solutions to complex problems by conducting investigations and applying suitable techniques.
5. Modern tool usage: To adapt to the usage of modern tools and recent software.
6. The engineer and society: To contribute towards society by understanding the impact of engineering in a global context.
7. Environment and sustainability: To understand environmental issues and design sustainable systems.
8. Ethics: To understand and follow professional ethics.
9. Individual and team work: To function effectively as an individual and as a member or leader in diverse teams and interdisciplinary settings.
10. Communication: To demonstrate effective communication at various levels.
11. Project management and finance: To apply the knowledge of Computer Engineering to the development of projects, and to their finance and management.
12. Life-long learning: To keep in touch with current technologies and inculcate the practice of lifelong learning.
Program Specific Outcomes (PSO):
Computer Engineering students are able to:

PSO 1: Graduates of the programme have the ability to represent the fundamentals and functioning of the hardware and software perspectives related to automatic data processing systems.

PSO 2: Graduates of the programme should be able to use proficient engineering praxis and strategies for the build-out, maintenance and testing of software solutions.

PSO 3: Graduates of the programme should be able to provide conclusive and cost-effective real-time solutions using savoir faire in the IT domain.

Mapping of Course Outcomes (COs) with Program Outcome (POs) and Program
Specific Outcome (PSOs)
1: Slight (Low) 2: Moderate (Medium) 3: Substantial (High)
If there is no correlation, put “-“
CO/PO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1 1 - - 1 - - - - - - - - 2 1 1
CO2 2 1 2 1 - - - - - - - - 1 2
CO3 1 1 - 2 - - - - - - - - 1 2 1
CO4 - 1 1 - - - - - - - - - 2 1 2
AIR-
course 1 1 1 1 - - - - - - - - 1 1 1
average
Justification of CO-PO Mapping:

JUSTIFICATION OF CO-PO MATCHING

CO1 WITH PO1: According to CO1, students get basic knowledge of AI and its applications. So it is slightly correlated to PO1.

CO2 WITH PO2: According to CO2, students get basic knowledge of designing smart systems using different informed/uninformed search or heuristic approaches. So it is moderately correlated to PO2.

CO3 WITH PO3: According to CO3, students learn to identify and represent the knowledge associated with a problem. So it is moderately correlated to PO3.

CO4 WITH PO4: According to CO4, students get knowledge of the algorithms used to solve AI problems. So it is slightly correlated to PO4.

Course Assessment and Evaluation:

What When (Frequency Max Evidence Contributing


in the course) Marks Collected to Course
Outcomes
Direct assessment methods

Class Tests Two times 20 Answer 1,2,3


CIE*

Assignment 03(After completion 20 Record book 2,3


of Two units) each
Oral test Once 20 Rubric for the 5
oral
Pre-University Examination (three-hour duration) | End of course | TE & BE: 70 marks; FE & SE: 50 marks | Answer books | COs 1,2,3,4,5
SEE~ University Examination | End of course | TE & BE (two-and-a-half-hour duration): 70 marks, answering 5 out of 10 questions; FE & SE (two-hour duration): 50 marks, answering 4 out of 8 questions | Result declared by University | COs 1,2,3,4,5
Course Exit Survey | End of the course | -- | Questionnaire | COs 1,2,3,4,5; effectiveness of delivery of instructions and assessment methods

*Continuous Internal Evaluation ~Semester End Evaluation

Questions for CIE and SEE will be designed to evaluate the various educational components
(Blooms taxonomy) such as:

Remembering and understanding the course contents (weightage: 60%)


Developing and debugging the programming skills and applying the knowledge acquired
from the course (weightage: 40%)

Course Exit Feedback Analysis: The exit survey was carried out online with the “Survey Monkey” software package. The printouts of survey details are attached.

Total number of students who gave the survey:


Course Outcome Weighted Average (on the scale of 1 to 3)
CO1
CO2
CO3
CO4
CO5

Attainment of Course Outcomes by Direct and Indirect Methods:


EE: Exceeds Expectations (attainment more than 5% above goal), ME: Meets Expectations (attainment within ±5% of goal), BE: Below Expectations (attainment more than 5% below goal), NA: Cannot Rate
Assessment Tools:
Direct methods: Class Tests 1 & 2 and Assignments, Lab Work, Oral, Pre-University Exam, Presentations 1-4.
Indirect method: Course Exit Survey.

Course Outcomes assessed:
CO1: Memorize the basic concepts of the dynamic behaviour of simple processes.
CO2: Understand feedback and feed-forward controllers and their mechanisms.
CO3: Develop and design single-loop feedback control systems.
CO4: Describe/understand and compare stability analysis of feedback systems and frequency response analysis.
CO5: Improve written, oral, and presentation communication skills related to stability analysis, advanced control systems and plant-wide control.

*Goal: Assume 30% of the students score more than 60%, 60% of the students score between 40% and 60%, and 10% of the students score less than 40% of marks. Thus,

Goal = ((30 x 5) + (60 x 3) + (10 x 1)) / 5 = 68%

**Average Attainment (%) = (0.8 x Direct Method Average %) + (0.2 x Indirect Method Average %)

(Note: For the calculation of CO attainment level, 80% weightage is given to the direct
assessment method and 20% weightage is given to the indirect assessment method.)

Remarks:
PO Attainment of AIR :

Course Outcomes vs Program Outcomes (PO1-PO12):
CO1: 1
CO2: 2
CO3: 2
CO4: 2
PO attainment for the AIR course: 1.75

B. E. Computer Engineering
“ARTIFICIAL INTELLIGENCE AND ROBOTICS -[410242] ”
Final Year, Semester VII
Prepared by the Course Coordinator: Mr. Devidas S. Thosar
Unit wise Question Bank

Unit – 1: Introduction

Q.1. Define automation and mention the benefits of industrial automation.
Q.2. Compare and contrast different types of uninformed and heuristic search strategies.
Q.3. Explain different types of sensors and their implications for robot design.
Q.4. Explain in brief the historical development of artificial intelligence.
Q.5. Explain robotics in brief and state its applications.
Q.6. Sketch and explain the functions of the basic building blocks of automation.
Q.7. Explain the A* algorithm with an example.

Unit – 2: Problem Decomposition and Planning

Q.1 Explain Backtracking and Look-ahead Strategies.
Q.2 Explain the Unified Framework for Planning.
Q.3 Explain Forward and Backward State Space Planning.
Q.4 Write a short note on the N-Queens problem.
Q.5 Explain the role of STRIPS in planning problems.

Unit – 3: Logic and Reasoning

Q.1 Explain Knowledge Based Reasoning.


Q.2 Explain unification algorithm.
Q.3 Explain with an example backward chaining.
Q.4 Explain with an example semantic network.
Q.5 Explain in brief the building blocks of conceptual dependency.

Unit – 4: Natural Language Processing and ANN

Q.1 Explain Natural Language Processing with its applications.
Q.2 What is meant by Information Retrieval? Explain the process of IR.
Q.3 Explain the concept of Artificial Neural Networks.
Q.4 Explain the role of Big Data Information Retrieval.
Q.5 Explain the stages in natural language processing.
Unit – 5: Robotics

Q.1 Explain Robotics.


Q.2 Explain Mobile Robot Hardware, Non Visual Sensors.
Q.3 Explain Robot System Control.
Q.4 Short Note on:
1. Contact Sensors
2. Inertial Sensors
3. Infrared Sensors
Q.5 Explain Sonar, Radar, Laser Rangefinders and Biological Sensing.

Unit – 6: Robots in Practice

Q.1 Explain Robot Pose Maintenance and Localization.


Q.2 Explain Global Localization.
Q.3 Explain Sensorial Maps, Topological Maps, Geometric Maps.
Q.4 Short Note on Space Robotics.
Q.5 Explain real time applications of robots.
B. E. Computer Engineering
“ARTIFICIAL INTELLIGENCE AND ROBOTICS -[410242] ”
Final Year, Semester VII
Prepared by the Course Coordinator: Mr. Devidas S. Thosar
Assignment Questions
Assignment No. 1
(On Unit –1&2)
Q.1. Define automation and mention the benefits of industrial automation.
Q.2. Draw and explain the functions of the basic building blocks of automation.
Q.3. Explain Best First Search in detail.
Q.4. Explain Backtracking and Look-ahead Strategies.
Q.5. Explain Forward and Backward State Space Planning.
Q.6. Explain the role of STRIPS in planning problems.

Assignments No. 2
(On Unit –3 & 4)
Q.1. Explain in brief the building blocks of conceptual dependency.
Q.2. Explain backward chaining with a suitable example.
Q.3. How does Deductive Retrieval work? Explain with a suitable example.
Q.4. What is meant by Information Retrieval? Explain the process of IR.
Q.5. Explain the role of Big Data Information Retrieval.
Q.6. Explain the stages in natural language processing.

Assignments No. 3
(On Unit –5 & 6)
Q.1. Explain Mobile Robot Hardware and Non-Visual Sensors.
Q.2. Write a short note on:
1. Contact Sensors
2. Inertial Sensors
Q.3. Explain Sonar, Radar, Laser Rangefinders and Biological Sensing.
Q.4. Explain Sensorial Maps, Topological Maps, and Geometric Maps.
Q.5. Explain Robot Pose Maintenance and Localization.
Q.6. Explain the concept of Mining Automation in Robotics in detail.
University Question Papers
Subject
Data Analytics (410243)
B. E. (Odd Semester), Session 2019-2020
Scheme, Syllabus and Evaluation Guidelines of “Data Analytics (410243)”
SEMESTER – I

Teaching Scheme:
Course Code: 410243 | Course Name: Data Analytics | Lecture: 03 | Tutorial: -- | Practical: --

Examination Scheme (410243 Data Analytics):
Theory, internal: Class Test 1: 20 | Class Test 2: 20 | Prelim: 70 | Test Average: 20 | Attendance: 5 | Teacher Assessment: 5
Theory, university: InSem: 30 | EndSem: 70 | Total: 100
Practical: Internal: -- | External: -- | Total: --
410243: DATA ANALYTICS

Course Contents
UNIT – I INTRODUCTION AND LIFE CYCLE 08 Hours
Introduction: Big data overview, state of the practice in Analytics- BI Vs Data Science, Current
Analytical Architecture, drivers of Big Data, Emerging Big Data Ecosystem and new approach.
Data Analytic Life Cycle: Overview, phase 1- Discovery, Phase 2- Data preparation, Phase 3-
Model Planning, Phase 4- Model Building, Phase 5- Communicate
Results, Phase 6- Operationalize. Case Study: GINA

UNIT –II BASIC DATA ANALYTICS METHODS 08 Hours


Statistical Methods for Evaluation: hypothesis testing, difference of means, Wilcoxon rank-sum test, type I and type II errors, power and sample size, ANOVA. Advanced Analytical Theory and Methods: Clustering: overview; k-means: use cases, overview of methods, determining the number of clusters, diagnostics, reasons to choose and cautions.

UNIT – III ASSOCIATION RULES AND REGRESSION 08 Hours


Advanced Analytical Theory and Methods: Association Rules- Overview, a-priori
algorithm,evaluation of candidate rules, case study-transactions in grocery store, validation and
testing, diagnostics. Regression- linear, logistics, reasons to choose and cautions, additional
regression models.

UNIT IV CLASSIFICATION 08 Hours


Decision trees- Overview, general algorithm, decision tree algorithm, evaluating a decision tree.
Naïve Bayes – Bayes‟ Algorithm, Naïve Bayes‟ Classifier, smoothing, diagnostics. Diagnostics of
classifiers, additional classification methods.

UNIT – V BIG DATA VISUALIZATION 08 Hours


Introduction to Data visualization, Challenges to Big data visualization, Conventional data
visualization tools, Techniques for visual data representations, Types of data visualization,
Visualizing Big Data, Tools used in data visualization, Analytical techniques used in Big data
visualization.

UNIT – VI Advanced Analytics-Technology and Tools 08 Hours


Analytics for unstructured data- Use cases, Map Reduce, Apache Hadoop. The Hadoop Ecosystem-
Pig, HIVE, HBase, Mahout, NoSQL. An Analytics Project-Communicating, operationalizing,
creating final deliverables.

Text Books:
1. David Dietrich, Barry Hiller, “Data Science and Big Data Analytics”, EMC education
services, Wiley publications, 2012, ISBN0-07-120413-X
2. Ashutosh Nandeshwar , “Tableau Data Visualization Codebook”, Packt Publishing, ISBN
978-1-84968-978-6

Reference Books:
1. Maheshwari, Anil,Rakshit, Acharya, “Data Analytics”,McGraw Hill, ISBN:
789353160258.
2. Mark Gardner, “Beginning R: The Statistical Programming Language”, Wrox
Publication,ISBN: 978-1-118-16430-3
3. Luís Torgo, “Data Mining with R, Learning with Case Studies”, CRC Press, Talay and
Francis Group, ISBN9781482234893
4. Carlo Vercellis, “Business Intelligence - Data Mining and Optimization for Decision
Making”, Wiley Publications, ISBN: 9780470753866.
Evaluation Guidelines:
Internal Assessment (IA) : [CT (20Marks)+TA/AT(10 Marks)]
Class Test (CT) [20 marks]:- Three class tests, 20 marks each, will be conducted in a semester
and out of these three, the average of best two will be selected for calculation of class test marks.
Format of question paper is same as university.
Teacher Assessment TA [5 marks]: Three/four assignments will be conducted in the semester.
Teacher assessment will be calculated on the basis of performance in assignments, class test and
pre-university test
Attendance (AT) [5 marks]: Attendance marks will be given as per university policy.
Paper pattern and marks distribution for Class tests:
1. Question Paper will have 5 questions. Question 1 is objective question contain 5 sub
questions each carry 1 marks.
2. Attempt any 3 questions from remaining 4 question each carry 5 marks.
In semester Exam :
30 Marks in semester exam : As per university guidelines.
Pre-University Test [ 70 Marks]
Paper pattern and marks distribution for PUT: Same as End semester exam
End Semester Examination [ 70 Marks]:
Paper pattern and marks distribution for End Semester Exam: As per university guidelines.
Lecture Plan
Data Analytics (410243)

Lect. No. | Name of Topic
Unit – I Introduction and Life Cycle (08 Hrs)
1
Introduction: Big data overview, state of the practice
2 Analytics- BI Vs Data Science
3 Current Analytical Architecture
4 Drivers of Big Data
5 Emerging Big Data Ecosystem and new approach.
6 Data Analytic Life Cycle: Overview
phase 1- Discovery, Phase 2- Data preparation, Phase
7
3- Model Planning
Phase 4- Model Building, Phase 5- Communicate
8
Results, Phase 6- Operationalize.
Unit – II Basic Data Analytic Methods (8 Hours)
9 Statistical Methods for Evaluation- Hypothesis testing,

10 difference of means
11 wilcoxon rank–sum test,
12 power and sample size
13 ANOVA
14 Advanced Analytical Theory and Methods:
15 K means- Use cases, Overview of methods
determining number of clusters, diagnostics,reasons to choose
16 and cautions.
Assignment-I
Unit – III Association Rules and Regression(8 Hours)
17 Advanced Analytical Theory and Methods:
Association Rules-
18 Overview a-priori algorithm
19 evaluation of candidate rules
20 case study-transactions in grocery store
21 validation and testing,
22 Regression- linear, logistics, reasons to choose and cautions
23 Regression- linear, logistics, reasons to choose and cautions
24 Additional regression models.
UNIT – IV Classification (8 Hours)
25
Decision trees- Overview
26 general algorithm, decision tree algorithm,
27 evaluating a decision tree
28 Naïve Bayes – Bayes‟ Algorithm
29 Naïve Bayes‟ Classifier
30 Smoothing, diagnostics. Diagnostics of classifiers
31 Additional classification methods.
Revision
32
Assignment-II
UNIT V Big Data Visualization (8 Hours)
33
Introduction to Data visualization .
34 Challenges to Big data visualization,
35 Conventional data visualization tools
36 Techniques for visual data representations
37 Types of data visualization
38 Visualizing Big Data
39 Tools used in data visualization
40 Analytical techniques used in Big data visualization.
UNIT – VI Advanced Analytics-Technology and Tools (8
41 Hours)
Analytics for unstructured data- Use cases
42 Map Reduce, Apache Hadoop
43 The Hadoop Ecosystem- Pig
44 HIVE, HBase,
45 Mahout, NoSQL
46 An Analytics Project-Communicating
47 Operationalizing, creating final deliverables.
Revision
48
Assignment-III
Course Delivery, Objectives, Outcomes
DATA ANALYTICS (410243)
Semester-VII

Course Objectives :

1. To develop problem solving abilities using Mathematics

2. To apply algorithmic strategies while solving problems

3. To develop time and space efficient algorithms


4. To study algorithmic examples in distributed, concurrent and parallel environments

Course Outcomes :

On completion of the course, student will be able to–

CO1-Write case studies in Business Analytic and Intelligence using mathematical models
CO2- Present a survey on applications for Business Analytic and Intelligence
CO3-Provide problem solutions for multi-core or distributed, concurrent/Parallel
environments

CO-PO Mapping
Course
Outcomes PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12

CO1 1 1

CO2 2 2

CO3 3 3
Justification of CO - PO Mapping
CO1 WITH PO1: According to CO1, students get basic knowledge of Business Analytics. So it is moderately related to PO1.

CO1 WITH PO3: According to CO1, students will be able to analyze a problem and find a solution for it. So it is moderately correlated to PO3.

CO2 WITH PO9: According to CO2, students analyze problems, relate them to business analytics, and work on them in teams. So it is moderately correlated to PO9.

CO2 WITH PO11: Students will get knowledge of real-time projects used in BI. So it is moderately correlated to PO11.

CO3 WITH PO1: According to CO3, students will understand problem solutions for multi-core or distributed, concurrent/parallel environments. So it is moderately related to PO1.

CO3 WITH PO6: According to CO3, students will work in a team and understand the environment for projects.
QUESTION BANK
Data Analytics (410243)
UNIT 1: Introduction and Life Cycle
Q.1 Explain big data with examples.
Q.2 What are the differences between BI and Data Science?
Q.3 Why there is need of framing problem in data analysis lifecycle?
Q.4 Which different activities need to perform during discovery phase?
Q.5 What are the different Data repositories generally used by Analyst?
Q.6 Explain typical analytical architecture.
Q.7 Which are different tools used in Data Preparation Phase?

UNIT –II Basic Data Analytic Methods


Q.1 Explain hypothesis testing in brief.
Q.2 Write short note on difference of means
Q.3 What is student t-test?
Q.4 Explain type-I and type-II Errors.
Q.5 What is ANOVA? Explain with an example.
Q.6 Define k-means. Write applications of k-means.

UNIT III: Association Rules and Regression


Q.1 Explain Association rules.
Q.2 Explain in detail Apriori Algorithm with example.
Q.3 Write note on Validation and Testing in Data Analytics.
Q.4 Explain Regression analysis in detail.
Q.5 Explain Linear Regression.
Q.6 What is the reason to choose a particular regression and cautions?

UNIT IV: Classification


Q.1 Write a short note on the decision tree algorithm.
Q.2 Explain the Naive Bayes classifier.
Q.3 Explain the smoothing process.
Q.4 Write a note on diagnostics of classifiers.
Q.5 Explain the ID3 algorithm with pseudo code.
Q.6 Explain Bayes' theorem.
Q.7 Write short notes on bagging, boosting, random forests, and support vector machines.
Q.8 Explain conditional probability and posterior probability.
UNIT V- Big Data Visualization
Q.1 What are the challenges and their possible solutions in big data visualization?
Q.2 Explain types of data visualization.
Q.3 What are the tools used in data visualization?
Q.4 What are the advanced analytical methods in big data visualization?

Unit VI Advanced Analytics- Technology and Tools


Q.1 Write a note on analytics for unstructured data.
Q.2 Explain the different components of Hadoop.
Q.3 Write a note on Mahout.
Q.4 Explain the final deliverables in an analytics project.
Q.5 What are the different key outputs from a successful analytics project?
Q.6 Enlist and explain the components of the final presentations for the project sponsor and an analyst audience.
Q.7 What are the four major categories of NoSQL tools?
Q.8 Explain HDFS.
Q.9 Explain the MapReduce paradigm with an example.
Assignment 01
Q.1 [Max. Marks: 4 | Unit 1 | CO 1 | BL 1] Give the difference between BI and Data Science.
Ans:
Aspect | BI | Data Science
Perspective | Looking backwards | Looking forwards
Action | Slice and dice | Interact
Expertise | Business users | Data scientists
Data | Warehouse (structured) | Distributed, real time (structured/unstructured)
Scope | Unlimited | Specific business question
Questions | What happened? | What will happen? What if?
Output | Tables, reports, ad hoc views | Answers, statistical models
Applicability | Historic, possible confounding factors | Future, correcting for influences
Tools | SAP, SAS, Cognos | Tableau, Revolution R Enterprise
Automation | High | Low
Business driver | Decision support | Planning
Business value | Trend identification | Hypothesis testing

Q.2 [Max. Marks: 4 | Unit 2 | CO 1 | BL 2] Write a short note on the Wilcoxon rank-sum test.
Ans:
The Wilcoxon rank-sum test is a non-parametric statistical hypothesis test used to determine whether two independent samples were selected from populations having the same distribution. It can be used as an alternative to the two-sample Student's t-test when the populations cannot be assumed to be normally distributed. (Its paired counterpart, the Wilcoxon signed-rank test, compares two related samples, matched samples, or repeated measurements on a single sample.)
Assumptions
 The two samples are drawn randomly and independently from their respective populations.
 The data are measured on at least an ordinal scale.
Let the two populations again be pop1 and pop2, with independent random samples of size n1 and n2 respectively. The total number of observations is then N = n1 + n2. The first step of the Wilcoxon test is to rank the set of observations from the two groups as if they came from one large group. The smallest observation receives a rank of 1, the second smallest observation receives a rank of 2, and so on, with the largest observation being assigned the rank of N. Ties among the observations receive a rank equal to the average of the ranks they span. The test uses ranks instead of numerical outcomes to avoid specific assumptions about the shape of the distribution.
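A minimal sketch of the test in practice, assuming the scipy and numpy packages are available (the sample data is invented):

import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)
pop1 = rng.normal(loc=10.0, scale=2.0, size=30)  # sample from pop1
pop2 = rng.normal(loc=11.5, scale=2.0, size=35)  # sample from pop2

stat, p_value = ranksums(pop1, pop2)             # Wilcoxon rank-sum test
print(f"statistic = {stat:.3f}, p-value = {p_value:.4f}")
# A small p-value (e.g. < 0.05) suggests the two populations differ in
# location, without assuming normality.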

Q.3 [Max. Marks: 2 | Unit 1 | CO 1 | BL 1] Why is there a need for framing the problem in the data analytics lifecycle?
Ans:
 Framing is the process of stating the analytics problem to be solved.
 Each team member may hear slightly different things related to the needs and the problem and have somewhat different ideas of possible solutions. For these reasons, it is crucial to state the analytics problem clearly.
 It is important to identify the main objectives of the project, what needs to be achieved in business terms, and what needs to be done to meet those needs.
 It is best practice to share the statement of goals and success criteria with the team and confirm alignment with the project sponsor's expectations.
 Establishing criteria for both success and failure helps the participants avoid unproductive effort and remain aligned with the project sponsors.

Q.4 [Max. Marks: 5 | Unit 2 | CO 1 | BL 1] What are Type I and Type II errors?
Ans:
A hypothesis test may result in two types of errors, depending on whether the test accepts or rejects the null hypothesis. These two errors are known as type I and type II errors.
• A type I error is the rejection of the null hypothesis when the null hypothesis is TRUE. The probability of a type I error is denoted by the Greek letter alpha (α).
• A type II error is the acceptance of the null hypothesis when the null hypothesis is FALSE. The probability of a type II error is denoted by the Greek letter beta (β).
The significance level, as mentioned in the Student's t-test discussion, is equivalent to the type I error. For a significance level such as α = 0.05, if the null hypothesis (μ1 = μ2) is TRUE, there is a 5% chance that the observed T value based on the sample data will be large enough to reject the null hypothesis. By selecting an appropriate significance level, the probability of committing a type I error can be defined before any data is collected or analyzed.
The probability of committing a type II error is somewhat more difficult to determine. If the two population means are truly not equal, the probability of committing a type II error depends on how far apart the means truly are. To reduce the probability of a type II error to a reasonable level, it is often necessary to increase the sample size.
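A minimal sketch, assuming scipy and numpy, that shows the type I error rate empirically: when the null hypothesis really is true, a test at significance level α = 0.05 should reject it in roughly 5% of repetitions.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
alpha, rejections, trials = 0.05, 0, 2000
for _ in range(trials):
    a = rng.normal(loc=5.0, scale=1.0, size=40)
    b = rng.normal(loc=5.0, scale=1.0, size=40)  # same true mean as a
    if ttest_ind(a, b).pvalue < alpha:           # falsely rejecting H0
        rejections += 1
print(f"observed type I error rate ~ {rejections / trials:.3f}")  # near 0.05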

Q.5 [Max. Marks: 4 | Unit 2 | CO 1 | BL 2] Explain any two applications of the k-means algorithm.

Answer:
Applications of k-means include image processing, medicine, customer segmentation, etc.
Image processing: Video is one example of the growing volumes of unstructured data being collected. Within each frame of a video, k-means analysis can be used to identify objects in the video. For each frame, the task is to determine which pixels are most similar to each other. The attributes of each pixel can include brightness, color, and location (the x and y coordinates in the frame). With security video images, for example, successive frames are examined to identify any changes to the clusters. These newly identified clusters may indicate unauthorized access to a facility.
Medicine: Patient attributes such as age, height, weight, systolic and diastolic blood pressures, cholesterol level, and other attributes can identify naturally occurring clusters. These clusters could be used to target individuals for specific preventive measures or clinical trial participation. Clustering, in general, is useful in biology for the classification of plants and animals as well as in the field of human genetics.
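A minimal sketch of the medical use case above, assuming the scikit-learn package is available (the patient records are invented):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy patient records: (age, weight in kg, systolic blood pressure).
X = np.array([[25, 60, 118], [30, 65, 120], [28, 62, 119],
              [62, 85, 150], [66, 90, 155], [64, 88, 152]])
X_scaled = StandardScaler().fit_transform(X)  # rescale so no attribute dominates
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
print(km.labels_)            # cluster assignment for each patient
print(km.cluster_centers_)   # centroids in the scaled attribute space

Note the rescaling step: it answers one of the practitioner decisions listed in the next question (units of measure and disproportionate attributes).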

Q.6 [Max. Marks: 2 | Unit 2 | CO 1 | BL 1] Enlist the decisions a practitioner must make while using the k-means algorithm.
Ans:
 What object attributes should be included in the analysis?
 What unit of measure (for example, miles or kilometers) should be used for each
attribute?
 Do the attributes need to be re-scaled so that one attribute does not have a
disproportionate effect on the results?
 What other considerations might apply?

Assignment 02
Q.1 [Max. Marks: 4 | Unit 3 | CO 1 | BL 1] Explain different approaches to improve Apriori's efficiency.
Ans: Some approaches to improve Apriori's efficiency:
• Partitioning: Any itemset that is potentially frequent in a transaction database must be frequent in at least one of the partitions of the transaction database.
• Sampling: This extracts a subset of the data with a lower support threshold and uses the subset to perform association rule mining.
• Transaction reduction: A transaction that does not contain any frequent k-itemsets is useless in subsequent scans and therefore can be ignored.
• Hash-based itemset counting: If the corresponding hashing bucket count of a k-itemset is below a certain threshold, the k-itemset cannot be frequent.
• Dynamic itemset counting: Only add new candidate itemsets when all of their subsets are estimated to be frequent.

Q.2 [Max. Marks: 4 | Unit 3 | CO 1 | BL 1] Explain linear regression.
Ans:
Linear regression is an analytical technique used to model the relationship between several input variables and a continuous outcome variable. A key assumption is that the relationship between an input variable and the outcome variable is linear.
The physical sciences have well-known linear models, such as Ohm's Law, which states that the electrical current flowing through a resistive circuit is linearly proportional to the voltage applied to the circuit.
A linear regression model is a probabilistic one that accounts for the randomness that can affect any particular outcome. Based on known input values, a linear regression model provides the expected value of the outcome variable, but some uncertainty may remain in predicting any particular outcome.
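A minimal sketch, using only numpy, of fitting the simplest such model, y = b0 + b1·x, by least squares (the data points are invented):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])           # input variable
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])           # noisy outcome, roughly 2x
A = np.column_stack([np.ones_like(x), x])         # design matrix [1, x]
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
print("expected outcome at x = 6:", b0 + b1 * 6)  # point prediction only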

Q.3 [Max. Marks: 2 | Unit 3 | CO 1 | BL 1] Define 1) support count 2) Apriori property.
Ans:
1. Support count: The support, or occurrence frequency, of an itemset is the number of transactions that contain the itemset.
2. Apriori property: All nonempty subsets of a frequent itemset must also be frequent.
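A minimal sketch in plain Python showing both definitions in action: computing support counts and using the Apriori property to prune 2-itemset candidates (the transactions are invented):

from itertools import combinations

transactions = [{"bread", "milk"}, {"bread", "butter", "milk"},
                {"bread", "butter"}, {"milk", "butter"},
                {"bread", "milk", "butter"}]
min_support = 3  # minimum support count

def support(itemset):
    # Support count = number of transactions containing the itemset.
    return sum(itemset <= t for t in transactions)

items = {i for t in transactions for i in t}
frequent_1 = {frozenset([i]) for i in items if support({i}) >= min_support}

# Apriori property: a 2-itemset can only be frequent if both of its
# 1-item subsets are frequent, so candidates are built from frequent_1 only.
candidates_2 = {a | b for a, b in combinations(frequent_1, 2)}
frequent_2 = {c for c in candidates_2 if support(c) >= min_support}
print(frequent_1)
print(frequent_2)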

Q.4 [Max. Marks: 4 | Unit 4 | CO 1 | BL 1] Define entropy and information gain.
Ans:
Entropy measures the impurity of an attribute. Simply put, it is a measure of the randomness in the information being processed. The higher the entropy, the harder it is to draw any conclusions from that information.

Information gain measures the purity of an attribute. Simply put, it is the amount of information gained by knowing the value of the attribute: the entropy of the distribution before the split minus the entropy of the distribution after it. The largest information gain is equivalent to the smallest entropy.
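A minimal sketch showing the computation for a toy binary split (the class counts are invented):

import math

def entropy(pos, neg):
    # Shannon entropy of a node with `pos` positive and `neg` negative examples.
    total = pos + neg
    e = 0.0
    for c in (pos, neg):
        if c:                      # 0 * log2(0) is taken as 0
            p = c / total
            e -= p * math.log2(p)
    return e

parent = entropy(9, 5)                        # parent node: 9 positive, 5 negative
left, right = entropy(6, 2), entropy(3, 3)    # children after a candidate split
weighted = (8 / 14) * left + (6 / 14) * right # size-weighted child entropy
gain = parent - weighted                      # information gain of the split
print(f"entropy(parent) = {parent:.3f}, information gain = {gain:.3f}")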

Q.5 [Max. Marks: 4 | Unit 4 | CO 1 | BL 1] Explain any two decision tree algorithms in short.

Ans:
1: ID3
The ID3 algorithm begins with the original set as the root node. On each iteration, the algorithm iterates through every unused attribute of the set and calculates the entropy (or information gain) of that attribute. It then selects the attribute which has the smallest entropy (or largest information gain) value. The set is then split, or partitioned, by the selected attribute to produce subsets of the data.
Recursion on a subset may stop in one of these cases:
 every element in the subset belongs to the same class;
 there are no more attributes to be selected, but the examples still do not belong to the same class;
 there are no examples in the subset.

2: C4.5
The C4.5 algorithm improves on the ID3 algorithm. C4.5 can handle missing data: if the training records contain unknown attribute values, C4.5 evaluates the gain for an attribute by considering only the records where the attribute is defined. Both categorical and continuous attributes are supported by C4.5. Values of a continuous variable are sorted and partitioned; for the corresponding records of each partition, the gain is calculated, and the partition that maximizes the gain is chosen for the next split.

Q.6 [Max. Marks: 2 | Unit 4 | CO 1 | BL 1] Explain Bayes' Theorem.
Ans: Bayes' Theorem
Bayes' Theorem is named after Thomas Bayes. There are two types of probabilities:
 Posterior probability, P(H|X)
 Prior probability, P(H)
where X is a data tuple and H is some hypothesis. According to Bayes' Theorem,
P(H|X) = P(X|H) * P(H) / P(X)
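A minimal sketch applying the theorem numerically; the spam-filtering numbers below are invented for illustration:

# Assume 1% of emails are spam (prior); the word "offer" appears in 60%
# of spam and in 5% of non-spam.
p_h = 0.01                       # P(H): prior probability of spam
p_x_given_h = 0.60               # P(X|H)
p_x_given_not_h = 0.05           # P(X|not H)
p_x = p_x_given_h * p_h + p_x_given_not_h * (1 - p_h)  # total probability P(X)
posterior = p_x_given_h * p_h / p_x                    # P(H|X) by Bayes' Theorem
print(f"P(spam | 'offer') = {posterior:.3f}")          # about 0.108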

Assignment 03
Q.1 [Max. Marks: 4 | Unit 5 | CO 1 | BL 1] What are the challenges in Big Data visualization?

Answer:
Problems for big data visualization:
• Visual noise: Most of the objects in the data set are too similar to each other; users cannot divide them into separate objects on the screen.
• Information loss: Reduction of the visible data set can be used, but it leads to information loss.
• Large image perception: Data visualization methods are limited not only by the aspect ratio and resolution of the device, but also by the limits of physical perception.
• High rate of image change: Users observing the data cannot react to the number or intensity of data changes on the display.
• High performance requirements: This is hardly noticed in static visualization because of its lower speed requirements, but dynamic visualization imposes high performance requirements.

Q.2 [Max. Marks: 4 | Unit 5 | CO 1 | BL 1] Explain the tools used in data visualization.

Answer:
Common tools used in data visualization are:
1. R (base package, lattice, ggplot2) 2. Tableau 3. DataHero 4. Chart.js 5. Dygraphs
Tableau is a data visualization tool that is widely used for Business Intelligence but is not limited to it. It helps create interactive graphs and charts in the form of dashboards and worksheets to gain business insights. Visualization in Tableau is done by dragging and dropping Measures and Dimensions onto different shelves:
Rows and Columns: represent the x and y axes of your graphs/charts.
Filter: filters help you view a strained version of your data. For example, instead of seeing the combined Sales of all the Categories, you can look at a specific one, such as just Furniture.
Pages: pages work on the same principle as filters, with the difference that you can actually see the changes as you shift between the paged values. Remember the Rosling chart? You can easily make one of your own using Pages.
Marks: the Marks property is used to control the mark types of your data. You may choose to represent your data using different shapes, sizes or text.

Q.3 [Max. Marks: 2 | Unit 5 | CO 1 | BL 1] Enlist possible solutions to the challenges of big data visualization.

1. Meeting the need for speed: One possible solution is hardware. Increased memory and
powerful parallel processing can be used.
2. Understanding the data: One solution is to have the proper domain expertise in place.
3. Addressing data quality: It is necessary to ensure the data is clean through the process of data
governance or information management.
4. Displaying meaningful results: One way is to cluster data into a higher-level view where
smaller groups of data are visible and the data can be effectively visualized.

Q.4 [Max. Marks: 4 | Unit 6 | CO 1 | BL 1] Enlist and explain the four main deliverables from a successful analytics project.
Ans:
• Presentation for project sponsors: contains high-level takeaways for executive-level stakeholders, with a few key messages to aid their decision-making process. Focus on clean, easy visuals for the presenter to explain and for the viewer to grasp.
• Presentation for analysts: describes changes to business processes and reports. Data scientists reading this presentation are comfortable with technical graphs (such as Receiver Operating Characteristic (ROC) curves, density plots, and histograms) and will be interested in the details.
• Code: for technical people, such as engineers and others managing the production environment.
• Technical specifications for implementing the code.

Q.5 [Max. Marks: 4 | Unit 6 | CO 1 | BL 1] Enlist use cases for MapReduce. Explain any one of them.
Ans:
1. IBM Watson
2. LinkedIn
3. Yahoo
To educate Watson, Hadoop was utilized to process various data sources such as encyclopedias, dictionaries, newswire feeds, literature, and the entire contents of Wikipedia [2]. For each clue provided during the game, Watson had to perform the following tasks in less than three seconds:
 Deconstruct the provided clue into words and phrases
 Establish the grammatical relationships between the words and the phrases
 Create a set of similar terms to use in Watson's search for a response
 Use Hadoop to coordinate the search for a response across terabytes of data
 Determine possible responses and assign their likelihood of being correct
 Actuate the buzzer
 Provide a syntactically correct response in English
Among other applications, Watson is being used in the medical profession to diagnose patients and provide treatment recommendations.
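A minimal sketch of the MapReduce paradigm itself, in plain Python without Hadoop, on a word-count problem: map emits (key, value) pairs, the shuffle groups them by key, and reduce aggregates each group. Hadoop distributes exactly these three steps across a cluster.

from collections import defaultdict

docs = ["big data needs big tools", "hadoop processes big data"]

def map_phase(doc):
    return [(word, 1) for word in doc.split()]   # emit (word, 1) pairs

groups = defaultdict(list)                       # shuffle: group pairs by key
for doc in docs:
    for key, value in map_phase(doc):
        groups[key].append(value)

counts = {key: sum(values) for key, values in groups.items()}  # reduce
print(counts)  # e.g. {'big': 3, 'data': 2, ...}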

Q.6 [Max. Marks: 2 | Unit 6 | CO 1 | BL 1] Enlist the four major categories of NoSQL tools.
Ans:
The four major categories of NoSQL tools are:
1. Key/value stores: contain data (the value) that can be simply accessed by a given identifier (the key).
2. Document stores: useful when the value of the key/value pair is a file and the file itself is self-describing.
3. Column family stores: useful for sparse datasets, i.e., records with thousands of columns where only a few columns have entries.
4. Graph databases: intended for use cases such as networks, where there are items (people or web page links) and relationships between these items.
Class test Question Papers
Sir Visvesvaraya Institute Of Technology, Nashik

Computer Engineering Department

CLASS TEST- I
(AY 2018-19)
Branch: Computer Engineering Department Date:
Semester: I Duration: 1 hour
Subject: Data Analytics (410243) Max. Marks: 20M

Note:
1. All Questions are compulsory
2. Bloom’s Taxonomy level: Bloom Levels (BL): 1. Remember 2. Understand 3. Apply 4. Create
3. All questions are as per course outcomes
4. Assume suitable data wherever required.

Q.1 [Max. Marks: 5 | Unit 1 | CO 1 | BL 2] What is Data Analytics?
Q.2 [Max. Marks: 5 | Unit 1 | CO 1 | BL 1] Give the difference between BI and Data Science.
Q.3 [Max. Marks: 5 | Unit 1 | CO 1 | BL 2] Explain the Model Planning phase of the Data Analytics Life Cycle.
Q.4 [Max. Marks: 5 | Unit 1 | CO 1 | BL 3] Give the new approach for the Big Data ecosystem.

Solution:
Q.1 What is Data Analytics?
Ans:
Data analytics refers to qualitative and quantitative techniques and processes used to enhance productivity and business gain. Data is extracted and categorized to identify and analyze behavioral data and patterns, and techniques vary according to organizational requirements. Data analytics is also known as data analysis.
Data analytics is primarily conducted in business-to-consumer (B2C) applications. Global organizations collect and analyze data associated with customers, business processes, market economics or practical experience. Data is categorized, stored and analyzed to study purchasing trends and patterns.
Evolving data facilitates thorough decision-making. For example, a social networking website collects data related to user preferences and community interests and segments it according to specified criteria such as demographics, age or gender. Proper analysis reveals key user and customer trends and facilitates the social network's alignment of content, layout and overall strategy.

Q.2 Give the difference between BI and Data Science.
Ans:
Aspect | BI | Data Science
Perspective | Looking backwards | Looking forwards
Action | Slice and dice | Interact
Expertise | Business users | Data scientists
Data | Warehouse (structured) | Distributed, real time (structured/unstructured)
Scope | Unlimited | Specific business question
Questions | What happened? | What will happen? What if?
Output | Tables, reports, ad hoc views | Answers, statistical models
Applicability | Historic, possible confounding factors | Future, correcting for influences
Tools | SAP, SAS, Cognos | Tableau, Revolution R Enterprise
Automation | High | Low
Business driver | Decision support | Planning
Business value | Trend identification | Hypothesis testing

Q 3. Explain the Model Planning phase from the Data Analytics Life Cycle.
Ans: The data science team identifies candidate models to apply to the data for clustering,
classifying, or finding relationships in the data depending on the goal of the project
Some of the activities to consider in this phase include the following:
• Assess the structure of the data sets. The structure of the data sets is one factor that dictates the
tools and analytical techniques for the next phase. Depending on whether the team plans to
analyze textual data or transactional data, for example, different tools and approaches are
required.
• Ensure that the analytical techniques enable the team to meet the business objectives and accept
or reject the working hypotheses.
• Determine if the situation warrants a single model or a series of techniques as part of a larger
analytic workflow.
In many cases, stakeholders and subject matter experts have instincts and hunches about what
the data science team should be considering and analyzing. Likely, this group had some
hypothesis that led to the genesis of the project. Often, stakeholders have a good grasp of the
problem and domain, although they may not be aware of the subtleties within the data or the
model needed to accept or reject a hypothesis.
Q.4 Give the new approach for Big Data Ecosystem.
Ans:
Organizations and data collectors are realizing that the data they can gather from
individuals contains intrinsic value and, as a result, a new economy is emerging. As this new
digital economy continues to evolve, the market sees the introduction of data vendors and data
cleaners that use crowdsourcing to test the outcomes of machine learning techniques.
1. Data devices and the "Sensornet" gather data from multiple locations and continuously generate
new data about this data. For each gigabyte of new data created, an additional petabyte of data is created
about that data.

2. Data collectors include sample entities that collect data from the device and users.

Data results from a cable TV provider tracking the shows a person watches, which TV channels someone
will and will not pay for to watch on demand, and the prices someone is willing to pay for premium TV
content

3. Data aggregators make sense of the data collected from the various entities from the "SensorNet" or
the "Internet of Things." These organizations compile data from the devices and usage patterns collected
by government agencies, retail stores, and websites. ln turn, they can choose to transform and package the
data as products to sell to list brokers, who may want to generate marketing lists of people who may be
good targets for specific ad campaigns.

4. Data users and buyers :These groups directly benefit from the data collected and aggregated by others
within the data value chain.

CLASS TEST- II (AY 2018-19)


Branch: Computer Engineering Date: -09-2018
Semester: VII Duration: 1 hour
Subject: Data Analytics(410243) Max. Marks: 20M

Note: 1. Attempt All Questions in Section A, 2. Attempt any 3 Questions in Section B


3. All questions are as per course outcomes4. Assume suitable data wherever is
required.

Question No. | Questions | Max. Marks | Unit no. as per syllabus | CO Mapped | Bloom's Taxonomy Level
Section A
01. What are the five V’s of Big Data? 01 05 CO2 1
A. Volume B. Velocity
C. Variety D. All the above
02. A significant difference between data visualization methods and traditional
text-based data methods is that _____ 01 05 CO1 2
a) Text-based data is more detailed and therefore more accurate than data
visualization presentations
b) Visualization methods are only necessary with complex data
c) Data visualization brings better understanding much quicker and easier than
text-based data
d) The volumes comprising the text-based data depict the complete representation
of the situation while the visuals in data visualization do not
03. What are the main components of Big Data? 01 05 CO1 1
A. MapReduce B. HDFS
C. YARN D. All of these
04. Decision Nodes are represented by 01 04 CO2 2
____________
A. Disks B. Squares
C. Circles D. Triangles
05. What are the different features of Big Data An- 01 05 CO3 1
alytics?
A. Open-Source B. Scalability
C. Data Recovery D. All the above
Section B
01. What is Naive Bayes? 05 04 CO1 2
02. What are the challenges and their possible 05 05 CO3 2
solutions in Big data visualization?
03. Explain tools used in data visualization. 05 05 CO3 2

04. Explain Decision Tree algorithm in detail. 05 04 CO3 3


Section A
Q 1 What are the five V’s of Big Data?
A. Volume B. Velocity
C. Variety D. All the above
Answer: D

Q2: A significant difference between data visualization methods and traditional text-based
data methods is that _____
A. Text-based data is more detailed and therefore more accurate than data visualization
presentations
B. visualization methods are only necessary with complex data
C. Data visualization brings better understanding much quicker and easier than text-based data
D. The volumes comprising the text-based data depict the complete representation of the
situation while the visuals in data visualization do not
Ans: C

Q3: What are the main components of Big Data?


A. MapReduce B. HDFS
C. YARN D. All of these
Ans: D

Q4:Decision Nodes are represented by ____________


A. Disks B. Squares
C. Circles D. Triangles
Ans: D

Q5: What are the different features of Big Data Analytics?


A. Open-Source B. Scalability
C. Data Recovery D. All the above
Ans: D

Section B:
Q1: What is Naive Bayes?
Ans: A Naive Bayes classifier is a probabilistic machine learning model that is used for
classification tasks. The crux of the classifier is based on the Bayes theorem.
Naive Bayes is a classification algorithm for binary (two-class) and multi-class classification
problems. The technique is easiest to understand when described using binary or categorical
input values.
It is called naive Bayes or idiot Bayes because the calculation of the probabilities for each
hypothesis is simplified to make it tractable. Rather than attempting to calculate the joint
probability of all attribute values P(d1, d2, d3 | h), the attributes are assumed to be conditionally
independent given the target value, so the probability is calculated as P(d1 | h) * P(d2 | h) and so on.
This is a very strong assumption that is most unlikely in real data, i.e. that the attributes do not
interact. Nevertheless, the approach performs surprisingly well on data where this assumption
does not hold.
Representation Used By Naive Bayes Models
The representation for naive Bayes is probabilities.
A list of probabilities is stored to file for a learned naive Bayes model. This includes:
Class Probabilities: The probabilities of each class in the training dataset.
Conditional Probabilities: The conditional probabilities of each input value given each class
value.
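As a minimal from-scratch sketch of how these two lists of probabilities can be estimated from a training set (the tiny weather data below is invented for illustration; this is not a library implementation):

from collections import Counter, defaultdict

# Tiny invented training set: (weather, temperature) -> play?
data = [("sunny", "hot", "no"), ("sunny", "mild", "yes"),
        ("rainy", "mild", "yes"), ("rainy", "hot", "no"),
        ("sunny", "mild", "yes")]

# Class probabilities P(c): fraction of each class label in the training data.
labels = [row[-1] for row in data]
class_prob = {c: n / len(data) for c, n in Counter(labels).items()}

# Conditional probabilities P(a_i | c): counts of each attribute value per class.
cond_counts = defaultdict(Counter)
for *attrs, c in data:
    for i, a in enumerate(attrs):
        cond_counts[(i, c)][a] += 1

def cond_prob(i, a, c):
    counts = cond_counts[(i, c)]
    return counts[a] / sum(counts.values())

# Classify by picking the class that maximizes P(c) * prod_i P(a_i | c).
def classify(attrs):
    def score(c):
        p = class_prob[c]
        for i, a in enumerate(attrs):
            p *= cond_prob(i, a, c)
        return p
    return max(class_prob, key=score)

print(classify(("sunny", "mild")))   # expected: "yes"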

Q2: What are the challenges and their possible solutions in Big data visualization?
Ans:
Problems in big data visualization:
• Visual noise: Most of the objects in the data set are too close to one another; users cannot
separate them as distinct objects on the screen.
• Information loss: Reduction of the visible data set can be used, but it leads to information loss.
• Large image perception: Data visualization methods are limited not only by the aspect ratio and
resolution of the device, but also by physical perception limits.
• High rate of image change: Users observe the data but cannot react to the rate or intensity of
data change on the display.
• High performance requirements: Dynamic visualization imposes high performance requirements
that are hardly noticed in static visualization, where visualization speed requirements are lower.
Perceptual and interactive scalability are also challenges of big data visualization. Visualizing
every data point can lead to over-plotting and may overwhelm users’ perceptual and cognitive
capacities; reducing the data through sampling or filtering can elide interesting structures or
outliers. Querying large data stores can result in high latency, disrupting fluent interaction.
Potential solutions to some challenges or problems about visualization and big data were
presented :
1. Meeting the need for speed: One possible solution is hardware. Increased memory and
powerful parallel processing can be used. Another method is putting data in-memory but using a
grid computing approach, where many machines are used.
2. Understanding the data: One solution is to have the proper domain expertise in place.
3. Addressing data quality: It is necessary to ensure the data is clean through the process of data
governance or information management.
4. Displaying meaningful results: One way is to cluster data into a higher-level view where
smaller groups of data are visible and the data can be effectively visualized.
5. Dealing with outliers: Possible solutions are to remove the outliers from the data or create a
separate chart for the outliers.
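As a hedged illustration of the sampling remedy mentioned above, and of why it can elide outliers, this plain-Python sketch thins a large series before plotting:

import random

# Simulate a large series, then reduce it by uniform sampling so a chart
# stays readable. Trade-off: sampling may drop the very outliers of interest.
random.seed(0)
big = [random.gauss(0, 1) for _ in range(1_000_000)]
sample = random.sample(big, 10_000)   # keep 1% of the points for plotting
print(len(big), "points reduced to", len(sample))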

Q3: Explain tools used in data visualization.


Ans: Common tools used in data visualization are:
1. R (Base package, lattice, ggplot2) 2.Tableau 3. DataHero 4. Chart.js 5. Dygraphs
Tableau is a data visualization tool that is widely used for Business Intelligence but is not
limited to it. It helps create interactive graphs and charts in the form of dashboards and
worksheets to gain business insights. Visualization in Tableau is achieved by dragging and
dropping Measures and Dimensions onto different Shelves:
Rows and Columns: Represent the x and y axes of your graphs/charts.
Filter: Filters help you view a filtered subset of your data. For example, instead of seeing the
combined Sales of all the Categories, you can look at a specific one, such as just Furniture.
Pages: Pages work on the same principle as Filters, with the difference that you can actually see
the changes as you shift between the paged values. Remember that Rosling chart? You can easily
make one of your own using Pages.
Marks: The Marks property is used to control the mark types of your data. You may choose to
represent your data using different shapes, sizes or text.
R supports four different graphics systems: base graphics, grid graphics, lattice graphics, and
ggplot2. Base graphics is the default graphics system in R, the easiest of the four systems to learn
to use, and provides a wide variety of useful tools, especially for exploratory graphics where we
wish to learn what is in an unfamiliar dataset.
Q4: Explain Decision Tree algorithm in detail.
Ans:
A decision tree (also called prediction tree) uses a tree structure to specify sequences of decisions
and consequences.
The prediction can be achieved by constructing a decision tree with test points and branches. At
each test point, a decision is made to pick a specific branch and traverse down the tree.
Due to its flexibility and easy visualization, decision trees are commonly deployed in data
mining applications for classification purposes.

The input values of a decision tree can be categorical or continuous


A decision tree employs a structure of test points (called nodes) and branches, which represent
the decision being made. A node without further branches is called a leaf node. The leaf nodes
return class labels and, in some implementations, they return the probability scores.
Decision trees have two varieties: classification trees and regression trees. Classification trees
usually apply to output variables that are categorical.
Regression trees can apply to output variables that are numeric or continuous, such as the
predicted price of a consumer good or the likelihood a subscription will be purchased.
The term branch refers to the outcome of a decision and is visualized as a line connecting two
nodes. If a decision is numerical, the "greater than" branch is usually placed on the right, and the
"less than" branch is placed on the left.
Internal nodes are the decision or test points. Each internal node refers to an input variable or an
attribute. The top internal node is called the root.
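A minimal sketch of building such a tree, assuming scikit-learn is available (the toy data and feature names are invented for illustration):

from sklearn.tree import DecisionTreeClassifier, export_text

# Invented toy data: [age, monthly_income]; label 1 = buys, 0 = does not buy.
X = [[25, 20000], [40, 60000], [35, 50000], [22, 18000], [50, 80000]]
y = [0, 1, 1, 0, 1]

# Fit a small classification tree; max_depth keeps the tree readable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Internal nodes test one input variable; leaf nodes return class labels.
print(export_text(tree, feature_names=["age", "income"]))
print(tree.predict([[30, 55000]]))   # traverse the tree for a new point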

Prelim Exam (A.Y. 2018-19)


Branch: BE Computer Date:
Semester: I Duration: 2:30 hour
Subject: Data Analytics (2015 Pattern) Max. Marks: 70
Note: (1) Answer Q. 1 or Q. 2, Q. 3 or Q. 4, Q. 5 or Q. 6, Q. 7 or Q. 8, Q. 9 or Q. 10.
(2) Figures to the right indicate full marks.
(3) Neat diagrams must be drawn wherever necessary.
(4) Assume suitable data, if necessary
Questions | Max. Marks | CO mapped | Bloom's Taxonomy Level
Q.1 a. Write short note on Big Data. 10 02 2
b. Explain Current Analytical Architecture with suitable diagram. 03 2
OR
Q.2 a. Explain hypothesis testing in brief. 10 01 2
b. Explain types of Regression Analysis. 03 3
Q.3 a. Write Diagnostics of Apriori algorithm. 10 03 3
b. Explain Data Analytical Life Cycle with all the six phases. 03 2
OR
Q.4 a. Write applications of K-Means. 10 02 1
b. Explain Linear Regression. 03 2
Q.5 a. Explain decision tree algorithm in detail. 16 03 3
b. What are the challenges and their possible solutions in Big data visualization? 03 2
OR
Q.6 a. Explain tools used in data visualization. 16 03 3
b. Explain different components of HADOOP. 01 1
Q.7 a. Explain Smoothing process. 16 02 2
b. What are the visual data representation techniques? 02 3
OR
Q.8 a. Explain Data visualization with Tableau. 16 02 2
b. Write short note on Hadoop Distributed File System. 03 1
Q.9 a. Explain HIVE with its architecture. 18 03 1
b. Write note on diagnostics of classifiers. 03 1
OR
Q.10 a. Explain types of data visualization. 18 02 3
b. What is HBase? Discuss various HBase Data Model and application. 02 2

Solution:

Q1 a. Write short note on Big Data.


Ans: Definition: Big Data is data whose scale, distribution, diversity, and/or timeliness require the use
of new technical architectures and analytics to enable insights that unlock new sources of business
value.
Big Data is a term used to describe a collection of data that is huge in size and yet growing
exponentially with time. In short, such data is so large and complex that none of the traditional data
management tools are able to store it or process it efficiently.
Mobile phones, social media, imaging technologies to determine a medical diagnosis-all these
and more create new data, and that must be stored somewhere for some purpose. Devices and sensors
automatically generate diagnostic information that needs to be stored and processed in real time.
Analyzing such vast amounts of data is needed to transform business, government, science, and
everyday life.
Example:
1. Credit card companies monitor every purchase their customers make and can identify fraudulent
purchases with a high degree of accuracy using rules derived by processing billions of transactions.
2. Mobile phone companies analyze subscribers' calling patterns to identify, for example, subscribers
likely to defect, so the company can offer the subscriber an incentive to remain in his contract.

Big Data characteristics:


• Huge volume of data: Big Data can contain billions of rows and millions of columns.
• Complexity of data types and structures: Big Data reflects the variety of new data sources, formats,
and structures, including digital traces being left on the web and other digital repositories for
subsequent analysis.
• Speed of new data creation and growth: Big Data can describe high velocity data, with rapid data
ingestion and near real time analysis.

Big data can come in multiple forms, including structured and non-structured data such as financial
data, text files, multimedia files, and genetic mappings. Contrary to much of the traditional data
analysis performed by organizations, most of the Big Data is unstructured or semi-structured in nature,
which requires different techniques and tools to process and analyze. Analyzing structured data tends
to be the most familiar technique; different techniques are required to meet the challenges of analyzing
semi-structured data (e.g. XML), quasi-structured data, and unstructured data.

Q1. b Explain Current Analytical Architecture with suitable diagram.


Answer:
1. For data sources to be loaded into the data warehouse, data needs to be well understood, structured,
and normalized with the appropriate data type definitions. Although this kind of centralization enables
security, backup, and failover of highly critical data, it also means that data typically must go through
significant pre-processing and checkpoints before it can enter this sort of controlled environment,
which does not lend itself to data exploration and iterative analytics.
2. As a result of this level of control on the EDW, additional local systems may emerge in the form of
departmental warehouses and local data marts that business users create to accommodate their need
for flexible analysis. These local data marts may not have the same constraints for security and
structure as the main EDW and allow users to do some level of more in-depth analysis. However,
these one-off systems reside in isolation, often are not synchronized or integrated with other data
stores, and may not be backed up.

EDW: Enterprise Data Warehouse

3. Once in the data warehouse, data is read by additional applications
across the enterprise for BI and reporting purposes. These are high-priority operational processes
getting critical data feeds from the data warehouses and repositories.
4. At the end of this workflow, analysts get data provisioned for their downstream analytics. Because
users generally are not allowed to run custom or intensive analytics on production databases, analysts
create data extracts from the EDW to analyze data offline in R or other local analytical tools. Many
times these tools are limited to in-memory analytics on desktops analyzing samples of data, rather than
the entire population of a data set. Because these analyses are based on data extracts, they reside in a
separate location, and the results of the analysis, and any insights on the quality of the data or
anomalies, rarely are fed back into the main data repository.

Q.2 a. Explain hypothesis testing in brief.


Ans: When comparing populations, such as testing or evaluating the difference of the means from two
samples of data, a common technique to assess the difference or the significance of the difference is
hypothesis testing.
The basic concept of hypothesis testing is to form an assertion and test it with data. When performing
hypothesis tests, the common assumption is that there is no difference between two samples. This
assumption is used as the default position for building the test or conducting a scientific experiment.
Statisticians refer to this as the null hypothesis (H0 ). The alternative hypothesis (HA) is that there is
a difference between two samples. For example, if the task is to identify the effect of drug A
compared to drug B on patients, the null hypothesis and alternative hypothesis would be this.
•H0: Drug A and drug B have the same effect on patients.
•HA: Drug A has a greater effect than drug B on patients
It is important to state the null hypothesis and alternative hypothesis, because misstating them is likely
to undermine the subsequent steps of the hypothesis testing process. A hypothesis test leads to either
rejecting the null hypothesis in favor of the alternative or not rejecting the null hypothesis.
Accuracy Forecast
H0: Model X does not predict better than the existing model.
HA: Model X predicts better than the existing model.
Recommendation Engine
H0: Algorithm Y does not produce better recommendations than the current algorithm being used.
HA: Algorithm Y produces better recommendations than the current algorithm being used.
Once a model is built over the training data, it needs to be evaluated over the testing data to see
if the proposed model predicts better than the existing model currently being used. The null
hypothesis is that the proposed model does not predict better than the existing model. The alternative
hypothesis is that the proposed model indeed predicts better than the existing model. In accuracy
forecasting, the null model could be that the sales of the next month are the same as the prior month.
The hypothesis test needs to evaluate if the proposed model provides a better prediction.
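A minimal sketch of such a two-sample test of means, assuming SciPy and NumPy are available (the sample data is simulated, and the 0.05 significance level is a common convention, not part of the text above):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
drug_a = rng.normal(10.0, 2.0, 100)   # simulated effect scores for drug A
drug_b = rng.normal(9.4, 2.0, 100)    # simulated effect scores for drug B

# H0: the two samples have the same mean; HA: the means differ.
res = stats.ttest_ind(drug_a, drug_b)
print(f"t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
if res.pvalue < 0.05:                 # 0.05 is a conventional threshold
    print("Reject H0: the data indicate a difference between the samples.")
else:
    print("Do not reject H0.")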

Q2 b. Explain types of Regression Analysis.


Ans: Regression analysis attempts to explain the influence that a set of variables has on the outcome of
another variable of interest. Often, the outcome variable is called a dependent variable because the
outcome depends on the other variables. These additional variables are sometimes called the input
variables or the independent variables. Regression analysis is useful for answering the following kinds
of questions:
• What is a person's expected income?
• What is the probability that an applicant will default on a loan?
Linear regression is a useful tool for answering the first question, and logistic regression is a popular
method for addressing the second. The following examines these two regression techniques and
explains when one technique is more appropriate than the other.
Regression analysis is a useful explanatory tool that can identify the input variables that have the
greatest statistical influence on the outcome. With such knowledge and insight, environmental
changes can be attempted to produce more favorable values of the input variables. For example, if it is
found that the reading level of 10-year-old students is an excellent predictor of the students' success in
high school and a factor in their attending college, then additional emphasis on reading can be
considered, implemented, and evaluated to improve students' reading levels at a younger age.
Linear Regression: Linear regression is an analytical technique used to model the relationship between
several input variables and a continuous outcome variable. A key assumption is that the relationship
between an input variable and the outcome variable is linear.
The physical sciences have well-known linear models, such as Ohm's Law, which states that the
electrical current flowing through a resistive circuit is linearly proportional to the voltage applied to
the circuit.
A linear regression model is a probabilistic one that accounts for the randomness that can affect any
particular outcome. Based on known input values, a linear regression model provides the expected
value of the outcome variable based on the values of the input variables, but some uncertainty may
remain in predicting any particular outcome.

Logistic Regression
In linear regression modeling, the outcome variable is a continuous variable. As seen in the earlier
income example, linear regression can be used to model the relationship of age and education to
income. Suppose a person's actual income was not of interest, but rather whether someone was
wealthy or poor. In such a case, when the outcome variable is categorical in nature, logistic regression
can be used to predict the likelihood of an outcome based on the input variables. Although logistic
regression can be applied to an outcome variable that represents multiple values, it is most commonly
used when the outcome represents two values, such as true/false or wealthy/poor.
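A minimal sketch of the second question above (likelihood of loan default), assuming scikit-learn is available; the applicant features and figures are invented:

from sklearn.linear_model import LogisticRegression

# Invented applicants: [annual_income_in_lakhs, years_employed]; 1 = defaulted.
X = [[2, 1], [10, 8], [4, 2], [12, 10], [3, 1], [8, 6]]
y = [1, 0, 1, 0, 1, 0]

model = LogisticRegression().fit(X, y)

# predict_proba returns [P(no default), P(default)] for each applicant.
print(model.predict_proba([[5, 3]]))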

Q.3 a. Write Diagnostics of Apriori algorithm.


Ans:
Although the Apriori algorithm is easy to understand and implement, some of the rules generated are
uninteresting or practically useless. Additionally, some of the rules may be generated due to coincidental
relationships between the variables. Measures like confidence, lift, and leverage should be used along
with human insights to address this problem.
Another problem with association rules is that the team must specify the minimum support prior to
the model execution, which may lead to too many or too few rules.
The Apriori algorithm reduces the computational workload by only examining item-sets that meet the
specified minimum threshold. For each level of support, the algorithm requires a scan of the entire
database to obtain the result. Accordingly, as the database grows, it takes more time to compute in
each run.
Some approaches to improve Apriori's efficiency (a minimal support-counting sketch follows this list):
• Partitioning: Any item-set that is potentially frequent in a transaction database must be frequent in
at least one of the partitions of the transaction database.
• Sampling: This extracts a subset of the data with a lower support threshold and uses the subset to
perform association rule mining.
• Transaction reduction: A transaction that does not contain frequent k-item-sets is useless in
subsequent scans and therefore can be ignored.
• Hash-based item-set counting: If the corresponding hashing bucket count of a k-item-set is below a
certain threshold, the k-item-set cannot be frequent.
• Dynamic item-set counting: Only add new candidate item-sets when all of their subsets are estimated
to be frequent.
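A minimal from-scratch sketch of the level-wise idea behind Apriori, generating frequent 1-item-sets and then candidate 2-item-sets only from them (transactions and threshold invented for illustration):

from itertools import combinations

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"milk", "butter", "bread"}, {"milk", "butter"}]
min_support = 2   # minimum number of transactions an item-set must appear in

def support(itemset):
    # One scan over the whole database per level, as described above.
    return sum(1 for t in transactions if itemset <= t)

# Frequent 1-item-sets: only these survive to generate 2-item-set candidates.
items = {i for t in transactions for i in t}
freq1 = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]

# Candidate 2-item-sets come only from frequent items (the Apriori property),
# and a further scan counts their support.
freq2 = [frozenset(c) for c in combinations(sorted(set().union(*freq1)), 2)
         if support(frozenset(c)) >= min_support]
print(freq2)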

Q.3 b. Explain Data Analytical Life Cycle with all the six phases.
Answer:
• Phase 1- Discovery: In Phase 1, the team learns the business domain, including relevant history such
as whether the organization or business unit has attempted similar projects in the past from which they
can learn. The team assesses the resources available to support the project in terms of people,
technology, time, and data. Important activities in this phase include framing the business problem as
an analytic challenge that can be addressed in subsequent phases and formulating initial hypotheses
(IHs) to test and begin learning the data.
• Phase 2- Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team
can work with data and perform analytics for the duration of the project. The team needs to execute
extract, load, and transform (ELT) or extract, transform and load (ETL) to get data into the sandbox.
The ELT and ETL are sometimes abbreviated as ETLT. Data should be transformed in the ETLT
process so the team can work with it and analyze it. In this phase, the team also needs to familiarize
itself with the data thoroughly and take steps to condition the data.
• Phase 3-Model planning: Phase 3 is model planning, where the team determines the methods,
techniques, and workflow it intends to follow for the subsequent model building phase. The team
explores the data to learn about the relationships between variables and subsequently selects key
variables and the most suitable models.
• Phase 4-Model building: In Phase 4, the team develops data sets for testing, training, and production
purposes. In addition, in this phase the team builds and executes models based on the work done in the
model planning phase. The team also considers whether its existing tools will suffice for running the
models, or if it will need a more robust environment for executing models and work flows (for
example, fast hardware and parallel processing, if applicable).
• Phase 5-Communicate results: In Phase 5, the team, in collaboration with major stakeholders,
determines if the results of the project are a success or a failure based on the criteria developed in
Phase 1. The team should identify key findings, quantify the business value, and develop a narrative to
summarize and convey findings to stakeholders.
• Phase 6-Operationalize: In Phase 6, the team delivers final reports, briefings, code, and technical
documents. In addition, the team may run a pilot project to implement the models in a production
environment.

Q.4 a. Write applications of K-Means.


Answer:
Given a collection of objects each with n measurable attributes, k-means is an analytical technique
that, for a chosen value of k, identifies k clusters of objects based on the objects' proximity to the
center of the k groups. The center is determined as the arithmetic average (mean) of each cluster's n-
dimensional vector of attributes.
Application of K-Means includes Image processing, Medical, Customer Segmentation etc.
Image Processing:
Video is one example of the growing volumes of unstructured data being collected. Within each frame
of a video, k-means analysis can be used to identify objects in the video. For each frame, the task is to
determine which pixels are most similar to each other. The attributes of each pixel can include
brightness, color, and location, the x and y coordinates in the frame. With security video images, for
example, successive frames are examined to identify any changes to the clusters. These newly
identified clusters may indicate unauthorized access to a facility.
Medical
Patient attributes such as age, height, weight, systolic and diastolic blood pressures, cholesterol level,
and other attributes can identify naturally occurring clusters. These clusters could be used to target
individuals for specific preventive measures or clinical trial participation. Clustering, in general, is
useful in biology for the classification of plants and animals as well as in the field of human genetics.
Customer Segmentation
Marketing and sales groups use k-means to better identify customers who have similar behaviors and
spending patterns. For example, a wireless provider may look at the following customer attributes:
monthly bill, number of text messages, data volume consumed, minutes used during various daily
periods, and years as a customer. The wireless company could then look at the naturally occurring
clusters and consider tactics to increase sales or reduce the customer churn rate, the proportion of
customers who end their relationship with a particular company.
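A minimal sketch of the customer-segmentation use, assuming scikit-learn is available (the customer attributes below are invented):

import numpy as np
from sklearn.cluster import KMeans

# Invented customers: [monthly_bill, text_messages, data_volume_gb]
X = np.array([[300, 50, 1], [900, 500, 10], [350, 60, 2],
              [950, 450, 12], [400, 80, 1], [1000, 600, 15]])

# k = 2: assign each customer to the nearest of 2 cluster centers (means).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment for each customer
print(km.cluster_centers_)  # arithmetic mean of each cluster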

Q.4 b. Explain Linear Regression.


Answer:
Linear regression is an analytical technique used to model the relationship between several input
variables and a continuous outcome variable.
A key assumption is that the relationship between an input variable and the outcome variable is
linear. Although this assumption may appear restrictive, it is often possible to properly transform the
input or outcome variables to achieve a linear relationship between the modified input and outcome
variables.
Example: Ohm's Law, which states that the electrical current flowing through a resistive circuit
is linearly proportional to the voltage applied to the circuit. Hence, if the input values are known, the
value of the outcome variable is precisely determined.
Linear regression models are useful in physical and social science applications where there may
be considerable variation in a particular outcome based on a given set of input values.
Applications of linear regression in the real world include :
Medical: A linear regression model can be used to analyze the effect of a proposed radiation
treatment on reducing tumor sizes. Input variables might include duration of a single radiation
treatment, frequency of radiation treatment, and patient attributes such as age or weight.
Demand forecasting: Businesses and governments can use linear regression models to predict demand
for goods and services. Models can also be built to predict retail sales, emergency room visits, and
ambulance dispatches.
Real estate: Linear regression can be used to model residential home prices as a function of the home's
living area. Such a model helps set or evaluate the list price of a home on the market. The model could
be further improved by including other input variables such as number of bathrooms, number of
bedrooms, lot size, school district rankings, crime statistics, and property taxes.
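A minimal sketch of the real-estate example, fitting price as a linear function of living area with an ordinary least-squares fit in NumPy (figures invented):

import numpy as np

# Invented figures: living area in sq. ft. vs. home price in lakhs.
area  = np.array([600, 850, 1000, 1200, 1500])
price = np.array([30, 42, 50, 61, 76])

# Ordinary least squares fit of price = b0 + b1 * area.
b1, b0 = np.polyfit(area, price, deg=1)
print(f"price ~= {b0:.2f} + {b1:.4f} * area")
print("predicted price for 1100 sq. ft.:", round(b0 + b1 * 1100, 2))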

Q.5 a. Explain decision tree algorithm in detail.


Answer:
A decision tree (also called prediction tree) uses a tree structure to specify sequences of decisions and
consequences.
The prediction can be achieved by constructing a decision tree with test points and branches. At each
test point, a decision is made to pick a specific branch and traverse down the tree.
Due to its flexibility and easy visualization, decision trees are commonly deployed in data mining
applications for classification purposes.
The input values of a decision tree can be categorical or continuous.
A decision tree employs a structure of test points (called nodes) and branches, which represent the
decision being made. A node without further branches is called a leaf node. The leaf nodes return
class labels and, in some implementations, they return the probability scores.
Decision trees have two varieties: classification trees and regression trees. Classification trees usually
apply to output variables that are categorical.
Regression trees can apply to output variables that are numeric or continuous, such as the predicted
price of a consumer good or the likelihood a subscription will be purchased.
The term branch refers to the outcome of a decision and is visualized as a line connecting two nodes.
If a decision is numerical, the "greater than" branch is usually placed on the right, and the "less than"
branch is placed on the left.
Internal nodes are the decision or test points. Each internal node refers to an input variable or an
attribute. The top internal node is called the root.

Q.5 b. What are the challenges and their possible solutions in Big data visualization?
Answer:
Problems in big data visualization:
• Visual noise: Most of the objects in the data set are too close to one another; users cannot
separate them as distinct objects on the screen.
• Information loss: Reduction of the visible data set can be used, but it leads to information loss.
• Large image perception: Data visualization methods are limited not only by the aspect ratio and
resolution of the device, but also by physical perception limits.
• High rate of image change: Users observe the data but cannot react to the rate or intensity of
data change on the display.
• High performance requirements: Dynamic visualization imposes high performance requirements
that are hardly noticed in static visualization, where visualization speed requirements are lower.
Perceptual and interactive scalability are also challenges of big data visualization. Visualizing
every data point can lead to over-plotting and may overwhelm users’ perceptual and cognitive
capacities; reducing the data through sampling or filtering can elide interesting structures or
outliers. Querying large data stores can result in high latency, disrupting fluent interaction.
Potential solutions to some challenges or problems about visualization and big data were
presented :
1. Meeting the need for speed: One possible solution is hardware. Increased memory and
powerful parallel processing can be used. Another method is putting data in-memory but using a
grid computing approach, where many machines are used.
2. Understanding the data: One solution is to have the proper domain expertise in place.
3. Addressing data quality: It is necessary to ensure the data is clean through the process of data
governance or information management.
4. Displaying meaningful results: One way is to cluster data into a higher-level view where
smaller groups of data are visible and the data can be effectively visualized.
5. Dealing with outliers: Possible solutions are to remove the outliers from the data or create a
separate chart for the outliers.
Q.6 a. Explain tools used in data visualization.
Answer:
Common tools used in data visualization are:
1. R (Base package, lattice, ggplot2) 2. Tableau 3. DataHero 4. Chart.js 5. Dygraphs
Tableau is a data visualization tool that is widely used for Business Intelligence but is not limited to
it. It helps create interactive graphs and charts in the form of dashboards and worksheets to gain
business insights. Visualization in Tableau is achieved by dragging and dropping Measures and
Dimensions onto different Shelves:
Rows and Columns: Represent the x and y axes of your graphs/charts.
Filter: Filters help you view a filtered subset of your data. For example, instead of seeing the
combined Sales of all the Categories, you can look at a specific one, such as just Furniture.
Pages: Pages work on the same principle as Filters, with the difference that you can actually see
the changes as you shift between the paged values. Remember that Rosling chart? You can easily
make one of your own using Pages.
Marks: The Marks property is used to control the mark types of your data. You may choose to
represent your data using different shapes, sizes or text.
R supports four different graphics systems: base graphics, grid graphics, lattice graphics, and ggplot2.
Base graphics is the default graphics system in R, the easiest of the four systems to learn to use, and
provides a wide variety of useful tools, especially for exploratory graphics where we wish to learn
what is in an unfamiliar dataset.

Q.6 b. Explain different components of HADOOP.


Answer:
1. Hadoop Distributed File System: HDFS is the primary storage system of Hadoop. HDFS is a Java-
based file system that provides scalable, fault-tolerant, reliable and cost-efficient data storage for Big
Data. HDFS is a distributed file system that runs on commodity hardware.
There are two major components of Hadoop HDFS: NameNode and DataNode.
NameNode stores metadata, i.e. the number of blocks, their location, and on which rack they reside.
DataNode, also known as the Slave, is responsible for storing actual data in HDFS. The DataNode
performs read and write operations as per the requests of the clients.
2. Hadoop MapReduce is a software framework for easily writing applications that process the vast
amount of structured and unstructured data stored in the Hadoop Distributed File system. MapReduce
programs are parallel in nature, thus are very useful for performing large-scale data analysis using
multiple machines in the cluster.
3. Hadoop YARN (Yet Another Resource Negotiator) provides the resource management. It is
responsible for managing and monitoring workloads. It allows multiple data processing engines such
as real-time streaming and batch processing to handle data stored on a single platform.
4. Hive is an open-source data warehouse system for querying and analyzing large data sets stored in
Hadoop files. Hive performs three main functions: data summarization, query, and analysis. Hive uses
a language called HiveQL (HQL), which is similar to SQL. HiveQL automatically translates SQL-like
queries into MapReduce jobs which execute on Hadoop.
5. Pig is a high-level language platform for analyzing and querying huge datasets that are stored in
HDFS. Pig, as a component of the Hadoop ecosystem, uses the Pig Latin language.
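A minimal in-process sketch of the MapReduce model from point 2 above, in plain Python (a real Hadoop job would distribute the map and reduce steps across the cluster):

from itertools import groupby
from operator import itemgetter

docs = ["big data needs hadoop", "hadoop stores big data"]

# Map: emit a (word, 1) pair for every word in every document.
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle/sort: group pairs by key, as the framework does between phases.
mapped.sort(key=itemgetter(0))

# Reduce: sum the counts for each word.
counts = {word: sum(n for _, n in group)
          for word, group in groupby(mapped, key=itemgetter(0))}
print(counts)   # {'big': 2, 'data': 2, 'hadoop': 2, 'needs': 1, 'stores': 1}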

Q.7 a. Explain Smoothing process.


Ans:
Data smoothing is done by using an algorithm to remove noise from a data set. This allows important
patterns to stand out. Data smoothing can be used to help predict trends, such as those found in
securities prices.
P(ai | ci) is the conditional probability of each attribute ai given each class label ci.
P(ci | A) is the probability by which Bayes' theorem assigns a class label to an object with multiple
attributes A = {a1, a2, ..., am}, such that the assigned label corresponds to the largest value of P(ci | A).
If one of the attribute values does not appear with one of the class labels within the training set, the
corresponding P(ai | ci) will equal zero. When this happens, the resulting P(ci | A) from multiplying all
the P(aj | ci) (j ∈ [1, m]) immediately becomes zero regardless of how large some of the conditional
probabilities are; therefore overfitting occurs. Smoothing techniques can be employed to adjust the
probabilities of P(aj | ci) and to ensure a nonzero value of P(ci | A). A smoothing technique assigns a
small nonzero probability to rare events not included in the training data set.
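A minimal sketch of one common smoothing technique, Laplace (add-one) smoothing, applied to the conditional probability estimate (counts invented for illustration):

def smoothed_prob(count_a_and_c, count_c, n_attribute_values):
    # Laplace (add-one) smoothing: an attribute value never seen with a class
    # still gets a small nonzero probability, so the product P(c) * prod P(a|c)
    # is never zeroed out by a single missing combination.
    return (count_a_and_c + 1) / (count_c + n_attribute_values)

# Value never observed with class c (count 0 out of 10, 3 possible values):
print(smoothed_prob(0, 10, 3))   # 1/13, roughly 0.077, instead of 0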

Q.7 b. What are the visual data representation techniques?


Ans: The most fundamental and common data representation techniques are:
Data for Visualization | Type of Chart
Components (parts of whole) | Pie chart
Item | Bar chart
Time series | Line chart
Frequency | Line chart or histogram
Correlation | Scatter plot, side-by-side bar charts

Pie charts are designed to show the components, or parts, relative to a whole set of things. A pie chart
is also the most commonly misused kind of chart. If the situation calls for using a pie chart, employ it
only when showing only 2-3 items in a chart, and only for sponsor audiences.
Bar charts and line charts are used much more often and are useful for showing comparisons and
trends over time. Even though people use vertical bar charts more often, horizontal bar charts allow an
author more room to fit the text labels. Vertical bar charts tend to work well when the labels are
small, such as when showing comparisons over time using years.
For frequency, histograms are useful for demonstrating the distribution of data to an analyst audience
or to data scientists. As shown in the pricing example earlier in this chapter, data distributions are
typically one of the first steps when visualizing data to prepare for model planning. To qualitatively
evaluate correlations, scatter plots can be useful to compare relationships among variables.
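A minimal sketch pairing several of these chart types with matplotlib calls, assuming matplotlib is available (the numbers are invented; a frequency chart would similarly use plt.hist):

import matplotlib.pyplot as plt

fig, ax = plt.subplots(2, 2, figsize=(8, 6))
ax[0, 0].pie([60, 25, 15], labels=["A", "B", "C"])         # components (parts of a whole)
ax[0, 1].bar(["P1", "P2", "P3"], [12, 7, 9])               # item comparison
ax[1, 0].plot([2016, 2017, 2018, 2019], [10, 14, 13, 18])  # time series
ax[1, 1].scatter([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])         # correlation
plt.tight_layout()
plt.show()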

Q.8 a. Explain Data visualization with Tableau.


Ans:
Tableau is a Data Visualisation tool that is widely used for Business Intelligence but is not limited to
it. It helps create interactive graphs and charts in the form of dashboards and worksheets to gain
business insights. And all of this is made possible with gestures as simple as drag and drop.
Tableau can connect to files, relational and Big Data sources to acquire and process data. The software
allows data blending and real-time collaboration, which makes it very unique. It is used by businesses,
academic researchers, and many government organizations for visual data analysis. It is also
positioned as a leader in Gartner's Magic Quadrant for Business Intelligence and Analytics Platforms.
Tableau Features
a) Speed of Analysis − As it does not require a high level of programming expertise, any user
with access to data can start using it to derive value from the data.
b) Self-Reliant − Tableau does not need a complex software setup. The desktop version,
which is used by most users, is easily installed and contains all the features needed to start
and complete data analysis.
c) Visual Discovery − The user explores and analyzes the data by using visual tools like
colors, trend lines, charts, and graphs. There is very little script to be written as nearly
everything is done by drag and drop.
d) Blend Diverse Data Sets − Tableau allows you to blend different relational,
semi-structured and raw data sources in real time, without expensive up-front integration
costs. The users don't need to know the details of how data is stored.
e) Architecture Agnostic − Tableau works on all kinds of devices where data flows. Hence,
the user need not worry about specific hardware or software requirements to use Tableau.
f) Real-Time Collaboration − Tableau can filter, sort, and discuss data on the fly and embed
a live dashboard in portals like a SharePoint site or Salesforce. You can save your view of
data and allow colleagues to subscribe to your interactive dashboards so they see the very
latest data just by refreshing their web browser.
g) Centralized Data − The Tableau server provides a centralized location to manage all of the
organization's published data sources. You can delete, change permissions, add tags, and
manage schedules in one convenient location. It's easy to schedule extract refreshes and
manage them in the data server. Administrators can centrally define a schedule for
extracts on the server for both incremental and full refreshes.

Q.8 b. Write short note on Hadoop Distributed File System.


Ans:
Hadoop Distributed File System: HDFS is the primary storage system of Hadoop. HDFS is a Java-
based file system that provides scalable, fault-tolerant, reliable and cost-efficient data storage for Big
Data. HDFS is a distributed file system that runs on commodity hardware.
HDFS is not an alternative to common file systems, such as ext3, ext4, and XFS.
HDFS attempts to store the blocks for a file on different machines so the map step can operate on each
block of a file in parallel. Also, by default, HDFS creates three copies of each block across the cluster
to provide the necessary redundancy in case of a failure. If a machine fails, HDFS replicates an
accessible copy of the relevant data blocks to another available machine.
HDFS also distributes the blocks across several equipment racks to prevent an entire rack failure from
causing a data-unavailability event.
To manage the data access, HDFS utilizes three Java daemons (background processes): NameNode,
DataNode, and Secondary NameNode. Running on a single machine, the NameNode daemon
determines and tracks where the various blocks of a data file are stored. The DataNode daemon
manages the data stored on each machine. If a client application wants to access a particular file stored
in HDFS, the application contacts the NameNode, and the NameNode provides the application with
the locations of the various blocks for that file. The application then communicates with the
appropriate DataNodes to access the file.
Each DataNode periodically builds a report about the blocks stored on the DataNode and sends the
report to the NameNode. If one or more blocks are not accessible on a DataNode, the NameNode
ensures that an accessible copy of an inaccessible data block is replicated to another machine.
A third daemon, the Secondary NameNode, provides the capability to perform some of the NameNode
tasks to reduce the load on the NameNode. Such tasks include updating the file system image with the
contents of the file system edit logs. It is important to note that the Secondary NameNode is not a
backup or redundant NameNode. In the event of a NameNode outage, the NameNode must be
restarted and initialized with the last file system image file and the contents of the edit logs.
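Client applications typically reach HDFS through the hdfs dfs command-line client or a library rather than talking to the daemons directly. A minimal sketch, assuming a running cluster and the hdfs client on PATH (the file and directory paths are hypothetical):

import subprocess

# Copy a local file into HDFS, then list the target directory.
# (Hypothetical paths; requires a live Hadoop cluster and the hdfs client.)
subprocess.run(["hdfs", "dfs", "-put", "sales.csv", "/data/sales.csv"], check=True)
subprocess.run(["hdfs", "dfs", "-ls", "/data"], check=True)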

Q.9 a. Explain HIVE with its architecture.


Ans:
Apache Hive enables users to process data without explicitly writing MapReduce code. One key
difference to Pig is that the Hive language, HiveQL (Hive Query Language), resembles Structured
Query Language (SQL) rather than a scripting language.
Features of Hive
.It stores schema in a database and processed data into HDFS.
.It is designed for OLAP.
.It provides SQL type language for querying called HiveQL or HQL.
.It is familiar, fast, scalable, and extensible.
Hive is not
 A relational database
 A design for OnLine Transaction Processing (OLTP)
 A language for real-time queries and row-level updates

Figure: Hive architecture

User Interface: Hive is a data warehouse infrastructure software that can create interaction between
the user and HDFS. The user interfaces that Hive supports are Hive Web UI, Hive command line, and
Hive HD Insight (in Windows Server).
Meta Store: Hive chooses respective database servers to store the schema or Metadata of tables,
databases, columns in a table, their data types, and HDFS mapping.
HiveQL Process Engine: HiveQL is similar to SQL for querying the schema information in the
Metastore. It is one of the replacements for the traditional approach of writing a MapReduce program.
Instead of writing a MapReduce program in Java, we can write a query for the MapReduce job and
process it.
Execution Engine: The conjunction part of the HiveQL Process Engine and MapReduce is the Hive
Execution Engine. The execution engine processes the query and generates results the same as
MapReduce results. It uses the flavor of MapReduce.
HDFS or HBASE: Hadoop distributed file system or HBASE are the data storage techniques to store
data into file system.
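As a hedged sketch of submitting a HiveQL query from a program, one option is the third-party PyHive package (the host, port, and sales table are hypothetical; Hive turns the query into MapReduce jobs as described above):

from pyhive import hive   # third-party package: pip install 'pyhive[hive]'

# Hypothetical HiveServer2 host/port and table name.
conn = hive.Connection(host="localhost", port=10000)
cur = conn.cursor()
cur.execute("SELECT category, COUNT(*) FROM sales GROUP BY category")
for row in cur.fetchall():
    print(row)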
Q.9 b. Write note on diagnostics of classifiers.
Ans: Logistic regression, decision trees, and naive Bayes methods can be used to classify instances
into distinct groups according to the similar characteristics they share. Each of these classifiers faces
the same issue: how to evaluate if it performs well.
A confusion matrix is a specific table layout that allows visualization of the performance of a
classifier.
True positives (TP) are the number of positive instances the classifier correctly identified as positive.
False positives (FP) are the number of instances the classifier identified as positive but that in
reality are negative.
True negatives (TN) are the number of negative instances the classifier correctly identified as
negative.
False negatives (FN) are the number of instances classified as negative but that in reality are positive.
In a two-class classification, a preset threshold may be used to separate positives from negatives. TP
and TN are the correct guesses. A good classifier should have large TP and TN and small (ideally
zero) numbers for FP and FN.
The accuracy (or the overall success rate) is a metric defining the rate at which a model has classified
the records correctly. It is defined as the sum of TP and TN divided by the total number of instances:

Accuracy = (TP + TN) / (TP + TN + FP + FN) * 100%

The true positive rate (TPR) shows what percent of positive instances the classifier correctly identified:

TPR = TP / (TP + FN)

The false positive rate (FPR) shows what percent of negatives the classifier marked as positive. The
FPR is also called the false alarm rate or the type I error rate:

FPR = FP / (FP + TN)

The false negative rate (FNR) shows what percent of positives the classifier marked as negative. It is
also known as the miss rate or type II error rate:

FNR = FN / (TP + FN)

Precision is the percentage of instances marked positive that really are positive:

Precision = TP / (TP + FP)
A good model should have a high accuracy score, but having a high accuracy score alone does not
guarantee the model is well established. Precision and recall are accuracy metrics used by the
information retrieval community, but they can be used to characterize classifiers in general. A well-
performing model should have a high TPR (ideally 1) and a low FPR and FNR (ideally 0).
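A minimal sketch computing these diagnostics from the four confusion-matrix counts (the counts are invented for illustration):

# Invented confusion-matrix counts.
TP, TN, FP, FN = 90, 80, 10, 20

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # overall success rate
tpr       = TP / (TP + FN)                    # true positive rate (recall)
fpr       = FP / (FP + TN)                    # false alarm rate (type I error)
fnr       = FN / (TP + FN)                    # miss rate (type II error)
precision = TP / (TP + FP)

print(f"accuracy={accuracy:.2f} TPR={tpr:.2f} FPR={fpr:.2f} "
      f"FNR={fnr:.2f} precision={precision:.2f}")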

Q.10 a. Explain types of data visualization.


Ans: Same as Q.7 b.

Q.10 b. What is HBase? Discuss various HBase Data Model and application.
Ans: While Pig and Hive are intended for batch applications, Apache HBase is capable of providing
real-time read and write access to data sets with billions of rows and millions of columns.
The HBase design is based on Google's 2006 paper on Bigtable. This paper described Bigtable as a
"distributed storage system for managing structured data."
HBase is a data store that is intended to be distributed across a cluster of nodes. Like Hadoop and
many of its related Apache projects, HBase is built upon HDFS and achieves its real-time access
speeds by sharing the workload over a large number of nodes in a distributed cluster. An HBase table
consists of rows and columns. However, an HBase table also has a third dimension, version, to
maintain the different values of a row and column intersection over time.
HBase is built on top of HDFS. HBase uses a key/value structure to store the contents of an HBase
table.
HBase Data Model
The HBase Data Model consists of the following elements:
A set of tables
Each table with column families and rows
Each table must have an element defined as a primary key
The row key acts as the primary key in HBase
Any access to HBase tables uses this primary key
Each column present in HBase denotes an attribute corresponding to the object
The applications of HBase are as follows:
Medical: HBase is used in the medical field for storing genome sequences and running
MapReduce on it, storing the disease history of people or an area, and many others.
Sports: HBase is used in the sports field for storing match histories for better analytics and
prediction.
Web: HBase is used to store user history and preferences for better customer targeting.
Oil and petroleum: HBase is used in the oil and petroleum industry to store exploration data for
analysis and predict probable places where oil can be found.
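As a hedged sketch of the row-key and column-family access pattern, one option is the third-party happybase package over HBase's Thrift interface (the host, table, and column names are hypothetical):

import happybase   # third-party package: pip install happybase

connection = happybase.Connection("localhost")   # hypothetical Thrift host
table = connection.table("patients")             # hypothetical table

# Write: row key plus column-family:qualifier -> value (all bytes in HBase).
table.put(b"patient:1001", {b"history:diagnosis": b"hypertension"})

# Read the whole row back by its primary (row) key.
print(table.row(b"patient:1001"))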
University Question Papers
Subject – 4

Elective I - Data Mining and Warehousing 410244 (D)


B. E. (Odd Semester), Session 2018-2019
Scheme, Syllabus and Evaluation Guidelines, Of “Data Mining
and Warehousing 410244 (D)”

Course Code | Course Name | Lectures Assigned (Theory | Practical | Tutorial | Total)
410244 (D) | Data Mining and Warehousing | 3 | - | - | 3

Course Code | Course Name | Examination Evaluation Scheme
Theory - Internal Assessment: Class Test 1 | Class Test 2 | Prelim Exam | Average || In Sem Exam | End Sem Exam | Total || Practical: Internal | External | Total
410244 (D) | Data Mining and Warehousing | 20 | 20 | 70 | 20 || 30 | 70 | 100 || 0 | 0 | 0
Data Mining and Warehousing
Course Contents

Unit -1 : ​Introduction 08 Hours


Data Mining – Data Mining Task Primitives, Data: Data, Information and Knowledge; Attribute
Types: Nominal, Binary, Ordinal and Numeric attributes, Discrete versus Continuous Attributes;
Introduction to Data Preprocessing, Data Cleaning: Missing values, Noisy data; Data integration:
Correlation analysis; transformation: Min-max normalization, z-score normalization and decimal
scaling; data reduction: Data Cube Aggregation, Attribute Subset Selection, sampling; and Data
Discretization: Binning, Histogram Analysis.

Unit – 2: ​Data Warehouse 08 Hours


Data Warehouse, Operational Database Systems and Data Warehouses(OLTP Vs OLAP), A
Multidimensional Data Model: Data Cubes, Stars, Snowflakes, and Fact Constellations Schemas;
OLAP Operations in the Multidimensional Data Model, Concept Hierarchies, Data Warehouse
Architecture, The Process of Data Warehouse Design, A three-tier data warehousing architecture,
Types of OLAP Servers: ROLAP versus MOLAP versus HOLAP.

Unit – 3 : Measuring Data Similarity and Dissimilarity 08 Hours


Measuring Data Similarity and Dissimilarity, Proximity Measures for Nominal Attributes and Binary
Attributes, interval scaled; Dissimilarity of Numeric Data: Minkowski Distance, Euclidean distance
and Manhattan distance; Proximity Measures for Categorical, Ordinal Attributes, Ratio scaled
variables; Dissimilarity for Attributes of Mixed Types, Cosine Similarity.

Unit -4 :​ ​Association Rules Mining 08 Hours


Market basket Analysis, Frequent item set, Closed item set, Association Rules, a-priori Algorithm,
Generating Association Rules from Frequent Item sets, Improving the Efficiency of a-priori, Mining
Frequent Item sets without Candidate Generation: FP Growth Algorithm; Mining Various Kinds of
Association Rules: Mining multilevel association rules, constraint based association rule mining,
Meta rule-Guided Mining of Association Rules.

Unit – 5 :​ ​Classification 08 Hours


Introduction to: Classification and Regression for Predictive Analysis, Decision Tree Induction,
Rule-Based Classification: using IF-THEN Rules for Classification, Rule Induction Using a
Sequential Covering Algorithm. Bayesian Belief Networks, Training Bayesian Belief Networks,
Classification Using Frequent Patterns, Associative Classification, Lazy
Learners-k-Nearest-Neighbor Classifiers, Case-Based Reasoning.

Unit – 6:​ ​Multiclass Classification 08 Hours


Multiclass Classification, Semi-Supervised Classification, Reinforcement learning, Systematic
Learning, Wholistic learning and multi-perspective learning; Metrics for Evaluating Classifier
Performance: Accuracy, Error Rate, Precision, Recall, Sensitivity, Specificity; Evaluating the
Accuracy of a Classifier: Holdout Method, Random Sub-sampling and Cross-Validation.
Books:

Text:
1. Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”,
Elsevier Publishers, ISBN: 9780123814791, 9780123814807.

2. Parag Kulkarni, “Reinforcement and Systemic Machine Learning for Decision Making” by
Wiley-IEEE Press, ISBN: 978-0-470-91999-6

References:
1. Matthew A. Russell, "Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn,
Google+, GitHub, and More" , Shroff Publishers, 2nd Edition, ISBN: 9780596006068.

2. Maksim Tsvetovat, Alexander Kouznetsov, "Social Network Analysis for Startups: Finding
connections on the social web", Shroff Publishers, ISBN-10: 1449306462.
Evaluation Guidelines:

Internal Assessment (IA): [CT (20 Marks) + TA/AT (10 Marks)]

Class Test (CT) [20 marks]: Two class tests, of 20 marks each, will be conducted in a semester, and
the average of the two will be used for calculating the class test marks. The format of the question
paper is the same as the university's.

TA [5 marks]: Three/four assignments will be conducted in the semester. Teacher assessment will be
calculated on the basis of performance in assignments, class tests and the pre-university test.

Attendance (AT) [5 marks]: Attendance marks will be given as per university policy.

Paper pattern and marks distribution for Class tests:

1. The question paper will comprise three sections A, B and C, with internal choice of questions.
2. Section A contains 5 short-answer questions of 1 mark each. All questions are
compulsory. (Total 5 marks)
3. Section B contains 4 medium-answer questions of 2.5 marks each. All questions are
compulsory. (Total 10 marks)
4. Section C contains 1 long-answer question of 5 marks. (Total 5 marks)

In Semester Examination [ 30 Marks]


Paper pattern and marks distribution for PUT: Same as End semester exam

End Semester Examination [ 70 Marks]:


Paper pattern and marks distribution for End Semester Exam: As per university guidelines.
Lecture Plan
Data Mining and Warehousing
1 Data Mining, Data Mining Task Primitives.
2 Data: Data, Information and Knowledge; Attribute Types: Nominal, Binary, Ordinal
and Numeric attributes,
3 Discrete versus Continuous Attributes; Introduction to Data Preprocessing,
4 Data Cleaning: Missing values, Noisy data; Data integration: Correlation analysis;
5 Transformation: Min-max normalization,
6 z-score normalization and decimal scaling;
7 Data reduction: Data Cube Aggregation, Attribute Subset Selection,
8 Sampling; and Data Discretization: Binning, Histogram Analysis.
9 Data Warehouse, Operational Database Systems
10 Data Warehouses(OLTP Vs OLAP),
11 A Multidimensional Data Model: Data Cubes, Stars, Snowflakes, and Fact
Constellations Schemas;
12 OLAP Operations in the Multidimensional Data Model
13 Concept Hierarchies, Data Warehouse Architecture,
14 The Process of Data Warehouse Design
15 A three-tier data warehousing architecture,
16 Types of OLAP Servers: ROLAP versus MOLAP versus HOLAP.
17 Measuring Data Similarity and Dissimilarity
18 Proximity Measures for Nominal Attributes and Binary Attributes, interval scaled;
19 Dissimilarity of Numeric Data: Minkowski Distance,
20 Euclidean distance and Manhattan distance;
21 Proximity Measures for Categorical, Ordinal Attributes,
22 Ratio scaled variables;
23 Dissimilarity for Attributes of Mixed Types
24 Cosine Similarity and Types.
25 Market basket Analysis,
26 Frequent item set, Closed item set, Association Rules,
27 a-priori Algorithm, Generating Association Rules from Frequent Item sets,
28 Improving the Efficiency of a-priori,
29 Mining Frequent Item sets without Candidate Generation: FP Growth Algorithm;
30 Mining Various Kinds of Association Rules: Mining multilevel association rules,
31 constraint based association rule mining,
32 Meta rule-Guided Mining of Association Rules.
33 Introduction to: Classification and Regression for Predictive Analysis
34 Decision Tree Induction,
35 Rule-Based Classification: using IF-THEN Rules for Classification,
36 Rule Induction Using a Sequential Covering Algorithm.
37 Bayesian Belief Networks, Training Bayesian Belief Networks,
38 Classification Using Frequent Patterns,
39 Associative Classification, Lazy Learners-k-Nearest-Neighbor Classifiers,
40 Case-Based Reasoning.
41 Multiclass Classification
42 Semi-Supervised Classification, Reinforcement learning,
43 Systematic Learning, Wholistic learning and multi-perspective learning.
44 Metrics for Evaluating Classifier Performance: Accuracy,
45 Error Rate, precision, Recall, Sensitivity, Specificity;
46 Evaluating the Accuracy of a Classifier
47 Holdout Method,
48 Random Sub sampling and Cross-Validation
Course Delivery, Objectives, Outcomes
Data Mining and Warehousing
Semester – 7

Course Delivery :
The course will be delivered through lectures, assignment/tutorial sessions, classroom interaction,
and presentations.

Course Objectives:

1. To understand the fundamentals of Data Mining


2. To identify the appropriateness and need of mining the data
3. To learn the preprocessing, mining and post processing of the data
4. To understand various methods, techniques and algorithms in data mining

Course Outcomes:
On completion of the course, students will be able to –

1. CO1: Apply basic, intermediate and advanced techniques to mine the data
2. CO2: Analyze the output generated by the process of data mining
3. CO3: Explore the hidden patterns in the data
4. CO4: Optimize the mining process by choosing the best data mining technique

CO-PO Mapping

Course Outcomes   PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12
CO1                1    -    -    -    -    -    -    -    -    -     -     -
CO2                -    2    -    -    -    -    -    -    -    -     -     -
CO3                -    -    2    -    -    -    -    -    -    -     -     -
CO4                -    -    -    2    -    -    -    -    -    -     -     -
Justification Of CO-PO Mapping

JUSTIFICATION OF CO-PO MATCHING


CO1 WITH PO1 Study the concept of data mining and its applications
involves solving complex engineering problems
CO2 WITH PO2 Principles of mathematics and engineering sciences
are used in understanding various data mining
functionalities
CO3 WITH PO3 Knowledge of various data mining concepts can be
used to design and conduct experiments to provide
valid conclusions
CO4 WITH PO4 Knowledge of various data pre-processing techniques
that improve the efficiency of the mining process can
be used to design and develop solutions for complex
engineering problems
Question Bank
Data Mining and Warehousing 410244(D)

Unit - 1.
Q.1 What are the Steps involved in data pre-processing? Discuss.
Q.2 Explain the concept hierarchy.
Q.3 Describe the functions of various components in a typical Multi-tiered Data Warehouse
architecture with the block diagram.
Q.4 Describe the applications of and trends in data mining in detail.
Q.5 Explain in detail ​z-score normalization and decimal scaling.

Unit – 2:

Q.1 What is Multi-Dimensional Modeling? What is the use of Snow Flake Schema.
Q.2 What is the difference between OLTP and OLAP?
Q.3 Draw and explain the architecture of a typical data mining system.
Q.4 Discuss the various OLAP operations which can be performed on a multidimensional data cube.
Q.5 Explain t​he Process of Data Warehouse Design with suitable diagram.

Unit – 3 :

Q.1 Explain the four types of attributes, giving appropriate examples.


Q.2 Explain ​Proximity Measures for Nominal Attributes and Binary Attributes.
Q.3 What is the difference between Manhattan and Euclidean distance.
Q.4 Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8), compute the
Euclidean distance between the two objects, and the Minkowski distance using q = 3.
Q.5 What is ​Cosine Similarity? Explain with example.

Unit -4 :

Q.1 Develop the Apriori Algorithm for generating frequent itemset.


Q.2 The following is the list of large two-itemsets. Show the steps to apply the Apriori property to
generate and prune the candidates for large three-itemsets. Describe how the Apriori property is used
in the steps. Give the final list of candidate large three-itemsets: {10,20} {10,30} {20,30} {20,40}
Q.3 Explain Mining Frequent Patterns using FP-Growth.
Q.4 With an example, explain the frequent item set generation in the Apriori algorithm
Q.5 Explain in detail the candidate generation procedures.
Unit – 5 :
Q.1 What is a rule-based classifier? Explain how a rule-based classifier works.
Q.2 Write the algorithm for k-nearest neighbour classification.
Q.3 Discuss the methods for estimating predictive accuracy of classification method.
Q.4 What is a rule-based classifier? Explain how a rule-based classifier works.
Q.5 Explain how the ​Bayesian Belief Networks are trained to perform classification.

Unit – 6:
Q.1 ​Which classification algorithm would you recommend for multiclass classification where the
number of class​es is large? Explain.
Q.2 Write a short note on-
1. ​Accuracy,
2. Error Rate,
3. Precision,
4. Recall
Q.3 How to evaluate the accuracy of classifier using Holdout Method? Explain with example.
Q.4 Differentiate between wholistic learning and multi-perspective learning.
Q.5 What is the purpose of performing cross-validation? Give one example.
Assignment 1

Question Questions Max. Unit no. CO Bloom’s


No. Marks as per Mapped Taxonomy
syllabus Level
01. What are the Steps involved in data 04 1 CO-1 2
pre-processing? Discuss.
02. Explain the concept hierarchy. 04 1 CO-4 1

03. Describes the application and trends in data 02 1 CO-2 1


mining in details.
04. What is the difference between OLTP and 02 2 CO-3 3
OLAP?

05. What is Multi-Dimensional Modeling? What is 04 2 CO-2 2


the use of Snowflake Schema.

06. Draw and describe the data warehouse 04 2 CO-2 1


architecture

01. What are the Steps involved in data 04 1 CO-1 2


pre-processing? Discuss.

Preprocessing in Data Mining:


Data preprocessing is a data mining technique used to transform raw data into a useful and
efficient format.
Steps Involved in Data Preprocessing:
1. Data Cleaning:
The data can have many irrelevant and missing parts. To handle this part, data cleaning is done. It
involves handling of missing data, noisy data etc.

● (a). Missing Data:


This situation arises when some values are missing in the data. It can be handled in various
ways.
Some of them are:
1. Ignore the tuples:
This approach is suitable only when the dataset we have is quite large and multiple
values are missing within a tuple.
2. Fill the Missing values:
There are various ways to do this task. You can choose to fill the missing values
manually, by attribute mean or the most probable value.
● (b). Noisy Data:
Noisy data is meaningless data that cannot be interpreted by machines. It can be generated
due to faulty data collection, data entry errors, etc. It can be handled in the following ways:

1. Binning Method:
This method works on sorted data in order to smooth it. The whole data is divided
into segments of equal size and then various methods are performed to complete the
task. Each segment is handled separately. One can replace all data in a segment by
its mean, or boundary values can be used to complete the task (a small sketch of
smoothing by bin means follows this list).
2. Regression:
Here data can be made smooth by fitting it to a regression function. The regression
used may be linear (having one independent variable) or multiple (having multiple
independent variables).
3. Clustering:
This approach groups similar data into clusters. Outliers may go undetected, or
they will fall outside the clusters.
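A minimal Python sketch of smoothing by bin means, using equal-frequency bins; the values and the
bin size are made up for illustration:

    # Smoothing by bin means; the data and bin size are illustrative.
    def smooth_by_bin_means(values, bin_size):
        values = sorted(values)                      # binning works on sorted data
        smoothed = []
        for i in range(0, len(values), bin_size):
            bin_ = values[i:i + bin_size]
            mean = sum(bin_) / len(bin_)             # replace each value by the bin mean
            smoothed.extend([round(mean, 2)] * len(bin_))
        return smoothed

    print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
    # -> [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]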

2. Data Transformation:
This step is taken in order to transform the data in appropriate forms suitable for mining process.
This involves following ways:

1. Normalization:
It is done in order to scale the data values into a specified range, such as -1.0 to 1.0 or 0.0 to
1.0 (a min-max sketch follows this list).
2. Attribute Selection:
In this strategy, new attributes are constructed from the given set of attributes to help the
mining process.
3. Discretization:
This is done to replace the raw values of a numeric attribute by interval levels or conceptual
levels.
4. Concept Hierarchy Generation:
Here attributes are converted from a lower level to a higher level in the hierarchy. For
example, the attribute “city” can be converted to “country”.
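As a concrete illustration, here is a minimal Python sketch of min-max normalization (to the range
0.0 to 1.0) and z-score normalization, the two rescaling techniques named in Unit 1; the sample
income values are made up:

    # Min-max and z-score normalization; the sample data is illustrative.
    def min_max(values, new_min=0.0, new_max=1.0):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

    def z_score(values):
        mean = sum(values) / len(values)
        std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
        return [(v - mean) / std for v in values]

    incomes = [12000, 16000, 25000, 38000, 45000]
    print(min_max(incomes))    # 12000 maps to 0.0, 45000 maps to 1.0
    print(z_score(incomes))    # values expressed in standard deviations from the mean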

3. Data Reduction:
Data mining is a technique used to handle huge amounts of data, and analysis becomes harder as the
volume of data grows. To address this, we use data reduction techniques, which aim to increase
storage efficiency and reduce data storage and analysis costs.
The various steps to data reduction are:

1. Data Cube Aggregation:


Aggregation operation is applied to data for the construction of the data cube.
2. Attribute Subset Selection:
The highly relevant attributes should be used; the rest can be discarded. For performing
attribute selection, one can use the level of significance and the p-value of the attribute: an
attribute having a p-value greater than the significance level can be discarded.
3. Numerosity Reduction:
This enables storing a model of the data instead of the whole data, for example: regression
models.
4. Dimensionality Reduction:
This reduces the size of the data by encoding mechanisms. It can be lossy or lossless. If, after
reconstruction from the compressed data, the original data can be retrieved, such reduction is
called lossless reduction; otherwise it is called lossy reduction. The two effective methods of
dimensionality reduction are wavelet transforms and PCA (Principal Component Analysis).

02. Explain the concept hierarchy. 04 1 CO-4 1

A concept hierarchy reduces the data by collecting and replacing low-level concepts (such as numeric
values for the attribute age) with higher-level concepts (such as young, middle-aged, or senior).
Concept hierarchy generation for numeric data is as follows:

● Binning (see sections before)


● Histogram analysis (see sections before)
● Clustering analysis (see sections before)
● Entropy-based discretization
● Segmentation by natural partitioning
● Binning
○ In binning, first sort data and partition into (equi-depth) bins then one can smooth by
bin means, smooth by bin median, smooth by bin boundaries, etc.
● Histogram analysis
○ Histogram is a popular data reduction technique
○ Divide data into buckets and store average (sum) for each bucket
○ Can be constructed optimally in one dimension using dynamic programming
○ Related to quantization problems.
● Clustering analysis
○ Partition data set into clusters, and one can store cluster representation only
○ Can be very effective if data is clustered but not if data is “smeared”
○ Can have hierarchical clustering and be stored in multi-dimensional index tree
structures
● Entropy-based discretization (a small sketch follows this list)
○ Given a set of samples S, if S is partitioned into two intervals S1 and S2 using
boundary T, the entropy after partitioning is
E(S, T) = (|S1|/|S|) Ent(S1) + (|S2|/|S|) Ent(S2)
○ S1 and S2 correspond to the samples in S satisfying the conditions A < v and A >= v
○ The boundary that minimizes the entropy function over all possible boundaries is
selected as a binary discretization.
○ The process is recursively applied to the partitions obtained until some stopping
criterion is met, e.g., Ent(S) − E(T, S) > δ
○ Experiments show that it may reduce data size and improve classification accuracy
● Segmentation by natural partitioning
○ 3-4-5 rule can be used to segment numeric data into relatively uniform, “natural”
intervals.
○ If an interval covers 3, 6, 7 or 9 distinct values at the most significant digit, partition
the range into 3 equi-width intervals
○ If it covers 2, 4, or 8 distinct values at the most significant digit, partition the range
into 4 intervals
○ If it covers 1, 5, or 10 distinct values at the most significant digit, partition the range
into 5 intervals
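A minimal Python sketch of entropy-based discretization for a single binary split; the (value, class)
samples are made up for illustration:

    import math

    # Entropy of a set of class labels; an empty set contributes 0.
    def ent(labels):
        if not labels:
            return 0.0
        n = len(labels)
        return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                    for c in set(labels))

    # Made-up (value, class) samples for a numeric attribute A.
    samples = [(1, 'no'), (2, 'no'), (3, 'no'), (7, 'yes'), (8, 'yes'), (9, 'yes')]

    # E(S, T) for the split A < t versus A >= t.
    def split_entropy(samples, t):
        s1 = [c for v, c in samples if v < t]
        s2 = [c for v, c in samples if v >= t]
        n = len(samples)
        return len(s1) / n * ent(s1) + len(s2) / n * ent(s2)

    # Choose the boundary that minimizes E(S, T) over all candidate boundaries.
    best = min({v for v, _ in samples}, key=lambda t: split_entropy(samples, t))
    print(best, split_entropy(samples, best))    # boundary 7 gives entropy 0.0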

Concept hierarchy generation for categorical data is as follows:

● Specification of a set of attributes, but not of their partial ordering


○ Auto-generate the attribute ordering based upon the observation that an attribute defining a
high-level concept has a smaller number of distinct values than an attribute defining a
lower-level concept
○ Example : country (15), state_or_province (365), city (3567), street (674,339)
● Specification of only a partial set of attributes
○ Try and parse database schema to determine complete hierarchy

03. What is the difference between OLTP and 02 2 CO-3 3


OLAP?

Basis for Comparison: OLTP vs. OLAP

Basic: OLTP is an online transactional system that manages database modification; OLAP is an
online data retrieving and data analysis system.

Focus: OLTP inserts, updates and deletes information in the database; OLAP extracts data for
analysis that helps in decision making.

Data: OLTP and its transactions are the original source of data; different OLTP databases become
the source of data for OLAP.

Transaction: OLTP has short transactions; OLAP has long transactions.

Time: The processing time of a transaction is comparatively less in OLTP and comparatively more
in OLAP.

Queries: OLTP uses simpler queries; OLAP uses complex queries.

Normalization: Tables in an OLTP database are normalized (3NF); tables in an OLAP database are
not normalized.

Integrity: An OLTP database must maintain data integrity constraints; an OLAP database is not
frequently modified, hence data integrity is not affected.

04. Describe the applications of and trends in data 02 1 CO-2 1


mining in detail.

Data mining is widely used in diverse areas. A number of commercial data mining systems are
available today, yet there are many challenges in this field. This section discusses the applications
and trends of data mining.

Data Mining Applications

Here is the list of areas where data mining is widely used −

● Financial Data Analysis


● Retail Industry
● Telecommunication Industry
● Biological Data Analysis
● Other Scientific Applications
● Intrusion Detection

Financial Data Analysis

The financial data in banking and financial industry is generally reliable and of high quality which
facilitates systematic data analysis and data mining. Some of the typical cases are as follows −

● Design and construction of data warehouses for multidimensional data analysis and data
mining.
● Loan payment prediction and customer credit policy analysis.
● Classification and clustering of customers for targeted marketing.
● Detection of money laundering and other financial crimes.

Retail Industry

Data mining has great application in the retail industry because retail collects large amounts of data
on sales, customer purchasing history, goods transportation, consumption and services. It is natural
that the quantity of data collected will continue to expand rapidly because of the increasing ease,
availability and popularity of the web.
Data mining in retail industry helps in identifying customer buying patterns and trends that lead to
improved quality of customer service and good customer retention and satisfaction. Here is the list of
examples of data mining in the retail industry −

● Design and Construction of data warehouses based on the benefits of data mining.
● Multidimensional analysis of sales, customers, products, time and region.
● Analysis of effectiveness of sales campaigns.
● Customer Retention.
● Product recommendation and cross-referencing of items.

Telecommunication Industry

Today the telecommunication industry is one of the fastest-emerging industries, providing various
services such as fax, pager, cellular phone, internet messenger, images, e-mail, web data
transmission, etc. Due to the development of new computer and communication technologies, the
telecommunication industry is rapidly expanding. This is why data mining has become very
important in helping to understand the business.
Data mining in the telecommunication industry helps in identifying telecommunication patterns,
catching fraudulent activities, making better use of resources, and improving quality of service. Here
is the list of examples for which data mining improves telecommunication services −

● Multidimensional Analysis of Telecommunication data.


● Fraudulent pattern analysis.
● Identification of unusual patterns.
● Multidimensional association and sequential patterns analysis.
● Mobile Telecommunication services.
● Use of visualization tools in telecommunication data analysis.

Biological Data Analysis

In recent times, we have seen a tremendous growth in the field of biology such as genomics,
proteomics, functional Genomics and biomedical research. Biological data mining is a very
important part of Bioinformatics. Following are the aspects in which data mining contributes for
biological data analysis −
● Semantic integration of heterogeneous, distributed genomic and proteomic databases.
● Alignment, indexing, similarity search and comparative analysis of multiple nucleotide
sequences.
● Discovery of structural patterns and analysis of genetic networks and protein pathways.
● Association and path analysis.
● Visualization tools in genetic data analysis.

Other Scientific Applications

The applications discussed above tend to handle relatively small and homogeneous data sets for
which statistical techniques are appropriate. Huge amounts of data have been collected from
scientific domains such as geosciences, astronomy, etc. Large data sets are also being generated by
fast numerical simulations in various fields such as climate and ecosystem modeling, chemical
engineering, fluid dynamics, etc. Following are the applications of data mining in the field of
scientific applications −

● Data Warehouses and data preprocessing.


● Graph-based mining.
● Visualization and domain specific knowledge.

Intrusion Detection

Intrusion refers to any kind of action that threatens integrity, confidentiality, or the availability of
network resources. In this world of connectivity, security has become the major issue. With increased
usage of internet and availability of the tools and tricks for intruding and attacking network prompted
intrusion detection to become a critical component of network administration. Here is the list of areas
in which data mining technology may be applied for intrusion detection −

● Development of data mining algorithm for intrusion detection.


● Association and correlation analysis, aggregation to help select and build discriminating
attributes.
● Analysis of Stream data.
● Distributed data mining.
● Visualization and query tools.

05. What is Multi-Dimensional Modeling? What is 04 2 CO-2 2


the use of Snowflake Schema.

The multidimensional data model is an integral part of On-Line Analytical Processing, or OLAP.
Because OLAP is on-line, it must provide answers quickly; analysts pose iterative queries during
interactive sessions, not in batch jobs that run overnight. And because OLAP is also analytic, the
queries are complex. The multidimensional data model is designed to solve complex queries in real
time. The multidimensional data model is important because it enforces simplicity.

What is snowflaking?
The snowflake design is the result of further expansion and normalization of the dimension tables. In
other words, a dimension table is said to be snowflaked if the low-cardinality attributes of the
dimension have been divided into separate normalized tables. These tables are then joined to the
original dimension table with referential constraints (foreign key constraints).
Generally, snowflaking is not recommended in the dimension table, as it hampers the
understandability and performance of the dimensional model: more tables would have to be joined
to satisfy the queries.
Characteristics of the snowflake schema:
The snowflake dimension model has the following characteristics:

● The snowflake schema uses small disk space.
● It is easy to implement when a new dimension is added to the schema.
● There are multiple tables, so performance is reduced.
● The dimension table consists of two or more sets of attributes which define information at
different grains.
● The sets of attributes of the same dimension table are populated by different source
systems.

Advantages:
There are two main advantages of snowflake schema given below:

● It provides structured data which reduces the problem of data integrity.
● It uses small disk space because data are highly structured.

Disadvantages:
● Snowflaking reduces the space consumed by dimension tables, but compared with the entire
data warehouse the saving is usually insignificant.
● Avoid snowflaking or normalization of a dimension table, unless required and appropriate.
● Do not snowflake hierarchies of one dimension table into separate tables. Hierarchies should
belong to the dimension table only and should never be snowflaked.
● Multiple hierarchies can belong to the same dimension if the dimension has been designed at
the lowest possible level of detail.

06. Draw and describe the data warehouse 04 2 CO-2 1


architecture

Generally a data warehouse adopts a three-tier architecture. Following are the three tiers of the data
warehouse architecture.

● Bottom Tier ​− The bottom tier of the architecture is the data warehouse database server. It is
the relational database system. We use the back end tools and utilities to feed data into the
bottom tier. These back end tools and utilities perform the Extract, Clean, Load, and refresh
functions.
● Middle Tier ​− In the middle tier, we have the OLAP Server that can be implemented in
either of the following ways.
○ By Relational OLAP (ROLAP), which is an extended relational database management
system. The ROLAP maps the operations on multidimensional data to standard
relational operations.
○ By Multidimensional OLAP (MOLAP) model, which directly implements the
multidimensional data and operations.
● Top-Tier ​− This tier is the front-end client layer. This layer holds the query tools and
reporting tools, analysis tools and data mining tools.

Data Warehouse Models


From the perspective of data warehouse architecture, we have the following data warehouse models

● Virtual Warehouse
● Data mart
● Enterprise Warehouse

Virtual Warehouse

The view over an operational data warehouse is known as a virtual warehouse. It is easy to build a
virtual warehouse. Building a virtual warehouse requires excess capacity on operational database
servers.
Data Mart

Data mart contains a subset of organization-wide data. This subset of data is valuable to specific
groups of an organization.

In other words, we can claim that data marts contain data specific to a particular group. For example,
the marketing data mart may contain data related to items, customers, and sales. Data marts are
confined to subjects.

Points to remember about data marts −

● Windows-based or Unix/Linux-based servers are used to implement data marts. They are
implemented on low-cost servers.
● The implementation cycle of a data mart is measured in short periods of time, i.e., in weeks
rather than months or years.
● The life cycle of a data mart may be complex in the long run, if its planning and design are
not organization-wide.
● Data marts are small in size.
● Data marts are customized by department.
● The source of a data mart is a departmentally structured data warehouse.
● Data marts are flexible.

Enterprise Warehouse

● An enterprise warehouse collects all the information and the subjects spanning an entire
organization
● It provides us enterprise-wide data integration.
● The data is integrated from operational systems and external information providers.
● This information can vary from a few gigabytes to hundreds of gigabytes, terabytes or
beyond.

Assignment 2

Question Questions Max. Unit no. CO Bloom’s


No. Marks as per Mapped Taxonomy
syllabus Level
01. Explain four types of attributes by giving 04 3 CO-1 2
appropriate example.
02. Define the terms. Nominal, Ordinal, Interval , 04 3 CO-2 1
Ratio with example.
03. What is ​Cosine Similarity? Explain with example. 02 3 CO-2 2

04. Develop the Apriori Algorithm for generating 04 4 CO-2 3


frequent itemset

05 Explain Mining Frequent Patterns using 04 4 CO-2 2


FP-Growth

06 Explain in detail the candidate generation 02 4 CO-3 2


procedures.

01. Explain four types of attributes by giving 04 1 CO-1 2


appropriate example.

Generally, an attribute describes a characteristic of an entity. In a database management system
(DBMS) it corresponds to a database component or database field. An attribute stores only a piece of
data. For example, in an invoice the attributes may be the price or the date.
Another example: consider the entity student, which has attributes like student-Lname, student-
Fname, student-Email, student-phone and many more.
Types of Attributes with Examples
The different types of attributes are as follows

● Single valued attributes


● Multi valued attributes
● Compound /Composite attributes
● Simple / Atomic attributes
● Stored attributes
● Derived attributes
● Complex attributes
● Key attributes
● Non key attributes
● Required attributes
● Optional/ null value attributes

The detailed explanation of all the attributes is as follows:


Single Valued Attributes: ​It is an attribute with only one value.

● Example: Any manufactured product can have only one serial no., but a single-valued
attribute is not necessarily a simple (atomic) attribute, because it may still be subdivided.
Here, the serial no. can be subdivided on the basis of region, part no., etc.

Multi Valued Attributes: These are the attributes which can have multiple values for a single or
same entity.

● Example: A car's color can be divided into many colors, such as separate colors for the roof
and the trim.
● The notation for a multi-valued attribute is:

Fig 3: Multi-valued attribute notation


Compound / Composite attributes: ​This attribute can be further divided into more attributes.

● The notation for it is:


Fig 4: Compound / Composite attribute notation

● Example: Entity Employee Name can be divided into sub divisions like FName, MName,
LName.

Fig 5: Sample of compound / composite attribute


Simple / Atomic Attributes: ​The attributes which cannot be further divided are called as simple /
atomic attributes.

● Example: The entities like age, marital status cannot be subdivided and are simple attributes.

Stored Attributes: ​Attribute that cannot be derived from other attributes are called as stored
attributes.

● Example: Birth date of an employee is a stored attribute.


Derived Attributes: ​These attributes are derived from other attributes. It can be derived from
multiple attributes and also from a separate table.

● Example: Today’s date and age can be derived. Age can be derived by the difference between
current date and date of birth.
● The notation for the derived attribute is:

Fig 6: Notation of derived attribute


Complex Attributes: ​For an entity, if an attribute is made using the multi valued attributes and
composite attributes then it is known as complex attributes.

● Example: A person can have more than one residence; each residence can have more than one
phone.

Key Attributes: ​This attribute represents the main characteristic of an entity i.e. primary key. Key
attribute has clearly different value for each element in an entity set.

● Example: The entity student ID is a key attribute because no other student will have the same
ID.

Fig 7: Sample of key attribute


Non-Key Attributes: The attributes of an entity set other than the candidate key attributes are the
non-key attributes.
● Example: The first name of a student or employee is a non-key attribute as it does not
represent the main characteristic of an entity.

Fig 8: Sample of non-key attribute


Required Attributes: A required attribute must have a value because it describes a vital part of the
entity.

● Example: Taking the example of a college, the student's name is a vital thing.
Fig 9: Sample of required attribute
Optional / Null value Attributes: An optional attribute may have no value and can be left blank; it
may or may not be filled.

● Example: Considering the entity student there the student’s middle name and the email ID is
optional.

Fig 10: Sample of optional / null attribute


These are different types of attributes and they all play a vital role in the database management
system. Generally an oval is used to represent an attribute as shown below:
Fig 11: Notation of an attribute
Because of their versatility they have wide applications, and there are many models in which we can
use the attributes as explained above.

02. Define the terms. Nominal, Ordinal, Interval , 04 3 CO-2 1


Ratio with example.

Types of Attributes
There are different types of attributes:

● Nominal
Nominal data does not have an intrinsic ordering in the categories.
Examples: ID numbers, eye color, zip codes
● Ordinal
Ordinal data does have an intrinsic ordering in the categories.
Examples: rankings (e.g., taste of potato chips on a scale from 1-10), grades, height in {tall,
medium, short}
● Interval
Interval data is measured on a scale of equal-sized units, so differences are meaningful, but
there is no inherent zero point.
Examples: calendar dates, temperatures in Celsius or Fahrenheit
● Ratio
Ratio data has an inherent zero point, so values can meaningfully be compared as multiples
of one another.
Examples: temperature in Kelvin, length, time, counts

03 What is ​Cosine Similarity? Explain with example. 02 3 CO-2 2

Cosine similarity measures the similarity between two vectors of an inner product space. It is
measured by the cosine of the angle between two vectors and determines whether two vectors are
pointing in roughly the same direction. It is often used to measure document similarity in text
analysis.
A document can be represented by thousands of attributes, each recording the frequency of a
particular word (such as a keyword) or phrase in the document. Thus, each document is an object
represented by what is called a term-frequency vector. For example, suppose that Document1
contains five instances of the word team, while hockey occurs three times, and the word coach is
absent from the entire document, as indicated by a count value of 0. Such data can be highly
asymmetric.
Term-frequency vectors are typically very long and sparse (i.e., they have many 0 values).
Applications using such structures include information retrieval, text document clustering, biological
taxonomy, and gene feature mapping. The traditional distance measures that we have studied in this
chapter do not work well for such sparse numeric data. For example, two term-frequency vectors
may have many 0 values in common, meaning that the corresponding documents do not share many
words, but this does not make them similar. We need a measure that will focus on the words that the
two documents ​do have in common, and the occurrence frequency of such words. In other words, we
need a measure for numeric data that ignores zero-matches.
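A minimal Python sketch of cosine similarity, cos(x, y) = (x · y) / (||x|| ||y||), on two made-up
term-frequency vectors (the counts are illustrative):

    import math

    # Cosine similarity between two equally long numeric vectors.
    def cosine_similarity(x, y):
        dot = sum(a * b for a, b in zip(x, y))
        norm_x = math.sqrt(sum(a * a for a in x))
        norm_y = math.sqrt(sum(b * b for b in y))
        return dot / (norm_x * norm_y)

    # Made-up term-frequency vectors for two documents; zero-matches do not
    # contribute to the dot product, so the measure ignores them.
    doc1 = [5, 0, 3, 0, 2, 0, 0, 2, 0, 0]
    doc2 = [3, 0, 2, 0, 1, 1, 0, 1, 0, 1]
    print(round(cosine_similarity(doc1, doc2), 2))    # 0.94: quite similar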

04. Develop the Apriori Algorithm for generating 04 4 CO-2 3


frequent itemset

● A frequent itemset is an itemset whose support is greater than some user-specified minimum
support (denoted Lk, where k is the size of the itemset)
● A candidate itemset is a potentially frequent itemset (denoted Ck, where k is the size of the
itemset)

Pass 1
1. Generate the candidate itemsets in C1
2. Save the frequent itemsets in L1

Pass k
1. Generate the candidate itemsets in Ck from the frequent itemsets in Lk-1:
   1. Join Lk-1 p with Lk-1 q, as follows:
      insert into Ck
      select p.item1, p.item2, ..., p.itemk-1, q.itemk-1
      from Lk-1 p, Lk-1 q
      where p.item1 = q.item1, ..., p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1
   2. Generate all (k-1)-subsets of the candidate itemsets in Ck
   3. Prune all candidate itemsets from Ck where some (k-1)-subset of the candidate itemset
      is not in the frequent itemset Lk-1
2. Scan the transaction database to determine the support for each candidate itemset in Ck
3. Save the frequent itemsets in Lk

Step 1: Create a table containing the support count of each item present in the dataset, called C1
(the candidate set). Compare each candidate itemset's support count with the minimum support
count. This gives the itemset L1.

Step 2: K = 2
Generate candidate set C2 using L1 (this is called the join step). The condition for joining Lk-1 with
Lk-1 is that the itemsets should have (K-2) elements in common. Check whether all subsets of each
itemset are frequent, and if not, remove that itemset. (Example: the subsets of {I1, I2} are {I1} and
{I2}, which are frequent; check this for each itemset.) Now find the support count of these itemsets
by searching the dataset.

(II) Compare the candidate (C2) support counts with the minimum support count (here min_support
= 2; if the support count of a candidate itemset is less than min_support, remove that itemset). This
gives us itemset L2.

Step 3:
● Generate candidate set C3 using L2 (join step). The condition for joining Lk-1 with Lk-1 is that
the itemsets should have (K-2) elements in common. So here, for L2, the first element should
match, and the itemsets generated by joining L2 are {I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5},
{I2, I3, I4}, {I2, I4, I5}, {I2, I3, I5}.
● Check whether all subsets of these itemsets are frequent, and if not, remove that itemset. (Here
the subsets of {I1, I2, I3} are {I1, I2}, {I2, I3} and {I1, I3}, which are frequent. For {I2, I3, I4},
the subset {I3, I4} is not frequent, so remove it. Similarly check every itemset.) Find the support
count of the remaining itemsets by searching the dataset.

(II) Compare the candidate (C3) support counts with the minimum support count (here min_support
= 2; if the support count of a candidate itemset is less than min_support, remove that itemset). This
gives us itemset L3.

Step 4:
● Generate candidate set C4 using L3 (join step). The condition for joining Lk-1 with Lk-1 (K = 4)
is that the itemsets should have (K-2) elements in common. So here, for L3, the first 2 elements
(items) should match.
● Check whether all subsets of these itemsets are frequent. (Here the itemset formed by joining L3
is {I1, I2, I3, I5}, whose subsets include {I1, I3, I5}, which is not frequent.) So there is no
itemset in C4.
● We stop here because no further frequent itemsets are found.
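For concreteness, here is a minimal, self-contained Python sketch of the join-and-prune loop above,
run on made-up transactions chosen to be consistent with this walkthrough (min_support = 2):

    from itertools import combinations

    # Made-up transactions consistent with the walkthrough above.
    transactions = [{'I1', 'I2', 'I5'}, {'I2', 'I4'}, {'I2', 'I3'},
                    {'I1', 'I2', 'I4'}, {'I1', 'I3'}, {'I2', 'I3'},
                    {'I1', 'I3'}, {'I1', 'I2', 'I3', 'I5'}, {'I1', 'I2', 'I3'}]
    min_support = 2

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # L1: frequent 1-itemsets.
    items = {i for t in transactions for i in t}
    L = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_support}]

    k = 2
    while L[-1]:
        prev = L[-1]
        # Join step: unite pairs of frequent (k-1)-itemsets that differ in one item.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        L.append({c for c in candidates if support(c) >= min_support})
        k += 1

    for level, itemsets in enumerate(L[:-1], start=1):
        print('L%d:' % level, sorted(sorted(s) for s in itemsets))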

05 Explain Mining Frequent Patterns using 04 4 CO-2 2


FP-Growth

FP-Growth Algorithm
The FP-Growth (Frequent Pattern growth) algorithm is an improvement over the Apriori algorithm.
It is used for finding frequent itemsets in a transaction database without candidate generation.
FP-Growth represents frequent items in a frequent pattern tree, or FP-tree.

FP-Growth algorithm example

Consider the following database (D).
Let minimum support = 3%.

Advantages of the FP-Growth algorithm:

1. Faster than the Apriori algorithm
2. No candidate generation
3. Only two passes over the dataset

Disadvantages of the FP-Growth algorithm:

1. The FP-tree may not fit in memory
2. The FP-tree is expensive to build
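Building an FP-tree by hand is lengthy, so one way to experiment is with the third-party mlxtend
library (an assumption here, not something the syllabus prescribes); a minimal sketch on made-up
transactions:

    # Requires: pip install mlxtend pandas
    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import fpgrowth

    # Made-up transactions; min_support is illustrative.
    transactions = [['I1', 'I2', 'I5'], ['I2', 'I4'], ['I2', 'I3'],
                    ['I1', 'I2', 'I4'], ['I1', 'I3'], ['I2', 'I3']]

    te = TransactionEncoder()
    onehot = te.fit(transactions).transform(transactions)   # boolean item matrix
    df = pd.DataFrame(onehot, columns=te.columns_)

    # Mine frequent itemsets without candidate generation.
    print(fpgrowth(df, min_support=0.3, use_colnames=True))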
Assignment 3

Question Questions Max. Unit no. CO Bloom’s


No. Marks as per Mapped Taxonomy
syllabus Level
01. Write the algorithm for k-nearest neighbour 04 5 CO-3 2
classification.
02. What is a rule-based classifier? Explain how a 04 5 CO-2 2
rule-based classifier works.
03. Define Bayesian Belief Networks 02 5 CO-3 1

04. Write a short note on-Accuracy, Error Rate, 02 6 CO-2 2

05. Explain holdout method in data mining 04 6 CO-2 2

06. Explain in detail the candidate generation 04 6 CO-3 2


procedures.

01. Write the algorithm for k-nearest neighbour 04 5 CO-3 2


classification.

K-Nearest Neighbours
K-Nearest Neighbors is one of the most basic yet essential classification algorithms in Machine
Learning. It belongs to the supervised learning domain and finds intense application in pattern
recognition, data mining and intrusion detection.
It is widely used in real-life scenarios since it is non-parametric, meaning it does not make any
underlying assumptions about the distribution of the data (as opposed to other algorithms such as
GMM, which assume a Gaussian distribution of the given data).
We are given some prior data (also called training data), which classifies coordinates into groups
identified by an attribute.
As an example, consider a table of data points containing two features.
Algorithm:
Let m be the number of training data samples. Let p be an unknown point.
1. Store the training samples in an array of data points arr[], so that each element arr[i] (for i = 0
to m-1) represents a tuple (x, y).
2. For each sample, calculate the Euclidean distance d(arr[i], p).
3. Make a set S of the K smallest distances obtained. Each of these distances corresponds to an
already classified data point.
4. Return the majority label among S.

Now, given another set of data points (also called testing data), assign these points to a group by
analyzing the training set. Note that the unclassified points are marked as ‘White’.
Intuition
If we plot these points on a graph, we may be able to locate some clusters, or groups. Now, given an
unclassified point, we can assign it to a group by observing what group its nearest neighbours belong
to. This means, a point close to a cluster of points classified as ‘Red’ has a higher probability of
getting classified as ‘Red’.
Intuitively, we can see that the first point (2.5, 7) should be classified as ‘Green’ and the second point
(5.5, 4.5) should be classified as ‘Red’.
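A minimal Python sketch of the algorithm above on made-up 2-D points (the coordinates, labels and
K are illustrative):

    import math
    from collections import Counter

    # Made-up labelled training points: ((x, y), label).
    training = [((2.0, 7.5), 'Green'), ((3.0, 6.5), 'Green'), ((1.5, 8.0), 'Green'),
                ((5.0, 4.0), 'Red'), ((6.0, 5.0), 'Red'), ((5.5, 3.5), 'Red')]

    def knn_classify(p, k=3):
        # Step 2: Euclidean distance from p to every training point.
        distances = [(math.dist(p, x), label) for x, label in training]
        # Step 3: keep the K smallest distances.
        nearest = sorted(distances)[:k]
        # Step 4: return the majority label among the K neighbours.
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    print(knn_classify((2.5, 7.0)))    # 'Green'
    print(knn_classify((5.5, 4.5)))    # 'Red'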

02. What is Ruled-Base Classifier? Explain how a 04 5 CO-2 2


is Ruled-Base Classifier works

IF-THEN Rules

Rule-based classifier makes use of a set of IF-THEN rules for classification. We can express a rule in
the following from −
IF condition THEN conclusion
Let us consider a rule R1,
R1: IF age = youth AND student = yes
THEN buy_computer = yes
Points to remember −

● The IF part of the rule is called rule antecedent or precondition.


● The THEN part of the rule is called rule consequent.
● The antecedent part, the condition, consists of one or more attribute tests, and these tests are
logically ANDed.
● The consequent part consists of class prediction.

Note − We can also write rule R1 as follows −

R1: (age = youth) ∧ (student = yes) ⇒ (buys_computer = yes)
If the condition holds true for a given tuple, then the antecedent is satisfied.
Rule Extraction
Here we will learn how to build a rule-based classifier by extracting IF-THEN rules from a decision
tree.
Points to remember −
To extract a rule from a decision tree −
● One rule is created for each path from the root to the leaf node.
● To form a rule antecedent, each splitting criterion is logically ANDed.
● The leaf node holds the class prediction, forming the rule consequent.
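A minimal Python sketch of applying IF-THEN rules such as R1 to a tuple; the rule set and the
sample tuple are made up for illustration:

    # Each rule: ({attribute: required value, ...}, consequent); tests are ANDed.
    rules = [
        ({'age': 'youth', 'student': 'yes'}, ('buys_computer', 'yes')),    # R1
        ({'age': 'senior', 'credit_rating': 'fair'}, ('buys_computer', 'no')),
    ]

    def classify(tuple_, rules, default=('buys_computer', 'unknown')):
        for antecedent, consequent in rules:
            # The rule fires if every attribute test in the antecedent is satisfied.
            if all(tuple_.get(attr) == value for attr, value in antecedent.items()):
                return consequent
        return default    # no rule fires: fall back to a default class

    x = {'age': 'youth', 'student': 'yes', 'credit_rating': 'fair'}
    print(classify(x, rules))    # ('buys_computer', 'yes') -- R1 fires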

03. Define Bayesian Belief Networks 02 5 CO-3 1

A ​Bayesian Belief Network (​BBN)​ , or simply ​Bayesian Network​, is a ​statistical model used to
describe the ​conditional dependencies​ between different random variables.
BBNs are chiefly used in areas like computational biology and medicine for ​risk analysis and
decision support (basically, to understand what caused a certain problem, or the probabilities of
different effects given an action).

Structure of a Bayesian Network

A typical BBN is a directed acyclic graph of random variables, with a conditional probability table
for each node given its parents. A commonly used example, ‘Burglary-Alarm’, is one of the most
quoted ones in texts on Bayesian theory.
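A minimal Python sketch of the Burglary-Alarm network; the conditional probability values below
are the commonly quoted textbook ones and are illustrative only:

    # Burglary and Earthquake are parents of Alarm; JohnCalls and MaryCalls
    # depend only on Alarm.
    P_B = 0.001                       # P(Burglary)
    P_E = 0.002                       # P(Earthquake)
    P_A = {(True, True): 0.95,        # P(Alarm | Burglary, Earthquake)
           (True, False): 0.94,
           (False, True): 0.29,
           (False, False): 0.001}
    P_J = {True: 0.90, False: 0.05}   # P(JohnCalls | Alarm)
    P_M = {True: 0.70, False: 0.01}   # P(MaryCalls | Alarm)

    # The joint probability factorizes along the network structure:
    # P(B, E, A, J, M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)
    def joint(b, e, a, j, m):
        p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
        p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
        p *= P_J[a] if j else 1 - P_J[a]
        p *= P_M[a] if m else 1 - P_M[a]
        return p

    # P(no burglary, no earthquake, alarm sounds, both neighbours call)
    print(joint(False, False, True, True, True))    # ~0.00063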
04. Write a short note on-Accuracy, Error Rate 02 6 CO-2 2

Accuracy:
In classification, accuracy is the percentage of test set tuples that are correctly classified. More
generally, accuracy is an indicator of how close a measurement is to the true value: we are looking at
how close the average of all measurements is to the real value of what is measured, so that, on
average, we are really measuring what we say we are measuring. If we were to use shooting as an
example, high accuracy would mean that the average of all shots taken is right at the target, or very
close to it. In the case of web guiding, accurate sensing and guiding of material means the average
sensing and placement of the material is very close to the true and desired position. However, the
spread of individual positions of the material might be so wide that it makes the accuracy useless: in
the shooting example, we would have a wide pattern with the average on the target.

Error Rate:
For a classifier, the error rate is the proportion of test tuples that are misclassified, i.e., error rate =
1 − accuracy. More generally, the term refers to the degree of errors encountered, for example during
data transmission over a communications or network connection: the higher the error rate, the less
reliable the connection or data transfer will be. The term can refer to anything where errors can
occur; for example, when taking a typing test that measures errors, an error rate is used to calculate
your final score or net WPM.
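A minimal Python sketch computing accuracy and error rate for a classifier on made-up predictions:

    # Made-up true labels and classifier predictions.
    y_true = ['yes', 'no', 'yes', 'yes', 'no', 'no', 'yes', 'no']
    y_pred = ['yes', 'no', 'no',  'yes', 'no', 'yes', 'yes', 'no']

    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)    # fraction of tuples classified correctly
    error_rate = 1 - accuracy           # fraction of tuples misclassified

    print('accuracy = %.2f, error rate = %.2f' % (accuracy, error_rate))
    # accuracy = 0.75, error rate = 0.25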

05. Explain holdout method in data mining 04 6 CO-2 2

If we had access to an unlimited number of examples, model selection and error estimation would
have a simple answer: choose the model that provides the lowest error rate on the entire population
and, of course, that error rate is the true error rate. In real applications we only have access to a finite
set of examples, usually smaller than we would want. One approach is to use the entire training data
to select our classifier and estimate the error rate, but this naive approach has two fundamental
problems:
● The final model will normally overfit the training data.
● The error rate estimate will be overly optimistic (lower than the true error rate); in fact, it is not
uncommon to obtain 100% correct classification on training data.
A much better approach is to split the training data into disjoint subsets: the holdout method.

The holdout method
Split the dataset into two groups:
● Training set: used to train the classifier.
● Test set (or ‘holdout’ set): used to estimate the error rate of the trained classifier.

The holdout method has two basic drawbacks:
● In problems where we have a small dataset, we may not be able to afford the “luxury” of setting
aside a portion of the dataset for testing.
● Since it is a single train-and-test experiment, the holdout estimate of performance (for example,
the error rate) will be misleading if we happen to get an “unfortunate” split between train and
test data.

The limitations of the holdout method can be overcome by a family of resampling methods, at the
expense of more computation (a small split sketch follows this list):
● Cross-Validation
● Random Subsampling
● K-Fold Cross-Validation
● Leave-one-out Cross-Validation
● Bootstrap
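A minimal Python sketch of the holdout split on made-up data; the 70/30 proportion is a common
convention, not a fixed rule:

    import random

    # Made-up labelled dataset: (feature vector, label).
    data = [([i, i % 3], 'yes' if i % 2 == 0 else 'no') for i in range(20)]

    random.seed(42)         # a fixed seed makes the split reproducible
    random.shuffle(data)    # shuffle first to avoid ordering bias in the split

    split = int(0.7 * len(data))    # hold out 30% of the data for testing
    train_set, test_set = data[:split], data[split:]

    # Train the classifier on train_set only; estimate its error rate on test_set only.
    print(len(train_set), 'training examples;', len(test_set), 'held-out test examples')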
Class test Question Papers

CLASS TEST- I
(AY 2018-19)
Branch: B.E. Computer Engineering Date: 27/07/2018

Semester: I Duration: 1 hour

Subject: Data Mining and Warehousing (410244(D)) Max. Marks: 20M


Note: 1. All Questions are compulsory
2. Bloom’s Taxonomy level: ​Bloom Levels (BL) : 1. Remember 2. Understand 3. Apply 4. Create
3. All questions are as per course outcomes
4. Assume suitable data wherever is required.
Question Questions Max. Unit no. CO Bloom’s
No. Marks as per Mapped Taxonomy
syllabus Level
01. What are the Steps involved in data 05 1 CO-1 2
pre-processing? Discuss.
02. Explain the concept hierarchy. 05 1 CO-4 1

03. What is the difference between OLTP and 05 2 CO-3 3


OLAP?

04. What is Multi-Dimensional Modeling? What is 05 2 CO-2 2


the use of Snow Flake Schema.

01. What are the Steps involved in data 04 1 CO-1 2


pre-processing? Discuss.

Preprocessing in Data Mining:


Data preprocessing is a data mining technique used to transform raw data into a useful and
efficient format.
Steps Involved in Data Preprocessing:
1. Data Cleaning:
The data can have many irrelevant and missing parts. To handle this part, data cleaning is done. It
involves handling of missing data, noisy data etc.

● (a). Missing Data:


This situation arises when some values are missing in the data. It can be handled in various
ways.
Some of them are:
1. Ignore the tuples:
This approach is suitable only when the dataset we have is quite large and multiple
values are missing within a tuple.
2. Fill the Missing values:
There are various ways to do this task. You can choose to fill the missing values
manually, by attribute mean or the most probable value.
● (b). Noisy Data:
Noisy data is meaningless data that cannot be interpreted by machines. It can be generated
due to faulty data collection, data entry errors, etc. It can be handled in the following ways:

1. Binning Method:
This method works on sorted data in order to smooth it. The whole data is divided
into segments of equal size and then various methods are performed to complete the
task. Each segment is handled separately. One can replace all data in a segment by
its mean, or boundary values can be used to complete the task.
2. Regression:
Here data can be made smooth by fitting it to a regression function. The regression
used may be linear (having one independent variable) or multiple (having multiple
independent variables).
3. Clustering:
This approach groups similar data into clusters. Outliers may go undetected, or
they will fall outside the clusters.

2. Data Transformation:
This step is taken in order to transform the data in appropriate forms suitable for mining process.
This involves following ways:

1. Normalization:
It is done in order to scale the data values in a specified range (-1.0 to 1.0 or 0.0 to
1.0)

2. Attribute Selection:
In this strategy, new attributes are constructed from the given set of attributes to help the
mining process.

3. Discretization:
This is done to replace the raw values of numeric attribute by interval levels or conceptual
levels.

4. Concept Hierarchy Generation:


Here attributes are converted from a lower level to a higher level in the hierarchy. For
example, the attribute “city” can be converted to “country”.

3. Data Reduction:
Data mining is a technique used to handle huge amounts of data, and analysis becomes harder as the
volume of data grows. To address this, we use data reduction techniques, which aim to increase
storage efficiency and reduce data storage and analysis costs.
The various steps to data reduction are:

1. Data Cube Aggregation:


Aggregation operation is applied to data for the construction of the data cube.
2. Attribute Subset Selection:
The highly relevant attributes should be used; the rest can be discarded. For performing
attribute selection, one can use the level of significance and the p-value of the
attribute: an attribute having a p-value greater than the significance level can be discarded.
3. Numerosity Reduction:
This enables storing a model of the data instead of the whole data, for example: regression
models.
4. Dimensionality Reduction:
This reduces the size of the data by encoding mechanisms. It can be lossy or lossless. If,
after reconstruction from the compressed data, the original data can be retrieved, such
reduction is called lossless reduction; otherwise it is called lossy reduction. The two
effective methods of dimensionality reduction are wavelet transforms and PCA
(Principal Component Analysis).

02. Explain the concept hierarchy. 04 1 CO-4 1

A concept hierarchy reduces the data by collecting and replacing low-level concepts (such as numeric
values for the attribute age) with higher-level concepts (such as young, middle-aged, or senior).
Concept hierarchy generation for numeric data is as follows:

● Binning (see sections before)


● Histogram analysis (see sections before)
● Clustering analysis (see sections before)
● Entropy-based discretization
● Segmentation by natural partitioning
● Binning
○ In binning, first sort data and partition into (equi-depth) bins then one can smooth by
bin means, smooth by bin median, smooth by bin boundaries, etc.
● Histogram analysis
○ Histogram is a popular data reduction technique
○ Divide data into buckets and store average (sum) for each bucket
○ Can be constructed optimally in one dimension using dynamic programming
○ Related to quantization problems.
● Clustering analysis
○ Partition data set into clusters, and one can store cluster representation only
○ Can be very effective if data is clustered but not if data is “smeared”
○ Can have hierarchical clustering and be stored in multi-dimensional index tree
structures
● Entropy-based discretization
○ Given a set of samples S, if S is partitioned into two intervals S1 and S2 using
boundary T, the entropy after partitioning is
E(S, T) = (|S1|/|S|) Ent(S1) + (|S2|/|S|) Ent(S2)
○ S1 and S2 correspond to the samples in S satisfying the conditions A < v and A >= v
○ The boundary that minimizes the entropy function over all possible boundaries is
selected as a binary discretization.
○ The process is recursively applied to the partitions obtained until some stopping
criterion is met, e.g., Ent(S) − E(T, S) > δ
○ Experiments show that it may reduce data size and improve classification accuracy
● Segmentation by natural partitioning
○ 3-4-5 rule can be used to segment numeric data into relatively uniform, “natural”
intervals.
○ If an interval covers 3, 6, 7 or 9 distinct values at the most significant digit, partition
the range into 3 equi-width intervals
○ If it covers 2, 4, or 8 distinct values at the most significant digit, partition the range
into 4 intervals
○ If it covers 1, 5, or 10 distinct values at the most significant digit, partition the range
into 5 intervals

Concept hierarchy generation for categorical data is as follows:

● Specification of a set of attributes, but not of their partial ordering


○ Auto-generate the attribute ordering based upon the observation that an attribute defining a
high-level concept has a smaller number of distinct values than an attribute defining a
lower-level concept
○ Example : country (15), state_or_province (365), city (3567), street (674,339)
● Specification of only a partial set of attributes
○ Try and parse database schema to determine complete hierarchy

03. What is the difference between OLTP and 04 2 CO-3 3


OLAP?

Basis for Comparison: OLTP vs. OLAP

Basic: OLTP is an online transactional system that manages database modification; OLAP is an
online data retrieving and data analysis system.

Focus: OLTP inserts, updates and deletes information in the database; OLAP extracts data for
analysis that helps in decision making.

Data: OLTP and its transactions are the original source of data; different OLTP databases become
the source of data for OLAP.

Transaction: OLTP has short transactions; OLAP has long transactions.

Time: The processing time of a transaction is comparatively less in OLTP and comparatively more
in OLAP.

Queries: OLTP uses simpler queries; OLAP uses complex queries.

Normalization: Tables in an OLTP database are normalized (3NF); tables in an OLAP database are
not normalized.

Integrity: An OLTP database must maintain data integrity constraints; an OLAP database is not
frequently modified, hence data integrity is not affected.

04. What is Multi-Dimensional Modeling? What is 04 2 CO-2 2


the use of Snowflake Schema.

The multidimensional data model is an integral part of On-Line Analytical Processing, or OLAP.
Because OLAP is on-line, it must provide answers quickly; analysts pose iterative queries during
interactive sessions, not in batch jobs that run overnight. And because OLAP is also analytic, the
queries are complex. The multidimensional data model is designed to solve complex queries in real
time. The multidimensional data model is important because it enforces simplicity.

What is snowflaking?
The snowflake design is the result of further expansion and normalization of the dimension tables. In
other words, a dimension table is said to be snowflaked if the low-cardinality attributes of the
dimension have been divided into separate normalized tables. These tables are then joined to the
original dimension table with referential constraints (foreign key constraints).
Generally, snowflaking is not recommended in the dimension table, as it hampers the
understandability and performance of the dimensional model: more tables would have to be joined
to satisfy the queries.
Characteristics of the snowflake schema:
The snowflake dimension model has the following characteristics:

● The snowflake schema uses small disk space.
● It is easy to implement when a new dimension is added to the schema.
● There are multiple tables, so performance is reduced.
● The dimension table consists of two or more sets of attributes which define information at
different grains.
● The sets of attributes of the same dimension table are populated by different source
systems.

Advantages:
There are two main advantages of snowflake schema given below:

● It provides structured data which reduces the problem of data integrity.
● It uses small disk space because data are highly structured.

Disadvantages:

● Snowflaking reduces the space consumed by dimension tables, but compared with the entire
data warehouse the saving is usually insignificant.
● Avoid snowflaking or normalization of a dimension table, unless required and appropriate.
● Do not snowflake hierarchies of one dimension table into separate tables. Hierarchies should
belong to the dimension table only and should never be snowflaked.
● Multiple hierarchies can belong to the same dimension if the dimension has been designed at
the lowest possible level of detail.
CLASS TEST- I
(AY 2018-19)
Branch: Computer Engineering Date: 27/07/2018

Semester: I Duration: 1 hour

Subject: Data Mining and Warehousing (410244(D)) Max. Marks: 20M

Note: 1. All Questions are compulsory


2. Bloom’s Taxonomy level: ​Bloom Levels (BL) : 1. Remember 2. Understand 3. Apply 4. Create
3. All questions are as per course outcomes
4. Assume suitable data wherever is required.

Questio Questions Max. Unit no. CO Bloom’s


n No. Marks as per Mapped Taxonomy
syllabus Level
01. Describe the functions of various components in a 05 1 CO-1 2
typical Multi-tiered Data Warehouse architecture
with the block diagram.

02. Describe the applications of and trends in data 05 1 CO-4 2


mining in detail.
03. Discuss the various OLAP operations which can 05 2 CO-3 3
be performed on a multidimensional data cube.
04. Draw and explain the architecture of a typical 05 2 CO-1 1
data mining system.

01. Describe the functions of various components in a 05 1 CO-1 2


typical Multi-tiered Data Warehouse architecture
with the block diagram.

To design an effective and efficient data warehouse, we need to understand and analyze the business
needs and construct a business analysis framework. Each person has different views regarding the
design of a data warehouse. These views are as follows −

● The top-down view − This view allows the selection of relevant information needed for a
data warehouse.
● The data source view ​− This view presents the information being captured, stored, and
managed by the operational system.
● The data warehouse view − This view includes the fact tables and dimension tables. It
represents the information stored inside the data warehouse.
● The business query view ​− It is the view of the data from the viewpoint of the end-user.

Three-Tier Data Warehouse Architecture


Generally a data warehouse adopts a three-tier architecture. Following are the three tiers of the data
warehouse architecture.

● Bottom Tier ​− The bottom tier of the architecture is the data warehouse database server. It is
the relational database system. We use the back end tools and utilities to feed data into the
bottom tier. These back end tools and utilities perform the Extract, Clean, Load, and refresh
functions.
● Middle Tier − In the middle tier, we have the OLAP Server that can be implemented in
either of the following ways.
○ By Relational OLAP (ROLAP), which is an extended relational database management
system. The ROLAP maps the operations on multidimensional data to standard
relational operations.
○ By Multidimensional OLAP (MOLAP) model, which directly implements the
multidimensional data and operations.
● Top-Tier − ​This tier is the front-end client layer. This layer holds the query tools and
reporting tools, analysis tools and data mining tools.

The following diagram depicts the three-tier architecture of data warehouse −


02. Describe the applications and trends in data mining in detail. [05 marks | Unit 1 | CO-4 | BL 2]

Data mining is widely used in diverse areas. There are a number of commercial data mining systems
available today, and yet there are many challenges in this field.

Data Mining Applications

Here is the list of areas where data mining is widely used −

● Financial Data Analysis


● Retail Industry
● Telecommunication Industry
● Biological Data Analysis
● Other Scientific Applications
● Intrusion Detection

Financial Data Analysis

The financial data in the banking and financial industry are generally reliable and of high quality, which
facilitates systematic data analysis and data mining. Some of the typical cases are as follows −

● Design and construction of data warehouses for multidimensional data analysis and data
mining.
● Loan payment prediction and customer credit policy analysis.
● Classification and clustering of customers for targeted marketing.
● Detection of money laundering and other financial crimes.

Retail Industry

Data mining has great application in the retail industry because it collects large amounts of data on
sales, customer purchasing history, goods transportation, consumption, and services. It is natural
that the quantity of data collected will continue to expand rapidly because of the increasing ease,
availability, and popularity of the web.
Data mining in retail industry helps in identifying customer buying patterns and trends that lead to
improved quality of customer service and good customer retention and satisfaction. Here is the list of
examples of data mining in the retail industry −

● Design and Construction of data warehouses based on the benefits of data mining.
● Multidimensional analysis of sales, customers, products, time and region.
● Analysis of effectiveness of sales campaigns.
● Customer Retention.
● Product recommendation and cross-referencing of items.

Telecommunication Industry

Today the telecommunication industry is one of the fastest-growing industries, providing various
services such as fax, pager, cellular phone, internet messenger, images, e-mail, web data
transmission, etc. Due to the development of new computer and communication technologies, the
telecommunication industry is rapidly expanding. This is why data mining has become very
important in helping to understand the business.
Data mining in the telecommunication industry helps in identifying telecommunication patterns,
catching fraudulent activities, making better use of resources, and improving quality of service. Here is the
list of examples for which data mining improves telecommunication services −

● Multidimensional Analysis of Telecommunication data.


● Fraudulent pattern analysis.
● Identification of unusual patterns.
● Multidimensional association and sequential patterns analysis.
● Mobile Telecommunication services.
● Use of visualization tools in telecommunication data analysis.

Biological Data Analysis

In recent times, we have seen tremendous growth in the field of biology, such as genomics,
proteomics, functional genomics, and biomedical research. Biological data mining is a very
important part of bioinformatics. Following are the aspects in which data mining contributes to
biological data analysis −

● Semantic integration of heterogeneous, distributed genomic and proteomic databases.


● Alignment, indexing, similarity search, and comparative analysis of multiple nucleotide
sequences.
● Discovery of structural patterns and analysis of genetic networks and protein pathways.
● Association and path analysis.

03. Discuss the various OLAP operations which can be performed on a multidimensional data cube. [05 marks | Unit 2 | CO-3 | BL 3]

OLAP is a category of software that allows users to analyze information from multiple database
systems at the same time. It is a technology that enables analysts to extract and view business data
from different points of view. OLAP stands for Online Analytical Processing.
Analysts frequently need to group, aggregate and join data. These operations in relational databases
are resource intensive. With OLAP data can be pre-calculated and pre-aggregated, making analysis
faster.
OLAP databases are divided into one or more cubes. The cubes are designed in such a way that
creating and viewing reports become easy.

Basic analytical operations of OLAP


Four types of analytical operations in OLAP are:

1. Roll-up
2. Drill-down
3. Slice and dice
4. Pivot (rotate)

1) Roll-up:
Roll-up is also known as "consolidation" or "aggregation." The Roll-up operation can be performed
in 2 ways

1. Reducing dimensions
2. Climbing up concept hierarchy. Concept hierarchy is a system of grouping things based on
their order or level.

Consider the following diagram

2) Drill-down
In drill-down data is fragmented into smaller parts. It is the opposite of the rollup process. It can be
done via

● Moving down the concept hierarchy


● Increasing a dimension

Consider the diagram above

● Quarter Q1 is drilled down to the months January, February, and March. The corresponding sales are
also registered.
● In this example, the dimension month is added.
3) Slice:
Here, one dimension is selected, and a new sub-cube is created.
The following diagram explains how the slice operation is performed:

● Dimension Time is Sliced with Q1 as the filter.


● A new cube is created altogether.

Dice:
This operation is similar to a slice. The difference is that in dice, you select two or more dimensions,
which results in the creation of a sub-cube.
4) Pivot
In Pivot, you rotate the data axes to provide a substitute presentation of data.
In the following example, the pivot is based on item types.
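Since the pivot figure itself is not reproduced here, the following minimal pandas sketch illustrates all four operations on a toy sales cube; the DataFrame, its column names, and its values are illustrative assumptions, not data from the question:

import pandas as pd

# toy cube: assumed dimensions quarter/location/item and measure amount
sales = pd.DataFrame({
    "quarter":  ["Q1", "Q1", "Q2", "Q2"],
    "location": ["Pune", "Nashik", "Pune", "Nashik"],
    "item":     ["Mobile", "Modem", "Mobile", "Modem"],
    "amount":   [100, 80, 120, 90],
})

# roll-up: aggregate away the location dimension (dimension reduction)
rollup = sales.groupby("quarter")["amount"].sum()

# slice: fix one dimension (time = Q1) to obtain a sub-cube
slice_q1 = sales[sales["quarter"] == "Q1"]

# dice: restrict two or more dimensions at once
dice = sales[sales["quarter"].isin(["Q1", "Q2"]) & (sales["item"] == "Mobile")]

# pivot: rotate the axes so that item types become columns
pivot = sales.pivot_table(index="quarter", columns="item",
                          values="amount", aggfunc="sum")
print(rollup, slice_q1, dice, pivot, sep="\n\n")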
04. Draw and explain the architecture of a typical data mining system. [05 marks | Unit 2 | CO-1 | BL 1]

The architecture of a typical data mining system may have the following major components:
Database, data warehouse, World Wide Web, or other information repository:
This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information
repositories. Data cleaning and data integration techniques may be performed on the data.
Database or data warehouse server:
The database or data warehouse server is responsible for fetching the relevant data, based on the
user’s data mining request.
Knowledge base:
This is the domain knowledge that is used to guide the search or evaluate the interestingness of
resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or
attribute values into different levels of abstraction. Knowledge such as user beliefs, which can be
used to assess a pattern’s interestingness based on its unexpectedness, may also be included.
Data mining engine:
This is essential to the data mining system and ideally consists of a set of functional modules for
tasks such as characterization, association and correlation analysis, classification, prediction, cluster
analysis, outlier analysis, and evolution analysis.
Pattern evaluation module:
This component typically employs interestingness measures and interacts with the data mining
modules so as to focus the search toward interesting patterns.
It may use interestingness thresholds to filter out discovered patterns.
Alternatively, the pattern evaluation module may be integrated with the mining module, depending
on the implementation of the data mining method used.
For efficient data mining, it is highly recommended to push the evaluation of pattern interestingness
as deep as possible into the mining process so as to confine the search to only the interesting patterns.
User interface:
This module communicates between users and the data mining system, allowing the user to interact
with the system by specifying a data mining query or task, providing information to help focus the
search, and performing exploratory data mining based on the intermediate data mining results.
In addition, this component allows the user to browse database and data warehouse schemas or data
structures, evaluate mined patterns, and visualize the patterns in different forms.
DIAGRAM:
Q.1. Unlike traditional production rules, association rules
allow the same variable to be an input attribute in one rule and an output attribute in another rule.

Q.2. The apriori algorithm is used for the following data mining task

​Association

Q.3. Which of the following is not a data mining functionality?


Selection and interpretation

Q.4. ……………………….. is a summarization of the general characteristics or features of a


target class of data.
Data Characterization

Q.5. A snowflake schema is which of the following types of tables?

All of the above


01. Explain applications of Market Basket Analysis. [05 marks | Unit 4 | CO-1 | BL 2]

Market Basket Analysis


Market basket analysis is a data analysis technique used for retail and marketing purposes.
The idea behind market basket analysis emerged from customers adding
different products to their shopping cart or market basket. Market basket analysis is done to
understand the purchasing behavior of customers. Market Basket Analysis (MBA) is used to
uncover which items are frequently bought together by customers. Market basket analysis leads to
effective sales and marketing.
Market basket analysis measures the co-occurrence of products and services. Market basket analysis
is only considered when there is a transaction involving two or more items; it does not apply to a
single product. There should be a relation between two products in a market basket. If a customer is
buying a particular product, he is likely to buy some related goods to complement the first one. For
instance, if a customer is buying bread, then he is likely to buy butter, jam, or milk to complement the
bread. Market basket analysis is used by retailers so that they can make purchase suggestions to
their customers. It is also used to predict the future purchase decisions of a customer.

Application of Market Basket Analysis

Market basket analysis is applied to various fields of the retail sector in order to boost sales and
generate revenue by identifying the needs of the customers and making purchase suggestions to them.
1. Cross Selling: Cross-selling is basically a sales technique in which the seller suggests a
related product to a customer after he buys a product. A seller influences the customer to
spend more by purchasing more products related to the product that has already been
purchased. For instance, if someone buys milk from a store, the seller suggests
he buy coffee or tea as well. So basically the seller suggests a complementary product
to the customer along with the product that he has already purchased. Market basket analysis helps
the retailer to know consumer behavior and then go for cross-selling.
2. Product Placement: It refers to placing complementary goods (pen and paper) and substitute
goods (tea and coffee) together so that the customer notices the goods and buys both
goods together. If a seller places these kinds of goods together, there is a probability that a
customer will purchase them together. Market basket analysis helps the retailer to identify the
goods that a customer may purchase together.
3. Affinity Promotion: Affinity promotion is a method of promotion that designs promotional
events based on associated products. In affinity promotion, market basket analysis is a useful
way to prepare and analyze questionnaire data.
4. Fraud Detection: Market basket analysis is also applied to fraud detection. It may be
possible to identify purchase behavior associated with fraud on the basis of market
basket analysis data that contains credit card usage. Hence market basket analysis is also
useful in fraud detection.
5. Customer Behavior: Market basket analysis helps to understand customer behavior, and it
does so under different conditions. It provides insight into
customer behavior. It allows the retailer to identify the relationship between two products that
people tend to buy together and hence helps to understand customer behavior towards a product or
service.
Hence, market basket analysis helps the retailer to gain insight into customer behavior and to
understand the relationship between two or more goods, so that they can make purchase
suggestions to their customers, who will then buy more from their stores, earning the retailer greater
revenue.

02. Write the Apriori Algorithm and explain it with a suitable example. [05 marks | Unit 4 | CO-4 | BL 2]
Three significant measures underlie the Apriori algorithm. They are as follows.

● Support
● Confidence
● Lift

This example will make things easy to understand.


As mentioned earlier, you need a big database. Let us suppose you have 2,000 customer transactions
in a supermarket. You have to find the Support, Confidence, and Lift for two items, say bread and
jam, because people frequently buy these two items together.
Out of the 2,000 transactions, 200 contain jam whereas 300 contain bread. These 300 transactions
include 100 that contain both bread and jam. Using this data, we shall find the support,
confidence, and lift.
Support

Support is the default popularity of an item. It is calculated as the number of transactions
containing that item divided by the total number of transactions. Hence, in our
example,
Support (Jam) = (Transactions involving jam) / (Total Transactions)
= 200/2000 = 10%

Confidence

In our example, Confidence is the likelihood that a customer who bought jam also bought bread. Dividing the
number of transactions that include both bread and jam by the number of transactions involving jam gives
the Confidence figure.
Confidence (Jam → Bread) = (Transactions involving both bread and jam) / (Total Transactions involving jam)
= 100/200 = 50%
It implies that 50% of customers who bought jam bought bread as well.

Lift

In our example, Lift measures how much more often bread is bought together with jam than would be
expected if bread sold independently. The mathematical formula of Lift is as follows.
Lift (Jam → Bread) = (Confidence (Jam → Bread)) / (Support (Bread))
= 50% / 15% ≈ 3.33
It says that customers who buy jam are about 3.33 times as likely to buy bread as customers in
general. If the Lift value is less than 1, it entails that the customers are
unlikely to buy both the items together. The greater the value, the better the combination.
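The three measures reduce to a few lines of arithmetic. A minimal sketch reproducing the worked numbers above (2,000 transactions; 200 with jam, 300 with bread, 100 with both):

total = 2000
jam, bread, both = 200, 300, 100

support_jam = jam / total                      # 0.10
confidence_jam_bread = both / jam              # 0.50
support_bread = bread / total                  # 0.15
lift = confidence_jam_bread / support_bread    # ~3.33

print(f"Support(Jam)             = {support_jam:.0%}")
print(f"Confidence(Jam -> Bread) = {confidence_jam_bread:.0%}")
print(f"Lift(Jam -> Bread)       = {lift:.2f}")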
Consider a supermarket scenario where the itemset is I = {Onion, Burger, Potato, Milk, Beer}. The
database consists of six transactions where 1 represents the presence of the item and 0 the absence.

The Apriori Algorithm makes the following assumptions.

● All subsets of a frequent itemset should be frequent.


● In the same way, the subsets of an infrequent itemset should be infrequent.
● Set a threshold support level. In our case, we shall fix it at 50%

Step 1
Create a frequency table of all the items that occur in all the transactions. Now, prune the frequency
table to include only those items whose support meets the 50% threshold. We arrive at this
frequency table.

This table signifies the items frequently bought by the customers.

Step 2
Make pairs of items such as OP, OB, OM, PB, PM, BM. This frequency table is what you arrive at.
Step 3
Apply the same threshold support of 50% and consider the items that exceed 50% (in this case 3 and
above).
Thus, you are left with OP, OB, PB, and PM

Step 4
Look for a set of three items that the customers buy together. Thus we get this combination.

● OP and OB gives OPB


● PB and PM gives PBM

Step 5
Determine the frequency of these two itemsets. You get this frequency table.
If you apply the threshold assumption, you can deduce that the set of three items frequently
purchased by the customers is OPB
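The steps above can be condensed into a short program. Below is a compact Apriori sketch; the six transactions are assumptions for illustration (chosen so that they reproduce the L2 = {OP, OB, PB, PM} and L3 = {OPB} results described above), and support() deliberately rescans the transactions on each call for clarity rather than speed:

import itertools

transactions = [
    {"Onion", "Potato", "Burger"},
    {"Potato", "Burger", "Milk"},
    {"Onion", "Potato", "Burger", "Milk"},
    {"Onion", "Burger"},
    {"Potato", "Milk"},
    {"Onion", "Potato", "Burger", "Beer"},
]
min_support = 0.5   # the 50% threshold from the example

def support(itemset):
    # fraction of transactions containing every item of the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

items = {i for t in transactions for i in t}
frequent = [frozenset([i]) for i in items if support({i}) >= min_support]
k = 2
while frequent:
    print("L%d:" % (k - 1), sorted(sorted(s) for s in frequent))
    freq_set = set(frequent)
    # join step: build k-item candidates from (k-1)-item frequent sets
    candidates = {a | b for a in freq_set for b in freq_set if len(a | b) == k}
    # prune step (Apriori property): every (k-1)-subset must be frequent
    candidates = [c for c in candidates
                  if all(frozenset(s) in freq_set
                         for s in itertools.combinations(c, k - 1))]
    frequent = [c for c in candidates if support(c) >= min_support]
    k += 1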
03. What are the issues regarding classification and prediction? [05 marks | Unit 5 | CO-3 | BL 3]

Issues Regarding Classification and Prediction


Preparing the Data for Classification and Prediction
The following preprocessing steps may be applied to the data in order to help improve the accuracy,
efficiency, and scalability of the classification or prediction process.
Data Cleaning: ​This refers to the preprocessing of data in order to remove or reduce noise (by
applying smoothing techniques) and the treatment of missing values (e.g., by replacing a missing
value with the most commonly occurring value for that attribute, or with the most probable value
based on statistics.) Although most classification algorithms have some mechanisms for handling
noisy or missing data, this step can help reduce confusion during learning.
Relevance Analysis: ​Many of the attributes in the data may be irrelevant to the classification or
prediction task. For example, data recording the day of the week on which a bank loan application
was filed is unlikely to be relevant to the success of the application. Furthermore, other attributes
may be redundant. Hence, relevance analysis may be performed on the data with the aim of
removing any irrelevant or redundant attributes from the learning process. In machine learning, this
step is known as feature selection. Including such attributes may otherwise slow down, and possibly
mislead, the learning step.
Ideally, the time spent on relevance analysis, when added to the time spent on learning from the
resulting “reduced” feature subset should be less than the time that would have been spent on
learning from the original set of features. Hence, such analysis can help improve classification
efficiency and scalability.
Data Transformation: ​The data can be generalized to higher – level concepts. Concept hierarchies
may be used for this purpose. This is particularly useful for continuous – valued attributes. For
example, numeric values for the attribute income may be generalized to discrete ranges such as low,
medium, and high. Similarly, nominal – valued attributes like street, can be generalized to higher –
level concepts, like city. Since generalization compresses the original training data, fewer input /
output operations may be involved during learning.
The data may also be normalized, particularly when neural networks or methods involving distance
measurements are used in the learning step. Normalization involves scaling all values for a given
attribute so that they fall within a small specified range, such as – 1.0 to 1.0, or 0.0 to 1.0. In methods
that use distance measurements, for example, this would prevent attributes with initially large ranges
(like, say, income) from outweighing attributes with initially smaller ranges (such as binary
attributes).
Comparing Classification Methods
Classification and prediction methods can be compared and evaluated according to the
following criteria:
Predictive Accuracy: ​This refers to the ability of the model to correctly predict the class label of
new or previously unseen data.
Speed: ​This refers to the computation costs involved in generating and using the model.
Robustness: ​This is the ability of the model to make correct predictions given noisy data or data
with missing values.
Scalability:​ This refers to the ability to construct the model efficiently given large amount of data.
Interpretability:​ This refers to the level of understanding and insight that is provided by the model.

Classification by Decision Tree Induction


“What is a decision tree”? ​A decision tree is a flow – chart – like tree structure, where each internal
node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes
represent classes or class distributions. The top – most node in a tree is the root node.
A typical decision tree is shown in Figure 8.2. It represents the concept buys_ computer, that is, it
predicts whether or not a customer at All Electronics is likely to purchase a computer. Internal nodes
are denoted by rectangles, and leaf nodes are denoted by ovals.
[Figure 8.2: a decision tree for buys_computer; the root node tests age, with branches "<= 30", "31…40", and "> 40".]
In order to classify an unknown sample, the attribute values of the sample are tested against the
decision tree. A path is traced from the root to a leaf node that holds the class prediction for that
sample. Decision trees can easily be converted to classification rules.
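As a small illustration of tracing a sample from root to leaf, here is a sketch of such a tree as a function: the root test on age and its three branches follow Figure 8.2, while the sub-tests on student and credit_rating and the leaf labels are assumptions for illustration:

def classify(sample):
    # root node: test on age (branches <= 30, 31...40, > 40)
    if sample["age"] <= 30:
        return "yes" if sample["student"] else "no"       # assumed sub-test
    elif sample["age"] <= 40:
        return "yes"                                      # 31...40 leaf
    else:
        return "yes" if sample["credit_rating"] == "fair" else "no"  # assumed

print(classify({"age": 28, "student": True, "credit_rating": "fair"}))  # yes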

04. Explain with suitable example the K-nearest Neighbor Classifier. [05 marks | Unit 5 | CO-2 | BL 2]

The KNN Algorithm

1. Load the data


2. Initialize K to your chosen number of neighbors
3. For each example in the data
3.1 Calculate the distance between the query example and the current example from the data.
3.2 Add the distance and the index of the example to an ordered collection
4. Sort the ordered collection of distances and indices from smallest to largest (in ascending
order) by the distances
5. Pick the first K entries from the sorted collection
6. Get the labels of the selected K entries
7. If regression, return the mean of the K labels
8. If classification, return the mode of the K labels
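The numbered steps translate almost line-for-line into a program. A minimal sketch, assuming numeric features and Euclidean distance (math.dist requires Python 3.8+); the four training tuples reuse the tissue-paper data from the KNN exam question that appears later in this section:

import math
from collections import Counter

def knn_classify(data, query, k):
    # steps 3-4: compute the distance to every example and sort ascending
    distances = sorted((math.dist(x, query), label) for x, label in data)
    # steps 5-6: take the labels of the first K entries
    k_labels = [label for _, label in distances[:k]]
    # step 8: classification returns the mode (for regression use the mean)
    return Counter(k_labels).most_common(1)[0][0]

train = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
print(knn_classify(train, (3, 7), k=3))   # -> Good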

Choosing the right value for K

To select the K that’s right for your data, we run the KNN algorithm several times with
different values of K and choose the K that reduces the number of errors we encounter while
maintaining the algorithm’s ability to accurately make predictions when it’s given data it
hasn’t seen before.

1. As we decrease the value of K to 1, our predictions become less stable. Just think for a
minute: imagine K = 1 and we have a query point surrounded by several reds and one green (think of the
top left corner of the colored plot referred to above), but the green is the single nearest
neighbor. Reasonably, we would think the query point is most likely red, but because K = 1,
KNN incorrectly predicts that the query point is green.
2. Inversely, as we increase the value of K, our predictions become more stable due to majority
voting / averaging, and thus, more likely to make more accurate predictions (up to a certain
point). Eventually, we begin to witness an increasing number of errors. It is at this point we
know we have pushed the value of K too far.
3. In cases where we are taking a majority vote (e.g. picking the mode in a classification
problem) among labels, we usually make K an odd number to have a tiebreaker.

Advantages

1. The algorithm is simple and easy to implement.


2. There’s no need to build a model, tune several parameters, or make additional assumptions.
3. The algorithm is versatile. It can be used for classification, regression, and search

Disadvantages

1. The algorithm gets significantly slower as the number of examples and/or
predictors/independent variables increases.
Prelim Exam (AY 2018-19)
Branch: BE  Date: 12/10/2018
Semester: I  Duration: 2:30 hours
Subject: Data Mining & Warehousing (2015 Pattern)  Max. Marks: 70
Note: (1) Answer Q. 1 or Q. 2, Q. 3 or Q. 4, Q. 5 or Q. 6, Q. 7 or Q. 8, Q. 9 or Q. 10.
(2) Figures to the right indicate full marks. (3) Neat diagrams must be drawn wherever necessary.
(4) Assume suitable data, if necessary.

Q.1 a) What are the steps involved in data pre-processing? Discuss. [10 marks | CO2 | BL 4]
b) Describe the functions of various components in a typical Multi-tiered Data Warehouse architecture with the block diagram. [CO3 | BL 2]
OR
Q.2 a) What is Multi-Dimensional Modeling? What is the use of the Snowflake Schema? [10 marks | CO2 | BL 4]
b) What is the difference between OLTP and OLAP? [CO4]

Q.3 a) Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8), compute the Minkowski distance between the two objects using q = 3. [10 marks | CO3 | BL 4]
b) Explain Proximity Measures for Nominal Attributes and Binary Attributes. [CO4 | BL 2]
OR
Q.4 a) Explain the process of Data Warehouse design with a suitable diagram. [10 marks | CO3 | BL 1]
b) Explain the four types of attributes, giving appropriate examples. [CO2 | BL 4]

Q.5 a) The following is the list of large two-itemsets: {10,20} {10,30} {20,30} {20,40}. Show the steps to apply the Apriori property to generate and prune the candidates for large three-itemsets. Describe how the Apriori property is used in the steps. Give the final list of candidate large three-itemsets. [16 marks | CO1 | BL 2 & 3]
b) Explain Mining Frequent Patterns using FP-Growth. [CO2 | BL 4]
OR
Q.6 a) What is a rule-based classifier? Explain how a rule-based classifier works. [16 marks | CO3 | BL 4]
b) Write the algorithm for k-nearest neighbour classification. [CO2 | BL 3]

Q.7 a) Explain in detail the candidate generation procedures. [16 marks | CO3 | BL 3]
b) Discuss the methods for estimating the predictive accuracy of a classification method. [CO2 | BL 4]
OR
Q.8 a) Develop the Apriori Algorithm for generating frequent itemsets. [16 marks | CO1 | BL 2]
b) What is the purpose of performing cross-validation? Give one example. [CO3 | BL 1]

Q.9 a) Explain how Bayesian Belief Networks are trained to perform classification. [CO2 | BL 3]
b) What is a Rule-Based Classifier? Explain how a Rule-Based Classifier works. [CO1 | BL 2]
OR
Q.10 a) Difference between Wholistic learning and multi-perspective learning. [CO3 | BL 2 & 3]
b) Write a short note on: 1. Accuracy, 2. Error Rate, 3. Precision, 4. Recall. [CO1 | BL 4]
Q.1.)

a)

Min-max normalization: x′ = (x − xmin)/(xmax − xmin) × (new_max − new_min) + new_min

Let income range $12,000 to $98,000 be normalized to [0.0, 1.0].

Then $73,600 is mapped to

(73,600 − 12,000)/(98,000 − 12,000) × (1.0 − 0) + 0 = 0.716
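The same mapping as a two-line check (min_max is a hypothetical helper, not part of the question):

def min_max(x, xmin, xmax, new_min=0.0, new_max=1.0):
    # min-max normalization of x from [xmin, xmax] onto [new_min, new_max]
    return (x - xmin) / (xmax - xmin) * (new_max - new_min) + new_min

print(round(min_max(73600, 12000, 98000), 3))   # 0.716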

b)

1. Parsing
2. Correcting
3. Standardizing
4. Matching
5. Consolidating
6. Data Cleaning
7. Data staging

c)

Correlation is often used as a preliminary technique to discover relationships between variables.
More precisely, correlation is a measure of the linear relationship between two variables.
Pearson's correlation coefficient is defined as:

r = Σ (xᵢ − x̄)(yᵢ − ȳ) / ( √Σ (xᵢ − x̄)² · √Σ (yᵢ − ȳ)² )

As written above, the main drawback of correlation is the restriction to linear relationships. If the
correlation between two variables is zero, they may still be non-linearly related.
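A minimal sketch of the coefficient just defined, on assumed toy data:

import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (math.sqrt(sum((x - mx) ** 2 for x in xs)) *
           math.sqrt(sum((y - my) ** 2 for y in ys)))
    return num / den

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0 (perfectly linear)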

Q.2

a)
In the context of data reduction in data mining, there are a few basic methods of attribute subset
selection:
1) Stepwise forward selection: This procedure begins with an empty set of attributes as the
reduced set (temporarily). Next, the best among the original attributes is determined and added to
the reduced set. With each iteration, the best of the remaining original attributes is added to
the reduced set.

2) Stepwise backward elimination: This procedure begins with the full set of attributes. Each step
removes the worst remaining attribute.

3) Combination of forward selection and backward elimination: Here the first two methods are
combined, and the procedure at every step selects the best attribute and removes the worst.

4) Decision tree induction: Algorithms such as ID3 and C4.5 are employed to construct a flowchart-like
structure in which each non-leaf node is a test on an attribute and each leaf node is a prediction. The
algorithm selects the best attribute at each node.

A tree is constructed, and attributes that do not appear in the tree are assumed to be irrelevant.
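A greedy sketch of method 1 (stepwise forward selection); score is a hypothetical evaluation function (e.g., validation accuracy of a model trained on the candidate subset), and k caps the size of the reduced set:

def forward_selection(attributes, score, k):
    selected = []                  # the (initially empty) reduced set
    remaining = list(attributes)
    while remaining and len(selected) < k:
        # each iteration adds the best of the remaining original attributes
        best = max(remaining, key=lambda a: score(selected + [a]))
        selected.append(best)
        remaining.remove(best)
    return selected

# toy usage with a dummy criterion that prefers short attribute names
print(forward_selection(["age", "income", "student"],
                        score=lambda attrs: -sum(len(a) for a in attrs), k=2))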

b)

c)

Simple Random Sampling (SRS)


Stratified Sampling
Cluster Sampling
Systematic Sampling
Multistage Sampling (in which some of the methods above are combined in stages)

Q.3)

a) Data Warehouse Models


From the perspective of data warehouse architecture, we have the following data warehouse
models −

 Virtual Warehouse

 Data mart
 Enterprise Warehouse
Virtual Warehouse
The view over an operational data warehouse is known as a virtual warehouse. It is easy to build
a virtual warehouse. Building a virtual warehouse requires excess capacity on operational
database servers.

Data Mart
Data mart contains a subset of organization-wide data. This subset of data is valuable to specific
groups of an organization.

In other words, we can say that data marts contain data specific to a particular group. For
example, the marketing data mart may contain data related to items, customers, and sales. Data
marts are confined to subjects.

Points to remember about data marts −

 Windows-based or Unix/Linux-based servers are used to implement data marts. They are
implemented on low-cost servers.

 The implementation cycle of a data mart is measured in short periods of time, i.e., in weeks
rather than months or years.

 The life cycle of a data mart may become complex in the long run if its planning and design are
not organization-wide.

 Data marts are small in size.

 Data marts are customized by department.

 The source of a data mart is a departmentally structured data warehouse.

 Data marts are flexible.

Enterprise Warehouse
 An enterprise warehouse collects all the information and the subjects spanning an entire
organization

 It provides us enterprise-wide data integration.

 The data is integrated from operational systems and external information providers.
 This information can vary from a few gigabytes to hundreds of gigabytes, terabytes or
beyond.

b)
c)

Concept hierarchies reduce the data by collecting and replacing low-level concepts (such as
numeric values for the attribute age) with higher-level concepts (such as young, middle-aged, or
senior).
Concept hierarchy generation for numeric data is as follows:

 Binning
 Histogram analysis
 Clustering analysis
 Entropy-based discretization
 Segmentation by natural partitioning
 Binning
o In binning, first sort the data and partition it into (equi-depth) bins; then one can
smooth by bin means, bin medians, bin boundaries, etc. (a short sketch follows this list).
 Histogram analysis
o Histogram is a popular data reduction technique
o Divide data into buckets and store average (sum) for each bucket
o Can be constructed optimally in one dimension using dynamic programming
o Related to quantization problems.
 Clustering analysis
o Partition data set into clusters, and one can store cluster representation only
o Can be very effective if data is clustered but not if data is “smeared”
o Can have hierarchical clustering and be stored in multi-dimensional index tree
structures
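A minimal equal-depth binning sketch with smoothing by bin means; the nine values are an assumed toy list:

prices = sorted([4, 8, 15, 21, 21, 24, 25, 28, 34])
depth = 3   # equi-depth: three values per bin
bins = [prices[i:i + depth] for i in range(0, len(prices), depth)]
# smoothing by bin means: every value is replaced by its bin's mean
smoothed = [[sum(b) / len(b)] * len(b) for b in bins]
print(bins)       # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smoothed)   # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]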

Q.4)

a) Data Warehouse Architecture


Generally a data warehouses adopts a three-tier architecture. Following are the three tiers of the
data warehouse architecture.

 Bottom Tier − The bottom tier of the architecture is the data warehouse database server.
It is the relational database system. We use the back end tools and utilities to feed data
into the bottom tier. These back end tools and utilities perform the Extract, Clean, Load,
and refresh functions.

 Middle Tier − In the middle tier, we have the OLAP Server that can be implemented in
either of the following ways.
o By Relational OLAP (ROLAP), which is an extended relational database

management system. The ROLAP maps the operations on multidimensional data


to standard relational operations.

o By Multidimensional OLAP (MOLAP) model, which directly implements the


multidimensional data and operations.

 Top-Tier − This tier is the front-end client layer. This layer holds the query tools and
reporting tools, analysis tools and data mining tools.
The following diagram depicts the three-tier architecture of data warehouse –
b)

OLAP Operations
Since OLAP servers are based on multidimensional view of data, we will
discuss OLAP operations in multidimensional data.

Here is the list of OLAP operations −

 Roll-up

 Drill-down

 Slice and dice

 Pivot (rotate)
Roll-up
Roll-up performs aggregation on a data cube in any of the following ways −

 By climbing up a concept hierarchy for a dimension

 By dimension reduction

The following diagram illustrates how roll-up works.

 Roll-up is performed by climbing up a concept hierarchy for the


dimension location.
 Initially the concept hierarchy was "street < city < province < country".

 On rolling up, the data is aggregated by ascending the location hierarchy


from the level of city to the level of country.

 The data is grouped into countries rather than cities.

 When roll-up is performed, one or more dimensions from the data cube
are removed.

Slice
The slice operation selects one particular dimension from a given cube and
provides a new sub-cube. Consider the following diagram that shows how
slice works.

 Here Slice is performed for the dimension "time" using the criterion time
= "Q1".

 It will form a new sub-cube by selecting one or more dimensions.

Dice
Dice selects two or more dimensions from a given cube and provides a new
sub-cube. Consider the following diagram that shows the dice operation.
The dice operation on the cube based on the following selection criteria
involves three dimensions.

 (location = "Toronto" or "Vancouver")

 (time = "Q1" or "Q2")

 (item =" Mobile" or "Modem")

c)

Fact Table – The fact table contains the numeric measurements (facts) of a business process, together with keys to the dimension tables.

Dimension Table – A dimension table contains the descriptive attributes along which the measures in the fact table are analyzed.

Q5)

a)
b)

Method 1:

Simple matching – The dissimilarity between two objects i and j can be computed based on the ratio of
mismatches:

d(i, j) = (p − m) / p

– m is the number of matches (i.e., the number of variables for which i and j are in the same state)

– p is the total number of variables.

Weights can be assigned to increase the effect of m or to assign greater weight to the matches in
variables having a larger number of states.

Example: Dissimilarity between categorical variables

Suppose that we have the sample data

– where test-1 is categorical.

Let's compute the dissimilarity matrix.

Since here we have one categorical variable, test-1, we set p = 1 in d(i, j) = (p − m)/p,

so that d(i, j) evaluates to 0 if objects i and j match, and 1 if the objects differ.

Method 2: use a large number of binary variables

– creating a new asymmetric binary variable for each of the nominal states

– For an object with a given state value, the binary variable representing that state is set to 1, while the
remaining binary variables are set to 0.

– For example, to encode the categorical variable map_color, a binary variable can be created for each
of the five colors listed above.

– For an object having the color yellow, the yellow variable is set to 1, while the remaining four
variables are set to 0.
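Both methods above fit in a few lines; the objects, attribute values, and the five map_color states below are assumptions for illustration:

def nominal_dissimilarity(obj_i, obj_j):
    p = len(obj_i)                                  # total number of variables
    m = sum(a == b for a, b in zip(obj_i, obj_j))   # number of matching states
    return (p - m) / p

print(nominal_dissimilarity(["code-A", "excellent"], ["code-B", "excellent"]))  # 0.5

# Method 2: one asymmetric binary variable per nominal state of map_color
states = ["red", "yellow", "green", "pink", "blue"]
def encode(color):
    return [int(color == s) for s in states]

print(encode("yellow"))   # [0, 1, 0, 0, 0]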

c)

Cosine similarity is a measure of similarity between two non-zero vectors of an inner
product space that measures the cosine of the angle between them. The cosine of 0° is 1,
and it is less than 1 for any other angle in the interval (0°, 180°]. It is thus a judgment of
orientation and not magnitude: two vectors with the same orientation have a cosine
similarity of 1, two vectors oriented at 90° relative to each other have a similarity of 0, and
two vectors diametrically opposed have a similarity of −1, independent of their magnitude.
Cosine similarity is particularly used in positive space, where the outcome is neatly
bounded in [0, 1]. The name derives from the term "direction cosine": in this case, unit vectors are
maximally "similar" if they are parallel and maximally "dissimilar" if
they are orthogonal (perpendicular). This is analogous to the cosine, which is unity
(maximum value) when the segments subtend a zero angle and zero (uncorrelated) when
the segments are perpendicular.
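A minimal sketch of the measure (toy vectors assumed):

import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity([1, 0], [1, 0]))    #  1.0, same orientation
print(cosine_similarity([1, 0], [0, 1]))    #  0.0, orthogonal
print(cosine_similarity([1, 0], [-1, 0]))   # -1.0, diametrically opposed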
Q.6)

a)

d1: The sun in the sky is bright.

d2: We can see the shining sun, the bright sun.

The resulting vector for d1 shows that we have, in order, 0 occurrences of the
term "blue", 1 occurrence of the term "sun", and so on. In the vector for d2, we have 0
occurrences of the term "blue", 2 occurrences of the term "sun", etc.

Since we have a collection of documents, now represented by
vectors, we can represent them as a matrix of shape D × F, where D is the
cardinality of the document space (how many documents we have) and
F is the number of features, in our case the vocabulary
size. (The example matrix for the vectors described above is not reproduced here.)

As you may have noted, these matrices representing the term frequencies
tend to be very sparse (with the majority of terms zeroed), and that is why you will
commonly see them represented as sparse matrices.

b)

Dissimilarity between ordinal variables

Step 1: If we replace each value for test-2 by its rank, the four objects are assigned the ranks 3, 1, 2, and
3, respectively.

Step 2: Normalize the ranking by mapping rank r onto (r − 1)/(M − 1), i.e., rank 1 to 0.0, rank 2 to 0.5, and rank 3 to 1.0.

Step 3: We can then use, say, the Euclidean distance, which results in the following dissimilarity matrix:
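Since the source table for test-2 is not reproduced here, the sketch below simply takes the ranks 3, 1, 2, 3 as given and carries out steps 2 and 3:

ranks, M = [3, 1, 2, 3], 3
z = [(r - 1) / (M - 1) for r in ranks]   # step 2: ranks mapped onto [0.0, 1.0]
print(z)                                  # [1.0, 0.0, 0.5, 1.0]
# step 3: Euclidean distance (for 1-D values this is just |a - b|)
for a in z:
    print([abs(a - b) for b in z])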
c)

 Data matrix
o n data points with p dimensions
o Two modes
 Dissimilarity matrix
o n data points, but registers only the distance
o A triangular matrix
o Single mode
Total No. of Questions : 8]  SEAT No. :
P3337  [5461]-597  [Total No. of Pages : 3

B.E. (Computer Engineering)
DATA MINING AND WAREHOUSING
(2015 Course) (Semester - I) (End Sem.) (410244D)
Time : 2½ Hours]  [Max. Marks : 70
Instructions to the candidates:
1) Answer Q1 or Q2, Q3 or Q4, Q5 or Q6, Q7 or Q8.
2) Assume suitable data if necessary.
3) Neat diagrams must be drawn wherever necessary.
4) Figures to the right indicate full marks.

Q1) a) For the given attribute AGE values: 16, 16, 180, 4, 12, 24, 26, 28, apply the following Binning techniques for smoothing the noise. [6]
i) Bin Medians
ii) Bin Boundaries
iii) Bin Means
b) Differentiate between Star schema and Snowflake schema. [6]
c) Calculate the Jaccard coefficient between Ram and Hari, assuming that all binary attributes are asymmetric and that for each pair of values for an attribute, the first one is more frequent than the second. [8]
Object | Gender | Food | Caste | Education | Hobby | Job
Hari | M(1) | V(1) | M(0) | L(1) | C(0) | N(0)
Ram | M(1) | N(0) | M(0) | I(0) | T(1) | N(0)
Tomi | F(0) | N(0) | H(1) | L(1) | C(0) | Y(1)
OR
Q2) a) Explain the following attribute types with examples. [6]
i) Ordinal
ii) Binary
iii) Nominal
b) Differentiate between OLTP and OLAP with example. [6]
c) Calculate the Euclidean distance matrix for the given data points. [8]
point | x | y
p1 | 0 | 2
p2 | 2 | 0
p3 | 3 | 1
p4 | 5 | 1

Q3) a) A database has 6 transactions. Let minimum support = 60% and minimum confidence = 70%. [8]
Transaction ID | Items Bought
T1 | {A, B, C, E}
T2 | {A, C, D, E}
T3 | {B, C, E}
T4 | {A, C, D, E}
T5 | {C, D, E}
T6 | {A, D, E}
i) Find Closed frequent itemsets
ii) Find Maximal frequent itemsets
iii) Design the FP Tree using the FP growth algorithm
b) Explain with example Multi-level and Constraint-based association rule mining. [5]
c) How can we improve the efficiency of the Apriori algorithm? [4]
OR
Q4) a) Consider the market basket transactions shown below, assuming minimum support = 50% and minimum confidence = 80%. [8]
i) Find all frequent itemsets using the Apriori algorithm
ii) Find all association rules using the Apriori algorithm
Transaction ID | Items Bought
T1 | {Mango, Apple, Banana, Dates}
T2 | {Apple, Dates, Coconut, Banana, Fig}
T3 | {Apple, Coconut, Banana, Fig}
T4 | {Apple, Banana, Dates}
b) Explain the FP growth algorithm with example. [5]
c) Explain the following measures used in association rule mining. [4]
i) Minimum Support
ii) Minimum Confidence
iii) Support
iv) Confidence

Q5) a) Explain the training and testing phase using Decision Tree in detail. Support your answer with a relevant example. [8]
b) Apply the KNN algorithm to find the class of a new tissue paper (X1 = 3, X2 = 7). Assume K = 3. [5]
X1 = Acid Durability (secs) | X2 = Strength (kg/sq. meter) | Y = Classification
7 | 7 | Bad
7 | 4 | Bad
3 | 4 | Good
1 | 4 | Good
c) Explain the use of a regression model in prediction of real estate prices. [4]
OR
Q6) a) What is a Bayesian Belief Network? Elaborate the training process of a Bayesian Belief Network with a suitable example. [8]
b) Explain the K-nearest neighbor classifier algorithm with a suitable application. [5]
c) Elaborate on Associative Classification with appropriate applications. [4]

Q7) a) Discuss the Sequential Covering algorithm in detail. [8]
b) Explain the following measures for evaluating classifier accuracy. [4]
i) Specificity
ii) Sensitivity
c) Differentiate between Wholistic learning and Multi-perspective learning. [4]
OR
Q8) a) How is the performance of classifier algorithms evaluated? Discuss in detail. [8]
b) Discuss Reinforcement learning relevance and its applications in a real-time environment. [4]
c) Explain the following measures for evaluating classifier accuracy. [4]
i) Recall
ii) Precision
Subject – 5

Elective II- Mobile Communication (410245(D))


B. E. (Odd Semester), Session 2019-2020
Scheme, Syllabus and Evaluation Guidelines, Of “Mobile
Communication 410245(D)”

Course Course Name Lectures Assigned


Code
410245(D) Mobile Communication Theory Practical Tutorial Total
3 - - 3
Mobile Communication
Course Contents

Unit I: Introduction to Cellular Networks 08 Hours


Cell phone generation-1G to 5G, Personal Communication System (PCS), PCS Architecture,
Mobile Station, SIM, Base Station, Base Station Controller, Mobile Switching Center, MSC
Gateways, HLR and VLR, AuC/EIR/OSS, Radio Spectrum, Free Space Path Loss, S/N Ratio, Line
of sight transmission, Length of Antenna, Fading in Mobile Environment.

Unit II: Cellular Network Design 08 Hours


Performance Criterion, Handoff/Handover, Frequency Reuse, Co-channel Interference and System
Capacity, Channel Planning, Cell Splitting, Mobility Management in GSM and CDMA.

Unit III: Medium Access Control 08 Hours


Specialized MAC, SDMA, FDMA, TDMA, CDMA, Frequency Hopping Spread Spectrum (FHSS),
Direct Sequence Spread Spectrum (DSSS), GMSK Modulation, 8PSK, 64 QAM, 128 QAM and
OFDM

Unit IV: GSM 08 Hours


GSM – Architecture, GSM Identifiers, Spectrum allocation, Physical and Logical Traffic and
Control channels, GSM Bursts, GSM Frame, GSM Speech Encoding and decoding, Location
Update, Incoming and Outgoing Call setup, GPRS.

Unit V: Current 3G and 4G Technologies for GSM and CDMA 08 Hours


EDGE, W-CDMA: Wideband CDMA, CDMA2000, UMTS, HSPA (High Speed Packet Access),
HSDPA, HSUPA, HSPA+, LTE (E-UTRA) 3GPP2 family CDMA2000 1x, 1xRTT, EV-DO
(Evolution-Data Optimized), Long Term Evolution (LTE) in 4G.

Unit VI: Advances in Mobile Technologies 08 Hours


5GAA (Autonomous Automation), Millimetre Wave, URLLC, LTEA (Advanced), LTE based
MULTIFIRE, Virtual Reality, Augmented Reality.

Books:

Text:
 Jochen Schiller, “Mobile Communications”, Pearson Education, Second Edition, 2004,
ISBN: 13: 978-8131724262.
 Jason Yi-Bing Lin, Yi-Bing Lin, Imrich Chlamtac, “Wireless and Mobile network
Architecture”, 2005, Wiley Publication, ISBN: 978812651560.
 Martin Sauter, “3G, 4G and Beyond: Bringing Networks, Devices and the Web Together”,
2012, ISBN-13: 978-1118341483

References:

 Theodore S Rappaport, “Wireless Communications – Principles and Practice” , Pearson


Education India, Second Edition, 2010, ISBN: 978-81-317-3186-4.
 Lee and Kappal, “Mobile Communication Engineering”, McGraw Hill.
 William Stallings, “Wireless Communication and Networks”, Prentice Hall, Second Edition,
2014, ISBN: 978-0131918351
Evaluation Guidelines:

Internal Assessment (IA): [CT (20 Marks) + TA/AT(10 Marks)]

Class Test (CT) [20 marks]: Three class tests of 20 marks each will be conducted in a semester; of
these three, the average of the best two will be taken for calculating the class test marks.
The format of the question paper is the same as the university's.

TA [5 marks]: Three/four assignments will be conducted in the semester. Teacher assessment will
be calculated on the basis of performance in assignments, class tests, and the pre-university test.

Attendance (AT) [5 marks]: Attendance marks will be given as per university policy.

Paper pattern and marks distribution for Class tests:

 The question paper will comprise two sections, A and B.

 Section A contains 5 MCQs of 1 mark each. All questions are compulsory. (Total 5
Marks)
 Section B contains 4 questions of 5 marks each. Attempt any 3 questions. (Total 15 Marks)

In Semester Examination [30 Marks]

Paper pattern and marks distribution for PUT: Same as End semester exam.

Prelim Examination [50 Marks]:

Paper pattern and marks distribution for Prelim Exam: Same as End Semester Exam.

End Semester Examination [70 Marks]:Paper pattern and marks distribution for End Semester
Exam: As per university guidelines.
Lecture Plan
Mobile Communication

1 Unit I: Introduction to Cellular Networks


Cell phone generation-1G to 5G, Personal Communication System (PCS)
2 PCS Architecture, Mobile Station, SIM, Base Station
3 Base Station Controller, Mobile Switching Center
4 MSC Gateways, HLR and VLR
5 AuC/EIR/OSS, Radio Spectrum
6 Free Space Path Loss, S/N Ratio
7 Line of sight transmission, Length of Antenna
8 Fading in Mobile Environment
9 Unit II: Cellular Network Design
Performance Criterion
10 Handoff/Handover
11 Frequency Reuse
12 Co-channel Interference and System Capacity
13 Channel Planning
14 Cell Splitting
15 Mobility Management in GSM
16 Mobility Management in CDMA
17 Unit III: Medium Access Control
Specialized MAC, SDMA, FDMA
18 TDMA, CDMA
19 Frequency Hopping Spread Spectrum (FHSS)
20 Direct Sequence Spread Spectrum (DSSS)
21 GMSK Modulation
22 8PSK
23 64 QAM
24 128 QAM and OFDM
25 Unit IV: GSM
GSM – Architecture, GSM Identifiers
26 Spectrum allocation
27 Physical and Logical Traffic and Control channels
28 GSM Bursts, GSM Frame
29 GSM Speech Encoding and decoding
30 Location Update
31 Incoming and Outgoing Call setup
32 GPRS
33 Unit V: Current 3G and 4G Technologies for GSM and CDMA
EDGE, W-CDMA
34 Wideband CDMA, CDMA2000
35 UMTS
36 HSPA (High Speed Packet Access)
37 HSDPA, HSUPA, HSPA+
38 LTE (E-UTRA) 3GPP2 family CDMA2000 1x
39 1xRTT, EV-DO (Evolution-Data Optimized)
40 Long Term Evolution (LTE) in 4G
41 Unit VI: Advances in Mobile Technologies
42 5GAA (Autonomous Automation)
43 Millimetre Wave
44 URLLC
45 LTEA (Advanced)
46 LTE based MULTIFIRE
47 Virtual Reality
48 Augmented Reality
Course Delivery, Objectives, Outcomes
Mobile Communication
Semester- 6

Course Delivery:
The course will be delivered through lectures, assignment/tutorial sessions, classroom
interaction, and presentations.

Course Objectives:
 To understand the Personal Communication Services.
 To learn the design parameters for setting up mobile network.
 To know GSM architecture and support services.
 To learn current technologies being used on field.

Course Outcomes:
On completion of the course, student will be able to–
CO1: Justify the Mobile Network performance parameters and design decisions.
CO2: Choose the modulation technique for setting up mobile network.
CO3: Formulate GSM/CDMA mobile network layout considering futuristic requirements which
conforms to the technology.
CO4: Use the 3G/4G technology based network with bandwidth capacity planning.
CO5: Percept to the requirements of next generation mobile network and mobile applications.

CO-PO Mapping:
Course Outcomes | PO1 | PO2 | PO3 | PO4 | PO5 | PO6 | PO7 | PO8 | PO9 | PO10 | PO11 | PO12
CO1 | 1 | 3 | 1 | 1 | 1 | - | 1 | - | - | - | 1 | 1
CO2 | 1 | 2 | - | 3 | 3 | - | - | - | - | - | 2 | 2
CO3 | 1 | 2 | 2 | 3 | 2 | - | - | - | - | - | 1 | 2
CO4 | 1 | 1 | - | 1 | 3 | - | - | - | - | - | 1 | 1
CO5 | 1 | 3 | - | 2 | 2 | 2 | - | - | - | - | 2 | 3
Justification of CO - PO Mapping

CO1 WITH PO1 According to CO1 students get basic knowledge of


Mobile Network performance parameters and design
decisions. So it is substantially correlated to PO2.
Also it is little correlated to
PO1,PO3,PO4,PO5,PO7,PO11 & PO12.
CO2 WITH PO2 According to CO2 students get basic knowledge of
the modulation technique for setting up mobile net-
work. It is substantially correlated to PO4, PO5 .
Also it is moderately correlated to PO2, PO11, PO12
and a little to PO1.
CO3 WITH PO3 According to CO3 students get knowledge of GSM/
CDMA mobile network layout considering futuristic
requirements which conforms to the technology. It
is substantially correlated to PO4. Also it is moder-
ately correlated to PO2, PO3, PO5, PO12 and a little
to PO1 & PO11.
CO4 WITH PO4 According to CO4 students get knowledge of the 3G/
4G technology. It is substantially correlated to PO5
and a little to PO1, PO2, PO4, PO11 & PO12.
CO5 WITH PO5 According to CO5 students get knowledge of the
requirements of next generation mobile network and
mobile applications. It is substantially correlated to
PO2 & PO12. Also it is moderately correlated to
PO4, PO5, PO6, PO11 and little to PO1.
Question Bank
Mobile Communication 410245(D)

Unit-I: Introduction to Cellular Networks


Q.1 Explain Cell phone generation-1G to 5G?
Q.2 Explain Personal Communication System (PCS)?
Q.3 Draw & Explain PCS Architecture?
Q.4 Short Note on MS, BS, BST, MSC?
Q.5 Explain HLR and VLR?

Unit-II: Cellular Network Design


Q.1 Explain Performance Criterion of Cellular Network Design?
Q.2 Explain Handoff/Handover?
Q.3 Draw & Explain Mobility Management in GSM?
Q.4 Short Note on Channel Planning, & Cell Splitting?
Q.5 Explain Frequency Reuse in Cellular Network?

Unit-III: Medium Access Control


Q.1 Explain Frequency Hopping Spread Spectrum?
Q.2 Explain Direct Sequence Spread Spectrum?
Q.3 Draw & Explain GMSK Modulation?
Q.4 Short Note on SDMA, & FDMA?
Q.5 Short Note on TDMA, & CDMA?

Unit-IV: GSM
Q.1 Explain Incoming and Outgoing Call setup?
Q.2 Draw & Explain GPRS Architecture?
Q.3 Draw & Explain GSM Architecture?
Q.4 Short Note on GSM Bursts & GSM Frame?
Q.5 Explain Physical and Logical Traffic?
Unit-V: Current 3G and 4G Technologies for GSM and CDMA
Q.1 Explain 1xRTT, EV-DO?
Q.2 Explain High Speed Packet Access?
Q.3 Draw & Explain W-CDMA Architecture?
Q.4 Short Note on
1. HSDPA
2. HSUPA
3. HSPA+
Q.5 Explain Long Term Evolution (LTE) in 4G?

Unit–VI: Advances in Mobile Technologies


Q.1 Explain 5GAA (Autonomous Automation)?
Q.2 Explain Virtual Reality in Mobile Technologies?
Q.3 Explain Augmented Reality in Mobile Technologies ?
Q.4 Short Note on URLLC & LTEA ?
Q.5 Explain LTE based MULTIFIRE ?
Assignments
Mobile Communication 410245(D)

Assignment No.-01 (AY 2018-19 SEM-II)


Unit-I: Introduction to Cellular Networks & Unit-II: Cellular Network Design

Branch: Computer Engineering


Class: BE Computer
Subject: ELE-II Mobile Communication 410245(D) Max. Marks: 20

Q. No. | Questions | Max. Marks | Unit No. as per syllabus | CO mapped | Bloom's Taxonomy Level
Section A
Q.1 Explain PCS Architecture? 04 01 01 2
Q.2 Explain HLR & VLR? 04 01 01 2
Q.3 Short note on MS? 02 01 01 1
Q.4 Explain Handoff/Handover? 04 02 01 1
Q.5 Explain Frequency Reuse in Cellular Network? 04 02 01 2
Q.6 Short Note on Cell Splitting? 02 02 01 1

Solution

Q.1 Explain PCS Architecture? 04 01 01 2


Ans:

Fig. PCS Architecture

A personal communications service (PCS) is a type of wireless mobile service with advanced
coverage that delivers services at a more personal level. It generally refers to modern
mobile communication that extends the capabilities of conventional cellular networks and fixed-line
telephony networks as well.
PCS is also known as digital cellular.
A PCS works similarly to a cellular network in its basic operations, but requires more service-provider
infrastructure to cover a wider geographical area. A PCS generally includes the following:

 Wireless communication (data, voice and video)


 Mobile PBX
 Paging and texting
 Wireless radio
 Personal communication networks
 Satellite communication systems, etc.

PCS has three broad categories: narrowband, broadband and unlicensed. TDMA, CDMA and GSM,
and 2G, 3G and 4G are some of the common technologies that are used to deliver a PCS.

Q.2 Explain HLR & VLR? 04 01 01 2


Ans:
a) HLR (Home location register):- The HLR is a data base that permanently stores data related to
a given set of subscribers. The HLR is the reference database for subscriber parameters. Various
identification numbers and addresses as well as authentication parameters, services subscribed, and
special routing information are stored. Current subscriber status, including a subscriber’s temporary
roaming number and associated VLR if the mobile is roaming, is maintained. The HLR provides
data needed to route calls to all MS-SIMs home based in its MSC area, even when they are roaming
out of area or in other GSM networks. The HLR provides the current location data needed to
support searching for and paging the MS-SIM for incoming calls, wherever the MS-SIM may be.
The HLR is responsible for storage and provision of SIM authentication and encryption parameters
needed by the MSC where the MS-SIM is operating. It obtains these parameters from the AUC. The
HLR maintains records of which supplementary services each user has subscribed to and provides
permission control in granting access to these services.
The HLR stores the identifications of SMS gateways that have messages for the subscriber
under the SMS until they can be transmitted to the subscriber and receipt is acknowledged. The
HLR provides receipt and forwarding to the billing center of charging information for its home
subscribers, even when that information comes from other PLMNs while the home subscribers are
roaming. Based on the above functions, different types of data are stored in HLR.
Some data are permanent; that is, they are modified only for administrative reasons, while
others are temporary and modified automatically by other network entities depending on the
movements and actions performed by the subscriber. Some data are mandatory, other data are
optional. Both the HLR and the VLR can be implemented in the same equipment in an MSC
(collocated). A PLMN may contain one or several HLRs. The permanent data stored in an HLR
includes the following.
 IMSI: It identifies unambiguously the MS in the whole GSM system;
 International MS ISDN number: It is the directory number of the mobile station;
 MS category specifies whether a MS is a pay phone or not;
 Roaming restriction (allowed or not);
 Closed user group (CUG) membership data;
 Supplementary services related parameters: Forwarded-to number, registration status, no
reply condition timer, call barring password, activation status, supplementary services check
flag;
 Authentication key, which is used in the security procedure and especially to authenticate
the declared identity of a MS.

The temporary data consists of the following.


 LMSI (Local MS identity);
 RAND/SRES and Kc, i.e., data related to authentication and ciphering;
 MSRN;
 VLR address, which identifies the VLR currently handling the MS;
 MSC address, which identifies the MSC area where the MS is registered;
 Roaming restriction;
 Messages waiting data (used for SMS);

The permanent data associated with the mobile are those that do not change as it moves
from one area to another. On the other hand, temporary data changes from call to call. The HLR
interacts with MSCs mainly for the procedures of interrogation for routing calls to a MS and to
transfer charging information after call termination. Location registration is performed by HLR.
When the subscriber changes the VLR area, the HLR is informed about the address of the actual
VLR. The HLR updates the new VLR with all relevant subscriber data. Similarly, location
canceling is done by HLR. After the subscriber roams to a different VLR area, the HLR updates the
new VLR with all the relevant subscriber data. Supplementary services are add-ons to the basic
service. These parameters need not all be stored in the HLR. However, it is safer to store all
subscription parameters in the HLR even when some are stored in a subscriber card. The data stored
in the HLR is changed only by MMI action when new subscribers are added, old subscribers are
deleted, or the specific services to which they subscribe are changed and not dynamically updated
by the system.

b) VLR (Visitor location register):- A MS roaming in an MSC area is controlled by the VLR
responsible for that area. When a MS appears in a LA, it starts a registration procedure. The MSC
for that area notices this registration and transfers to the VLR the identity of the LA where the MS
is situated. A VLR may be in charge of one or several MSC LAs. The VLR constitutes the database
that supports the MSC in the storage and retrieval of the data of subscribers present in its area.
When an MS enters the MSC area borders, it signals its arrival to the MSC that stores its identity in
the VLR. The information necessary to manage the MS is contained in the HLR and is transferred
to the VLR so that they can be easily retrieved if so required.
The location registration procedure allows the subscriber data to follow the movements of the MS.
For such reasons the data contained in the VLR and in the HLR are more or less the same.
Nevertheless, the data are present in the VLR only as long as the MS is registered in the area related
to that VLR. The terms permanent and temporary, in this case, are meaningful only during that time
interval when the mobile is in the area of the local MSC/VLR combination. The data contained in the
VLR can be compared with the subscriber-related data contained in a normal fixed exchange; the
location information can be compared with the line equipment reference attached to each fixed
subscriber connected to that exchange. The VLR is responsible for assigning a new TMSI number
to the subscriber. It also relays the ciphering key from HLR to BSS.
Cells in the PLMN are grouped into geographic areas, and each is assigned a LAI, as shown in
Figure 2.2(c). Each VLR controls a certain set of LAs. When a mobile subscriber roams from one
LA to another, their current location is automatically updated in their VLR. If the old and new LAs
are under the control of two different VLRs, the entry on the old VLR is deleted and an entry is
created in the new VLR by copying the basic data from the HLR. The subscriber's current VLR
address, stored at the HLR, is also updated. This provides the information necessary to complete
calls to roaming mobiles. The VLR supports a mobile paging and tracking subsystem in the local
area where the mobile is presently roaming. The detailed functions of VLR are as follows.
 Works with the HLR and AUC on authentication;
 Relays cipher key from HLR to BSS for encryption/decryption;
 Controls allocation of new TMSI numbers; a subscriber's TMSI number can be periodically
changed to secure a subscriber's identity;
 Supports paging;
 Tracks state of all MSs in its area.
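The location-update interplay between HLR and VLR described above can be sketched in a few
lines of Python. This is a toy model for illustration only; the class names and dictionary layout are
assumptions, not GSM-defined data structures. It shows the order of operations: delete the entry at
the old VLR, copy the basic data from the HLR into the new VLR, and update the VLR address
stored in the HLR.

    # Toy model of the GSM location-update procedure (illustrative names/structures).
    class VLR:
        def __init__(self, name):
            self.name = name
            self.visitors = {}                      # IMSI -> copied subscriber data

        def insert(self, imsi, data):
            self.visitors[imsi] = dict(data)

        def delete(self, imsi):
            self.visitors.pop(imsi, None)

    class HLR:
        def __init__(self):
            self.subscribers = {}                   # IMSI -> permanent data
            self.current_vlr = {}                   # IMSI -> VLR currently serving the MS

        def location_update(self, imsi, new_vlr):
            old_vlr = self.current_vlr.get(imsi)
            if old_vlr is not None and old_vlr is not new_vlr:
                old_vlr.delete(imsi)                # location cancellation at the old VLR
            new_vlr.insert(imsi, self.subscribers[imsi])  # copy basic data from the HLR
            self.current_vlr[imsi] = new_vlr        # update VLR address held by the HLR

    hlr = HLR()
    hlr.subscribers["imsi-1"] = {"msisdn": "directory-number", "category": "ordinary"}
    vlr_a, vlr_b = VLR("VLR-A"), VLR("VLR-B")
    hlr.location_update("imsi-1", vlr_a)            # MS registers in VLR-A's area
    hlr.location_update("imsi-1", vlr_b)            # MS roams: entry moves from A to B
    print(list(vlr_a.visitors), list(vlr_b.visitors))   # [] ['imsi-1']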

Data Stored in VLR


The VLR constitutes the database that supports the MSC in the storage and retrieval of the data of
subscribers present in its area. When an MS enters the MSC area borders, it signals its arrival to the
MSC that stores its identity in the VLR. The information necessary to manage the MS in whichever
type of call it may attempt is contained in the HLR and is transferred to the VLR so that they can be
easily retrieved if so required (location registration). This procedure allows the subscriber data to
follow the movements of the MS [2,3,7,8]. For such reasons the data contained in the VLR and in
the HLR are more or less the same. Nevertheless the data are present in the VLR only as long as the
MS is registered in the area related to that VLR. Data associated with the movement of mobile are
IMSI, MSISDN, MSRN, and TMSI. The terms permanent and temporary, in this case, are
meaningful only during that time interval. Some data are mandatory, others are optional. The data
stored in the VLR are as follows.
 The IMSI;
 The MSISDN;
 The MSRN, which is allocated to the MS either when the station is registered in an MSC
area or on a per-call basis and is used to route the incoming calls to that station;
 The TMSI;
 The LA where the MS has been registered, which will be used to call the station;
 Supplementary service parameters;
 MS category;
 Authentication key, query and response obtained from AUC;
 ID of the current MSC.

Q.3 Short note on MS? 02 01 01 1


Ans:
Mobile Station: The MS includes radio equipment and the man machine interface (MMI) that a
subscriber needs in order to access the services provided by the GSM PLMN. MSs can be installed
in vehicles or can be portable or handheld stations. The MS may include provisions for data
communication as well as voice. A mobile transmits and receives messages to and from the GSM
system over the air interface to establish and continue connections through the system.
Functions of MS
The primary functions of MS are to transmit and receive voice and data over the air interface of the
GSM system. MS performs the signal processing functions of digitizing, encoding, error protecting,
encrypting, and modulating the transmitted signals. It also performs the inverse functions on the
received signals from the BS.
Its functions include the following.
 Voice and data transmission;
 Frequency and time synchronization;
 Monitoring of power and signal quality of the surrounding cells for optimum handover;
 Provision of location updates;
 Equalization of multipath distortions;

Q.4 Explain Handoff/Handover? 04 02 01 1


Ans:
Although the concept of cellular handover or cellular handoff is relatively straightforward, it is not
an easy process to implement in reality. The cellular network needs to decide when handover or
handoff is necessary, and to which cell. Also when the handover occurs it is necessary to re-route
the call to the relevant base station along with changing the communication between the mobile and
the base station to a new channel. All of this needs to be undertaken without any noticeable
interruption to the call. The process is quite complicated, and in early systems calls were often lost
if the process did not work correctly.

Different cellular standards handle handover / handoff in slightly different ways; therefore, for the
sake of explanation, the way GSM handles handover is taken as the example.
There are a number of parameters that need to be known to determine whether a handover is
required: the signal strength of the base station with which communication is being made, along
with the signal strengths of the surrounding stations. Additionally, the availability of channels also
needs to be known. The mobile is obviously best suited to monitor the strength of the base stations,
but only the cellular network knows the status of channel availability and the network makes the
decision about when the handover is to take place and to which channel of which cell.
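To make that decision concrete, the following rough Python sketch picks a handover target from
the measured signal strengths, applying a hysteresis margin so the mobile does not bounce between
two cells, and checking channel availability. The threshold value and function names are
illustrative assumptions, not GSM-specified parameters.

    # Rough sketch of a network-side handover decision (values are assumptions).
    HYSTERESIS_DB = 3.0    # a candidate must be this much stronger than the serving cell

    def choose_handover_target(serving, measurements, free_channels):
        # measurements: cell -> signal strength in dBm, as reported by the mobile
        # free_channels: cell -> idle channels, known only to the network
        serving_level = measurements[serving]
        candidates = [(level, cell) for cell, level in measurements.items()
                      if cell != serving
                      and level > serving_level + HYSTERESIS_DB   # clearly stronger
                      and free_channels.get(cell, 0) > 0]         # a channel is free
        return max(candidates)[1] if candidates else None         # strongest valid cell

    measurements = {"A": -95.0, "B": -88.0, "C": -91.0}
    free_channels = {"B": 0, "C": 2}
    print(choose_handover_target("A", measurements, free_channels))  # C (B has no channel)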
Types of handover / handoff
With the advent of CDMA systems, where the same channels can be used by several mobiles and
where it is possible for adjacent cells or cell sectors to use the same frequency channel, there are a
number of different types of handover that can be performed:
 Hard handover (hard handoff)
 Soft handover (soft handoff)

Fig:-Types of Handover
Hard handover

The definition of a hard handover or handoff is one where an existing connection must be broken
before the new one is established. One example of hard handover is when frequencies are changed.
As the mobile will normally only be able to transmit on one frequency at a time, the connection
must be broken before it can move to the new channel where the connection is re-established. This
is often termed an inter-frequency hard handover. While this is the most common form of hard
handoff, it is not the only one. It is also possible to have intra-frequency hard handovers where the
frequency channel remains the same.

Although there is generally a short break in transmission, this is normally short enough not to be
noticed by the user.

Soft handover

The new 3G technologies use CDMA, where it is possible to have neighboring cells on the same
frequency, and this opens the possibility of having a form of handover or handoff where it is not
necessary to break the connection. This is called soft handover or soft handoff, and it is defined as a
handover where a new connection is established before the old one is released. In UMTS most of the
handovers that are performed are intra-frequency soft handovers.

Q.5 Explain Frequency Reuse in Cellular Network? 04 02 01 2


Ans:
In the cellular concept, frequencies allocated to the service are re-used in a regular pattern of areas,
called 'cells', each covered by one base station. In mobile-telephone nets these cells are usually
hexagonal. In radio broadcasting, a similar concept has been developed based on rhombic cells. To
ensure that the mutual interference between users remains below a harmful level, adjacent cells use
different frequencies. In fact, a set of C different frequencies {f1, ..., fC} is used for each cluster of
C adjacent cells. Cluster patterns and the corresponding frequencies are re-used in a regular pattern
over the entire service area.
Cellular radio systems rely on an intelligent allocation and reuse of channels throughout a coverage
region. Each cellular base station is allocated a group of radio channels to be used within a small
geographic area called a cell. Base stations in adjacent cells are assigned channel groups which
contain completely different channels than neighboring cells. The base station antennas are
designed to achieve the desired coverage within the particular cell. By limiting the coverage area to
within the boundaries of a cell, the same group of channels may be used to cover different cells that
are separated from one another by distances large enough to keep interference levels within
tolerable limits. The design process of selecting and allocating channel groups for all of the cellular
base stations within a system is called frequency reuse or frequency planning.

If each of the N cells in a cluster is allocated a group of k channels, the total number of available
duplex channels S is given by Equation 1.

Equation 1

S = kN

The N cells which collectively use the complete set of available frequencies is called a cluster. If a
cluster is replicated M times within the system, the total number of duplex channels, C, can be used
as a measure of capacity and is given by

Equation 2
C = MkN = MS
As seen from equation 2, the capacity of a cellular system is directly proportional to the number of
times a cluster is replicated in a fixed service area.
The factor N is called the cluster size and describes the number of cells in the cluster; it is typically
equal to 4, 7, or 12.
So there are two ways to increase the system capacity:
1. Decrease the cluster size N, so that the cluster is replicated more times (larger M) within a fixed
service area.
2. Increase the number of channels k allocated to each cell.
A large cluster size indicates that the ratio between the cell radius and the distance between co-
channel cells is small, while a small cluster size indicates that co-channel cells are located much
closer together.
The value for N is a function of how much interference a mobile or base station can tolerate while
maintaining a sufficient quality of communications. From a design point of view, the smallest
possible value of N is desirable in order to maximize capacity over a given coverage area (i.e., to
maximize C in Equation 2).
The frequency reuse factor of a cellular system is given by 1/N, since each cell within a cluster is
only assigned 1/N of the total available channels in the system.
Because the hexagonal geometry gives each cell exactly six equidistant neighbors, and the lines
joining the centers of any cell and each of its neighbors are separated by multiples of 60 degrees,
only certain cluster sizes and cell layouts are possible. In order to tessellate (connect without gaps
between adjacent cells), the geometry of hexagons is such that the number of cells per cluster, N,
can only have values which satisfy N = i^2 + ij + j^2, where i and j are non-negative integers.
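These relations are easy to verify numerically. The short sketch below lists the valid cluster sizes
N = i^2 + ij + j^2 and evaluates the capacity C = MkN for illustrative values of M and k (the
numbers are assumptions, not taken from any particular deployment).

    # Valid hexagonal cluster sizes N = i^2 + i*j + j^2 (i, j non-negative integers).
    valid_sizes = sorted({i*i + i*j + j*j for i in range(5) for j in range(5)} - {0})
    print(valid_sizes)     # [1, 3, 4, 7, 9, 12, 13, ...] -- note that 4, 7 and 12 appear

    # Capacity C = M * k * N = M * S, with illustrative numbers.
    k = 10                 # channels allocated per cell (assumption)
    N = 7                  # cluster size
    M = 20                 # times the cluster is replicated over the area (assumption)
    S = k * N              # channels per cluster (Equation 1)
    C = M * S              # total duplex channels (Equation 2)
    print(S, C)            # 70 1400

Reducing N while keeping the covered area fixed raises M proportionally, which is why the
smallest usable N maximizes C.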

Q.6 Short Note on Cell Splitting? 02 02 01 1


Ans:
Cell splitting is the process of subdividing a congested cell into smaller cells such that each
smaller cell has its own base station with Reduced antenna height and Reduced transmitter power. It
increases the capacity of a cellular system since number of times channels are reused increases.
Cell Sectorization: one way to increase the subscriber capacity of a cellular network is to
replace the omni-directional antenna at each base station by three (or six) sector antennas with 120
(or 60) degrees opening.
The concept of cell splitting is self-explanatory from its name: cell splitting means splitting
up cells into smaller cells. The process of cell splitting is used to expand the capacity (number of
channels) of a mobile communication system. As a network grows, the number of mobile
users in an area becomes quite large. Consider the following scenario.
There are 100 people in a specific area. All of them own a mobile phone (MS) and want to
communicate with each other, so a provision for all of them to mutually
communicate must be made. As there are only 100 users, a single base station (BS) is built in the
middle of the area and all these users’ MSs are connected to it. All these 100 users now come under
the coverage area of a single base station. This coverage area is called a cell. This is shown in Fig 1.
Fig 1. A single BS for 100 MS users

But now, as time passed, the number of mobile users in the same area increased from 100 to 700.
Now if the same BS has to connect to these 700 users’ MSs, obviously the BS will be overloaded. A
single BS, which served 100 users, is forced to serve 700 users, which is impractical. To
reduce the load of this BS, we can use cell splitting. That is, we will divide the above single cell
into 7 separate adjacent cells, each having its own BS. This is shown in Fig 2.

Fig 2. Single cell split up into 7 cells

Now, let us look at the big picture. Until now, we have discussed cell splitting in a
small area. Now, we use this same concept to deal with large networks. In a large network, it is not
necessary to split up all the cells in all the clusters. Certain overloaded BSs can handle their traffic
well only if their cells (coverage areas) are split up; those cells are the ideal candidates for cell
splitting. Fig 3 shows a network architecture with a few cells split up into smaller cells, without
affecting the other cells in the network.

Fig 3. Cell Splitting.
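The 100-to-700-user story above reduces to simple arithmetic: if one BS comfortably serves about
100 users, the number of smaller cells required is the ceiling of users divided by per-cell capacity.
A minimal sketch (the per-BS capacity of 100 is just the figure assumed in the narrative):

    from math import ceil

    def cells_needed(users, users_per_bs=100):
        # How many smaller cells a congested cell must be split into (toy model).
        return ceil(users / users_per_bs)

    print(cells_needed(100))   # 1 -> the original single cell suffices
    print(cells_needed(700))   # 7 -> split the cell into 7 smaller cells, as in Fig 2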


Assignment No.-02 (AY 2018-19 SEM-II)
Unit-III: Medium Access Control & Unit-IV: GSM

Branch: Computer Engineering


Class: BE Computer
Subject: ELE-II Mobile Communication 410245(D) Max. Marks: 20

Q. No.  Questions  Max. Marks  Unit No. as per Syllabus  CO mapped  Bloom's Taxonomy Level
Section A
Q.1 Explain SDMA, FDMA, TDMA & CDMA? 04 03 03 2
Q.2 Explain GSM Architecture with Diagram? 04 03 03 2
Q.3 Define Near and far terminals? 02 03 01 1
Q.4 Explain GSM Architecture? 04 04 03 2
Q.5 Explain different types of Frequency Channels in 04 04 03 2
GSM?
Q.6 Define GSM Burst? 02 04 03 1

Solution

Q.1 Explain SDMA, FDMA, TDMA & CDMA? 04 03 03 2


Ans:
SDMA:
Space Division Multiple Access (SDMA) is used for allocating a separated space to users in
wireless networks. A typical application involves assigning an optimal base station to a mobile
phone user. The mobile phone may receive several base stations with different quality. A MAC
algorithm could now decide which base station is best, taking into account which frequencies
(FDM), time slots (TDM) or code (CDM) are still available. The basis for the SDMA algorithm is
formed by cells and sectorized antennas which constitute the infrastructure implementing space
division multiplexing (SDM). SDM has the unique advantage of not requiring any multiplexing
equipment. It is usually combined with other multiplexing techniques to better utilize the individual
physical channels.

FDMA:
Frequency division multiplexing (FDM) describes schemes to subdivide the frequency dimension
into several non-overlapping frequency bands.
Frequency Division Multiple Access is a method employed to permit several users to transmit
simultaneously on one satellite transponder by assigning a specific frequency within the channel to
each user. Each conversation gets its own, unique radio channel. The channels are relatively
narrow, usually 30 kHz or less, and are defined as either transmit or receive channels. A full duplex
conversation requires a transmit and receive channel pair. FDM is often used for simultaneous access
to the medium by base station and mobile station in cellular networks, establishing a duplex channel.
In a scheme called frequency division duplexing (FDD), the two directions, mobile station to base
station and vice versa, are separated using different frequencies.

Fig:- FDM for multiple access and duplex


The two frequencies are also known as uplink, i.e., from mobile station to base station or from
ground control to satellite, and as downlink, i.e., from base station to mobile station or from satellite
to ground control. The basic frequency allocation scheme for GSM is fixed and regulated by
national authorities. All uplinks use the band between 890.2 and 915 MHz, all downlinks use 935.2
to 960 MHz. According to FDMA, the base station, shown on the right side, allocates a certain
frequency for up- and downlink to establish a duplex channel with a mobile phone. Up- and
downlink have a fixed relation. If the uplink frequency is fu = 890 MHz + n·0.2 MHz, the downlink
frequency is fd = fu + 45 MHz, i.e., fd = 935 MHz + n·0.2 MHz for a certain channel n. The base
station selects the channel. Each channel (uplink and downlink) has a bandwidth of 200 kHz.
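The fixed uplink/downlink relation just stated can be computed directly. The sketch below
evaluates fu = 890 MHz + n * 0.2 MHz and fd = fu + 45 MHz; the range check on n is an
assumption derived from the 890.2-915 MHz uplink band quoted above.

    def gsm900_duplex_pair(n):
        # Return the (uplink, downlink) carrier frequencies in MHz for channel n.
        if not 1 <= n <= 124:              # 124 usable 200 kHz channels in 25 MHz
            raise ValueError("channel number outside the GSM 900 band")
        fu = 890.0 + n * 0.2               # uplink: mobile station -> base station
        fd = fu + 45.0                     # downlink: fixed 45 MHz duplex offset
        return round(fu, 1), round(fd, 1)

    print(gsm900_duplex_pair(1))           # (890.2, 935.2)
    print(gsm900_duplex_pair(124))         # (914.8, 959.8)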
This scheme also has disadvantages. Assigning a fixed frequency to each sender is very static and
inflexible: it wastes scarce spectrum whenever traffic is bursty or unevenly distributed, and it limits
the number of simultaneous senders. More flexible schemes, such as the time division multiplexing
described next, are therefore preferred for data communication.

TDMA:
A more flexible multiplexing scheme for typical mobile communications is time division
multiplexing (TDM). Compared to FDMA, time division multiple access (TDMA) offers a much
more flexible scheme, which comprises all technologies that allocate certain time slots for
communication. Now synchronization between sender and receiver has to be achieved in the time
domain. Again this can be done by using a fixed pattern similar to FDMA techniques, i.e.,
allocating a certain time slot for a channel, or by using a dynamic allocation scheme.
Listening to different frequencies at the same time is quite difficult, but listening to many channels
separated in time at the same frequency is simple. Fixed schemes do not need identification, but are
not as flexible considering varying bandwidth requirements.

Fixed TDM :-
The simplest algorithm for using TDM is allocating time slots for channels in a fixed pattern. This
results in a fixed bandwidth and is the typical solution for wireless phone systems. MAC is quite
simple, as the only crucial factor is accessing the reserved time slot at the right moment. If this
synchronization is assured, each mobile station knows its turn and no interference will happen. The
fixed pattern can be assigned by the base station, where competition between different mobile
stations that want to access the medium is solved.

The above figure shows how these fixed TDM patterns are used to implement multiple access and a
duplex channel between a base station and mobile station. Assigning different slots for uplink and
downlink using the same frequency is called time division duplex (TDD). As shown in the figure,
the base station uses one out of 12 slots for the downlink, whereas the mobile station uses one out of
12 different slots for the uplink. Uplink and downlink are separated in time. Up to 12 different
mobile stations can use the same frequency without interference using this scheme. Each connection
is allotted its own up- and downlink pair. This general scheme still wastes a lot of bandwidth. It is
too static, too inflexible for data communication. In this case, connectionless, demand-oriented
TDMA schemes can be used.
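The fixed TDD pattern described above can be pictured as a 24-slot frame: one set of 12 slots for
the downlink and a different set of 12 for the uplink, with mobile i owning downlink slot i and
uplink slot i + 12. The layout below is a schematic assumption for illustration, not the actual GSM
frame structure.

    # Schematic fixed-TDD frame: 12 downlink slots followed by 12 uplink slots.
    N_MOBILES = 12
    frame = []
    for slot in range(2 * N_MOBILES):
        direction = "DL" if slot < N_MOBILES else "UL"
        mobile = slot % N_MOBILES          # mobile i owns DL slot i and UL slot i+12
        frame.append(f"{direction}{mobile}")

    # Each mobile transmits/receives only in its reserved slots, so no collisions occur.
    print(frame)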
Classical Aloha :-
In this scheme, TDM is applied without controlling medium access. Here each station can access
the medium at any time as shown below:

This is a random access scheme, without a central arbiter controlling access and without
coordination among the stations. If two or more stations access the medium at the same time, a
collision occurs and the transmitted data is destroyed. Resolving this problem is left to higher layers
(e.g., retransmission of data). The simple Aloha works fine for a light load and does not require any
complicated access mechanisms.

Slotted Aloha:-
The first refinement of the classical Aloha scheme is provided by the introduction of time slots
(slotted Aloha). In this case, all senders have to be synchronized, transmission can only start at the
beginning of a time slot as shown below.

The introduction of slots raises the throughput from 18 per cent to 36 per cent, i.e., slotting doubles
the throughput. Both basic Aloha principles occur in many systems that implement distributed
access to a medium. Aloha systems work perfectly well under a light load, but they cannot give any
hard transmission guarantees, such as maximum delay before accessing the medium or minimum
throughput.
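The 18 and 36 per cent figures quoted above are the maxima of the classical throughput formulas
S = G * e^(-2G) for pure Aloha and S = G * e^(-G) for slotted Aloha, where G is the offered load.
A quick numerical check:

    from math import exp

    def pure_aloha(G):                  # throughput of classical (unslotted) Aloha
        return G * exp(-2 * G)

    def slotted_aloha(G):               # throughput of slotted Aloha
        return G * exp(-G)

    # The maxima occur at G = 0.5 and G = 1.0 respectively.
    print(round(pure_aloha(0.5), 3))    # 0.184 -> about 18 per cent
    print(round(slotted_aloha(1.0), 3)) # 0.368 -> about 36 per cent (doubled)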

CDMA:
Code division multiple access systems apply codes with certain characteristics to the transmission
to separate different users in code space and to enable access to a shared medium without
interference.

All terminals send on the same frequency, possibly at the same time, and can use the whole
bandwidth of the transmission channel. Each sender has a unique pseudo-random number (the
spreading code), and the sender XORs its signal with this code. The receiver can “tune” into this
signal if it knows the sender's pseudo-random number; tuning is done via a correlation function
(a minimal sketch follows the lists below).
Disadvantages:
 higher complexity of a receiver (receiver cannot just listen into the medium and start
receiving if there is a signal)
 all signals should have the same strength at a receiver
Advantages:
 all terminals can use the same frequency, no planning needed
 huge code space (e.g. 2^32) compared to frequency space
 interference (e.g. white noise) is not coded
 forward error correction and encryption can be easily integrated
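As promised above, here is a toy illustration of XOR spreading and correlation-based despreading.
The chip sequences and their length are assumptions chosen for readability; real CDMA systems
use much longer codes and sum analogue signal strengths rather than working on clean bit lists.

    # Toy CDMA: spread each data bit by XOR with the sender's chip sequence,
    # then despread at the receiver by correlating against the same sequence.
    CODE_A = [0, 1, 0, 1, 1, 0]        # assumed chip sequence for sender A
    CODE_B = [1, 1, 0, 0, 1, 1]        # assumed chip sequence for sender B

    def spread(bits, code):
        return [b ^ c for b in bits for c in code]

    def despread(chips, code):
        bits, n = [], len(code)
        for i in range(0, len(chips), n):
            matches = sum(ch == c for ch, c in zip(chips[i:i + n], code))
            # full correlation -> bit 0 was sent; full anti-correlation -> bit 1
            bits.append(0 if matches > n // 2 else 1)
        return bits

    data = [1, 0, 1, 1]
    chips = spread(data, CODE_A)
    print(despread(chips, CODE_A))     # [1, 0, 1, 1] -- the receiver knows A's code
    print(despread(chips, CODE_B))     # wrong output -- the wrong code cannot "tune in"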

Q.2 Explain GSM Architecture with Diagram? 04 03 03 2


Ans:
GSM Architecture
MS - Mobile Station; BSS - Base Station Subsystem; BTS - Base Transceiver Station;
BSC - Base Station Controller; NSS - Network Subsystem; OSS - Operation Support System;
HLR - Home Location Register; VLR - Visitor Location Register; AUC - Authentication Center;
MSC - Mobile Switching Center; EIR - Equipment Identity Register

Mobile station -- these are the user terminals; a number of MSs are served by one BTS.
1. The mobile stations (MS) communicate with the base station subsystem over the radio interface.
2. The BSS, also called the radio subsystem, provides and manages the radio transmission path between the
mobile stations and the Mobile Switching Centre (MSC). It also manages the radio interface between the
mobile stations and the other subsystems of GSM.
3. Each BSS comprises many Base Station Controllers (BSC) that connect the mobile station to the network
and switching subsystem (NSS) through the mobile switching center.
4. The NSS controls the switching functions of the GSM system. It allows the mobile switching center to
communicate with networks like PSTN, ISDN, CSPDN, PSPDN and other data networks.
5. The operation support system (OSS) allows the operation and maintenance of the GSM system. It allows
the system engineers to diagnose, troubleshoot and observe the parameters of the GSM system. The OSS
subsystem interacts with the other subsystems and is provided for the GSM operating company staff that
provides service facilities for the network.
Base station subsystem (BSS) -- The base station subsystem comprises two parts:
1. Base Transceiver Station (BTS)
2. Base Station Controller (BSC)
The BSS consists of many BSCs that connect to a single MSC. Each BSC controls up to several
hundred BTSs.
Base Transceiver Station (BTS)
It has the radio transceivers that define a cell and is capable of handling radio link protocols with
the MS.
Functions of the BTS are
1. Handling radio link protocols
2. Providing full-duplex communication to the MS
3. Interleaving and de-interleaving
Base station controller (BSC): It manages radio resources for one or more BTSs. It controls up to
several hundred BTSs, all connected to a single MSC.
Functions of the BSC are
• To control the BTSs
• Radio resource management
• Handoff management and control
• Radio channel setup and frequency hopping
Network subsystem (NSS)
1. It handles the switching of GSM calls between external networks and the internal BSCs.
2. It includes three different databases for mobility management:
A. HLR (Home Location Register)
B. VLR (Visitor Location Register)
C. AUC (Authentication center)
Mobile switching center (MSC)--
It connects to fixed networks like ISDN, PSTN, etc.
Following are the functions of the MSC:
1. Call setup, supervision and release
2. Collection of billing information
3. Call handling / routing
4. Management of signaling protocols
5. Maintaining records in the VLR and HLR
HLR (Home Location Register) - Call roaming and call routing capabilities of GSM are
handled here. It stores all the administrative information of subscribers registered in the network. It
maintains the unique international mobile subscriber identity (IMSI).
VLR (Visitor Location Register) - It is a temporary database. It stores the IMSI and
customer information for each roaming customer visiting a specific MSC.
Authentication center - It is a protected database. It maintains authentication keys and algorithms.
Associated with it is a register called the Equipment Identity Register (EIR).
Operation subsystem (OSS) - It manages all mobile equipment in the system and performs:
1) Management of charging and billing procedures
2) Maintenance of all hardware and network operations
AuC:
The AuC database holds the different algorithms that are used for authentication and encryption of
the mobile subscribers, verifying the mobile user’s identity and ensuring the confidentiality of each call.
The AuC holds the authentication and encryption keys for all the subscribers in both the home and
visitor location register.
EIR:
The EIR is another database that keeps information about the identity of mobile equipment, such
as the International Mobile Equipment Identity (IMEI), which reveals details about the manufacturer,
country of production, and device type. This information is used to prevent calls from being
misused, to prevent unauthorized or defective MSs, to report stolen mobile phones or check if the
mobile phone is operating according to the specification of its type.

Q.3 Define Near and far terminals? 02 03 01 1


Ans:
Consider the situation shown below. A and B are both sending with the same transmission power.
 Signal strength decreases proportional to the square of the distance
 So, B’s signal drowns out A’s signal, making C unable to receive A’s transmission
 If C is an arbiter for sending rights, B drowns out A’s signal on the physical layer, making C
unable to hear A.
The near/far effect is a severe problem of wireless networks using CDM. All signals should arrive
at the receiver with more or less the same strength, for which precise power control has to be
implemented.
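The inverse-square statement above can be made concrete: with equal transmit power, a sender ten
times closer to the receiver arrives a hundred times stronger, so the nearer sender must reduce its
power by exactly that factor for both signals to arrive equally strong. A small sketch under this
idealized free-space assumption:

    from math import isclose

    def received_power(p_tx, distance):
        # Toy free-space model: received power falls off as 1/d^2 (constants dropped).
        return p_tx / distance ** 2

    P = 1.0                                  # A and B transmit with the same power
    p_a = received_power(P, 100.0)           # A is far from receiver C
    p_b = received_power(P, 10.0)            # B is near: arrives 100x stronger than A
    print(p_b / p_a)                         # 100.0 -> B drowns A out at C

    # Power control: B scales its transmit power down by (10/100)^2 = 1/100.
    p_b_controlled = received_power(P * (10.0 / 100.0) ** 2, 10.0)
    print(isclose(p_b_controlled, p_a))      # True -> both now arrive equally strong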

Q.4 Explain GSM Architecture? 04 04 03 2


Ans:
The GSM architecture is the same as described in the answer to Q.2 above: the MS, the BSS
(comprising BTS and BSC), the NSS (comprising MSC, HLR, VLR and AUC) and the OSS; refer
to that answer for the full explanation. In addition, the EIR maintains three lists of IMEIs:

White list:
This list contains the IMEIs of the phones that are allowed to enter the network.

Black list:
This list, on the contrary, contains the IMEIs of the phones that are not allowed
to enter the network, for example because they are stolen.

Grey list:
This list contains the IMEIs of the phones temporarily not allowed to enter the
network, for example because the software version is too old or because they are in
repair.

Q.5 Explain different types of Frequency Channels in GSM? 04 04 03 2


Ans:
There are two main types of GSM channels viz. physical channel and logical channel. Physical
channel is specified by a specific time slot/carrier frequency. Logical channels run over physical
channels, i.e., logical channels are time multiplexed on physical channels; each physical channel (time
slot at one particular ARFCN) will have either a 26-frame MF (multi-frame) or a 51-frame MF
structure, as described here. Logical channels are classified into traffic channels and control channels.
Traffic channels carry user data. Control channels are interspersed with traffic channels in well
specified ways.

Fig:- Logical vs physical GSM channels


For example, every 26 TDMA frames a logical channel gets bandwidth in a physical channel.
Traffic channels are mainly of two types: half-rate and full-rate traffic channels. There are various
control channels such as the BCCH (broadcast control channel), SCH (synchronization channel),
FCCH (frequency correction channel) and DCCH (dedicated control channel).
All these GSM channels help maintain the GSM network; they also help a GSM mobile phone
connect to the network, maintain the connection, and tear the connection down. The figure below
shows all the channels used in GSM.
Q.6 Define GSM Burst? 02 04 03 1

Ans:
The information contained in one time slot of the TDMA frame is called a burst.
There are five types of bursts:
1) Normal Burst (NB)
2) Frequency Correction Burst (FB)
3) Synchronization Burst (SB)
4) Access Burst (AB)
5) Dummy Burst
Assignment No.-03 (AY 2018-19 SEM-II)
Unit-V: Current 3G and 4G Technologies for GSM and CDMA &
Unit–VI: Advances in Mobile Technologies

Branch: Computer Engineering


Class: BE Computer
Subject: ELE-II Mobile Communication 410245(D) Max. Marks: 20

Q. No.  Questions  Max. Marks  Unit No. as per Syllabus  CO mapped  Bloom's Taxonomy Level
Section A
Q.1 Explain HSPA? 04 05 03 2
Q.2 Draw & Explain LTE in 4G Architecture? 04 05 04 2
Q.3 Short note on Edge Technology? 02 05 04 1
Q.4 Explain 5GAA? 04 06 04 2
Q.5 Explain LTE based MULTIFIRE. 04 06 05 2
Q.6 Compare Virtual Reality & Augmented Reality? 02 06 05 1

Solution

Q.1 Explain HSPA? 04 05 03 2


Ans:
HSPA - High Speed Packet Access

UMTS HSPA, High Speed Packet Access, combines HSDPA and HSUPA for uplink and
downlink to provide high speed data access.
3G HSPA, High Speed Packet Access, is the combination of two technologies, one for the
downlink and the other for the uplink, that can be built onto the existing 3G UMTS or W-
CDMA technology to provide increased data transfer speeds.
The original 3G UMTS / W-CDMA standard provided a maximum download speed of 384
kbps.
With many users requiring much higher data transfer speeds to compete with fixed line
broadband services, and also to support services that require higher data rates, an
increase in the obtainable speeds became necessary.
This resulted in the development of the technologies for 3G HSPA.

3G HSPA benefits
The UMTS cellular system as defined under the 3GPP Release 99 standard was orientated
more towards switched circuit operation and was not well suited to packet operation.
Additionally greater speeds were required by users than could be provided with the original
UMTS networks. Accordingly the changes required for HSPA were incorporated into many
UMTS networks to enable them to operate more in the manner required for current
applications.
HSPA provides a number of significant benefits that enable the new service to provide a far
better performance for the user. While 3G UMTS HSPA offers higher data transfer rates, this
is not the only benefit, as the system offers many other improvements as well:

1. Use of higher order modulation: 16QAM is used in the downlink instead of QPSK to
enable data to be transmitted at a higher rate. This provides for maximum data rates of 14
Mbps in the downlink. QPSK is still used in the uplink, where data rates of up to 5.8 Mbps are
achieved. The quoted figures are raw data rates and do not include reductions in actual
payload data resulting from the protocol overheads (a quick calculation follows this list).
2. Shorter Transmission Time Interval (TTI): The use of a shorter TTI reduces the round trip time
and enables improvements in adapting to fast channel variations and provides for reductions in latency.
3. Use of shared channel transmission: Sharing the resources enables greater levels of efficiency to be
achieved and integrates with IP and packet data concepts.
4. Use of link adaptation: By adapting the link it is possible to maximize the channel usage.
5. Fast Node B scheduling: The use of fast scheduling with adaptive coding and modulation (only
downlink) enables the system to respond to the varying radio channel and interference conditions and
to accommodate data traffic which tends to be "bursty" in nature.
6. Node B based Hybrid ARQ: This enables 3G HSPA to provide reduced retransmission
round trip times and it adds robustness to the system by allowing soft combining of
retransmissions.
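The modulation gain named in item 1 can be sanity-checked in one line: the raw bit rate scales
with bits per symbol, i.e. log2 of the constellation size, holding the symbol rate and coding fixed
(a simplifying assumption):

    from math import log2

    qpsk_bits = log2(4)               # QPSK carries 2 bits per symbol
    qam16_bits = log2(16)             # 16QAM carries 4 bits per symbol
    print(qam16_bits / qpsk_bits)     # 2.0 -> 16QAM doubles the raw rate per symbol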

For the network operator, the introduction of 3G HSPA technology brings a cost reduction
per bit carried as well as an increase in system capacity. With the increase in data traffic, and
operators looking to bring in increased revenue from data transmission, this is a particularly
attractive proposition. A further advantage of the introduction of 3G HSPA is that it can
often be rolled out by incorporating a software update into the system. This means its use
brings significant benefits to user and operator alike.

3G UMTS HSPA constituents


There are two main components to 3G UMTS HSPA, each addressing one of the links
between the base station and the user equipment, i.e. one for the uplink, and one for the
downlink.

Uplink and downlink transmission directions

The two technologies were released at different times through 3GPP. They also have
different properties resulting from the different modes of operation that are required. In view
of these facts they were often treated as almost separate entities. Now they are generally
rolled out together. The two technologies are summarised below:

 HSDPA - High Speed Downlink Packet Access: HSDPA provides packet data
support, reduced delays, and a peak raw data rate (i.e. over the air) of 14 Mbps. It also
provides around three times the capacity of the 3G UMTS technology defined in
Release 99 of the 3GPP UMTS standard.
 HSUPA - High Speed Uplink Packet Access: HSUPA provides improved uplink
packet support, reduced delays and a peak raw data rate of 5.74 Mbps. This results
in a capacity increase of around twice that provided by the Release 99 services.

Q.2 Draw & Explain LTE in 4G Architecture? 04 05 04 2


Ans:
LTE/4G

The high-level network architecture of LTE comprises the following three main components:
1. The User Equipment (UE).
2. The Evolved UMTS Terrestrial Radio Access Network (E-UTRAN).
3. The Evolved Packet Core (EPC).
The evolved packet core communicates with packet data networks in the outside world such
as the internet, private corporate networks or the IP multimedia subsystem. The interfaces
between the different parts of the system are denoted Uu, S1 and SGi as shown below:

The User Equipment (UE)


The internal architecture of the user equipment for LTE is identical to the one used by
UMTS and GSM, which is actually a Mobile Equipment (ME). The mobile equipment
comprises the following important modules:
1. Mobile Termination (MT) : This handles all the communication functions.
2. Terminal Equipment (TE) : This terminates the data streams.
3. Universal Integrated Circuit Card (UICC): This is also known as the SIM card for LTE
equipment. It runs an application known as the Universal Subscriber Identity Module (USIM).
A USIM stores user-specific data very similar to a 3G SIM card. This keeps information
about the user's phone number, home network identity, security keys, etc.

The E-UTRAN (The access network)


The architecture of the evolved UMTS Terrestrial Radio Access Network (E-UTRAN) has
been illustrated below.
The E-UTRAN handles the radio communications between the mobile and the evolved packet core

and just has one component, the evolved base stations, called eNodeB or eNB. Each eNB is a
base station that controls the mobiles in one or more cells. The base station that is communicating
with a mobile is known as its serving eNB.
LTE Mobile communicates with just one base station and one cell at a time and there are
following two main functions supported by eNB:
 The eNB sends and receives radio transmissions to all the mobiles using the analogue and
digital signal processing functions of the LTE air interface.
 The eNB controls the low-level operation of all its mobiles, by sending them signalling
messages such as handover commands.
Each eNB connects with the EPC by means of the S1 interface and it can also be connected to nearby
base stations by the X2 interface, which is mainly used for signaling and packet forwarding during
handover.
A home eNB (HeNB) is a base station that has been purchased by a user to provide femtocell
coverage within the home. A home eNB belongs to a closed
subscriber group (CSG) and can only be accessed by mobiles with a USIM that also
belongs to the closed subscriber group.
The Evolved Packet Core (EPC) (The core network)
The architecture of the Evolved Packet Core (EPC) has been illustrated below. There are a few more
components which have not been shown in the diagram to keep it simple, such as
the Earthquake and Tsunami Warning System (ETWS), the Equipment Identity Register (EIR) and
the Policy Control and Charging Rules Function (PCRF).
Below is a brief description of each of the components shown in the above architecture:
 The Home Subscriber Server (HSS) component has been carried forward from UMTS and

GSM and is a central database that contains information about all the network operator's
subscribers.
 The Packet Data Network (PDN) Gateway (P-GW) communicates with the outside world, i.e.
packet data networks PDN, using SGi interface. Each packet data network is identified by an
access point name (APN). The PDN gateway has the same role as the GPRS support node
(GGSN) and the serving GPRS support node (SGSN) with UMTS and GSM.
 The serving gateway (S-GW) acts as a router, and forwards data between the base station
and the PDN gateway.
 The mobility management entity (MME) controls the high-level operation of the mobile by
means of signalling messages and Home Subscriber Server (HSS).
 The Policy Control and Charging Rules Function (PCRF) is a component which is not
shown in the above diagram but it is responsible for policy
control decision-making, as well as for controlling the flow-based charging functionalities in
the Policy Control Enforcement Function (PCEF), which resides in the P-GW.
The interface between the serving and PDN gateways is known as S5/S8. This has two slightly
different implementations, namely S5 if the two devices are in the same network, and S8 if they are
in different networks.

Functional split between the E-UTRAN and the EPC


Following diagram shows the functional split between the E-UTRAN and the EPC for an
LTE network:

Q.3 Short note on Edge Technology? 02 05 04 1


Ans:
EDGE technology is an extended version of GSM. It allows clear and fast transmission of data
and information. EDGE is also termed IMT-SC, or single carrier. EDGE technology was invented
and introduced by Cingular, which is now known as AT&T. EDGE is a radio technology and is a
part of third generation technologies. EDGE technology is preferred over GSM due to its flexibility
to carry packet switched data and circuit switched data.
EDGE technology
EDGE is termed a backward compatible technology; a backward compatible technology is one
that still supports the older generation of devices. EDGE technology is supported by the Third
Generation Partnership Project; this association helps and supports the upgrading of GSM,
EDGE technology and other related technologies. The frequency, capability and performance of
EDGE technology exceed those of 2G GSM technology. EDGE technology uses more
sophisticated coding and transmission of data. EDGE technology can help you connect to the
internet; it supports the packet switching system. EDGE provides a broadband
internet connection for its users and helps them exploit multimedia services.
EDGE technology does not involve the expense of additional hardware and software; it
only requires the base station to install an EDGE transceiver. EDGE technology is an
improved technology which almost all network vendors support; all they have to do is
upgrade their stations. EDGE has its edge because it can make use of both circuit switched
technology and packet switched technology. EDGE technology is also believed to support EGPRS,
in other words enhanced general packet radio service. It is important to have a GPRS network if one
wants to use EDGE technology, because EDGE cannot work without GSM technology; it is
therefore an extended version of GSM technology.
Differences between GSM Technology and EDGE Technology
As an extended version of GSM technology, EDGE is similar to GSM. EDGE technology and GSM
technology have an identical symbol rate, because EDGE does not require any supplementary
hardware or software; there is only a minor difference between the two technologies when it comes
to modulation. EDGE technology is three times faster than ordinary GPRS; EDGE provides three
times faster access to data as compared to GSM technology alone.
Advantages of EDGE Technology
For the upper five modulation layers EDGE technology uses 8PSK, which carries three bits per
symbol; this triples the gross data rate relative to GSM technology. Data travels at a rapid speed over
the network, allowing a convenient and proficient flow of data. Access to the internet and to
wireless access points is easy. Multimedia messages and short message services are carried
much faster as compared to GSM, and data is transferred quickly between sender and receiver.
EDGE technology has provided viable growth to the internet and to mobile phone operators and
subscribers, and enables mobile users to make
use of the most advanced multimedia services. This technology offers services such as video
downloads, net surfing and browsing. The technology is much easier to use for people who have
used GPRS and GSM at large. The latest mobile phone sets switch to enhanced data rates for GSM
technology easily.
The use of EDGE technology has augmented the use of BlackBerry, N97 and N95 mobile phones.
EDGE transfers data in fewer seconds if we compare it with GPRS technology. For example, a
typical text file of 40 KB is transferred in only 2 seconds, as compared to the transfer over GPRS
technology, which takes 6 seconds. The biggest advantage of using EDGE technology is that one does not
need to install any additional hardware and software in order to make use of EDGE Technology.
There are no additional charges for exploiting this technology. If a person is an ex GPRS
Technology user he can utilize this technology without paying any additional charges.
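The 40 KB example above is consistent with simple arithmetic: 40 KB is 320 kilobits, and dividing
by an assumed effective throughput of about 160 kbps for EDGE versus about 53 kbps for GPRS
reproduces the quoted 2 s versus 6 s (the throughput figures are assumptions chosen to match the
example):

    file_kbits = 40 * 8                 # a 40 KB file is 320 kilobits
    edge_kbps, gprs_kbps = 160, 53      # assumed effective throughputs (illustrative)

    print(round(file_kbits / edge_kbps, 1))   # 2.0 seconds over EDGE
    print(round(file_kbits / gprs_kbps, 1))   # 6.0 seconds over GPRS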
Q.4 Explain 5GAA? 04 06 04 2
Ans:
The 5G Automotive Association (5GAA) is an international, global, cross-industry organization of
companies from the automotive, technology, and telecommunications industries. Its goal is to
develop end-to-end solutions for future mobility and transportation services, avoiding
incompatibility problems from the beginning.
The 5G Automotive Association was created in September 2016 by Audi AG, BMW Group and
Daimler AG from the side of car makers; Ericsson, Huawei, Intel and Nokia as producers of
telecommunications equipment; and Qualcomm as a firmware manufacturer [1][2]. In 2017 the 5G
Automotive Association signed a letter of intent with the European Automotive Telecom Alliance
(EATA) for cooperation.
The 5G Automotive Association works for the standardization needed for the implementation of
driverless, autonomous driving in cooperation with standards organizations such as ETSI, 3GPP and
SAE.
A second task is informing the public about the emerging technology by demonstrating
feasibility and holding conferences on the topic.

Q.5 Explain LTE based MULTIFIRE. 04 06 05 2


Ans:
MulteFire is an LTE-based technology that operates standalone in unlicensed and shared spectrum,
including the global 5 GHz band. Based on 3GPP Release 13 and 14, MulteFire technology supports
Listen-Before-Talk for fair co-existence with Wi-Fi and other technologies operating in the same
spectrum. It supports private LTE and neutral host deployment models. Target vertical markets
include industrial IoT, enterprise, cable, and various other vertical markets.
The MulteFire Release 1.0 specification was developed by the MulteFire Alliance, an independent,
diverse and international member-driven consortium. Release 1.0 was published to MulteFire
Alliance members in January 2017 and was made publicly available in April 2017. The MulteFire
Alliance is currently working on Release 1.1 which will add further optimizations for IoT and new
spectrum bands.
According to Harbor Research in its published white paper, the market opportunity for private LTE
networks for industrial and commercial IoT will reach $118.5 billion in 2023. It also reported that
the total addressable revenue for Enterprise markets deploying private and neutral host LTE with
MulteFire will reach $5.7 billion by 2025.
The MulteFire Alliance has grown to more than 40 members. Its board members include Boingo
Wireless, CableLabs, Ericsson, Huawei, Intel, Nokia, Qualcomm and SoftBank. The organization is
open to any company with an interest in advancing LTE and cellular technology in unlicensed and
shared spectrum.
Q.6 Compare Virtual Reality & Augmented Reality? 02 06 05 1
Ans:
Augmented reality (AR) adds digital elements to a live view, often by using the camera on a
smartphone. Examples of augmented reality experiences include Snapchat lenses and the game
Pokemon Go.
Virtual reality (VR) implies a complete immersion experience that shuts out the physical world.
Using VR devices such as HTC Vive, Oculus Rift or Google Cardboard, users can be transported
into a number of real-world and imagined environments such as the middle of a squawking penguin
colony or even the back of a dragon.
Class Test Question Papers
Class Test 1
Class Test 1 Solution
Q.1 Draw & Explain PCS Architecture?
Ans:

Fig:- PCS Architecture

A personal communications service (PCS) is a type of wireless mobile service with advanced
coverage that delivers services at a more personal level. It generally refers to modern
mobile communication that boosts the capabilities of conventional cellular networks and fixed-line
telephony networks as well.
PCS is also known as digital cellular.
A PCS works similarly to a cellular network in basic operations, but requires more service provider
infrastructure to cover a wider geographical area. A PCS generally includes the following:

 Wireless communication (data, voice and video)


 Mobile PBX
 Paging and texting
 Wireless radio
 Personal communication networks
 Satellite communication systems, etc.

PCS has three broad categories: narrowband, broadband and unlicensed. TDMA, CDMA and GSM,
and 2G, 3G and 4G are some of the common technologies that are used to deliver a PCS.
Q.2 Short note on Cell phone generation-(1G to 5G)?
Ans:

1G - First Generation

This was the first generation of cell phone technology. The very first generation of commercial
cellular networks was introduced in the late 70's, with fully implemented standards being established
throughout the 80's. In 1987, Telecom (known today as Telstra) gave Australia
its first cellular mobile phone network, utilising a 1G analog system. 1G is an analog
technology; the phones generally had poor battery life and voice quality, offered little
security, and would sometimes experience dropped calls. These are the analog
telecommunications standards that were introduced in the 1980s and continued until being replaced
by 2G digital telecommunications. The maximum speed of 1G is 2.4 Kbps.

2G - Second Generation
Cell phones received their first major upgrade when they went from 1G to 2G. The main difference
between the two mobile telephone systems (1G and 2G) is that the radio signals used by 1G
networks are analog, while 2G networks are digital. The main motive of this generation was to
provide a secure and reliable communication channel. It implemented the concepts of CDMA and
GSM and provided small data services like SMS and MMS. Second generation 2G cellular telecom
networks were commercially launched on the GSM standard in Finland by Radiolinja (now part of
Elisa Oyj) in 1991. 2G capabilities are achieved by allowing multiple users on a single channel via
multiplexing. With 2G, cellular phones came to be used for data as well as voice. The advance in
technology from 1G to 2G introduced many of the fundamental services that we still use today,
such as SMS, international roaming, conference calls, call hold and billing based on services, e.g.
charges based on long distance calls and real time billing. The max speed of 2G with General
Packet Radio Service (GPRS) is 50 Kbps, or 1 Mbps with Enhanced Data Rates for GSM
Evolution (EDGE). Before the major leap from 2G to 3G wireless networks, the lesser-
known 2.5G and 2.75G were interim standards that bridged the gap.

3G - Third Generation
This generation set the standards for most of the wireless technology we have come to know and
love. Web browsing, email, video downloading, picture sharing and other Smartphone technology
were introduced in the third generation. Introduced commercially in 2001, the goals set out for third
generation mobile communication were to facilitate greater voice and data capacity, support a wider
range of applications, and increase data transmission at a lower cost.
The 3G standard utilises a new technology called UMTS as its core network architecture -
Universal Mobile Telecommunications System. This network combines aspects of the 2G network
with some new technology and protocols to deliver a significantly faster data rate. 3G is based on a
set of standards for mobile devices and mobile telecommunications services and networks that
comply with the International Mobile Telecommunications-2000 (IMT-2000) specifications of
the International Telecommunication Union. One of the requirements set by IMT-2000 was that
speed should be at least 200 Kbps for a service to be called 3G.
3G supports multimedia services, and with it streaming became more popular. In 3G, universal
access and portability across different device types are made possible (telephones, PDAs, etc.). 3G
increased the efficiency of the frequency spectrum by improving how audio is compressed during a
call, so more simultaneous calls can happen in the same frequency range. The UN's International
Telecommunications Union IMT-2000 standard requires stationary speeds of 2 Mbps and mobile
speeds of 384 Kbps for a "true" 3G. The theoretical max speed for HSPA+ is 21.6 Mbps.
Like 2G, 3G evolved into 3.5G and 3.75G as more features were introduced in order to bring about
4G. A 3G phone cannot communicate through a 4G network, but newer generations of phones are
practically always designed to be backward compatible, so a 4G phone can communicate through a
3G or even 2G network.

4G - Fourth Generation
4G is a very different technology as compared to 3G, and was made possible practically only
because of the advancements in technology in the last 10 years. Its purpose is to provide high
speed, high quality and high capacity to users while improving security and lowering the cost of
voice and data services, multimedia and internet over IP. Potential and current applications include
amended mobile web access, IP telephony, gaming services, high-definition mobile TV, video
conferencing, 3D television, and cloud computing.
The key technologies that have made this possible are MIMO (Multiple Input Multiple Output) and
OFDM (Orthogonal Frequency Division Multiplexing). The two important 4G standards are
WiMAX (has now fizzled out) and LTE (has seen widespread deployment). LTE (Long Term
Evolution) is a series of upgrades to existing UMTS technology and will be rolled out on Telstra's
existing 1800MHz frequency band. The max speed of a 4G network is 100 Mbps when the device is
moving, or 1 Gbps for low mobility communication such as when stationary or walking; latency is
reduced from around 300 ms to less than 100 ms, and congestion is significantly lower. When 4G
first became available, it was simply a little faster than 3G. 4G is not the same as 4G LTE, which is
very close to meeting the criteria of the standards. To download a new game or stream a TV show
in HD, you can do it without buffering.
Newer generations of phones are usually designed to be backward-compatible, so a 4G phone can communicate through a 3G or even 2G network. All carriers seem to agree that OFDM is one of the
chief indicators that a service can be legitimately marketed as being 4G. OFDM is a type of digital
modulation in which a signal is split into several narrowband channels at different frequencies.
A significant amount of infrastructure change is needed from service providers in order to supply 4G, because voice calls in GSM, UMTS and CDMA2000 are circuit switched; with the adoption of LTE, carriers have to re-engineer their voice-call network.
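Since OFDM is the common thread in these 4G standards, a small illustration may help. The following Python sketch (illustrative only; all variable names are hypothetical, and a real OFDM chain adds pilots, coding and equalisation) shows the core idea of splitting a signal across narrowband subcarriers with an IFFT and recovering it with an FFT:

# Minimal OFDM sketch: map QPSK symbols onto subcarriers with an IFFT,
# then recover them with an FFT (perfect, noiseless channel assumed).
import numpy as np

n_subcarriers = 64                        # number of narrowband channels
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, 2 * n_subcarriers)
# Simple QPSK mapping: one complex symbol per subcarrier
symbols = (1 - 2 * bits[0::2]) + 1j * (1 - 2 * bits[1::2])

tx_time = np.fft.ifft(symbols)            # one OFDM symbol in the time domain
cyclic_prefix = tx_time[-16:]             # guards against multipath echoes
tx_signal = np.concatenate([cyclic_prefix, tx_time])

rx_time = tx_signal[16:]                  # receiver strips the cyclic prefix
rx_symbols = np.fft.fft(rx_time)          # back to per-subcarrier symbols
assert np.allclose(symbols, rx_symbols)   # noiseless channel: exact recovery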
And again, we have the fractional parts: 4.5G and 4.9G mark the transition of LTE (in the stage called LTE-Advanced Pro) towards more MIMO and more device-to-device (D2D) communication on the way to IMT-2020 and the requirements of 5G.
5G - Fifth Generation
5G is the generation currently under development that is intended to improve on 4G. 5G promises significantly faster data rates, higher connection density and much lower latency, among other improvements. Some of the plans for 5G include device-to-device communication, better battery consumption, and improved overall wireless coverage. The max speed of 5G is aimed at being as fast as 35.46 Gbps, which is over 35 times faster than 4G.
Key technologies to look out for: massive MIMO, millimetre-wave mobile communications, etc. Massive MIMO, millimetre wave, small cells and Li-Fi, all new technologies from the previous decade, could be used to give 10 Gb/s to a user with unprecedentedly low latency, and allow connections for at least 100 billion devices. Different estimates have been made for the date of commercial introduction of 5G networks. The Next Generation Mobile Networks Alliance feels that 5G should be rolled out by 2020 to meet business and consumer demands.

Q.3 Define and Explain handoff or handover?
Ans:
Although the concept of cellular handover or cellular handoff is relatively straightforward, it is not
an easy process to implement in reality. The cellular network needs to decide when handover or
handoff is necessary, and to which cell. Also when the handover occurs it is necessary to re-route
the call to the relevant base station along with changing the communication between the mobile and
the base station to a new channel. All of this needs to be undertaken without any noticeable
interruption to the call. The process is quite complicated, and in early systems calls were often lost
if the process did not work correctly.
Different cellular standards handle hand over / handoff in slightly different ways. Therefore for the
sake of an explanation the example of the way that GSM handles handover is given.
There are a number of parameters that need to be known to determine whether a handover is required: the signal strength of the base station with which communication is currently being made, the signal strengths of the surrounding stations, and the availability of channels. The mobile is best placed to monitor the strength of the base stations, but only the cellular network knows the status of channel availability, and it is the network that decides when the handover is to take place and to which channel of which cell.
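A minimal sketch of that decision logic (purely illustrative: the hysteresis margin, names and data structures are assumptions, not part of any cellular standard) might look like this in Python:

# Illustrative handover decision: hand over only when a neighbouring cell
# is stronger than the serving cell by a hysteresis margin (to avoid
# ping-ponging) and the network has a free channel in that cell.
HYSTERESIS_DB = 3.0

def choose_handover_target(serving_rssi_dbm, neighbour_rssi_dbm, free_channels):
    """neighbour_rssi_dbm: {cell_id: RSSI}; free_channels: {cell_id: count}."""
    best_cell = None
    best_rssi = serving_rssi_dbm + HYSTERESIS_DB
    for cell, rssi in neighbour_rssi_dbm.items():
        if rssi > best_rssi and free_channels.get(cell, 0) > 0:
            best_cell, best_rssi = cell, rssi
    return best_cell  # None means: stay on the serving cell

# Neighbour "B" is 5 dB stronger and has a free channel -> handover target
print(choose_handover_target(-85.0, {"A": -84.0, "B": -80.0}, {"B": 4}))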
Types of handover / handoff
With the advent of CDMA systems, where the same channels can be used by several mobiles and where it is possible for adjacent cells or cell sectors to use the same frequency channel, there are a number of different types of handover that can be performed:
• Hard handover (hard handoff)
• Soft handover (soft handoff)

Fig: Types of Handover

Hard handover

The definition of a hard handover or handoff is one where an existing connection must be broken
before the new one is established. One example of hard handover is when frequencies are changed.
As the mobile will normally only be able to transmit on one frequency at a time, the connection
must be broken before it can move to the new channel where the connection is re-established. This
is often termed an inter-frequency hard handover. While this is the most common form of hard
handoff, it is not the only one. It is also possible to have intra-frequency hard handovers where the
frequency channel remains the same.

Although there is generally a short break in transmission, this is normally short enough not to be
noticed by the user.

Soft handover

The new 3G technologies use CDMA where it is possible to have neighbouring cells on the same
frequency and this opens the possibility of having a form of handover or handoff where it is not
necessary to break the connection. This is called soft handover or soft handoff, and it is defined as a
handover where a new connection is established before the old one is released. In UMTS most of
the handovers that are performed are intra-frequency soft handovers.

Q.4 Short note on
1. Mobile Station
2. SIM
3. Base Station
Ans:
1. Mobile Station

Fig. GSM Architecture
The MS consists of the physical equipment, such as the radio transceiver, display and digital signal
processors, and the SIM card. It provides the air interface to the user in GSM networks. As such,
other services are also provided, which include:

• Voice teleservices
• Data bearer services
• Supplementary service features

The MS Functions
The MS also serves as the receptor for SMS messages, enabling the user to toggle between voice and data use. Moreover, the mobile facilitates access to voice messaging systems. The MS also provides access to the various data services available in a GSM network. These data services include:
• X.25 packet switching through a synchronous or asynchronous dial-up connection to the PAD at speeds of typically 9.6 Kbps.
• General Packet Radio Service (GPRS) using either an X.25 or IP based data transfer method at speeds up to 115 Kbps.
• High speed, circuit switched data at speeds up to 64 Kbps.

Working of Mobile Station
The complete operation of a Mobile Station can be divided into four different phases of
operation:
1. Initialization Phase
2. Cell Selection Phase
3. Idle Phase
4. Dedicated Mode
Initialization Phase: Initialization of Mobile Station is the first phase that prepares the mobile for
further phases of operation. Initialization phase begins as soon as the mobile is switched on.
Initialization can be accomplished with or without a SIM (Subscriber Identity Module); however, initialization without a SIM does not allow the mobile to initiate or receive any calls, although some mobile operators allow emergency calls even without a SIM. In the case of initialization with a SIM, the mobile station uses the identity stored on the SIM to establish a connection with the home network. The SIM can be set to ask for a PIN every time it is inserted into a new mobile station.
Cell Selection Phase: A mobile station can receive signals from more than one transmitting station, but it connects to the single cell that has the strongest signal. Each cell has its own transmitting station, so by selecting the strongest signal a mobile station selects the nearest cell.
Idle Phase: Once a mobile station is connected to the strongest cell, it goes into an idle phase where it waits for the network to initiate any incoming calls.
Dedicated Mode: When a mobile station receives an incoming call and accepts it, a communication channel is reserved between the mobile station and the network for as long as the call continues. The communication channel thus works as a dedicated channel, and this mode is referred to as the dedicated mode.
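The cell selection phase above amounts to a simple maximum over the measured signal strengths. A minimal sketch (cell names and values are made-up examples):

# Cell selection sketch: the MS measures the received signal strength of
# every cell it can hear and camps on the strongest one.
def select_cell(measurements):
    """measurements: {cell_id: RSSI in dBm}; returns the strongest cell."""
    return max(measurements, key=measurements.get)

print(select_cell({"cell-1": -95.0, "cell-2": -78.5, "cell-3": -88.0}))
# -> "cell-2", the nearest/strongest transmitting station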

2. SIM
A subscriber identity module or subscriber identification module (SIM), widely known as a SIM
card, is an integrated circuit that is intended to securely store the international mobile subscriber
identity (IMSI) number and its related key, which are used to identify and authenticate subscribers
on mobile telephony devices (such as mobile phones and computers). It is also possible to store
contact information on many SIM cards. SIM cards are always used on GSM phones; for CDMA
phones, they are only needed for newer LTE-capable handsets. SIM cards can also be used in
satellite phones, smart watches, computers, or cameras.
The SIM circuit is part of the function of a universal integrated circuit card (UICC) physical smart
card, which is usually made of PVC with embedded contacts and semiconductors. SIM cards are
transferable between different mobile devices. The first UICC smart cards were the size of credit
and bank cards; sizes were reduced several times over the years, usually keeping electrical contacts
the same, so that a larger card could be cut down to a smaller size.
A SIM card contains its unique serial number (ICCID), international mobile subscriber identity
(IMSI) number, security authentication and ciphering information, temporary information related to
the local network, a list of the services the user has access to, and two passwords: a personal
identification number (PIN) for ordinary use, and a personal unblocking code (PUC) for PIN
unlocking.
The SIM provides personal mobility, so that the user can have access to all subscribed services irrespective of both the location of the terminal and the use of a specific terminal. By inserting the SIM card into another GSM cellular phone, the user can receive calls at that phone, make calls from that phone, or receive the other subscribed services.

3. Base Station
A base station is a fixed point of communication for customer cellular phones on a carrier network.
The base station is connected to an antenna (or multiple antennae) that receives and transmits
the signals in the cellular network to customer phones and cellular devices. That equipment is connected to a mobile switching station that connects cellular calls to the public switched telephone network (PSTN).
Class Test 2

CLASS TEST- II
(AY 2018-19)
Branch: B.E. Computer Engineering Date: 29 /09/ 2018
Semester: I Duration: 1 hour
Subject: EL-II: Mobile Communication ( 410245) Max. Marks: 20M
Note: 1. Attempt All Questions in Section A. 2. Attempt any 3 Questions in Section B.
3. All questions are as per course outcomes. 4. Assume suitable data wherever is required.

Bloom's Taxonomy Levels (BL): 1. Remember 2. Understand 3. Apply 4. Create

Question No. | Questions as per syllabus | Max. Marks | Unit No. | CO Mapped | Bloom's Taxonomy Level
Section A
01. A _________ stores user details permanently. (1 Mark, Unit 4, CO-3, BL-1)
    a) HLR b) MSC c) VLR d) AuC
02. A _________ stores user details temporarily. (1 Mark, Unit 4, CO-5, BL-1)
    a) AuC b) MSC c) VLR d) HLR
03. Which generation supports packet switching for calls and data? (1 Mark, Unit 5, CO-1, BL-2)
    a) 3G b) 2.5G c) 4G d) GPRS
04. Which of the following leads to the 3G evolution of GSM, IS-136 and PDC systems? (1 Mark, Unit 5, CO-4, BL-2)
    a) W-CDMA b) GPRS c) EDGE d) HSCSD
05. What does SGSN stand for? (1 Mark, Unit 4, CO-3, BL-1)
    a) Serial Gateway Supporting Node b) Supporting GGSN Support Node
    c) Serving GPRS Support Node d) Supporting Gateway Support Node
Section B
01. Explain GSM architecture with diagram. (5 Marks, Unit 4, CO-3, BL-2)
02. Draw and explain the General Packet Radio Service architecture. (5 Marks, Unit 4, CO-5, BL-4)
03. Explain UMTS architecture with diagram. (5 Marks, Unit 5, CO-1, BL-2)
04. Explain 4G architecture with diagram. (5 Marks, Unit 5, CO-4, BL-2)
Class Test 2 Solution
01. A _________ stores user details permanently.
    a) HLR b) MSC c) VLR d) AuC
Ans: a) HLR

02. A _________ stores user details temporarily.
    a) AuC b) MSC c) VLR d) HLR
Ans: c) VLR

03. Which generation supports packet switching for calls and data?
    a) 3G b) 2.5G c) 4G d) GPRS
Ans: c) 4G

04. Which of the following leads to the 3G evolution of GSM, IS-136 and PDC systems?
    a) W-CDMA b) GPRS c) EDGE d) HSCSD
Ans: a) W-CDMA

05. What does SGSN stand for?
    a) Serial Gateway Supporting Node b) Supporting GGSN Support Node
    c) Serving GPRS Support Node d) Supporting Gateway Support Node
Ans: c) Serving GPRS Support Node

01. Explain GSM architecture with diagram. (5 Marks)
Ans:
GSM Architecture
MS - Mobile Station; BSS - Base Station Subsystem; BTS - Base Transceiver Station; BSC - Base Station Controller; NSS - Network Switching Subsystem; OSS - Operation Support System; HLR - Home Location Register; VLR - Visitor Location Register; AuC - Authentication Centre; MSC - Mobile Switching Centre; EIR - Equipment Identity Register

Mobile station - These are the user terminals; a number of users are controlled by one BTS.
1. The mobile stations (MS) communicate with the base station subsystem over the radio interface.
2. The BSS, also called the radio subsystem, provides and manages the radio transmission path between the mobile stations and the Mobile Switching Centre (MSC). It also manages the radio interface between the mobile stations and the other subsystems of GSM.
3. Each BSS comprises many Base Station Controllers(BSC) that connect the mobile station to the
network and switching subsystem (NSS) through the mobile switching center
4. The NSS controls the switching functions of the GSM system. It allows the mobile switching
center to communicate with networks like PSTN, ISDN, CSPDN, PSPDN and other data networks.
5. The operation support system (OSS) allows the operation and maintenance of the GSM system. It
allows the system engineers to diagnose, troubleshoot and observe the parameters of the GSM
systems. The OSS subsystem interacts with the other subsystems and is provided for the GSM
operating company staff that provides service facilities for the network.
Base Station Subsystem (BSS) - The base station subsystem comprises two parts:
1. Base Transceiver Station (BTS)
2. Base Station Controller (BSC)
The BSS consists of many BSCs that connect to a single MSC; each BSC controls up to several hundred BTSs.
Base Transceiver Station (BTS) - The BTS has radio transceivers that define a cell and is capable of handling radio-link protocols with the MS.
Functions of the BTS are:
1. Handling radio-link protocols
2. Providing full-duplex communication to the MS
3. Interleaving and de-interleaving
Base Station Controller (BSC) - It manages radio resources for one or more BTSs. It controls several hundred BTSs, all of which are connected to a single MSC.
Functions of the BSC are:
• To control the BTSs
• Radio resource management
• Handoff management and control
• Radio channel setup and frequency hopping
Network Switching Subsystem (NSS)
1. It handles the switching of GSM calls between external networks and the internal BSCs.
2. It includes three different databases for mobility management:
   A. HLR (Home Location Register)
   B. VLR (Visitor Location Register)
   C. AuC (Authentication Centre)
Mobile Switching Centre (MSC) - It connects to fixed networks like ISDN, PSTN, etc.
Following are the functions of the MSC:
1. Call setup, supervision and release
2. Collection of billing information
3. Call handling / routing
4. Management of the signalling protocol
5. Keeping records in the VLR and HLR
HLR (Home Location Register) - The call roaming and call routing capabilities of GSM are handled here. It stores all the administrative information of the subscribers registered in the network and maintains the unique International Mobile Subscriber Identity (IMSI).
VLR (Visitor Location Register) - It is a temporary database. It stores the IMSI and customer information for each roaming customer visiting a specific MSC.
Authentication Centre (AuC) - It is a protected database. It maintains authentication keys and algorithms, and is associated with a register called the Equipment Identity Register.
Operation Support Subsystem (OSS) - It manages all mobile equipment in the system. Its functions are:
1) Management of the charging and billing procedures
2) Maintenance of all hardware and network operations
AuC:
The AuC database holds the different algorithms that are used for authentication and encryption of the mobile subscribers, verifying the mobile user's identity and ensuring the confidentiality of each call. The AuC holds the authentication and encryption keys for all the subscribers in both the home and visitor location registers.
EIR:
The EIR is another database that keeps information about the identity of mobile equipment, such as the International Mobile Equipment Identity (IMEI), which reveals details about the manufacturer, country of production, and device type. This information is used to prevent calls from being misused, to block unauthorized or defective MSs, to report stolen mobile phones, and to check whether a mobile phone is operating according to the specification of its type.

02. Draw and explain the General Packet Radio Service architecture. (5 Marks)

Ans:
General Packet Radio Service (GPRS)
MS - Mobile Station; BSS - Base Station Subsystem; HLR - Home Location Register; VLR - Visitor Location Register; MSC - Mobile Switching Centre; EIR - Equipment Identity Register; PDN - Packet Data Network; SGSN - Serving GPRS Support Node; GGSN - Gateway GPRS Support Node

03. Explain UMTS architecture with diagram. (5 Marks)

UMTS/3G

The Universal Mobile Telecommunications System is the third generation (3G) successor to the second generation GSM based cellular technologies, which also include GPRS and EDGE. Although UMTS uses a totally different air interface, the core network elements had been migrating towards the UMTS requirements with the introduction of GPRS and EDGE. In this way the transition from GSM to the 3G UMTS architecture did not require such a large instantaneous investment.

UMTS uses Wideband CDMA (WCDMA / W-CDMA) to carry the radio transmissions, and often the system is referred to by the name WCDMA.

UMTS Specifications and Management

In order to create and manage a system as complicated as UMTS or WCDMA it is necessary to develop and maintain a large number of documents and specifications. For UMTS or WCDMA, these are now managed by a group known as 3GPP - the Third Generation Partnership Project. This is a global co-operation between six organizational partners - ARIB, CCSA, ETSI, ATIS, TTA and TTC.

The scope of 3GPP was to produce globally applicable Technical Specifications and Technical
Reports for a 3rd Generation Mobile Telecommunications System. This would be based upon the
GSM core networks and the radio access technologies that they support (i.e., Universal Terrestrial
Radio Access (UTRA) both Frequency Division Duplex (FDD) and Time Division Duplex (TDD)
modes).

Since it was originally formed, 3GPP has also taken over responsibility for the GSM standards as
well as looking at future developments including LTE (Long Term Evolution) and the 4G
technology known as LTE Advanced.

3G UMTS / WCDMA technologies

There are several key areas of 3G UMTS / WCDMA. Within these there are several key
technologies that have been employed to enable UMTS / WCDMA to provide a leap in
performance over its 2G predecessors.

Some of these key areas include:

• Radio interface: The UMTS radio interface provides the basic definition of the radio signal. W-CDMA occupies 5 MHz channels and has defined formats for elements such as synchronization, power control and the like.
• CDMA technology: 3G UMTS relies on a scheme known as CDMA, or code division multiple access, to enable multiple handsets or user equipments to have access to the base station. Using a scheme known as direct sequence spread spectrum, different UEs have different codes and can all talk to the base station even though they are all on the same frequency.
• UMTS network architecture: The architecture for a UMTS network was designed to enable packet data to be carried over the network, whilst still enabling it to support circuit switched voice. All the usual functions enabling access to the network, roaming and the like are also supported.
• UMTS modulation schemes: Within the CDMA signal format, a variety of forms of modulation are used. These are typically forms of phase shift keying.
• UMTS channels: As with any cellular system, different data channels are required for passing payload data as well as control information and for enabling the required resources to be allocated. A variety of different data channels are used to enable these facilities to be accomplished.
• UMTS TDD: There are two methods of providing duplex for 3G UMTS. One is what is termed frequency division duplex, FDD. This uses two channels spaced sufficiently apart so that the receiver can receive whilst the transmitter is also operating. Another method is to use time division duplex, TDD, where short time blocks are allocated to transmissions in both directions. Using this method, only a single channel is required.
• Handover: One key area of any cellular telecommunications system is the handover (handoff) from one cell to the next. Using CDMA there are several forms of handover that are implemented within the system.

UMTS network constituents
The UMTS network architecture can be divided into three main elements:
• User Equipment (UE): The User Equipment or UE is the name given to what was previously termed the mobile, or cell phone. The new name was chosen because of the considerably greater functionality that the UE could have. It could be anything from a mobile phone used for talking to a data terminal attached to a computer with no voice capability.
• Radio Network Subsystem (RNS): The RNS, also known as the UMTS Radio Access Network (UTRAN), is the equivalent of the previous Base Station Subsystem or BSS in GSM. It provides and manages the air interface for the overall network.
• Core Network: The core network provides all the central processing and management for the system. It is the equivalent of the GSM Network Switching Subsystem or NSS.

UMTS Network Architecture Overview
User Equipment, UE
The User Equipment or UE is a major element of the overall 3G UMTS network architecture. It forms the final interface with the user. In view of the far greater number of applications and facilities it can perform, the decision was made to call it user equipment rather than a mobile. It is essentially the handset (in the broadest terminology), but having access to much higher speed data communications, it can be much more versatile, containing many more applications. It consists of a variety of different elements including RF circuitry, processing, antenna, battery, etc.
3G UMTS Radio Network Subsystem
This is the section of the 3G UMTS / WCDMA network that interfaces to both the UE and the core network. The overall radio access network, i.e. collectively all the Radio Network Subsystems, is known as the UTRAN (UMTS Radio Access Network).
3G UMTS Core Network
The 3G UMTS core network architecture is a migration of that used for GSM with further elements
overlaid to enable the additional functionality demanded by UMTS.
In view of the different ways in which data may be carried, the UMTS core network may be split
into two different areas:
Circuit switched elements: These elements are primarily based on the GSM network entities and
carry data in a circuit switched manner, i.e. a permanent channel for the duration of the call.
Packet switched elements: These network entities are designed to carry packet data. This enables
much higher network usage as the capacity can be shared and data is carried as packets which
are routed according to their destination.
Some network elements, particularly those that are associated with registration are shared by both
domains and operate in the same way that they did with GSM.

UMTS Core Network
Circuit switched elements
The circuit switched elements of the UMTS core network architecture include the following network entities:
• Mobile Switching Centre (MSC): This is essentially the same as that within GSM, and it manages the circuit switched calls under way.
• Gateway MSC (GMSC): This is effectively the interface to the external networks.
Packet switched elements
The packet switched elements of the 3G UMTS core network architecture include the following
network entities:
• Gateway GPRS Support Node (GGSN): Like the SGSN, this entity was also first
introduced into the GPRS network. The Gateway GPRS Support Node (GGSN) is the
central element within the UMTS packet switched network. It handles inter-working
between the UMTS packet switched network and external packet switched networks, and
can be considered as a very sophisticated router. In operation, when the GGSN receives data
addressed to a specific user, it checks if the user is active and then forwards the data to the
SGSN serving the particular UE.
Shared elements
The shared elements of the 3G UMTS core network architecture include the following network entities:
• Home Location Register (HLR): This database contains all the administrative information about each subscriber along with their last known location. In this way, the UMTS network is able to route calls to the relevant RNC / Node B. When a user switches on their UE, it registers with the network, and from this it is possible to determine which Node B it communicates with so that incoming calls can be routed appropriately. Even when the UE is not active (but switched on) it re-registers periodically to ensure that the network (HLR) is aware of its latest position.
• Equipment Identity Register (EIR): The EIR is the entity that decides whether a given UE may be allowed onto the network. Each UE has a number known as the International Mobile Equipment Identity. This number, as mentioned above, is installed in the equipment and is checked by the network during registration.
• Authentication Centre (AuC): The AuC is a protected database that contains the secret key also contained in the user's USIM card.
UMTS radio access network, UTRAN
The UMTS Radio Access Network, UTRAN, or Radio Network Subsystem, RNS comprises two
main components:
• Radio Network Controller (RNC): This element of the UTRAN / radio network subsystem controls the Node Bs that are connected to it, i.e. the radio resources in its domain. The RNC undertakes the radio resource management and some of the mobility management functions, although not all. It is also the point at which the data encryption / decryption is performed to protect the user data from eavesdropping.
• Node B: Node B is the term used within UMTS to denote the base station transceiver. This part of the UTRAN contains the transmitter and receiver to communicate with the UEs within the cell. It participates with the RNC in the resource management. Node B is the 3GPP term for base station, and often the terms are used interchangeably.

3G UMTS UTRAN Architecture

UTRAN interfaces
• Serving GPRS Support Node (SGSN): As the name implies, this entity was first developed when GPRS was introduced, and its use has been carried over into the UMTS network architecture. The SGSN provides a number of functions within the UMTS network architecture:
o Mobility management: When a UE attaches to the packet switched domain of the UMTS core network, the SGSN generates MM information based on the mobile's current location.
o Session management: The SGSN manages the data sessions, providing the required quality of service and also managing what are termed the PDP (Packet Data Protocol) contexts, i.e. the pipes over which the data is sent.
o Interaction with other areas of the network: The SGSN is able to manage its elements within the network only by communicating with other areas of the network, e.g. the MSC and other circuit switched areas.
o Billing: The SGSN is also responsible for billing. It achieves this by monitoring the flow of user data across the GPRS network. CDRs (Call Detail Records) are generated by the SGSN before being transferred to the charging entities (Charging Gateway Function, CGF).
The UMTS standards are structured in a way that the internal functionality of the different network elements is not defined. Instead, the interfaces between the network elements are defined, and in this way, so too is the element functionality.
There are several interfaces that are defined for the UTRAN elements:
• Iub: The Iub connects the Node B and the RNC within the UTRAN. Although a standardized interface between the controller and base station was revolutionary when it was launched, the aim was to stimulate competition between suppliers, opening opportunities for manufacturers who might concentrate just on base stations rather than on the controller and other network entities.
• Iur: The Iur interface allows communication between different RNCs within the UTRAN. The open Iur interface enables capabilities like soft handover to occur, as well as helping to stimulate competition between equipment manufacturers.
• Iu: The Iu interface connects the UTRAN to the core network.
Having standardised interfaces within various areas of the network including the UTRAN allows
network operators to select different network entities from different suppliers.

04. Explain 4G architecture with diagram. (5 Marks)
Ans:

LTE/4G

The high-level network architecture of LTE comprises the following three main components:
• The User Equipment (UE)
• The Evolved UMTS Terrestrial Radio Access Network (E-UTRAN)
• The Evolved Packet Core (EPC)
The evolved packet core communicates with packet data networks in the outside world such as the
internet, private corporate networks or the IP multimedia subsystem. The interfaces between the
different parts of the system are denoted Uu, S1 and SGi as shown below:

The User Equipment (UE)

The internal architecture of the user equipment for LTE is identical to the one used by UMTS and GSM, which is actually a Mobile Equipment (ME). The mobile equipment comprises the following important modules:
• Mobile Termination (MT): This handles all the communication functions.
• Terminal Equipment (TE): This terminates the data streams.
• Universal Integrated Circuit Card (UICC): This is also known as the SIM card for LTE equipment. It runs an application known as the Universal Subscriber Identity Module (USIM).
A USIM stores user-specific data very similar to a 3G SIM card. It keeps information about the user's phone number, home network identity, security keys, etc.
The E-UTRAN (The access network)
The architecture of evolved UMTS Terrestrial Radio Access Network (E-UTRAN) has been
illustrated below.

The E-UTRAN handles the radio communications between the mobile and the evolved packet core
and just has one component, the evolved base stations, called eNodeB or eNB. Each eNB is a base
station that controls the mobiles in one or more cells. The base station that is communicating with
a mobile is known as its serving eNB.
An LTE mobile communicates with just one base station and one cell at a time, and there are two main functions supported by the eNB:
• The eNB sends and receives radio transmissions to all the mobiles using the analogue and digital signal processing functions of the LTE air interface.
• The eNB controls the low-level operation of all its mobiles, by sending them signalling messages such as handover commands.
Each eNB connects with the EPC by means of the S1 interface, and it can also be connected to nearby base stations by the X2 interface, which is mainly used for signalling and packet forwarding during handover.
A home eNB (HeNB) is a base station that has been purchased by a user to provide femtocell
coverage within the home. A home eNB belongs to a closed subscriber group (CSG) and can only
be accessed by mobiles with a USIM that also belongs to the closed subscriber group.
The Evolved Packet Core (EPC) (The core network)
The architecture of the Evolved Packet Core (EPC) has been illustrated below. There are a few more components which have not been shown in the diagram to keep it simple, such as the Earthquake and Tsunami Warning System (ETWS), the Equipment Identity Register (EIR) and the Policy Control and Charging Rules Function (PCRF).
Below is a brief description of each of the components shown in the above architecture:
• The Home Subscriber Server (HSS) component has been carried forward from UMTS and GSM and is a central database that contains information about all the network operator's subscribers.
• The Packet Data Network (PDN) Gateway (P-GW) communicates with the outside world, i.e. packet data networks (PDN), using the SGi interface. Each packet data network is identified by an access point name (APN). The PDN gateway plays the role that the gateway GPRS support node (GGSN) and serving GPRS support node (SGSN) played with UMTS and GSM.
• The serving gateway (S-GW) acts as a router, and forwards data between the base station and the PDN gateway.
• The mobility management entity (MME) controls the high-level operation of the mobile by means of signalling messages and the Home Subscriber Server (HSS).
• The Policy Control and Charging Rules Function (PCRF) is a component which is not shown in the above diagram, but it is responsible for policy control decision-making, as well as for controlling the flow-based charging functionalities in the Policy Control Enforcement Function (PCEF), which resides in the P-GW.
The interface between the serving and PDN gateways is known as S5/S8. This has two slightly different implementations, namely S5 if the two devices are in the same network, and S8 if they are in different networks.
Prelim Exam (AY 2018-19)
Branch: B.E. Computer Date: 13/10/2018
Semester: VII Duration: 2:30 hours
Subject: EL-II: Mobile Communication (410245) (2015 Pattern) Max. Marks: 70
Note: (1) Answer Q. 1 or Q. 2, Q. 3 or Q. 4, Q. 5 or Q. 6, Q. 7 or Q. 8, Q. 9 or Q. 10.
(2) Figures to the right indicate full marks.
(3) Neat diagrams must be drawn wherever necessary.
(4) Assume suitable data, if necessary
Questions (with Max. Marks, CO mapped and Bloom's Taxonomy Level)

Q.1 a. Draw and explain frequency reuse and co-channel interference. (5, CO-1, BL-2)
    b. Short note on: a) GMSK Modulation b) 8PSK (5, CO-2, BL-1)
OR
Q.2 a. Define and explain handoff/handover. (5, CO-2, BL-2)
    b. Difference between SDMA, FDMA, TDMA and CDMA. (5, CO-1, BL-2)
Q.3 a. Difference between FHSS and DSSS. (5, CO-3, BL-2)
    b. Explain PCS architecture. (5, CO-1, BL-2)
OR
Q.4 a. Short note on: 1) AuC 2) EIR 3) PSTN 4) Home Location Register 5) Visitor Location Register (5, CO-1, BL-1)
    b. Explain mobility management in CDMA. (5, CO-1, BL-2)
Q.5 a. Explain GSM architecture with diagram. (8, CO-3, BL-2)
    b. Explain GSM bursts and the GSM frame. (8, CO-3, BL-2)
OR
Q.6 a. Explain High Speed Packet Access (HSPA). (8, CO-4, BL-2)
    b. Explain the different types of GSM channels. (8, CO-3, BL-2)
Q.7 a. Explain 5GAA (Autonomous Automation). (8, CO-5, BL-2)
    b. Difference between W-CDMA and CDMA2000. (8, CO-3, BL-2)
OR
Q.8 a. Draw and explain the GPRS architecture. (8, CO-3, BL-2)
    b. Explain incoming and outgoing call setup. (8, CO-3, BL-2)
Q.9 a. Explain 3G architecture with diagram. (9, CO-4, BL-2)
    b. Short note on: 1) Virtual Reality 2) Augmented Reality (9, CO-5, BL-1)
OR
Q.10 a. Explain 4G architecture with diagram. (9, CO-4, BL-2)
    b. Short note on: 1) HSPA+ 2) HSUPA 3) HSDPA (9, CO-4, BL-1)
Prelim Exam Solution
Q.1 a. Draw and explain Frequency Reuse and Co-channel Interference.
Ans:
Frequency reuse
1. Cellular phone networks use cellular frequency reuse. In the cellular reuse concept,
frequencies allocated to the service are reused in a regular pattern of areas, called "cells",
each covered by one base station.
2. In mobile-telephone nets these cells are usually hexagonal. To ensure that the mutual
interference between users remains below a harmful level, adjacent cells use different
frequencies. However in cells that are separated further away, frequencies can be reused.
3. Particularly in the United States, the term "cell phone" is often used by the public when a wireless phone is meant. The cellular approach was proposed and developed predominantly by the Bell System in the U.S. in the early 1970s, after the regulatory agency FCC had asked for an upgrade of the existing radio telephone service.

Typical frequency reuse plan for 3 different radio frequencies, based on hexagonal cells. Radio
channels are indicated by color. In fact some problems in cellular frequency assignment are solved
using map coloring theory.
The FCC had the foresight to require:
1. a large subscriber capacity
2. efficient use of spectrum resources
3. nationwide coverage
4. adaptability to traffic density
5. telephone service to both vehicle and portable user terminals
6. telephony but also other services including closed user groups with voice dispatch
operations
7. toll quality
Co-channel cells: Frequency reuse implies that in a given coverage area, there are several cells that
use the same set of frequencies. These cells are called co-channel cells.
Causes:
1. Reduction of the D/R ratio, which reduces the distance between two co-channel cells.
2. Use of omnidirectional antennas at the base station.
3. Increasing the antenna height at the base station.
Effects of co-channel interference on system capacity:
The parameter Q, called the co-channel reuse ratio, is related to the cluster size N by

Q = D/R = √(3N)

A small value of Q provides larger capacity since the cluster size N is small, whereas a large value of Q implies a smaller level of co-channel interference. Thus a reduction in co-channel interference comes with a reduction in system capacity.
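A quick worked example of this relation for common cluster sizes (a small sketch using only the formula above):

# Co-channel reuse ratio Q = D/R = sqrt(3N) for common cluster sizes.
from math import sqrt

for n in (3, 4, 7, 12):
    print(f"N = {n:2d}  ->  Q = D/R = {sqrt(3 * n):.2f}")
# Small N (e.g. 3)  -> small Q: larger capacity, more interference.
# Large N (e.g. 12) -> large Q: less interference, smaller capacity.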

b. Short Note on
a) GMSK Modulation b) 8PSK
Ans:
a) GMSK Modulation
The Gaussian Minimum Shift Keying (GMSK) modulation is a modified version of the Minimum
Shift Keying (MSK) modulation where the phase is further filtered through a Gaussian filter to
smooth the transitions from one point to the next in the constellation. Next figure presents the
GMSK generation scheme:

Fig. GMSK generation scheme

where the Gaussian filter adopts the following form in the time domain:

h(t) = λ exp( -2π²B²t² / ln 2 )

where λ is a normalization constant to maintain the power, B is the -3 dB bandwidth of the filter, and the product BTc is the -3 dB bandwidth-symbol time product. The higher this value, the cleaner the eye diagram of the signal will be, but more power will be transmitted in the side lobes of the spectrum. A typical value in communication applications is BTc = 0.3, which is a good compromise between spectral efficiency and inter-symbol interference.
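A small numerical sketch of this pulse (assuming the form of h(t) given above; the sample counts and normalisation are illustrative choices):

# Gaussian pulse used in GMSK for BTc = 0.3, on a normalised symbol time.
import numpy as np

BTc = 0.3                      # -3 dB bandwidth-symbol time product
Tc = 1.0                       # symbol time (normalised)
B = BTc / Tc                   # -3 dB bandwidth of the Gaussian filter

t = np.linspace(-3 * Tc, 3 * Tc, 601)
h = np.exp(-2 * np.pi**2 * B**2 * t**2 / np.log(2))
h /= h.sum() * (t[1] - t[0])   # normalisation constant lambda: unit area

# The pulse spreads over roughly one symbol period either side of t = 0,
# which is the controlled inter-symbol interference the text refers to.
half = t[h >= h.max() / 2]
print(f"full width at half maximum ~ {half[-1] - half[0]:.2f} * Tc")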

Q.2 a. Define and Explain handoff /handover?
Ans:
A handoff refers to the process of transferring an active call or data session from one cell in a cellular network to another, or from one channel in a cell to another.
A handover is a process in telecommunications and mobile communications in which a connected cellular call or data session is transferred from one cell site (base station) to another without disconnecting the session.

b. Difference between SDMA, FDMA, TDMA and CDMA
Ans:
Q.3 a. Difference Between FHSS and DSSS
Ans:
DSSS-Direct Sequence Spread Spectrum

In DSSS, which stands for Direct Sequence Spread Spectrum, information bits are modulated by PN codes (chips). PN codes are pseudo-noise code symbols. These PN chips have a short duration compared to the information bits, so the information transmitted over the air occupies more bandwidth than the user information bits alone. DSSS is the modulation technique adopted in IEEE 802.11 based WLAN compliant products. In DSSS systems the entire system bandwidth is available to each user all the time.
Figure-1 depicts the DSSS transmitter and DSSS receiver block diagram. PRS stands for Pseudo-Random Sequence.
FHSS-Frequency Hopping Spread Spectrum

In FHSS, which stands for Frequency Hopping Spread Spectrum, the RF carrier frequency is changed according to a pseudo-random sequence (PRS or PN sequence). This PN sequence is known to both transmitter and receiver and hence helps demodulate/decode the information. Within one chip duration, the RF frequency does not vary. Based on the hop rate there are two types of FHSS: fast hopped FHSS and slow hopped FHSS.
In fast hopped FHSS, hopping is done at a rate faster than the message (information) bit rate. In slow hopped FHSS, hopping is done at a rate slower than the information bit rate.
b. Explain PCS Architecture?
Ans:
A personal communications service (PCS) is a type of wireless mobile service with advanced coverage that delivers services at a more personal level. It generally refers to modern mobile communication that boosts the capabilities of conventional cellular networks and fixed-line telephony networks as well.
PCS is also known as digital cellular.
A PCS works similarly to a cellular network in basic operations, but requires more service provider infrastructure to cover a wider geographical area. A PCS generally includes the following:
• Wireless communication (data, voice and video)
• Mobile PBX
• Paging and texting
• Wireless radio
• Personal communication networks
• Satellite communication systems, etc.
PCS has three broad categories: narrowband, broadband and unlicensed. TDMA, CDMA and GSM,
and 2G, 3G and 4G are some of the common technologies that are used to deliver a PCS.
Q.4 a. Short Note on
1) AuC 2) EIR 3) PSTN
4) Home Location Register 5)Visitor Location Register
Ans:

Fig. Simplified GSM Network

1) AuC
Authentication Centre (AuC): The AuC is a protected database that contains the secret key also contained in the user's SIM card. It is used for authentication and for ciphering on the radio channel.

2) EIR
Equipment Identity Register (EIR): The EIR is the entity that decides whether a given mobile equipment may be allowed onto the network. Each mobile equipment has a number known as the International Mobile Equipment Identity. This number, as mentioned above, is installed in the equipment and is checked by the network during registration. Dependent upon the information held in the EIR, the mobile may be allocated one of three states - allowed onto the network, barred access, or monitored in case of problems.

3) PSTN
PSTN stands for Public Switched Telephone Network, or the traditional circuit-switched telephone
network. This is the system that has been in general use since the late 1800s.
Using underground copper wires, this legacy platform has provided businesses and households alike
with a reliable means to communicate with anyone around the world for generations.
The phones themselves are known by several names, such as PSTN, landlines, Plain Old Telephone
Service (POTS), or fixed-line telephones.
PSTN phones are widely used and generally still accepted as a standard form of communication.
However, they have seen a steady decline over the last decade.
4) Home Location Register
Home Location Register (HLR): This database contains all the administrative information about each subscriber along with their last known location. In this way, the GSM network is able to route calls to the relevant base station for the MS. When a user switches on their phone, the phone registers with the network, and from this it is possible to determine which BTS it communicates with so that incoming calls can be routed appropriately. Even when the phone is not active (but switched on) it re-registers periodically to ensure that the network (HLR) is aware of its latest position. There is one HLR per network, although it may be distributed across various sub-centres for operational reasons.

5)Visitor Location Register


Visitor Location Register (VLR): This contains selected information from the HLR that enables the selected services for the individual subscriber to be provided. The VLR can be implemented as a separate entity, but it is commonly realised as an integral part of the MSC rather than as a separate entity. In this way access is made faster and more convenient.

b. Explain Mobility Management in CDMA ?


Ans:
• Of all the 2G cellular systems, IS-95 is the most complex because of its use of spread spectrum, which brings advantages not available with TDMA based systems.
• CDMA makes it possible to perform soft hand-offs, which improve voice quality during hand-off.
• Mobility outside of soft hand-offs is based on general mobility management procedures.
• There are three types of hand-offs defined in IS-95.

• The system-level objective of maximum utilization of the available radio spectrum in a cellular communication system often translates into the maximum number of simultaneous mobile users served by the system with acceptable signal quality, which is directly related to minimizing the transmitted power of each mobile user at all times during its operation.
• In IS-95, slow mobile-assisted power control is employed on the forward channel; non-coherent detection is employed on the reverse channel, and hence power control implementation is a must on the reverse channel.
• There are mainly two types of power control mechanism: an open loop and a closed loop (a minimal sketch of the closed loop follows this list).
• Because all the voice channels occupy the same frequency and time slot, the received signals from the multiple mobile users located anywhere within the periphery of the serving cell must all have the same received signal strength at the base station for detection.
• The advantage of implementing strict power control is that the mobile user can operate at the minimum required Eb/No for adequate performance. This increases battery life and reduces the size and weight of the mobile user's phone equipment.
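As referenced in the list above, here is a minimal closed-loop power control sketch (illustrative only; the 1 dB step mirrors typical IS-95 practice, while the target value and function names are assumptions):

# Closed-loop power control sketch: the base station compares the received
# Eb/No against a target and commands the mobile up or down by a fixed step.
TARGET_EBNO_DB = 7.0
STEP_DB = 1.0

def power_control_step(measured_ebno_db, mobile_tx_dbm):
    if measured_ebno_db < TARGET_EBNO_DB:
        return mobile_tx_dbm + STEP_DB   # "up" command to the mobile
    return mobile_tx_dbm - STEP_DB       # "down" command to the mobile

tx_dbm = 0.0
for measured in (4.0, 5.5, 6.8, 7.4, 7.1):   # example measurements
    tx_dbm = power_control_step(measured, tx_dbm)
    print(f"measured Eb/No {measured:.1f} dB -> mobile TX {tx_dbm:+.1f} dBm")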

Q.5 a. Explain GSM Architecture with Diagram?


Ans:
A GSM network comprises many functional units. These functions and interfaces are explained in this chapter. The GSM network can be broadly divided into:

• The Mobile Station (MS)
• The Base Station Subsystem (BSS)
• The Network Switching Subsystem (NSS)
• The Operation Support Subsystem (OSS)
Given below is a simple pictorial view of the GSM architecture.

The additional components of the GSM architecture comprise databases and messaging system functions:
• Home Location Register (HLR)
• Visitor Location Register (VLR)
• Equipment Identity Register (EIR)
• Authentication Centre (AuC)
• SMS Serving Centre (SMS SC)
• Gateway MSC (GMSC)
• Chargeback Centre (CBC)
• Transcoder and Adaptation Unit (TRAU)

The following diagram shows the GSM network along with the added elements:

The MS and the BSS communicate across the Um interface. It is also known as the air interface or the radio link. The BSS communicates with the Network Switching Subsystem (NSS) across the A interface.

GSM network areas
In a GSM network, the following areas are defined:

• Cell: The cell is the basic service area; one BTS covers one cell. Each cell is given a Cell Global Identity (CGI), a number that uniquely identifies the cell.
• Location Area: A group of cells forms a Location Area (LA). This is the area that is paged when a subscriber gets an incoming call. Each LA is assigned a Location Area Identity (LAI). Each LA is served by one or more BSCs.
• MSC/VLR Service Area: The area covered by one MSC is called the MSC/VLR service area.
• PLMN: The area covered by one network operator is called the Public Land Mobile Network (PLMN). A PLMN can contain one or more MSCs.
b. Explain GSM Bursts and GSM Frame.
Ans:
GSM Bursts:
The GSM burst, or transmission, can fulfil a variety of functions. Some GSM bursts are used for carrying data while others are used for control information. As a result, a number of different types of GSM burst are defined:

• Normal burst: uplink and downlink
• Synchronisation burst: downlink
• Frequency correction burst: downlink
• Random access (shortened burst): uplink

The diagram below illustrates a GSM burst. It consists of several different elements.

Fig. GSM Burst


These elements are as below:
Info
This is the area in which the speech, data or control information is held.
Guard Period
The BTS and MS can only receive the burst and decode it if it is received within the timeslot designated for it. The timing, therefore, must be extremely accurate, but the structure does allow for a small margin of error by incorporating a 'guard period' as shown in the diagram. To be precise, the timeslot is 0.577 ms long, whereas the burst is only 0.546 ms long, so there is a time difference of 0.031 ms to enable the burst to hit the timeslot.
Stealing Flags
These two bits are set when a traffic channel burst has been 'stolen' by the FACCH (Fast Associated Control Channel). One bit set indicates that half of the block has been stolen.
Training Sequence
This is used by the receiver’s equalizer as it estimates the transfer characteristic of the physical path
between the BTS and the MS. The training sequence is 26 bits long.
Tail Bits
These are used to indicate the beginning and end of the burst.
Burst Types
The diagram below shows the five types of burst employed in the GSM air interface. All bursts, of
whatever type, have to be timed so that they are received within the appropriate timeslot of the
TDMA frame.

The burst is the sequence of bits transmitted by the BTS or MS, the timeslot is the discrete period of
real time within which it must arrive in order to be correctly decoded by the receiver:
Normal Burst
The normal burst carries traffic channels and all types of control channels.
Frequency Correction Burst
This burst carries FCCH downlink to correct the frequency of the MS’s local oscillator, effectively
locking it to that of the BTS.
Synchronization Burst
So called because its function is to carry SCH downlink, synchronizing the timing of the MS to that
of the BTS.
Dummy Burst
Used when there is no information to be carried on the unused timeslots of the BCCH Carrier
(downlink only).
Access Burst
This burst is of much shorter duration than the other types. The increased guard period is necessary
because the timing of its transmission is unknown. When this burst is transmitted, the BTS does not
know the location of the MS and therefore the timing of the message from the MS can not be
accurately accounted for. (The Access Burst is uplink only.)

GSM Frame:
GSM data structure is split into slots, frames, multiframes, superframes and hyperframes to give the
required structure and timing to the transmitted data.
The data frames and slots within 2G GSM are organised in a logical manner so that the system
understands when particular types of data are to be transmitted.
Having the GSM frame structure enables the data to be organised in a logical fashion so that the
system is able to handle the data correctly. This includes not only the voice data, but also the
important signalling information as well.
The GSM frame structure provides the basis for the various physical channels used within GSM,
and accordingly it is at the heart of the overall system.
GSM frame structure - the basics
The basic element in the GSM frame structure is the frame itself. This comprises the eight slots,
each used for different users within the TDMA system. As mentioned in another page of the
tutorial, the slots for transmission and reception for a given mobile are offset in time so that the
mobile does not transmit and receive at the same time.

The basic GSM frame defines the structure upon which all the timing and structure of the GSM
messaging and signalling is based. The fundamental unit of time is called a burst period and it lasts
for approximately 0.577 ms (15/26 ms). Eight of these burst periods are grouped into what is known
as a TDMA frame. This lasts for approximately 4.615 ms (i.e.120/26 ms) and it forms the basic unit
for the definition of logical channels. One physical channel is one burst period allocated in each
TDMA frame.

In simplified terms the base station transmits two types of channel, namely traffic and control.
Accordingly the channel structure is organised into two different types of frame, one for the traffic
on the main traffic carrier frequency, and the other for the control on the beacon frequency.

GSM multiframe

The GSM frames are grouped together to form multiframes and in this way it is possible to establish
a time schedule for their operation and the network can be synchronised.

There are several GSM multiframe structures:

• Traffic multiframe: The traffic channel frames are organised into multiframes consisting of 26 bursts and taking 120 ms. In a traffic multiframe, 24 bursts are used for traffic. These are numbered 0 to 11 and 13 to 24. One of the remaining bursts is used to accommodate the SACCH, and the remaining frame stays free. The actual position used alternates between positions 12 and 25.
• Control multiframe: The control channel multiframe comprises 51 bursts and occupies 235.4 ms. This always occurs on the beacon frequency in time slot zero, and it may also occur within slots 2, 4 and 6 of the beacon frequency as well. This multiframe is subdivided into logical channels which are time-scheduled. These logical channels and functions include the following:
  o Frequency correction burst
  o Synchronisation burst
  o Broadcast channel (BCH)
  o Paging and Access Grant Channel (PAGCH)
  o Stand Alone Dedicated Control Channel (SDCCH)
GSM Superframe

Multiframes are then constructed into superframes taking 6.12 seconds. These consist of 51 traffic
multiframes or 26 control multiframes. As the traffic multiframes are 26 bursts long and the control
multiframes are 51 bursts long, the different number of traffic and control multiframes within the
superframe, brings them back into line again taking exactly the same interval.

GSM Hyperframe

Above this, 2048 superframes (i.e. 2^11) are grouped to form one hyperframe, which repeats every 3 hours 28 minutes 53.76 seconds. It is the largest time interval within the GSM frame
structure.
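The timing figures quoted in this section fit together exactly; a short sketch verifying the arithmetic (using only the numbers from the text):

# Reproduce the GSM timing hierarchy from the figures quoted above.
from fractions import Fraction

burst = Fraction(15, 26)         # burst period in ms
frame = 8 * burst                # TDMA frame: 8 timeslots -> 120/26 ms
traffic_mf = 26 * frame          # traffic multiframe -> 120 ms
control_mf = 51 * frame          # control multiframe -> ~235.4 ms
superframe = 51 * traffic_mf     # = 26 control multiframes -> 6.12 s
hyperframe = 2048 * superframe   # largest interval in the structure

print(f"frame      = {float(frame):.3f} ms")             # 4.615 ms
print(f"traffic MF = {float(traffic_mf):.1f} ms")        # 120.0 ms
print(f"control MF = {float(control_mf):.1f} ms")        # 235.4 ms
print(f"superframe = {float(superframe) / 1000:.2f} s")  # 6.12 s
secs = float(hyperframe) / 1000
h, rem = divmod(secs, 3600)
m, s = divmod(rem, 60)
print(f"hyperframe = {int(h)} h {int(m)} min {s:.2f} s") # 3 h 28 min 53.76 s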

Within the GSM hyperframe there is a counter and every time slot has a unique sequential number
comprising the frame number and time slot number. This is used to maintain synchronisation of the
different scheduled operations with the GSM frame structure. These include functions such as:

Frequency hopping: Frequency hopping is a feature that is optional within the GSM system. It can
help reduce interference and fading issues, but for it to work, the transmitter and receiver must be
synchronised so they hop to the same frequencies at the same time.

Encryption: The encryption process is synchronised over the GSM hyperframe period where a
counter is used and the encryption process will repeat with each hyperframe. However, it is unlikely
that the cellphone conversation will be over 3 hours and accordingly it is unlikely that security will
be compromised as a result.
The slots and frames are handled in a very logical manner to enable the system to expect and accept
the data that needs to be sent. Organising it in this logical fashion enables it to be handled in the
most efficient manner.

Q.6 a. Explain High Speed Packet Access(HSPA)


Ans:
HSPA (high speed packet access) is a third-generation (3G) mobile broadband communications
technology.

The term HSPA actually refers to two specific protocols used in tandem, high speed downlink packet access (HSDPA) and high speed uplink packet access (HSUPA). HSPA networks offer a maximum of 14.4 megabits per second (Mbps) of throughput per cell.

An improved version of high speed packet access technology, known as Evolved HSPA, offers 42 Mbps of throughput per cell. By using dual-cell deployment and multiple-input multiple-output (MIMO) architecture, HSPA+ networks can achieve a maximum throughput of 168 Mbps overall.

The International Telecommunication Union recognized HSPA+ as a fourth-generation (4G) technology in December 2010. HSPA+, however, offers significantly slower speeds than the predominant 4G standard, Long Term Evolution (LTE).


HSPA increases peak data rates and capacity in several ways:

• Shared-channel transmission, which results in efficient use of the available code and power resources in WCDMA
• A shorter Transmission Time Interval (TTI), which reduces round-trip time and improves the tracking of fast channel variations
• Link adaptation, which maximizes channel usage and enables the base station to operate at close to maximum cell power
• Fast scheduling, which prioritizes users with the most favorable channel conditions
• Fast retransmission and soft-combining, which further increase capacity
• 16QAM and 64QAM (Quadrature Amplitude Modulation), which yield higher bit-rates
• MIMO, which exploits antenna diversity to provide further capacity benefits

b. Explain different types of GSM Channels.


Ans:
There are two main types of GSM channels, viz. physical channels and logical channels. A physical
channel is specified by a specific time slot and carrier frequency. Logical channels run over physical
channels, i.e. logical channels are time multiplexed onto physical channels; each physical channel (a
time slot at one particular ARFCN) follows either the 26-frame multiframe (MF) or the 51-frame
multiframe structure described earlier. Logical channels are classified into traffic channels and
control channels. Traffic channels carry user data. Control channels are interspersed with traffic
channels in well specified ways.
Fig. Logical vs. Physical GSM Channels

For example, every 26 TDMA frames a logical channel gets bandwidth in a physical channel.
Traffic channels are mainly of two types: half-rate and full-rate. There are various control channels,
such as the BCCH (broadcast control channel), SCH (synchronization channel), FCCH (frequency
correction channel) and DCCH (dedicated control channel).

All these GSM channels help maintain the GSM network; they also help a GSM mobile phone
connect to the network, maintain the connection, and tear the connection down. The figure below
shows all the channels used in GSM.
Fig. GSM Channels

Q.7 a. Explain 5GAA (Autonomous Automation).


Ans:
The 5G Automotive Association (5GAA) encourages the automotive, technology, and telecommu-
nications industries and the European Commission to be ambitious when evaluating technologies
for connected, autonomous vehicles. 5GAA is confident that cellular communication technology
(C-V2X) has the most benefits when applied to connected, self-driving cars. It has provided its
views to the Commission in a workshop and a letter with recommendations. Since its launch one
year ago, over 50 industry leaders from the automotive, technology and telecommunications indus-
tries have teamed up in 5GAA to accelerate C-V2X technology development and its evolution to
5G-V2X for enhanced safety, automated driving and connected mobility.
The 5G Automotive Association (5GAA) is a global cross-industry organisation of companies from
the automotive, technology and telecommunications industries (ICT), working together to develop
end-to-end solutions for future mobility and transportation services. Created in 2016, the
Association comprises over 50 members, whose mission is to develop, test and promote
communications solutions, initiate their standardization, and accelerate their commercial
availability and global market penetration to address societal needs.

b. Difference Between W-CDMA and CDMA2000


Ans:
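W-CDMA and CDMA2000 are the two principal 3G air interfaces; the key, commonly cited differences are:

 W-CDMA (used in UMTS, standardized by 3GPP) evolved from the GSM family, while CDMA2000 (standardized by 3GPP2) is backward compatible with IS-95 (cdmaOne).
 W-CDMA uses a 5 MHz channel with a chip rate of 3.84 Mcps; CDMA2000 1x uses a 1.25 MHz channel with a chip rate of 1.2288 Mcps.
 CDMA2000 base stations are synchronized (typically via GPS), whereas W-CDMA base stations can operate asynchronously.
 The high-speed data evolution of W-CDMA is HSPA (HSDPA/HSUPA); the data evolution of CDMA2000 is 1xEV-DO.
 W-CDMA was deployed mainly on GSM-based networks in Europe and Asia; CDMA2000 mainly in North America and parts of Asia.
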
Q.8 a. Draw & Explain GPRS Architecture?
Ans:
GPRS architecture works on the same principles as the GSM network, but has additional entities
that allow packet data transmission. This data network overlays the second-generation GSM
network, providing packet data transport at rates from 9.6 to 171 kbps. Along with packet data
transport, the GSM network accommodates multiple users sharing the same air interface resources
concurrently.

Following is the GPRS Architecture diagram:

GPRS attempts to reuse the existing GSM network elements as much as possible, but to effectively
build a packet-based mobile cellular network, some new network elements, interfaces, and proto-
cols for handling packet traffic are required.

Therefore, GPRS requires modifications to numerous GSM network elements as summarized be-
low:
GPRS Mobile Stations
New Mobile Stations (MS) are required to use GPRS services because existing GSM phones do
not handle the enhanced air interface or packet data. A variety of MS can exist, including a high-
speed version of current phones to support high-speed data access, a new PDA device with an
embedded GSM phone, and PC cards for laptop computers. These mobile stations are backward
compatible for making voice calls using GSM.

GPRS Base Station Subsystem


Each BSC requires the installation of one or more Packet Control Units (PCUs) and a software
upgrade. The PCU provides a physical and logical data interface to the Base Station Subsystem
(BSS) for packet data traffic. The BTS can also require a software upgrade but typically does not
require hardware enhancements.

When either voice or data traffic is originated at the subscriber mobile, it is transported over the air
interface to the BTS, and from the BTS to the BSC in the same way as a standard GSM call.
However, at the output of the BSC, the traffic is separated; voice is sent to the Mobile Switching
Center (MSC) per standard GSM, and data is sent to a new device called the SGSN via the PCU
over a Frame Relay interface.
GPRS Support Nodes
The following two new components, collectively called GPRS Support Nodes (GSNs), are added:
the Gateway GPRS Support Node (GGSN) and the Serving GPRS Support Node (SGSN).

Gateway GPRS Support Node (GGSN)


The Gateway GPRS Support Node acts as an interface and a router to external networks. It
contains routing information for GPRS mobiles, which is used to tunnel packets through the IP
based internal backbone to the correct Serving GPRS Support Node. The GGSN also collects
charging information connected to the use of the external data networks and can act as a packet
filter for incoming traffic.

Serving GPRS Support Node (SGSN)


The Serving GPRS Support Node is responsible for authentication of GPRS mobiles, registration
of mobiles in the network, mobility management, and collecting information on charging for the
use of the air interface.

Internal Backbone
The internal backbone is an IP based network used to carry packets between different GSNs.
Tunnelling is used between SGSNs and GGSNs, so the internal backbone does not need any
information about domains outside the GPRS network. Signalling from a GSN to a MSC, HLR or
EIR is done using SS7.
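
To make the tunnelling idea concrete, here is a minimal Python sketch. The header layout, field sizes and TEID value are invented for illustration and do not match the real GTP wire format; the point is only that the backbone routes on the tunnel header, not on the subscriber's inner IP packet:

import struct

def encapsulate(teid, user_ip_packet):
    """SGSN side: prepend a toy tunnel header (TEID + payload length)."""
    header = struct.pack("!IH", teid, len(user_ip_packet))
    return header + user_ip_packet

def decapsulate(tunnel_packet):
    """GGSN side: strip the toy header and recover the user packet."""
    teid, length = struct.unpack("!IH", tunnel_packet[:6])
    return teid, tunnel_packet[6:6 + length]

inner = b"...user IP packet bytes..."     # placeholder payload
outer = encapsulate(0x1A2B, inner)        # carried over the IP backbone by TEID
teid, recovered = decapsulate(outer)
assert recovered == inner                 # the far end recovers the original packet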

Routing Area
GPRS introduces the concept of a Routing Area. This concept is similar to the Location Area in
GSM, except that it generally contains fewer cells. Because routing areas are smaller than location
areas, fewer radio resources are used when broadcasting a page message.

b. Explain Incoming and Outgoing Call setup.


Ans:
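In outline (a standard description of GSM call setup; exact message names vary between references):

Outgoing (mobile-originated) call:
1. The MS requests a signalling channel on the RACH and is assigned an SDCCH.
2. The MS sends a service request to the MSC; authentication and ciphering are performed using data from the VLR.
3. The MS sends the dialled number; the MSC routes the call towards the called party via the GMSC/PSTN.
4. A traffic channel (TCH) is assigned, the called party is alerted, and the call is connected.

Incoming (mobile-terminated) call:
1. The call arrives at the GMSC, which queries the HLR for the subscriber's current location.
2. The HLR obtains a roaming number (MSRN) from the serving VLR and returns it, so the call can be routed to the serving MSC.
3. The MSC pages the MS in its location area on the paging channel (PCH).
4. The MS responds, a signalling channel is assigned, authentication is performed, a TCH is allocated, and the call is connected.
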
Q.9 a. Explain 3G Architecture with diagram?
Ans:
The infrastructure of 2G and 3G cellular networks is similar. It comprises an air interface
between the user's mobile device and the base station, and two core networks: one for circuit-
switched voice and another for packet-switched data. In the subsequent 4G/LTE architecture, voice
and data are both based on IP packets.

The Air Interface


The GERAN (GSM EDGE Radio Access Network) is the 2G air interface, and the UTRAN
(Universal Terrestrial Radio Access Network) is the 3G interface. The air interface comprises the
base stations (cell towers) and controlling equipment. The 2G base station is a Basic Transceiver
Station (BTS) controlled by the Base Station Controller (BSC). The 3G base station is a Node B
controlled by the Radio Network Controller (RNC).

Circuit Side
For the circuit side, the BSC or RNC connects to the Mobile Switching Center (MSC), which sets
up and tears down the calls, handles text messages (SMS) and tracks users as they move from cell
to cell. When a user arrives within an MSC's jurisdiction, subscriber information is sent from the
Home Location Register (HLR) database to the Visitor Location Register (VLR) within the MSC.
The Gateway MSC (GMSC) connects the MSC to the external circuit-switched networks.

Packet Side
The counterpart to the MSC on the packet side is the Serving GPRS Support Node (SGSN), which
manages the packet connection for the user. The Gateway GPRS Support Node (GGSN) provides
the connection to the external packet networks. The GGSN also receives subscriber information
from the HLR.
2G and 3G Equipment
Voice is handled by the circuit-switched network, and data are handled by the packet-switched side.

b. Short Note on
1) Virtual Reality 2) Augmented Reality
Ans:
1) Virtual Reality
Virtual Reality (VR) is an immersive computer system that mimics the world we see around us.
It can also be used to create imaginary worlds, in other words to create immersive games. VR is
not a new idea; in fact it was first described in the 1930s, and the first VR system was built in the
late 1960s. Its boom time came in the 1990s, when companies like Sega and Nintendo started
developing consumer-level VR gaming products. However, after a boom there is often a bust, and
that is what happened to VR. Sega's product was never released, and Nintendo's Virtual Boy was
a commercial failure.

Since then very little happened at a consumer level for a long time. The reasons for VR's failures
in the 1990s were not only to do with computing power. Think back to the size and design of
laptops and mobile phones in that era. To make VR headsets truly useful, the technology in terms
of miniaturization, displays, materials and computing power needed to improve.
After almost 20 years VR is now making a comeback. In 2012 Palmer Luckey launched a
Kickstarter campaign for an immersive virtual reality headset for video games. The Oculus Rift
project aimed to raise $250,000, but actually raised $2.4 million.

In late 2013 John Carmack, famous for his 3D game series like Doom and Quake, joined Ocu-
lus. The Oculus Rift is designed to be connected and used with a PC, however Carmack helped
Oculus develop a mobile version in collaboration with Samsung.

The Samsung Gear VR uses a smartphone which is clipped into a headset to create a VR plat-
form. It is an untethered solution which means there are no wires connecting it to a PC or other
computing device. The smartphone’s GPU is used to render the virtual world and the phone’s
display is split in half for the images needed by the left and right eyes. The headset includes the
head-tracking module from the Oculus Rift.

As we can see with the difference between the Oculus Rift and the Gear VR, today’s Virtual Reality
market is split into two segments: tethered and untethered. The advantage of the tethered
approach is that the processing power and the electrical power comes from a PC or console. These
machines have high performance CPUs and GPUs, and don’t need to worry about battery life.
However the disadvantage is that they are generally fixed to one room in your house. The advantage
of untethered VR is that it is truly portable. Wherever you go, your VR headset can go with you. It
also means it has a greater social impact. Although using a VR headset in public could be
considered antisocial, there is the aspect of sharing the VR experience within a group of friends;
for example, the "wow" factor when the headset is passed from one person to another.

2) Augmented Reality

Augmented reality is the technology that expands our physical world by adding layers of digital
information onto it. Unlike Virtual Reality (VR), AR does not create a whole artificial environment
to replace the real one with a virtual one. AR appears in the direct view of an existing environment
and adds sounds, videos and graphics to it.

A view of the physical real-world environment with superimposed computer-generated images,
thus changing the perception of reality, is AR.

The term itself was coined back in 1990, and one of the first commercial uses was in television and
the military. With the rise of the Internet and smartphones, AR rolled out its second wave and
nowadays is mostly related to the interactive concept. 3D models are directly projected onto
physical things or fused together in real time; various augmented reality apps impact our habits,
social life, and the entertainment industry.

AR apps typically connect digital animation to a special 'marker', or pinpoint the location with the
help of GPS in phones. Augmentation happens in real time and within the context of the
environment; for example, overlaying scores onto a live feed of a sports event.

There are 4 types of augmented reality today:

 markerless AR

 marker-based AR

 projection-based AR

 superimposition-based AR

How does Augmented Reality work

For many of us, the question of what augmented reality is implies a technical side, i.e. how does
AR work? For AR, a certain range of data (images, animations, videos, 3D models) may be used,
and people see the result in both natural and synthetic light. Also, unlike in VR, users are aware of
being in the real world, which is enhanced by computer vision.

AR can be displayed on various devices: screens, glasses, handheld devices, mobile phones, head-
mounted displays. It involves technologies like S.L.A.M. (simultaneous localization and
mapping), depth tracking (briefly, a sensor data calculating the distance to the objects), and the
following components:

 Cameras and sensors. Collecting data about the user's interactions and sending it for processing.
Cameras on devices scan the surroundings, and with this information a device locates physical
objects and generates 3D models. These may be special-duty cameras, as in Microsoft HoloLens,
or common smartphone cameras for taking pictures/videos.

 Processing. AR devices eventually should act like little computers, something modern smart-
phones already do. In the same manner, they require a CPU, a GPU, flash memory, RAM, Blue-
tooth/WiFi, a GPS, etc. to be able to measure speed, angle, direction, orientation in space, and
so on.

 Projection. This refers to a miniature projector on AR headsets, which takes data from sensors
and projects digital content (the result of processing) onto a surface to view. In fact, the use of
projection in AR has not yet matured enough to be widely used in commercial products or services.

 Reflection. Some AR devices have mirrors to assist human eyes to view virtual images. Some
have an “array of small curved mirrors” and some have a double-sided mirror to reflect light to
a camera and to a user’s eye. The goal of such reflection paths is to perform a proper image
alignment.

Types of Augmented Reality


1. Marker-based AR. Some also call it image recognition, as it requires a special visual object
and a camera to scan it. It may be anything, from a printed QR code to special signs. In some
cases, the AR device also calculates the position and orientation of the marker to position the
content. Thus, a marker initiates digital animations for users to view, and so images in a
magazine may turn into 3D models.

2. Projection-based AR. Projecting synthetic light onto physical surfaces, in some cases allowing
users to interact with it. These are the holograms we have all seen in sci-fi movies like Star Wars.
It detects user interaction with a projection by its alterations.

3. Superimposition-based AR. Replaces the original view with an augmented one, fully or
partially. Object recognition plays a key role; without it the whole concept is simply impossible.
We have all seen an example of superimposed augmented reality in the IKEA Catalog app, which
allows users to place virtual items from the furniture catalogue in their rooms.

4. Markerless AR. As commonly described, it uses GPS, a compass, a gyroscope and an
accelerometer, rather than a visual marker, to position the digital content, and is widely used for
location-based and mapping applications.
Q.10 a. Explain 4G Architecture with diagram?
Ans:
4G Architecture
1. 4G stands for fourth generation cellular system.
2. 4G is an evolution of 3G to meet the forecasted rising demand.
3. It is an integration of various technologies including GSM, CDMA, GPRS, IMT-2000 and
Wireless LAN.
4. Data rates in 4G systems range from 20 to 100 Mbps.

Features:
1. Fully IP based Mobile System.
2. It supports interactive multimedia, voice, streaming video, internet and other broadband
services.
3. It has better spectrum efficiency.
4. It supports ad hoc and multi-hop networks.

4 G Architecture
1. The figure shows a generic mobile communication architecture.
2. A 4G network is an integration of heterogeneous wireless access networks such as ad hoc,
cellular, hotspot and satellite radio components.
3. Technologies used in 4G include smart antennas for multiple input and multiple output
(MIMO), IPv6, VoIP, OFDM and Software Defined Radio (SDR) systems.

Smart Antennas:
1. Smart antennas are transmitting and receiving antennas.
2. They do not require increased power or additional frequency.

IPV6 Technology:
1. 4G uses IPv6 technology in order to support a large number of wireless-enabled devices.
2. It enables a number of applications with better multicast, security and route optimization
capabilities.

VoIP:
1. It stands for Voice over IP.
2. It allows only packets to be transferred, eliminating the complexity of running two protocols
over the same circuit.

OFDM:
1. OFDM stands for Orthogonal Frequency Division Multiplexing.
2. It is currently used as WiMax and WiFi.

SDR:
1. SDR stands for Software Defined Radio.
2. It is the form of open wireless architecture.

Advantages:
1. It provides better spectral efficiency.
2. It has high speed, high capacity and low cost per bit.

Disadvantage:
1. Battery usage is more.
2. Hard to implement.

MME - Mobility Management Entity
It is used for paging, authentication, handover and selection of the Serving Gateway.
SGW - Serving Gateway
It is used for routing and forwarding user data packets.
PDN-GW - Packet Data Network Gateway
It is used for user equipment (UE) IP address allocation.
HSS - Home Subscriber Server
It is a user database used for service subscription, user identification and addressing.
PCRF - Policy and Charging Rules Function
It provides quality of service and charging rules.
eNodeB - evolved Node B
It handles radio resource management and radio bearer control.

b. Short Note on
1) HSPA+ 2)HSUPA 3) HSDPA
Ans:
1) HSPA+
HSPA stands for High Speed Packet Access, a standard mainly designed to support high data rates
in the uplink and downlink. HSPA falls under three categories, viz. HSDPA, HSUPA and HSPA+.
The HSPA standards follow different UMTS releases: HSDPA was the UMTS R5 release and
supports about 14 Mbps peak data rates; HSUPA was the UMTS R6 release and supports about
5.76 Mbps uplink data rates; HSPA+ follows the R7-R9 UMTS releases. HSPA supports a spectral
efficiency of about 2.9 bits/sec/Hz.
HSPA+ uses the same 5 MHz band of WCDMA spectrum, which is a great convenience for
operators. In the same 5 MHz, HSPA+ increases the data rate by using MIMO and higher-order
modulation techniques. Hence it achieves roughly a 42.2 Mbps data rate and a spectral efficiency
of about 8.4 bits/sec/Hz.

The biggest achievement with HSPA+ is that latency is reduced using a concept called CPC
(Continuous Packet Connectivity). HSPA+ also supports DC-HSDPA, a dual-cell or dual-carrier
concept in which two adjacent 5 MHz carriers are aggregated over the same cell area to increase
performance.

2)HSUPA
High-Speed Uplink Packet Access (HSUPA) is a 3G mobile telephony protocol in the HSPA
family. This technology was the second major step in the UMTS evolution process. It was specified
and standardized in 3GPP Release 6 to improve the uplink data rate to 5.76 Mbit/s, extending the
capacity and reducing latency. Together with additional improvements, this creates opportunities
for a number of new applications including VoIP, uploading pictures, and sending large e-mail
messages.
HSUPA has been superseded by newer technologies further advancing transfer rates. LTE provides
up to 300 Mbit/s for downlink and 75 Mbit/s for uplink. Its evolution, LTE Advanced, supports
maximum downlink rates of over 1 Gbit/s.
Technology
Enhanced Uplink adds a new transport channel to WCDMA, called the Enhanced Dedicated
Channel (E-DCH). It also features several improvements similar to those of HSDPA, including
multi-code transmission, shorter transmission time interval enabling faster link adaptation, fast
scheduling, and fast Hybrid Automatic Repeat Request (HARQ) with incremental redundancy
making retransmissions more effective. Similarly to HSDPA, HSUPA uses a "packet scheduler",
but it operates on a "request-grant" principle where the user equipment (UE) requests permission to
send data and the scheduler decides when and how many UEs will be allowed to do so. A request
for transmission contains data about the state of the transmission buffer and the queue at the UE and
its available power margin. However, unlike HSDPA, uplink transmissions are not orthogonal to
each other.
In addition to this "scheduled" mode of transmission, the standard allows a self-initiated
transmission mode from the UEs, denoted "non-scheduled". The non-scheduled mode can, for
example, be used for VoIP services for which even the reduced TTI and the Node B based
scheduler will be unable to provide the very short delay time and constant bandwidth required.
Each MAC-d flow (i.e., QoS flow) is configured to use either scheduled or non-scheduled modes.
The UE adjusts the data rate for scheduled and non-scheduled flows independently. The maximum
data rate of each non-scheduled flow is configured at call setup, and typically not changed
frequently. The power used by the scheduled flows is controlled dynamically by the Node B
through absolute grant (consisting of an actual value) and relative grant (consisting of a single
up/down bit) messages.
At the physical layer, HSUPA introduces new channels E-AGCH (Absolute Grant Channel), E-
RGCH (Relative Grant Channel), F-DPCH (Fractional-DPCH), E-HICH (E-DCH Hybrid ARQ
Indicator Channel), E-DPCCH (E-DCH Dedicated Physical Control Channel), and E-DPDCH (E-
DCH Dedicated Physical Data Channel).
E-DPDCH is used to carry the E-DCH Transport Channel; and E-DPCCH is used to carry the
control information associated with the E-DCH.

3) HSDPA

As mentioned, HSDPA is mainly designed for high-speed data rates in the downlink, mainly for
internet-based applications, hence the name High Speed Downlink Packet Access. The UMTS
architecture is composed of three main parts: UE (User Equipment), RAN (Radio Access Network)
and Core Network. In HSDPA, changes are incorporated on the air interface side, and hence the UE
and RAN have been modified to take care of higher data rate requirements compared to the
predecessor, UMTS R99. No changes have been made on the core network side.

Salient features of HSDPA

-It supports asymmetric data transfer mode


-Bandwidth is about 5MHz same as WCDMA
-Supports both voice and data applications
-Turbo coding is used as FEC technique
-Adaptive coding and modulation based on channel conditions, achieving better data rates under
good channel conditions through higher modulation/code-rate assignment
-Utilizes maximum power for transmission
-Minimizes the use of redundant packets (under better channel conditions) in data transfer
-HARQ support for new channels introduced at the PHY layer
-12 UE categories are available; Category 10 is the most common, supporting 14 Mbps.
HSDPA Channels

In HSDPA, HS-DSCH (High-Speed Downlink Shared Channel) is introduced as a transport channel.

It is supported by three physical-layer channels, viz. HS-PDSCH (High Speed Physical Downlink
Shared Channel), HS-SCCH (High Speed Shared Control Channel) and HS-DPCCH (High Speed
Dedicated Physical Control Channel). HS-PDSCH carries user information/data. HS-SCCH informs
the UE that data will be carried on the HS-DSCH. HS-DPCCH carries ACK/NACK and CQI.
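
The CQI reported on HS-DPCCH drives the link adaptation described above. The following Python sketch shows the kind of rule the Node B applies; the thresholds and code rates here are invented for illustration and are not taken from the 3GPP CQI tables:

def select_format(cqi):
    """Map a reported channel quality to (modulation, code rate)."""
    if cqi >= 25:
        return ("64QAM", 0.75)   # good channel: high-order modulation, light coding
    if cqi >= 16:
        return ("16QAM", 0.66)
    if cqi >= 7:
        return ("QPSK", 0.50)
    return ("QPSK", 0.25)        # poor channel: robust modulation, heavy coding

for cqi in (3, 12, 20, 28):
    print(cqi, select_format(cqi))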
University Insem Question Paper
Oct-2018 InSem Question Paper Solution
Q 1) a) What is Fading in the Mobile Environment? Explain 2 Major Types of Fading.
Ans:

Perhaps the most challenging technical problem facing communications systems engineers is fading
in a mobile environment. The term fading refers to the time variation of received signal power
caused by changes in the transmission medium or path(s). In a fixed environment, fading is affected
by changes in atmospheric conditions, such as rainfall. But in a mobile environment, where one of
the two antennas is moving relative to the other, the relative location of various obstacles changes
over time, creating complex transmission effects.

For example, suppose a ground-reflected wave near the mobile unit is received. Because the
ground-reflected wave has a 180° phase shift after reflection, the ground wave and the line-of-sight
(LOS) wave may tend to cancel, resulting in high signal loss. Further, because the mobile antenna
is lower than most human-made structures in the area, multipath interference occurs. These
reflected waves may interfere constructively or destructively at the receiver.

Diffraction occurs at the edge of an impenetrable body that is large compared to the wavelength of
the radio wave. When a radio wave encounters such an edge, waves propagate in different
directions with the edge as the source. Thus, signals can be received even when there is no
unobstructed LOS from the transmitter. If the size of an obstacle is on the order of the wavelength
of the signal or less, scattering occurs. An incoming signal is scattered into several weaker outgoing
signals. At typical cellular microwave frequencies, there are numerous objects, such as lamp posts
and traffic signs, that can cause scattering. Thus, scattering effects are difficult to predict. These
three propagation effects influence system performance in various ways depending on local
conditions and as the mobile unit moves within a cell. If a mobile unit has a clear LOS to the
transmitter, then diffraction and scattering are generally minor effects, although reflection may have
a significant impact. If there is no clear LOS, such as in an urban area at street level, then diffraction
and scattering are the primary means of signal reception.

The Effects of Multipath Propagation

As just noted, one unwanted effect of multipath propagation is that multiple copies of a signal may
arrive at different phases. If these phases add destructively, the signal level relative to noise
declines, making signal detection at the receiver more difficult. A second phenomenon, of particular
importance for digital transmission, is intersymbol interference (ISI). Consider that we are sending
a narrow pulse at a given frequency across a link between a fixed antenna and a mobile unit. The
figure shows what the channel may deliver to the receiver if the impulse is sent at two different times.
The upper line shows two pulses at the time of transmission. The lower line shows the resulting
pulses at the receiver. In each case the first received pulse is the desired LOS signal. The magnitude
of that pulse may change because of changes in atmospheric attenuation. Further, as the mobile unit
moves farther away from the fixed antenna, the amount of LOS attenuation increases. But in
addition to this primary pulse, there may be multiple secondary pulses due to reflection, diffraction,
and scattering. Now suppose that this pulse encodes one or more bits of data. In that case, one or
more delayed copies of a pulse may arrive at the same time as the primary pulse for a subsequent
bit. These delayed pulses act as a form of noise to the subsequent primary pulse, making recovery
of the bit information more difficult. As the mobile antenna moves, the location of various obstacles
changes; hence the number, magnitude, and timing of the secondary pulses change. This makes it
difficult to design signal processing techniques that will filter out multipath effects so that the
intended signal is recovered with fidelity.

Types of Fading

Fading effects in a mobile environment can be classified as either fast or slow. At a frequency of
900 MHz, which is typical for mobile cellular applications, a wavelength is 0.33 m. Changes of
amplitude can be as much as 20 or 30 dB over a short distance. This type of rapidly changing fading
phenomenon, known as fast fading, affects not only mobile phones in automobiles, but even a
mobile phone user walking down an urban street. As the mobile user covers distances well in excess
of a wavelength, the urban environment changes, as the user passes buildings of different heights,
vacant lots, intersections, and so forth. Over these longer distances, there is a change in the average
received power level about which the rapid fluctuations occur. This is referred to as slow fading.
Fading effects can also be classified as flat or selective. Flat fading, or nonselective fading, is that
type of fading in which all frequency components of the received signal fluctuate in the same
proportions simultaneously. Selective fading affects unequally the different spectral components of
a radio signal. The term selective fading is usually significant only relative to the bandwidth of the
overall communications channel. If attenuation occurs over a portion of the bandwidth of the signal,
the fading is considered to be selective; nonselective fading implies that the signal bandwidth of
interest is narrower than, and completely covered by, the spectrum affected by the fading.
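
A common rule of thumb (an approximation, assuming a coherence bandwidth of roughly 1/(5 x rms delay spread) for about 0.5 frequency correlation) lets us classify a channel as flat or selective for a given signal, as in this Python sketch:

def fading_type(rms_delay_spread_s, signal_bw_hz):
    bc = 1.0 / (5.0 * rms_delay_spread_s)    # coherence bandwidth estimate in Hz
    return "flat" if signal_bw_hz < bc else "frequency-selective"

# Example: a 1 microsecond rms delay spread gives Bc of about 200 kHz.
print(fading_type(1e-6, 30e3))   # 30 kHz channel: flat
print(fading_type(1e-6, 5e6))    # 5 MHz channel: frequency-selective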

Error Compensation Mechanisms

The efforts to compensate for the errors and distortions introduced by multipath fading fall into
three general categories: forward error correction, adaptive equalization, and diversity techniques.
In the typical mobile wireless environment, techniques from all three categories are combined to
combat the error rates encountered.

Forward error correction is applicable in digital transmission applications: those in which the
transmitted signal carries digital data or digitized voice or video data. Typically in mobile wireless
applications, the ratio of total bits sent to data bits sent is between 2 and 3. This may seem an
extravagant amount of overhead, in that the capacity of the system is cut to one-half or one-third of
its potential, but the mobile wireless environment is so difficult that such levels of redundancy are
necessary.

Adaptive equalization can be applied to transmissions that carry analog information (e.g., analog
voice or video) or digital information (e.g., digital data, digitized voice or video) and is used to
combat intersymbol interference. The process of equalization involves some method of gathering
the dispersed symbol energy back together into its original time interval. Equalization is a broad
topic; techniques include the use of so-called lumped analog circuits as well as sophisticated digital
signal processing algorithms.

Diversity is based on the fact that individual channels experience independent fading events. We
can therefore compensate for error effects by providing multiple logical channels in some sense
between transmitter and receiver and sending part of the signal over each channel. This technique
does not eliminate errors but it does reduce the error rate, since we have spread the transmission
out to avoid being subjected to the highest error rate that might occur. The other techniques
(equalization, forward error correction) can then cope with the reduced error rate.

Some diversity techniques involve the physical transmission path and are referred to as space
diversity. For example, multiple nearby antennas may be used to receive the message, with the
signals combined in some fashion to reconstruct the most likely transmitted signal. Another
example is the use of collocated multiple directional antennas, each oriented to a different reception
angle, with the incoming signals again combined to reconstitute the transmitted signal. More
commonly, the term diversity refers to frequency diversity or time diversity techniques. With
frequency diversity, the signal is spread out over a larger frequency bandwidth or carried on
multiple frequency carriers.
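
To make the 2-3x forward-error-correction overhead concrete, here is a toy rate-1/3 repetition code in Python (real systems use far stronger convolutional or turbo codes; this only illustrates how redundancy buys error tolerance):

def encode(bits):
    """Send every data bit three times."""
    return [b for b in bits for _ in range(3)]

def decode(coded):
    """Majority-vote each group of three received bits."""
    out = []
    for i in range(0, len(coded), 3):
        out.append(1 if sum(coded[i:i + 3]) >= 2 else 0)
    return out

data = [1, 0, 1, 1]
tx = encode(data)             # 12 bits on air for 4 data bits (3x overhead)
tx[4] ^= 1                    # one bit corrupted by fading
assert decode(tx) == data     # a single error per triplet is corrected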
b) Compare Cell Phone Generation- 1G To 5G
Ans:
1G: Voice Only
Cell phones began with 1G in the 1980s. 1G is an analog technology; the phones generally had poor
battery life and poor voice quality, offered little security, and would sometimes experience dropped
calls. The maximum speed of 1G was 2.4 Kbps.

2G: SMS & MMS


2G signifies second-generation wireless digital technology. Fully digital 2G networks replaced
analog 1G technology, which originated in the 1980s. 2G networks saw their first commercial light
of day on the GSM standard. GSM, which made international roaming possible, is an acronym for
global system for mobile communications.
2G technology on the GSM standard was first used in commercial practice in 1991 in Finland.
Second generation cell phone technology is either time division multiple access (TDMA) or code
division multiple access (CDMA).
Download and upload speed in 2G technology was 236 Kbps. 2G preceded 2.5G, which bridged 2G
technology to 3G.
Benefits of 2G Technology
When 2G was introduced to cell phones, it was praised for several reasons. Its digital signal used
less power than analog signals, so mobile batteries lasted longer. 2G technology made possible the
introduction of SMS (the short and incredibly popular text message), along with multimedia
messages (MMS) and picture messages. 2G's digital encryption added privacy to data and voice
calls.
2G Disadvantage
2G cell phones required powerful digital signals to work, so they were unlikely to work in rural or
less populated areas.

3G: More Data! Video Calling & Mobile Internet

3G service, also known as third-generation service, is high-speed access to data and voice services,
made possible by the use of a 3G network. A 3G network is a high-speed mobile broadband
network, offering data speeds of at least 144 kilobits per second (Kbps).

For comparison, a dial-up Internet connection on a computer typically offers speeds of about 56
Kbps. If you've ever sat and waited for a Web page to download over a dial-up connection, you
know how slow that is.

3G networks can offer speeds of 3.1 megabits per second (Mbps) or more; that's on par with speeds
offered by cable modems. In day-to-day use, the actual speed of the 3G network will vary. Factors
such as signal strength, your location, and network traffic all come into play.

4G The Current Standard

4G wireless is the term used to describe the fourth-generation of wireless cellular service. 4G is a
big step up from 3G and is up to 10 times faster than 3G service. Sprint was the first carrier to offer
4G speeds in the U.S. beginning in 2009. Now all the carriers offer 4G service in most areas of the
country, although some rural areas still have only the slower 3G coverage.
Why 4G Speed Matters

As smart phones and tablets developed the capability to stream video and music, the need for speed
became critically important. Historically, cellular speeds were much slower than those offered by
high-speed broadband connections to computers. 4G speed compares favorably with some
broadband options and is particularly useful in areas without broadband connections.

4G Technology

While all 4G service is called 4G or 4G LTE, the underlying technology is not the same with every
carrier. Some use WiMax technology for their 4G network, while Verizon Wireless uses a
technology called Long Term Evolution, or LTE.

Sprint says its 4G WiMax network offers download speeds that are ten times faster than a 3G
connection, with speeds that top out at 10 megabits per second.

5G: Coming Soon


5G is a not-yet-implemented wireless technology that is intended to improve on 4G.
5G promises significantly faster data rates, higher connection density and much lower latency,
among other improvements.

Q2) a) Explain Frequency-hopping spread spectrum(FHSS) with digram.

Ans:

It is a method of transmitting radio signals by rapidly switching a carrier among many frequency
channels, using a pseudorandom sequence known to both transmitter and receiver. It is used as
a multiple access method in the code division multiple access (CDMA) scheme frequency-hopping
code division multiple access (FH-CDMA).

Each available frequency band is divided into sub-frequencies. Signals rapidly change ("hop")
among these in a predetermined order. Interference at a specific frequency will only affect the
signal during that short interval. FHSS can, however, cause interference with adjacent direct-
sequence spread spectrum (DSSS) systems.
Fig. FHSS
In FHSS, the transmitter hops between available narrowband frequencies within a specified
broad channel in a pseudo-random sequence known to both sender and receiver. A short burst of
data is transmitted on the current narrowband channel, then transmitter and receiver tune to the
next frequency in the sequence for the next burst of data. In most systems, the transmitter will hop
to a new frequency more than twice per second. Because no channel is used for long, and the odds
of any other transmitter being on the same channel at the same time are low, FHSS is often used as
a method to allow multiple transmitter and receiver pairs to operate in the same space on the same
broad channel at the same time.
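
The shared pseudo-random sequence can be illustrated with a short Python sketch (the channel count and seed are illustrative): both ends seed the same generator, so they tune to the same sub-channel for each burst.

import random

NUM_CHANNELS = 79          # e.g. 79 narrowband sub-channels (illustrative)

def hop_sequence(shared_seed, hops):
    rng = random.Random(shared_seed)   # same seed -> same sequence
    return [rng.randrange(NUM_CHANNELS) for _ in range(hops)]

tx_hops = hop_sequence(shared_seed=0xC0FFEE, hops=8)
rx_hops = hop_sequence(shared_seed=0xC0FFEE, hops=8)
assert tx_hops == rx_hops   # transmitter and receiver stay in step
print(tx_hops)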

b) Explain Modulation And Demodulation


Ans:

Modulation

The frequency of a radio frequency channel is best explained as the frequency of a carrier wave. A
carrier wave is purely made up of a constant frequency, similar to a sine wave. By itself it does not
carry much information that we can relate to data or speech.
To convey data or speech information, another wave, known as the input signal, has to be imposed
on the carrier wave. This process of imposing an input signal on a carrier wave is known as
modulation. Put differently, modulation modifies the shape of the carrier wave to encode the
information that we intend to carry. Modulation is similar to hiding a code inside the carrier wave.

Types of Modulation

There are three types of modulation namely:

 Frequency modulation
 Amplitude modulation
 Phase modulation.

Amplitude modulation

A kind of modulation where the amplitude of the carrier signal is changed in proportion to the
message signal while the phase and frequency are kept constant.

Phase modulation

A kind of modulation where the phase of the carrier signal is altered according to the low frequency
of the message signal is called phase modulation.
Frequency modulation

A kind of modulation where the frequency of the carrier signal is altered in proportion to the
message signal while the phase and amplitude are kept constant.

Modulation mechanisms can also be digital or analog. An analog modulation scheme has an input
wave that changes continuously like a sine wave, but it is a bit more complicated when it comes to
digital. The voice is sampled at some rate and then compressed into a bit stream (a stream of zeros
and ones). This, in turn, is made into a specific type of wave which is superimposed on the carrier.
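
A numeric sketch of amplitude modulation, in Python with NumPy (the carrier and message frequencies and the modulation index are arbitrary illustrative values): the message signal changes the amplitude of a constant-frequency carrier, and a simple envelope detector recovers it.

import numpy as np

fs = 100_000                                  # sample rate (Hz)
t = np.arange(0, 0.01, 1 / fs)                # 10 ms of signal
fc, fm, m = 10_000, 500, 0.5                  # carrier, message, modulation index

message = np.cos(2 * np.pi * fm * t)
am = (1 + m * message) * np.cos(2 * np.pi * fc * t)   # AM waveform

# Envelope detection (a simple demodulator): rectify, then low-pass by
# averaging over one carrier period.
rectified = np.abs(am)
kernel = np.ones(fs // fc) / (fs // fc)
envelope = np.convolve(rectified, kernel, mode="same")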

Demodulation

Demodulation is defined as extracting the original information carrying signal from a modulated
carrier wave. A demodulator is an electronic circuit that is mainly used to recover the information
content from the modulated carrier wave. There are different types of modulation and so are
demodulators. The output signal via demodulator may describe the sound, images or binary data.

Difference between Modulation and Demodulation


 Modulation is the process of impressing data or speech information onto the carrier, while
demodulation is the recovery of the original information from the carrier at the distant end.
 Modem is the equipment that performs both modulation and demodulation.
 Both processes aim to achieve transfer information with the minimum distortion, minimum
loss and efficient utilization of spectrum.

Even though there are different methods for the modulation and demodulation processes, each has
its own advantages and disadvantages. For example, AM is used in shortwave and radio wave
broadcasting; FM is mostly used in high-frequency radio broadcasting; and pulse modulation is
used for digital signal modulation.

Q3) a) Explain Personal communication system (PCS)? Explain PCS Architecture.


Ans:
A personal communications service (PCS) is a type of wireless mobile service with advanced
coverage that delivers services at a more personal level. It generally refers to modern mobile
communication that boosts the capabilities of conventional cellular networks and of fixed-line
telephony networks as well.
PCS is also known as digital cellular.
Fig. PCS Architecture

A PCS works similarly to a cellular network in basic operations, but requires more service provider
infrastructure to cover a wider geographical area. A PCS generally includes the following:
 Wireless communication (data, voice and video)
 Mobile PBX
 Paging and texting
 Wireless radio
 Personal communication networks
 Satellite communication systems, etc.

PCS has three broad categories: narrowband, broadband and unlicensed. TDMA, CDMA and GSM,
and 2G, 3G and 4G are some of the common technologies that are used to deliver a PCS.

b) What is cell splitting? Explain with Scenario.


Ans:

Cell splitting is the process of subdividing a congested cell into smaller cells such that each
smaller cell has its own base station with reduced antenna height and reduced transmitter power. It
increases the capacity of a cellular system, since the number of times channels are reused increases.

Cell sectorization: one way to increase the subscriber capacity of a cellular network is to replace
the omni-directional antenna at each base station with three (or six) sector antennas of 120 (or 60)
degrees opening.

The concept of cell splitting is quite self-explanatory from its name: cell splitting means splitting
cells up into smaller cells. The process of cell splitting is used to expand the capacity (number of
channels) of a mobile communication system. As a network grows, a large number of mobile
users appears in an area. Consider the following scenario.

There are 100 people in a specific area. All of them own a mobile phone (MS) and are quite
comfortable communicating with each other. So, a provision for all of them to mutually
communicate must be made. As there are only 100 users, a single base station (BS) is built in the
middle of the area and all these users' MSs are connected to it. All 100 users now come under
the coverage area of a single base station. This coverage area is called a cell. This is shown in Fig 1.

Fig 1. A single BS for 100 MS users

But now, as time passed by, the number of mobile users in the same area increased from 100 to 700.
Now if the same BS has to connect to these 700 users’ MS, obviously the BS will be overloaded. A
single BS, which served for 100 users is forced to serve for 700 users, which is impractical. To
reduce the load of this BS, we can use cell splitting. That is, we will divide the above single cell
into 7 separate adjacent cells, each having its own BS. This is shown in Fig .

Fig 2. Single cell split up into 7 cells

Now, let us look at the big picture. Until now, we have discussed cell splitting in a
small area. Now we use the same concept to deal with large networks. In a large network, it is not
necessary to split up all the cells in all the clusters. Some BSs can handle their traffic well without
change; only the overloaded cells are good candidates for cell splitting. Fig 3 shows a
network architecture with a few cells split up into smaller cells, without affecting the
other cells in the network.

Fig 3.Cell Splitting.
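
Putting numbers on the scenario above, here is a tiny Python sketch (assuming, for illustration, that each base station supports a fixed number of simultaneous users):

USERS_PER_BS = 100               # capacity of one base station (illustrative)

def capacity(num_cells):
    return num_cells * USERS_PER_BS

print(capacity(1))               # one big cell: 100 users
print(capacity(7))               # split into 7 smaller cells: 700 users

# More generally, halving the cell radius quadruples the number of cells
# (and hence the capacity) covering the same area, since area ~ radius**2:
print((1.0 / 0.5) ** 2)          # 4.0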


Q 4) a) Explain Direct sequence spread spectrum with Diagram?
Ans:
In telecommunications, direct-sequence spread spectrum (DSSS) is a spread
spectrum modulation technique used to reduce overall signal interference. The spreading of this
signal makes the resulting wideband channel more noisy, allowing for greater resistance to
unintentional and intentional interference.
A method of achieving the spreading of a given signal is provided by the modulation scheme. With
DSSS, the message signal is used to modulate a bit sequence known as a Pseudo Noise (PN) code;
this PN code consists of a radio pulse that is much shorter in duration (larger bandwidth) than the
original message signal. This modulation of the message signal scrambles and spreads the pieces of
data, thereby resulting in a bandwidth nearly identical to that of the PN sequence. In this context,
the duration of the radio pulse for the PN code is referred to as the chip duration. The smaller this
duration, the larger the bandwidth of the resulting DSSS signal; more bandwidth multiplexed with
the message signal results in better resistance against interference.

Compared with frequency hopping, DSSS spreads a signal across a wide channel all at once instead
of in discrete bursts separated by hops. It can achieve higher throughput, but DSSS is more
susceptible to interference and less effective as a spectrum-sharing method.
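
A toy DSSS spreader/despreader in Python (the 7-chip PN code here is illustrative, not a real m-sequence): each data bit is XORed with the chip sequence, widening the bandwidth by the spreading factor, and the receiver correlates against the same code to recover the bit.

PN = [1, 0, 1, 1, 0, 0, 1]            # chip code known to both ends

def spread(bits):
    return [b ^ c for b in bits for c in PN]

def despread(chips):
    bits = []
    for i in range(0, len(chips), len(PN)):
        block = chips[i:i + len(PN)]
        votes = sum(ch ^ c for ch, c in zip(block, PN))
        bits.append(1 if votes > len(PN) // 2 else 0)   # correlate and decide
    return bits

data = [1, 0, 1]
assert despread(spread(data)) == data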

b) Explain Co-channel interference with diagram?

Ans:
Co-channel interference or CCI is crosstalk from two different radio transmitters using the
same channel. Co-channel interference can be caused by many factors, from weather conditions to
administrative and design issues, and may be controlled by various radio resource management
schemes.
How co-channel interference can be reduced:
 Increase the separation between co-channel cells: a larger cluster size N increases the co-channel
reuse ratio Q = D/R = sqrt(3N), where D is the distance between co-channel cell centres and R is
the cell radius.
 Use sectoring with directional antennas, so that each sector receives interference from fewer co-
channel cells.
 Apply transmitter power control, so that each transmitter radiates only as much power as needed.
 Use careful frequency planning and channel allocation.

Q5) a) What is handover/handoff? Explain with a scenario.


Ans:
Although the concept of cellular handover or cellular handoff is relatively straightforward, it is not
an easy process to implement in reality. The cellular network needs to decide when handover or
handoff is necessary, and to which cell. Also when the handover occurs it is necessary to re-route
the call to the relevant base station along with changing the communication between the mobile and
the base station to a new channel. All of this needs to be undertaken without any noticeable
interruption to the call. The process is quite complicated, and in early systems calls were often lost
if the process did not work correctly.

Different cellular standards handle hand over / handoff in slightly different ways. Therefore for the
sake of an explanation the example of the way that GSM handles handover is given.
There are a number of parameters that need to be known to determine whether a handover is
required. The signal strength of the base station with which communication is being made, along
with the signal strengths of the surrounding stations. Additionally the availability of channels also
needs to be known. The mobile is obviously best suited to monitor the strength of the base stations,
but only the cellular network knows the status of channel availability and the network makes the
decision about when the handover is to take place and to which channel of which cell.
Types of handover / handoff
With the advent of CDMA systems, where the same channels can be used by several mobiles and
where it is possible for adjacent cells or cell sectors to use the same frequency channel, there are a
number of different types of handover that can be performed:

 Hard handover (hard handoff)


 Soft handover (soft handoff)
Fig:-Types of Handover

Hard handover

The definition of a hard handover or handoff is one where an existing connection must be broken
before the new one is established. One example of hard handover is when frequencies are changed.
As the mobile will normally only be able to transmit on one frequency at a time, the connection
must be broken before it can move to the new channel where the connection is re-established. This
is often termed an inter-frequency hard handover. While this is the most common form of hard
handoff, it is not the only one. It is also possible to have intra-frequency hard handovers where the
frequency channel remains the same.

Although there is generally a short break in transmission, this is normally short enough not to be
noticed by the user.

Soft handover

The new 3G technologies use CDMA where it is possible to have neighboring cells on the same
frequency and this opens the possibility of having a form of handover or handoff where it is not
necessary to break the connection. This is called soft handover or soft handoff, and it is defined as a
handover where a new connection is established before the old one is released. In UMTS most of
the handovers that are performed are intra-frequency soft handovers.
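
The handover decision itself can be sketched in a few lines of Python (the hysteresis margin and the dBm values are illustrative): the network hands over only when a neighbour's signal exceeds the serving cell's by a margin, which avoids "ping-pong" handovers at the cell edge.

HYSTERESIS_DB = 3.0

def should_handover(serving_rss, neighbour_rss):
    best_cell = max(neighbour_rss, key=neighbour_rss.get)
    if neighbour_rss[best_cell] > serving_rss + HYSTERESIS_DB:
        return best_cell          # network commands handover to this cell
    return None                   # stay on the serving cell

print(should_handover(-95.0, {"cellB": -93.0, "cellC": -97.0}))  # None
print(should_handover(-95.0, {"cellB": -90.0, "cellC": -97.0}))  # cellB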

b) What is frequency reuse? Explain with example .


Ans:
In the cellular concept, frequencies allocated to the service are re-used in a regular pattern of areas,
called 'cells', each covered by one base station. In mobile-telephone networks these cells are usually
hexagonal. In radio broadcasting, a similar concept has been developed based on rhombic cells. To
ensure that the mutual interference between users remains below a harmful level, adjacent cells use
different frequencies. In fact, a set of C different frequencies {f1, ..., fC} is used for each cluster
of C adjacent cells. Cluster patterns and the corresponding frequencies are re-used in a regular
pattern over the entire service area.
Cellular radio systems rely on an intelligent allocation and reuse of channels throughout a coverage
region. Each cellular base station is allocated a group of radio channels to be used within a small
geographic area called a cell. Base stations in adjacent cells are assigned channel groups which
contain completely different channels than neighboring cells. The base station antennas are
designed to achieve the desired coverage within the particular cell. By limiting the coverage area to
within the boundaries of a cell, the same group of channels may be used to cover different cells that
are separated from one another by distances large enough to keep interference levels within
tolerable limits. The design process of selecting and allocating channel groups for all of the cellular
base stations within a system is called frequency reuse or frequency planning.
If each cell is allocated a group of k channels and a cluster contains N cells, the total number of
available radio channels S is given by
Equation 1
S = kN
The N cells which collectively use the complete set of available frequencies are called a cluster. If a
cluster is replicated M times within the system, the total number of duplex channels, C, can be used
as a measure of capacity and is given by
Equation 2
C = MkN = MS
As seen from equation 2, the capacity of a cellular system is directly proportional to the number of
times a cluster is replicated in a fixed service area.
The factor N is called the cluster size and is typically equal to 4, 7, or 12. Also it describe the
number of cells in the cluster.
So there are two ways to increase the system capacity.
1. Increase cluster size N.
2. Increase the number of allocated channels to each cell.
A large cluster size indicates that the ratio between the cell radius and the distance between co-
channel cells is small.
And a small cluster size indicates that co-channel cells are located much closer together.
The value for N is a function of how much interference a mobile or base station can tolerate while
maintaining a sufficient quality of communications. From a design point of view, the smallest
possible value of N is desirable in order to maximize capacity over a given coverage area (i.e., to
maximize C in Equation 2)).
The frequency reuse factor of a cellular system is given by 1/N, since each cell within a cluster is
only assigned 1/N of the total available channels in the system.
Because the hexagonal geometry has exactly six equidistant neighbours, and the lines joining the
centres of any cell and each of its neighbours are separated by multiples of 60 degrees, only certain
cluster sizes and cell layouts are possible. In order to connect without gaps between adjacent cells,
the geometry of hexagons is such that the number of cells per cluster, N, can only have values
which satisfy the equation N = i^2 + ij + j^2, where i and j are non-negative integers. The sketch
below lists the cluster sizes this constraint allows.
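
A quick Python check of which cluster sizes the hexagonal-geometry constraint allows (the search bound of 25 is arbitrary):

def valid_cluster_sizes(limit=25):
    sizes = set()
    for i in range(6):
        for j in range(6):
            n = i * i + i * j + j * j        # N = i^2 + ij + j^2
            if 0 < n <= limit:
                sizes.add(n)
    return sorted(sizes)

print(valid_cluster_sizes())   # [1, 3, 4, 7, 9, 12, 13, 16, 19, 21, 25]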
Q 6) a) What are the different types of Multiplexing? Explain any two.
Ans:
Multiplexing is a technique in which several message signals are combined into a composite signal
for transmission over a common channel.
These signals to be transmitted over the common channel must be kept apart so that they do not
interfere with each other, and hence they can be separated easily at the receiver end.

Basically, multiplexing is of two types:


 Frequency division multiplexing (FDM)
 Time division multiplexing (TDM)
Now we will discuss each type of multiplexing in detail.

 Frequency division multiplexing (FDM)


Frequency division multiplexing (FDM) describes schemes to subdivide the frequency dimension
into several non-overlapping frequency bands.

Frequency Division Multiple Access is a method employed to permit several users to transmit
simultaneously on one satellite transponder by assigning a specific frequency within the channel to
each user. Each conversation gets its own, unique, radio channel. The channels are relatively
narrow, usually 30 kHz or less, and are defined as either transmit or receive channels. A full duplex
conversation requires a transmit & receive channel pair. FDM is often used for simultaneous access
to the medium by base station and mobile station in cellular networks, establishing a duplex channel.
A scheme called frequency division duplexing (FDD) separates the two directions, mobile station to
base station and vice versa, by using different frequencies.
 Time division multiplexing (TDM)
A more flexible multiplexing scheme for typical mobile communications is time division
multiplexing (TDM). Compared to FDMA, time division multiple access (TDMA) offers a
much more flexible scheme, which comprises all technologies that allocate certain time slots
for communication. Now synchronization between sender and receiver has to be achieved in
the time domain. Again this can be done by using a fixed pattern similar to FDMA
techniques, i.e., allocating a certain time slot for a channel, or by using a dynamic allocation
scheme.

Listening to different frequencies at the same time is quite difficult, but listening to many
channels separated in time at the same frequency is simple. Fixed schemes do not need
identification, but are not as flexible considering varying bandwidth requirements.
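
A minimal fixed-allocation TDM sketch in Python (the per-user data and slot layout are illustrative): each user gets every k-th slot on the shared channel, and the receiver demultiplexes by slot position.

def multiplex(streams):
    """Interleave equal-length per-user data lists into one slot stream."""
    return [streams[u][t] for t in range(len(streams[0]))
            for u in range(len(streams))]

def demultiplex(slots, num_users, user):
    return slots[user::num_users]       # pick out this user's time slots

a, b, c = ["a1", "a2"], ["b1", "b2"], ["c1", "c2"]
frame = multiplex([a, b, c])            # ['a1','b1','c1','a2','b2','c2']
assert demultiplex(frame, 3, 1) == b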

b) What is the relation between MS, BS, MSC, HLR and VLR?


Ans:
a) HLR (Home location register):- A home location register (HLR) is a database containing
pertinent data regarding subscribers authorized to use a global system for mobile communications
(GSM) network. Some of the information stored in an HLR includes the international mobile
subscriber identity (IMSI) and the mobile station international subscriber directory number
(MSISDN) of each subscription.
Other information stored in the HLR includes services requested by or rendered to the
corresponding subscriber; the general packet radio service settings of the subscriber, the current
location of the subscriber and call divert settings.
b) VLR (Visitor location register):- A visitor location register (VLR) is a database that contains
information about the subscribers roaming within a mobile switching center’s (MSC) location area.
The primary role of the VLR is to minimize the number of queries that MSCs have to make to the
home location register (HLR), which holds permanent data regarding the cellular network’s
subscribers.
A visitor location register may also perform the following functions:
 Monitor the subscriber’s location within the VLR’s jurisdiction
 Determine whether a subscriber may access a particular service
 Allocate roaming numbers during incoming calls
 Delete the records of inactive subscribers
 Accept information passed to it by the HLR.

c) Mobile Switching Center: A mobile switching center (MSC) is the centerpiece of a network
switching subsystem (NSS). The MSC is mostly associated with communications switching
functions, such as call set-up, release, and routing. However, it also performs a host of other duties,
including routing SMS messages, conference calls, fax, and service billing as well as interfacing
with other networks, such as the public switched telephone network (PSTN).

d) Base station: -A base station is a fixed point of communication for customer cellular phones on
a network. The base station is connected to an antenna (or multiple antennae) that receives and
transmits the signals in the cellular network to customer phones and cellular devices. That
equipment is connected to a mobile switching station that connects cellular calls to the public
switched telephone network (PSTN).

e) Mobile Station: The MS includes radio equipment and the man machine interface (MMI) that a
subscriber needs in order to access the services provided by the GSM PLMN. MSs can be installed
in vehicles or can be portable or handheld stations. The MS may include provisions for data
communication as well as voice. A mobile transmits and receives messages to and from the GSM
system over the air interface to establish and continue connections through the system.
Functions of MS
The primary functions of MS are to transmit and receive voice and data over the air interface of the
GSM system. MS performs the signal processing functions of digitizing, encoding, error protecting,
encrypting, and modulating the transmitted signals. It also performs the inverse functions on the
received signals from the BS.
These functions include the following:
 Voice and data transmission;
 Frequency and time synchronization;
 Monitoring of power and signal quality of the surrounding cells for optimum handover;
 Provision of location updates;
 Equalization of multipath distortions.
University Endsem Question Paper
MOC MODEL ANSWER, DECEMBER 2018
Q.1 a) What is frequency reuse? Give its frequency reuse factors.
Ans:
Frequency Reuse
Frequency reuse is the process of using the same radio frequencies on radio transmitter sites within
a geographic area that are separated by sufficient distance to cause minimal interference with each
other. Frequency reuse allows for a dramatic increase in the number of customers that can be served
(capacity) within a geographic area on a limited amount of radio spectrum (limited number of radio
channels). The ability to reuse frequencies depends on various factors, including the ability of
channels to operate in the presence of interference and the signal energy attenuation between the
transmitters. In a cluster-based system, the frequency reuse factor is 1/N, where N is the number of
cells per cluster; typical cluster sizes are N = 4, 7 and 12.
IS-95 CDMA and other CDMA radio channels use codes that are uniquely assigned to each user.
This allows many users to operate on the same frequency, and also allows frequencies to be reused
in every cell site and in every sector within a cell site (a frequency reuse factor of 1). However, the
use of the same frequency in the same cell site and sector increases the interference levels and
decreases the capacity of the radio channels.
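As a rough illustration of how cluster size relates to reuse, the following Python sketch (the cell radius R = 2 km is an assumed example value) computes the classical co-channel reuse distance D = R·sqrt(3N) for the common cluster sizes:

import math

def reuse_distance(R: float, N: int) -> float:
    # Co-channel reuse distance for hexagonal cells: D = R * sqrt(3N)
    return R * math.sqrt(3 * N)

R = 2.0                                    # cell radius in km (illustrative)
for N in (4, 7, 12):                       # common cluster sizes
    print(f"N={N}: reuse factor 1/{N}, D = {reuse_distance(R, N):.2f} km")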

b) Explain cell splitting and give its features.


Ans:
Cell splitting is the process of dividing the radio coverage of a cell site in a wireless telephone
system into two or more new cell sites. Cell splitting may be performed to provide additional
capacity within the region of the original cell site.
Cell Splitting Operation

This diagram shows the process of cell splitting, which is used to expand the capacity (number of
channels) of a mobile communication system. In this example, the radio coverage area of a large cell
site is split by adjusting the power level and/or using a reduced antenna height to cover a smaller
area. Reducing the radio coverage area of a cell site by changing its RF boundaries has the same
effect as placing cells farther apart, and allows new cell sites to be added.
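A back-of-the-envelope Python sketch (the radii are assumed example values) of why splitting pays off: halving the cell radius covers the same area with roughly four times as many cells, so the same channel set can be reused about four times as often:

# Splitting cells of radius R into cells of radius R/2 gives about
# (R / (R/2))^2 = 4 times as many cells over the same area.
R_old, R_new = 2.0, 1.0                    # km, illustrative radii
ratio = (R_old / R_new) ** 2
print(f"capacity increase factor = {ratio:.0f}")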
Q.2 a) Explain PCS Architecture in details.
Ans:

Fig. PCS Architecture

A personal communications service (PCS) is a type of wireless mobile service with advanced
coverage that delivers services at a more personal level. It generally refers to modern mobile
communication that boosts the capabilities of conventional cellular networks and fixed-line
telephony networks as well.
PCS is also known as digital cellular.
A PCS works similarly to a cellular network in basic operations, but requires more service provider
infrastructure to cover a wider geographical area. A PCS generally includes the following:

 Wireless communication (data, voice and video)


 Mobile PBX
 Paging and texting
 Wireless radio
 Personal communication networks
 Satellite communication systems, etc.

PCS has three broad categories: narrowband, broadband and unlicensed. TDMA, CDMA and GSM,
and 2G, 3G and 4G are some of the common technologies that are used to deliver a PCS.

b) Give details of fading in a mobile environment.


Ans:
Perhaps the most challenging technical problem facing communications systems engineers is fading
in a mobile environment. The term fading refers to the time variation of received signal power
caused by changes in the transmission medium or path(s). In a fixed environment, fading is affected
by changes in atmospheric conditions, such as rainfall. But in a mobile environment, where one of
the two antennas is moving relative to the other, the relative location of various obstacles changes
over time, creating complex transmission effects.

For example, suppose a ground-reflected wave near the mobile unit is received. Because the
ground-reflected wave has a 180° phase shift after reflection, the ground wave and the line-of-sight
(LOS) wave may tend to cancel, resulting in high signal loss. Further, because the mobile antenna
is lower than most human-made structures in the area, multipath interference occurs. These
reflected waves may interfere constructively or destructively at the receiver.

Diffraction occurs at the edge of an impenetrable body that is large compared to the wavelength of
the radio wave. When a radio wave encounters such an edge, waves propagate in different
directions with the edge as the source. Thus, signals can be received even when there is no
unobstructed LOS from the transmitter. If the size of an obstacle is on the order of the wavelength
of the signal or less, scattering occurs. An incoming signal is scattered into several weaker outgoing
signals. At typical cellular microwave frequencies, there are numerous objects, such as lamp posts
and traffic signs, that can cause scattering. Thus, scattering effects are difficult to predict. These
three propagation effects influence system performance in various ways depending on local
conditions and as the mobile unit moves within a cell. If a mobile unit has a clear LOS to the
transmitter, then diffraction and scattering are generally minor effects, although reflection may have
a significant impact. If there is no clear LOS, such as in an urban area at street level, then diffraction
and scattering are the primary means of signal reception.
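A small Python sketch of the two-ray reasoning above (the wavelength and path differences are assumed illustrative values): a unit-amplitude LOS wave plus a ground-reflected copy with a 180° phase shift cancels or reinforces depending on the extra path length:

import numpy as np

lam = 0.33   # wavelength in metres (roughly a 900 MHz carrier), illustrative

# LOS path of unit amplitude plus a reflected copy that picks up a
# 180 degree phase shift and an extra path length dl.
for dl in (0.0, lam / 4, lam / 2):
    phase = np.pi + 2 * np.pi * dl / lam
    amplitude = abs(1 + np.exp(1j * phase))
    print(f"extra path {dl:.3f} m -> received amplitude {amplitude:.2f}")
# dl = 0 gives complete cancellation (a deep fade); dl = lam/2 adds constructively.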

The Effects of Multipath Propagation

As just noted, one unwanted effect of multipath propagation is that multiple copies of a signal may
arrive at different phases. If these phases add destructively, the signal level relative to noise
declines, making signal detection at the receiver more difficult. A second phenomenon, of particular
importance for digital transmission, is intersymbol interference (ISI). Consider that we are sending
a narrow pulse at a given frequency across a link between a fixed antenna and a mobile unit. Figure
14.8 shows what the channel may deliver to the receiver if the impulse is sent at two different times.
The upper line shows two pulses at the time of transmission. The lower line shows the resulting
pulses at the receiver. In each case the first received pulse is the desired LOS signal. The magnitude
of that pulse may change because of changes in atmospheric attenuation. Further, as the mobile unit
moves farther away from the fixed antenna, the amount of LOS attenuation increases. But in
addition to this primary pulse, there may be multiple secondary pulses due to reflection, diffraction,
and scattering. Now suppose that this pulse encodes one or more bits of data. In that case, one or
more delayed copies of a pulse may arrive at the same time as the primary pulse for a subsequent
bit. These delayed pulses act as a form of noise to the subsequent primary pulse, making recovery
of the bit information more difficult. As the mobile antenna moves, the location of various obstacles
changes; hence the number, magnitude, and timing of the secondary pulses change. This makes it
difficult to design signal processing techniques that will filter out multipath effects so that the
intended signal is recovered with fidelity.

Q.3 a) Write short note on specialized MAC.


Ans:
A Medium Access Control (MAC) address is a hardware address used to uniquely identify each node of
a network. It provides addressing and channel access control mechanisms to enable the several
terminals or network nodes to communicate in a specified network. Medium Access Control of data
communication protocol is also named as Media Access Control. In IEEE 802 OSI Reference
model of computer networking, the Data Link Control (DLC) layer is subdivided into two sub-
layers:
 The Logical Link Control (LLC) layer and
 The Medium Access Control (MAC) layer
The MAC sublayer acts as a direct interface between the logical link control (LLC) Ethernet
sublayer and the physical layer of reference model. Consequently, each different type of network
medium requires a different MAC layer. On networks that do not conform to the IEEE 802
standards but do conform to the OSI Reference Model, the node address is called the Data Link
Control (DLC) address. The MAC sublayer emulates a full-duplex logical
communication channel in a multipoint network system. These communication channels may
provide unicast, multicast and/or broadcast communication services.

MAC addresses are needed when multiple devices are connected to the same physical link: to
prevent collisions and misdelivery, the system must uniquely identify the devices at the data link
layer, using the MAC addresses that are learned on the ports of a switch. The MAC sublayer uses
MAC protocols to regulate access to the shared medium and to prevent collisions. (Note that the
similarly named message authentication code, which uses a secret key, is an unrelated security concept.)

Functions performed in the MAC sublayer:


The primary functions performed by the MAC layer as per the IEEE Std 802-2001 section 6.2.3 are
as follows:
1. Frame delimiting and recognition: This function is responsible for creating and recognizing
frame boundaries.
2. Addressing: MAC sublayer performs the addressing of destination stations (both as
individual stations and as groups of stations) and conveyance of source-station addressing
information as well.
3. Transparent data transfer: It provides transparent transfer of LLC PDUs, or of equivalent
information in the Ethernet sublayer.
4. Protection: MAC sublayer function is to protect the data against errors, generally by means
of generating and checking frame check sequences.
5. Access control: Control of access to the physical transmission medium, protecting it from
unauthorized medium access.
One of the most commonly used MAC schemes for wired networks is Carrier Sense Multiple
Access with Collision Detection (CSMA/CD). In this scheme, a sender senses the medium (a wire
or coaxial cable) before transmitting data, to check whether the medium is free. If the medium is
busy, the sender waits until it is free. When the medium becomes free, the sender starts transmitting
data and continues to listen to the medium. If the sender detects a collision while sending data, it
stops at once and sends a jamming signal. But this scheme does not work well with wireless
networks. Some of the problems that occur when it is used to transfer data over wireless networks
are as follows:
 Signal strength decreases proportional to the square of the distance
 The sender would apply Carrier Sense (CS) and Collision Detection (CD), but the collisions
happen at the receiver
 It might be a case that a sender cannot “hear” the collision, i.e., CD does not work
 Furthermore, CS might not work if, e.g., the terminals are “hidden”.
Medium access control comprises all mechanisms that regulate user access to a medium using
SDM, TDM, FDM, or CDM.
• MAC is thus similar to traffic regulations in the highway/multiplexing example.
• MAC belongs to layer 2, the data link control layer (DLC).
• Layer 2 is subdivided into the logical link control (LLC), layer 2b, and the MAC, layer 2a.
• The task of DLC is to establish a reliable point to point or point to multi-point connection between
different devices over a wired or wireless medium.

Motivation for a specialized MAC


• Example CSMA/CD
– Carrier Sense Multiple Access with Collision Detection.
– send as soon as the medium is free, listen into the medium if a collision occurs (original method in
IEEE 802.3).
• Problems in wireless networks
– signal strength decreases proportional to the square of the distance.
– the sender would apply CS and CD, but the collisions happen at the receiver.
– it might be the case that a sender cannot “hear” the collision, i.e., CD does not work.
– furthermore, CS might not work if, e.g., a terminal is “hidden”.

Fig. Near and far terminals
b) Explain GMSK modulation and give its types.
Ans:
The Gaussian Minimum Shift Keying (GMSK) modulation is a modified version of the Minimum
Shift Keying (MSK) modulation where the phase is further filtered through a Gaussian filter to
smooth the transitions from one point to the next in the constellation. Next figure presents the
GMSK generation scheme:

Fig. GMSK generation scheme

where the Gaussian filter adopts the following form in the time domain:

g(t) = λ · exp( −t² / (2δ²Tc²) ),  with δ = √(ln 2) / (2πBTc)

where λ is a normalization constant to maintain the power and the product BTc is the -3 dB band-
width-symbol time product. The higher this value, the cleaner the eye diagram of the signal, but
more power is transmitted in the side lobes of the spectrum. A typical value in communication
applications is BTc = 0.3, which is a good compromise between spectral efficiency and inter-
symbol interference.
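A minimal Python sketch of this pulse shape (the samples-per-symbol and filter-span values are assumptions; only BTc = 0.3 is taken from the text) that computes the Gaussian filter taps:

import numpy as np

BTc = 0.3        # -3 dB bandwidth-symbol time product (typical value above)
sps = 8          # samples per symbol (assumed)
span = 4         # filter length in symbol periods (assumed)

# delta = sqrt(ln 2) / (2*pi*B*Tc); g(t) = lambda * exp(-t^2 / (2*delta^2*Tc^2))
delta = np.sqrt(np.log(2)) / (2 * np.pi * BTc)
t = np.arange(-span * sps / 2, span * sps / 2 + 1) / sps   # time in units of Tc
g = np.exp(-t**2 / (2 * delta**2))
g /= g.sum()     # one common choice of the normalization constant lambda
print(np.round(g, 4))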

Q.4 a) Explain Mobile station and SIM.


Ans:
Mobile Station: The MS includes radio equipment and the man machine interface (MMI) that a
subscriber needs in order to access the services provided by the GSM PLMN. MSs can be installed
in vehicles or can be portable or handheld stations. The MS may include provisions for data
communication as well as voice. A mobile transmits and receives messages to and from the GSM
system over the air interface to establish and continue connections through the system.
Functions of MS
The primary functions of MS are to transmit and receive voice and data over the air interface of the
GSM system. MS performs the signal processing functions of digitizing, encoding, error protecting,
encrypting, and modulating the transmitted signals. It also performs the inverse functions on the
received signals from the BS.
These functions include the following:
 Voice and data transmission;
 Frequency and time synchronization;
 Monitoring of power and signal quality of the surrounding cells for optimum handover;
 Provision of location updates;
 Equalization of multipath distortions;

SIM:
A subscriber identity module or subscriber identification module (SIM), widely known as a SIM
card, is an integrated circuit that is intended to securely store the international mobile subscriber
identity (IMSI) number and its related key, which are used to identify and authenticate subscribers
on mobile telephony devices (such as mobile phones and computers). It is also possible to store
contact information on many SIM cards. SIM cards are always used on GSM phones; for CDMA
phones, they are only needed for newer LTE-capable handsets. SIM cards can also be used in
satellite phones, smart watches, computers, or cameras.
The SIM circuit is part of the function of a universal integrated circuit card (UICC) physical smart
card, which is usually made of PVC with embedded contacts and semiconductors. SIM cards are
transferable between different mobile devices. The first UICC smart cards were the size of credit
and bank cards; sizes were reduced several times over the years, usually keeping electrical contacts
the same, so that a larger card could be cut down to a smaller size.
A SIM card contains its unique serial number (ICCID), international mobile subscriber identity
(IMSI) number, security authentication and ciphering information, temporary information related to
the local network, a list of the services the user has access to, and two passwords: a personal
identification number (PIN) for ordinary use, and a personal unblocking code (PUC) for PIN
unlocking.
The SIM provides personal mobility so that the user can have access to all subscribed services irre-
spective of both the location of the terminal and the use of a specific terminal. By inserting the SIM
card into another GSM cellular phone, the user can receive calls at that phone, make calls from that
phone, and receive the other subscribed services.

b) Short note on HLR & VLR.


Ans:
a) HLR (Home location register):- The HLR is a data base that permanently stores data related to
a given set of subscribers. The HLR is the reference database for subscriber parameters. Various
identification numbers and addresses as well as authentication parameters, services subscribed, and
special routing information are stored. Current subscriber status, including a subscriber’s temporary
roaming number and associated VLR if the mobile is roaming, is maintained. The HLR provides
data needed to route calls to all MS-SIMs home based in its MSC area, even when they are roaming
out of area or in other GSM networks. The HLR provides the current location data needed to
support searching for and paging the MS-SIM for incoming calls, wherever the MS-SIM may be.
The HLR is responsible for storage and provision of SIM authentication and encryption parameters
needed by the MSC where the MS-SIM is operating. It obtains these parameters from the AUC. The
HLR maintains records of which supplementary services each user has subscribed to and provides
permission control in granting access to these services.
The HLR stores the identifications of SMS gateways that have messages for the subscriber
under the SMS until they can be transmitted to the subscriber and receipt is acknowledged. The
HLR provides receipt and forwarding to the billing center of charging information for its home
subscribers, even when that information comes from other PLMNs while the home subscribers are
roaming. Based on the above functions, different types of data are stored in HLR.
Some data are permanent; that is, they are modified only for administrative reasons, while
others are temporary and modified automatically by other network entities depending on the
movements and actions performed by the subscriber. Some data are mandatory, other data are
optional. Both the HLR and the VLR can be implemented in the same equipment in an MSC
(collocated). A PLMN may contain one or several HLRs. The permanent data stored in an HLR
includes the following.
 IMSI: It identifies unambiguously the MS in the whole GSM system;
 International MS ISDN number: It is the directory number of the mobile station;
 MS category specifies whether a MS is a pay phone or not;
 Roaming restriction (allowed or not);
 Closed user group (CUG) membership data;
 Supplementary services related parameters: Forwarded-to number, registration status, no
reply condition timer, call barring password, activation status, supplementary services check
flag;
 Authentication key, which is used in the security procedure and especially to authenticate
the declared identity of a MS.

The temporary data consists of the following.


 LMSI (Local MS identity);
 RAND/SRES and Kc: data related to authentication and ciphering;
 MSRN;
 VLR address, which identifies the VLR currently handling the MS;
 MSC address, which identifies the MSC area where the MS is registered;
 Roaming restriction;
 Messages waiting data (used for SMS);

The permanent data associated with the mobile are those that do not change as it moves
from one area to another. On the other hand, temporary data changes from call to call. The HLR
interacts with MSCs mainly for the procedures of interrogation for routing calls to a MS and to
transfer charging information after call termination. Location registration is performed by HLR.
When the subscriber changes the VLR area, the HLR is informed about the address of the actual
VLR. The HLR updates the new VLR with all relevant subscriber data. Similarly, location
cancellation is done by the HLR: after the subscriber roams to a different VLR area, the HLR
instructs the old VLR to delete the subscriber's record. Supplementary services are add-ons to the basic
service. These parameters need not all be stored in the HLR. However, it is safer to store all
subscription parameters in the HLR even when some are stored in a subscriber card. The data stored
in the HLR is changed only by MMI action when new subscribers are added, old subscribers are
deleted, or the specific services to which they subscribe are changed and not dynamically updated
by the system.

b) VLR (Visitor location register):- A MS roaming in an MSC area is controlled by the VLR
responsible for that area. When a MS appears in a LA, it starts a registration procedure. The MSC
for that area notices this registration and transfers to the VLR the identity of the LA where the MS
is situated. A VLR may be in charge of one or several MSC LAs. The VLR constitutes the database
that supports the MSC in the storage and retrieval of the data of subscribers present in its area.
When an MS enters the MSC area borders, it signals its arrival to the MSC that stores its identity in
the VLR. The information necessary to manage the MS is contained in the HLR and is transferred
to the VLR so that they can be easily retrieved if so required.
The location registration procedure allows the subscriber data to follow the movements of the MS.
For such reasons the data contained in the VLR and in the HLR are more or less the same.
Nevertheless, the data are present in the VLR only as long as the MS is registered in the area related
to that VLR. The terms permanent and temporary, in this case, are meaningful only during that time
interval when the mobile is in the area of the local MSC/VLR combination. The data contained in the
VLR can be compared with the subscriber-related data contained in a normal fixed exchange; the
location information can be compared with the line equipment reference attached to each fixed
subscriber connected to that exchange. The VLR is responsible for assigning a new TMSI number
to the subscriber. It also relays the ciphering key from HLR to BSS.
Cells in the PLMN are grouped into geographic areas, and each is assigned a LAI, as shown in
Figure 2.2(c). Each VLR controls a certain set of LAs. When a mobile subscriber roams from one
LA to another, their current location is automatically updated in their VLR. If the old and new LAs
are under the control of two different VLRs, the entry on the old VLR is deleted and an entry is
created in the new VLR by copying the basic data from the HLR. The subscriber's current VLR
address, stored at the HLR, is also updated. This provides the information necessary to complete
calls to roaming mobiles. The VLR supports a mobile paging and tracking subsystem in the local
area where the mobile is presently roaming. The detailed functions of VLR are as follows.
 Works with the HLR and AUC on authentication;
 Relays the cipher key from HLR to BSS for encryption/decryption;
 Controls allocation of new TMSI numbers; a subscriber's TMSI number can be periodically
changed to secure a subscriber's identity;
 Supports paging;
 Tracks state of all MSs in its area.

Data Stored in VLR


The VLR constitutes the database that supports the MSC in the storage and retrieval of the data of
subscribers present in its area. When an MS enters the MSC area borders, it signals its arrival to the
MSC that stores its identity in the VLR. The information necessary to manage the MS in whichever
type of call it may attempt is contained in the HLR and is transferred to the VLR so that they can be
easily retrieved if so required (location registration). This procedure allows the subscriber data to
follow the movements of the MS [2,3,7,8]. For such reasons the data contained in the VLR and in
the HLR are more or less the same. Nevertheless the data are present in the VLR only as long as the
MS is registered in the area related to that VLR. Data associated with the movement of mobile are
IMSI, MSISDN, MSRN, and TMSI. The terms permanent and temporary, in this case, are
meaningful only during that time interval. Some data are mandatory, others are optional. Data
stored
in VLR are as follows.
 The IMSI;
 The MSISDN;
 The MSRN, which is allocated to the MS either when the station is registered in an MSC
area or on a per-call basis and is used to route the incoming calls to that station;
 The TMSI;
 The LA where the MS has been registered, which will be used to call the station;
 Supplementary service parameters;
 MS category;
 Authentication key, query and response obtained from AUC;
 ID of the current MSC.

Q.5 a) GSM Architecture in detail.


Ans:
GSM Architecture

MS-Mobile station BSS- Base station Subsystem BTS-Base Transceiver Station


BSC- Base Station Controller NSS- Network subsystem OSS-operation support system
HLR -Home Location Register VLR-Visitor Location Register AUC-Authentication center
MSC- Mobile switching center EIR- Equipment Identity Register

Mobile station: These are the users. A number of users are controlled by one BTS.
1. The mobile stations (MS) communicate with the base station subsystem over the radio interface.
2. The BSS, also called the radio subsystem, provides and manages the radio transmission path between the
mobile stations and the Mobile Switching Centre (MSC). It also manages the radio interface between the
mobile stations and other subsystems of GSM.
3. Each BSS comprises many Base Station Controllers (BSC) that connect the mobile station to the network
and switching subsystem (NSS) through the mobile switching center.
4. The NSS controls the switching functions of the GSM system. It allows the mobile switching center to
communicate with networks like PSTN, ISDN, CSPDN, PSPDN and other data networks.
5. The operation support system (OSS) allows the operation and maintenance of the GSM system. It allows
the system engineers to diagnose, troubleshoot and observe the parameters of the GSM systems. The OSS
subsystem interacts with the other subsystems and is provided for the GSM operating company staff that
provides service facilities for the network.
Base station subsystem (BSS): The base station subsystem comprises two parts:
1. Base Transceiver Station (BTS).
2. Base Station Controller (BSC).
The BSS consists of many BSCs that connect to a single MSC. Each BSC controls up to several
hundred BTSs.
Base Transceiver Station (BTS)
The BTS has radio transceivers that define a cell and is capable of handling radio link protocols with
the MS.
Functions of the BTS are:
1. Handling radio link protocols.
2. Providing full-duplex communication to the MS.
3. Interleaving and de-interleaving.
Base Station Controller (BSC): It manages radio resources for one or more BTSs. It controls several
hundred BTSs, all connected to a single MSC.
Functions of the BSC are:
• To control the BTSs.
• Radio resource management.
• Handoff management and control.
• Radio channel setup and frequency hopping.
Network subsystem( NSS)
1. It handles the switching of GSM calls between external networks and the BSCs.
2. It includes three different databases for mobility management:
A. HLR (Home Location Register)
B. VLR (Visitor Location Register)
C. AUC (Authentication Center)
Mobile switching center (MSC):
It connects to fixed networks like ISDN, PSTN, etc.
Following are the functions of the MSC:
1. Call setup, supervision and release
2. Collection of billing information
3. Call handling / routing
4. Management of signaling protocols
5. Keeping records of the VLR and HLR
HLR (Home Location Register): The call roaming and call routing capabilities of GSM are
handled here. It stores all the administrative information of subscribers registered in the network. It
maintains the unique international mobile subscriber identity (IMSI).
VLR (Visitor Location Register): It is a temporary database. It stores the IMSI number and
customer information for each roaming customer visiting a specific MSC.
Authentication center: It is a protected database. It maintains authentication keys and algorithms.
Associated with it is a register called the Equipment Identity Register.
Operation subsystem (OSS): It manages all mobile equipment in the system:
1) Management of charging and billing procedures
2) Maintenance of all hardware and network operations
AuC:
The AuC database holds different algorithms that are used for authentication and encryptions of the
mobile subscribers that verify the mobile user’s identity and ensure the confidentiality of each call.
The AuC holds the authentication and encryption keys for all the subscribers in both the home and
visitor location register.
EIR:
The EIR is another database that keeps information about the identity of mobile equipment, such
as the International Mobile Equipment Identity (IMEI), which reveals the details about the manufacturer,
country of production, and device type. This information is used to prevent calls from being
misused, to prevent unauthorized or defective MSs, to report stolen mobile phones or check if the
mobile phone is operating according to the specification of its type.

b) Explain GSM identifiers.


Ans:

GSM treats the users and the equipment in different ways. Phone numbers, subscribers, and
equipment identifiers are some of the known ones. There are many other identifiers that have been
well-defined, which are required for the subscriber’s mobility management and for addressing the
remaining network elements. Vital addresses and identifiers that are used in GSM are addressed
below.
International Mobile Station Equipment Identity (IMEI)
The International Mobile Station Equipment Identity (IMEI) looks more like a serial number which
distinctively identifies a mobile station internationally. This is allocated by the equipment
manufacturer and registered by the network operator, who stores it in the Equipment Identity
Register (EIR). By means of IMEI, one recognizes obsolete, stolen, or non-functional equipment.
Following are the parts of the IMEI (15 decimal digits in total):
 Type Approval Code (TAC) : 6 decimal places, centrally assigned.
 Final Assembly Code (FAC) : 2 decimal places, assigned by the manufacturer.
 Serial Number (SNR) : 6 decimal places, assigned by the manufacturer.
 Spare (SP) : 1 decimal place.
Thus, IMEI = TAC + FAC + SNR + SP. It uniquely characterizes a mobile station and gives clues
about the manufacturer and the date of manufacturing.
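A tiny Python sketch (the digit string is made up, not a real IMEI) splitting an IMEI into the fields above:

def parse_imei(imei: str) -> dict:
    # TAC (6) + FAC (2) + SNR (6) + SP (1) = 15 digits
    assert len(imei) == 15 and imei.isdigit()
    return {"TAC": imei[:6], "FAC": imei[6:8], "SNR": imei[8:14], "SP": imei[14]}

print(parse_imei("123456010123456"))   # made-up example digit string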

International Mobile Subscriber Identity (IMSI)


Every registered user has a unique International Mobile Subscriber Identity (IMSI) stored in their
Subscriber Identity Module (SIM).
IMSI comprises of the following parts:
 Mobile Country Code (MCC) : 3 decimal places, internationally standardized.
 Mobile Network Code (MNC) : 2 decimal places, for unique identification of mobile
network within the country.
 Mobile Subscriber Identification Number (MSIN) : Maximum 10 decimal places,
identification number of the subscriber in the home mobile network.
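Similarly, a small Python sketch (the digits are made up; a 2-digit MNC is assumed, as in the list above) splitting an IMSI into its parts:

def parse_imsi(imsi: str, mnc_len: int = 2) -> dict:
    # MCC (3) + MNC (2, per the list above) + MSIN (up to 10 digits)
    return {"MCC": imsi[:3], "MNC": imsi[3:3 + mnc_len], "MSIN": imsi[3 + mnc_len:]}

print(parse_imsi("404450123456789"))   # made-up example; MCC 404 is India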

Mobile Subscriber ISDN Number (MSISDN)


The real telephone number of a mobile station is the Mobile Subscriber ISDN Number
(MSISDN). Depending on the SIM, a mobile station can have several MSISDNs, as separate
MSISDNs can be assigned to the same SIM (e.g., for voice, fax and data services).
Listed below is the structure followed by MSISDN categories, as they are defined based on
international ISDN number plan:
 Country Code (CC) : Up to 3 decimal places.
 National Destination Code (NDC) : Typically 2-3 decimal places.
 Subscriber Number (SN) : Maximum 10 decimal places.

Mobile Station Roaming Number (MSRN)


Mobile Station Roaming Number (MSRN) is an interim, location-dependent ISDN number assigned
to a mobile station by the regionally responsible Visitor Location Register (VLR). Using the
MSRN, incoming calls are channelled to the MS.
The MSRN has the same structure as the MSISDN.
 Country Code (CC) : of the visited network.
 National Destination Code (NDC) : of the visited network.
 Subscriber Number (SN) : in the current mobile network.

Location Area Identity (LAI)


Within a PLMN, each Location Area has its own unique Location Area Identity (LAI). The
LAI hierarchy is based on international standard and structured in a unique format as mentioned
below:
 Country Code (CC) : 3 decimal places.
 Mobile Network Code (MNC) : 2 decimal places.
 Location Area Code (LAC) : maximum 5 decimal places or maximum twice 8 bits coded
in hexadecimal (LAC < FFFF).

Temporary Mobile Subscriber Identity (TMSI)


Temporary Mobile Subscriber Identity (TMSI) can be assigned by the VLR, which is responsible
for the current location of a subscriber. The TMSI needs to have only local significance in the area
handled by the VLR. This is stored on the network side only in the VLR and is not passed to the
Home Location Register (HLR).
Together with the current location area, the TMSI identifies a subscriber uniquely. It can contain up
to 4 × 8 bits.

Local Mobile Subscriber Identity (LMSI)


Each mobile station can be assigned a Local Mobile Subscriber Identity (LMSI), which is a
unique key assigned by the VLR. This key can be used as an auxiliary search key for each mobile
station within its region. It can also help accelerate database access. An LMSI is assigned when the
mobile station registers with the VLR and is sent to the HLR. The LMSI comprises four octets (4 × 8
bits).

Cell Identifier (CI)


Using a Cell Identifier (CI) (maximum 2 × 8 bits), the individual cells within an LA can be
identified. Combined with the LAI, it forms the Global Cell Identity (LAI + CI), which identifies a
cell uniquely.

c) Short note on GSM burst.


Ans:
The information contained in one time slot of the TDMA frame is called a burst.
There are five types of bursts:
1) Normal Burst (NB)
2) Frequency Correction Burst (FB)
3) Synchronization Burst (SB)
4) Access Burst (AB)
5) Dummy Burst

Q.6 a) Explain the block diagram of GPRS.


Ans:
The GPRS architecture works on the same principles as the GSM network, but has additional entities
that allow packet data transmission. This data network overlays the second-generation GSM network,
providing packet data transport at rates from 9.6 to 171 kbps. Along with the packet data trans-
port, the GSM network accommodates multiple users sharing the same air interface resources con-
currently.

Following is the GPRS Architecture diagram:

GPRS attempts to reuse the existing GSM network elements as much as possible, but to effectively
build a packet-based mobile cellular network, some new network elements, interfaces, and proto-
cols for handling packet traffic are required.

Therefore, GPRS requires modifications to numerous GSM network elements as summarized be-
low:
GPRS Mobile Stations
New Mobile Stations (MS) are required to use GPRS services because existing GSM phones do
not handle the enhanced air interface or packet data. A variety of MS can exist, including a high-
speed version of current phones to support high-speed data access, a new PDA device with an
embedded GSM phone, and PC cards for laptop computers. These mobile stations are backward
compatible for making voice calls using GSM.

GPRS Base Station Subsystem


Each BSC requires the installation of one or more Packet Control Units (PCUs) and a software
upgrade. The PCU provides a physical and logical data interface to the Base Station Subsystem
(BSS) for packet data traffic. The BTS can also require a software upgrade but typically does not
require hardware enhancements.

When either voice or data traffic is originated at the subscriber mobile, it is transported over the air
interface to the BTS, and from the BTS to the BSC in the same way as a standard GSM call.
However, at the output of the BSC, the traffic is separated; voice is sent to the Mobile Switching
Center (MSC) per standard GSM, and data is sent to a new device called the SGSN via the PCU
over a Frame Relay interface.
GPRS Support Nodes
The following two new components, collectively called GPRS Support Nodes (GSNs), are added: the
Gateway GPRS Support Node (GGSN) and the Serving GPRS Support Node (SGSN).

Gateway GPRS Support Node (GGSN)


The Gateway GPRS Support Node acts as an interface and a router to external networks. It
contains routing information for GPRS mobiles, which is used to tunnel packets through the IP
based internal backbone to the correct Serving GPRS Support Node. The GGSN also collects
charging information connected to the use of the external data networks and can act as a packet
filter for incoming traffic.

Serving GPRS Support Node (SGSN)


The Serving GPRS Support Node is responsible for authentication of GPRS mobiles, registration
of mobiles in the network, mobility management, and collecting information on charging for the
use of the air interface.

Internal Backbone
The internal backbone is an IP based network used to carry packets between different GSNs.
Tunnelling is used between SGSNs and GGSNs, so the internal backbone does not need any
information about domains outside the GPRS network. Signalling from a GSN to a MSC, HLR or
EIR is done using SS7.

Routing Area
GPRS introduces the concept of a Routing Area. This concept is similar to Location Area in GSM,
except that it generally contains fewer cells. Because routing areas are smaller than location areas,
fewer radio resources are used while broadcasting a page message.

b) Describe in detail the logical channels of GSM.


Ans:
Traffic Channels (TCH)
A traffic channel (TCH) is used to carry speech and data traffic. Traffic channels are defined using
a 26-frame multiframe, i.e., a group of 26 TDMA frames. The length of a 26-frame multiframe is
120 ms. Out of the 26 frames, 24 are used for traffic, 1 is used for the slow associated control
channel (SACCH) and 1 is currently unused.

Full Rate & Half Rate TCH


They can be defined as full-rate TCHs (TCH/F, 22.8 kbps) and half-rate TCHs (TCH/H, 11.4
kbps). Half-rate TCHs effectively double the capacity of a system by making it possible to
transmit two calls in a single channel. If a TCH/F is used for data communications, the usable
data rate drops to 9.6 kbps (in TCH/H: max. 4.8 kbps) due to the enhanced security algorithms.
Eighth-rate TCHs are also specified, and are used for signaling. In the GSM Recommendations,
they are called stand-alone dedicated control channels (SDCCH)
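A short Python illustration of the capacity statement above (the 8 timeslots per carrier come from the standard GSM TDMA frame):

SLOTS_PER_CARRIER = 8                  # timeslots in one GSM TDMA frame
full_rate = SLOTS_PER_CARRIER * 1      # TCH/F: one call per slot (22.8 kbps)
half_rate = SLOTS_PER_CARRIER * 2      # TCH/H: two calls per slot (11.4 kbps)
print(full_rate, half_rate)            # 8 vs 16 simultaneous calls per carrier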

Signaling channels
The signaling channels on the air interface are used for call establishment, paging, call maintenance,
synchronization, etc.
There are three type of signaling channels
1. Broadcast Channels
2. Common Control Channels
3. Dedicated Control Channel
Broadcast Channels (BCH)
Carry only downlink information and are responsible mainly for synchronization and frequency
correction. This is the only channel type enabling point-to-multipoint communications in which
short messages are simultaneously transmitted to several mobiles
BCH Characteristics
• Each cell has a designated BCH carrier
• All BCH timeslots transmit continuously on full power
• TS 0 contains logical control channels
• TS1-7 optionally carries traffic
• BCCH blocks occur once in each 51-frame multiframe
• Each block comprises 4 frames carrying 1 message
The BCHs include the following channels;
1. Broadcast Control Channel (BCCH): General information, cell specific (local area code
(LAC), network operator, access parameters, list of neighboring cells, etc). The MS receives
signals via the BCCH from many BTSs within the same network and/or different networks.
2. Frequency Correction Channel (FCCH): Downlink only; correction of MS frequencies;
transmission of frequency standard to MS; it is also used for synchronization of an
acquisition by providing the boundaries between timeslots and the position of the first
timeslot of a TDMA frame.
3. Synchronization Channel (SCH): Downlink only; frame synchronization (TDMA frame
number) and identification of base station. The valid reception of one SCH burst will
provide the MS with all the information needed to synchronize with a BTS
Common Control Channels (CCCH)
A group of uplink and downlink channels between the MS and the BTS. These channels are used to
convey information from the network to MSs and provide access to the network. The CCCHs
include the following channels;
1. Paging Channel (PCH): Downlink only; the MS is informed by the BTS for incoming calls via
the PCH
2. Access Grant Channel (AGCH): Downlink only, BTS allocates a TCH or SDCCH to the MS,
thus allowing the MS access to the network.
3. Random Access Channel (RACH): Uplink only, allows the MS to request an SDCCH in
response to a page or due to a call; the MS chooses a random time to send on this channel. This
creates a possibility of collisions with transmissions from other MSs
Dedicated Control Channels (DCCH)
Responsible for roaming, handovers, encryption, etc. The DCCHs include the following channels;
1. Stand-alone Dedicated Control Channel (SDCCH); Communications channel between
MS and the BTS; signaling during call setup before a traffic channel (TCH) is allocated
2. Slow Associated Control Channel (SACCH); Transmits continuous measurement reports
in parallel to operation of a TCH or SDCCH
3. Fast Associated Control Channel (FACCH); Similar to the SDCCH, but used in parallel
to operation of the TCH; if the data rate of the SACCH is insufficient, “borrowing mode” is
used: Additional bandwidth is borrowed from the TCH; this happens for messages
associated with call establishment authentication of the subscriber, handover decisions, etc.

c) Write a short note on GSM frame.


Ans:
GSM data structure is split into slots, frames, multiframes, superframes and hyperframes to give the
required structure and timing to the transmitted data.
The data frames and slots within 2G GSM are organised in a logical manner so that the system
understands when particular types of data are to be transmitted.
Having the GSM frame structure enables the data to be organised in a logical fashion so that the
system is able to handle the data correctly. This includes not only the voice data, but also the
important signalling information as well.
The GSM frame structure provides the basis for the various physical channels used within GSM,
and accordingly it is at the heart of the overall system.

GSM frame structure - the basics


The basic element in the GSM frame structure is the frame itself. This comprises the eight slots,
each used for different users within the TDMA system. As mentioned in another page of the
tutorial, the slots for transmission and reception for a given mobile are offset in time so that the
mobile does not transmit and receive at the same time.

The basic GSM frame


The basic GSM frame defines the structure upon which all the timing and structure of the GSM messaging and signalling
is based. The fundamental unit of time is called a burst period and it lasts for approximately 0.577
ms (15/26 ms). Eight of these burst periods are grouped into what is known as a TDMA frame. This
lasts for approximately 4.615 ms (i.e.120/26 ms) and it forms the basic unit for the definition of
logical channels. One physical channel is one burst period allocated in each TDMA frame.
In simplified terms the base station transmits two types of channel, namely traffic and control.
Accordingly the channel structure is organised into two different types of frame, one for the traffic
on the main traffic carrier frequency, and the other for the control on the beacon frequency.

GSM multiframe
The GSM frames are grouped together to form multiframes and in this way it is possible to establish
a time schedule for their operation and the network can be synchronised.

There are several GSM multiframe structures:


 Traffic multiframe: The Traffic Channel frames are organised into multiframes consisting
of 26 bursts and taking 120 ms. In a traffic multiframe, 24 bursts are used for traffic. These
are numbered 0 to 11 and 13 to 24. One of the remaining bursts is used to accommodate the
SACCH, and the other remains free. The position actually used alternates between positions 12
and 25.
 Control multiframe: the Control Channel multiframe that comprises 51 bursts and occupies
235.4 ms. This always occurs on the beacon frequency in time slot zero and it may also
occur within slots 2, 4 and 6 of the beacon frequency as well. This multiframe is subdivided
into logical channels which are time-scheduled. These logical channels and functions
include the following:
 Frequency correction burst
 Synchronisation burst
 Broadcast channel (BCH)
 Paging and Access Grant Channel (PACCH)
 Stand Alone Dedicated Control Channel (SDCCH)
GSM Superframe
Multiframes are then constructed into superframes taking 6.12 seconds. These consist of 51 traffic
multiframes or 26 control multiframes. As the traffic multiframes are 26 bursts long and the control
multiframes are 51 bursts long, the different numbers of traffic and control multiframes within the
superframe bring them back into line, both taking exactly the same interval (51 × 26 = 26 × 51).

GSM Hyperframe
Above this, 2048 superframes (i.e., 2¹¹) are grouped to form one hyperframe, which
repeats every 3 hours 28 minutes 53.76 seconds. It is the largest time interval within the GSM frame
structure.
Within the GSM hyperframe there is a counter and every time slot has a unique sequential number
comprising the frame number and time slot number. This is used to maintain synchronisation of the
different scheduled operations with the GSM frame structure. These include functions such as:
 Frequency hopping: Frequency hopping is a feature that is optional within the GSM
system. It can help reduce interference and fading issues, but for it to work, the transmitter
and receiver must be synchronised so they hop to the same frequencies at the same time.
 Encryption: The encryption process is synchronised over the GSM hyperframe period
where a counter is used and the encryption process will repeat with each hyperframe.
However, it is unlikely that the cellphone conversation will be over 3 hours and accordingly
it is unlikely that security will be compromised as a result.
The slots and frames are handled in a very logical manner to enable the system to expect and accept
the data that needs to be sent. Organising it in this logical fashion enables it to be handled in the
most efficient manner
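The timing relationships above can be checked with a short Python calculation (all the constants are the values quoted in this answer):

from fractions import Fraction

burst = Fraction(15, 26)            # burst period in ms
frame = 8 * burst                   # 8 slots = 120/26 ms, about 4.615 ms
traffic_mf = 26 * frame             # 26 frames = 120 ms
control_mf = 51 * frame             # 51 frames, about 235.4 ms
superframe = 51 * traffic_mf        # = 26 * control_mf = 6120 ms = 6.12 s
hyperframe = 2048 * superframe      # largest interval in the frame structure
print(float(frame), float(traffic_mf), float(control_mf), float(superframe))
print(float(hyperframe) / 1000)     # 12533.76 s = 3 h 28 min 53.76 s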

Q.7 a) Explain block diagram of UMTS in detail.


Ans:

UMTS/3G

The Universal Mobile Telecommunications System (UMTS) is the third generation (3G) successor to the
second generation GSM-based cellular technologies, which also include GPRS and EDGE.
Although UMTS uses a totally different air interface, the core network elements had been
migrating towards the UMTS requirements with the introduction of GPRS and EDGE. In this way
the transition from GSM to the 3G UMTS architecture did not require a large instantaneous
investment.

UMTS uses Wideband CDMA (WCDMA / W-CDMA) to carry the radio transmissions, and often
the system is referred to by the name WCDMA. It is also sometimes known by a third name, 3GSM,
reflecting its GSM heritage.

UMTS Specifications and Management

In order to create and manage a system as complicated as UMTS or WCDMA, it is necessary to
develop and maintain a large number of documents and specifications. For UMTS or WCDMA,
these are now managed by a group known as 3GPP - the Third Generation Partnership Programme.
This is a global co-operation between six organizational partners - ARIB, CCSA, ETSI, ATIS, TTA
and TTC.
The scope of 3GPP was to produce globally applicable Technical Specifications and Technical
Reports for a 3rd Generation Mobile Telecommunications System. This would be based upon the
GSM core networks and the radio access technologies that they support (i.e., Universal Terrestrial
Radio Access (UTRA) both Frequency Division Duplex (FDD) and Time Division Duplex (TDD)
modes).

Since it was originally formed, 3GPP has also taken over responsibility for the GSM standards as
well as looking at future developments including LTE (Long Term Evolution) and the 4G
technology known as LTE Advanced.

3G UMTS / WCDMA technologies

There are several key areas of 3G UMTS / WCDMA. Within these there are several key
technologies that have been employed to enable UMTS / WCDMA to provide a leap in
performance over its 2G predecessors.

Some of these key areas include:

 Radio interface: The UMTS radio interface provides the basic definition of the radio
signal. W-CDMA occupies 5 MHz channels and has defined formats for elements such as
synchronization, power control and the like.
 CDMA technology: 3G UMTS relies on a scheme known as CDMA, or code division
multiple access, to enable multiple handsets or items of user equipment to have access to the
base station. Using a scheme known as direct sequence spread spectrum, different UEs have
different codes and can all talk to the base station even though they are all on the same
frequency (see the sketch after this list).
 UMTS network architecture: The architecture for a UMTS network was designed to enable
packet data to be carried over the network, whilst still enabling it to support circuit
switched voice. All the usual functions enabling access to the network, roaming and the
like are also supported.
 UMTS modulation schemes: Within the CDMA signal format, a variety of forms of
modulation are used. These are typically forms of phase shift keying.
 UMTS channels: As with any cellular system, different data channels are required for
passing payload data as well as control information and for enabling the required resources
to be allocated. A variety of different data channels are used to enable these facilities to be
accomplished
 UMTS TDD: There are two methods of providing duplex for 3G UMTS. One is what is
termed frequency division duplex, FDD. This uses two channels spaced sufficiently apart
so that the receiver can receive whilst the transmitter is also operating. Another method is
to use time division duplex, TDD, where short time blocks are allocated to transmissions in
both directions. Using this method, only a single channel is required.
 Handover: One key area of any cellular telecommunications system is the handover
(handoff) from one cell to the next. Using CDMA there are several forms of handover that
are implemented within the system.
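A toy Python sketch of the direct-sequence idea mentioned in the CDMA technology bullet above (the codes, bits and code length are illustrative; real WCDMA uses much longer spreading codes):

import numpy as np

c1 = np.array([1, 1, 1, 1])     # Walsh spreading code for UE 1
c2 = np.array([1, -1, 1, -1])   # Walsh spreading code for UE 2 (orthogonal to c1)
b1, b2 = 1, -1                  # one data bit per user, mapped to +/-1
rx = b1 * c1 + b2 * c2          # both transmissions share the same frequency
print(rx @ c1 / len(c1))        # ->  1.0, UE 1's bit recovered by correlation
print(rx @ c2 / len(c2))        # -> -1.0, UE 2's bit recovered by correlation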

UMTS network constituents


The UMTS network architecture can be divided into three main elements:
 User Equipment (UE): The User Equipment or UE is the name given to what was previously
termed the mobile, or cell phone. The new name was chosen because of the considerably
greater functionality that the UE could have. It can be anything from a mobile phone
used for talking to a data terminal attached to a computer with no voice capability.
 Radio Network Subsystem (RNS): The RNS, also known as the UMTS Radio Access
Network (UTRAN), is the equivalent of the previous Base Station Subsystem or BSS in
GSM. It provides and manages the air interface for the overall network.
 Core Network: The core network provides all the central processing and management for the
system. It is the equivalent of the GSM Network Switching Subsystem or NSS.
The core network is then the overall entity that interfaces to external networks including the public
phone network and other cellular telecommunications networks.

UMTS Network Architecture Overview


User Equipment, UE
The USER Equipment or UE is a major element of the overall 3G UMTS network architecture. It
forms the final interface with the user. In view of the far greater number of applications and
facilities that it can perform, the decision was made to call it a user equipment rather than a mobile.
However it is essentially the handset (in the broadest terminology), although having access to much
higher speed data communications, it can be much more versatile, containing many more
applications. It consists of a variety of different elements including RF circuitry, processing,
antenna, battery, etc.
3G UMTS Radio Network Subsystem
This is the section of the 3G UMTS / WCDMA network that interfaces to both the UE and the core
network. The overall radio access network, i.e., collectively all the Radio Network Subsystems, is
known as the UTRAN (UMTS Radio Access Network).
3G UMTS Core Network
The 3G UMTS core network architecture is a migration of that used for GSM with further elements
overlaid to enable the additional functionality demanded by UMTS.
In view of the different ways in which data may be carried, the UMTS core network may be split
into two different areas:
Circuit switched elements: These elements are primarily based on the GSM network entities and
carry data in a circuit switched manner, i.e. a permanent channel for the duration of the call.
Packet switched elements: These network entities are designed to carry packet data. This enables
much higher network usage as the capacity can be shared and data is carried as packets which
are routed according to their destination.
Some network elements, particularly those that are associated with registration are shared by both
domains and operate in the same way that they did with GSM.

UMTS Core Network


Circuit switched elements
The circuit switched elements of the UMTS core network architecture include the following
network entities:
 Mobile switching centre (MSC): This is essentially the same as that within GSM, and it
manages the circuit switched calls under way.
 Gateway MSC (GMSC): This is effectively the interface to the external networks.
Packet switched elements
The packet switched elements of the 3G UMTS core network architecture include the following
network entities:
 Gateway GPRS Support Node (GGSN): Like the SGSN, this entity was also first
introduced into the GPRS network. The Gateway GPRS Support Node (GGSN) is the
central element within the UMTS packet switched network. It handles inter-working
between the UMTS packet switched network and external packet switched networks, and
can be considered as a very sophisticated router. In operation, when the GGSN receives data
addressed to a specific user, it checks if the user is active and then forwards the data to the
SGSN serving the particular UE.
Shared elements
The shared elements of the 3G UMTS core network architecture include the following network entities:
 Home location register (HLR): This database contains all the administrative information
about each subscriber along with their last known location. In this way, the UMTS network
is able to route calls to the relevant RNC / Node B. When a user switches on their UE, it
registers with the network and from this it is possible to determine which Node B it
communicates with so that incoming calls can be routed appropriately. Even when the UE is
not active (but switched on), it re-registers periodically to ensure that the network (HLR) is
aware of its current or last known location.
 Equipment identity register (EIR): The EIR is the entity that decides whether a given UE
equipment may be allowed onto the network. Each UE equipment has a number known as
the International Mobile Equipment Identity. This number, as mentioned above, is installed
in the equipment and is checked by the network during registration.
 Authentication centre (AuC) : The AuC is a protected database that contains the secret key
also contained in the user's USIM card.
UMTS radio access network, UTRAN
The UMTS Radio Access Network, UTRAN, or Radio Network Subsystem, RNS comprises two
main components:
 Radio Network Controller, RNC: This element of the UTRAN / radio network subsystem
controls the Node Bs that are connected to it, i.e., the radio resources in its domain. The
RNC undertakes the radio resource management and some of the mobility management
functions, although not all. It is also the point at which the data encryption / decryption is
performed to protect the user data from eavesdropping.
 Node B: Node B is the term used within UMTS to denote the base station transceiver. This
part of the UTRAN contains the transmitter and receiver to communicate with the UEs
within the cell. It participates with the RNC in the resource management. NodeB is the
3GPP term for base station, and often the terms are used interchangeably.

3G UMTS UTRAN Architecture

 Serving GPRS Support Node (SGSN): As the name implies, this entity was first developed
when GPRS was introduced, and its use has been carried over into the UMTS network
architecture. The SGSN provides a number of functions within the UMTS network
architecture.
o Mobility management: When a UE attaches to the Packet Switched domain of the
UMTS Core Network, the SGSN generates MM information based on the mobile's
current location.
o Session management: The SGSN manages the data sessions providing the required
quality of service and also managing what are termed the PDP (Packet data Protocol)
contexts, i.e. the pipes over which the data is sent.
o Interaction with other areas of the network: The SGSN is able to manage its
elements within the network only by communicating with other areas of the network,
e.g. MSC and other circuit switched areas.
o Billing: The SGSN is also responsible for billing. It achieves this by monitoring the flow
of user data across the GPRS network. CDRs (Call Detail Records) are generated by
the SGSN before being transferred to the charging entities (Charging Gateway
Function, CGF).
UTRAN interfaces
The UMTS standards are structured so that the internal functionality of the different network
elements is not defined. Instead, the interfaces between the network elements are defined, and
in this way so too is the element functionality.
There are several interfaces that are defined for the UTRAN elements:
 Iub : The Iub connects the NodeB and the RNC within the UTRAN. Standardising the
interface between the controller and the base station was revolutionary when UMTS was
launched; the aim was to stimulate competition between suppliers, for example by allowing
some manufacturers to concentrate just on base stations rather than on controllers and other
network entities.
 Iur : The Iur interface allows communication between different RNCs within the UTRAN.
The open Iur interface enables capabilities like soft handover to occur as well as helping to
stimulate competition between equipment manufacturers.
 Iu : The Iu interface connects the UTRAN to the core network.
Having standardised interfaces within various areas of the network including the UTRAN allows
network operators to select different network entities from different suppliers.
b) Write a short note on the 3GPP2 family: CDMA2000.
Ans:
The 3rd Generation Partnership Project 2 (3GPP2) is a collaboration between
telecommunications associations to make a globally applicable third generation (3G) mobile phone
system specification within the scope of the ITU's IMT-2000 project. In practice, 3GPP2 is the
standardization group for CDMA2000, the set of 3G standards based on the earlier cdmaOne 2G
CDMA technology.
3GPP2 should not be confused with 3GPP; 3GPP is the standard body behind the Universal Mobile
Telecommunications System (UMTS) that is the 3G upgrade to GSM networks, while 3GPP2 is the
standard body behind the competing 3G standard CDMA2000 that is the 3G upgrade to cdmaOne
networks used mostly in the United States (and to some extent also in Japan, China, Canada, South
Korea and India).
GSM/GPRS/EDGE/W-CDMA is the most widespread wireless standard in the world. A few
countries (such as China, the United States, Canada, Ukraine, Trinidad and Tobago, India, South
Korea and Japan) use both sets of standards, but most countries use only the GSM family.
Code-division multiple access
CDMA2000 (also known as C2K or IMT Multi-Carrier (IMT-MC)) is a family of 3G mobile
technology standards for sending voice, data, and signaling data between mobile phones and cell
sites. It is developed by 3GPP2 as a backwards-compatible successor to second-generation
cdmaOne (IS-95) set of standards and used especially in North America and South Korea.
CDMA2000 competes with UMTS, a rival set of 3G standards developed by 3GPP
and used in Europe, Japan, and China.
The name CDMA2000 denotes a family of standards that represent the successive, evolutionary
stages of the underlying technology. These are:
 Voice: CDMA2000 1xRTT, 1X Advanced
 Data: CDMA2000 1xEV-DO (Evolution-Data Optimized): Release 0, Revision A, Revision
B, Ultra Mobile Broadband (UMB)
All are approved radio interfaces for the ITU's IMT-2000. In the United States, CDMA2000 is a
registered trademark of the Telecommunications Industry Association (TIA-USA).
c) Give details of HSPA.
Ans:
HSPA - High Speed Packet Access
UMTS HSPA, High Speed Packet Access, combines HSDPA and HSUPA for uplink and
downlink to provide high speed data access.
3G HSPA, High Speed Packet Access, is the combination of two technologies, one for the
downlink and the other for the uplink, that can be built onto the existing 3G UMTS or W-
CDMA technology to provide increased data transfer speeds.
The original 3G UMTS / W-CDMA standard provided a maximum download speed of 384
kbps.
With many users requiring much higher data transfer speeds to compete with fixed line
broadband services, and to support services that need higher data rates, an increase in the
obtainable speeds became necessary. This resulted in the development of the technologies
for 3G HSPA.
3G HSPA benefits
The UMTS cellular system as defined under the 3GPP Release 99 standard was orientated
more towards circuit switched operation and was not well suited to packet operation.
Additionally, users required greater speeds than the original UMTS networks could provide.
Accordingly, the changes required for HSPA were incorporated into many UMTS networks
to enable them to operate more in the manner required by current applications.
HSPA provides a number of significant benefits that enable the new service to provide a far
better performance for the user. While 3G UMTS HSPA offers higher data transfer rates, this
is not the only benefit, as the system offers many other improvements as well:
1. Use of higher order modulation: 16QAM is used in the downlink instead of QPSK to
enable data to be transmitted at a higher rate. This provides for maximum data rates of 14
Mbps in the downlink. QPSK is still used in the uplink, where data rates of up to 5.8 Mbps
are achieved. The quoted figures are raw data rates and do not include the reduction in
actual payload caused by protocol overheads (a back-of-envelope check of these rates
follows this list).
2. Shorter Transmission Time Interval (TTI): The use of a shorter TTI reduces the round trip time
and enables improvements in adapting to fast channel variations and provides for reductions in latency.
3. Use of shared channel transmission: Sharing the resources enables greater levels of efficiency to be
achieved and integrates with IP and packet data concepts.
4. Use of link adaptation: By adapting the link it is possible to maximize the channel usage.
5. Fast Node B scheduling: The use of fast scheduling with adaptive coding and modulation (only
downlink) enables the system to respond to the varying radio channel and interference conditions and
to accommodate data traffic which tends to be "bursty" in nature.
6. Node B based Hybrid ARQ: This enables 3G HSPA to provide reduced retransmission
round trip times and it adds robustness to the system by allowing soft combining of
retransmissions.
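The headline rates in point 1 can be reproduced from first principles. The short Python sketch below assumes the standard UMTS chip rate of 3.84 Mcps, 15 HS-PDSCH codes at spreading factor 16 carrying 16QAM on the downlink, and a 2xSF2 + 2xSF4 E-DPDCH configuration on the uplink; it recovers the quoted raw, pre-overhead figures.

# Back-of-envelope check of the HSPA peak raw rates quoted above
# (channel-bit rates before coding and protocol overheads).
CHIP_RATE = 3.84e6  # UMTS chip rate, chips per second

# HSDPA downlink: 15 parallel HS-PDSCH codes at spreading factor 16,
# each 16QAM symbol carrying 4 bits.
codes, sf, bits_per_symbol = 15, 16, 4
hsdpa_peak = codes * (CHIP_RATE / sf) * bits_per_symbol
print("HSDPA peak: %.1f Mbps" % (hsdpa_peak / 1e6))   # 14.4 Mbps

# HSUPA uplink: 2 E-DPDCHs at SF2 plus 2 at SF4, BPSK (1 bit per symbol).
hsupa_peak = 2 * (CHIP_RATE / 2) + 2 * (CHIP_RATE / 4)
print("HSUPA peak: %.2f Mbps" % (hsupa_peak / 1e6))   # 5.76 Mbps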
For the network operator, the introduction of 3G HSPA technology brings a cost reduction
per bit carried as well as an increase in system capacity. With the increase in data traffic, and
operators looking to bring in increased revenue from data transmission, this is a particularly
attractive proposition. A further advantage of the introduction of 3G HSPA is that it can
often be rolled out by incorporating a software update into the system. This means its use
brings significant benefits to user and operator alike.
3G UMTS HSPA constituents
There are two main components to 3G UMTS HSPA, each addressing one of the links
between the base station and the user equipment, i.e. one for the uplink, and one for the
downlink.
[Figure: uplink and downlink transmission directions]
The two technologies were released at different times through 3GPP. They also have
different properties resulting from the different modes of operation that are required. In view
of these facts they were often treated as almost separate entities. Now they are generally
rolled out together. The two technologies are summarised below:
 HSDPA - High Speed Downlink Packet Access: HSDPA provides packet data
support, reduced delays, and a peak raw data rate (i.e. over the air) of 14 Mbps. It also
provides around three times the capacity of the 3G UMTS technology defined in
Release 99 of the 3GPP UMTS standard.
 HSUPA - High Speed Uplink Packet Access: HSUPA provides improved uplink
packet support, reduced delays and a peak raw data rate of 5.74 Mbps. This results
in a capacity increase of around twice that provided by the Release 99 services.
Q.8 a) What are the three main CDMA2000 standards? Explain all three.
Ans:
1. 1X:
CDMA2000 1X (IS-2000), also known as 1x and 1xRTT, is the core CDMA2000 wireless air
interface standard. The designation "1x", meaning 1 times radio transmission technology, indicates
the same radio frequency (RF) bandwidth as IS-95: a duplex pair of 1.25 MHz radio channels.
1xRTT almost doubles the capacity of IS-95 by adding 64 more traffic channels to the forward link,
orthogonal to (in quadrature with) the original set of 64 (a small demonstration of this
orthogonality follows at the end of this answer). The 1X standard supports packet data
speeds of up to 153 kbit/s, with real world data transmission averaging 80–100 kbit/s in most
commercial applications. IMT-2000 also made changes to the data link layer for greater use of
data services, including medium and link access control protocols and QoS. The IS-95 data
link layer only provided "best efforts delivery" for data and a circuit switched channel for
voice (i.e., a voice frame once every 20 ms).
2. 1xEV-DO
CDMA2000 1xEV-DO (Evolution-Data Optimized), often abbreviated as EV-DO or EV, is a
telecommunications standard for the wireless transmission of data through radio signals, typically
for broadband Internet access. It uses multiplexing techniques including code division multiple
access (CDMA) as well as time-division access to maximize both the individual user's throughput and
the overall system throughput. It is standardized (IS-856) by 3rd Generation Partnership Project 2
(3GPP2) as part of the CDMA2000 family of standards and has been adopted by many mobile
phone service providers around the world – particularly those previously employing CDMA
networks.
3. 1X Advanced
1X Advanced (Rev. E) is the evolution of CDMA2000 1X. It provides up to four times the
capacity and 70% more coverage compared to 1X.
CDMA2000-3x
CDMA2000-3x (or CDMA 3G-3xRTT) uses 5 MHz of bandwidth, and it is therefore classified
together with UMTS in the Wideband CDMA (W-CDMA) family of radio transmission
technologies. It delivers peak bit rates of up to 144 Kbps for mobile applications and as much as 2
Mbps for stationary applications. CDMA2000-3x will also introduce higher bit rates for data
transmission, more sophisticated QoS and policy mechanisms, and advanced multimedia
capabilities. It will rely on the ATM-based data link layer between the base stations and MSCs to
accommodate the higher speeds and advanced call model.
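As noted in the description of 1X above, CDMA's traffic channels stay separable because their Walsh spreading codes are mutually orthogonal. The NumPy sketch below generates a 64x64 Walsh-Hadamard code set and verifies that property; it illustrates the principle only, not the exact IS-2000 channelization.

import numpy as np

# Walsh-Hadamard codes: the orthogonal spreading sequences that keep CDMA
# traffic channels separable (the principle behind the 64 forward-link
# channels mentioned above; not the full IS-2000 channelization).
def hadamard(n):
    """Walsh-Hadamard matrix of order n (n a power of two), entries +/-1."""
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

w = hadamard(64)
# Distinct codes correlate to zero, so they do not interfere; a code
# correlates with itself at full strength, recovering the wanted channel.
assert w[3] @ w[17] == 0
assert w[3] @ w[3] == 64
print("all 64 codes mutually orthogonal:",
      np.array_equal(w @ w.T, 64 * np.eye(64)))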
b) Explain LTE in 4G.
Ans:
LTE/4G
The high-level network architecture of LTE comprises the following three main components:
1. The User Equipment (UE).
2. The Evolved UMTS Terrestrial Radio Access Network (E-UTRAN).
3. The Evolved Packet Core (EPC).
The evolved packet core communicates with packet data networks in the outside world such
as the internet, private corporate networks or the IP multimedia subsystem. The interfaces
between the different parts of the system are denoted Uu, S1 and SGi as shown below:
[Figure: LTE high-level architecture showing the Uu, S1 and SGi interfaces]
The User Equipment (UE)
The internal architecture of the user equipment for LTE is identical to the one used by
UMTS and GSM, which is actually a Mobile Equipment (ME). The mobile equipment
comprises the following important modules:
1. Mobile Termination (MT) : This handles all the communication functions.
2. Terminal Equipment (TE) : This terminates the data streams.
3. Universal Integrated Circuit Card (UICC): This is also known as the SIM card for LTE
equipment. It runs an application known as the Universal Subscriber Identity Module (USIM).
A USIM stores user-specific data very similar to a 3G SIM card, keeping information
such as the user's phone number, home network identity and security keys.
The E-UTRAN (The access network)
The architecture of the evolved UMTS Terrestrial Radio Access Network (E-UTRAN) is
illustrated below.
The E-UTRAN handles the radio communications between the mobile and the evolved packet core
and just has one component, the evolved base stations, called eNodeB or eNB. Each eNB is a
base station that controls the mobiles in one or more cells. The base station that is communicating
with a mobile is known as its serving eNB.
An LTE mobile communicates with just one base station and one cell at a time. There are
two main functions supported by the eNB:
 The eNB sends and receives radio transmissions to all the mobiles using the analogue and
digital signal processing functions of the LTE air interface.
 The eNB controls the low-level operation of all its mobiles, by sending them signalling
messages such as handover commands.
Each eNB connects with the EPC by means of the S1 interface and it can also be connected to nearby
base stations by the X2 interface, which is mainly used for signaling and packet forwarding during
handover.
A home eNB (HeNB) is a base station that has been purchased by a user to provide femtocell
coverage within the home. A home eNB belongs to a closed subscriber group (CSG) and can only be
accessed by mobiles with a USIM that also belongs to the closed subscriber group.
The Evolved Packet Core (EPC) (The core network)
The architecture of the Evolved Packet Core (EPC) is illustrated below. A few more
components have been left out of the diagram to keep it simple, such as the
Earthquake and Tsunami Warning System (ETWS), the Equipment Identity Register (EIR) and
the Policy Control and Charging Rules Function (PCRF).
Below is a brief description of each of the components shown in the above architecture:
 The Home Subscriber Server (HSS) component has been carried forward from UMTS and
GSM and is a central database that contains information about all the network operator's
subscribers.
 The Packet Data Network (PDN) Gateway (P-GW) communicates with the outside world, i.e.
packet data networks (PDN), using the SGi interface. Each packet data network is identified by an
access point name (APN). The PDN gateway plays the same role as the GPRS support node
(GGSN) and serving GPRS support node (SGSN) do in UMTS and GSM.
 The serving gateway (S-GW) acts as a router, and forwards data between the base station
and the PDN gateway.
 The mobility management entity (MME) controls the high-level operation of the mobile by
means of signalling messages and interaction with the Home Subscriber Server (HSS).
 The Policy Control and Charging Rules Function (PCRF) is a component which is not
shown in the above diagram but it is responsible for policy
control decision-making, as well as for controlling the flow-based charging functionalities in
the Policy Control Enforcement Function (PCEF), which resides in the P-GW.
The interface between the serving and PDN gateways is known as S5/S8. This has two slightly
different implementations, namely S5 if the two devices are in the same network, and S8 if they are
in different networks.
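As a compact recap of the reference points just described, the following sketch (purely descriptive Python; the endpoint names are taken from the text) maps each LTE interface onto the pair of elements it connects.

# Descriptive summary of the LTE reference points described above.
lte_interfaces = {
    "Uu":  ("UE",     "eNodeB"),  # the LTE air interface
    "X2":  ("eNodeB", "eNodeB"),  # inter-eNB signalling and handover forwarding
    "S1":  ("eNodeB", "EPC"),     # access network to core network
    "S5":  ("S-GW",   "P-GW"),    # gateways in the same network
    "S8":  ("S-GW",   "P-GW"),    # roaming variant: gateways in different networks
    "SGi": ("P-GW",   "PDN"),     # core network to external packet data networks
}

for name, (end_a, end_b) in lte_interfaces.items():
    print("%-3s connects %s and %s" % (name, end_a, end_b))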
Functional split between the E-UTRAN and the EPC
The following diagram shows the functional split between the E-UTRAN and the EPC for an
LTE network:
[Figure: functional split between the E-UTRAN and the EPC]
c) Give details of HSUPA.
Ans:
High-Speed Uplink Packet Access (HSUPA) is a 3G mobile telephony protocol in the HSPA
family. This technology was the second major step in the UMTS evolution process. It was
specified and standardized in 3GPP Release 6 to improve the uplink data rate to 5.76 Mbit/s,
extending capacity and reducing latency. Together with additional improvements, this creates
opportunities for a number of new applications including VoIP, uploading pictures, and
sending large e-mail messages.
HSUPA has been superseded by newer technologies further advancing transfer rates. LTE
provides up to 300 Mbit/s for downlink and 75 Mbit/s for uplink. Its evolution LTE Advanced
supports maximum downlink rates of over 1 Gbit/s.
Technology
Enhanced Uplink adds a new transport channel to WCDMA, called the Enhanced Dedicated
Channel (E-DCH). It also features several improvements similar to those of HSDPA, including
multi-code transmission, shorter transmission time interval enabling faster link adaptation, fast
scheduling, and fast Hybrid Automatic Repeat Request (HARQ) with incremental redundancy
making retransmissions more effective. Similarly to HSDPA, HSUPA uses a "packet scheduler",
but it operates on a "request-grant" principle where the user equipment (UE) requests permission to
send data and the scheduler decides when and how many UEs will be allowed to do so. A request
for transmission contains data about the state of the transmission buffer and the queue at the UE and
its available power margin. However, unlike HSDPA, uplink transmissions are not orthogonal to
each other.
In addition to this "scheduled" mode of transmission, the standard allows a self-initiated
transmission mode from the UEs, denoted "non-scheduled". The non-scheduled mode can, for
example, be used for VoIP services, for which even the reduced TTI and the Node B based
scheduler would be unable to provide the very short delay time and constant bandwidth required.
Each MAC-d flow (i.e., QoS flow) is configured to use either scheduled or non-scheduled modes.
The UE adjusts the data rate for scheduled and non-scheduled flows independently. The maximum
data rate of each non-scheduled flow is configured at call setup, and typically not changed
frequently. The power used by the scheduled flows is controlled dynamically by the Node B
through absolute grant (consisting of an actual value) and relative grant (consisting of a single
up/down bit) messages.
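The request-grant cycle described above can be pictured with a toy Node B scheduler. The Python sketch below is a deliberate simplification under assumed names: real E-DCH scheduling works with serving grant indices, happy bits and per-TTI power headroom, none of which is modelled here.

# Toy sketch of the HSUPA request-grant principle described above.
# (Assumed, simplified model; not the actual E-DCH grant machinery.)

class UE:
    def __init__(self, name, buffer_bits, power_margin_db):
        self.name = name
        self.buffer_bits = buffer_bits          # state of transmission buffer
        self.power_margin_db = power_margin_db  # available power headroom

def node_b_schedule(ues, cell_capacity_bits):
    """Grant uplink capacity to UEs that have data and power headroom."""
    grants = {}
    # Serve requests in order of reported power margin (illustrative policy).
    for ue in sorted(ues, key=lambda u: -u.power_margin_db):
        if ue.buffer_bits == 0 or cell_capacity_bits == 0:
            continue
        grant = min(ue.buffer_bits, cell_capacity_bits)
        grants[ue.name] = grant                 # "absolute grant" to this UE
        cell_capacity_bits -= grant
    return grants

ues = [UE("ue1", 4000, 6.0), UE("ue2", 2500, 2.5), UE("ue3", 0, 9.0)]
print(node_b_schedule(ues, 5000))  # {'ue1': 4000, 'ue2': 1000}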
At the physical layer, HSUPA introduces new channels E-AGCH (Absolute Grant Channel), E-
RGCH (Relative Grant Channel), F-DPCH (Fractional-DPCH), E-HICH (E-DCH Hybrid ARQ
Indicator Channel), E-DPCCH (E-DCH Dedicated Physical Control Channel), and E-DPDCH (E-
DCH Dedicated Physical Data Channel).
E-DPDCH is used to carry the E-DCH Transport Channel; and E-DPCCH is used to carry the
control information associated with the E-DCH.
Q.9 a) Draw a diagram for millimeter wave & explain it.
Ans:
Mm-Wave is a promising technology for future cellular systems. Since limited spectrum is
available for commercial cellular systems, most research has focused on increasing spectral
efficiency by using OFDM, MIMO and efficient channel coding. Network densification has also
been studied to increase area spectral efficiency, including the use of heterogeneous
infrastructure (macro-, pico- and femto cells, relays, distributed antennas), but increased
spectral efficiency alone is not enough to guarantee high user data rates. The alternative is more
spectrum. Millimeter wave (mm-Wave) cellular systems operate in the 30-300 GHz band, above
which electromagnetic radiation is considered to be low (or far) infrared light, also referred to
as terahertz radiation.
1. Mm-wave spectrum would allow service providers to significantly expand the channel
bandwidths far beyond the present 20 MHz channels used by 4G customers. By increasing the
RF channel bandwidth for mobile radio channels, the data capacity is greatly increased,
while the latency for digital traffic is greatly decreased, thus supporting much better internet
based access and applications that require minimal latency. Mm-wave frequencies, due to the
much smaller wavelength, may exploit polarization and new spatial processing techniques,
such as massive MIMO and adaptive beamforming. The mm-wave spectrum will also have
spectral allocations that are relatively much closer together, making the propagation
characteristics of different mm-wave bands much more comparable and "homogenous".
2. A common myth in the wireless engineering community is that rain and atmosphere make
mm-wave spectrum useless for mobile communications. However, when one considers the
fact that today's cell sizes in urban environments are on the order of 200 m, it becomes clear
that mm-wave cellular can overcome these issues. The figure shows the rain attenuation and
atmospheric absorption characteristics of mm-wave propagation. It can be seen that for cell
sizes on the order of 200 m, atmospheric absorption does not create significant additional
path loss for mm-waves, particularly at 28 GHz and 38 GHz. Only 7 dB/km of attenuation is
expected due to heavy rainfall rates of 1 inch/hr for cellular propagation at 28 GHz, which
translates to only 1.4 dB of attenuation over a 200 m distance (this arithmetic is reproduced
in the sketch after this list).
3. Parameters affected by mm-wave:
BANDWIDTH: The main benefit that millimeter wave technology has over RF frequencies is
the spectral bandwidth of 5 GHz being available in these ranges, resulting in current speeds of
1.25 Gbps full duplex, with potential throughput speeds of up to 10 Gbps full duplex being
made possible.
SECURITY: Since millimeter waves have a narrow beam width and are blocked by many solid
structures, they also create an inherent level of security. In order to sniff millimeter wave
radiation, a receiver would have to be set up very near to, or in the path of, the radio
connection. The loss of data integrity caused by a sniffing antenna provides a detection
mechanism for networks under attack. Additional measures, such as cryptographic
algorithms, can be used so that a network is fully protected against attack.
4. BEAM WIDTH / INTERFERENCE RESISTANCE: Millimeter wave signals transmit in
very narrow focused beams, which allows for multiple deployments in close range using the
same frequency ranges. This makes millimeter wave ideal for point-to-point mesh, ring and
dense hub & spoke network topologies, where lower frequency signals would not be able to
cope before cross signal interference became a significant limiting factor.
5. ADVANTAGES of mm-wave: Millimeter wave's larger bandwidth is able to provide a
higher transmission rate, capability of spread spectrum, and more immunity to interference.
Extremely high frequencies allow multiple short-distance usages at the same frequency
without interfering with each other (i.e. multiple transmitters can be placed in nearby
locations), but this requires a narrow beam width. For the same size of antenna, when the
frequency is increased, the beam width is decreased. It also reduces hardware size: the higher
the frequency, the smaller the antenna that can be used.
6. LIMITATIONS: Higher costs in manufacturing, due to greater precision hardware and
components of smaller size. At extremely high frequencies there is significant attenuation,
hence millimeter waves can hardly be used for long distance applications. The penetration of
mm-waves through objects such as concrete walls is known to be poor. There is interference
from oxygen absorption and rain at higher frequencies, and further research is ongoing to
reduce this.
7. Applications of mm-wave communication. I. Small cell access: Small cells are deployed as
an underlay to the macro cells and provide a solution for capacity enhancement in 5G
networks. With huge bandwidth, mm-wave small cells are able to provide gigabit rates. Small
cells encrypt all voice and data sent and received.
8. II. Wireless backhaul: With small cells densely deployed in the next generation of cellular
systems (5G), it is costly to connect base stations (BSs) to other BSs and to the network by
fiber based backhaul. In contrast, high speed wireless backhaul is low cost, flexible, and
easier to deploy. With huge bandwidth available, wireless backhaul in mm-wave bands, such
as the 60 GHz band and E-band (71–76 GHz and 81–86 GHz), provides several-Gbps data
rates and can be a promising backhaul solution for small cells. The E-band backhaul provides
high speed transmission between the small cell base stations (BSs) or between BSs and the
gateway.
9. III. Millimeter wave propagation: The propagation characteristics of millimeter wave
bands are very different from those below 4 GHz. Typically the distances that can be achieved
are very much less, and the signals do not pass through walls and other objects in buildings.
Millimeter wave communication is therefore likely to be used for outdoor coverage at ranges
between 200 - 300 meters. Often these millimeter wave small cells may use beamforming
techniques to target the required user equipment and also reduce the possibility of
reflections.
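As referenced in point 2, the attenuation arithmetic is easy to reproduce. The Python sketch below recomputes the 1.4 dB rain loss over a 200 m cell from the quoted 7 dB/km figure, and adds the free-space path loss at 28 GHz over the same distance for scale (the free-space term uses the standard Friis formula; the rain figure is taken from the text).

import math

# Reproducing the numbers in point 2: attenuation over a 200 m mm-wave cell.
d_km, d_m = 0.2, 200.0

rain_db = 7.0 * d_km                       # dB = (dB/km) * km
print("rain loss over 200 m: %.1f dB" % rain_db)        # 1.4 dB

f_hz, c = 28e9, 3e8                        # 28 GHz carrier, speed of light
fspl_db = 20 * math.log10(4 * math.pi * d_m * f_hz / c)
print("free-space loss at 28 GHz: %.1f dB" % fspl_db)   # ~107.4 dB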
b) Describe in detail Virtual Reality and Augmented Reality.
Ans:
What is Virtual Reality?
Virtual reality (VR) is an artificial, computer-generated simulation or recreation of a real life
environment or situation. It immerses the user by making them feel like they are experiencing the
simulated reality firsthand, primarily by stimulating their vision and hearing.
VR is typically achieved by wearing a headset like Facebook’s Oculus equipped with the
technology, and is used prominently in two different ways:
 To create and enhance an imaginary reality for gaming, entertainment, and play (Such as
video and computer games, or 3D movies, head mounted display).
 To enhance training for real life environments by creating a simulation of reality where
people can practice beforehand (Such as flight simulators for pilots).
One way of creating virtual reality content is the coding language known as VRML (Virtual
Reality Modeling Language), which can be used to create a series of images and specify what
types of interactions are possible for them.
What is Augmented Reality?
Augmented reality (AR) is a technology that layers computer-generated enhancements atop an
existing reality in order to make it more meaningful through the ability to interact with it. AR is
developed into apps and used on mobile devices to blend digital components into the real world in
such a way that they enhance one another, but can also be told apart easily.
AR technology is quickly coming into the mainstream. It is used to display score overlays on
telecasted sports games and pop out 3D emails, photos or text messages on mobile devices. Leaders
of the tech industry are also using AR to do amazing and revolutionary things with holograms and
motion activated commands.
Augmented Reality vs. Virtual Reality
Augmented reality and virtual reality are inverse reflections of one another in what each
technology seeks to accomplish and deliver for the user. Virtual reality offers a digital recreation
of a real life setting, while augmented reality delivers virtual elements as an overlay to the real
world.
How are Virtual Reality and Augmented Reality Similar?
Technology
Augmented and virtual realities both leverage some of the same types of technology, and they each
exist to serve the user with an enhanced or enriched experience.
Entertainment
Both technologies enable experiences that are becoming more commonly expected and sought after
for entertainment purposes. While in the past they seemed merely a figment of a science fiction
imagination, new artificial worlds come to life under the user’s control, and deeper layers of
interaction with the real world are also achievable. Leading tech moguls are investing and
developing new adaptations, improvements, and releasing more and more products and apps that
support these technologies for the increasingly savvy users.
Science and Medicine
Additionally, both virtual and augmented realities have great potential to change the landscape of
the medical field by making things such as remote surgeries a real possibility. These technologies
have already been used to treat and heal psychological conditions such as Post Traumatic Stress
Disorder (PTSD).
How do Augmented and Virtual Realities Differ?
Purpose
Augmented reality enhances experiences by adding virtual components such as digital images,
graphics, or sensations as a new layer of interaction with the real world. Contrastingly, virtual
reality creates its own reality that is completely computer generated and driven.
Delivery Method
Virtual Reality is usually delivered to the user through a head-mounted display or hand-held
controller. This equipment connects people to the virtual reality, and allows them to control and
navigate their actions in an environment meant to simulate the real world.
Augmented reality is being used more and more in mobile devices such as laptops, smart phones,
and tablets to change how the real world and digital images and graphics intersect and interact.
How do they work together?
It is not always virtual reality vs. augmented reality: the two do not always operate
independently of one another, and in fact are often blended together to generate an even more
immersive experience. For example, haptic feedback (the vibration and sensation added to
interaction with graphics) is considered an augmentation. However, it is commonly used within a
virtual reality setting in order to make the experience more lifelike through touch.
Virtual reality and augmented reality are great examples of experiences and interactions fueled
by the desire to become immersed in a simulated land for entertainment and play, or to add a
new dimension of interaction between digital devices and the real world. Alone or blended
together, they are undoubtedly opening up worlds, both real and virtual alike.
Q.10 a) Explain LTE based MulteFire.
Ans:
MulteFire is an LTE-based technology that operates standalone in unlicensed and shared
spectrum, including the global 5 GHz band. Based on 3GPP Release 13 and 14, MulteFire
technology supports Listen-Before-Talk for fair co-existence with Wi-Fi and other technologies
operating in the same spectrum. It supports private LTE and neutral host deployment models.
Target vertical markets include industrial IoT, enterprise, cable, and various other vertical
markets.
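The Listen-Before-Talk behaviour that MulteFire relies on can be sketched as simple pseudologic: sense the channel energy, back off while it is busy, and transmit only when it is clear. The Python below is a toy model with assumed numbers; the actual LBT procedure standardised for unlicensed operation defines precise energy thresholds, defer periods and contention windows that are omitted here.

import random

# Toy model of Listen-Before-Talk: sense before transmitting, back off
# while the shared channel is busy. (Assumed, illustrative values only.)
ENERGY_THRESHOLD_DBM = -72.0   # illustrative clear-channel threshold

def channel_energy_dbm():
    """Stand-in for an energy measurement on the shared channel."""
    return random.uniform(-95, -50)

def listen_before_talk(max_attempts=10):
    backoff = 1
    for _ in range(max_attempts):
        if channel_energy_dbm() < ENERGY_THRESHOLD_DBM:
            return "transmit"          # channel sensed idle: go ahead
        backoff = min(backoff * 2, 64) # channel busy: exponential backoff
        # (a real device would wait `backoff` slots here before re-sensing)
    return "defer"

print(listen_before_talk())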
The MulteFire Release 1.0 specification was developed by the MulteFire Alliance, an independent,
diverse and international member-driven consortium. Release 1.0 was published to MulteFire
Alliance members in January 2017 and was made publicly available in April 2017. The MulteFire
Alliance is currently working on Release 1.1 which will add further optimizations for IoT and new
spectrum bands.
According to Harbor Research in its published white paper, the market opportunity for private LTE
networks for industrial and commercial IoT will reach $118.5 billion in 2023. It also reported that
the total addressable revenue for Enterprise markets deploying private and neutral host LTE with
MulteFire will reach $5.7 billion by 2025.
The MulteFire Alliance has grown to more than 40 members. Its board members include Boingo
Wireless, CableLabs, Ericsson, Huawei, Intel, Nokia, Qualcomm and SoftBank. The organization is
open to any company with an interest in advancing LTE and cellular technology in unlicensed and
shared spectrum.
b) Explain in detail URLLC.
Ans:
Ultra-reliable low-latency communication, or URLLC, is one of several different types of use cases
supported by the 5G New Radio (NR) standard, as stipulated by 3GPP (3rd Generation Partnership
Project) Release 15. URLLC will cater to multiple advanced services for latency-sensitive
connected devices, such as factory automation, autonomous driving, the industrial internet and
smart grid, or robotic surgeries.
But, in order to understand URLLC, you must understand 5G NR. This is the global standard for a
much stronger and more capable cellular network. With it we will deliver faster, more reliable
mobile services, and a much smoother user experience from everyday cellphone users to the
internet of things (IoT) to smart technologies on a massive scale.
Other services that 5G will support include eMBB (Enhanced Mobile Broadband), which will
supply high bandwidth internet access for wireless connectivity, large-scale video streaming, and
virtual reality, and mMTC (Massive Machine Type Communication), which supports internet
access for sensing, metering, and monitoring devices.
But we’ll focus on URLLC for now. One of the key features of URLLC is the LL, or low latency.
Low latency is important for gadgets that, say, drive themselves, or perform prostate surgeries. Low
latency allows a network to be optimized for processing incredibly large amounts of data with
minimal delay, or latency. The network needs to adapt to a broad range of changing data in real
time. 5G will enable this service to function. URLLC is, arguably, the most promising addition to
upcoming 5G capabilities, but it will also be the hardest to secure; URLLC requires a quality of
service (QoS) totally different from mobile broadband services. It will provide networks with
instantaneous and intelligent systems, though it will require transitioning out of the core network.
This new URLLC wireless connectivity will guarantee a latency of 1 ms or less. In order for the
interface to achieve low latency, all the devices have to synchronize to the same time-base.
Time-sensitive networking is another component of the 5G URLLC capabilities. It will allow the
shapers used for managing traffic to be time aware.
The design of a low-latency and high-reliability service involves several components: Integrated
frame structure, incredibly fast turnaround, efficient control and data resource sharing, grant-free
based uplink transmission, and advanced channel coding schemes. Uplink grant-free structures
reduce user equipment (UE) transmission latency by avoiding the middle-man process of
acquiring a dedicated scheduling grant, as the rough timing sketch below illustrates.
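A rough way to see why grant-free uplink matters for the 1 ms target is to count the slots spent on signalling. The Python sketch below uses an assumed, not 3GPP-mandated, model in which a scheduling request, a grant and the data transmission each occupy one 0.125 ms slot (the slot length at 120 kHz subcarrier spacing in 5G NR):

# Rough illustration (assumed timing model, not 3GPP-specified) of why
# grant-free uplink helps URLLC: it removes the scheduling-request /
# grant exchange from the critical path.
SLOT_MS = 0.125  # assume a short 5G NR slot (120 kHz subcarrier spacing)

def grant_based_latency():
    # scheduling request + grant + data transmission, one slot each
    return 3 * SLOT_MS

def grant_free_latency():
    # data goes out immediately in the next slot
    return 1 * SLOT_MS

print("grant-based: %.3f ms" % grant_based_latency())  # 0.375 ms
print("grant-free : %.3f ms" % grant_free_latency())   # 0.125 ms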
Ultra-reliable low-latency communication represents a complete game-changer for communications
technology in the modern age. With it, we can conduct remote surgeries, have our cars drive for us,
and increase machine productivity by large-scale factors. But URLLC simply isn’t possible without
the development and implementation of 5G NR; the non-standalone version of 5G NR is slated to
be released later this year. The standalone version should be released sometime in 2020.