Academic Documents
Professional Documents
Culture Documents
Class: BE
Sem: I
ACADEMIC BOOK
Contents:
Vision
“To provide quality technical education in rural area to
create competent human resources.”
Mission
“Committed to produce competent engineers to cater
the needs of society by imparting skill based education
through effective teaching learning process.”
Vision
“Develop Department of Computer Engineering into
centre of excellence through imparting technical
education of international standards and research in the
field of Computer Engineering.”
Mission
“To provide quality engineering education to the
students through state of art education in Computer
Engineering.”
Pravara Rural Education Society’s
Pravara Technical Education Campus
Sir Visvesvaraya Institute of Technology, Nashik
Academic Calendar SE to BE - 2019-20 (Sem-I)
Week No. | Month | Days (Mon Tue Wed Thu Fri Sat) | No. of Working Days | Events

June 2019
Week 1: 1 (working days: --)
Week 2: 3 4 5 6 7 8 (working days: --)
Week 3: 10 11 12 13 14 15 (working days: --)
Week 4: 17 18 19 20 21 22 (working days: 6)
Week 5: 24 25 26 27 28 29 (working days: 6)
Events: 05 - Ramzan Id (Holiday); 6-8 - Administrative Audit (IQAC); 10 - Principal, HOD & Deans meeting; 11-12 - Orientation programme by the faculty (department level); 14 - Orientation programme for faculty (institute level); 17 - Commencement of teaching, SE to BE; 17 - Project/Mini-Project/Internship presentations by students of all departments; 21 - International Yoga Day.

July 2019
Week 7: 1 2 3 4 5 6 (working days: 5)
Week 8: 8 9 10 11 12 13 (working days: 5)
Week 9: 15 16 17 18 19 20 (working days: 5)
Week 10: 22 23 24 25 26 27 (working days: 6)
Week 11: 29 30 31 (working days: 3)
Events: 5 - Earn and Learn student selection; 12 - Ashadi Ekadashi (Holiday); 13 - Seminar on rules and regulations for women at the workplace; 15-19 - 1st Industrial Visit week; 20 - 1st display & submission of academic & attendance defaulter list to the dean's office; 22-24 - Academic Audit; 22-27 - 1st Assignment week; 22 - Collection of applications for Student Council 2019-20; 25 - Students' Feedback I; 27 - HR Meet (Training & Placement); 27 - Principal, HOD & Deans meeting; 29 - First-year Induction Programme; 29-02 - Class Test I; 31 - Student Council 2019-20 selection; 31 - Mentoring report by the department.

August 2019
Week 12: 1 2 3 (working days: 2)
Week 13: 5 6 7 8 9 10 (working days: 6)
Week 14: 12 13 14 15 16 17 (working days: 3)
Week 15: 19 20 21 22 23 24 (working days: 6)
Week 16: 26 27 28 29 30 31 (working days: 6)
Events: 1-2 - 1st Project Evaluation; 5-10 - 2nd Industrial Visit week; 8 - Display of Class Test I marks; 10 - Alumni Meet; 10 - Pleasure trip for Pravara Technical Campus staff; 12 - Bakri Id (Holiday); 14 - Late Padmashri Dr. Vitthalrao Vikhe Patil Jayanti; 15 - Independence Day celebration; 15 - Student Council meeting; 15 - Principal, HOD & Deans meeting; 17 - Parsi New Year (Holiday); 19-24 - University In-Sem Exam, tentative (SE, TE & BE); 20 - 2nd display & submission of academic & attendance defaulter list to the dean's office; 22 - Bakri Id (Holiday); 26-31 - 2nd Assignment week; 26-31 - Foot Prints (sports event); 26-31 - 1st make-up classes; 29-31 - Parent-Teacher Interaction Meet; 30 - 2nd Project Evaluation; 31 - Mentoring report by the department.

September 2019
Week 17: 2 3 4 5 6 7 (working days: 3)
Week 18: 9 10 11 12 13 14 (working days: 5)
Week 19: 16 17 18 19 20 21 (working days: 5)
Week 20: 23 24 25 26 27 28 (working days: 6)
Week 21: 30 (Sep) 1 (Oct)
Events: 2-12 - Ganesh Utsav (Pravarecha Raja); 3 - Student Council meeting; 5 - Teachers' Day celebration & Accolade 2K19; 10 - Moharram (Holiday); 14 - Engineers' Day celebration; 16-20 - Class Test II; 20 - 3rd display & submission of academic & attendance defaulter list to the dean's office; 23-28 - 3rd Industrial Visit week; 24 - Student Council meeting; 26 - Display of Class Test II marks; 27 - Students' Feedback II; 27 - Student Council meeting; 28 - Principal, HOD & Deans meeting; 29 - 3rd Project Evaluation; 30 - Mentoring report by the department.

October 2019
Week 22: 1 2 3 4 5 (working days: 4)
Week 23: 7 8 9 10 11 12 (working days: 5)
Week 24: 14 15 16 17 18 19 (working days: 3)
Week 25: 21 22 23 24 25 26 (working days: --)
Week 26: 28 29 30 31 (working days: --)
Events: 1-5 - 3rd Assignment week; 2 - Mahatma Gandhi Jayanti (Holiday); 7-12 - Preliminary Exam (SE to BE); 8 - Dasara (Holiday); 12 - SE final submission; 14 - TE final submission; 15 - BE final submission; 16 - Display of Preliminary Exam marks; 16 - Display & submission of academic & attendance defaulter list to the dean's office; 16 - Last day of term / conclusion of teaching; 18 Oct - 05 Nov - University Oral/Practical Exam (SE to BE); 28 - Diwali, Bali Pratipada (Holiday); 29 - Bhaubij (Holiday).

November 2019
Week 27: 1 2 (working days: --)
Week 28: 4 5 6 7 8 9 (working days: --)
Week 29: 11 12 13 14 15 16 (working days: --)
Week 30: 18 19 20 21 22 23 (working days: --)
Week 31: 25 26 27 28 29 30 (working days: --)
Events: Continuation of University Oral/Practical Exam (SE to BE); 11 - Datta Jayanti (Holiday); 12 - Gurunanak Jayanti (Holiday); 14 Nov - 7 Dec - University End-Sem Examination (SE to BE).

Total Working Days: 91
Continuing processes:
Conducting the aptitude test and its analysis (FE to BE)
Mentor meetings (FE to BE)
Technical expert lectures and soft-skills training (FE to BE)
Conducting technical interviews and personal interviews (BE)
Colour Index: Working days with activity | Working teaching days | University Exam Days | Holidays
Industrial visits, expert lectures & other activities will be conducted in each month from July to September 2019.
Faculty of Engineering, Savitribai Phule Pune University

Assessment scheme for 410241 High Performance Computing:
Class Test 1: 20 | Class Test 2: 20 | Prelim: 70 | Class Test (best-two average): 20 | Teacher Assessment: 5 | Attendance: 5 | In-Sem Exam: 30 | End-Sem Exam: 70 | Total: 100 | External: - | Internal: 50 | Total: 150

High Performance Computing
Course Contents
Unit 1: Introduction (09 Hours)
Motivating Parallelism, Scope of Parallel Computing, Parallel Programming Platforms: Implicit
Parallelism, Trends in Microprocessor Architectures, Limitations of Memory System
Performance, Dichotomy of Parallel Computing Platforms, Physical Organization of Parallel
Platforms, Communication Costs in Parallel Machines, Scalable Design Principles, Architectures:
N-wide superscalar architectures, Multi-core architecture.
Books:
Text:
1. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, "Introduction to Parallel Computing", 2nd edition, Addison-Wesley, 2003, ISBN: 0-201-64865-2.
2. Jason Sanders, Edward Kandrot, "CUDA by Example", Addison-Wesley, ISBN-13: 978-0-13-138768-3.
References:
1. Kai Hwang, "Scalable Parallel Computing", McGraw-Hill, 1998, ISBN: 0070317984.
2. Shane Cook, "CUDA Programming: A Developer's Guide to Parallel Computing with GPUs", Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2013, ISBN: 9780124159884.
3. David Culler, Jaswinder Pal Singh, "Parallel Computer Architecture: A Hardware/Software Approach", Morgan Kaufmann, 1999, ISBN: 978-1-55860-343-1.
4. Rod Stephens, "Essential Algorithms", Wiley, ISBN: 978-1-118-61210-1.
Evaluation Guidelines:
Class Test (CT) [20 marks]: Three class tests of 20 marks each will be conducted in a semester; the average of the best two is used to calculate the class-test marks. The question-paper format is the same as the university's.
TA [5 marks]: Three or four assignments will be conducted in the semester. Teacher assessment is calculated on the basis of performance in the assignments, class tests and the pre-university test.
Attendance (AT) [5 marks]: Attendance marks are given as per university policy.
1. The question paper has 5 questions. Question 1 is an objective question containing 5 sub-questions of 1 mark each.
2. Attempt any 3 of the remaining 4 questions; each carries 5 marks.
In-Semester Exam:
30 marks, as per university guidelines.
Course Objectives:
To study parallel computing hardware and programming models
To be conversant with performance analysis and modeling of parallel programs
To understand the options available to parallelize programs
To know the operating-system requirements for handling parallelization
Course Outcomes:
On completion of the course, students will be able to:
Describe different parallel architectures, interconnection networks and programming models
Develop an efficient parallel algorithm to solve a given problem
Analyze and measure the performance of modern parallel computing systems
Build the logic to parallelize a programming task
CO-PO Mapping
Course PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
Outcomes
CO1 2 1
CO2 2 1 1 1
CO3 1 1
CO4 1 2 1
CO1 with PO1 According to CO1, students learn to describe different parallel
architectures, interconnection networks and programming models. So it is
moderately correlated to PO1.
CO1 with PO2 According to CO1, students learn to describe different parallel
architectures, interconnection networks and programming models. So it is
slightly correlated to PO2.
CO2 with PO2 According to CO2, students learn to develop an efficient parallel
algorithm to solve a given problem. So it is moderately correlated to PO2.
CO2 with PO3 According to CO2, students learn to develop an efficient parallel
algorithm to solve a given problem. So it is slightly correlated to PO3.
CO2 with PO4 According to CO2, students learn to develop an efficient parallel
algorithm to solve a given problem. So it is slightly correlated to PO4.
CO2 with PO12 According to CO2, students learn to develop an efficient parallel
algorithm to solve a given problem. So it is slightly correlated to PO12.
CO3 with PO6 According to CO3, students gain the knowledge to analyze and measure
the performance of modern parallel computing systems. So it is slightly
related to PO6.
CO3 with PO7 According to CO3, students gain the knowledge to analyze and measure
the performance of modern parallel computing systems. So it is slightly
related to PO7.
CO4 with PO1 According to CO4, students are able to build the logic to parallelize
a programming task. So it is slightly correlated with PO1.
CO4 with PO4 According to CO4, students are able to build the logic to parallelize
a programming task. So it is moderately correlated with PO4.
CO4 with PO12 According to CO4, students are able to build the logic to parallelize
a programming task. So it is slightly correlated with PO12.
Assignments
Solution (Assignment 1)
Answer:
SISD (Single Instruction, Single Data stream)
Single Instruction, Single Data (SISD) refers to an Instruction Set Architecture in which a single
processor (one CPU) executes exactly one instruction stream at a time and also fetches or stores one
item of data at a time to operate on data stored in a single memory unit. Most CPU designs,
from the earliest von Neumann machines to recent processors, are based on the SISD model.
The SISD model is a typical non-pipelined architecture with the general-purpose registers, as well
as dedicated special registers such as the Program Counter (PC), the Instruction Register (IR),
Memory Address Registers (MAR) and Memory Data Registers (MDR).
Very long instruction word (VLIW) describes a computer processing architecture in which a
language compiler or pre-processor breaks program instructions down into basic operations that
can be performed by the processor in parallel (that is, at the same time). These operations are put
into a very long instruction word which the processor can then take apart without further analysis,
handing each operation to an appropriate functional unit.
VLIW is sometimes viewed as the next step beyond the reduced instruction set computing (RISC)
architecture, which also works with a limited set of relatively basic instructions and can usually
execute more than one instruction at a time (a characteristic referred to as superscalar). The main
advantage of VLIW processors is that complexity is moved from the hardware to the software,
which means that the hardware can be smaller, cheaper, and require less power to operate. The
challenge is to design a compiler or pre-processor that is intelligent enough to decide how to build
the very long instruction words. If dynamic pre-processing is done as the program is run,
performance may be a concern.
The Crusoe family of processors from Transmeta uses very long instruction words that are
assembled by a pre-processor that is located in a flash memory chip. Because the processor does
not need to have the ability to discover and schedule parallel operations, the processor contains only
about a fourth of the transistors of a regular processor. The lower power requirement enables
computers based on Crusoe technology to be operated by battery almost all day without a recharge.
The Crusoe processors emulate Intel's x86 processor instruction set. Theoretically, pre-processors
could be designed to emulate other processor architectures.
UMA (Uniform Memory Access) is a shared-memory architecture for multiprocessors.
In this model, a single memory is used and accessed by all the processors present in the
multiprocessor system with the help of the interconnection network. Each processor has equal
memory access time (latency) and access speed. It can employ a single bus, multiple buses, or a
crossbar switch. As it provides balanced shared-memory access, it is also known as an SMP
(Symmetric Multiprocessor) system.
Note that all tasks in figure are independent and can be performed all together or in any sequence.
However, in general, some tasks may use data produced by other tasks and thus may need to wait
for these tasks to finish execution. An abstraction used to express such dependencies among tasks
and their relative order of execution is known as a task-dependency graph. A task-dependency
graph is a directed acyclic graph in which the nodes represent tasks and the directed edges indicate
the dependencies amongst them. The task corresponding to a node can be executed when all tasks
connected to this node by incoming edges have completed. Note that task-dependency graphs can
be disconnected and the edge-set of a task-dependency graph can be empty. This is the case for
matrix-vector multiplication, where each task computes a subset of the entries of the product vector.
To see a more interesting task-dependency graph, consider the following database query processing
example.
Figure : Decomposition of dense matrix-vector multiplication into four tasks. The portions of the
matrix and the input and output vectors accessed by Task 1 are highlighted.
A concept related to granularity is that of degree of concurrency. The maximum number of tasks
that can be executed simultaneously in a parallel program at any given time is known as its
maximum degree of concurrency. In most cases, the maximum degree of concurrency is less than
the total number of tasks due to dependencies among the tasks. For example, the maximum degree
of concurrency in the task-graphs of Figures and is four. In these task-graphs, maximum
concurrency is available right at the beginning when tables for Model, Year, Color Green, and
Color White can be computed simultaneously. In general, for task-dependency graphs that are trees,
the maximum degree of concurrency is always equal to the number of leaves in the tree.
A more useful indicator of a parallel program's performance is the average degree of concurrency,
which is the average number of tasks that can run concurrently over the entire duration of execution
of the program.
Both the maximum and the average degrees of concurrency usually increase as the granularity of
tasks becomes smaller (finer). For example, the decomposition of matrix-vector multiplication
shown in fig has a fairly small granularity and a large degree of concurrency. The decomposition
for the same problem shown in fig has a larger granularity and a smaller degree of concurrency.
The degree of concurrency also depends on the shape of the task-dependency graph and the same
granularity, in general, does not guarantee the same degree of concurrency. For example, consider
the two task graphs in fig, which are abstractions of the task graphs of Figures and , respectively
(Problem 3.1). The number inside each node represents the amount of work required to complete
the task corresponding to that node. The average degree of concurrency of the task graph in fig is
2.33 and that of the task graph in fig is 1.88 (Problem 3.1), although both task-dependency graphs
are based on the same decomposition.
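The quantities discussed above can be computed mechanically from a task-dependency graph. The sketch below is illustrative and not from the text: the DAG, its node names (a-g) and its work values are hypothetical, chosen so that the average degree of concurrency (total work divided by critical-path length) comes out near the 2.33 figure mentioned above.

```python
# Hypothetical task-dependency graph: work per task, plus the tasks each
# task depends on (its incoming edges). Tasks a-d are independent.
work = {"a": 10, "b": 10, "c": 10, "d": 10, "e": 6, "f": 6, "g": 11}
deps = {"a": [], "b": [], "c": [], "d": [],
        "e": ["a", "b"], "f": ["c", "d"], "g": ["e", "f"]}

def critical_path(work, deps):
    # Heaviest weighted path ending at each node, via memoized recursion.
    memo = {}
    def longest(n):
        if n not in memo:
            memo[n] = work[n] + max((longest(d) for d in deps[n]), default=0)
        return memo[n]
    return max(longest(n) for n in work)

def max_concurrency(deps):
    # Group tasks by dependency depth; the largest level is the number of
    # tasks whose predecessors have all finished at the same time.
    depth = {}
    def level(n):
        if n not in depth:
            depth[n] = 1 + max((level(d) for d in deps[n]), default=-1)
        return depth[n]
    for n in deps:
        level(n)
    return max(list(depth.values()).count(l) for l in set(depth.values()))

avg = sum(work.values()) / critical_path(work, deps)
```

Here the four independent tasks give a maximum degree of concurrency of 4, and the average degree of concurrency is 63/27, roughly 2.33.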
Solution (Assignment 2)
Answer:
Parallel algorithms often require a single process to send identical data to all other processes or to a
subset of them. This operation is known as one-to-all broadcast. Initially, only the source process
has the data of size m that needs to be broadcast. At the termination of the procedure, there are p
copies of the initial data - one belonging to each process. The dual of one-to-all broadcast is
all-to-one reduction. In an all-to-one reduction operation, each of the p participating processes
starts with a buffer M containing m words. The data from all processes are combined through an
associative operator and accumulated at a single destination process into one buffer of size m.
Reduction can be used to find the sum, product, maximum, or minimum of sets of numbers - the
i-th word of the accumulated M is the sum, product, maximum, or minimum of the i-th words of
each of the original buffers. The figure shows one-to-all broadcast and all-to-one reduction among
p processes.
One-to-all broadcast and all-to-one reduction are used in several important parallel algorithms
including matrix-vector multiplication, Gaussian elimination, shortest paths, and vector inner
product. In the following subsections, we consider the implementation of one-to-all broadcast in
detail on a variety of interconnection topologies.
A naive way to perform one-to-all broadcast is to sequentially send p - 1 messages from the source
to the other p - 1 processes. However, this is inefficient because the source process becomes a
bottleneck. Moreover, the communication network is underutilized because only the connection
between a single pair of nodes is used at a time. A better broadcast algorithm can be devised using a
technique commonly known as recursive doubling. The source process first sends the message to
another process. Now both these processes can simultaneously send the message to two other
processes that are still waiting for the message. By continuing this procedure until all the processes
have received the data, the message can be broadcast in log p steps. The steps in a one-to-all
broadcast on an eight-node linear array or ring are shown in the figure. The nodes are labeled from 0 to 7.
Each message transmission step is shown by a numbered, dotted arrow from the source of the
message to its destination. Arrows indicating messages sent during the same time step have the
same number.
Figure: One-to-all broadcast on an eight-node ring. Node 0 is the source of the broadcast. Each message transfer step
is shown by a numbered, dotted arrow from the source of the message to its destination. The number on an arrow
indicates the time step during which the message is transferred.
Note that on a linear array, the destination node to which the message is sent in each step must be
carefully chosen. In fig, the message is first sent to the farthest node (4) from the source (0). In the
second step, the distance between the sending and receiving nodes is halved, and so on. The
message recipients are selected in this manner at each step to avoid congestion on the network. For
example, if node 0 sent the message to node 1 in the first step and then nodes 0 and 1 attempted to
send messages to nodes 2 and 3, respectively, in the second step, the link between nodes 1 and 2
would be congested as it would be a part of the shortest route for both the messages in the second
step. Reduction on a linear array can be performed by simply reversing the direction and the
sequence of communication, as shown in fig. In the first step, each odd numbered node sends its
buffer to the even numbered node just before itself, where the contents of the two buffers are
combined into one. After the first step, there are four buffers left to be reduced on nodes 0, 2, 4, and
6, respectively. In the second step, the contents of the buffers on nodes 0 and 2 are accumulated on
node 0 and those on nodes 6 and 4 are accumulated on node 4. Finally, node 4 sends its buffer to
node 0, which computes the final result of the reduction.
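The recursive-doubling pattern described above can be simulated in a few lines. This is an illustrative sketch rather than a message-passing implementation: it records which (source, destination) pairs communicate in each step, following the farthest-node-first schedule of the eight-node example.

```python
def one_to_all_broadcast(p, source=0):
    """Simulate recursive-doubling one-to-all broadcast on a p-node ring
    (p a power of two). Each step halves the send distance, starting with
    the farthest node to avoid link congestion."""
    has_data = {source}
    steps = []
    d = p // 2                        # first message goes to the farthest node
    while d >= 1:
        sends = [(s, (s + d) % p) for s in sorted(has_data)]
        steps.append(sends)
        has_data.update(t for _, t in sends)
        d //= 2
    return steps, has_data

steps, reached = one_to_all_broadcast(8)
# Step 1: 0 -> 4; step 2: 0 -> 2, 4 -> 6; step 3: 0 -> 1, 2 -> 3, 4 -> 5, 6 -> 7
```

Reversing the direction and sequence of these messages gives exactly the all-to-one reduction schedule described above.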
Figure : Reduction on an eight-node ring with node 0 as the destination of the reduction.
Answer:
Gather and scatter operations are used in many domains. However, using these operations on an
SIMD architecture creates some programming challenges, as SIMD systems are optimized to work
with memory laid out in a contiguous manner. A gather operation reads elements from memory
and packs them into an SIMD register, whereas a scatter operation unpacks the data and writes it
to individual memory locations.
Typical coding of this pattern results in non-optimal use of the SIMD instructions on an Intel Xeon
Phi coprocessor. Gathers and scatters result in more work than when the memory being accessed
is laid out contiguously: more cache-line misses occur and more pages in memory have to be
accessed.
On the Intel architecture, using the Streaming SIMD Extensions (SSE) and the Intel Advanced Vector
Extensions (AVX), gather and scatter operations need to be performed with scalar loads
and stores. AVX2 and the Intel Initial Many Core Instructions (IMCI) can also be used.
An example of this use is within the molecular dynamics domain. N-body simulations may use
scatter and gather techniques to optimize the compute intensive portions of the applications. Using a
number of the techniques mentioned below, a performance gain of 2X was observed on the miniMD
application using the Intel Xeon processors or the Intel Xeon Phi coprocessors.
A number of optimization techniques can be used to improve gather and scatter operations:
Improve temporal and spatial locality.
Choose the right data layout: Structure of Arrays (SoA) or Array of Structures (AoS).
Transpose between AoS and SoA.
Amortize the costs of gather/scatter.
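The AoS-to-SoA transposition in the list above can be sketched as follows. The particle-style records and field names are hypothetical, chosen only to illustrate the transformation; a real SIMD kernel would do this with vector shuffles rather than Python lists.

```python
def aos_to_soa(particles):
    """Array of Structures -> Structure of Arrays: turn a list of (x, y, z)
    records into one contiguous list per field, the layout that lets SIMD
    loads proceed without gather instructions."""
    xs, ys, zs = (list(col) for col in zip(*particles))
    return {"x": xs, "y": ys, "z": zs}

def soa_to_aos(soa):
    """Inverse transposition, used when whole records must be handed back."""
    return list(zip(soa["x"], soa["y"], soa["z"]))
```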
Q. No. | Question | Max. Marks | Unit No. (as per syllabus) | CO mapped | Bloom's Taxonomy Level
3. What is all-to-all broadcast and reduction? | 02 | 3 | 2 | 1
Answer:
All-to-all broadcast is a generalization of one-to-all broadcast in which all p nodes simultaneously
initiate a broadcast. A process sends the same m-word message to every other process, but different
processes may broadcast different messages. All-to-all broadcast is used in matrix operations,
including matrix multiplication and matrix-vector multiplication. The dual of all-to-all broadcast is
all-to-all reduction, in which every node is the destination of an all-to-one reduction.
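As a sketch of the communication pattern (not an implementation from the text), all-to-all broadcast on a ring can be simulated as follows: in each of p - 1 steps, every node forwards the message it most recently received to its right neighbour, so after p - 1 steps every node holds all p messages.

```python
def all_to_all_broadcast_ring(p):
    """Simulate all-to-all broadcast on a p-node ring. Returns the set of
    messages (identified by originating node) accumulated at each node."""
    result = [{i} for i in range(p)]      # each node starts with its own msg
    in_flight = list(range(p))            # message each node will send next
    for _ in range(p - 1):
        sent = in_flight[:]
        for i in range(p):
            msg = sent[(i - 1) % p]       # receive from the left neighbour
            result[i].add(msg)
            in_flight[i] = msg            # forward it in the next step
    return result
```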
Speedup
When evaluating a parallel system, we are often interested in knowing how much performance gain
is achieved by parallelizing a given application over a sequential implementation. Speedup is a
measure that captures the relative benefit of solving a problem in parallel. It is defined as the ratio
of the time taken to solve a problem on a single processing element to the time required to solve the
same problem on a parallel computer with p identical processing elements. We denote speedup by
the symbol S.
Example :Adding n numbers using n processing elements
Consider the problem of adding n numbers by using n processing elements. Initially, each
processing element is assigned one of the numbers to be added and, at the end of the computation,
one of the processing elements stores the sum of all the numbers. Assuming that n is a power of
two, we can perform this operation in log n steps by propagating partial sums up a logical binary
tree of processing elements. Following figure illustrates the procedure for n = 16. The processing
elements are labeled from 0 to 15. Similarly, the 16 numbers to be added are labeled from 0 to 15.
The sum of the numbers with consecutive labels from i to j is denoted by Σ(i..j).
Figure. Computing the global sum of 16 partial sums using 16 processing elements. Σ(i..j) denotes the sum of
numbers with consecutive labels from i to j.
Equation: S = Θ(n / log n) (serial time Θ(n) divided by parallel time Θ(log n))
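The log n-step tree summation described above can be simulated directly. The pairing scheme below is an illustrative sketch chosen to match the binary-tree pattern: at each step, every still-active element absorbs the partial sum held `stride` positions to its right.

```python
def tree_sum(values):
    """Simulate summing n numbers on n processing elements (n a power of
    two): in each step, element i receives and adds the partial sum of
    element i + stride, halving the active elements until element 0 holds
    the total. Returns (total, number_of_parallel_steps)."""
    vals = list(values)
    n = len(vals)
    steps = 0
    stride = 1
    while stride < n:
        for i in range(0, n, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
        steps += 1
    return vals[0], steps

total, steps = tree_sum(range(16))   # 4 parallel steps for n = 16
```

For n = 16 this takes log2 16 = 4 parallel steps, so the modelled speedup over the (n - 1)-addition serial sum is 15/4 = 3.75.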
At least three distinct parallel formulations of matrix-vector multiplication are possible, depending
on whether rowwise 1-D, columnwise 1-D, or a 2-D partitioning is used.
Algorithm :A serial algorithm for multiplying an n x n matrix A with an n x 1 vector x to
yield an n x 1 product vector y.
1. procedure MAT_VECT ( A, x, y)
2. begin
3. for i := 0 to n - 1 do
4. begin
5. y[i]:=0;
6. for j := 0 to n - 1 do
7. y[i] := y[i] + A[i, j] × x[j];
8. endfor;
9. end MAT_VECT
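The pseudocode translates directly into a runnable form; the following Python version is a small sketch useful for checking the row-by-row computation against the parallel formulations that follow.

```python
def mat_vect(A, x):
    """Serial MAT_VECT: y[i] = sum over j of A[i][j] * x[j],
    for an n x n matrix A and an n x 1 vector x."""
    n = len(A)
    y = [0] * n
    for i in range(n):
        for j in range(n):
            y[i] += A[i][j] * x[j]
    return y
```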
Rowwise 1-D Partitioning
This section details the parallel algorithm for matrix-vector multiplication using rowwise block 1-D
partitioning. The parallel algorithm for columnwise block 1-D partitioning is similar (Problem 8.2)
and has a similar expression for parallel run time. Figure describes the distribution and movement
of data for matrix-vector multiplication with block 1-D partitioning.
Figure: Multiplication of an n x n matrix with an n x 1 vector using rowwise block 1-D partitioning. For the
one-row-per-process case, p = n.
One Row Per Process
First, consider the case in which the n x n matrix is partitioned among n processes so that each
process stores one complete row of the matrix. The n x 1 vector x is distributed such that each
process owns one of its elements. The initial distribution of the matrix and the vector for rowwise
block 1-D partitioning is shown in fig(a). Process Pi initially owns x[i] and A[i, 0], A[i, 1], ...,
A[i, n-1] and is responsible for computing y[i]. Vector x is multiplied with each row of the matrix
(Algorithm); hence, every process needs the entire vector. Since each process starts with only one
element of x, an all-to-all broadcast is required to distribute all the elements to all the processes.
As shown in Fig(b), process Pi computes y[i] = Σj A[i, j] · x[j] (lines 6 and 7 of the algorithm). As Fig(b)
shows, the result vector y is stored exactly the way the starting vector x was stored.
Parallel Run Time Starting with one vector element per process, the all-to-all broadcast of the
vector elements among n processes requires time Θ(n) on any architecture. The multiplication of
a single row of A with x is also performed by each process in time Θ(n). Thus, the entire
procedure is completed by n processes in time Θ(n), resulting in a process-time product of Θ(n²).
The parallel algorithm is cost-optimal because the complexity of the serial algorithm is Θ(n²).
Using Fewer than n Processes
Consider the case in which p processes are used such that p < n, and the matrix is partitioned
among the processes by using block 1-D partitioning. Each process initially stores n/p complete
rows of the matrix and a portion of the vector of size n/p. Since the vector x must be multiplied
with each row of the matrix, every process needs the entire vector (that is, all the portions belonging
to separate processes). This again requires an all-to-all broadcast as shown in fig(b) and (c). The
all-to-all broadcast takes place among p processes and involves messages of size n/p. After this
communication step, each process multiplies its n/p rows with the vector x to produce n/p
elements of the result vector.Fig(d) shows that the result vector y is distributed in the same format
as that of the starting vector x.
Parallel Run Time According to the table, an all-to-all broadcast of messages of size n/p among p
processes takes time ts log p + tw(n/p)(p - 1). For large p, this can be approximated by
ts log p + tw n. After the communication, each process spends time n²/p multiplying its n/p rows with the
vector. Thus, the parallel run time of this procedure is

Equation: TP = n²/p + ts log p + tw n

The process-time product for this parallel formulation is n² + ts p log p + tw n p. The algorithm is
cost-optimal for p = O(n).
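The run-time expression can be turned into a small cost model for experimenting with the cost-optimality claim. The parameter values below (n, p, ts, tw) are illustrative assumptions, not measurements from any machine.

```python
import math

def parallel_runtime(n, p, ts, tw):
    """T_P = n^2/p + ts*log p + tw*n for rowwise 1-D partitioning with
    p < n, using the large-p approximation of the broadcast cost."""
    return n * n / p + ts * math.log2(p) + tw * n

def efficiency(n, p, ts, tw):
    """E = serial time / (p * parallel time), with serial time n^2."""
    return (n * n) / (p * parallel_runtime(n, p, ts, tw))

# Illustrative parameters: with p growing no faster than O(n),
# the efficiency stays close to 1.
e = efficiency(1024, 16, ts=10.0, tw=1.0)
```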
Scalability Analysis We now derive the isoefficiency function for matrix-vector multiplication
along the lines of the analysis in section by considering the terms of the overhead function one at a
time. Consider the parallel run time given by equation for the hypercube architecture. The relation
To = pTP - W gives the following expression for the overhead function of matrix-vector
multiplication on a hypercube with block 1-D partitioning:
Equation: To = ts p log p + tw n p
Recall from chapter that the central relation that determines the isoefficiency function of a parallel
algorithm is W = KTo , where K = E/(1 - E) and E is the desired efficiency. Rewriting this
relation for matrix-vector multiplication, first with only the ts term of To,
Equation: W = K ts p log p
Equation gives the isoefficiency term with respect to message startup time. Similarly, for the tw
term of the overhead function,
Since W = n² (Equation), the tw term W = K tw n p gives n = K tw p, and we derive an expression
for W in terms of p, K, and tw (that is, the isoefficiency function due to tw) as follows:
Equation 8.5: W = K² tw² p²
Now consider the degree of concurrency of this parallel algorithm. Using 1-D partitioning, a
maximum of n processes can be used to multiply an n x n matrix with an n x 1 vector. In other
words, p is O(n), which yields the following condition:
Equation 8.6: W = n² = Ω(p²)
The overall asymptotic isoefficiency function can be determined by comparing Equations 8.4, 8.5,
and 8.6. Among the three, Equations 8.5 and 8.6 give the highest asymptotic rate at which the
problem size must increase with the number of processes to maintain a fixed efficiency. This rate of
Θ(p²) is the asymptotic isoefficiency function of the parallel matrix-vector multiplication algorithm
with 1-D partitioning.
2-D Partitioning
This section discusses parallel matrix-vector multiplication for the case in which the matrix is
distributed among the processes using a block 2-D partitioning. Figure 8.2 shows the distribution
of the matrix and the distribution and movement of vectors among the processes.
Figure: Matrix-vector multiplication with block 2-D partitioning. For the one-element-per-process case, p = n² if the
matrix size is n x n.
Answer:
Cannon's algorithm is a distributed algorithm for matrix multiplication for two-dimensional
meshes. It is especially suitable for computers laid out in an N × N mesh. While Cannon's algorithm
works well in homogeneous 2D grids, extending it to heterogeneous 2D grids has been shown to be
difficult. The main advantage of the algorithm is that its storage requirements remain constant
and are independent of the number of processors.
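A sequential simulation of Cannon's algorithm (a sketch, not a distributed implementation) makes the skew-multiply-shift structure concrete: the q x q "process grid" here is just nested lists, and the block sizes assume n is divisible by q.

```python
def cannon_matmul(A, B, q):
    """Cannon's algorithm on a q x q process grid, simulated sequentially.
    Each "process" (i, j) holds one block of A and one of B; after the
    initial skew it performs q multiply-and-shift steps, so per-process
    storage stays constant at one block of each operand."""
    n = len(A)
    b = n // q
    blk = lambda M, i, j: [row[j*b:(j+1)*b] for row in M[i*b:(i+1)*b]]
    # Initial alignment: shift row i of A-blocks left by i,
    # and column j of B-blocks up by j.
    Ab = [[blk(A, i, (j + i) % q) for j in range(q)] for i in range(q)]
    Bb = [[blk(B, (i + j) % q, j) for j in range(q)] for i in range(q)]
    C = [[[[0] * b for _ in range(b)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        for i in range(q):
            for j in range(q):
                a, bm, c = Ab[i][j], Bb[i][j], C[i][j]
                for r in range(b):          # local block multiply-accumulate
                    for k in range(b):
                        for s in range(b):
                            c[r][s] += a[r][k] * bm[k][s]
        # Single-step shifts: A-blocks move left, B-blocks move up.
        Ab = [[Ab[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        Bb = [[Bb[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    # Reassemble the distributed C blocks into one n x n matrix.
    out = [[0] * n for _ in range(n)]
    for i in range(q):
        for j in range(q):
            for r in range(b):
                for s in range(b):
                    out[i*b + r][j*b + s] = C[i][j][r][s]
    return out
```

After the initial skew, process (i, j) holds A(i, (i+j+t) mod q) and B((i+j+t) mod q, j) at step t, so over q steps it accumulates the full dot product of block row i and block column j.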
Assignment 3
Q. No. | Question | Max. Marks | Unit No. (as per syllabus) | CO mapped | Bloom's Taxonomy Level
1. Explain Bitonic sort with an example. | 04 | 5 | 3 | 2
2. Enlist the issues in sorting on parallel computers. | 02 | 5 | 3 | 1
3. Explain the working of parallel quick sort algorithms with an example. | 04 | 5 | 4 | 2
4. Explain CUDA architecture with a schematic diagram. | 04 | 6 | 3 | 3
5. Write a short note on Memory Hierarchy. | 04 | 6 | 2 | 3
Solution (Assignment 3)
Q. No. | Question | Max. Marks | Unit No. (as per syllabus) | CO mapped | Bloom's Taxonomy Level
1. Explain Bitonic sort with an example. | 05 | 5 | 3 | 2
Answer:
Bitonic sort is a parallel sorting algorithm that performs O(n log² n) comparisons. Although the
number of comparisons is larger than in other popular sorting algorithms, it performs better in
parallel implementations because elements are compared in a predefined sequence that does not
depend on the data being sorted. The predefined sequence is called a bitonic sequence.
What is a Bitonic Sequence?
To understand bitonic sort, we must first understand the bitonic sequence: one in which the
elements first come in increasing order and then start decreasing after some particular index.
An array A[0 ... i ... n-1] is called bitonic if there exists an index i such that
A[0] < A[1] < A[2] < ... < A[i-1] < A[i] > A[i+1] > A[i+2] > A[i+3] > ... > A[n-1]
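A compact recursive version of bitonic sort is sketched below for illustration. A parallel implementation assigns the compare-exchange pairs to processes, but the comparison pattern is identical and, as noted above, independent of the data values.

```python
def bitonic_sort(a, up=True):
    """Sort a sequence whose length is a power of two: recursively build a
    bitonic sequence (an ascending half followed by a descending half),
    then merge it into sorted order."""
    if len(a) <= 1:
        return list(a)
    half = len(a) // 2
    first = bitonic_sort(a[:half], True)     # increasing half
    second = bitonic_sort(a[half:], False)   # decreasing half
    return bitonic_merge(first + second, up)

def bitonic_merge(a, up):
    """Compare-exchange element i with element i + half, then merge each
    half; which pairs are compared never depends on the data."""
    if len(a) <= 1:
        return list(a)
    a = list(a)
    half = len(a) // 2
    for i in range(half):
        if (a[i] > a[i + half]) == up:
            a[i], a[i + half] = a[i + half], a[i]
    return bitonic_merge(a[:half], up) + bitonic_merge(a[half:], up)
```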
Figure :. A parallel compare-exchange operation. Processes Pi and Pj send their elements to each other. Process Pi
keeps min{ai, aj}, and Pj keeps max{ai , aj}.
If we assume that processes Pi and Pj are neighbors, and the communication channels are
bidirectional, then the communication cost of a compare-exchange step is (ts + tw), where ts and
tw are message-startup time and per-word transfer time, respectively. In commercially available
message-passing computers, ts is significantly larger than tw, so the communication time is
dominated by ts. Note that in today's parallel computers it takes more time to send an element from
one process to another than it takes to compare the elements. Consequently, any parallel sorting
formulation that uses as many processes as elements to be sorted will deliver very poor performance
because the overall parallel run time will be dominated by interprocess communication.
More than One Element Per Process
A general-purpose parallel sorting algorithm must be able to sort a large sequence with a relatively
small number of processes. Let p be the number of processes P0, P1, ..., Pp-1, and let n be the
number of elements to be sorted. Each process is assigned a block of n/p elements, and all the
processes cooperate to sort the sequence. Let A0, A1, ..., Ap-1 be the blocks assigned to
processes P0, P1, ..., Pp-1, respectively. We say that Ai ≤ Aj if every element of Ai is less
than or equal to every element in Aj. When the sorting algorithm finishes, each process Pi holds a
set A'i such that A'i ≤ A'j for i ≤ j, and the union of all the A'i is the original sequence.
As in the one-element-per-process case, two processes Pi and Pj may have to redistribute their
blocks of n/p elements so that one of them will get the smaller n/p elements and the other will get
the larger n/p elements. Let Ai and Aj be the blocks stored in processes Pi and Pj. If the block
of n/p elements at each process is already sorted, the redistribution can be done efficiently as
follows. Each process sends its block to the other process. Now, each process merges the two sorted
blocks and retains only the appropriate half of the merged block. We refer to this operation of
comparing and splitting two sorted blocks as compare-split. The compare-split operation is
illustrated in figure.
4. Explain CUDA Architecture with Schematic Diagram.
Answer:
CPUs are designed to process as many sequential instructions as quickly as possible. While most
CPUs support threading, creating a thread is usually an expensive operation, and high-end CPUs can
usually make efficient use of no more than about 12 concurrent threads. GPUs, on the other hand,
are designed to process a small number of parallel instructions on large sets of data as quickly as
possible; for instance, calculating 1 million polygons and determining which to draw on the screen
and where. To do this they rely on many slower processors and inexpensive threads.
The numbers of SPs/cores in an SM and the number of SMs depend on your device: see the
Finding your Device Specifications section below for details. It is important to realize, however,
that regardless of GPU model, there are many more CUDA cores in a GPU than in a typical
multicore CPU: hundreds or thousands more. For example, the Kepler Streaming Multiprocessor
design, dubbed SMX, contains 192 single-precision CUDA cores, 64 double-precision units, 32
special function units, and 32 load/store units. (See the Kepler Architecture Whitepaper for a
description and diagram.)
CUDA cores are grouped together to perform instructions in what NVIDIA has termed a warp of
threads. Warp simply means a group of threads that are scheduled together to execute the same
instructions in lockstep. All CUDA cards to date use a warp size of 32. Each SM has at least one
warp scheduler, which is responsible for executing 32 threads. Depending on the model of GPU, the
cores may be double or quadruple pumped so that they execute one instruction on two or four
threads in as many clock cycles. For instance, Tesla devices use a group of 8 quad-pumped cores to
execute a single warp. If there are fewer than 32 threads scheduled in the warp, executing the
instructions will still take just as long.
The CUDA programmer is responsible for ensuring that the threads are being assigned efficiently
for code that is designed to run on the GPU. The assignment of threads is done virtually in the code
using what is sometimes referred to as a ‘tiling’ scheme of blocks of threads that form a grid.
Programmers define a kernel function that will be executed on the CUDA card using a particular
tiling scheme.
Virtual Architecture
When programming in CUDA C we work with blocks of threads and grids of blocks. What is the
relationship between this virtual architecture and the CUDA card’s physical architecture?
When kernels are launched, each block in a grid is assigned to a Streaming Multiprocessor. This
allows threads in a block to use __shared__ memory. If a block doesn’t use the full resources of
the SM then multiple blocks may be assigned at once. If all of the SMs are busy then the extra
blocks will have to wait until a SM becomes free.
Once a block is assigned to an SM, its threads are split into warps by the warp scheduler and
executed on the CUDA cores. Since the same instructions are executed on each thread in the warp
simultaneously, it is generally a bad idea to have conditionals in kernel code. This type of code is
sometimes called divergent: when some threads in a warp are unable to execute the same
instruction as the other threads in the warp, those threads diverge and do no work.
Because a warp's context (its registers, program counter, etc.) stays on chip for the life of the warp,
there is no additional cost to switching between warps versus executing the next step of a given
warp. This allows the GPU to hide some of its memory latency by switching to a new warp while
the current one waits for a costly read.
CUDA Memory
CUDA on-chip memory is divided into several different regions:
Registers act the same way as registers on CPUs; each thread has its own set of registers.
Local Memory holds local variables used by each thread. They are not accessible by other
threads, even though they use the same L1 and L2 cache as global memory.
Shared Memory is accessible by all threads in a block. It must be declared using the __shared__
modifier. It has higher bandwidth and lower latency than global memory. However, if multiple
threads request the same address, the requests are processed serially, which slows down the
application.
Constant Memory is read-accessible by all threads and must be declared with the __constant__
modifier. In newer devices there is a separate read-only constant cache.
Global Memory is accessible by all threads. It is the slowest device memory, but on newer cards it
is cached. Memory is pulled in 32-, 64-, or 128-byte memory transactions. Warps executing global
memory accesses attempt to pull all the data from global memory simultaneously, so it is
advantageous to use block sizes that are multiples of 32. If multidimensional arrays are used, it is
also advantageous to pad the bounds so that they are multiples of 32.
Texture/Surface Memory is read-accessible by all threads, but unlike Constant Memory, it is
optimized for 2D spatial locality, and cache hits pull in surrounding values in both x and y
directions.
Q. No. | Questions | Max. Marks | Unit No. (as per syllabus) | CO mapped | Bloom's Taxonomy Level
5. Write a short note on Memory Hierarchy. 05 6 2 3
Answer:
The memory in a computer can be divided into five hierarchies based on speed as well as use.
The processor can move from one level to another based on its requirements. The five hierarchies
in memory are registers, cache, main memory, magnetic discs, and magnetic tapes. The first three
hierarchies are volatile memories, which means that when there is no power they automatically
lose their stored data, whereas the last two are non-volatile, which means they store their
data permanently.
A memory element is a set of storage devices which stores binary data in the form of bits. In
general, memory storage can be classified into two categories: volatile and non-volatile.
Memory Hierarchy in Computer Architecture
The memory hierarchy design in a computer system mainly includes different storage devices.
Most computers have extra storage built in so that they can run beyond the capacity of main
memory. The memory hierarchy diagram is a hierarchical pyramid of computer memory. The
design of the memory hierarchy is divided into two types: primary (internal) memory and
secondary (external) memory.
Memory Hierarchy
Primary Memory:
The primary memory is also known as internal memory, and this is accessible by the processor
straightly. This memory includes main, cache, as well as CPU registers.
Secondary Memory:
The secondary memory is also known as external memory, and this is accessible by the processor
through an input/output module. This memory includes an optical disk, magnetic disk, and magnetic
tape.
Unit-II
Q.1.Explain decomposition, Task & Dependency graph.
Q.2.Explain Granularity, Concurrency & Task interaction.
Q.3.Explain decomposition techniques with its types.
Q.4.What are the characteristics of Task and Interactions?
Q.5.Explain the Mapping techniques in details.
Q.6.Explain parallel Algorithm Model.
Q.7.Explain Thread Organization.
Q.8.Write a short note on IBM CBE
Q.9.Explain history of GPUs and NVIDIA Tesla GPU.
Unit-III
Q.1.Explain Broadcast & Reduce operation with help of diagram.
Q.2.Explain One-to-all broadcast and reduction on a Ring?
Q.3.Explain Operation of All to one broadcast & Reduction on a ring?
Q.4.Write a pseudo code for One-to-all broadcast algorithm on hyper cube with different cases?
Q.5.Explain the terms All-to-all broadcast & reduction on Linear array, Mesh and Hypercube
topologies.
Q.6.Explain Scatter and Gather Operation.
Q.7.Write short note on Circular shift on Mesh and hypercube.
Q.8.Explain different approaches of Communication operation.
Q.9.Explain all to all personalized communication?
Unit-IV
Unit-V
Unit-VI
Section A
Q2.When the processor executes multiple instructions at a time it is said to use _______
a) single issue
b) Multiplicity
c) Visualization
d) Multiple issues
Ans:Multiple issues
Q4.Which of the following is informal name of address register for the memory operations?
1. Storage register
2. Memory address register
3. Instruction register
4. Microinstruction register
1. Arithmetic operation
2. Logical operation
3. Both (1) and (2)
4. None of the above
Single Instruction, Single Data (SISD) refers to an Instruction Set Architecture in which a single processor (one
CPU) executes exactly one instruction stream at a time and also fetches or stores one item of data at a time to
operate on data stored in a single memory unit. Most CPU designs based on the von Neumann architecture,
from the beginning until recent times, are SISD. The SISD model is a typical non-pipelined architecture with
general-purpose registers as well as dedicated special registers such as the Program Counter (PC), the
Instruction Register (IR), the Memory Address Register (MAR) and the Memory Data Register (MDR).
Single Instruction, Multiple Data (SIMD) is an Instruction Set Architecture that has a single control unit (CU)
and more than one processing unit (PU). It operates like a von Neumann machine by executing a single
instruction stream over the PUs, handled through the CU. The CU generates the control signals for all of the
PUs, through which the same operation is executed on different data streams. The SIMD architecture, in effect,
is capable of achieving data-level parallelism, just like a vector processor.
Some of the examples of the SIMD based systems include IBM's AltiVec and SPE for PowerPC, HP's PA-RISC
Multimedia Acceleration eXtensions (MAX), Intel's MMX and iwMMXt, SSE, SSE2, SSE3 and SSSE3,
AMD's 3DNow! etc.
Multiple Instruction, Single Data (MISD) is an Instruction Set Architecture for parallel computing where many
functional units perform different operations by executing different instructions on the same data set. This type
of architecture is used mainly in fault-tolerant computers executing the same instructions redundantly in
order to detect and mask errors.
Q2.Write a short note on UMA and NUMA
Ans: UMA (Uniform Memory Access) is a shared-memory architecture for multiprocessors. In this
model, a single memory is used and accessed by all the processors present in the multiprocessor system with
the help of the interconnection network. Each processor has equal memory access time (latency) and access
speed. It can employ a single bus, multiple buses, or a crossbar switch. As it provides balanced shared
memory access, it is also known as an SMP (Symmetric Multiprocessor) system.
NUMA (Non-Uniform Memory Access) is also a shared-memory architecture, but each processor has its own
local memory. Access to local memory is faster than access to memory attached to other processors, so the
memory access time depends on the location of the data relative to the processor.
Ans:The process of dividing a computation into smaller parts, some or all of which may potentially be
executed in parallel, is called decomposition. Tasks are programmer-defined units of computation
into which the main computation is subdivided by means of decomposition. Simultaneous execution of
multiple tasks is the key to reducing the time required to solve the entire problem. Tasks can be of
arbitrary size, but once defined, they are regarded as indivisible units of computation. The tasks into
which a problem is decomposed may not all be of the same size.
Consider the multiplication of a dense n x n matrix A with a vector b to yield another vector y.
The ith element y[i] of the product vector is the dot-product of the ith row of A with the input
vector b; i.e., y[i] = Σ_{j=0}^{n-1} A[i, j] · b[j]. As shown later in fig, the computation of each y[i] can be
regarded as a task. Alternatively, as shown later in fig, the computation could be decomposed into
fewer, say four, tasks where each task computes roughly n/4 of the entries of the vector y.
Figure: Decomposition of dense matrix-vector multiplication into n tasks, where n is the number of rows in the matrix.
The portions of the matrix and the input and output vectors accessed by Task 1 are highlighted.
Note that all tasks in figure are independent and can be performed all together or in any sequence.
However, in general, some tasks may use data produced by other tasks and thus may need to wait for
these tasks to finish execution. An abstraction used to express such dependencies among tasks and
their relative order of execution is known as a task-dependency graph. A task-dependency graph is a
directed acyclic graph in which the nodes represent tasks and the directed edges indicate the
dependencies amongst them. The task corresponding to a node can be executed when all tasks
connected to this node by incoming edges have completed. Note that task-dependency graphs can be
disconnected and the edge-set of a task-dependency graph can be empty. This is the case for matrix-
vector multiplication, where each task computes a subset of the entries of the product vector. To see a
more interesting task-dependency graph, consider the following database query processing example.
CLASS TEST- II
(AY 2018-19)
Branch: Computer Engineering (BE) Date:
Semester: V Duration: 1 hour
Subject: High Performance Computing - 410241 Max. Marks: 20M
Ans:complex system
Section B
Q1.What are different partitioning techniques used in matrix vector multiplication.
Ans: This section addresses the problem of multiplying a dense n x n matrix A with an n x 1 vector
x to yield the n x 1 result vector y. The algorithm below is a serial algorithm for this problem. The
sequential algorithm requires n² multiplications and additions. Assuming that a multiplication and
addition pair takes unit time, the sequential run time is

W = n²
At least three distinct parallel formulations of matrix-vector multiplication are possible, depending on
whether rowwise 1-D, columnwise 1-D, or a 2-D partitioning is used.
Algorithm :A serial algorithm for multiplying an n x n matrix A with an n x 1 vector x to yield
an n x 1 product vector y.
1. procedure MAT_VECT ( A, x, y)
2. begin
3. for i := 0 to n - 1 do
4. begin
5. y[i]:=0;
6. for j := 0 to n - 1 do
7. y[i] := y[i] + A[i, j] × x[j];
8. endfor;
9. end MAT_VECT
Rowwise 1-D Partitioning
This section details the parallel algorithm for matrix-vector multiplication using rowwise block 1-D
partitioning. The parallel algorithm for columnwise block 1-D partitioning is similar (Problem 8.2) and
has a similar expression for parallel run time. Figure describes the distribution and movement of data
for matrix-vector multiplication with block 1-D partitioning
2-D Partitioning
This section discusses parallel matrix-vector multiplication for the case in which the matrix is
distributed among the processes using a block 2-D partitioning. fig shows the distribution of the
matrix and the distribution and movement of vectors among the processes.
Q2.How search overhead factor works ?
Ans:Parallel search algorithms incur overhead from several sources. These include communication
overhead, idle time due to load imbalance, and contention for shared data structures. Thus, if both the
sequential and parallel formulations of an algorithm do the same amount of work, the speedup of
parallel search on p processors is less than p. However, the amount of work done by a parallel
formulation is often different from that done by the corresponding sequential formulation because they
may explore different parts of the search space. Let W be the amount of work done by a single
processor, and Wp be the total amount of work done by p processors. The search overhead factor
of the parallel system is defined as the ratio of the work done by the parallel formulation to that done
by the sequential formulation, or Wp/W. Thus, the upper bound on speedup for the parallel system is
given by p x(W/Wp). The actual speedup, however, may be less due to other parallel processing
overhead. In most parallel search algorithms, the search overhead factor is greater than one. However,
in some cases, it may be less than one, leading to superlinear speedup. If the search overhead factor is
less than one on the average, then it indicates that the serial search algorithm is not the fastest
algorithm for solving the problem. To simplify our presentation and analysis, we assume that the time
to expand each node is the same, and W and Wp are the number of nodes expanded by the serial and
the parallel formulations, respectively. If the time for each expansion is tc, then the sequential run
time is given by TS = tcW. In the remainder of the chapter, we assume that tc = 1. Hence, the
problem size W and the serial run time TS become the same.
The relatively new research field of green computing pursues energy conservation not just as a
commercial advantage, (longer battery life, less weight), but as an environmental goal in itself. Some
of the green computing topics studied at Stanford include long-term trends in energy-efficient
computing, resource management in large multi-core systems, and data center economics and best
practices. Stanford engineers are developing low-power wireless networks, tiny semiconductor lasers
for low-energy data interconnects, nano-sized electromechanical relays for ultra-low power
computation, and an image and signal processor 20 times more power efficient than conventional
processors. They are also working on circuit, architecture and application optimization tools;
nanomaterials for energy-efficient transistors, data storage and integrated circuits; and efficient
networks for homes and offices.
Optical computing:Computers have enhanced human life to a great extent. The speed of conventional
computers is achieved by miniaturizing electronic components to a very small micron-size scale so
that those electrons need to travel only very short distances within a very short time. The goal of
improving on computer speed has resulted in the development of the Very Large Scale Integration
(vlsi) technology with smaller device dimensions and greater complexity. Last year, the smallest-to-date
dimensions of vlsi reached 0.08 µm by researchers at Lucent Technology. Whereas vlsi
technology has revolutionized the electronics industry and established the 20th century as the
computer age, increasing usage of the Internet demands better accommodation of a 10 to 15 percent
per month growth rate. Additionally, our daily lives demand solutions to increasingly sophisticated
and complex problems, which requires more speed and better performance of computers.
For these reasons, it is unfortunate that vlsi technology is approaching its fundamental limits in the
sub-micron miniaturization process. It is now possible to fit up to 300 million transistors on a single
silicon chip. It is also estimated that the number of transistor switches that can be put onto a chip
doubles every 18 months. Further miniaturization of lithography introduces several problems such as
dielectric breakdown, hot carriers, and short channel effects. All of these factors combine to seriously
degrade device reliability. Even if developing technology succeeded in temporarily overcoming these
physical problems, we will continue to face them as long as increasing demands for higher integration
continues. Therefore, a dramatic solution to the problem is needed, and unless we gear our thoughts
toward a totally different pathway, we will not be able to further improve our computer performance
for the future.
Optical interconnections and optical integrated circuits will provide a way out of these limitations to
computational speed and complexity inherent in conventional electronics. Optical computers will use
photons traveling on optical fibers or thin films instead of electrons to perform the appropriate
functions. In the optical computer of the future, electronic circuits and wires will be replaced by a few
optical fibers and films, making the systems more efficient with no interference, more cost effective,
lighter and more compact. Optical components would not need to have insulators as those needed
between electronic components because they don't experience crosstalk. Indeed, multiple frequencies
(or different colors) of light can travel through optical components without interfering with each
other, allowing photonic devices to process multiple streams of data simultaneously.
Q4. Describe Cannon's Algorithm for matrix multiplication with a suitable example.
Ans: Cannon's algorithm is a memory-efficient parallel algorithm for multiplying two n x n matrices
on a √p x √p mesh of processes, with each process holding one block of A, B, and C. The blocks are
first aligned: the blocks in row i of A are circularly shifted left by i positions, and the blocks in
column j of B are circularly shifted up by j positions. Then, in each of √p steps, every process
multiplies its resident blocks of A and B, accumulates the product into its block of C, shifts its block
of A one step left, and shifts its block of B one step up.
Q.9 a. Define term HPC and elaborate its use in Indian society 09 CO2 3
b. What is the Search-Overhead -Factor 09 CO2 3
OR
Q.10 a. Explain Power aware Processing 09 CO3 4
b. Explain Quantum Computer with suitable example 09 CO2 2
Q.1 a. Explain SIMD, MIMD and SIMT architecture
5
Answer:SISD (Single Instruction, Single Data stream)
Single Instruction, Single Data (SISD) refers to an Instruction Set Architecture in which a single
processor (one CPU) executes exactly one instruction stream at a time and also fetches or stores one item
of data at a time to operate on data stored in a single memory unit. Most of the CPU design, based on the
von Neumann architecture, from the beginning till recent times are based on the SISD. The SISD model is
a typical non-pipelined architecture with the general-purpose registers, as well as dedicated special
registers such as the Program Counter (PC), the Instruction Register (IR), Memory Address Registers
(MAR) and Memory Data Registers (MDR).
Single Instruction, Multiple Data (SIMD) is an Instruction Set Architecture that have a single control unit
(CU) and more than one processing unit (PU) that operates like a von Neumann machine by executing a
single instruction stream over PUs, handled through the CU. The CU generates the control signals for all
of the PUs and by which executes the same operation on different data streams. The SIMD architecture,
in effect, is capable of achieving data level parallelism just like with vector processor.
Some of the examples of the SIMD based systems include IBM's AltiVec and SPE for PowerPC, HP's
PA-RISC Multimedia Acceleration eXtensions (MAX), Intel's MMX and iwMMXt, SSE, SSE2, SSE3
and SSSE3, AMD's 3DNow! etc.
Multiple Instruction, Single Data (MISD) is an Instruction Set Architecture for parallel computing where
many functional units perform different operations by executing different instructions on the same data set.
This type of architecture is used mainly in fault-tolerant computers executing the same instructions
redundantly in order to detect and mask errors.
Q.2 a. State the difference between write-invalidate and write-update protocols
Answer: The performance differences between write-update and write-invalidate protocols arise from
three characteristics:
1. Multiple writes to the same word with no intervening reads require multiple write broadcasts in an
update protocol, but only one initial invalidation in a write-invalidate protocol.
2. With multiword cache blocks, each word written in a cache block requires a write broadcast in an
update protocol, although only the first write to any word in the block needs to generate an invalidate
in an invalidation protocol. An invalidation protocol works on cache blocks, while an update protocol
must work on individual words (or bytes, when bytes are written). It is possible to try to merge writes
in a write broadcast scheme.
3. The delay between writing a word in one processor and reading the written value in another
processor is usually less in a write-update scheme, since the written data are immediately updated
in the reader's cache.
#include <pthread.h>
A return value of EDEADLK indicates that the mutex is already held by the calling thread.
The maximum number of recursive locks by the owning thread is 32,767. When this number is
exceeded, attempts to lock the mutex return the ERECURSE error.
Basically, the producer produces goods while the consumer consumes the goods and typically
does something with them.
In our case our producer will produce an item and place it in a bound-buffer for the consumer.
Then the consumer will remove the item from the buffer and print it to the screen.
Where: We will use semaphores in any place where we may have concurrency issues. In other
words any place where we feel more than one thread will access the data or structure at any
given time.
Why: Think about how registers work in the operating system for a second. Here is an example
of how registers work when you increment a counter-
register1 = counter;
register1 = register1 + 1;
counter = register1;
Now imagine two threads manipulating this same example, but one thread is decrementing.
Because both threads were allowed to run without synchronization, our counter now has a
definitely wrong value. With synchronization the answer should come out to be 5, as it started.
How: We implement a semaphore as an integer value that is only accessible through two atomic
operations wait() and signal(). Defined as follows:
wait(S) {
    while (S <= 0)
        ;    // busy wait
    S--;
}
signal(S) {
    S++;
}
S: Semaphore (an integer value)
The operation wait() tells the system that we are about to enter a critical section and signal()
notifies that we have left the critical section and it is now accessible to other threads.
Therefore:
wait(mutex);
// critical section
signal(mutex);
Mutex stands for mutual exclusion. Meaning only one process may execute the section at a time.
We have an example that demonstrates how semaphores are used in reference to pthreads
coming up right after this problem walk-through.
Basically, we are going to have a program that creates an N number of producer and consumer
threads. The job of the producer will be to generate a random number and place it in a bound-
buffer. The role of the consumer will be to remove items from the bound-buffer and print them to
the screen. Remember the big issue here is concurrency so we will be using semaphores to help
prevent any issues that might occur. To double our efforts we will also be using a pthread mutex
lock to further guarantee synchronization.
The user will pass in three arguments to start the application: <INT, time for the main method to
sleep before termination> <INT, number of producer threads> <INT, number of consumer
threads>
We will then use a function to initialize the data, semaphores, mutex lock, and pthread attributes.
Unified Memory allows applications to directly access the memory of all GPUs and all of
system memory
ECC memory error protection – meets a critical requirement for computing accuracy and
reliability in data centers and supercomputing centers.
System monitoring features – integrate the GPU subsystem with the host system’s
monitoring and management capabilities such as IPMI. IT staff can manage the GPU
processors in the computing system with widely-used cluster/grid management tools.
ii) An important point when using communication is synchronization among processes.
iii) MPI provides a special function designed and implemented for synchronization, named
MPI_Barrier().
iv) This function works in such a way that no process is allowed to cross the barrier
until all the processes have reached that barrier in their respective codes.
vi) The argument passed to this function is the communicator. The group of processes to be
synchronized is defined in the communicator. The calling process blocks until all the processes
in the given communicator have called it. This means the call only returns when all processes
have entered the call.
vii) The MPI_Barrier() function is invoked by process 0.
viii) When Process 0 reaches the barrier, it stops and waits for the remaining processes to reach
the barrier point.
ix) After every process reaches the barrier point, execution continues. In this way
synchronization is achieved using a barrier.
ii) The number of columns present in the network is called the depth of the network.
iii) The comparator plays an important role in the network. It is a device which takes two inputs a
and b, and generates two outputs a' and b'. An increasing comparator outputs a' = min{a, b} and
b' = max{a, b}; a decreasing comparator outputs a' = max{a, b} and b' = min{a, b}.
vii) Each column performs a permutation and the sorted output is taken from the last column.
Answer:MPI views the processes as being arranged in a one-dimensional topology and uses a
linear ordering to number the processes. However, in many parallel programs, processes are
naturally arranged in higher-dimensional topologies (e.g., two- or three-dimensional). In such
programs, both the computation and the set of interacting processes are naturally identified by
their coordinates in that topology. For example, in a parallel program in which the processes are
arranged in a two-dimensional topology, process (i, j) may need to send message to (or receive
message from) process (k, l). To implement these programs in MPI, we need to map each MPI
process to a process in that higher-dimensional topology.
Many such mappings are possible. Figure 6.5 illustrates some possible mappings of 16 MPI
processes onto a 4 x 4 two-dimensional topology. For example, for the mapping shown in Figure
6.5(a), an MPI process with rank rank corresponds to process (row, col) in the grid such that
row = rank/4 and col = rank%4 (where '%' is C's modulo operator). As an illustration, the
process with rank 7 is mapped to process (1, 3) in the grid.
Figure 6.5. Different ways to map a set of processes to a two-dimensional grid. (a) and (b) show
a row- and column-wise mapping of these processes, (c) shows a mapping that follows a space-
filling curve (dotted line), and (d) shows a mapping in which neighboring processes are directly
connected in a hypercube.
In general, the goodness of a mapping is determined by the pattern of interaction among the
processes in the higher-dimensional topology, the connectivity of physical processors, and the
mapping of MPI processes to physical processors. For example, consider a program that uses a
two-dimensional topology and each process needs to communicate with its neighboring
processes along the x and y directions of this topology. Now, if the processors of the
underlying parallel system are connected using a hypercube interconnection network, then the
mapping shown in Figure 6.5(d) is better, since neighboring processes in the grid are also
neighboring processors in the hypercube topology.
However, the mechanism used by MPI to assign ranks to the processes in a communication
domain does not use any information about the interconnection network, making it impossible to
perform topology embeddings in an intelligent manner. Furthermore, even if we had that
information, we would need to specify different mappings for different interconnection networks,
diminishing the architecture-independent advantages of MPI. A better approach is to let the
library itself compute the most appropriate embedding of a given topology to the processors of
the underlying parallel computer. This is exactly the approach facilitated by MPI. MPI provides a
set of routines that allows the programmer to arrange the processes in different topologies
without having to explicitly specify how these processes are mapped onto the processors. It is up
to the MPI library to find the most appropriate mapping that reduces the cost of sending and
receiving messages.
Example Quicksort
Consider the problem of sorting a sequence A of n elements using the commonly used
quicksort algorithm. Quicksort is a divide and conquer algorithm that starts by selecting a pivot
element x and then partitions the sequence A into two subsequences A0 and A1 such that all
the elements in A0 are smaller than x and all the elements in A1 are greater than or equal to
x. This partitioning step forms the divide step of the algorithm. Each one of the subsequences
A0 and A1 is sorted by recursively calling quicksort. Each one of these recursive calls further
partitions the sequences. This is illustrated in the figure for a sequence of 12 numbers. The recursion
terminates when each subsequence contains only a single element.
In the figure, we define a task as the work of partitioning a given subsequence. Therefore, the figure
also represents the task graph for the problem. Initially, there is only one sequence (i.e., the root of
the tree), and we can use only a single process to partition it. The completion of the root task
results in two subsequences (A0 and A1, corresponding to the two nodes at the first level of the
tree) and each one can be partitioned in parallel. Similarly, the concurrency continues to increase
as we move down the tree.
Answer: i) To solve a discrete optimization problem, depth-first search is used if the problem can
be formulated as a tree-search problem. Depth-first search can be performed in parallel by
partitioning the search space into many small, disjoint parts (subtrees) that can be explored
concurrently. DFS starts at the initial node and generates its successors.
ii) If a node has no successors, there is no solution along that path. The search therefore
backtracks and continues by expanding another node. The following figure gives the DFS
expansion of the 8-puzzle.
iii) The initial configuration is given in (A). There are only two possible moves: blank up or
blank right. Thus two children, or successors, (B) and (C) are generated from (A).
iv) This is done in step 1. In step 2, any one of (B) and (C) is selected. If (B) is selected then
its successors (D), (E) and (F) are generated. If (C) is selected then its successors (G), (H)
and (I) are generated.
v) Assuming (B) is selected, (D) is selected in the next step. It is clear that (D) is the
same as (A), so backtracking is necessary. This process is repeated until the required result
is found.
Answer: Make in India is an initiative launched by the Government of India to encourage
multinational as well as national companies to manufacture their products in India. It was launched by
Prime Minister Narendra Modi on 25 September 2014. After the initiation of the programme in
2015, India emerged as the top destination for foreign direct investment.
Biotechnology
Construction
Chemicals
Electronic System
Aviation
Mining
Railways
iii) The Make in India mission also includes the development of highly professional, High
Performance Computing (HPC)-aware human resources to meet the challenges of developing
these applications. As far as HPC is concerned, the construction of supercomputers is a big
achievement. India has developed many supercomputers; among them, 8 are in
the list of the world's best 500 supercomputers.
Q.9 a. Define term HPC and elaborate its use in Indian society
Let W be the amount of work done by a single processor, and Wp be the total amount of work
done by p processors. The search overhead factor of the parallel system is defined as the ratio of
the work done by the parallel formulation to that done by the sequential formulation, or Wp/W.
Thus, the upper bound on speedup for the parallel system is given by p × (W/Wp). The actual
speedup, however, may be less due to other parallel processing overhead. In most parallel search
algorithms, the search overhead factor is greater than one. However, in some cases, it may be
less than one, leading to superlinear speedup. If the search overhead factor is less than one on the
average, then it indicates that the serial search algorithm is not the fastest algorithm for solving
the problem.
To simplify our presentation and analysis, we assume that the time to expand each node is the
same, and W and Wp are the number of nodes expanded by the serial and the parallel
formulations, respectively. If the time for each expansion is tc, then the sequential run time is
given by TS = tcW. In the remainder of the chapter, we assume that tc = 1. Hence, the problem
size W and the serial run time TS become the same.
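The definitions above can be restated compactly (s denotes the search overhead factor; all other symbols as in the text):

```latex
s = \frac{W_p}{W}, \qquad
T_S = t_c W, \qquad
S \;\le\; p \cdot \frac{W}{W_p} \;=\; \frac{p}{s}
```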
Answer: i) A quantum is the minimum amount of any physical entity involved in an
interaction.
ii) The computer which is designed by using the principles of quantum physics is called quantum
computer.
iii) A quantum computer stores information using special bits called quantum bits (qubits),
represented as |0> and |1>.
iv) This increases the flexibility of the computations. It performs the calculations based on the
laws of quantum physics.
v) The quantum bits are implemented using two energy levels of an atom. An excited state
represents |1> and a ground state represents |0>.
vi) Quantum gates are used to perform operations on the data. They are very similar to the
traditional logical gates.
vii) Since the quantum gates are reversible, we can generate the original input from the obtained
output as well.
viii) A quantum computer harnesses the behavior of atoms to perform operations. It is capable of
processing millions of operations in parallel.
High Performance Computing BE Computer Engineering
BE/Insem./Oct.-583
B. E. (Computer Engineering)
HIGH PERFORMANCE COMPUTING
(2015 Pattern) (Semester – I)
[Time : 1 Hour] [Max. Marks : 30]
b ) Explain the impact of Memory Latency & Memory Bandwidth on system performance. [6]
Ans:
OR
b ) Describe the scope of parallel computing. Give application of parallel computing. [4]
Ans :-
Q 3.a ) Explain any three data decomposition techniques with example [6]
Ans :-
OR
Ans :-
Characteristics of GPUs :
A Graphics Processing Unit (GPU) is a single-chip processor primarily used to manage and boost the
performance of video and graphics. Its features are designed to lessen the work of the CPU and
produce faster video and graphics.
A GPU is not only used in a PC on a video card or motherboard; it is also used in mobile phones,
display adapters, workstations and game consoles.
This term is also known as a visual processing unit (VPU).
Application of GPUs :
Bioinformatics
Computational Finance
Computational Fluid Dynamics
Data Science, Analytics, and Databases
Defense and Intelligence
Electronic Design Automation
Imaging and Computer Vision
Machine Learning
Materials Science
Media and Entertainment
Medical Imaging
Molecular Dynamics
Numerical Analytics
Physics
Quantum Chemistry
Oil and Gas/Seismic
Structural Mechanics
Visualization and Docking
Weather and Climate
b ) Explain any three parallel algorithm models with suitable example. [6]
Ans :
1. Data-Parallel Model:
The data-parallel model can be applied to both shared-address-space and message-passing paradigms.
In the data-parallel model, interaction overheads can be reduced by selecting a locality-preserving
decomposition, by using optimized collective interaction routines, or by overlapping computation
with interaction.
The primary characteristic of data-parallel model problems is that the intensity of data parallelism
increases with the size of the problem, which in turn makes it possible to use more processes to solve
larger problems.
Examples − Parallel quick sort, sparse matrix factorization, and parallel algorithms derived via divide-
and-conquer approach.
2. Task Graph Model:
Here, problems are divided into atomic tasks and implemented as a graph. Each task is an independent
unit of work that has dependencies on one or more antecedent tasks. After the completion of a task,
the output of an antecedent task is passed to the dependent task. A task with antecedent tasks starts
execution only when all of its antecedent tasks are completed. The final output of the graph is received
when the last dependent task is completed (Task 6 in the above figure).
3.Master-Slave Model:
In the master-slave model, one or more master processes generate tasks and allocate them to slave
processes. The tasks may be allocated beforehand if the master can estimate their size in advance, or
if a random assignment does an adequate job of load balancing.
In some cases, a task may need to be completed in phases, and the tasks in each phase must be
completed before the tasks in the next phase can be generated. The master-slave model can be
generalized to a hierarchical or multi-level master-slave model in which the top-level master feeds a
large portion of the tasks to second-level masters, which further subdivide the tasks among their own
slaves and may perform a part of the work themselves.
Q 5. a) Explain Broadcast and Reduction example for multiplying matrix with a vector. [6]
Ans :-
Ans :
In the scatter operation, a single node sends a unique message of size m to every other node. This
operation is also known as one-to-all personalized communication. One-to-all personalized
communication is different from one-to-all broadcast in that the source node starts with p unique
messages, one destined for each node. Unlike one-to-all broadcast, one-to-all personalized
communication does not involve any duplication of data. The dual of one-to-all personalized
communication or the scatter operation is the gather operation, or concatenation, in which a single
node collects a unique message from each node. A gather operation is different from an all-to-one
reduce operation in that it does not involve any combination or reduction of data. The figure
illustrates the scatter and gather operations.
Although the scatter operation is semantically different from one-to-all broadcast, the scatter algorithm
is quite similar to that of the broadcast. The figure shows the communication steps for the scatter
operation on an eight-node hypercube. The communication patterns of one-to-all broadcast and scatter
are identical; only the sizes and contents of the messages differ. In the figure, the source node
(node 0) contains all the messages, identified by the labels of their destination nodes. In the first
messages. The messages are identified by the labels of their destination nodes. In the first
communication step, the source transfers half of the messages to one of its neighbors. In subsequent
steps, each node that has some data transfers half of it to a neighbor that has yet to receive any data.
There is a total of log p communication steps corresponding to the log p dimensions of the hypercube.
The gather operation is simply the reverse of scatter. Each node starts with an m word message. In the
first step, every odd numbered node sends its buffer to an even numbered neighbor behind it, which
concatenates the received message with its own buffer. Only the even numbered nodes participate in
the next communication step which results in nodes with multiples of four labels gathering more data
and doubling the sizes of their data. The process continues similarly, until node 0 has gathered the
entire data.
OR
Q 6.a ) Compare the one-to-all broadcast operation on Ring, Mesh and Hypercube topologies . [6]
Ans :
B. E. (Computer Engineering)
HIGH PERFORMANCE COMPUTING (2015 Pattern) (Semester - I) (410241)
Time : 2½ Hours] [Max. Marks : 70
Instructions to the candidates:
1) Answer Q.1 or Q.2, Q.3 or Q.4, Q.5 or Q.6, Q.7 or Q.8.
2) Neat diagrams must be drawn wherever necessary.
3) Figures to the right indicate full marks.
4) Assume suitable data if necessary.
Q1) a) State and explain basic working principle of Super Scalar Processors. [6]
b) Explain basic working of VLIW Processor. [6]
c) Elaborate four subclasses of the Parallel Random Access Machine (PRAM). [8]
Ans:
Parallel Random Access Machines (PRAM) is a model, which is considered for most
of the parallel algorithms. Here, multiple processors are attached to a single block of
memory. A PRAM model contains −
A set of similar type of processors.
All the processors share a common memory unit. Processors can communicate
among themselves through the shared memory only.
A memory access unit (MAU) connects the processors with the single shared
memory.
Here, n number of processors can perform independent operations on n number of
data items in a particular unit of time. This may result in simultaneous access of the
same memory location by different processors.
To solve this problem, the following constraints have been enforced on PRAM
model −
Exclusive Read Exclusive Write (EREW) − Here no two processors are
allowed to read from or write to the same memory location at the same time.
Exclusive Read Concurrent Write (ERCW) − Here no two processors are
allowed to read from the same memory location at the same time, but are allowed
to write to the same memory location at the same time.
Concurrent Read Exclusive Write (CREW) − Here all the processors are
allowed to read from the same memory location at the same time, but are not
allowed to write to the same memory location at the same time.
Concurrent Read Concurrent Write (CRCW) − All the processors are
allowed to read from or write to the same memory location at the same time.
There are many methods to implement the PRAM model, but the most prominent
ones are −
Shared memory model
Message passing model
Data parallel model
OR
Q2) a) Differentiate Static and Dynamic mapping techniques for load balancing.[6]
Answer: Once a computation has been decomposed into tasks, these tasks are mapped
onto processes with the objective that all tasks complete in the shortest amount of
elapsed time. In order to achieve a small execution time, the overheads of executing
the tasks in parallel must be minimized. A good mapping of tasks onto processes must
strive to achieve the twin objectives of (1) reducing the amount of time processes
spend in interacting with each other, and (2) reducing the total amount of time some
processes are idle while the others are engaged in performing some tasks. Mapping
techniques used in parallel algorithms can be broadly classified into two categories:
static and dynamic. The parallel programming paradigm and the characteristics of
tasks and the interactions among them determine whether a static or a dynamic
mapping is more suitable.
• Static Mapping: Static mapping techniques distribute the tasks among processes
prior to the execution of the algorithm. For statically generated tasks, either static or
dynamic mapping can be used. The choice of a good mapping in this case depends on
several factors, including the knowledge of task sizes, the size of data associated with
tasks, the characteristics of inter-task interactions, and even the parallel programming
paradigm. Even when task sizes are known, in general, the problem of obtaining an
optimal mapping is an NP-complete problem for nonuniform tasks. However, for
many practical cases, relatively inexpensive heuristics provide fairly acceptable
approximate solutions to the optimal static mapping problem. Algorithms that make
use of static mapping are in general easier to design and program.
• Dynamic Mapping: Dynamic mapping techniques distribute the work among
processes during the execution of the algorithm. If tasks are generated dynamically,
then they must be mapped dynamically too. If task sizes are unknown, then a static
mapping can potentially lead to serious load-imbalances and dynamic mappings are
usually more effective. If the amount of data associated with tasks is large relative to
the computation, then a dynamic mapping may entail moving this data among
processes. The cost of this data movement may outweigh some other advantages of
dynamic mapping and may render a static mapping more suitable. However, in a
shared-address-space paradigm, dynamic mapping may work well even with large
data associated with tasks if the interaction is read-only. The reader should be aware
that the shared-address-space programming paradigm does not automatically provide
immunity against data-movement costs.
Speedup
When evaluating a parallel system, we are often interested in knowing how much
performance gain is achieved by parallelizing a given application over a sequential
implementation. Speedup is a measure that captures the relative benefit of solving a
problem in parallel. It is defined as the ratio of the time taken to solve a problem on a
single processing element to the time required to solve the same problem on a parallel
computer with p identical processing elements. We denote speedup by the symbol
S.
Example :Adding n numbers using n processing elements
Consider the problem of adding n numbers by using n processing elements.
Initially, each processing element is assigned one of the numbers to be added and, at
the end of the computation, one of the processing elements stores the sum of all the
numbers. Assuming that n is a power of two, we can perform this operation in log
n steps by propagating partial sums up a logical binary tree of processing elements.
Following figure illustrates the procedure for n = 16. The processing elements are
labeled from 0 to 15. Similarly, the 16 numbers to be added are labeled from 0 to 15.
The sum of the numbers with consecutive labels from i to j is denoted by Σi..j.
Figure. Computing the global sum of 16 partial sums using 16 processing elements. Σi..j denotes
the sum of numbers with consecutive labels from i to j.
OR
Q4) a) Explain Parallel Matrix-Matrix Multiplication algorithm with an example.[8]
Ans: We start by examining algorithms for various distributions of A, B,
and C. We first consider a one-dimensional, columnwise decomposition in which
each task encapsulates corresponding columns from A, B, and C. One parallel
algorithm makes each task responsible for all computation associated with its
columns. In pseudocode, for each column j that a task owns:
set C[i][j] = 0 for all i
for k = 0 to N-1
    in each row i
        accumulate A[i][k] * B[k][j] into C[i][j]
endfor
b) Interpret the effect of Granularity on Performance of parallel execution. [8]
Ans: An earlier example illustrated an instance of an algorithm that is not cost-optimal. The
algorithm discussed in that example uses as many processing elements as the number of inputs,
which is excessive. In practice, we
assign larger pieces of input data to processing elements. This corresponds to
increasing the granularity of computation on the processing elements. Using fewer
than the maximum possible number of processing elements to execute a parallel
algorithm is called scaling down a parallel system in terms of the number of
processing elements. A naive way to scale down a parallel system is to design a
parallel algorithm for one input element per processing element, and then use fewer
processing elements to simulate a large number of processing elements. If there are n
inputs and only p processing elements (p < n), we can use the parallel algorithm
designed for n processing elements by assuming n virtual processing elements
and having each of the p physical processing elements simulate n/p virtual
processing elements.As the number of processing elements decreases by a factor of
n/p, the computation at each processing element increases by a factor of n/p
because each processing element now performs the work of n/p processing
elements. If virtual processing elements are mapped appropriately onto physical
processing elements, the overall communication time does not grow by more than a
factor of n/p. The total parallel runtime increases, at most, by a factor of n/p, and
the processor-time product does not increase. Therefore, if a parallel system with n
processing elements is cost-optimal, using p processing elements (where p <
n)to simulate n processing elements preserves cost-optimality.
A drawback of this naive method of increasing computational granularity is that if a
parallel system is not cost-optimal to begin with, it may still not be cost-optimal after
the granularity of computation increases. This is illustrated by the following example
for the problem of adding n numbers.
Example 5.9 Adding n numbers on p processing elements
Consider the problem of adding n numbers on p processing elements such that p
< n and both n and p are powers of 2. We use the same algorithm as in the earlier
example and simulate n processing elements on p processing elements. The steps
leading to the solution are shown in the figure for n = 16 and p = 4. Virtual
processing element i is simulated by the physical processing element labeled i
mod p; the numbers to be added are distributed similarly. The first log p of the log
n steps of the original algorithm are simulated in (n/p) log p steps on p
processing elements. In the remaining steps, no communication is required because
the processing elements that communicate in the original algorithm are simulated by
the same processing element; hence, the remaining numbers are added locally. The
algorithm takes Θ((n/p) log p) time in the steps that require communication, after
which a single processing element is left with n/p numbers to add, taking time
Θ(n/p). Thus, the overall parallel execution time of this parallel system is Θ((n/p) log
p). Consequently, its cost is Θ(n log p), which is asymptotically higher than the
Θ(n) cost of adding n numbers sequentially. Therefore, the parallel system is not cost-
optimal.
Q5) a)Compare an algorithm for sequential and parallel Merge sort. Analyze the
complexity for the same. [8]
Ans:
Sorting is a common and important problem in computing. Given a sequence
of N data elements, we are required to generate an ordered sequence that
contains the same elements. Here, we present a parallel version of the well-known
mergesort algorithm. The algorithm assumes that the sequence to be sorted is
distributed and so generates a distributed sorted sequence. For simplicity, we assume
that N is an integer multiple of P, that the N data are distributed evenly among
P tasks, and that P is an integer power of two. Relaxing these assumptions
does not change the essential character of the algorithm but would complicate the
presentation.
The two partition phases each split the input sequence; the two merge phases each
combine two sorted subsequences generated in a previous phase.
The sequential mergesort algorithm is as follows; its execution is illustrated in the figure.
1. If the input sequence has fewer than two elements, return.
2. Partition the input sequence into two halves.
3. Sort the two subsequences using the same algorithm.
4. Merge the two sorted subsequences to form the output sequence.
The merge operation employed in step (4) combines two sorted subsequences to
produce a single sorted sequence. It repeatedly compares the heads of the two
subsequences and outputs the lesser value until no elements remain. Mergesort
requires O(N log N) time to sort N elements, which is the best that can be
achieved (modulo constant factors) unless data are known to have special properties
such as a known distribution or degeneracy.
We first describe two algorithms required in the implementation of parallel
mergesort: compare-exchange and parallel merge.
Compare-Exchange.
A compare-exchange operation merges two sorted sequences of length M ,
contained in tasks A and B . Upon completion of the operation, both tasks have
M data, and all elements in task A are less than or equal to all elements in task B
. As illustrated in Figure, each task sends its data to the other task. Task A identifies
the M lowest elements and discards the remainder; this process requires at least
M/2 and at most M comparisons. Similarly, task B identifies the M highest
elements.
Figure : The compare-exchange algorithm, with M=4 . (a) Tasks A and B
exchange their sorted subsequences. (b) They perform a merge operation to identify
the lowest and highest M elements, respectively. (c) Other elements are discarded,
leaving a single sorted sequence partitioned over the two tasks.
Notice that a task may not need all M of its neighbor's data in order to identify the
M lowest (or highest) values. On average, only M/2 values are required. Hence, it
may be more efficient in some situations to require the consumer to request data
explicitly. This approach results in more messages that contain a total of less than M
data, and can at most halve the amount of data transferred.
Figure : The parallel merge operation, performed in hypercubes of dimension one,
two, and three. In a hypercube of dimension d , each task performs d compare-
exchange operations. Arrows point from the ``high'' to the ``low'' task in each
exchange.
Parallel Merge.
A parallel merge algorithm performs a merge operation on two sorted sequences, each
distributed over a set of tasks, to produce a single sorted sequence distributed over the
combined set of tasks. As illustrated in the figure, this is achieved by using the hypercube
communication template. Each of the tasks engages in d+1 compare-exchange steps, one
with each neighbor. In effect, each node executes the hypercube communication algorithm,
applying the following operator at each step:
if (myid AND i) != 0 then
state = compare_exchange_high(state,message)
else
state = compare_exchange_low(state,message)
endif
In this code fragment, AND is a bitwise logical-and operator, used to determine
whether the task is ``high'' or ``low'' in a particular exchange; myid is the task's label
and i is the mask for the hypercube dimension being exchanged.
Mergesort.
We next describe the parallel mergesort algorithm proper. Each task in the
computation executes the following logic.
procedure parallel_mergesort(myid, d, data, newdata)
begin
data = sequential_mergesort(data)
for dim = 1 to d
data = parallel_merge(myid, dim, data)
endfor
newdata = data
end
First, each task sorts its local sequence using sequential mergesort. Second, and again
using the hypercube communication structure, each task executes
the parallel merge algorithm d times, for subcubes of dimension 1..d. The i-th
parallel merge takes two sorted sequences, each distributed over a set of tasks, and generates
a sorted sequence distributed over twice as many tasks. After d such merges, we have a single
sorted list distributed over all tasks.
b) Modify Depth First Search for parallel execution and analyze its complexity. [8]
Ans:
Two characteristics of parallel DFS are critical to determining its performance. First is
the method for splitting work at a processor, and the second is the scheme to
determine the donor processor when a processor becomes idle.
Work-Splitting Strategies
When work is transferred, the donor's stack is split into two stacks, one of which is
sent to the recipient. In other words, some of the nodes (that is, alternatives) are
removed from the donor's stack and added to the recipient's stack. If too little work is
sent, the recipient quickly becomes idle; if too much, the donor becomes idle. Ideally,
the stack is split into two equal pieces such that the size of the search space
represented by each stack is the same. Such a split is called a half-split. It is difficult
to get a good estimate of the size of the tree rooted at an unexpanded alternative in the
stack. However, the alternatives near the bottom of the stack (that is, close to the
initial node) tend to have bigger trees rooted at them, and alternatives near the top of
the stack tend to have small trees rooted at them. To avoid sending very small
amounts of work, nodes beyond a specified stack depth are not given away. This
depth is called the cutoff depth.
Some possible strategies for splitting the search space are (1) send nodes near the
bottom of the stack, (2) send nodes near the cutoff depth, and (3) send half the nodes
between the bottom of the stack and the cutoff depth. The suitability of a splitting
strategy depends on the nature of the search space. If the search space is uniform, both
strategies 1 and 3 work well. If the search space is highly irregular, strategy 3 usually
works well. If a strong heuristic is available (to order successors so that goal nodes
move to the left of the state-space tree), strategy 2 is likely to perform better, since it
tries to distribute those parts of the search space likely to contain a solution. The cost
of splitting also becomes important if the stacks are deep. For such stacks, strategy 1
has lower cost than strategies 2 and 3.
The figure shows the partitioning of the DFS tree of the earlier figure into two subtrees using
strategy 3. Note that the states beyond the cutoff depth are not partitioned. The figure also
shows the representation of the stack corresponding to the two subtrees. The stack
representation used in the figure stores only the unexplored alternatives.
Figure 11.9. Splitting the DFS tree into two subtrees. The two subtrees along with their stack
representations are shown in (a) and (b).
Load-Balancing Schemes
This section discusses three dynamic load-balancing schemes: asynchronous round
robin, global round robin, and random polling. Each of these schemes can be coded
for message passing as well as shared address space machines.
Asynchronous Round Robin In asynchronous round robin (ARR), each processor
maintains an independent variable, target. Whenever a processor runs out of work, it
uses target as the label of a donor processor and attempts to get work from it. The
value of target is incremented (modulo p) each time a work request is sent. The
initial value of target at each processor is set to ((label + 1) modulo p) where
label is the local processor label. Note that work requests are generated
independently by each processor. However, it is possible for two or more processors
to request work from the same donor at nearly the same time.
Global Round Robin Global round robin (GRR) uses a single global variable called
target. This variable can be stored in a globally accessible space in shared address
space machines or at a designated processor in message passing machines. Whenever
a processor needs work, it requests and receives the value of target, either by
locking, reading, and unlocking on shared-address-space machines or by sending a
message to the designated processor (say P0). The value of target is
incremented (modulo p) before responding to the next request. The recipient
processor then attempts to get work from a donor processor whose label is the value
of target. GRR ensures that successive work requests are distributed evenly over all
processors. A drawback of this scheme is the contention for access to target.
Random Polling Random polling (RP) is the simplest load-balancing scheme. When
a processor becomes idle, it randomly selects a donor. Each processor is selected as a
donor with equal probability, ensuring that work requests are evenly distributed.
OR
Q6) a) Discuss the issues in sorting for parallel computers. [8]
Ans: Parallelizing a sequential sorting algorithm involves distributing the elements to
be sorted onto the available processes. This process raises a number of issues that we
must address in order to make the presentation of parallel sorting algorithms clearer.
Where the Input and Output Sequences are Stored
In sequential sorting algorithms, the input and the sorted sequences are stored in the
process's memory. However, in parallel sorting there are two places where these
sequences can reside. They may be stored on only one of the processes, or they may
be distributed among the processes. The latter approach is particularly useful if
sorting is an intermediate step in another algorithm. In this chapter, we assume that
the input and sorted sequences are distributed among the processes.
Consider the precise distribution of the sorted output sequence among the processes.
A general method of distribution is to enumerate the processes and use this
enumeration to specify a global ordering for the sorted sequence. In other words, the
sequence will be sorted with respect to this process enumeration. For instance, if Pi
comes before Pj in the enumeration, all the elements stored in Pi will be smaller
than those stored in Pj . We can enumerate the processes in many ways. For certain
parallel algorithms and interconnection networks, some enumerations lead to more
efficient parallel formulations than others.
How Comparisons are Performed
A sequential sorting algorithm can easily perform a compare-exchange on two
elements because they are stored locally in the process's memory. In parallel sorting
algorithms, this step is not so easy. If the elements reside on the same process, the
comparison can be done easily. But if the elements reside on different processes, the
situation becomes more complicated.
One Element Per Process
Consider the case in which each process holds only one element of the sequence to be
sorted. At some point in the execution of the algorithm, a pair of processes (Pi, Pj)
may need to compare their elements, ai and aj. After the comparison, Pi will
hold the smaller and Pj the larger of {ai, aj}. We can perform the comparison by
having both processes send their elements to each other. Each process compares the
received element with its own and retains the appropriate element. In our example,
Pi will keep the smaller and Pj will keep the larger of {ai, aj}. As in the sequential
case, we refer to this operation as compare-exchange. As the figure illustrates, each
compare-exchange operation requires one comparison step and one communication
step.
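The compare-exchange step can be sketched in plain Python (a sequential stand-in for the message exchange between Pi and Pj):

```python
def compare_exchange(a_i, a_j):
    """Sketch of compare-exchange between processes Pi and Pj: each
    process 'sends' its element to the other, then Pi retains the
    smaller and Pj the larger. Returns (new_ai, new_aj)."""
    # After the exchange both processes hold {a_i, a_j};
    # each keeps the appropriate element.
    return min(a_i, a_j), max(a_i, a_j)
```

For example, `compare_exchange(7, 3)` leaves Pi holding 3 and Pj holding 7.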
Pick the vertex with the minimum distance value that is not already included in SPT
(not in sptSet). Vertex 1 is picked and added to sptSet. So sptSet now becomes {0, 1}.
Update the distance values of the adjacent vertices of 1. The distance value of vertex 2
becomes 12.
Pick the vertex with the minimum distance value that is not already included in SPT
(not in sptSet). Vertex 7 is picked. So sptSet now becomes {0, 1, 7}. Update the distance
values of the adjacent vertices of 7. The distance values of vertices 6 and 8 become finite
(15 and 9 respectively).
Pick the vertex with the minimum distance value that is not already included in SPT
(not in sptSet). Vertex 6 is picked. So sptSet now becomes {0, 1, 7, 6}. Update the distance
values of the adjacent vertices of 6. The distance values of vertices 5 and 8 are updated.
We repeat the above steps until sptSet includes all vertices of the given graph.
Finally, we get the following Shortest Path Tree (SPT).
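The pick-minimum/update loop described above can be sketched in Python (a minimal sequential version; the adjacency-map representation is an assumption for illustration):

```python
def dijkstra(graph, src):
    """graph: dict {u: {v: weight}} with every vertex present as a key.
    Repeatedly pick the unvisited vertex with the minimum distance
    (adding it to sptSet), then relax its adjacent vertices."""
    dist = {v: float('inf') for v in graph}
    dist[src] = 0
    spt_set = set()
    while len(spt_set) < len(graph):
        # pick the vertex with minimum distance not yet in sptSet
        u = min((v for v in graph if v not in spt_set), key=dist.get)
        spt_set.add(u)
        for v, w in graph[u].items():   # update adjacent vertices of u
            if v not in spt_set and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist
```

On a small graph this yields the same distances the step-by-step trace produces.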
OR
The numbers of SPs/cores in an SM and the number of SMs depend on your device:
see the Finding your Device Specifications section below for details. It is important
to realize, however, that regardless of GPU model, there are many more CUDA cores
in a GPU than in a typical multicore CPU: hundreds or thousands more. For example,
the Kepler Streaming Multiprocessor design, dubbed SMX, contains 192 single-
precision CUDA cores, 64 double-precision units, 32 special function units, and 32
load/store units. (See the Kepler Architecture Whitepaper for a description and
diagram.)
CUDA cores are grouped together to perform instructions in what NVIDIA has
termed a warp of threads. A warp is simply a group of threads that are scheduled
together to execute the same instructions in lockstep. All CUDA cards to date use a
warp size of 32. Each SM has at least one warp scheduler, which is responsible for
executing 32 threads. Depending on the model of GPU, the cores may be double or
quadruple pumped so that they execute one instruction on two or four threads in as
many clock cycles. For instance, Tesla devices use a group of 8 quad-pumped cores to
execute a single warp. If fewer than 32 threads are scheduled in the warp, it will
still take as long to execute the instructions.
The CUDA programmer is responsible for ensuring that the threads are being
assigned efficiently for code that is designed to run on the GPU. The assignment of
threads is done virtually in the code using what is sometimes referred to as a ‘tiling’
scheme of blocks of threads that form a grid. Programmers define a kernel function
that will be executed on the CUDA card using a particular tiling scheme.
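The tiling scheme boils down to index arithmetic: each thread derives which data element it owns from its block and thread coordinates. A one-dimensional sketch in plain Python (illustrative only; in real CUDA C this uses the built-in blockIdx, blockDim, and threadIdx variables):

```python
def global_thread_index(block_idx, block_dim, thread_idx):
    """1-D version of the standard CUDA index computation:
    blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx

# e.g. thread 5 of block 2, with 32 threads per block,
# handles element 2 * 32 + 5 = 69
```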
Virtual Architecture
When programming in CUDA C we work with blocks of threads and grids of blocks.
What is the relationship between this virtual architecture and the CUDA card’s
physical architecture?
When kernels are launched, each block in a grid is assigned to a Streaming
Multiprocessor. This allows threads in a block to use __shared__ memory. If a
block doesn’t use the full resources of the SM then multiple blocks may be assigned
at once. If all of the SMs are busy then the extra blocks will have to wait until an SM
becomes free.
Once a block is assigned to an SM, its threads are split into warps by the warp
scheduler and executed on the CUDA cores. Since the same instructions are executed
on each thread in the warp simultaneously, it's generally a bad idea to have
conditionals in kernel code. This type of code is sometimes called divergent: when
some threads in a warp are unable to execute the same instruction as other threads in
the warp, those threads are diverged and do no work.
Because a warp's context (its registers, program counter, etc.) stays on chip for the
life of the warp, there is no additional cost to switching between warps versus
executing the next step of a given warp. This allows the GPU to hide some of its
memory latency by switching to a new warp while it waits for a costly read.
CUDA Memory
CUDA on-chip memory is divided into several different regions.
Registers act the same way that registers on CPUs do; each thread has its own set of
registers.
Local Memory holds local variables used by each thread. They are not accessible
by other threads even though they use the same L1 and L2 cache as global memory.
Shared Memory is accessible by all threads in a block. It must be declared using the
__shared__ modifier. It has a higher bandwidth and lower latency than global
memory. However, if multiple threads request the same address, the requests are
processed serially, which slows down the application.
Constant Memory is read-accessible by all threads and must be declared with the
__constant__ modifier. In newer devices there is a separate read-only constant cache.
Global Memory is accessible by all threads. It is the slowest device memory, but on
newer cards it is cached. Memory is pulled in 32-, 64-, or 128-byte memory
transactions. Warps executing global memory accesses attempt to pull all the data
from global memory simultaneously, so it is advantageous to use block sizes that are
multiples of 32. If multidimensional arrays are used, it is also advantageous to have
the bounds padded so that they are multiples of 32.
Texture/Surface Memory is read-accessible by all threads, but unlike Constant
Memory, it is optimized for 2D spatial locality, and cache hits pull in surrounding
values in both x and y directions.
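Padding an array dimension up to the next multiple of 32 is simple integer arithmetic; a small sketch (in plain Python for illustration, not CUDA C):

```python
def pad_to_multiple(n, m=32):
    """Round an array dimension up to the next multiple of m,
    so each row starts on a transaction-aligned boundary."""
    return ((n + m - 1) // m) * m

# e.g. a 100-wide row would be padded to 128 elements
```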
b) Write advantages and limitations of CUDA. [5]
Ans:
Advantages of CUDA:
Huge increase in processing power over conventional CPU processing.
Early reports suggest speed increases of 10x to 200x over CPU processing speed.
Researchers can use several GPUs to perform the same number of operations as
many servers in less time, thus saving money, time, and space.
The C language is widely used, so it is easy for developers to learn how to program
for CUDA.
All graphics cards in the G80 series and beyond support CUDA.
Harnesses the power of the GPU through parallel processing, running thousands
of simultaneous reads instead of single, dual, or quad reads on the CPU.
Disadvantages of CUDA:
Limited user base: only NVIDIA G80 and onward video cards can use CUDA,
thus excluding all ATI users.
Speeds may be bottlenecked at the bus between CPU and GPU.
Developers are still sceptical as to whether CUDA will catch on.
Mainly developed for researchers; not many uses for average users.
The system is still in development.
BE (Computer) Semester -VII
Part A:Scheme, Course Outcomes, Syllabus, and Evaluation guidelines of
Artificial Intelligence and Robotics (410242)
Course Outcomes:
On completion of the course, student will be able to–
1. CO1: Identify and apply suitable intelligent agents for various AI applications
2. CO2: Design smart system using different informed search / uninformed search or
heuristic approaches.
3. CO3: Identify knowledge associated and represent it by ontological engineering to
plan a strategy to solve given problem.
4. CO4: Apply the suitable algorithms to solve AI problems.
5. CO5: Identify and use suitable sensors to solve Robotics problems.
Reference Books:
1. Nilsson Nils J, “Artificial Intelligence: A New Synthesis”, Morgan Kaufmann Publishers Inc.,
San Francisco, CA, ISBN: 978-1-55-860467-4
2. Patrick Henry Winston, “Artificial Intelligence”, Addison-Wesley Publishing Company, ISBN:
0-201-53377-4
3. Andries P. Engelbrecht-Computational Intelligence: An Introduction, 2nd Edition-Wiley India-
ISBN: 978-0-470-51250-0
Teaching Plan
Sub: Artificial Intelligence and Robotics (410242)
Evaluation Guidelines
Internal Assessment (IA):
1. Two class tests & one prelim must be conducted; the average marks will be considered.
2. Three assignments, one on each pair of units across the entire syllabus, are to be conducted,
and the average of the three is to be considered.
3. Attendance marks as per Institute rule to be considered.
External Evaluation
Insem Examination:
1. The Insem Examination will be conducted mid-semester on the first 03 units and carries 30
marks.
2. The question paper consists of 06 questions of 10 marks each: solve Question No. 1 OR
Question No. 2 on Unit No. 1, Question No. 3 OR Question No. 4 on Unit No. 2, and Question
No. 5 OR Question No. 6 on Unit No. 3.
Pre-requisite:
Course Delivery:
The course will be delivered through lectures, class room interaction, and presentations.
Course Objectives:
Course Outcomes:
On completion of the course, student will be able to–
1. Identify [L1: Knowledge] and apply suitable Intelligent agents for various AI applications.
2.Design [L2: Analysis] smart system using different informed search / uninformed search or
heuristic approaches.
3.Identify [L1: Knowledge] knowledge associated and represent it by ontological engineering to
plan a strategy to solve given problem.
4. Apply [L3: Application] the suitable algorithms to solve AI problems.
PSO 2 : Graduate of programme should be able to use proficient engineering praxis &
strategies for the build out, maintenance and testing of software solutions.
PSO 3 : Graduate of programme should be able to provide conclusive and cost effective real
time solutions using savoir faire in IT domain.
Mapping of Course Outcomes (COs) with Program Outcome (POs) and Program
Specific Outcome (PSOs)
1: Slight (Low) 2: Moderate (Medium) 3: Substantial (High)
If there is no correlation, put “-“
CO/PO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1 1 - - 1 - - - - - - - - 2 1 1
CO2 2 1 2 1 - - - - - - - - 1 2
CO3 1 1 - 2 - - - - - - - - 1 2 1
CO4 - 1 1 - - - - - - - - - 2 1 2
AIR-
course 1 1 1 1 - - - - - - - - 1 1 1
average
Justification of CO-PO Mapping:
CO1 WITH PO1 According to CO1 students get basic knowledge of AI and its
applications. So it is slightly correlated to PO1.
CO2 WITH PO2 According to CO2 students get basic knowledge of smart system
using different informed search / uninformed search or heuristic
approaches. So it is moderately correlated to PO2.
CO4 WITH PO4 According to CO4 students get knowledge of the algorithms to
solve AI problems. So it is slightly correlated to PO4.
Questions for CIE and SEE will be designed to evaluate the various educational components
(Blooms taxonomy) such as:
Course Exit Feedback Analysis: The exit survey was carried out online using the “Survey
Monkey” software package. The printouts of survey details are attached.
*Goal: Assume 30% of the students score more than 60%, 60% of the students score between
40 & 60% and 10% of the students score less than 40% of marks. Thus,
**Average Attainment (%) = (0.8 Direct Method Average %) + (0.2 Indirect Method
Average %)
(Note: For the calculation of CO attainment level, 80% weightage is given to the direct
assessment method and 20% weightage is given to the indirect assessment method.)
Remarks:
PO Attainment of AIR :
Program Outcomes
Course Outcomes PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1 1
CO2 2
CO3 2
CO4 2
PO attainment 1.75
Course:
B. E. Computer Engineering
“ARTIFICIAL INTELLIGENCE AND ROBOTICS -[410242] ”
Final Year, Semester VII
Prepared by the Course Coordinator: Mr. Devidas S. Thosar
Unit wise Question Bank
Unit – 1: Introduction
Assignments No. 2
(On Unit –3 & 4)
Q.1. Explain in brief the building blocks of conceptual dependency.
Q.2. Explain backward chaining with a suitable example.
Q.3. How does Deductive Retrieval work? Explain with a suitable example.
Q.4. What is meant by Information Retrieval? Explain the process of IR.
Q.5. Explain the role of Big Data in Information Retrieval.
Q.6. Explain the stages in Natural Language Processing.
Assignments No. 3
(On Unit –5 & 6)
Q.1. Explain Mobile Robot Hardware and Non-Visual Sensors.
Q.2. Short Note on:
1. Contact Sensors
2. Inertial Sensors
Q.3. Explain Sonar, Radar, Laser Range Finders, and Biological Sensing.
Q.4. Explain Sensorial Maps, Topological Maps, and Geometric Maps.
Q.5. Explain Robot Pose Maintenance and Localization.
Q.6. Explain the concept of Mining Automation in Robotics in detail.
University Question Papers
Subject
Data Analytics (410243)
B. E. (Odd Semester), Session 2019-2020
Scheme, Syllabus and Evaluation Guidelines of “Data Analytics
(410243)”
SEMESTER – I
Teaching Scheme
Course Code Course Name
Lecture Tutorial Practical
Data Analytics
410243 03 - -
Examination scheme
Theory
Practical
Internal External
Course
Class test 1
Class test 2
Prelim
Test average
Attendance
Teacher Assessment
InSem
EndSem
Total
Internal
External
Total
Course
Code Name
Course Contents
UNIT – I INTRODUCTION AND LIFE CYCLE 08 Hours
Introduction: Big data overview, state of the practice in Analytics- BI Vs Data Science, Current
Analytical Architecture, drivers of Big Data, Emerging Big Data Ecosystem and new approach.
Data Analytic Life Cycle: Overview, phase 1- Discovery, Phase 2- Data preparation, Phase 3-
Model Planning, Phase 4- Model Building, Phase 5- Communicate
Results, Phase 6- Operationalize. Case Study: GINA
Text Books:
1. David Dietrich, Barry Hiller, “Data Science and Big Data Analytics”, EMC Education
Services, Wiley Publications, 2012, ISBN: 0-07-120413-X
2. Ashutosh Nandeshwar , “Tableau Data Visualization Codebook”, Packt Publishing, ISBN
978-1-84968-978-6
Reference Books:
1. Maheshwari Anil, Rakshit Acharya, “Data Analytics”, McGraw Hill, ISBN:
789353160258.
2. Mark Gardner, “Beginning R: The Statistical Programming Language”, Wrox
Publication,ISBN: 978-1-118-16430-3
3. Luís Torgo, “Data Mining with R, Learning with Case Studies”, CRC Press, Taylor and
Francis Group, ISBN: 9781482234893
4. Carlo Vercellis, “Business Intelligence - Data Mining and Optimization for Decision
Making”, Wiley Publications, ISBN: 9780470753866.
Evaluation Guidelines:
Internal Assessment (IA) : [CT (20Marks)+TA/AT(10 Marks)]
Class Test (CT) [20 marks]: Three class tests of 20 marks each will be conducted in a semester,
and the average of the best two will be used to calculate the class test marks. The question paper
format is the same as the university's.
Teacher Assessment (TA) [5 marks]: Three/four assignments will be conducted in the semester.
Teacher assessment will be calculated on the basis of performance in assignments, class tests,
and the pre-university test.
Attendance (AT) [5 marks]: Attendance marks will be given as per university policy.
Paper pattern and marks distribution for Class tests:
1. The question paper will have 5 questions. Question 1 is an objective question containing 5
sub-questions, each carrying 1 mark.
2. Attempt any 3 questions from the remaining 4; each carries 5 marks.
In semester Exam :
30 Marks in semester exam : As per university guidelines.
Pre-University Test [ 70 Marks]
Paper pattern and marks distribution for PUT: Same as End semester exam
End Semester Examination [ 70 Marks]:
Paper pattern and marks distribution for End Semester Exam: As per university guidelines.
Lecture Plan
Data Analytics (410243)
10 Difference of means
11 Wilcoxon rank–sum test
12 Power and sample size
13 ANOVA
14 Advanced Analytical Theory and Methods:
15 K-means: use cases, overview of methods
16 Determining number of clusters, diagnostics, reasons to choose and cautions.
Assignment-I
Unit – III Association Rules and Regression(8 Hours)
17 Advanced Analytical Theory and Methods:
Association Rules-
18 Overview a-priori algorithm
19 evaluation of candidate rules
20 case study-transactions in grocery store
21 validation and testing,
22 Regression- linear, logistics, reasons to choose and cautions
23 Regression- linear, logistics, reasons to choose and cautions
24 Additional regression models.
UNIT – IV Classification (8 Hours)
25 Decision trees- Overview
26 general algorithm, decision tree algorithm,
27 evaluating a decision tree
28 Naïve Bayes – Bayes' Algorithm
29 Naïve Bayes' Classifier
30 Smoothing, diagnostics. Diagnostics of classifiers
31 Additional classification methods.
Revision
32
Assignment-II
UNIT V Big Data Visualization (8 Hours)
33 Introduction to Data Visualization
34 Challenges to Big data visualization,
35 Conventional data visualization tools
36 Techniques for visual data representations
37 Types of data visualization
38 Visualizing Big Data
39 Tools used in data visualization
40 Analytical techniques used in Big data visualization.
UNIT – VI Advanced Analytics-Technology and Tools (8 Hours)
41 Analytics for unstructured data- Use cases
42 Map Reduce, Apache Hadoop
43 The Hadoop Ecosystem- Pig
44 HIVE, HBase,
45 Mahout, NoSQL
46 An Analytics Project-Communicating
47 Operational zing, creating final deliverables.
Revision
48
Assignment-III
Course Delivery, Objectives, Outcomes
DATA ANALYTICS (410243)
Semester-VII
Course Objectives :
Course Outcomes :
CO1-Write case studies in Business Analytic and Intelligence using mathematical models
CO2- Present a survey on applications for Business Analytic and Intelligence
CO3-Provide problem solutions for multi-core or distributed, concurrent/Parallel
environments
CO-PO Mapping
Course
Outcomes PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1 1 1
CO2 2 2
CO3 3 3
Justification of CO - PO Mapping
CO1 WITH PO1: According to CO1, students get basic knowledge of Business
Analytics, so it is moderately related with PO1.
CO2 WITH PO11: Students will get knowledge of real-time projects used in BI, so it is
moderately correlated to PO11.
(structured/ unstructured)
Answer:
Application of K-Means includes Image processing, Medical, Customer Segmentation etc.
Image Processing:
Video is one example of the growing volumes of unstructured data being collected. Within each frame
of a video, k-means analysis can be used to identify objects in the video. For each frame, the task is to
determine which pixels are most similar to each other. The attributes of each pixel can include
brightness, color, and location: the x and y coordinates in the frame. With security video images, for
example, successive frames are examined to identify any changes to the clusters. These newly
identified clusters may indicate unauthorized access to a facility.
Medical
Patient attributes such as age, height, weight, systolic and diastolic blood pressures, cholesterol level,
and other attributes can identify naturally occurring clusters. These clusters could be used to target
individuals for specific preventive measures or clinical trial participation. Clustering, in general, is
useful in biology for the classification of plants and animals as well as in the field of human genetics.
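The clustering step described in these applications can be sketched as a minimal k-means loop in Python (illustrative only; the point tuples stand in for pixel or patient attribute vectors):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means sketch: assign each point to its nearest
    centroid, then recompute each centroid as its cluster's mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # pick k initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # squared Euclidean distance to each centroid
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old centroid if its cluster went empty
                centroids[i] = tuple(sum(d) / len(cl) for d in zip(*cl))
    return centroids
```

Running this on points drawn from two well-separated groups returns one centroid near each group's mean.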
Assignment 02
Q. No Questions Max. Unit no. as CO Blooms
Marks per syllabus Mapped Taxonomy
Level
1 Explain different approaches to improve 4 3 CO 1 1
Apriori's efficiency
Ans: Some approaches to improve Apriori's efficiency:
• Partitioning: Any itemset that is potentially frequent in a transaction database must be frequent in at
least one of the partitions of the transaction database.
• Sampling: This extracts a subset of the data with a lower support threshold and uses the subset to
perform association rule mining.
• Transaction reduction: A transaction that does not contain frequent k-itemsets is useless in subsequent
scans and therefore can be ignored.
• Hash-based itemset counting: If the corresponding hashing bucket count of a k-itemset is below a
certain threshold, the k-itemset cannot be frequent.
• Dynamic itemset counting: Only add new candidate itemsets when all of their subsets are estimated to
be frequent.
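The subset-based pruning that underlies several of these ideas can be sketched in Python: a k-itemset is kept as a candidate only if every (k-1)-subset is already known to be frequent (function names are illustrative):

```python
from itertools import combinations

def prune_candidates(candidates, frequent_prev):
    """Apriori pruning: a k-itemset can be frequent only if every one
    of its (k-1)-subsets is frequent; drop candidates that fail this."""
    frequent_prev = set(map(frozenset, frequent_prev))
    kept = []
    for c in candidates:
        c = frozenset(c)
        if all(frozenset(s) in frequent_prev
               for s in combinations(c, len(c) - 1)):
            kept.append(c)
    return kept
```

For instance, if {a, d} is not a frequent 2-itemset, the candidate {a, b, d} is pruned without ever scanning the transaction database.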
Ans:
1: ID3
The ID3 algorithm begins with the original set as the root node. On each iteration, the
algorithm iterates through every unused attribute of the set and calculates the entropy (or
information gain) of that attribute. It then selects the attribute which has the smallest entropy (or
largest information gain) value. The set is then split, or partitioned, by the selected attribute to
produce subsets of the data.
Recursion on a subset may stop in one of these cases:
every element in the subset belongs to the same class
there are no more attributes to be selected, but the examples still do not belong to the
same class.
there are no examples in the subset
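The entropy and information-gain scores that drive ID3's attribute selection can be sketched in Python (rows are represented as attribute dicts for illustration):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, as used by ID3
    to score each candidate attribute."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain = entropy before the split minus the weighted entropy of
    the subsets produced by splitting on attribute `attr`."""
    subsets = {}
    for row, lab in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(lab)
    remainder = sum(len(s) / len(labels) * entropy(s)
                    for s in subsets.values())
    return entropy(labels) - remainder
```

A perfectly separating attribute yields a gain equal to the full entropy of the label set.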
2: C4.5
The C4.5 algorithm improves on the ID3 algorithm. C4.5 can handle missing data: if the
training records contain unknown attribute values, C4.5 evaluates the gain for an attribute
by considering only the records where the attribute is defined. Both categorical and
continuous attributes are supported by C4.5. Values of a continuous variable are sorted
and partitioned. For the corresponding records of each partition, the gain is calculated,
and the partition that maximizes the gain is chosen for the next split.
Assignment 03
Q. No Questions Max. Unit no. as CO Blooms
Marks per syllabus Mapped Taxonomy
Level
1 What are the challenges in Big data 4 5 CO 1 1
visualization?
Answer:
Problems for big data visualization :
•Visual noise: Most of the objects in data-set are too relative to each other. Users cannot divide
them as separate objects on the screen.
•Information loss: Reduction of visible data sets can be used, but leads to information loss.
•Large image perception: Data visualization methods are not only limited by aspect ratio and
resolution of device, but also by physical perception limits.
•High rate of image change: Users observe data and cannot react to the number of data change
or its intensity on display.
•High performance requirements: This is hardly noticeable in static visualization, which has
lower speed requirements, but dynamic visualization imposes high performance requirements.
Answer:
common tools used in data visualization are
1. R (Base package, lattice, ggplot2) 2.Tableau 3. DataHero 4. Chart.js 5. Dygraphs
Tableau is a data visualisation tool that is widely used for Business Intelligence but is not limited to
it. It helps create interactive graphs and charts in the form of dashboards and worksheets to gain
business insights. Visualisation in Tableau is achieved by dragging and dropping Measures and
Dimensions onto the different Shelves.
Rows and Columns : Represent the x and y – axis of your graphs / charts.
Filter: Filters help you view a filtered version of your data. For example, instead of seeing the
combined Sales of all the Categories, you can look at a specific one, such as just Furniture.
Pages: Pages work on the same principle as Filters, with the difference that you can actually see the
changes as you shift between the paged values. Remember that Rosling chart? You can easily make
one of your own using Pages.
Marks : The Marks property is used to control the mark types of your data. You may choose to
represent your data using different shapes, sizes or text.
1. Meeting the need for speed: One possible solution is hardware. Increased memory and
powerful parallel processing can be used.
2. Understanding the data: One solution is to have the proper domain expertise in place.
3. Addressing data quality: It is necessary to ensure the data is clean through the process of data
governance or information management.
4. Displaying meaningful results: One way is to cluster data into a higher-level view where
smaller groups of data are visible and the data can be effectively visualized.
CLASS TEST- I
(AY 2018-19)
Branch: Computer Engineering Department Date:
Semester: I Duration: 1 hour
Subject: Data Analytics (410243) Max. Marks: 20M
Note:
1. All Questions are compulsory
2. Bloom’s Taxonomy level: Bloom Levels (BL): 1. Remember 2. Understand 3. Apply 4. Create
3. All questions are as per course outcomes
4. Assume suitable data wherever required.
Solution:
What is Data Analytics?
Ans:
Data analytics refers to qualitative and quantitative techniques and processes used to enhance
productivity and business gain. Data is extracted and categorized to identify and analyze
behavioral data and patterns, and techniques vary according to organizational requirements.
Data analytics is also known as data analysis.
Data analytics is primarily conducted in business-to-consumer (B2C) applications. Global
organizations collect and analyze data associated with customers, business processes, market
economics or practical experience. Data is categorized, stored and analyzed to study purchasing
trends and patterns.
Evolving data facilitates thorough decision-making. For example, a social networking website
collects data related to user preferences and community interests and segments it according to
specified criteria such as demographics, age, or gender. Proper analysis reveals key user and
customer trends and facilitates the social network's alignment of content, layout, and overall strategy.
(structured/ unstructured)
Explain the Model Planning phase from Data Analytic Life Cycle.
Ans: The data science team identifies candidate models to apply to the data for clustering,
classifying, or finding relationships in the data depending on the goal of the project
Some of the activities to consider in this phase include the following:
• Assess the structure of the data-sets. The structure of the data sets is one factor that dictates the
tools and analytical techniques for the next phase. Depending on whether the team plans to
analyze textual data or transnational data, for example, different tools and approaches are
required.
• Ensure that the analytical techniques enable the team to meet the business objectives and accept
or reject the working hypotheses.
• Determine if the situation warrants a single model or a series of techniques as part of a larger
analytic workflow.
In many cases, stakeholders and subject matter experts have instincts and hunches about what
the data science team should be considering and analyzing. Likely, this group had some
hypothesis that led to the genesis of the project. Often, stakeholders have a good grasp of the
problem and domain, although they may not be aware of the subtleties within the data or the
model needed to accept or reject a hypothesis.
Q.4 Give the new approach for Big Data Ecosystem.
Ans:
Organizations and data collectors are realizing that the data they can gather from
individuals contains intrinsic value and, as a result, a new economy is emerging. As this new
digital economy continues to evolve, the market sees the introduction of data vendors and data
cleaners that use crowd sourcing to test the outcomes of machine learning techniques
1. Data devices and the "Sensornet" gather data from multiple locations and continuously generate
new data about this data. For each gigabyte of new data created, an additional petabyte of data is created
about that data.
2. Data collectors include sample entities that collect data from the device and users.
Data results from a cable TV provider tracking the shows a person watches, which TV channels
someone will and will not pay to watch on demand, and the prices someone is willing to pay for
premium TV content.
3. Data aggregators make sense of the data collected from the various entities from the "SensorNet" or
the "Internet of Things." These organizations compile data from the devices and usage patterns collected
by government agencies, retail stores, and websites. In turn, they can choose to transform and package the
data as products to sell to list brokers, who may want to generate marketing lists of people who may be
good targets for specific ad campaigns.
4. Data users and buyers :These groups directly benefit from the data collected and aggregated by others
within the data value chain.
Q2: A significant difference between data visualization methods and traditional text-based
data methods is that _____
A. Text-based data is more detailed and therefore more accurate than data visualization
presentations
B. Visualization methods are only necessary with complex data
C. Data visualization brings better understanding much quicker and easier than text-based data
D. The volumes comprising the text-based data depict the complete representation of the
situation while the visuals in data visualization do not
Ans: C
Section B:
Q1: What is Naive Bayes?
Ans:A Naive Bayes classifier is a probabilistic machine learning model that’s used for
classification task. The crux of the classifier is based on the Bayes theorem.
Naive Bayes is a classification algorithm for binary (two-class) and multi-class classification
problems. The technique is easiest to understand when described using binary or categorical
input values.
It is called naive Bayes or idiot Bayes because the calculation of the probabilities for each
hypothesis is simplified to make it tractable. Rather than attempting to calculate the values of
each attribute value P(d1, d2, d3|h), they are assumed to be conditionally independent given
the target value and calculated as P(d1|h) * P(d2|h) and so on.
This is a very strong assumption that is most unlikely in real data, i.e. that the attributes do not
interact. Nevertheless, the approach performs surprisingly well on data where this assumption
does not hold.
Representation Used By Naive Bayes Models
The representation for naive Bayes is probabilities.
A list of probabilities are stored to file for a learned naive Bayes model. This includes:
Class Probabilities: The probabilities of each class in the training dataset.
Conditional Probabilities: The conditional probabilities of each input value given each class
value.
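The two probability tables the text describes, and the conditional-independence product used for classification, can be sketched in Python for categorical inputs (no smoothing, for clarity; names are illustrative):

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Build the stored representation: class probabilities P(h) and
    conditional probabilities P(d|h) for each input value and class."""
    n = len(labels)
    class_p = {h: c / n for h, c in Counter(labels).items()}
    cond_counts = defaultdict(Counter)
    for row, h in zip(rows, labels):
        for attr, val in row.items():
            cond_counts[(h, attr)][val] += 1
    cond_p = {k: {v: c / sum(cnt.values()) for v, c in cnt.items()}
              for k, cnt in cond_counts.items()}
    return class_p, cond_p

def classify(row, class_p, cond_p):
    """Score each class as P(h) times the product of P(d_i|h),
    assuming conditional independence, and pick the maximum."""
    def score(h):
        s = class_p[h]
        for attr, val in row.items():
            s *= cond_p.get((h, attr), {}).get(val, 0.0)
        return s
    return max(class_p, key=score)
```

Training on a handful of labeled rows and classifying a new row illustrates how only these two probability tables are needed at prediction time.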
Q2: What are the challenges and their possible solutions in Big data visualization?
Ans:
Problems in big data visualization :
•Visual noise: Most of the objects in data-set are too relative to each other. Users cannot divide
them as separate objects on the screen.
•Information loss: Reduction of visible data sets can be used, but leads to information loss.
•Large image perception: Data visualization methods are not only limited by aspect ratio and
resolution of device, but also by physical perception limits.
•High rate of image change: Users observe data and cannot react to the number of data change or
its intensity on display.
•High performance requirements: This is hardly noticeable in static visualization, which has
lower speed requirements, but dynamic visualization imposes high performance requirements.
Perceptual and interactive scalability are also challenges of big data visualization. Visualizing
every data point can lead to over-plotting and may overwhelm users’ perceptual and cognitive
capacities; reducing the data through sampling or filtering can elide interesting structures or
outliers. Querying large data stores can result in high latency, disrupting fluent interaction.
Potential solutions to some challenges or problems about visualization and big data were
presented :
1. Meeting the need for speed: One possible solution is hardware. Increased memory and
powerful parallel processing can be used. Another method is putting data in-memory but using a
grid computing approach, where many machines are used.
2. Understanding the data: One solution is to have the proper domain expertise in place.
3. Addressing data quality: It is necessary to ensure the data is clean through the process of data
governance or information management.
4. Displaying meaningful results: One way is to cluster data into a higher-level view where
smaller groups of data are visible and the data can be effectively visualized.
5. Dealing with outliers: Possible solutions are to remove the outliers from the data or create a
separate chart for the outliers.
Solution:
Big data can come in multiple forms, including structured and non-structured data such as financial
data, text files, multimedia files, and genetic mappings. Contrary to much of the traditional data
analysis performed by organizations, most of the Big Data is unstructured or semi-structured in nature,
which requires different techniques and tools to process and analyze. Analyzing structured data tends
to be the most familiar technique, but different techniques are required to meet the challenges of
analyzing semi-structured data (e.g., XML), quasi-structured data, and unstructured data.
Logistic Regression
In linear regression modeling, the outcome variable is a continuous variable. As seen in the earlier
Income example, linear regression can be used to model the relationship between age and education to
income. Suppose a person's actual income was not of interest, but rather whether someone was
wealthy or poor. In such a case, when the outcome variable is categorical in nature, logistic regression
can be used to predict the likelihood of an outcome based on the input variables. Although logistic
regression can be applied to an outcome variable that represents multiple values, the most common
case is an outcome variable that represents two values (for example, wealthy or poor).
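As a hedged sketch of the idea, a fitted logistic model passes a linear combination of the inputs through the sigmoid function to produce a probability; the coefficients below are hypothetical, chosen only for illustration, not fitted to any real data:

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def p_wealthy(age, education_years, b0, b1, b2):
    """P(wealthy | age, education) under a logistic model with the given coefficients."""
    return sigmoid(b0 + b1 * age + b2 * education_years)

# Hypothetical coefficients for illustration only.
p = p_wealthy(age=45, education_years=16, b0=-8.0, b1=0.08, b2=0.25)
print(round(p, 3))  # 0.401
```

The same inputs always yield a value strictly between 0 and 1, which can then be thresholded (e.g., at 0.5) to assign the categorical label.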
Q.3 b. Explain Data Analytical Life Cycle with all the six phases.
Answer:
• Phase 1- Discovery: In Phase 1, the team learns the business domain, including relevant history such
as whether the organization or business unit has attempted similar projects in the past from which they
can learn. The team assesses the resources available to support the project in terms of people,
technology, time, and data. Important activities in this phase include framing the business problem as
an analytic challenge that can be addressed in subsequent phases and formulating initial hypotheses
(IHs) to test and begin learning the data.
• Phase 2- Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team
can work with data and perform analytics for the duration of the project. The team needs to execute
extract, load, and transform (ELT) or extract, transform, and load (ETL) to get data into the sandbox.
ELT and ETL are sometimes abbreviated as ETLT. Data should be transformed in the ETLT
process so the team can work with it and analyze it. In this phase, the team also needs to familiarize
itself with the data thoroughly and take steps to condition the data.
• Phase 3-Model planning: Phase 3 is model planning, where the team determines the methods,
techniques, and workflow it intends to follow for the subsequent model building phase. The team
explores the data to learn about the relationships between variables and subsequently selects key
variables and the most suitable models.
• Phase 4-Model building: In Phase 4, the team develops data sets for testing, training, and production
purposes. In addition, in this phase the team builds and executes models based on the work done in the
model planning phase. The team also considers whether its existing tools will suffice for running the
models, or if it will need a more robust environment for executing models and work flows (for
example, fast hardware and parallel processing, if applicable).
• Phase 5-Communicate results: In Phase 5, the team, in collaboration with major stakeholders,
determines if the results of the project are a success or a failure based on the criteria developed in
Phase 1. The team should identify key findings, quantify the business value, and develop a narrative to
summarize and convey findings to stakeholders.
• Phase 6-Operationalize: In Phase 6, the team delivers final reports, briefings, code, and technical
documents. In addition, the team may run a pilot project to implement the models in a production
environment.
Q.5 b. What are the challenges and their possible solutions in Big data visualization?
Answer:
The problems (visual noise, information loss, large image perception, high rate of image change, high
performance requirements, and perceptual and interactive scalability) and their possible solutions are
the same as in the answer to Q2 above.
Q.6 a. Explain tools used in data visualization.
Answer:
Common tools used in data visualization are:
1. R (Base package, lattice, ggplot2) 2.Tableau 3. DataHero 4. Chart.js 5. Dygraphs
Tableau is a Data Visualisation tool that is widely used for Business Intelligence but is not limited to
it. It helps create interactive graphs and charts in the form of dashboards and worksheets to gain
business insights. Visualisation in Tableau is achieved by dragging and dropping Measures and
Dimensions onto different Shelves.
Rows and Columns : Represent the x and y – axis of your graphs / charts.
Filter : Filters help you view a filtered version of your data. For example, instead of seeing the
combined Sales of all the Categories, you can look at a specific one, such as just Furniture.
Pages : Pages work on the same principle as Filters, with the difference that you can actually see
the changes as you shift between the Paged values. Remember that Rosling chart? You can easily
make one of your own using Pages.
Marks : The Marks property is used to control the mark types of your data. You may choose to
represent your data using different shapes, sizes or text.
R supports four different graphics systems: base graphics, grid graphics, lattice graphics, and ggplot2.
Base graphics is the default graphics system in R, the easiest of the four systems to learn to use, and
provides a wide variety of useful tools, especially for exploratory graphics where we wish to learn
what is in an unfamiliar dataset.
Pie charts are designed to show the components, or parts, relative to a whole set of things. A pie chart
is also the most commonly misused kind of chart. If the situation calls for using a pie chart, employ it
only when showing 2-3 items in a chart, and only for sponsor audiences.
Bar charts and line charts are used much more often and are useful for showing comparisons and
trends over time. Even though people use vertical bar charts more often, horizontal bar charts allow an
author more room to fit the text labels. Vertical bar charts tend to work well when the labels are
small, such as when showing comparisons over time using years.
For frequency, histograms are useful for demonstrating the distribution of data to an analyst audience
or to data scientists. As shown in the pricing example earlier in this chapter, data distributions are
typically one of the first steps when visualizing data to prepare for model planning. To qualitatively
evaluate correlations, scatter plots can be useful to compare relationships among variables.
Q.10 b. What is HBase? Discuss various HBase Data Model and application.
While Pig and Hive are intended for batch applications, Apache HBase is capable of providing real-time
read and write access to data sets with billions of rows and millions of columns.
The HBase design is based on Google's 2006 paper on Bigtable. This paper described Bigtable as a
"distributed storage system for managing structured data."
HBase is a data store that is intended to be distributed across a cluster of nodes. Like Hadoop and
many of its related Apache projects, HBase is built upon HDFS and achieves its real-time access
speeds by sharing the workload over a large number of nodes in a distributed cluster. An HBase table
consists of rows and columns. However, an HBase table also has a third dimension, version, to
maintain the different values of a row and column intersection over time.
HBase is built on top of HDFS. HBase uses a key/value structure to store the contents of an HBase
table.
HBase Data Model
The HBase Data Model consists of the following elements:
A set of tables
Each table with column families and rows
Each table must have an element defined as a Primary Key.
The row key acts as the Primary Key in HBase.
Any access to HBase tables uses this Primary Key.
Each column present in HBase denotes an attribute of the corresponding object.
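The three-dimensional row/column/version layout can be illustrated with a toy in-memory model built from plain Python dictionaries (this is not the HBase API, only a sketch of the data model):

```python
import time

# A toy model of HBase's table layout:
# row key -> column family -> column qualifier -> {timestamp: value}
table = {}

def put(row_key, family, qualifier, value, ts=None):
    """Write a versioned cell; each write adds a new timestamped version."""
    ts = ts if ts is not None else time.time_ns()
    cell = table.setdefault(row_key, {}).setdefault(family, {}).setdefault(qualifier, {})
    cell[ts] = value

def get_latest(row_key, family, qualifier):
    """Read the most recent version of a cell."""
    versions = table[row_key][family][qualifier]
    return versions[max(versions)]

# Both versions of the cell are kept; reads return the newest by default.
put("patient-001", "history", "diagnosis", "flu", ts=1)
put("patient-001", "history", "diagnosis", "recovered", ts=2)
print(get_latest("patient-001", "history", "diagnosis"))  # -> recovered
```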
The applications of HBase are as follows:
Medical: HBase is used in the medical field for storing genome sequences and running
MapReduce on it, storing the disease history of people or an area, and many others.
Sports: HBase is used in the sports field for storing match histories for better analytics and
prediction.
Web: HBase is used to store user history and preferences for better customer targeting.
Oil and petroleum: HBase is used in the oil and petroleum industry to store exploration data for
analysis and predict probable places where oil can be found.
University Question Papers
Subject – 4
Text:
1. Jiawei Han, Micheline Kamber, and Jian Pei, “Data Mining: Concepts and Techniques”,
Elsevier Publishers, ISBN: 9780123814791, 9780123814807.
2. Parag Kulkarni, “Reinforcement and Systemic Machine Learning for Decision Making” by
Wiley-IEEE Press, ISBN: 978-0-470-91999-6
References:
1. Matthew A. Russell, "Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn,
Google+, GitHub, and More" , Shroff Publishers, 2nd Edition, ISBN: 9780596006068.
Class Test (CT) [20 marks]: Two class tests of 20 marks each will be conducted in a semester, and
the average of the two will be taken for calculation of the class test marks. The format of the question
paper is the same as the university's.
TA [5 marks]: Three/four assignments will be conducted in the semester. Teacher assessment will be
calculated on the basis of performance in assignments, class tests, and the pre-university test.
Attendance (AT) [5 marks]: Attendance marks will be given as per university policy.
1. The question paper will comprise three sections, A, B, and C, with internal choice of questions.
2. Section A contains 5 short-answer questions of 1 mark each. All questions are
compulsory. (Total 5 marks)
3. Section B contains 4 medium-answer questions of 2.5 marks each. All questions are
compulsory. (Total 10 marks)
4. Section C contains 1 long-answer question of 5 marks. (Total 5 marks)
Course Delivery :
The course will be delivered through lectures, assignment/tutorial sessions, classroom interaction,
and presentations.
Course Objectives:
Course Outcomes:
On completion of the course, student will be able to–
1. CO1: Apply basic, intermediate and advanced techniques to mine the data
2. CO2: Analyze the output generated by the process of data mining
3. CO3: Explore the hidden patterns in the data
4. CO4: Optimize the mining process by choosing best data mining technique
CO-PO Mapping
Course PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
Outcomes
CO1 1
CO2 2
CO3 2
CO4 2
Justification Of CO-PO Mapping
Unit - 1.
Q.1 What are the Steps involved in data pre-processing? Discuss.
Q.2 Explain the concept hierarchy.
Q.3 Describe the functions of various components in a typical Multi-tiered Data Warehouse
architecture with the block diagram.
Q.4 Describe the applications and trends in data mining in detail.
Q.5 Explain in detail z-score normalization and decimal scaling.
Unit – 2:
Q.1 What is Multi-Dimensional Modeling? What is the use of the Snowflake Schema?
Q.2 What is the difference between OLTP and OLAP?
Q.3 Draw and explain the architecture of a typical data mining system.
Q.4 Discuss the various OLAP operations which can be performed on a multidimensional data cube.
Q.5 Explain the Process of Data Warehouse Design with suitable diagram.
Unit – 3 :
Unit -4 :
Unit – 6:
Q.1 Which classification algorithm would you recommend for multiclass classification where the
number of classes is large? Explain.
Q.2 Write a short note on-
1. Accuracy,
2. Error Rate,
3. Precision,
4. Recall
Q.3 How to evaluate the accuracy of classifier using Holdout Method? Explain with example.
Q.4 Difference between Wholistic learning and multi-perspective learning.
Q.5 What is the purpose of performing cross-validation? Give one example.
Assignment 1
1. Binning Method:
This method works on sorted data in order to smooth it. The whole data set is divided
into segments of equal size, and various methods are then applied to complete the
task. Each segment is handled separately. One can replace all data in a segment by
its mean, or boundary values can be used to complete the task.
2. Regression:
Here data can be made smooth by fitting it to a regression function. The regression
used may be linear (having one independent variable) or multiple (having multiple
independent variables).
3. Clustering:
This approach groups similar data into clusters. Outliers may go undetected, or
they will fall outside the clusters.
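The bin-means variant of smoothing described in item 1 above can be sketched as follows; the price list is a made-up example:

```python
def smooth_by_bin_means(values, bin_size):
    """Smooth sorted data by replacing each equal-size segment with its mean."""
    values = sorted(values)
    smoothed = []
    for i in range(0, len(values), bin_size):
        seg = values[i:i + bin_size]                     # one bin
        smoothed.extend([sum(seg) / len(seg)] * len(seg))  # replace bin by its mean
    return smoothed

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26]
print(smooth_by_bin_means(prices, 3))
# [7.0, 7.0, 7.0, 19.0, 19.0, 19.0, 25.0, 25.0, 25.0]
```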
2. Data Transformation:
This step is taken in order to transform the data into forms appropriate for the mining process.
This involves the following ways:
1. Normalization:
It is done in order to scale the data values in a specified range (-1.0 to 1.0 or 0.0 to 1.0)
2. Attribute Selection:
In this strategy, new attributes are constructed from the given set of attributes to help the
mining process.
3. Discretization:
This is done to replace the raw values of numeric attribute by interval levels or conceptual
levels.
4. Concept Hierarchy Generation:
Here attributes are converted from level to higher level in hierarchy. For Example-The
attribute “city” can be converted to “country”.
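Normalization (item 1 above) can be sketched as min-max scaling into a specified range such as [0.0, 1.0]; the income values below are illustrative:

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

incomes = [12000, 16000, 20000, 73600, 98000]
norm = min_max_normalize(incomes)
print([round(v, 3) for v in norm])  # [0.0, 0.047, 0.093, 0.716, 1.0]
```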
3. Data Reduction:
Data mining is a technique used to handle huge amounts of data; when working with such
volumes, analysis becomes harder. To get around this, we use data
reduction techniques, which aim to increase storage efficiency and reduce data storage and analysis
costs.
The various steps to data reduction are:
Concept hierarchies reduce the data by collecting and replacing low-level concepts (such as numeric
values for the attribute age) with higher-level concepts (such as young, middle-aged, or senior).
Concept hierarchy generation for numeric data is as follows:
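For a numeric attribute such as age, the idea can be sketched as a simple mapping to higher-level concepts; the cut-off ages below are illustrative assumptions, not a standard:

```python
def age_concept(age):
    """Replace a numeric age with a higher-level concept label."""
    if age < 35:
        return "young"
    elif age < 60:
        return "middle-aged"
    return "senior"

ages = [22, 41, 67, 30, 58]
print([age_concept(a) for a in ages])
# ['young', 'middle-aged', 'senior', 'young', 'middle-aged']
```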
OLTP vs. OLAP:
Basic: OLTP is an online transactional system that manages database modification; OLAP is an
online data retrieving and data analysis system.
Focus: OLTP inserts, updates, and deletes information in the database; OLAP extracts data for
analysis that helps in decision making.
Data: OLTP and its transactions are the original source of data; different OLTP databases become
the source of data for OLAP.
Normalization: Tables in an OLTP database are normalized (3NF); tables in an OLAP database are
not normalized.
Integrity: An OLTP database must maintain the data integrity constraint; an OLAP database does not
get frequently modified, hence data integrity is not affected.
Data mining is widely used in diverse areas. There are a number of commercial data mining systems
available today, and yet there are many challenges in this field. Here, we discuss the
applications and the trends of data mining.
The financial data in banking and financial industry is generally reliable and of high quality which
facilitates systematic data analysis and data mining. Some of the typical cases are as follows −
● Design and construction of data warehouses for multidimensional data analysis and data
mining.
● Loan payment prediction and customer credit policy analysis.
● Classification and clustering of customers for targeted marketing.
● Detection of money laundering and other financial crimes.
Retail Industry
Data mining has great application in the retail industry because the industry collects large amounts of
data on sales, customer purchasing history, goods transportation, consumption, and services. It is natural
that the quantity of data collected will continue to expand rapidly because of the increasing ease,
availability, and popularity of the web.
Data mining in retail industry helps in identifying customer buying patterns and trends that lead to
improved quality of customer service and good customer retention and satisfaction. Here is the list of
examples of data mining in the retail industry −
● Design and Construction of data warehouses based on the benefits of data mining.
● Multidimensional analysis of sales, customers, products, time and region.
● Analysis of effectiveness of sales campaigns.
● Customer Retention.
● Product recommendation and cross-referencing of items.
Telecommunication Industry
Today the telecommunication industry is one of the fastest-emerging industries, providing various
services such as fax, pager, cellular phone, internet messenger, images, e-mail, web data
transmission, etc. Due to the development of new computer and communication technologies, the
telecommunication industry is rapidly expanding. This is the reason why data mining has become very
important in helping to understand the business.
Data mining in the telecommunication industry helps in identifying telecommunication patterns,
catching fraudulent activities, making better use of resources, and improving quality of service. Here
is the list of examples for which data mining improves telecommunication services −
In recent times, we have seen a tremendous growth in the field of biology such as genomics,
proteomics, functional Genomics and biomedical research. Biological data mining is a very
important part of Bioinformatics. Following are the aspects in which data mining contributes for
biological data analysis −
● Semantic integration of heterogeneous, distributed genomic and proteomic databases.
● Alignment, indexing, similarity search, and comparative analysis of multiple nucleotide
sequences.
● Discovery of structural patterns and analysis of genetic networks and protein pathways.
● Association and path analysis.
● Visualization tools in genetic data analysis.
The applications discussed above tend to handle relatively small and homogeneous data sets for
which the statistical techniques are appropriate. Huge amounts of data have been collected from
scientific domains such as geosciences, astronomy, etc. A large number of data sets are being
generated because of fast numerical simulations in various fields such as climate and ecosystem
modeling, chemical engineering, fluid dynamics, etc. Following are the applications of data mining
in the field of Scientific Applications −
Intrusion Detection
Intrusion refers to any kind of action that threatens the integrity, confidentiality, or availability of
network resources. In this world of connectivity, security has become a major issue. The increased
usage of the internet and the availability of tools and tricks for intruding and attacking networks have
prompted intrusion detection to become a critical component of network administration. Here is the
list of areas in which data mining technology may be applied for intrusion detection −
The multidimensional data model is an integral part of On-Line Analytical Processing, or OLAP.
Because OLAP is on-line, it must provide answers quickly; analysts pose iterative queries during
interactive sessions, not in batch jobs that run overnight. And because OLAP is also analytic, the
queries are complex. The multidimensional data model is designed to solve complex queries in real
time. The multidimensional data model is important because it enforces simplicity.
What is snowflaking?
The snowflake design is the result of further expansion and normalization of the dimension table. In
other words, a dimension table is said to be snowflaked if the low-cardinality attributes of the
dimension have been divided into separate normalized tables. These tables are then joined to the
original dimension table with referential constraints (foreign key constraints).
Generally, snowflaking is not recommended in the dimension table, as it hampers the
understandability and performance of the dimension model as more tables would be required to be
joined to satisfy the queries.
Characteristics of snowflake schema:
The dimension table of a snowflake schema is snowflaked under the following conditions:
Advantages:
There are two main advantages of snowflake schema given below:
● It provides structured data, which reduces the problem of data integrity.
● It uses less disk space because the data are highly structured.
Disadvantages:
● Snowflaking reduces space consumed by dimension tables, but compared with the entire data
warehouse the saving is usually insignificant.
● Avoid snowflaking or normalization of a dimension table, unless required and appropriate.
● Do not snowflake hierarchies of one dimension table into separate tables. Hierarchies should
belong to the dimension table only and should never be snowfalked.
● Multiple hierarchies can belong to the same dimension if the dimension has been designed at the
lowest possible level of detail.
Generally, a data warehouse adopts a three-tier architecture. Following are the three tiers of the data
warehouse architecture:
● Bottom Tier − The bottom tier of the architecture is the data warehouse database server. It is
the relational database system. We use the back end tools and utilities to feed data into the
bottom tier. These back end tools and utilities perform the Extract, Clean, Load, and refresh
functions.
● Middle Tier − In the middle tier, we have the OLAP Server that can be implemented in
either of the following ways.
○ By Relational OLAP (ROLAP), which is an extended relational database management
system. The ROLAP maps the operations on multidimensional data to standard
relational operations.
○ By Multidimensional OLAP (MOLAP) model, which directly implements the
multidimensional data and operations.
● Top-Tier − This tier is the front-end client layer. This layer holds the query tools and
reporting tools, analysis tools and data mining tools.
● Virtual Warehouse
● Data mart
● Enterprise Warehouse
Virtual Warehouse
The view over an operational data warehouse is known as a virtual warehouse. It is easy to build a
virtual warehouse. Building a virtual warehouse requires excess capacity on operational database
servers.
Data Mart
Data mart contains a subset of organization-wide data. This subset of data is valuable to specific
groups of an organization.
In other words, we can claim that data marts contain data specific to a particular group. For example,
the marketing data mart may contain data related to items, customers, and sales. Data marts are
confined to subjects.
● Windows-based or Unix/Linux-based servers are used to implement data marts. They are
implemented on low-cost servers.
● The implementation cycle of a data mart is measured in short periods of time, i.e., in weeks
rather than months or years.
● The life cycle of a data mart may be complex in long run, if its planning and design are not
organization-wide.
● Data marts are small in size.
● Data marts are customized by department.
● The source of a data mart is a departmentally structured data warehouse.
● Data marts are flexible.
Enterprise Warehouse
● An enterprise warehouse collects all the information and the subjects spanning an entire
organization.
● It provides us enterprise-wide data integration.
● The data is integrated from operational systems and external information providers.
● This information can vary from a few gigabytes to hundreds of gigabytes, terabytes or
beyond.
Assignment 2
Generally, an attribute represents a characteristic and explains the characteristics of an entity. In a
database management system (DBMS), it maps to a database component or database field. An
attribute stores only a piece of data; for example, in an invoice the attribute may be the price or date.
As another example, consider the entity Student, which has attributes like student-Lname, student-
Fname, student-Email, student-phone, and many more.
Types of Attributes with Examples
The different types of attributes are as follows
Single Valued Attributes: These are attributes that can hold only one value for a given entity.
● Example: Any manufactured product can have only one serial no., but a single valued
attribute is not necessarily a simple attribute, because it can be subdivided. In the
above example, the serial no. can be subdivided on the basis of region, part no., etc.
Multi Valued Attributes: These are the attributes which can have multiple values for a single or
same entity.
● Example: Car’s colors can be divided into many colors like for roof, trim.
● The notation for multi valued attribute is:
Composite Attributes: These are attributes that can be divided into smaller sub-parts.
● Example: The entity Employee Name can be divided into subdivisions like FName, MName,
and LName.
Simple Attributes: These are attributes that cannot be subdivided.
● Example: Attributes like age and marital status cannot be subdivided and are simple attributes.
Stored Attributes: Attributes that cannot be derived from other attributes are called stored
attributes.
Derived Attributes: Attributes whose values can be derived from other (stored) attributes.
● Example: Age can be derived as the difference between the current date and the date of birth.
● The notation for the derived attribute is:
Complex Attributes: Attributes formed by nesting composite and multi valued attributes.
● Example: A person can have more than one residence; each residence can have more than one
phone.
Key Attributes: This attribute represents the main characteristic of an entity i.e. primary key. Key
attribute has clearly different value for each element in an entity set.
● Example: The entity student ID is a key attribute because no other student will have the same
ID.
Required Attributes: Attributes that must hold a value for every entity.
● Example: Taking the example of a college, the student's name is a vital thing.
Fig 9: sample of required attribute
Optional / Null Value Attributes: These attributes may not have a value and can be left blank;
filling them is optional.
● Example: Considering the entity student there the student’s middle name and the email ID is
optional.
Cosine similarity measures the similarity between two vectors of an inner product space. It is
measured by the cosine of the angle between two vectors and determines whether two vectors are
pointing in roughly the same direction. It is often used to measure document similarity in text
analysis.
A document can be represented by thousands of attributes, each recording the frequency of a
particular word (such as a keyword) or phrase in the document. Thus, each document is an object
represented by what is called a term-frequency vector. For example, in Table 2.5, we see that
Document1 contains five instances of the word team, while hockey occurs three times. The word
coach is absent from the entire document, as indicated by a count value of 0. Such data can be highly
asymmetric.
Term-frequency vectors are typically very long and sparse (i.e., they have many 0 values).
Applications using such structures include information retrieval, text document clustering, biological
taxonomy, and gene feature mapping. The traditional distance measures that we have studied in this
chapter do not work well for such sparse numeric data. For example, two term-frequency vectors
may have many 0 values in common, meaning that the corresponding documents do not share many
words, but this does not make them similar. We need a measure that will focus on the words that the
two documents do have in common, and the occurrence frequency of such words. In other words, we
need a measure for numeric data that ignores zero-matches.
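A minimal sketch of the cosine measure on two made-up term-frequency vectors (the vocabulary and counts are illustrative, not the book's Table 2.5 values):

```python
import math

def cosine_similarity(x, y):
    """cos(theta) = (x . y) / (||x|| * ||y||); shared zero entries contribute nothing."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Term-frequency vectors over a small vocabulary (team, coach, hockey, baseball, score)
doc1 = [5, 0, 3, 0, 2]
doc2 = [3, 0, 2, 0, 1]
print(round(cosine_similarity(doc1, doc2), 3))  # close to 1: similar documents
```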
● A frequent itemset is an itemset whose support is greater than some user-specified minimum
support (denoted Lk, where k is the size of the itemset)
● A candidate itemset is a potentially frequent itemset (denoted Ck, where k is the size of the
itemset)
Pass 1
Step 1: Create a table containing the support count of each item present in the data set, called C1
(the candidate set).
Compare each candidate itemset's support count with the minimum support count. This gives the
itemset L1.
Step 2: K=2
Generate candidate set C2 using L1 (this is called the join step). The condition for joining Lk-1 with
Lk-1 is that the itemsets should have (K-2) elements in common. Check whether all subsets of each
itemset are frequent, and if not, remove that itemset. (For example, the subsets of {I1, I2} are {I1} and
{I2}, which are frequent; check this for each itemset.) Now find the support count of these itemsets by
searching the dataset.
(II) Compare each candidate's (C2) support count with the minimum support count (here min_support
= 2; if the support count of a candidate itemset is less than min_support, remove that itemset). This
gives us the itemset L2.
Step 3:
● Generate candidate set C3 using L2 (join step). The condition for joining Lk-1 with Lk-1 is that they
should have (K-2) elements in common. So here, for L2, the first element should match. The itemsets
generated by joining L2 are {I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I4, I5}, {I2, I3, I5}.
● Check whether all subsets of these itemsets are frequent, and if not, remove that itemset. (Here the
subsets of {I1, I2, I3} are {I1, I2}, {I2, I3}, {I1, I3}, which are frequent. For {I2, I3, I4}, the subset {I3, I4}
is not frequent, so remove it. Similarly check every itemset.) Find the support count of the remaining
itemsets by searching the dataset; if the support count of a candidate itemset is less than min_support,
remove it. This gives us the itemset L3.
Step-4:
● Generate candidate set C4 using L3 (join step). Condition of joining Lk-1 and Lk-1 (K=4) is
that, they should have (K-2) elements in common. So here, for L3, first 2 elements (items)
should match.
● Check whether all subsets of these itemsets are frequent (here the itemset formed by joining L3 is
{I1, I2, I3, I5}, and its subset {I1, I3, I5} is not frequent), so there is no itemset in C4.
● We stop here because no further frequent itemsets are found.
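The passes above can be sketched end-to-end; the transactions below reproduce the classic nine-transaction example with min_support = 2 (an assumption matching the walkthrough):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return every frequent itemset (as a frozenset) mapped to its support count."""
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}   # C1 candidates
    frequent = {}
    k = 1
    while current:
        # count support of each candidate itemset
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}   # Lk
        frequent.update(level)
        # join step: merge frequent k-itemsets into (k+1)-candidates,
        # pruning any candidate with an infrequent k-subset
        current = set()
        for a, b in combinations(level, 2):
            cand = a | b
            if len(cand) == k + 1 and all(frozenset(s) in level
                                          for s in combinations(cand, k)):
                current.add(cand)
        k += 1
    return frequent

tx = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
      {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
      {"I1", "I2", "I3"}]
freq = apriori(tx, min_support=2)
print(freq[frozenset({"I1", "I2", "I3"})])  # 2
```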
FP-Growth Algorithm
The FP-Growth (Frequent Pattern growth) algorithm is an improvement over the Apriori
algorithm. It is used for finding frequent itemsets in a transaction database without
candidate generation.
FP-Growth represents frequent items in frequent pattern trees, or FP-trees.
K-Nearest Neighbours
K-Nearest Neighbors is one of the most basic yet essential classification algorithms in Machine
Learning. It belongs to the supervised learning domain and finds intense application in pattern
recognition, data mining and intrusion detection.
It is widely applicable in real-life scenarios since it is non-parametric, meaning it does not make any underlying assumptions about the distribution of the data (as opposed to other algorithms such as GMM, which assume a Gaussian distribution of the given data).
We are given some prior data (also called training data), which classifies coordinates into groups
identified by an attribute.
As an example, consider the following table of data points containing two features:
Algorithm:
Let m be the number of training data samples and let p be an unknown point.
1. Store the training samples in an array arr[] of data points, so that each element arr[i], for i = 0 to m−1, represents a labelled tuple (x, y).
2. Calculate the Euclidean distance d(arr[i], p) for every i.
3. Make a set S of the K smallest distances obtained. Each of these distances corresponds to an already classified data point.
4. Return the majority label among S.
Now, given another set of data points (also called testing data), allocate these points a group by
analyzing the training set. Note that the unclassified points are marked as ‘White’.
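The four steps above translate almost directly into code. A minimal sketch follows; the training points and labels are made up for illustration:

```python
import math
from collections import Counter

def knn_classify(train, p, k):
    """train: list of ((x, y), label) pairs; p: the unknown point."""
    # Steps 1-2: order training samples by Euclidean distance from p
    nearest = sorted(train, key=lambda s: math.dist(s[0], p))
    # Step 3: take the labels of the k closest samples
    k_labels = [label for _, label in nearest[:k]]
    # Step 4: return the majority label among them
    return Counter(k_labels).most_common(1)[0][0]

train = [((1, 1), 'Red'), ((2, 2), 'Red'), ((1, 2), 'Red'),
         ((6, 6), 'Green'), ((7, 7), 'Green'), ((6, 7), 'Green')]
print(knn_classify(train, (2, 1), 3))  # a point near the Red cluster
```

A query point near the first cluster is assigned 'Red' by majority vote among its three nearest neighbours.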
Intuition
If we plot these points on a graph, we may be able to locate some clusters, or groups. Now, given an
unclassified point, we can assign it to a group by observing what group its nearest neighbours belong
to. This means, a point close to a cluster of points classified as ‘Red’ has a higher probability of
getting classified as ‘Red’.
Intuitively, we can see that the first point (2.5, 7) should be classified as ‘Green’ and the second point
(5.5, 4.5) should be classified as ‘Red’.
IF-THEN Rules
A rule-based classifier makes use of a set of IF-THEN rules for classification. We can express a rule in the following form −
IF condition THEN conclusion
Let us consider a rule R1,
R1: IF age = youth AND student = yes
THEN buy_computer = yes
A Bayesian Belief Network (BBN) , or simply Bayesian Network, is a statistical model used to
describe the conditional dependencies between different random variables.
BBNs are chiefly used in areas like computational biology and medicine for risk analysis and
decision support (basically, to understand what caused a certain problem, or the probabilities of
different effects given an action).
The shown example, ‘Burglary-Alarm‘ is one of the most quoted ones in texts on Bayesian theory.
04. Write a short note on-Accuracy, Error Rate 02 6 CO-2 2
Accuracy :
Accuracy is an indicator of how true a measurement is. Simply put, we are looking at how close the average of all measurements is to the real value of what is measured; on average, we are really measuring what we say we are measuring. If we were to use shooting as an example, high accuracy would mean that the average of all shots taken is right at the target, or very close to it. In the case of web guiding, accurate sensing and guiding of material means the average sensing and placement of the material is very close to the true and desired position. However, the spread of the individual positions of the material might be so wide that it makes the accuracy useless; in the shooting example, we would have a wide pattern with the average on the target.
Error Rate:
The degree of errors encountered during data transmission over a communications or network
connection. The higher the error rate, the less reliable the connection or data transfer will be.
The term error rate can refer to anything where errors can occur. For example, when taking a typing
test that measures errors an error rate is used to calculate your final score or net WPM.
If we had access to an unlimited number of examples, these questions would have a simple answer: choose the model that provides the lowest error rate on the entire population and, of course, that error rate is the true error rate.
In real applications we only have access to a finite set of examples, usually smaller than we would like. One approach is to use the entire training data to select our classifier and estimate the error rate. This naïve approach has two fundamental problems:
The final model will normally overfit the training data.
The error rate estimate will be overly optimistic (lower than the true error rate). In fact, it is not uncommon to have 100% correct classification on training data.
A much better approach is to split the training data into disjoint subsets: the holdout method
The holdout method
Split dataset into two groups
Training set: used to train the classifier
Test set (or ‘hold out’ set) : used to estimate the error rate of the trained classifier
The holdout method has two basic drawbacks
In problems where we have a small dataset we may not be able to afford the “luxury” of setting aside
a portion of the dataset for testing
Since it is a single train-and-test experiment, the holdout estimate of performance (for example error
rate) will be misleading if we happen to get an “unfortunate” split between train and test
The limitations of the holdout can be overcome with a family of resampling methods at the expense
of more computations
Cross Validation
Random Subsampling
K-Fold Cross-Validation
Leave-one-out Cross-Validation
Bootstrap
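K-fold cross-validation averages the error rate over K disjoint train/test splits, so no single "unfortunate" split dominates the estimate. A minimal sketch follows; the trainer below is a deliberately trivial placeholder:

```python
def k_fold_error(data, labels, train_fn, k=5):
    """Average the test-fold error rate over k disjoint folds."""
    n = len(data)
    fold_errors = []
    for i in range(k):
        test_idx = set(range(i * n // k, (i + 1) * n // k))
        train_x = [x for j, x in enumerate(data) if j not in test_idx]
        train_y = [y for j, y in enumerate(labels) if j not in test_idx]
        model = train_fn(train_x, train_y)        # train on the other k-1 folds
        wrong = sum(model(data[j]) != labels[j] for j in test_idx)
        fold_errors.append(wrong / max(len(test_idx), 1))
    return sum(fold_errors) / k                   # averaged error estimate

# toy "classifier": always predicts the majority label seen in training
def majority_trainer(xs, ys):
    label = max(set(ys), key=ys.count)
    return lambda x: label

data = list(range(10))
labels = ['a'] * 8 + ['b'] * 2
err = k_fold_error(data, labels, majority_trainer, k=5)
print(err)
```

Leave-one-out cross-validation is the special case k = n, where every fold contains exactly one example.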
Class test Question Papers
CLASS TEST- I
(AY 2018-19)
Branch: B.E. Computer Engineering Date: 27/07/2018
1. Binning Method:
This method works on sorted data in order to smooth it. The whole data is divided
into segments (bins) of equal size and then various methods are performed to complete the
task. Each segment is handled separately: one can replace all data in a segment by
its mean, or bin boundary values can be used to complete the task.
2. Regression:
Here data can be made smooth by fitting it to a regression function. The regression
used may be linear (having one independent variable) or multiple (having multiple
independent variables).
3. Clustering:
This approach groups similar data into clusters. Outliers may go undetected, or
they will fall outside the clusters.
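Of the smoothing methods above, binning is the simplest to sketch. The following shows smoothing by bin means with equal-size bins; the price values are illustrative:

```python
def smooth_by_bin_means(values, bin_size):
    data = sorted(values)                    # binning works on sorted data
    smoothed = []
    for i in range(0, len(data), bin_size):
        bin_ = data[i:i + bin_size]          # one equal-size segment
        mean = sum(bin_) / len(bin_)
        smoothed.extend([mean] * len(bin_))  # replace each value by its bin mean
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
result = smooth_by_bin_means(prices, 3)
print(result)
```

Smoothing by bin medians or bin boundaries follows the same loop, only replacing each value by the bin's median or its nearest boundary instead of the mean.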
2. Data Transformation:
This step is taken in order to transform the data in appropriate forms suitable for mining process.
This involves following ways:
1. Normalization:
It is done in order to scale the data values in a specified range (-1.0 to 1.0 or 0.0 to
1.0)
2. Attribute Selection:
In this strategy, new attributes are constructed from the given set of attributes to help the
mining process.
3. Discretization:
This is done to replace the raw values of numeric attribute by interval levels or conceptual
levels.
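Of the transformations above, normalization is the easiest to sketch. Min-max scaling to the range [0.0, 1.0] looks like this; the input values are illustrative:

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    lo, hi = min(values), max(values)
    # x' = (x - min) / (max - min) * (new_max - new_min) + new_min
    return [(x - lo) / (hi - lo) * (new_max - new_min) + new_min
            for x in values]

norm = min_max_normalize([200, 300, 400, 600, 1000])
print(norm)
```

Passing new_min=-1.0 instead rescales the same data onto the [-1.0, 1.0] range mentioned above.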
3. Data Reduction:
Data mining is a technique used to handle huge amounts of data, and when working with
such volumes, analysis becomes harder. To get around this, we use data
reduction techniques, which aim to increase storage efficiency and reduce data storage and analysis
costs.
The various steps of data reduction are:
Concept hierarchies reduce the data by collecting and replacing low-level concepts (such as numeric
values for the attribute age) with higher-level concepts (such as young, middle-aged, or senior).
Concept hierarchy generation for numeric data is as follows:
● Focus − OLTP: Insert, Update, Delete information from the database. OLAP: Extract data for analysis that helps in decision making.
● Data − OLTP and its transactions are the original source of data. For OLAP, different OLTP databases become the source of data.
● Normalization − Tables in an OLTP database are normalized (3NF). Tables in an OLAP database are not normalized.
● Integrity − An OLTP database must maintain data integrity constraints. An OLAP database does not get frequently modified, hence data integrity is not affected.
The multidimensional data model is an integral part of On-Line Analytical Processing, or OLAP.
Because OLAP is on-line, it must provide answers quickly; analysts pose iterative queries during
interactive sessions, not in batch jobs that run overnight. And because OLAP is also analytic, the
queries are complex. The multidimensional data model is designed to solve complex queries in real
time. The multidimensional data model is important because it enforces simplicity.
What is snowflaking?
The snowflake design is the result of further expansion and normalization of the dimension tables. In
other words, a dimension table is said to be snowflaked if the low-cardinality attributes of the
dimension have been divided into separate normalized tables. These tables are then joined to the
original dimension table with referential constraints (foreign key constraints).
Generally, snowflaking is not recommended in the dimension table, as it hampers the
understandability and performance of the dimension model as more tables would be required to be
joined to satisfy the queries.
Characteristics of snowflake schema:
The dimension model is snowflaked under the following conditions:
Advantages:
There are two main advantages of snowflake schema given below:
● It provides structured data, which reduces the problem of data integrity.
● It uses less disk space because the data is highly structured.
Disadvantages:
● Snowflaking reduces space consumed by dimension tables, but compared with the entire data
warehouse the saving is usually insignificant.
● Avoid snowflaking or normalization of a dimension table, unless required and appropriate.
● Do not snowflake hierarchies of one dimension table into separate tables. Hierarchies should
belong to the dimension table only and should never be snowflaked.
● Multiple hierarchies can belong to the same dimension if the dimension has been designed at the
lowest possible level of detail.
CLASS TEST- I
(AY 2018-19)
Branch: Computer Engineering Date: 27/07/2018
To design an effective and efficient data warehouse, we need to understand and analyze the business
needs and construct a business analysis framework. Each person has different views regarding the
design of a data warehouse. These views are as follows −
● The top-down view − This view allows the selection of relevant information needed for a
data warehouse.
● The data source view − This view presents the information being captured, stored, and
managed by the operational system.
● The data warehouse view − This view includes the fact tables and dimension tables. It
represents the information stored inside the data warehouse.
● The business query view − It is the view of the data from the viewpoint of the end-user.
● Bottom Tier − The bottom tier of the architecture is the data warehouse database server. It is
the relational database system. We use the back end tools and utilities to feed data into the
bottom tier. These back end tools and utilities perform the Extract, Clean, Load, and refresh
functions.
● Middle Tier − In the middle tier, we have the OLAP Server that can be implemented in
either of the following ways.
○ By Relational OLAP (ROLAP), which is an extended relational database management
system. The ROLAP maps the operations on multidimensional data to standard
relational operations.
○ By Multidimensional OLAP (MOLAP) model, which directly implements the
multidimensional data and operations.
● Top-Tier − This tier is the front-end client layer. This layer holds the query tools and
reporting tools, analysis tools and data mining tools.
Data mining is widely used in diverse areas. A number of commercial data mining systems
are available today, and yet there are many challenges in this field.
The financial data in banking and financial industry is generally reliable and of high quality which
facilitates systematic data analysis and data mining. Some of the typical cases are as follows −
● Design and construction of data warehouses for multidimensional data analysis and data
mining.
● Loan payment prediction and customer credit policy analysis.
● Classification and clustering of customers for targeted marketing.
● Detection of money laundering and other financial crimes.
Retail Industry
Data mining has great application in the retail industry because it collects large amounts of data
on sales, customer purchasing history, goods transportation, consumption, and services. It is natural
that the quantity of data collected will continue to expand rapidly because of the increasing ease,
availability, and popularity of the web.
Data mining in retail industry helps in identifying customer buying patterns and trends that lead to
improved quality of customer service and good customer retention and satisfaction. Here is the list of
examples of data mining in the retail industry −
● Design and Construction of data warehouses based on the benefits of data mining.
● Multidimensional analysis of sales, customers, products, time and region.
● Analysis of effectiveness of sales campaigns.
● Customer Retention.
● Product recommendation and cross-referencing of items.
Telecommunication Industry
Today the telecommunication industry is one of the fastest-emerging industries, providing various
services such as fax, pager, cellular phone, internet messenger, images, e-mail, web data
transmission, etc. Due to the development of new computer and communication technologies, the
telecommunication industry is rapidly expanding. This is why data mining has become very
important in helping to understand the business.
Data mining in the telecommunication industry helps in identifying telecommunication patterns,
catching fraudulent activities, making better use of resources, and improving quality of service. Here is the
list of examples for which data mining improves telecommunication services −
In recent times, we have seen a tremendous growth in the field of biology such as genomics,
proteomics, functional Genomics and biomedical research. Biological data mining is a very
important part of Bioinformatics. Following are the aspects in which data mining contributes for
biological data analysis −
OLAP is a category of software that allows users to analyze information from multiple database
systems at the same time. It is a technology that enables analysts to extract and view business data
from different points of view. OLAP stands for Online Analytical Processing.
Analysts frequently need to group, aggregate and join data. These operations in relational databases
are resource intensive. With OLAP data can be pre-calculated and pre-aggregated, making analysis
faster.
OLAP databases are divided into one or more cubes. The cubes are designed in such a way that
creating and viewing reports become easy.
1. Roll-up
2. Drill-down
3. Slice and dice
4. Pivot (rotate)
1) Roll-up:
Roll-up is also known as "consolidation" or "aggregation." The Roll-up operation can be performed
in 2 ways
1. Reducing dimensions
2. Climbing up concept hierarchy. Concept hierarchy is a system of grouping things based on
their order or level.
2) Drill-down
In drill-down, data is fragmented into smaller parts. It is the opposite of the roll-up process. It can be
done via:
● Quarter Q1 is drilled down to the months January, February, and March. The corresponding sales are
also registered.
● In this example, the dimension months is added.
3) Slice:
Here, one dimension is selected, and a new sub-cube is created.
The following diagram explains how the slice operation is performed:
3) Dice:
This operation is similar to a slice. The difference in dice is you select 2 or more dimensions that
result in the creation of a sub-cube.
4) Pivot
In Pivot, you rotate the data axes to provide a substitute presentation of data.
In the following example, the pivot is based on item types.
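On a tiny cube stored as a dict keyed by (quarter, item, city), slice and roll-up by dimension reduction can be sketched as follows; all figures are illustrative:

```python
from collections import defaultdict

# cells of a small 3-D sales cube: (quarter, item, city) -> sales
cube = {('Q1', 'Mobile', 'Pune'): 100, ('Q1', 'Modem', 'Pune'): 50,
        ('Q2', 'Mobile', 'Pune'): 120, ('Q2', 'Modem', 'Nashik'): 70}

# Slice: fix one dimension (time = 'Q1') to get a new sub-cube
slice_q1 = {(item, city): v
            for (quarter, item, city), v in cube.items() if quarter == 'Q1'}

# Roll-up by dimension reduction: drop 'city', aggregating the sales
rollup = defaultdict(int)
for (quarter, item, city), v in cube.items():
    rollup[(quarter, item)] += v

print(slice_q1)
print(dict(rollup))
```

A dice is the same dict-comprehension as the slice, only with conditions on two or more dimensions instead of one.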
04. Draw and explain the architecture of a typical 05 2 CO-1 1
data mining system.
The architecture of a typical data mining system may have the following major components
Database, data warehouse, World Wide Web, or other information repository:
This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information
repositories. Data cleaning and data integration techniques may be performed on the data.
Database or data warehouse server:
The database or data warehouse server is responsible for fetching the relevant data, based on the
user’s data mining request.
Knowledge base:
This is the domain knowledge that is used to guide the search or evaluate the interestingness of
resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or
attribute values into different levels of abstraction. Knowledge such as user beliefs, which can be
used to assess a pattern’s interestingness based on its unexpectedness, may also be included.
Data mining engine:
This is essential to the data mining system and ideally consists of a set of functional modules for
tasks such as characterization, association and correlation analysis, classification, prediction, cluster
analysis, outlier analysis, and evolution analysis.
Pattern evaluation module:
This component typically employs interestingness measures and interacts with the data mining
modules so as to focus the search toward interesting patterns.
It may use interestingness thresholds to filter out discovered patterns.
Alternatively, the pattern evaluation module may be integrated with the mining module, depending
on the implementation of the data mining method used.
For efficient data mining, it is highly recommended to push the evaluation of pattern interestingness
as deep as possible into the mining process so as to confine the search to only the interesting patterns.
User interface:
This module communicates between users and the data mining system, allowing the user to interact
with the system by specifying a data mining query or task, providing information to help focus the
search, and performing exploratory data mining based on the intermediate data mining results.
In addition, this component allows the user to browse database and data warehouse schemas or data
structures, evaluate mined patterns, and visualize the patterns in different forms.
DIAGRAM:
Q.1. Unlike traditional production rules, association rules
allow the same variable to be an input attribute in one rule and an output attribute in another rule.
Q.2. The apriori algorithm is used for the following data mining task
Association
Market basket analysis is applied to various fields of the retail sector in order to boost sales and
generate revenue by identifying the needs of the customers and make purchase suggestions to them.
1. Cross Selling: Cross-selling is basically a sales technique in which seller suggests some
related product to a customer after he buys a product. A seller influences the customer to
spend more by purchasing more products related to the product that has already been
purchased by him. For instance, if someone buys milk from a store, the seller asks or suggests
him to buy coffee or tea as well. So basically the seller suggests the complementary product
to the customer with the product that he has already purchased. Market basket analysis helps
the retailer to know the consumer behavior and then go for cross-selling.
2. Product Placement: It refers to placing complementary goods (pen and paper) and substitute
goods (tea and coffee) together so that the customer notices the goods and will buy both
the goods together. If a seller places these kinds of goods together there is a probability that a
customer will purchase them together. Market basket analysis helps the retailer to identify the
goods that a customer can purchase together.
3. Affinity Promotion: Affinity promotion is a method of promotion that designs promotional
events based on associated products. In market basket analysis, affinity promotion is a useful
way to prepare and analyze questionnaire data.
4. Fraud Detection: Market basket analysis is also applied to fraud detection. It may be
possible to identify purchase behavior that can associate with fraud on the basis of market
basket analysis data that contain credit card usage. Hence market basket analysis is also
useful in fraud detection.
5. Customer Behavior: Market basket analysis helps to understand customer behavior. It
understands the customer behavior under different conditions. It provides an insight into
customer behavior. It allows the retailer to identify the relationship between two products that
people tend to buy and hence helps to understand the customer behavior towards a product or
service.
Hence, market basket analysis helps the retailer to get an insight into customer behavior and to
understand the relationship between two or more goods so that they can offer or do purchase
suggestions to their customers so that they will buy more from their stores and they can earn great
revenue.
● Support
● Confidence
● Lift
Support is the default popularity of any item. You calculate Support by dividing the number of
transactions containing that item by the total number of transactions. Hence, in our
example,
Support (Jam) = (Transactions involving jam) / (Total Transactions)
= 200/2000 = 10%
Confidence
In our example, Confidence is the likelihood that a customer who bought jam also bought bread.
Dividing the number of transactions that include both bread and jam by the number of transactions
involving jam gives the Confidence figure.
Confidence = (Transactions involving both bread and jam) / (Total Transactions involving jam)
= 100/200 = 50%
It implies that 50% of customers who bought jam bought bread as well.
Lift
According to our example, Lift is the increase in the ratio of the sale of bread when you sell jam. The
mathematical formula of Lift is as follows.
Lift = Confidence(Jam → Bread) / Support(Bread)
= 50 / 10 = 5 (taking Support(Bread) as 10% here)
It says that the likelihood of a customer buying both jam and bread together is 5 times more than the
chance of purchasing jam alone. If the Lift value is less than 1, it entails that the customers are
unlikely to buy both the items together. Greater the value, the better is the combination.
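The three measures can be computed directly from transaction counts. The sketch below uses the bread-and-jam figures; the bread count of 200 is an assumed value, not given in the example, chosen so that Support(Bread) works out to 10%:

```python
total = 2000   # total transactions
jam = 200      # transactions containing jam
bread = 200    # transactions containing bread (assumed figure)
both = 100     # transactions containing both bread and jam

sup_jam = jam / total          # Support(Jam) = 10%
conf = both / jam              # Confidence(Jam -> Bread) = 50%
lift = conf / (bread / total)  # Lift(Jam -> Bread)
print(sup_jam, conf, lift)
```

A lift of 5 (> 1) indicates the two items are bought together far more often than chance would predict.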
Consider a supermarket scenario where the itemset is I = {Onion, Burger, Potato, Milk, Beer}. The
database consists of six transactions where 1 represents the presence of the item and 0 the absence.
Step 1
Create a frequency table of all the items that occur in all the transactions. Now, prune the frequency
table to include only those items having a threshold support level over 50%. We arrive at this
frequency table.
Step 2
Make pairs of items such as OP, OB, OM, PB, PM, BM. This frequency table is what you arrive at.
Step 3
Apply the same threshold support of 50% and consider the items that exceed 50% (in this case 3 and
above).
Thus, you are left with OP, OB, PB, and PM
Step 4
Look for a set of three items that the customers buy together. Thus we get this combination.
Step 5
Determine the frequency of these two itemsets. You get this frequency table.
If you apply the threshold assumption, you can deduce that the set of three items frequently
purchased by the customers is OPB
03. What are the issues regarding classification and 05 5 CO-3 3
prediction?
To select the K that’s right for your data, we run the KNN algorithm several times with
different values of K and choose the K that reduces the number of errors we encounter while
maintaining the algorithm’s ability to accurately make predictions when it’s given data it
hasn’t seen before.
1. As we decrease the value of K to 1, our predictions become less stable. Just think for a
minute: imagine K=1 and we have a query point surrounded by several reds and one green (think
of the top left corner of the colored plot above), but the green is the single nearest
neighbor. Reasonably, we would think the query point is most likely red, but because K=1,
KNN incorrectly predicts that the query point is green.
2. Inversely, as we increase the value of K, our predictions become more stable due to majority
voting / averaging, and thus, more likely to make more accurate predictions (up to a certain
point). Eventually, we begin to witness an increasing number of errors. It is at this point we
know we have pushed the value of K too far.
3. In cases where we are taking a majority vote (e.g. picking the mode in a classification
problem) among labels, we usually make K an odd number to have a tiebreaker.
Advantages
Disadvantages
Q.3 a) Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8): Compute the 10 CO3 4
Euclidean distance between the two objects using q = 3. CO4 2
b) Explain Proximity Measures for Nominal Attributes and Binary Attributes. CO4 2
OR
Q.4 a) Explain the Process of Data Warehouse Design with suitable diagram . CO3 1
10
b) Explain four types of attributes by giving appropriate example? CO2 4
Q.5 a.) The following is the list of large two item sets. Show the steps to apply the Apriori property to 16 CO1 2&3
generate and prune the candidates for large three itemsets. Describe how the Apriori property is used
in the steps. Give the final list of candidate large three item sets {10,20} {10,30} {20,30} {20,40}
CO2 4
b.) Explain Mining Frequent Patterns using FP-Growth
OR
CO3 4
16
Q.6 a). What is rule based classifier? Explain how a rule based classifier works.
CO2 3
b). Write the algorithm for k-nearest neighbour classification
b.) Discuss the methods for estimating predictive accuracy of classification method. CO2 4
OR
Q.8 a.) Develop the Apriori Algorithm for generating frequent itemset. CO1 2
16
b.) What is the purpose of performing cross-validation? Give one example. CO3
1
Q.9 a.) Explain how the Bayesian Belief Networks are trained to perform classification. CO2 3
2
b.) What is a Rule-Based Classifier? Explain how a Rule-Based Classifier works.
CO1
OR
CO3
Q.10 a). Difference between Wholistic learning and multi-perspective learning. 2&3
b) Write a short note on-
1. Accuracy, 2. Error Rate, CO1 4
3. Precision, 4. Recall
Q.1.)
a)
x′ = ((x − x_min) / (x_max − x_min)) × (new_max − new_min) + new_min
= ((x − x_min) / (x_max − x_min)) × (1.0 − 0) + 0 = 0.716
b)
1. Parsing
2. Correcting
3. Standardizing
4. Matching
5. Consolidating
6. Data Cleaning
7. Data staging
c)
As written above, the main drawback of correlation is the linear relationship restriction. If the
correlation is null between two variables, they may be non-linearly related.
Q.2
a)
In context of data reduction in data mining there are a few basic methods of attribute subset
selection
1) Stepwise forward selection: This procedure begins with an empty set of attributes as the
reduced set (temporarily). Next, the best of the original attributes is determined and added to
the reduced set. At each subsequent iteration, the best of the remaining original attributes is
added to the reduced set.
2) Stepwise backward elimination: This procedure begins with the full set of attributes. At each
step, the worst attribute is removed.
3) Combination of forward selection and backward elimination: Here the first two methods are
combined and the procedure at every step selects the best attribute and removes the worst.
4) Decision tree induction: algorithms such as ID3 and C4.5 construct a flowchart-like structure
in which each non-leaf node is a test on an attribute and each leaf node is a prediction. The
algorithm selects the best attribute at each node.
A tree is constructed, and attributes that do not appear in the tree are assumed to be irrelevant.
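Stepwise forward selection is a greedy loop. In the minimal sketch below, the scoring function is a stand-in for whatever evaluation measure actually drives the selection (e.g. classifier accuracy on the reduced set), and the per-attribute usefulness values are made up:

```python
def forward_select(attributes, score_fn, k):
    """Greedily add the attribute that most improves the score, k times."""
    selected = []
    remaining = list(attributes)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda a: score_fn(selected + [a]))
        selected.append(best)     # best remaining attribute joins the reduced set
        remaining.remove(best)
    return selected

# toy score: pretend each attribute has a fixed usefulness, ignoring interactions
usefulness = {'age': 3, 'income': 5, 'zip': 1, 'student': 4}
score = lambda attrs: sum(usefulness[a] for a in attrs)
chosen = forward_select(usefulness, score, 2)
print(chosen)
```

Backward elimination is the mirror image: start from the full list and repeatedly remove the attribute whose removal hurts the score least.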
b)
c)
Q.3)
Virtual Warehouse
Data mart
Enterprise Warehouse
Virtual Warehouse
The view over an operational data warehouse is known as a virtual warehouse. It is easy to build
a virtual warehouse. Building a virtual warehouse requires excess capacity on operational
database servers.
Data Mart
Data mart contains a subset of organization-wide data. This subset of data is valuable to specific
groups of an organization.
In other words, we can claim that data marts contain data specific to a particular group. For
example, the marketing data mart may contain data related to items, customers, and sales. Data
marts are confined to subjects.
Window-based or Unix/Linux-based servers are used to implement data marts. They are
implemented on low-cost servers.
The implementation cycle of a data mart is measured in short periods of time, i.e., in weeks
rather than months or years.
The life cycle of a data mart may be complex in the long run if its planning and design are
not organization-wide.
Enterprise Warehouse
An enterprise warehouse collects all the information and the subjects spanning an entire
organization
The data is integrated from operational systems and external information providers.
This information can vary from a few gigabytes to hundreds of gigabytes, terabytes or
beyond.
b)
c)
Concept hierarchies reduce the data by collecting and replacing low-level concepts (such as
numeric values for the attribute age) with higher-level concepts (such as young, middle-aged, or
senior).
Concept hierarchy generation for numeric data is as follows:
Binning
Histogram analysis
Clustering analysis
Entropy-based discretization
Segmentation by natural partitioning
Binning
o In binning, first sort data and partition into (equi-depth) bins then one can smooth
by bin means, smooth by bin median, smooth by bin boundaries, etc.
Histogram analysis
o Histogram is a popular data reduction technique
o Divide data into buckets and store average (sum) for each bucket
o Can be constructed optimally in one dimension using dynamic programming
o Related to quantization problems.
Clustering analysis
o Partition data set into clusters, and one can store cluster representation only
o Can be very effective if data is clustered but not if data is “smeared”
o Can have hierarchical clustering and be stored in multi-dimensional index tree
structures
Q.4)
Bottom Tier − The bottom tier of the architecture is the data warehouse database server.
It is the relational database system. We use the back end tools and utilities to feed data
into the bottom tier. These back end tools and utilities perform the Extract, Clean, Load,
and refresh functions.
Middle Tier − In the middle tier, we have the OLAP Server that can be implemented in
either of the following ways.
o By Relational OLAP (ROLAP), which is an extended relational database
Top-Tier − This tier is the front-end client layer. This layer holds the query tools and
reporting tools, analysis tools and data mining tools.
The following diagram depicts the three-tier architecture of data warehouse –
b)
OLAP Operations
Since OLAP servers are based on multidimensional view of data, we will
discuss OLAP operations in multidimensional data.
Roll-up
Drill-down
Pivot (rotate)
Roll-up
Roll-up performs aggregation on a data cube in any of the following ways −
By dimension reduction
When roll-up is performed, one or more dimensions from the data cube
are removed.
Slice
The slice operation selects one particular dimension from a given cube and
provides a new sub-cube. Consider the following diagram that shows how
slice works.
Here Slice is performed for the dimension "time" using the criterion time
= "Q1".
Dice
Dice selects two or more dimensions from a given cube and provides a new
sub-cube. Consider the following diagram that shows the dice operation.
The dice operation on the cube based on the following selection criteria
involves three dimensions.
c)
Fact Table- Fact table contains the measurement along the attributes of a dimension table.
Dimension Table- Dimension table contains the attributes along which fact table calculates the metric.
Q5)
a)
b)
Method 1:
Simple matching – The dissimilarity between two objects i and j can be computed based on the ratio of
mismatches:
d(i, j) = (p − m) / p
where p is the total number of variables and m is the number of matches (i.e., the number of variables
for which i and j are in the same state). Weights can be assigned to increase the effect of m or to
assign greater weight to the matches in variables having a larger number of states.
Method 2: Using binary variables – create a new asymmetric binary variable for each of the nominal states.
– For an object with a given state value, the binary variable representing that state is set to 1, while the
remaining binary variables are set to 0.
– For example, to encode the categorical variable map _color, a binary variable can be created for each
of the five colors listed above.
– For an object having the color yellow, the yellow variable is set to 1, while the remaining four
variables are set to 0.
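Both methods can be sketched in a few lines. The map_color states follow the description above; the two example objects and their attribute values are made up for illustration.

```python
# Method 1: simple matching dissimilarity d(i, j) = (p - m) / p.
def simple_matching_dissimilarity(obj_i, obj_j):
    p = len(obj_i)                                   # total number of variables
    m = sum(1 for a, b in zip(obj_i, obj_j) if a == b)  # number of matches
    return (p - m) / p

# Method 2: one asymmetric binary variable per nominal state of map_color.
colors = ["red", "yellow", "green", "pink", "blue"]

def one_hot(value):
    # the variable for the object's state is 1, the rest are 0
    return [1 if c == value else 0 for c in colors]

# Two hypothetical objects agreeing on one of two nominal variables:
d = simple_matching_dissimilarity(["yellow", "A"], ["yellow", "B"])
print(d)                  # 0.5: one match out of two variables
print(one_hot("yellow"))  # [0, 1, 0, 0, 0]
```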
c)
a)
As you may have noted, these matrices representing the term frequencies
tend to be very sparse (with the majority of entries being zero), which is why
such matrices are commonly stored as sparse matrices.
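A minimal sketch of such a sparse representation, using a hypothetical two-document corpus and a plain dictionary rather than any particular library:

```python
# Term-frequency matrix stored sparsely: only non-zero counts are kept.
docs = ["data mining mines data", "warehouse stores data"]

# Build {doc_index: {term: count}} - a dict-of-dicts sparse matrix.
sparse_tf = {}
for i, doc in enumerate(docs):
    row = {}
    for term in doc.split():
        row[term] = row.get(term, 0) + 1
    sparse_tf[i] = row

print(sparse_tf[0]["data"])         # 2
print("warehouse" in sparse_tf[0])  # False: zero entries are simply absent
```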
b)
Step 1: if we replace each value for test-2 by its rank, the four objects are assigned the ranks 3, 1, 2, and
3, respectively.
Step 2: normalizes the ranking by mapping rank 1 to 0.0, rank 2 to 0.5, and rank 3 to 1.0.
Step 3: we can use, say, the Euclidean distance, which results in the following dissimilarity matrix:
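The three steps can be reproduced directly; the ranks 3, 1, 2, 3 are the ones assigned above, and in one dimension the Euclidean distance reduces to an absolute difference.

```python
# Step 1: the four objects' test-2 values replaced by their ranks.
ranks = [3, 1, 2, 3]
M = max(ranks)  # number of ordinal states (here 3)

# Step 2: normalize rank r onto [0, 1] via (r - 1) / (M - 1),
# so rank 1 -> 0.0, rank 2 -> 0.5, rank 3 -> 1.0.
z = [(r - 1) / (M - 1) for r in ranks]
print(z)  # [1.0, 0.0, 0.5, 1.0]

# Step 3: pairwise Euclidean distance (1-D, so just |a - b|).
dist = [[abs(a - b) for b in z] for a in z]
print(dist[1][0])  # 1.0: object 2 vs object 1
```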
c)
Data matrix
o n data points with p dimensions
o Two modes
Dissimilarity matrix
o n data points, but registers only the distance
o A triangular matrix
o Single mode
Total No. of Questions : 8]  SEAT No. :
P3337  [5461]-597  [Total No. of Pages : 3
B.E. (Computer Engineering)
DATA MINING AND WAREHOUSING
(2015 Course) (Semester - I) (End Sem.) (410244D)
Time : 2½ Hours]  [Max. Marks : 70
Instructions to the candidates:
1) Answer Q1 or Q2, Q3 or Q4, Q5 or Q6, Q7 or Q8.
2) Assume suitable data if necessary.
Q1) a) For the given attribute AGE values : 16, 16, 180, 4, 12, 24, 26, 28, apply
following Binning technique for smoothing the noise. [6]
i) Bin Medians
ii) Bin Boundaries
iii) Bin Means
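A hedged worked sketch of the three techniques, assuming two equal-frequency bins of depth 4; the bin depth is not fixed by the question, so other partitions are equally valid.

```python
# Equal-frequency binning of the AGE values, then three smoothing rules.
ages = [16, 16, 180, 4, 12, 24, 26, 28]
ages.sort()                      # [4, 12, 16, 16, 24, 26, 28, 180]
bins = [ages[:4], ages[4:]]      # assumed bin depth of 4

def by_means(b):
    m = sum(b) / len(b)
    return [m] * len(b)          # every value replaced by the bin mean

def by_medians(b):
    s = sorted(b)
    mid = (s[len(s) // 2 - 1] + s[len(s) // 2]) / 2  # even-sized bins
    return [mid] * len(b)        # every value replaced by the bin median

def by_boundaries(b):
    lo, hi = b[0], b[-1]
    # each value snaps to the nearer of the two bin boundaries
    return [lo if v - lo <= hi - v else hi for v in b]

print([by_means(b) for b in bins])       # [[12.0]*4, [64.5]*4]
print([by_medians(b) for b in bins])     # [[14.0]*4, [27.0]*4]
print([by_boundaries(b) for b in bins])  # [[4, 16, 16, 16], [24, 24, 24, 180]]
```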
c) Calculate the Jaccard coefficient between Ram and Hari assuming that
all binary attributes are asymmetric and for each pair of values for an
Object Gender Food Caste Education Hobby Job
OR
i) Ordinal
ii) Binary
iii) Nominal
c) Calculate the Euclidean distance matrix for given Data points. [8]
point x y
p1 0 2
p2 2 0
p3 3 1
p4 5 1
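The required matrix can be computed mechanically from the four points:

```python
# Pairwise Euclidean distances for p1..p4, rounded to two decimals.
from math import sqrt

points = {"p1": (0, 2), "p2": (2, 0), "p3": (3, 1), "p4": (5, 1)}
names = list(points)

def euclidean(a, b):
    return sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

matrix = [[round(euclidean(points[r], points[c]), 2) for c in names]
          for r in names]
for name, row in zip(names, matrix):
    print(name, row)
# e.g. d(p3, p4) = sqrt((5-3)^2 + (1-1)^2) = 2.0
```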
Q3) a) A database has 6 transactions. Let minimum support = 60% and Minimum
confidence = 70% [8]
Transaction ID Items Bought
T1 {A, B, C, E}
T2 {A, C, D, E}
T3 {B, C, E}
T4 {A, C, D, E}
T5 {C, D, E}
T6 {A, D, E}
i) Find Closed frequent Itemsets
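One way to work part (i): count the support of every candidate itemset (minimum support 60% of 6 transactions means a count of at least 4), then keep the frequent itemsets that have no proper superset with the same support. A brute-force sketch:

```python
# Closed frequent itemsets for the six transactions above.
from itertools import combinations

T = [set("ABCE"), set("ACDE"), set("BCE"), set("ACDE"), set("CDE"), set("ADE")]
min_count = 4  # 60% of 6 transactions, rounded up

items = sorted(set().union(*T))
support = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        s = sum(1 for t in T if set(cand) <= t)
        if s >= min_count:
            support[frozenset(cand)] = s  # keep only frequent itemsets

# Closed: frequent, with no proper superset having the same support.
closed = [fs for fs, s in support.items()
          if not any(fs < g and support[g] == s for g in support)]

print(sorted("".join(sorted(fs)) for fs in support))  # frequent itemsets
print(sorted("".join(sorted(fs)) for fs in closed))   # closed frequent itemsets
```

Under this counting the frequent itemsets are A, C, D, E, AE, CE and DE, of which E, AE, CE and DE are closed.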
b) Explain with example Multi level and Constraint based association Rule
mining. [5]
OR
Q4) a) Consider the Market basket transactions shown below. Assuming the
i) Minimum Support
iii) Support
iv) Confidence
Q5) a) Explain the training and testing phase using Decision Tree in detail.
Support your answer with relevant example. [8]
b) Apply KNN algorithm to find class of new tissue paper (X1 = 3,
X2 = 7). Assume K = 3 [5]
X1 = Acid Durability (secs) X2 = Strength (kg/sq.meter) Y = Classification
7 7 Bad
7 4 Bad
3 4 Good
1 4 Good
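A worked sketch of the KNN classification for the query (X1 = 3, X2 = 7) with K = 3; Euclidean distance is assumed here, since the question does not fix the distance measure.

```python
# 3-nearest-neighbour classification of the new tissue paper sample.
from math import sqrt
from collections import Counter

train = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
query, k = (3, 7), 3

# Distance from the query to every training sample, nearest first.
dists = sorted((sqrt((x1 - query[0]) ** 2 + (x2 - query[1]) ** 2), label)
               for (x1, x2), label in train)

# Majority vote among the k nearest neighbours.
votes = Counter(label for _, label in dists[:k])
print(dists[:k])                   # (3.0, Good), (~3.61, Good), (4.0, Bad)
print(votes.most_common(1)[0][0])  # Good
```

The three nearest samples are (3, 4) Good, (1, 4) Good and (7, 7) Bad, so the majority vote classifies the new sample as Good.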
OR
Q6) a) What is Bayesian Belief Network? Elaborate the training process of a
b) Explain K-nearest neighbor classifier algorithm with suitable application.
[5]
c) Elaborate on Associative Classification with appropriate applications. [4]
i) Specificity
ii) Sensitivity
c) Differentiate between Wholistic learning and Multi perspective learning. [4]
OR
detail. [8]
i) Recall
ii) Precision
Subject – 5
Books:
Text:
Jochen Schiller, “Mobile Communications”, Pearson Education, Second Edition, 2004,
ISBN: 13: 978-8131724262.
Jason Yi-Bing Lin, Yi-Bing Lin, Imrich Chlamtac, “Wireless and Mobile network
Architecture”, 2005, Wiley Publication, ISBN: 978812651560.
Martin Sauter, “3G, 4G and Beyond: Bringing Networks, Devices and the Web Together”,
2012, ISBN-13: 978-1118341483
References:
Class Test (CT) [20 marks]:- Three class tests, 20 marks each, will be conducted in a semester and
out of these three, the average of best two will be selected for calculation of class test marks.
Format of question paper is same as university.
TA [5 marks]: Three/four assignments will be conducted in the semester. Teacher assessment will
be calculated on the basis of performance in assignments, class test and pre-university test.
Attendance (AT) [5 marks]: Attendance marks will be given as per university policy.
Paper pattern and marks distribution for PUT: Same as End semester exam.
Paper pattern and marks distribution for Prelim Exam: Same as End Semester Exam.
End Semester Examination [70 Marks]:Paper pattern and marks distribution for End Semester
Exam: As per university guidelines.
Lecture Plan
Mobile Communication
Course Delivery:
The course will be delivered through lectures, assignment/tutorial sessions, classroom
interaction, and presentations.
Course Objectives:
To understand the Personal Communication Services.
To learn the design parameters for setting up mobile network.
To know GSM architecture and support services.
To learn current technologies being used in the field.
Course Outcomes:
On completion of the course, student will be able to–
CO1: Justify the Mobile Network performance parameters and design decisions.
CO2: Choose the modulation technique for setting up mobile network.
CO3: Formulate GSM/CDMA mobile network layout considering futuristic requirements which
conforms to the technology.
CO4: Use the 3G/4G technology based network with bandwidth capacity planning.
CO5: Perceive the requirements of next generation mobile networks and mobile applications.
CO-PO Mapping:
Course Outcomes PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1 1 3 1 1 1 - 1 - - - 1 1
CO2 1 2 - 3 3 - - - - - 2 2
CO3 1 2 2 3 2 - - - - - 1 2
CO4 1 1 - 1 3 - 1 1
CO5 1 3 - 2 2 2 2 3
Justification of CO - PO Mapping
Unit-IV: GSM
Q.1 Explain Incoming and Outgoing Call setup?
Q.2 Draw & Explain GPRS Architecture?
Q.3 Draw & Explain GSM Architecture?
Q.4 Short Note on GSM Bursts & GSM Frame?
Q.5 Explain Physical and Logical Traffic?
Unit-V: Current 3G and 4G Technologies for GSM and CDMA
Q.1 Explain 1xRTT, EV-DO?
Q.2 Explain High Speed Packet Access?
Q.3 Draw & Explain W-CDMA Architecture?
Q.4 Short Note on
1. HSDPA
2. HSUPA
3. HSPA+
Q.5 Explain Long Term Evolution (LTE) in 4G?
Solution
A personal communications service (PCS) is a type of wireless mobile service with advanced
coverage that delivers services at a more personal level. It generally refers to modern
mobile communication that extends the capabilities of conventional cellular networks and fixed-line
telephony networks as well.
PCS is also known as digital cellular.
A PCS works similarly to a cellular network in basic operations, but requires more service-provider
infrastructure to cover a wider geographical area.
PCS has three broad categories: narrowband, broadband and unlicensed. TDMA, CDMA and GSM,
and 2G, 3G and 4G are some of the common technologies that are used to deliver a PCS.
The permanent data associated with the mobile are those that do not change as it moves
from one area to another. On the other hand, temporary data changes from call to call. The HLR
interacts with MSCs mainly for the procedures of interrogation for routing calls to a MS and to
transfer charging information after call termination. Location registration is performed by HLR.
When the subscriber changes the VLR area, the HLR is informed about the address of the actual
VLR. The HLR updates the new VLR with all relevant subscriber data. Similarly, location
canceling is done by HLR. After the subscriber roams to a different VLR area, the HLR updates the
new VLR with all the relevant subscriber data. Supplementary services are add-ons to the basic
service. These parameters need not all be stored in the HLR. However, it is safer to store all
subscription parameters in the HLR even when some are stored in a subscriber card. The data stored
in the HLR is changed only by MMI action when new subscribers are added, old subscribers are
deleted, or the specific services to which they subscribe are changed and not dynamically updated
by the system.
b) VLR (Visitor location register):- A MS roaming in an MSC area is controlled by the VLR
responsible for that area. When a MS appears in a LA, it starts a registration procedure. The MSC
for that area notices this registration and transfers to the VLR the identity of the LA where the MS
is situated. A VLR may be in charge of one or several MSC LAs. The VLR constitutes the database
that supports the MSC in the storage and retrieval of the data of subscribers present in its area.
When an MS enters the MSC area borders, it signals its arrival to the MSC that stores its identity in
the VLR. The information necessary to manage the MS is contained in the HLR and is transferred
to the VLR so that they can be easily retrieved if so required.
The location registration procedure allows the subscriber data to follow the movements of the MS.
For such reasons the data contained in the VLR and in the HLR are more or less the same.
Nevertheless, the data are present in the VLR only as long as the MS is registered in the area related
to that VLR. The terms permanent and temporary, in this case, are meaningful only during that time
interval when the mobile is in the area of the local MSC/VLR combination. The data contained in the
VLR can be compared with the subscriber-related data contained in a normal fixed exchange; the
location information can be compared with the line equipment reference attached to each fixed
subscriber connected to that exchange. The VLR is responsible for assigning a new TMSI number
to the subscriber. It also relays the ciphering key from HLR to BSS.
Cells in the PLMN are grouped into geographic areas, and each is assigned a LAI, as shown in
Figure 2.2(c). Each VLR controls a certain set of LAs. When a mobile subscriber roams from one
LA to another, their current location is automatically updated in their VLR. If the old and new LAs
are under the control of two different VLRs, the entry on the old VLR is deleted and an entry is
created in the new VLR by copying the basic data from the HLR. The subscriber's current VLR
address, stored at the HLR, is also updated. This provides the information necessary to complete
calls to roaming mobiles. The VLR supports a mobile paging and tracking subsystem in the local
area where the mobile is presently roaming. The detailed functions of VLR are as follows.
Works with the HLR and AUC on authentication;
Relays cipher key from HLR to BSS for encryption/decryption;
Controls allocation of new TMSI numbers; a subscriber's TMSI number can be periodically
changed to secure a subscriber's identity;
Supports paging;
Tracks state of all MSs in its area.
Different cellular standards handle handover / handoff in slightly different ways; for the
sake of explanation, the way GSM handles handover is described here.
A number of parameters need to be known to determine whether a handover is
required: the signal strength of the base station with which communication is being made, along
with the signal strengths of the surrounding stations. Additionally, the availability of channels also
needs to be known. The mobile is obviously best suited to monitor the strength of the base stations,
but only the cellular network knows the status of channel availability, and the network makes the
decision about when the handover is to take place and to which channel of which cell.
Types of handover / handoff
With the advent of CDMA systems, where the same channels can be used by several mobiles and
where it is possible for adjacent cells or cell sectors to use the same frequency channel, there are a
number of different types of handover that can be performed:
Hard handover (hard handoff)
Soft handover (soft handoff)
Fig:-Types of Handover
Hard handover
The definition of a hard handover or handoff is one where an existing connection must be broken
before the new one is established. One example of hard handover is when frequencies are changed.
As the mobile will normally only be able to transmit on one frequency at a time, the connection
must be broken before it can move to the new channel where the connection is re-established. This
is often termed an inter-frequency hard handover. While this is the most common form of hard
handoff, it is not the only one. It is also possible to have intra-frequency hard handovers where the
frequency channel remains the same.
Although there is generally a short break in transmission, this is normally short enough not to be
noticed by the user.
Soft handover
The new 3G technologies use CDMA where it is possible to have neighboring cells on the same fre-
quency and this opens the possibility of having a form of handover or handoff where it is not neces-
sary to break the connection. This is called soft handover or soft handoff, and it is defined as a han-
dover where a new connection is established before the old one is released. In UMTS most of the
handovers that are performed are intra-frequency soft handovers.
Equation 1
S = kN
where k is the number of channels allocated to each cell, so that S is the total number of duplex
channels available in a cluster. The N cells which collectively use the complete set of available
frequencies are called a cluster. If a cluster is replicated M times within the system, the total number
of duplex channels, C, can be used as a measure of capacity and is given by
Equation 2
C = MkN = MS
As seen from equation 2, the capacity of a cellular system is directly proportional to the number of
times a cluster is replicated in a fixed service area.
The factor N is called the cluster size and is typically equal to 4, 7, or 12; it describes the
number of cells in the cluster.
So there are two ways to increase the system capacity.
1. Increase cluster size N.
2. Increase the number of allocated channels to each cell.
A large cluster size indicates that the ratio between the cell radius and the distance between co-
channel cells is small.
And a small cluster size indicates that co-channel cells are located much closer together.
The value for N is a function of how much interference a mobile or base station can tolerate while
maintaining a sufficient quality of communications. From a design point of view, the smallest
possible value of N is desirable in order to maximize capacity over a given coverage area (i.e., to
maximize C in Equation 2)).
The frequency reuse factor of a cellular system is given by 1/N, since each cell within a cluster is
only assigned 1/N of the total available channels in the system.
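Plugging illustrative numbers (hypothetical, not from any standard) into Equations 1 and 2:

```python
# Capacity of a cellular system from S = kN and C = MkN = MS.
k = 10   # channels allocated to each cell (assumed)
N = 7    # cluster size: cells per cluster (a typical value)
M = 15   # times the cluster is replicated over the service area (assumed)

S = k * N  # Equation 1: duplex channels per cluster
C = M * S  # Equation 2: total system capacity

print(S, C, 1 / N)  # 70 channels per cluster, 1050 in total, reuse factor 1/7
```

Doubling M (replicating the cluster twice as often over the same area, i.e. smaller cells) doubles C, which is exactly the motivation for cell splitting discussed below.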
Because a hexagon has exactly six equidistant neighbors, and the lines
joining the centers of any cell and each of its neighbors are separated by multiples of 60 degrees,
only certain cluster sizes and cell layouts are possible. In order to connect without
gaps between adjacent cells, the geometry of hexagons is such that the number of cells per cluster,
N, can only have values which satisfy N = i² + ij + j², where i and j are non-negative integers.
But now, as time passed by, the number of mobile users in the same area increased from 100 to 700.
Now if the same BS has to connect to these 700 users’ MS, obviously the BS will be overloaded. A
single BS, which served for 100 users is forced to serve for 700 users, which is impractical. To
reduce the load of this BS, we can use cell splitting. That is, we divide the above single cell
into 7 separate adjacent cells, each having its own BS, as shown in the figure.
Now, let us look at the big picture. Until now, we have discussed cell splitting in a
small area. Now, we use the same concept to deal with large networks. In a large network, it is not
necessary to split all the cells in all the clusters. Certain overloaded BSs can handle their traffic
well only if their cells (coverage areas) are split up; only such cells are ideal candidates for cell
splitting. Fig 2-3 shows a network architecture with a few cells split into smaller cells, without
affecting the other cells in the network.
Solution
FDMA:
Frequency division multiplexing (FDM) describes schemes to subdivide the frequency dimension
into several non-overlapping frequency bands.
Frequency Division Multiple Access is a method employed to permit several users to transmit
simultaneously on one satellite transponder by assigning a specific frequency within the channel to
each user. Each conversation gets its own, unique, radio channel. The channels are relatively
narrow, usually 30 kHz or less, and are defined as either transmit or receive channels. A full duplex
conversation requires a transmit & receive channel pair. FDM is often used for simultaneous access
to the medium by base station and mobile station in cellular networks establishing a duplex channel.
In a scheme called frequency division duplexing (FDD), the two directions, mobile station to
base station and vice versa, are separated using different frequencies.
TDMA:
A more flexible multiplexing scheme for typical mobile communications is time division
multiplexing (TDM). Compared to FDMA, time division multiple access (TDMA) offers a much
more flexible scheme, which comprises all technologies that allocate certain time slots for
communication. Now synchronization between sender and receiver has to be achieved in the time
domain. Again this can be done by using a fixed pattern similar to FDMA techniques, i.e.,
allocating a certain time slot for a channel, or by using a dynamic allocation scheme.
Listening to different frequencies at the same time is quite difficult, but listening to many channels
separated in time at the same frequency is simple. Fixed schemes do not need identification, but are
not as flexible considering varying bandwidth requirements.
Fixed TDM :-
The simplest algorithm for using TDM is allocating time slots for channels in a fixed pattern. This
results in a fixed bandwidth and is the typical solution for wireless phone systems. MAC is quite
simple, as the only crucial factor is accessing the reserved time slot at the right moment. If this
synchronization is assured, each mobile station knows its turn and no interference will happen. The
fixed pattern can be assigned by the base station, where competition between different mobile
stations that want to access the medium is solved.
The above figure shows how these fixed TDM patterns are used to implement multiple access and a
duplex channel between a base station and mobile station. Assigning different slots for uplink and
downlink using the same frequency is called time division duplex (TDD). As shown in the figure,
the base station uses one out of 12 slots for the downlink, whereas the mobile station uses one out of
12 different slots for the uplink. Uplink and downlink are separated in time. Up to 12 different
mobile stations can use the same frequency without interference using this scheme. Each connection
is allotted its own up- and downlink pair. This general scheme still wastes a lot of bandwidth. It is
too static, too inflexible for data communication. In this case, connectionless, demand-oriented
TDMA schemes can be used
Classical Aloha :-
In this scheme, TDM is applied without controlling medium access. Here each station can access
the medium at any time as shown below:
This is a random access scheme, without a central arbiter controlling access and without
coordination among the stations. If two or more stations access the medium at the same time, a
collision occurs and the transmitted data is destroyed. Resolving this problem is left to higher layers
(e.g., retransmission of data). The simple Aloha works fine for a light load and does not require any
complicated access mechanisms.
Slotted Aloha:-
The first refinement of the classical Aloha scheme is provided by the introduction of time slots
(slotted Aloha). In this case, all senders have to be synchronized, transmission can only start at the
beginning of a time slot as shown below.
The introduction of slots raises the throughput from 18 per cent to 36 per cent, i.e., slotting doubles
the throughput. Both basic Aloha principles occur in many systems that implement distributed
access to a medium. Aloha systems work perfectly well under a light load, but they cannot give any
hard transmission guarantees, such as maximum delay before accessing the medium or minimum
throughput.
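The 18 and 36 per cent figures follow from the standard throughput formulas S = G·e^(−2G) for pure Aloha and S = G·e^(−G) for slotted Aloha, where G is the offered load; a quick numerical check of the maxima:

```python
# Maximum throughput of pure vs slotted Aloha.
from math import exp

def pure_aloha(G):
    return G * exp(-2 * G)     # vulnerable period is two frame times

def slotted_aloha(G):
    return G * exp(-G)         # slotting halves the vulnerable period

print(round(pure_aloha(0.5), 3))     # 0.184 -> ~18% at G = 0.5
print(round(slotted_aloha(1.0), 3))  # 0.368 -> ~37% at G = 1.0
```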
CDMA:
Code division multiple access systems apply codes with certain characteristics to the transmission
to separate different users in code space and to enable access to a shared medium without
interference.
All terminals send on the same frequency, possibly at the same time, and can use the whole
bandwidth of the transmission channel. Each sender has a unique pseudo-random number; the
sender XORs the signal with this number. The receiver can “tune” into this signal if it knows
the pseudo-random number; tuning is done via a correlation function.
Disadvantages:
higher complexity of a receiver (receiver cannot just listen into the medium and start
receiving if there is a signal)
all signals should have the same strength at a receiver
Advantages:
all terminals can use the same frequency, no planning needed
huge code space (e.g. 2^32) compared to frequency space
interference (e.g. white noise) is not coded
forward error correction and encryption can be easily integrated
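A toy sketch of the XOR spreading and despreading described above; the chip code and bit values are made up, and real systems use much longer codes with correlation over analog signal strengths rather than exact chip matches.

```python
# XOR spreading with a per-sender pseudo-random chip code.
code = [1, 0, 1, 1, 0, 1]  # hypothetical chipping sequence

def spread(bit, code):
    # the sender XORs each data bit with every chip of its code
    return [bit ^ c for c in code]

def despread(chips, code):
    # with the right code, every recovered chip equals the original bit
    recovered = [ch ^ c for ch, c in zip(chips, code)]
    return round(sum(recovered) / len(recovered))  # majority over chips

tx = spread(1, code)
print(tx)                  # [0, 1, 0, 0, 1, 0]
print(despread(tx, code))  # 1: the correct code recovers the bit
print(despread(tx, [0, 0, 1, 0, 1, 1]))  # wrong code yields noise-like chips
```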
Mobile Station – These are the users. A number of users are controlled by one BTS.
1. The mobile stations (MS) communicate with the base station subsystem over the radio interface.
2. The BSS, also called the radio subsystem, provides and manages the radio transmission path between the
mobile stations and the Mobile Switching Centre (MSC). It also manages the radio interface between the
mobile stations and other subsystems of GSM.
3. Each BSS comprises many Base Station Controllers (BSC) that connect the mobile station to the network
and switching subsystem (NSS) through the mobile switching center.
4. The NSS controls the switching functions of the GSM system. It allows the mobile switching center to
communicate with networks like PSTN, ISDN, CSPDN, PSPDN and other data networks.
5. The operation support system (OSS) allows the operation and maintenance of the GSM system. It allows
the system engineers to diagnose, troubleshoot and observe the parameters of the GSM system. The OSS
subsystem interacts with the other subsystems and is provided for the GSM operating company staff that
provides service facilities for the network.
Base Station Subsystem (BSS) – comprises two parts:
1. Base Transceiver Station (BTS)
2. Base Station Controller (BSC)
The BSS consists of many BSCs that connect to a single MSC. Each BSC controls up to several
hundred BTSs.
Base Transceiver Station (BTS) – It has the radio transceivers that define a cell and is capable of
handling radio link protocols with the MS.
Functions of the BTS are:
1. Handling radio link protocols
2. Providing full-duplex communication to the MS
3. Interleaving and de-interleaving
Base Station Controller (BSC) – It manages radio resources for one or more BTSs. It controls
several hundred BTSs, all connected to a single MSC.
Functions of the BSC are:
• To control the BTSs
• Radio resource management
• Handoff management and control
• Radio channel setup and frequency hopping
Network Subsystem (NSS)
1. It handles the switching of GSM calls between external networks and the internal BSCs.
2. It includes three different databases for mobility management:
A. HLR (Home Location Register)
B. VLR (Visitor Location Register)
C. AuC (Authentication Center)
Mobile Switching Center (MSC) – It connects to fixed networks like ISDN, PSTN, etc.
Following are the functions of the MSC:
1. Call setup, supervision and release
2. Collection of billing information
3. Call handling / routing
4. Management of signaling protocol
5. Record of VLR and HLR
HLR (Home Location Register) – Call roaming and call routing capabilities of GSM are
handled here. It stores all the administrative information of subscribers registered in the network. It
maintains the unique International Mobile Subscriber Identity (IMSI).
VLR (Visitor Location Register) – It is a temporary database. It stores the IMSI and
customer information for each roaming customer visiting a specific MSC.
Authentication Center – It is a protected database. It maintains authentication keys and algorithms.
It contains a register called the Equipment Identity Register.
Operation Subsystem (OSS) – It manages all mobile equipment in the system:
1) Management of charging and billing procedures
2) Maintenance of all hardware and network operations
AuC:
The AuC database holds different algorithms that are used for authentication and encryption of the
mobile subscribers, verifying the mobile user’s identity and ensuring the confidentiality of each call.
The AuC holds the authentication and encryption keys for all the subscribers in both the home and
visitor location registers.
EIR:
The EIR is another database that keeps information about the identity of mobile equipment, such as
the International Mobile Equipment Identity (IMEI), which reveals details about the manufacturer,
country of production, and device type. This information is used to prevent calls from being
misused, to bar unauthorized or defective MSs, to report stolen mobile phones, or to check if the
mobile phone is operating according to the specification of its type.
White list:
This list contains the IMEIs of the phones that are allowed to enter the network.
Black list:
This list, on the contrary, contains the IMEIs of the phones that are not allowed
to enter the network, for example because they are stolen.
Grey list:
This list contains the IMEIs of the phones momentarily not allowed to enter the
network, for example because the software version is too old or because they are in
repair.
Ans:
The information contained in one time slot on the TDMA frame is called a burst.
Five types of burst
1) Normal Burst (NB)
2) Frequency Correction Burst (FB)
3) Synchronization Burst (SB)
4) Access Burst (AB)
5) Dummy Burst
Assignment No.-03 (AY 2018-19 SEM-II)
Unit-V: Current 3G and 4G Technologies for GSM and CDMA &
Unit–VI: Advances in Mobile Technologies
Solution
UMTS HSPA, High Speed Packet Access, combines HSDPA and HSUPA for uplink and
downlink to provide high speed data access.
3G HSPA, High Speed Packet Access, is the combination of two technologies, one for the
downlink and the other for the uplink, that can be built onto the existing 3G UMTS or W-
CDMA technology to provide increased data transfer speeds.
The original 3G UMTS / W-CDMA standard provided a maximum download speed of 384
kbps.
With many users requiring much higher data transfer speeds to compete with fixed-line
broadband services, and also to support services that require higher data rates, an
increase in the obtainable speeds became necessary.
This resulted in the development of the technologies for 3G HSPA.
3G HSPA benefits
The UMTS cellular system as defined under the 3GPP Release 99 standard was orientated
more towards switched circuit operation and was not well suited to packet operation.
Additionally greater speeds were required by users than could be provided with the original
UMTS networks. Accordingly the changes required for HSPA were incorporated into many
UMTS networks to enable them to operate more in the manner required for current
applications.
HSPA provides a number of significant benefits that enable the new service to provide a far
better performance for the user. While 3G UMTS HSPA offers higher data transfer rates, this
is not the only benefit, as the system offers many other improvements as well:
1. Use of higher order modulation: 16QAM is used in the downlink instead of QPSK to
enable data to be transmitted at a higher rate. This provides for maximum data rates of 14
Mbps in the downlink. QPSK is still used in the uplink, where data rates of up to 5.8 Mbps are
achieved. The rates quoted are raw data rates and do not include the reduction in actual
payload throughput caused by protocol overheads.
2. Shorter Transmission Time Interval (TTI): The use of a shorter TTI reduces the round trip time
and enables improvements in adapting to fast channel variations and provides for reductions in latency.
3. Use of shared channel transmission: Sharing the resources enables greater levels of efficiency to be
achieved and integrates with IP and packet data concepts.
4. Use of link adaptation: By adapting the link it is possible to maximize the channel usage.
5. Fast Node B scheduling: The use of fast scheduling with adaptive coding and modulation (only
downlink) enables the system to respond to the varying radio channel and interference conditions and
to accommodate data traffic which tends to be "bursty" in nature.
6. Node B based Hybrid ARQ: This enables 3G HSPA to provide reduced retransmission
round trip times and it adds robustness to the system by allowing soft combining of
retransmissions.
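The rate gain from the higher-order modulation in point 1 can be shown with a small sketch: the number of bits carried per symbol is log2 of the constellation size, so 16QAM carries twice the bits of QPSK per symbol. This is only one ingredient of the quoted 14 Mbps downlink figure; coding rate and channelisation are deliberately ignored here:

```python
import math

# Bits carried per symbol for the modulation schemes named in the text.
# The 2x factor of 16QAM over QPSK is one ingredient of the higher HSDPA
# downlink rate; coding rate and channelisation are ignored in this sketch.

def bits_per_symbol(constellation_points: int) -> int:
    """Number of bits encoded by one symbol of an M-ary constellation."""
    return int(math.log2(constellation_points))

qpsk = bits_per_symbol(4)    # QPSK  -> 2 bits/symbol (uplink)
qam16 = bits_per_symbol(16)  # 16QAM -> 4 bits/symbol (downlink)
print(f"QPSK: {qpsk} bits/symbol, 16QAM: {qam16} bits/symbol, gain x{qam16 // qpsk}")
```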
For the network operator, the introduction of 3G HSPA technology brings a cost reduction
per bit carried as well as an increase in system capacity. With the increase in data traffic, and
operators looking to bring in increased revenue from data transmission, this is a particularly
attractive proposition. A further advantage of the introduction of 3G HSPA is that it can
often be rolled out by incorporating a software update into the system. This means its use
brings significant benefits to user and operator alike.
The two technologies were released at different times through 3GPP. They also have
different properties resulting from the different modes of operation that are required. In view
of these facts they were often treated as almost separate entities. Now they are generally
rolled out together. The two technologies are summarised below:
HSDPA - High Speed Downlink Packet Access: HSDPA provides packet data
support, reduced delays, and a peak raw (i.e. over-the-air) data rate of 14 Mbps. It also
provides around three times the capacity of the 3G UMTS technology defined in
Release 99 of the 3GPP UMTS standard.
HSUPA - High Speed Uplink Packet Access: HSUPA provides improved uplink
packet support, reduced delays, and a peak raw data rate of 5.74 Mbps. This results
in a capacity increase of around twice that provided by the Release 99 services.
The high-level network architecture of LTE comprises the following three main components:
1. The User Equipment (UE).
2. The Evolved UMTS Terrestrial Radio Access Network (E-UTRAN).
3. The Evolved Packet Core (EPC).
The evolved packet core communicates with packet data networks in the outside world such
as the internet, private corporate networks or the IP multimedia subsystem. The interfaces
between the different parts of the system are denoted Uu, S1 and SGi.
The E-UTRAN handles the radio communications between the mobile and the evolved packet
core and has just one component, the evolved base station, called eNodeB or eNB. Each eNB
is a base station that controls the mobiles in one or more cells. The base station that is
communicating with a mobile is known as its serving eNB.
An LTE mobile communicates with just one base station and one cell at a time. The eNB
supports the following two main functions:
The eNB sends and receives radio transmissions to and from all its mobiles using the
analogue and digital signal processing functions of the LTE air interface.
The eNB controls the low-level operation of all its mobiles by sending them signalling
messages such as handover commands.
Each eNB connects with the EPC by means of the S1 interface and it can also be connected to nearby
base stations by the X2 interface, which is mainly used for signaling and packet forwarding during
handover.
A home eNB (HeNB) is a base station that has been purchased by a user to provide femtocell
coverage within the home. A home eNB belongs to a closed
subscriber group (CSG) and can only be accessed by mobiles with a USIM that also
belongs to the closed subscriber group.
The Evolved Packet Core (EPC) (The core network)
The architecture of the Evolved Packet Core (EPC) is illustrated below. A few more
components are not shown in the diagram, to keep it simple: the Earthquake and Tsunami
Warning System (ETWS), the Equipment Identity Register (EIR) and the Policy Control and
Charging Rules Function (PCRF).
Below is a brief description of each of the components shown in the above architecture:
The Home Subscriber Server (HSS) component has been carried forward from UMTS and
GSM and is a central database that contains information about all the network operator's
subscribers.
The Packet Data Network (PDN) Gateway (P-GW) communicates with the outside world, i.e.
packet data networks (PDNs), using the SGi interface. Each packet data network is identified
by an access point name (APN). The PDN gateway plays the same role as the gateway GPRS
support node (GGSN) and serving GPRS support node (SGSN) do in UMTS and GSM.
The serving gateway (S-GW) acts as a router, and forwards data between the base station
and the PDN gateway.
The mobility management entity (MME) controls the high-level operation of the mobile by
means of signalling messages and the Home Subscriber Server (HSS).
The Policy Control and Charging Rules Function (PCRF) is a component which is not
shown in the above diagram but it is responsible for policy
control decision-making, as well as for controlling the flow-based charging functionalities in
the Policy Control Enforcement Function (PCEF), which resides in the P-GW.
The interface between the serving and PDN gateways is known as S5/S8. This has two slightly
different implementations, namely S5 if the two devices are in the same network, and S8 if they are
in different networks.
A personal communications service (PCS) is a type of wireless mobile service with advanced
coverage and that delivers services at a more personal level. It generally refers to the modern
mobile communication that boosts the capabilities of conventional cellular networks and fixed-line
telephony networks as well.
PCS is also known as digital cellular.
A PCS works similarly to a cellular network in basic operation, but requires more
service-provider infrastructure to cover a wider geographical area.
PCS has three broad categories: narrowband, broadband and unlicensed. TDMA, CDMA and GSM,
and 2G, 3G and 4G are some of the common technologies that are used to deliver a PCS.
Q.2 Short note on Cell phone generation-(1G to 5G)?
Ans:
1G - First Generation
This was the first generation of cell phone technology. The very first commercial
cellular networks were introduced in the late 1970s, with fully implemented standards
established throughout the 1980s. In 1987, Telecom Australia (known today as Telstra)
launched the country's first cellular mobile phone network, using a 1G analog system.
1G is an analog technology; the phones generally had poor battery life, poor voice
quality and little security, and would sometimes experience dropped calls. These are the
analog telecommunications standards that were introduced in the 1980s and continued until
being replaced by 2G digital telecommunications. The maximum speed of 1G is 2.4 Kbps.
2G - Second Generation
Cell phones received their first major upgrade when they went from 1G to 2G. The main
difference between the two mobile telephone systems (1G and 2G) is that the radio signals
used by 1G networks are analog, while 2G networks are digital. The main motive of this
generation was to provide a secure and reliable communication channel. It implemented the
concepts of CDMA and GSM and provided small data services like SMS and MMS. Second
generation (2G) cellular telecom networks were commercially launched on the GSM standard
in Finland by Radiolinja (now part of Elisa Oyj) in 1991. 2G capabilities are achieved by
allowing multiple users on a single channel via multiplexing. With 2G, cellular phones came
to be used for data as well as voice. The advance from 1G to 2G introduced many of the
fundamental services that we still use today, such as SMS, international roaming,
conference calls, call hold and billing based on services, e.g. charges based on
long-distance calls and real-time billing. The max speed of 2G with General Packet Radio
Service (GPRS) is 50 Kbps, or 1 Mbps with Enhanced Data Rates for GSM Evolution (EDGE).
Before the major leap from 2G to 3G wireless networks, the lesser-known 2.5G and 2.75G
were interim standards that bridged the gap.
3G - Third Generation
This generation set the standards for most of the wireless technology we have come to know and
love. Web browsing, email, video downloading, picture sharing and other Smartphone technology
were introduced in the third generation. Introduced commercially in 2001, the goals set out for third
generation mobile communication were to facilitate greater voice and data capacity, support a wider
range of applications, and increase data transmission at a lower cost .
The 3G standard utilises a new technology called UMTS (Universal Mobile
Telecommunications System) as its core network architecture. This network combines aspects
of the 2G network with new technology and protocols to deliver a significantly faster data
rate. 3G is based on a set of standards for mobile devices and mobile telecommunication
services and networks that comply with the International Mobile Telecommunications-2000
(IMT-2000) specifications of the International Telecommunication Union. One of the
requirements set by IMT-2000 was that speed should be at least 200 Kbps for a service to be
called 3G.
3G supports multimedia services, and streaming became more popular. In 3G, universal
access and portability across different device types (telephones, PDAs, etc.) are made possible. 3G
increased the efficiency of frequency spectrum by improving how audio is compressed during a
call, so more simultaneous calls can happen in the same frequency range. The UN's International
Telecommunications Union IMT-2000 standard requires stationary speeds of 2Mbps and mobile
speeds of 384kbps for a "true" 3G. The theoretical max speed for HSPA+ is 21.6 Mbps.
Like 2G, 3G evolved into 3.5G and 3.75G as more features were introduced in order to bring about
4G. A 3G phone cannot communicate through a 4G network , but newer generations of phones are
practically always designed to be backward compatible, so a 4G phone can communicate through a
3G or even 2G network .
4G - Fourth Generation
4G is a very different technology compared to 3G, and was made practical only by the
advancements in technology of the last 10 years. Its purpose is to provide high speed,
high quality and high capacity to users while improving security and lowering the cost of
voice and data services, multimedia and internet over IP. Potential and current applications include
amended mobile web access, IP telephony , gaming services, high-definition mobile TV, video
conferencing, 3D television, and cloud computing.
The key technologies that have made this possible are MIMO (Multiple Input Multiple Output) and
OFDM (Orthogonal Frequency Division Multiplexing). The two important 4G standards are
WiMAX (has now fizzled out) and LTE (has seen widespread deployment). LTE (Long Term
Evolution) is a series of upgrades to existing UMTS technology and will be rolled out on Telstra's
existing 1800MHz frequency band. The max speed of a 4G network when the device is moving is
100 Mbps or 1 Gbps for low mobility communication like when stationary or walking, latency
reduced from around 300 ms to less than 100 ms, and significantly lower congestion. When 4G first
became available, it was simply a little faster than 3G. 4G is not the same as 4G LTE, which is very
close to meeting the criteria of the standards. You can download a new game or stream a TV show
in HD without buffering.
Newer generations of phones are usually designed to be backward-compatible , so a 4G phone can
communicate through a 3G or even 2G network. All carriers seem to agree that OFDM is one of the
chief indicators that a service can be legitimately marketed as being 4G. OFDM is a type of digital
modulation in which a signal is split into several narrowband channels at different frequencies.
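The splitting of a signal across several narrowband subcarriers can be sketched in a few lines: place one symbol per subcarrier and combine them with an inverse DFT into a time-domain block, which the receiver undoes with a forward DFT. The subcarrier count and symbols below are arbitrary illustrative choices (real OFDM systems use hundreds to thousands of subcarriers, plus a cyclic prefix that is omitted here):

```python
import cmath

# Minimal OFDM sketch: QPSK symbols are placed on N narrowband subcarriers
# and turned into one time-domain block with an inverse DFT; the receiver
# recovers them with a forward DFT (ideal, noiseless channel assumed).

N = 8  # number of subcarriers (illustrative; real systems use far more)

def idft(symbols):
    """Inverse DFT: map one symbol per subcarrier to N time samples."""
    return [sum(s * cmath.exp(2j * cmath.pi * k * n / N)
                for k, s in enumerate(symbols)) / N
            for n in range(N)]

def dft(samples):
    """Forward DFT: recover the per-subcarrier symbols at the receiver."""
    return [sum(x * cmath.exp(-2j * cmath.pi * k * n / N)
                for n, x in enumerate(samples))
            for k in range(N)]

qpsk = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]
time_block = idft(qpsk)       # transmitted OFDM block
recovered = dft(time_block)   # received symbols
print([complex(round(s.real), round(s.imag)) for s in recovered])
```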
A significant amount of infrastructure change is needed from service providers in order to
supply 4G, because voice calls in GSM, UMTS and CDMA2000 are circuit-switched; with the
adoption of LTE, carriers have to re-engineer their voice call networks.
And again, we have the fractional parts: 4.5G and 4.9G marking the transition of LTE (in the stage
called LTE-Advanced Pro) getting us more MIMO, more D2D on the way to IMT-2020 and the
requirements of 5G .
5G - Fifth Generation
5G is a generation currently under development that is intended to improve on 4G. 5G promises
significantly faster data rates, higher connection density, much lower latency, among other
improvements. Some of the plans for 5G include device-to-device communication, better battery
consumption, and improved overall wireless coverage. The max speed of 5G is aimed at being as
fast as 35.46 Gbps , which is over 35 times faster than 4G.
Key technologies to look out for: Massive MIMO , Millimeter Wave Mobile Communications etc.
Massive MIMO, millimetre wave, small cells, Li-Fi: all the new technologies from the previous
decade could be used to give 10 Gb/s to a user, with unprecedentedly low latency, and allow
connections for at least 100 billion devices. Different estimates have been made for the date
of commercial introduction of 5G networks. The Next Generation Mobile Networks Alliance
feels that 5G should be rolled out by 2020 to meet business and consumer demands.
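The peak rates quoted across the generation sections above can be collected into a quick side-by-side comparison. The values are the figures given in the text (theoretical or target peaks, not measured throughputs), converted to Mbit/s:

```python
# Peak rates as quoted in the text for each generation, in Mbit/s.
# These are theoretical or target figures, not real-world throughputs.
peak_mbps = {
    "1G": 0.0024,           # 2.4 Kbps, analog
    "2G (GPRS)": 0.05,      # 50 Kbps
    "2G (EDGE)": 1.0,       # 1 Mbps
    "3G (HSPA+)": 21.6,     # theoretical max
    "4G (mobile)": 100.0,   # device in motion
    "4G (static)": 1000.0,  # stationary / walking
    "5G (target)": 35460.0, # 35.46 Gbps aim
}
for gen, rate in peak_mbps.items():
    print(f"{gen:12s} {rate:>10.4f} Mbit/s")

# The text's "over 35 times faster than 4G" claim checks out against the
# stationary 4G figure: 35460 / 1000 = 35.46.
print(peak_mbps["5G (target)"] / peak_mbps["4G (static)"])
```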
Fig: Types of Handover
Hard handover
The definition of a hard handover or handoff is one where an existing connection must be broken
before the new one is established. One example of hard handover is when frequencies are changed.
As the mobile will normally only be able to transmit on one frequency at a time, the connection
must be broken before it can move to the new channel where the connection is re-established. This
is often termed an inter-frequency hard handover. While this is the most common form of hard
handoff, it is not the only one. It is also possible to have intra-frequency hard handovers where the
frequency channel remains the same.
Although there is generally a short break in transmission, this is normally short enough not to be
noticed by the user.
Soft handover
The new 3G technologies use CDMA where it is possible to have neighbouring cells on the same
frequency and this opens the possibility of having a form of handover or handoff where it is not
necessary to break the connection. This is called soft handover or soft handoff, and it is defined as a
handover where a new connection is established before the old one is released. In UMTS most of
the handovers that are performed are intra-frequency soft handovers.
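A handover of either kind is typically triggered by comparing the signal strengths of the serving and neighbour cells with a hysteresis margin, so the mobile does not "ping-pong" between two cells of similar strength. The sketch below is a hypothetical illustration of that decision rule; the dBm values and the 3 dB margin are made-up figures, not from any standard:

```python
# Illustrative handover decision with a hysteresis margin: switch to a
# neighbour cell only when its signal exceeds the serving cell's by the
# margin. The 3 dB margin and the dBm values are illustrative assumptions.

HYSTERESIS_DB = 3.0

def should_hand_over(serving_dbm: float, neighbour_dbm: float) -> bool:
    """True when the neighbour is stronger than serving + hysteresis."""
    return neighbour_dbm > serving_dbm + HYSTERESIS_DB

print(should_hand_over(-95.0, -93.0))  # only 2 dB better: stay on serving cell
print(should_hand_over(-95.0, -90.0))  # 5 dB better: hand over
```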
Voice teleservices
Data bearer services
Supplementary service features
The MS Functions
The MS also provides the receptor for SMS messages, enabling the user to toggle between the voice
and data use. Moreover, the mobile facilitates access to voice messaging systems. The MS also
provides access to the various data services available in a GSM network. These data services
include:
X.25 packet switching through a synchronous or asynchronous dial-up connection to the
PAD at speeds typically at 9.6 Kbps.
General Packet Radio Service (GPRS) using either an X.25 or IP based data transfer
method at speeds up to 115 Kbps.
High speed, circuit switched data at speeds up to 64 Kbps.
2. SIM
A subscriber identity module or subscriber identification module (SIM), widely known as a SIM
card, is an integrated circuit that is intended to securely store the international mobile subscriber
identity (IMSI) number and its related key, which are used to identify and authenticate subscribers
on mobile telephony devices (such as mobile phones and computers). It is also possible to store
contact information on many SIM cards. SIM cards are always used on GSM phones; for CDMA
phones, they are only needed for newer LTE-capable handsets. SIM cards can also be used in
satellite phones, smart watches, computers, or cameras.
The SIM circuit is part of the function of a universal integrated circuit card (UICC) physical smart
card, which is usually made of PVC with embedded contacts and semiconductors. SIM cards are
transferable between different mobile devices. The first UICC smart cards were the size of credit
and bank cards; sizes were reduced several times over the years, usually keeping electrical contacts
the same, so that a larger card could be cut down to a smaller size.
A SIM card contains its unique serial number (ICCID), international mobile subscriber identity
(IMSI) number, security authentication and ciphering information, temporary information related to
the local network, a list of the services the user has access to, and two passwords: a personal
identification number (PIN) for ordinary use, and a personal unblocking code (PUC) for PIN
unlocking.
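The SIM contents listed above can be pictured as a record type. This is a hypothetical sketch: the field names and example values are my own illustration, not taken from any SIM specification:

```python
from dataclasses import dataclass

# Hypothetical record mirroring the SIM contents listed above (ICCID, IMSI,
# authentication key, PIN/PUC, service list). Field names and example
# values are illustrative assumptions, not from a specification.

@dataclass
class SimCard:
    iccid: str       # unique serial number of the card itself
    imsi: str        # international mobile subscriber identity
    ki: bytes        # authentication/ciphering key material held on the card
    pin: str         # personal identification number for ordinary use
    puc: str         # personal unblocking code for PIN unlocking
    services: tuple  # services the user has access to

sim = SimCard(
    iccid="8991000000000000000",  # example digits only
    imsi="404990000000000",
    ki=bytes(16),
    pin="1234",
    puc="12345678",
    services=("voice", "sms", "data"),
)
print(sim.imsi, sim.services)
```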
The SIM provides personal mobility so that the user can have access to all subscribed services
irrespective of both the location of the terminal and the use of a specific terminal. You need to
insert the SIM card into another GSM cellular phone to receive calls at that phone, make calls from
that phone, or receive other subscribed services.
3. Base Station
A base station is a fixed point of communication for customer cellular phones on a carrier network.
The base station is connected to an antenna (or multiple antennae) that receives and transmits
the signals in the cellular network to customer phones and cellular devices. That equipment is
connected to a mobile switching station that connects cellular calls to the public switched
telephone network (PSTN).
CLASS TEST- II
(AY 2018-19)
Branch: B.E. Computer Engineering Date: 29 /09/ 2018
Semester: I Duration: 1 hour
Subject: EL-II: Mobile Communication ( 410245) Max. Marks: 20M
Note: 1. Attempt All Questions in Section A. 2. Attempt any 3 Questions in Section B.
3. All questions are as per course outcomes. 4. Assume suitable data wherever is required.
Bloom’s Taxonomy Levels (BL): 1. Remember 2. Understand 3. Apply 4. Create
Mobile station: these are the users. The number of users is controlled by one BTS.
1. The mobile stations (MS) communicate with the base station subsystem over the radio
interface.
2. The BSS, also called the radio subsystem, provides and manages the radio transmission path
between the mobile stations and the Mobile Switching Centre (MSC). It also manages the radio
interface between the mobile stations and the other subsystems of GSM.
3. Each BSS comprises many Base Station Controllers (BSCs) that connect the mobile station to
the network and switching subsystem (NSS) through the mobile switching centre.
4. The NSS controls the switching functions of the GSM system. It allows the mobile switching
center to communicate with networks like PSTN, ISDN, CSPDN, PSPDN and other data networks.
5. The operation support system (OSS) allows the operation and maintenance of the GSM system. It
allows the system engineers to diagnose, troubleshoot and observe the parameters of the GSM
systems. The OSS subsystem interacts with the other subsystems and is provided for the GSM
operating company staff that provides service facilities for the network.
Base Station Subsystem (BSS): The base station subsystem comprises two parts:
1. Base Transceiver Station (BTS).
2. Base Station Controller(BSC).
The BSS consists of many BSCs that connect to a single MSC; each BSC controls up to several
hundred BTSs.
Base Transceiver Station (BTS): The BTS has radio transceivers that define a cell and is
capable of handling radio link protocols with the MS.
Functions of the BTS are:
1. Handling radio link protocols.
2. Providing full-duplex (FD) communication to the MS.
3. Interleaving and de-interleaving.
Base Station Controller (BSC): It manages radio resources for one or more BTSs. It controls
several hundred BTSs, all connected to a single MSC.
Functions of the BSC are:
• To control the BTSs
• Radio resource management
• Handoff management and control
• Radio channel setup and frequency hopping
Network Subsystem (NSS)
1. It handles the switching of GSM calls between external networks and the internal BSCs.
2. It includes three different databases for mobility management:
A. HLR (Home Location Register)
B. VLR (Visitor Location Register)
C. AuC (Authentication Centre)
Mobile Switching Centre (MSC):
It connects to fixed networks such as ISDN and PSTN.
Following are the functions of the MSC:
1. Call setup, supervision and release
2. Collection of billing information
3. Call handling / routing
4. Management of signalling protocols
5. Record of VLR and HLR
HLR (Home Location Register): The call roaming and call routing capabilities of GSM are
handled here. It stores all the administrative information of subscribers registered in the
network and maintains the unique international mobile subscriber identity (IMSI).
VLR (Visitor Location Register): It is a temporary database. It stores the IMSI number and
customer information for each roaming customer visiting a specific MSC.
Authentication Centre: It is a protected database. It maintains authentication keys and
algorithms, and contains a register called the Equipment Identity Register (EIR).
Operation Subsystem (OSS): It manages all mobile equipment in the system.
1) Management of charging and billing procedures
2) Maintenance of all hardware and network operations
AuC:
The AuC database holds the different algorithms used for authentication and encryption of the
mobile subscribers, which verify the mobile user's identity and ensure the confidentiality of
each call. The AuC holds the authentication and encryption keys for all subscribers in both the
home and visitor location registers.
EIR:
The EIR is another database that keeps information about the identity of mobile equipment,
such as the International Mobile Equipment Identity (IMEI), which reveals details about the
manufacturer, country of production and device type. This information is used to prevent calls
from being misused, to bar unauthorized or defective MSs, to report stolen mobile phones, and
to check whether a mobile phone is operating according to the specification of its type.
Ans:
General Packet Radio Service (GPRS)
MS: Mobile Station; BSS: Base Station Subsystem; HLR: Home Location Register;
VLR: Visitor Location Register; MSC: Mobile Switching Centre; EIR: Equipment Identity
Register; PDN: Packet Data Network; SGSN: Serving GPRS Support Node; GGSN: Gateway GPRS
Support Node
03. Explain UMTS Architecture with diagram. (Marks: 5, CO-1, BL: 2)
UMTS/3G
The Universal Mobile Telecommunications System is the third generation (3G) successor to the
second generation GSM based cellular technologies, which also include GPRS and EDGE.
Although UMTS uses a totally different air interface, the core network elements had been
migrating towards the UMTS requirements with the introduction of GPRS and EDGE. In this way
the transition from GSM to the 3G UMTS architecture did not require such a large instantaneous
investment.
UMTS uses Wideband CDMA (WCDMA / W-CDMA) to carry the radio transmissions, and the system is
often referred to by the name WCDMA.
The scope of 3GPP was to produce globally applicable Technical Specifications and Technical
Reports for a 3rd Generation Mobile Telecommunications System. This would be based upon the
GSM core networks and the radio access technologies that they support (i.e., Universal Terrestrial
Radio Access (UTRA) both Frequency Division Duplex (FDD) and Time Division Duplex (TDD)
modes).
Since it was originally formed, 3GPP has also taken over responsibility for the GSM standards as
well as looking at future developments including LTE (Long Term Evolution) and the 4G
technology known as LTE Advanced.
There are several key areas of 3G UMTS / WCDMA. Within these there are several key
technologies that have been employed to enable UMTS / WCDMA to provide a leap in
performance over its 2G predecessors.
Radio interface: The UMTS radio interface provides the basic definition of the radio
signal. W-CDMA occupies 5 MHz channels and has defined formats for elements such as
synchronization, power control and the like.
CDMA technology: 3G UMTS relies on a scheme known as CDMA, or code division
multiple access, to enable multiple handsets or user equipments to have access to the base
station. Using a scheme known as direct sequence spread spectrum, different UEs have
different codes and can all talk to the base station even though they are all on the same
frequency.
UMTS network architecture: The architecture for a UMTS network was designed to enable
packet data to be carried over the network, whilst still enabling it to support circuit
switched voice. All the usual functions enabling access to the network, roaming and the
like are also supported.
UMTS modulation schemes: Within the CDMA signal format, a variety of forms of
modulation are used. These are typically forms of phase shift keying.
UMTS channels: As with any cellular system, different data channels are required for
passing payload data as well as control information and for enabling the required resources
to be allocated. A variety of different data channels are used to enable these facilities to be
accomplished
UMTS TDD: There are two methods of providing duplex for 3G UMTS. One is what is
termed frequency division duplex, FDD. This uses two channels spaced sufficiently far
apart that the receiver can receive while the transmitter is operating. The other method is
time division duplex, TDD, where short time blocks are allocated to transmissions in
both directions. Using this method, only a single channel is required.
Handover: One key area of any cellular telecommunications system is the handover
(handoff) from one cell to the next. Using CDMA there are several forms of handover that
are implemented within the system.
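The direct sequence spread spectrum idea described under "CDMA technology" above can be sketched as a toy example: two users share the channel using orthogonal spreading codes, and the receiver separates them by correlating the composite signal with each user's code. The 4-chip Walsh codes and the data bits below are illustrative choices (real systems use much longer codes):

```python
# Toy direct-sequence CDMA sketch: two users transmit simultaneously on
# the same frequency using orthogonal spreading codes (4-chip Walsh
# codes). Chips and data bits take values +1/-1; the codes and bit
# patterns are illustrative, not from any standard.

CODE_A = [1, 1, 1, 1]    # Walsh code for user A
CODE_B = [1, -1, 1, -1]  # Walsh code for user B (orthogonal to A)

def spread(bits, code):
    """Replace each data bit (+1/-1) by the bit multiplied into the chip sequence."""
    return [b * c for b in bits for c in code]

def despread(signal, code):
    """Correlate the composite signal with one code to recover that user's bits."""
    n = len(code)
    out = []
    for i in range(0, len(signal), n):
        corr = sum(s * c for s, c in zip(signal[i:i + n], code))
        out.append(1 if corr > 0 else -1)
    return out

bits_a = [1, -1, 1]
bits_b = [-1, -1, 1]
# Both users transmit at once; the channel simply adds the chip streams.
composite = [a + b for a, b in zip(spread(bits_a, CODE_A), spread(bits_b, CODE_B))]

print(despread(composite, CODE_A))  # recovers user A's bits
print(despread(composite, CODE_B))  # recovers user B's bits
```

Because the codes are orthogonal, each user's correlation cancels the other user's contribution exactly; with non-ideal codes the other users appear as noise.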
UTRAN interfaces
Serving GPRS Support Node (SGSN): As the name implies, this entity was first developed
when GPRS was introduced, and its use has been carried over into the UMTS network
architecture. The SGSN provides a number of functions within the UMTS network
architecture.
o Mobility management: When a UE attaches to the packet-switched domain of the
UMTS core network, the SGSN generates MM information based on the mobile's
current location.
o Session management: The SGSN manages the data sessions providing the required
quality of service and also managing what are termed the PDP (Packet data Protocol)
contexts, i.e. the pipes over which the data is sent.
o Interaction with other areas of the network: The SGSN is able to manage its
elements within the network only by communicating with other areas of the network,
e.g. MSC and other circuit switched areas.
o Billing: The SGSN is also responsible for billing. It achieves this by monitoring the
flow of user data across the GPRS network. CDRs (Call Detail Records) are generated
by the SGSN before being transferred to the charging entities (Charging Gateway
Function, CGF).
The UMTS standards are structured in a way that the internal functionality of the different
network elements is not defined. Instead, the interfaces between the network elements are
defined, and in this way so too is the element functionality.
There are several interfaces that are defined for the UTRAN elements:
Iub: The Iub connects the NodeB and the RNC within the UTRAN. When it was launched,
standardizing the interface between the controller and the base station in the UTRAN was
revolutionary; the aim was to stimulate competition between suppliers, allowing some
manufacturers to concentrate just on base stations rather than on controllers and other
network entities.
Iur : The Iur interface allows communication between different RNCs within the UTRAN.
The open Iur interface enables capabilities like soft handover to occur as well as helping to
stimulate competition between equipment manufacturers.
Iu : The Iu interface connects the UTRAN to the core network.
Having standardised interfaces within various areas of the network including the UTRAN allows
network operators to select different network entities from different suppliers.
Ans:
LTE/4G
The high-level network architecture of LTE comprises the following three main components:
The User Equipment (UE).
The Evolved UMTS Terrestrial Radio Access Network (E-UTRAN).
The Evolved Packet Core (EPC).
The evolved packet core communicates with packet data networks in the outside world such as the
internet, private corporate networks or the IP multimedia subsystem. The interfaces between the
different parts of the system are denoted Uu, S1 and SGi as shown below:
The E-UTRAN handles the radio communications between the mobile and the evolved packet core
and just has one component, the evolved base stations, called eNodeB or eNB. Each eNB is a base
station that controls the mobiles in one or more cells. The base station that is communicating with
a mobile is known as its serving eNB.
An LTE mobile communicates with just one base station and one cell at a time. The eNB
supports the following two main functions:
The eNB sends and receives radio transmissions to and from all its mobiles using the
analogue and digital signal processing functions of the LTE air interface.
The eNB controls the low-level operation of all its mobiles by sending them signalling
messages such as handover commands.
Each eNB connects with the EPC by means of the S1 interface and it can also be connected to
nearby base stations by the X2 interface, which is mainly used for signaling and packet forwarding
during handover.
A home eNB (HeNB) is a base station that has been purchased by a user to provide femtocell
coverage within the home. A home eNB belongs to a closed subscriber group (CSG) and can only
be accessed by mobiles with a USIM that also belongs to the closed subscriber group.
The Evolved Packet Core (EPC) (The core network)
The architecture of the Evolved Packet Core (EPC) is illustrated below. A few more
components are not shown in the diagram, to keep it simple: the Earthquake and Tsunami
Warning System (ETWS), the Equipment Identity Register (EIR) and the Policy Control and
Charging Rules Function (PCRF).
Below is a brief description of each of the components shown in the above architecture:
The Home Subscriber Server (HSS) component has been carried forward from UMTS and
GSM and is a central database that contains information about all the network operator's
subscribers.
The Packet Data Network (PDN) Gateway (P-GW) communicates with the outside world,
i.e. packet data networks (PDNs), using the SGi interface. Each packet data network is
identified by an access point name (APN). The PDN gateway plays the same role as the
gateway GPRS support node (GGSN) and serving GPRS support node (SGSN) do in UMTS and GSM.
The serving gateway (S-GW) acts as a router, and forwards data between the base station
and the PDN gateway.
The mobility management entity (MME) controls the high-level operation of the mobile by
means of signalling messages and the Home Subscriber Server (HSS).
The Policy Control and Charging Rules Function (PCRF), not shown in the diagram above,
is responsible for policy-control decision-making, as well as for controlling the flow-based
charging functionality in the Policy and Charging Enforcement Function (PCEF), which
resides in the P-GW.
The interface between the serving and PDN gateways is known as S5/S8. This has two slightly
different implementations: S5 if the two devices are in the same network, and S8 if they are in
different networks.
Prelim Exam (AY 2018-19)
Branch: B.E.Computer Date:13/10/2018
Semester: VII Duration: 2:30 hour
Subject: EL-II: Mobile Communication ( 410245) (2015 Pattern) Max. Marks: 70
Note: (1) Answer Q. 1 or Q. 2, Q. 3 or Q. 4, Q. 5 or Q. 6, Q. 7 or Q. 8, Q. 9 or Q. 10.
(2) Figures to the right indicate full marks.
(3) Neat diagrams must be drawn wherever necessary.
(4) Assume suitable data, if necessary
(Max. marks, mapped CO and Bloom's Taxonomy level are noted against each question.)
Q.1 a. Draw and explain Frequency Reuse and Co-channel Interference. (5 marks, CO-1, level 2)
b. Short note on: a) GMSK Modulation b) 8PSK (5 marks, CO-2, level 1)
OR
Q.2 a. Define and explain handoff/handover. (5 marks, CO-2, level 2)
b. Difference between SDMA, FDMA, TDMA and CDMA. (5 marks, CO-1, level 2)
Q.3 a. Difference between FHSS and DSSS. (5 marks, CO-3, level 2)
b. Explain PCS architecture. (5 marks, CO-1, level 2)
OR
Typical frequency reuse plan for 3 different radio frequencies, based on hexagonal cells. Radio
channels are indicated by color. In fact some problems in cellular frequency assignment are solved
using map coloring theory.
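That map-coloring view can be sketched in code. The following is an illustrative greedy coloring over a hypothetical 7-cell hexagonal layout (a centre cell surrounded by a ring of six); the layout and group names are assumptions, not from the text, and greedy coloring is only a heuristic that happens to succeed here:

```python
def assign_frequencies(adjacency, groups):
    """Greedily give each cell the first frequency group unused by its neighbours."""
    assignment = {}
    for cell in sorted(adjacency):
        used = {assignment[n] for n in adjacency[cell] if n in assignment}
        assignment[cell] = next(g for g in groups if g not in used)
    return assignment

# Hypothetical layout: centre cell 0 surrounded by ring cells 1-6; each ring
# cell touches the centre and its two ring neighbours.
ring = [1, 2, 3, 4, 5, 6]
adjacency = {0: set(ring)}
for i, c in enumerate(ring):
    adjacency[c] = {0, ring[i - 1], ring[(i + 1) % 6]}

plan = assign_frequencies(adjacency, ["F1", "F2", "F3"])
# No two adjacent cells end up on the same radio frequency group.
```

Three groups suffice for this layout because the ring is an even cycle (2-colourable) and the centre takes the third colour.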
The FCC had the foresight to require:
1. a large subscriber capacity
2. efficient use of spectrum resources
3. nationwide coverage
4. adaptability to traffic density
5. telephone service to both vehicle and portable user terminals
6. telephony but also other services including closed user groups with voice dispatch
operations
7. toll quality
Co-channel cells: Frequency reuse implies that in a given coverage area, there are several cells that
use the same set of frequencies. These cells are called co-channel cells.
Causes:
1. Reduction of the D/R ratio, which reduces the distance between co-channel cells.
2. Use of omnidirectional antennas at the base station.
3. Increasing the antenna height at the base station.
Effects of co-channel interference on system capacity:
The parameter Q, called the co-channel reuse ratio, is related to cluster size N,
Q = D/R = √(3N)
A small value of Q provides larger capacity since the cluster size N is small, whereas a large value
of Q implies a smaller level of co-channel interference. Thus reducing co-channel interference (a
larger Q) comes at the cost of reduced system capacity.
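The capacity/interference trade-off in Q = D/R = √(3N) can be made concrete by evaluating it for common cluster sizes:

```python
import math

def cochannel_reuse_ratio(n_cluster):
    """Q = D/R = sqrt(3N): co-channel cell distance D over cell radius R."""
    return math.sqrt(3 * n_cluster)

# Smaller clusters reuse each channel set more often (higher capacity) but
# bring co-channel cells closer together (more interference), and vice versa.
ratios = {n: cochannel_reuse_ratio(n) for n in (3, 4, 7, 12)}
# e.g. N = 7 gives Q = sqrt(21), roughly 4.58
```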
b. Short Note on
a) GMSK Modulation b) 8PSK
Ans:
a) GMSK Modulation
The Gaussian Minimum Shift Keying (GMSK) modulation is a modified version of the Minimum
Shift Keying (MSK) modulation where the phase is further filtered through a Gaussian filter to
smooth the transitions from one point to the next in the constellation. Next figure presents the
GMSK generation scheme:
The Gaussian filter has a Gaussian-shaped impulse response in the time domain, where λ is a
normalization constant to maintain the power and the product BTc is the -3 dB bandwidth-symbol-
time product. The higher this value, the cleaner the eye diagram of the signal, but the more power
is transmitted in the side lobes of the spectrum. A typical value in communication applications is
BTc = 0.3, which is a good compromise between spectral efficiency and inter-symbol interference.
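As a sketch, the pulse can be sampled for different BT products. Since the exact filter expression is not reproduced above, this uses the standard textbook Gaussian pulse form (an assumption), with the bit period normalized to 1:

```python
import math

# Standard textbook Gaussian pulse used in GMSK (assumed form, since the
# original equation is not reproduced above):
#   h(t) = sqrt(2*pi/ln2) * B * exp(-2*pi^2*B^2*t^2 / ln2),
# with the bit period Tb normalized to 1, so B is the BT product.
def gaussian_pulse(bt, t):
    b = bt  # B in units of 1/Tb
    return math.sqrt(2 * math.pi / math.log(2)) * b * math.exp(
        -2 * (math.pi * b * t) ** 2 / math.log(2))

# Sample pulses over t in [-3, 3] bit periods, step 0.1.
samples_03 = [gaussian_pulse(0.3, k / 10) for k in range(-30, 31)]
samples_05 = [gaussian_pulse(0.5, k / 10) for k in range(-30, 31)]
# The BT = 0.3 pulse (GSM's choice) is wider and flatter than BT = 0.5:
# more inter-symbol interference, but a more compact spectrum.
```

The pulse integrates to one, so a higher-BT pulse is taller and narrower, spreading less into neighbouring symbols.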
In DSSS, which stands for Direct Sequence Spread Spectrum, information bits are modulated by PN
codes (chips). PN codes are pseudo-noise code symbols. These PN chips have a short duration
compared with the information bits, so the information transmitted over the air occupies more
bandwidth than the user information bits alone. DSSS is the modulation technique adopted in
IEEE 802.11-based WLAN-compliant products. In DSSS systems the entire system bandwidth is
available to each user all the time.
Figure 1 depicts the DSSS transmitter and DSSS receiver block diagram, where PRS stands for
Pseudo-Random Sequence.
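The spreading/despreading idea can be sketched as follows (the 7-chip code below is an arbitrary illustration, not a code from any standard):

```python
# DSSS sketch: each data bit is XOR-ed with a fast PN chip sequence; the
# receiver despreads with the same code and majority-votes the chips back
# into a bit, which gives robustness against a few corrupted chips.
PN = [1, 0, 1, 1, 0, 0, 1]  # hypothetical 7-chip code, one run per data bit

def spread(bits, pn=PN):
    """Spread each bit into len(pn) chips (7x the bit-rate bandwidth here)."""
    return [b ^ c for b in bits for c in pn]

def despread(chips, pn=PN):
    """Recover bits by re-applying the PN code and majority-voting the chips."""
    bits = []
    for i in range(0, len(chips), len(pn)):
        votes = [chips[i + j] ^ pn[j] for j in range(len(pn))]
        bits.append(1 if sum(votes) > len(pn) // 2 else 0)
    return bits

tx = spread([1, 0, 1])
assert despread(tx) == [1, 0, 1]
```

Even if a couple of chips per bit are flipped by interference, the majority vote still recovers the original bit, which is the processing-gain benefit of spreading.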
FHSS-Frequency Hopping Spread Spectrum
In FHSS, which stands for Frequency Hopping Spread Spectrum, the RF carrier frequency is
changed according to a pseudo-random sequence (PRS or PN sequence). This PN sequence is
known to both transmitter and receiver, which makes it possible to demodulate/decode the
information. Within one chip duration the RF frequency does not vary. Based on this fact there
are two types of FHSS: fast-hopped and slow-hopped.
In fast-hopped FHSS, hopping is done at a rate faster than the message (information) bit rate; in
slow-hopped FHSS, hopping is done at a rate slower than the information bit rate.
b. Explain PCS Architecture?
Ans:
A personal communications service (PCS) is a type of wireless mobile service with advanced
coverage that delivers services at a more personal level. It generally refers to modern mobile
communication that extends the capabilities of conventional cellular networks and fixed-line
telephony networks as well.
PCS is also known as digital cellular.
A PCS works similarly to a cellular network in basic operations, but requires more service provider
infrastructure to cover a wider geographical area. A PCS generally includes the following:
Wireless communication (data, voice and video)
Mobile PBX
Paging and texting
Wireless radio
Personal communication networks
Satellite communication systems, etc.
PCS has three broad categories: narrowband, broadband and unlicensed. TDMA, CDMA and GSM,
and 2G, 3G and 4G are some of the common technologies that are used to deliver a PCS.
Q.4 a. Short Note on
1) AuC 2) EIR 3) PSTN
4) Home Location Register 5)Visitor Location Register
Ans:
1) AuC
Authentication Centre (AuC): The AuC is a protected database that contains the secret key also
contained in the user's SIM card. It is used for authentication and for ciphering on the radio
channel.
2) EIR
Equipment Identity Register (EIR): The EIR is the entity that decides whether a given mobile
equipment may be allowed onto the network. Each mobile equipment has a number known as the
International Mobile Equipment Identity. This number, as mentioned above, is installed in the
equipment and is checked by the network during registration. Depending upon the information held
in the EIR, the mobile may be allocated one of three states: allowed onto the network, barred from
access, or monitored in case of problems.
3) PSTN
PSTN stands for Public Switched Telephone Network, or the traditional circuit-switched telephone
network. This is the system that has been in general use since the late 1800s.
Using underground copper wires, this legacy platform has provided businesses and households alike
with a reliable means to communicate with anyone around the world for generations.
The phones themselves are known by several names, such as PSTN, landlines, Plain Old Telephone
Service (POTS), or fixed-line telephones.
PSTN phones are widely used and generally still accepted as a standard form of communication.
However, they have seen a steady decline over the last decade.
4) Home Location Register
Home Location Register (HLR): This database contains all the administrative information about
each subscriber along with their last known location. In this way, the GSM network is able to route
calls to the relevant base station for the MS. When a user switches on their phone, the phone regis-
ters with the network and from this it is possible to determine which BTS it communicates with so
that incoming calls can be routed appropriately. Even when the phone is not active (but switched
on) it re-registers periodically to ensure that the network (HLR) is aware of its latest position. There
is one HLR per network, although it may be distributed across various sub-centres for operational
reasons.
The system-level objective of maximum utilization of the available radio spectrum in a cellular
communication system often translates into the maximum number of simultaneous mobile users
served by the system with acceptable signal quality, which is directly related to minimizing the
transmitted power of each mobile user at all times during its operation.
In IS-95, slow mobile-assisted power control is employed on the forward channel. Non-coherent
detection is employed on the reverse channel, and hence power-control implementation is a must
on the reverse channel.
There are mainly two types of power-control mechanism: open loop and closed loop.
Because all the voice channels occupy the same frequency and time slot, the received signals
from multiple mobile users located anywhere within the periphery of the serving cell must all
have the same received signal strength at the base station for detection.
The advantage of implementing strict power control is that the mobile user can operate at the
minimum required Eb/N0 for adequate performance. This increases battery life and reduces the
size and weight of the mobile user's phone equipment.
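A minimal sketch of the closed-loop mechanism (all numbers are hypothetical: a 1 dB command step, a 7 dB target, fixed path losses):

```python
# Closed-loop power control sketch: the base station compares the received
# Eb/No against a target and commands one 1 dB up/down step per cycle, as
# on the IS-95 reverse link. Path losses and targets below are hypothetical.
def power_control(tx_power_dbm, path_loss_db, target_ebno_db,
                  noise_floor_dbm=-100.0, step_db=1.0, iterations=40):
    for _ in range(iterations):
        received_ebno = tx_power_dbm - path_loss_db - noise_floor_dbm
        # One command bit per cycle: up if below target, otherwise down.
        tx_power_dbm += step_db if received_ebno < target_ebno_db else -step_db
    return tx_power_dbm

# A cell-edge mobile converges to a much higher transmit power than one
# near the base station, equalising received power at the receiver.
far = power_control(0.0, path_loss_db=120.0, target_ebno_db=7.0)
near = power_control(0.0, path_loss_db=90.0, target_ebno_db=7.0)
```

Once converged, the loop dithers plus/minus one step around the power that exactly meets the target, which is the "minimum required Eb/N0" operating point described above.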
The additional components of the GSM architecture comprise database and messaging-system
functions:
Home Location Register (HLR)
Visitor Location Register (VLR)
Equipment Identity Register (EIR)
Authentication Center (AuC)
SMS Serving Center (SMS SC)
Gateway MSC (GMSC)
Chargeback Center (CBC)
Transcoder and Adaptation Unit (TRAU)
The following diagram shows the GSM network along with the added elements:
The MS and the BSS communicate across the Um interface, also known as the air interface or
the radio link. The BSS communicates with the Network and Switching Subsystem (NSS) across
the A interface.
Cell : Cell is the basic service area; one BTS covers one cell. Each cell is given a Cell
Global Identity (CGI), a number that uniquely identifies the cell.
Location Area : A group of cells form a Location Area (LA). This is the area that is paged
when a subscriber gets an incoming call. Each LA is assigned a Location Area Identity
(LAI). Each LA is served by one or more BSCs.
MSC/VLR Service Area : The area covered by one MSC is called the MSC/VLR service
area.
PLMN : The area covered by one network operator is called the Public Land Mobile Net-
work (PLMN). A PLMN can contain one or more MSCs.
b. Explain GSM Bursts and GSM Frame.
Ans:
GSM Bursts:
The GSM burst, or transmission, can fulfil a variety of functions. Some GSM bursts are used for
carrying data while others are used for control information. As a result, a number of different
types of GSM burst are defined.
The burst is the sequence of bits transmitted by the BTS or MS, the timeslot is the discrete period of
real time within which it must arrive in order to be correctly decoded by the receiver:
Normal Burst
The normal burst carries traffic channels and all types of control channels.
Frequency Correction Burst
This burst carries FCCH downlink to correct the frequency of the MS’s local oscillator, effectively
locking it to that of the BTS.
Synchronization Burst
So called because its function is to carry SCH downlink, synchronizing the timing of the MS to that
of the BTS.
Dummy Burst
Used when there is no information to be carried on the unused timeslots of the BCCH Carrier
(downlink only).
Access Burst
This burst is of much shorter duration than the other types. The increased guard period is necessary
because the timing of its transmission is unknown. When this burst is transmitted, the BTS does not
know the location of the MS and therefore the timing of the message from the MS cannot be
accurately accounted for. (The Access Burst is uplink only.)
GSM Frame:
GSM data structure is split into slots, frames, multiframes, superframes and hyperframes to give the
required structure and timing to the transmitted data.
The data frames and slots within 2G GSM are organised in a logical manner so that the system
understands when particular types of data are to be transmitted.
Having the GSM frame structure enables the data to be organised in a logical fashion so that the
system is able to handle the data correctly. This includes not only the voice data, but also the
important signalling information as well.
The GSM frame structure provides the basis for the various physical channels used within GSM,
and accordingly it is at the heart of the overall system.
GSM frame structure - the basics
The basic element in the GSM frame structure is the frame itself. This comprises the eight slots,
each used for different users within the TDMA system. As mentioned in another page of the
tutorial, the slots for transmission and reception for a given mobile are offset in time so that the
mobile does not transmit and receive at the same time.
The basic GSM frame defines the structure upon which all the timing and structure of the GSM
messaging and signalling is based. The fundamental unit of time is called a burst period and it lasts
for approximately 0.577 ms (15/26 ms). Eight of these burst periods are grouped into what is known
as a TDMA frame. This lasts for approximately 4.615 ms (i.e. 120/26 ms) and it forms the basic unit
for the definition of logical channels. One physical channel is one burst period allocated in each
TDMA frame.
In simplified terms the base station transmits two types of channel, namely traffic and control.
Accordingly the channel structure is organised into two different types of frame, one for the traffic
on the main traffic carrier frequency, and the other for the control on the beacon frequency.
GSM multiframe
The GSM frames are grouped together to form multiframes and in this way it is possible to establish
a time schedule for their operation and the network can be synchronised.
Traffic multiframe: The Traffic Channel frames are organised into multiframes consisting
of 26 frames and taking 120 ms. In a traffic multiframe, 24 frames are used for traffic; these
are numbered 0 to 11 and 13 to 24. One of the remaining frames accommodates the SACCH,
and the other remains free (idle). The position used for the SACCH alternates between frames
12 and 25.
Control multiframe: The Control Channel multiframe comprises 51 frames and occupies
235.4 ms. It always occurs on the beacon frequency in time slot zero, and it may also occur
within slots 2, 4 and 6 of the beacon frequency as well. This multiframe is subdivided into
logical channels which are time-scheduled. These logical channels and functions include
the following:
Frequency correction burst
Synchronisation burst
Broadcast channel (BCH)
Paging and Access Grant Channel (PAGCH)
Stand Alone Dedicated Control Channel (SDCCH)
GSM Superframe
Multiframes are then constructed into superframes taking 6.12 seconds. These consist of 51 traffic
multiframes or 26 control multiframes. As the traffic multiframes are 26 frames long and the
control multiframes are 51 frames long, the different numbers of traffic and control multiframes
within the superframe bring the two cycles back into line, both spanning exactly the same interval.
GSM Hyperframe
Above this, 2048 superframes (i.e. 2 to the power 11) are grouped to form one hyperframe, which
repeats every 3 hours 28 minutes 53.76 seconds. It is the largest time interval within the GSM
frame structure.
Within the GSM hyperframe there is a counter and every time slot has a unique sequential number
comprising the frame number and time slot number. This is used to maintain synchronisation of the
different scheduled operations with the GSM frame structure. These include functions such as:
Frequency hopping: Frequency hopping is a feature that is optional within the GSM system. It can
help reduce interference and fading issues, but for it to work, the transmitter and receiver must be
synchronised so they hop to the same frequencies at the same time.
Encryption: The encryption process is synchronised over the GSM hyperframe period where a
counter is used and the encryption process will repeat with each hyperframe. However, it is unlikely
that the cellphone conversation will be over 3 hours and accordingly it is unlikely that security will
be compromised as a result.
The slots and frames are handled in a very logical manner to enable the system to expect and accept
the data that needs to be sent. Organising it in this logical fashion enables it to be handled in the
most efficient manner.
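The timing hierarchy described above can be checked with exact arithmetic, since the burst period is defined as 15/26 ms:

```python
from fractions import Fraction

# GSM timing hierarchy, verified exactly with rational arithmetic.
burst_ms = Fraction(15, 26)               # one burst period, ~0.577 ms
frame_ms = 8 * burst_ms                   # TDMA frame = 8 slots, ~4.615 ms
traffic_mf_ms = 26 * frame_ms             # 26-frame traffic multiframe
control_mf_ms = 51 * frame_ms             # 51-frame control multiframe, ~235.4 ms
superframe_ms = 51 * traffic_mf_ms        # = 26 control multiframes
hyperframe_s = 2048 * superframe_ms / 1000

assert frame_ms == Fraction(120, 26)
assert traffic_mf_ms == 120               # exactly 120 ms
assert superframe_ms == 26 * control_mf_ms == 6120   # 6.12 s either way
# 2048 superframes = 12 533.76 s = 3 h 28 min 53.76 s
assert hyperframe_s == Fraction(1253376, 100)
```

The 26/51 split is deliberate: 51 traffic multiframes and 26 control multiframes both equal 6120 ms, so the two schedules realign once per superframe.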
The term HSPA actually refers to two specific protocols used in tandem, high speed downlink
packet access (HSDPA) and high speed uplink packet access (HSUPA). HSPA networks offer a
maximum of 14.4 megabits per second (Mbps) of throughput per cell.
An improved version of high-speed packet access technology, known as Evolved HSPA, offers 42
Mbps of throughput per cell. By using dual-cell deployment and multiple-input, multiple-output
(MIMO) architecture, HSPA+ networks can achieve a maximum throughput of 168 Mbps overall.
For example, every 26 TDMA frames a logical channel gets bandwidth in a physical channel.
Traffic channels are mainly of two types: half-rate and full-rate. There are various control
channels such as the BCCH (broadcast control channel), SCH (synchronization channel), FCCH
(frequency correction channel) and DCCH (dedicated control channel).
All these GSM channels help maintain the GSM network, help a GSM mobile phone connect to
the network and maintain the connection, and help tear the connection down. The figure below
shows all the channels used in GSM.
Fig. GSM Channels
GPRS attempts to reuse the existing GSM network elements as much as possible, but to effectively
build a packet-based mobile cellular network, some new network elements, interfaces, and
protocols for handling packet traffic are required.
Therefore, GPRS requires modifications to numerous GSM network elements, as summarized
below:
GPRS Mobile Stations
New Mobile Stations (MS) are required to use GPRS services because existing GSM phones do
not handle the enhanced air interface or packet data. A variety of MS can exist, including a high-
speed version of current phones to support high-speed data access, a new PDA device with an
embedded GSM phone, and PC cards for laptop computers. These mobile stations are backward
compatible for making voice calls using GSM.
When either voice or data traffic is originated at the subscriber mobile, it is transported over the air
interface to the BTS, and from the BTS to the BSC in the same way as a standard GSM call.
However, at the output of the BSC, the traffic is separated; voice is sent to the Mobile Switching
Center (MSC) per standard GSM, and data is sent to a new device called the SGSN via the PCU
over a Frame Relay interface.
GPRS Support Nodes
Two new components, collectively called GPRS support nodes (GSNs), are added: the serving
GPRS support node (SGSN) and the gateway GPRS support node (GGSN).
Internal Backbone
The internal backbone is an IP based network used to carry packets between different GSNs.
Tunnelling is used between SGSNs and GGSNs, so the internal backbone does not need any
information about domains outside the GPRS network. Signalling from a GSN to a MSC, HLR or
EIR is done using SS7.
Routing Area
GPRS introduces the concept of a Routing Area. This concept is similar to Location Area in GSM,
except that it generally contains fewer cells. Because routing areas are smaller than location areas,
fewer radio resources are used while broadcasting a page message.
Circuit Side
For the circuit side, the BSC or RNC connects to the Mobile Switching Center (MSC), which sets
up and tears down the calls, handles text messages (SMS) and tracks users as they move from cell
to cell. When a user arrives within an MSC's jurisdiction, subscriber information is sent from the
Home Location Register (HLR) database to the Visitor Location Register (VLR) within the MSC.
The Gateway MSC (GMSC) connects the MSC to the external circuit-switched networks.
Packet Side
The counterpart to the MSC on the packet side is the Serving GPRS Support Node (SGSN), which
manages the packet connection for the user. The Gateway GPRS Support Node (GGSN) provides
the connection to the external packet networks. The GGSN also receives subscriber information
from the HLR.
2G and 3G Equipment
Voice is handled by the circuit-switched network, and data are handled by the packet-switched side.
b. Short Note on
1) Virtual Reality 2) Augmented Reality
Ans:
1) Virtual Reality
Virtual Reality (VR) is an immersive computer system that mimics the world we see around us.
It can also be used to create imaginary worlds, or in other words to create immersive games. VR
isn't a new idea; in fact it was first described in the 1930s, and the first VR system was built in
the late 1960s. Its boom time came in the 1990s, when companies like Sega and Nintendo started
developing consumer-level VR gaming products. However, after a boom there is often a bust, and
that is what happened to VR: Sega's product was never released, and Nintendo's Virtual Boy was
a commercial failure.
Since then very little has happened at the consumer level. The reasons for VR's failures in the
1990s were not only to do with computing power. Think back to the size and design of laptops
and mobile phones in that era. To make VR headsets truly useful, the technology needed to
improve in terms of miniaturization, displays, materials and computing power.
After almost 20 years, VR is now making a comeback. In 2012 Palmer Luckey launched a
Kickstarter campaign for an immersive virtual-reality headset for video games. The Oculus
Rift project aimed to raise $250,000, but actually raised $2.4 million.
In late 2013 John Carmack, famous for 3D game series like Doom and Quake, joined Oculus.
The Oculus Rift is designed to be connected to and used with a PC; however, Carmack helped
Oculus develop a mobile version in collaboration with Samsung.
The Samsung Gear VR uses a smartphone clipped into a headset to create a VR platform. It is an
untethered solution, which means there are no wires connecting it to a PC or other
computing device. The smartphone’s GPU is used to render the virtual world and the phone’s
display is split in half for the images needed by the left and right eyes. The headset includes the
head-tracking module from the Oculus Rift.
As we can see with the difference between the Oculus Rift and the Gear VR, today’s Virtual Reality
market is split into two segments: tethered and untethered. The advantage of the tethered
approach is that the processing power and the electrical power comes from a PC or console. These
machines have high performance CPUs and GPUs, and don’t need to worry about battery life.
However the disadvantage is that they are generally fixed to one room in your house. The advantage
of untethered VR is that it is truly portable. Wherever you go, your VR headset can go with you. It
also means it has a greater social impact. Although using a VR headset could be considered as anti-
social if used in public, there is the aspect of sharing the VR experience within a group of friends.
For example, the “WOW” factor when the headset is passed from one person to the other.
2) Augmented Reality
Augmented reality is technology that expands our physical world by adding layers of digital
information onto it. Unlike Virtual Reality (VR), AR does not create a whole artificial environment
to replace the real one with a virtual one. AR appears in a direct view of an existing environment
and adds sounds, videos and graphics to it.
A view of the physical real-world environment with superimposed computer-generated images, thus
changing the perception of reality, is the AR.
The term itself was coined back in 1990, and some of the first commercial uses were in television
and the military. With the rise of the Internet and smartphones, AR rolled out its second wave and
nowadays is mostly related to the interactive concept. 3D models are directly projected onto
physical things or fused together in real time; various augmented-reality apps impact our habits,
social life, and the entertainment industry.
AR apps typically connect a digital animation to a special 'marker', or pinpoint the location with
the help of GPS in phones. Augmentation happens in real time and within the context of the
environment, for example overlaying scores onto a live feed of a sports event.
There are four types of augmented reality:
markerless AR
marker-based AR
projection-based AR
superimposition-based AR
For many of us, "what is augmented reality?" implies a technical side, i.e. how does AR work?
For AR, a certain range of data (images, animations, videos, 3D models) may be used, and people
see the result in both natural and synthetic light. Also, unlike in VR, users are aware of being in
the real world, which is enhanced by computer vision.
AR can be displayed on various devices: screens, glasses, handheld devices, mobile phones and
head-mounted displays. It involves technologies like SLAM (simultaneous localization and
mapping) and depth tracking (briefly, sensor data calculating the distance to objects), and the
following components:
Cameras and sensors. These collect data about the user's interactions and send it for processing.
Cameras on devices scan the surroundings, and with this information the device locates physical
objects and generates 3D models. These may be special-duty cameras, as in Microsoft HoloLens,
or common smartphone cameras taking pictures/videos.
Processing. AR devices eventually should act like small computers, something modern
smartphones already do. In the same manner, they require a CPU, a GPU, flash memory, RAM,
Bluetooth/WiFi, GPS, etc., to be able to measure speed, angle, direction, orientation in space,
and so on.
Projection. This refers to a miniature projector on AR headsets, which takes data from sensors
and projects digital content (the result of processing) onto a surface to view. In fact, the use of
projection in AR has not yet matured enough for commercial products or services.
Reflection. Some AR devices have mirrors to assist human eyes to view virtual images. Some
have an “array of small curved mirrors” and some have a double-sided mirror to reflect light to
a camera and to a user’s eye. The goal of such reflection paths is to perform a proper image
alignment.
Projection-based AR. This projects synthetic light onto physical surfaces, and in some cases
allows interaction with it. These are the holograms we have all seen in sci-fi movies like Star
Wars. It detects user interaction with a projection by its alterations.
Superimposition-based AR. This replaces the original view with an augmented one, fully or
partially. Object recognition plays a key role; without it the whole concept is simply impossible.
We've all seen an example of superimposed augmented reality in the IKEA Catalog app, which
allows users to place virtual items from the furniture catalogue in their rooms.
Q.10 a. Explain 4G Architecture with diagram?
Ans:
4G Architecture
1. 4G stands for fourth generation cellular system.
2. 4G is an evolution of 3G to meet the forecasted rising demand.
3. It is an integration of various technologies including GSM, CDMA, GPRS, IMT-2000 and
Wireless LAN.
4. Data rates in 4G systems range from 20 to 100 Mbps.
Features:
1. Fully IP based Mobile System.
2. It supports interactive multimedia, voice, streaming video, internet and other broadband
services.
3. It has better spectrum efficiency.
4. It supports Ad-hoc and multi hop network.
4 G Architecture
1. Figure shows Generic Mobile Communication architecture.
2. A 4G network is an integration of all heterogeneous wireless access networks such as ad-hoc,
cellular, hotspot and satellite radio components.
3. Technologies used in 4G are smart antennas for multiple-input multiple-output (MIMO),
IPv6, VoIP, OFDM and Software Defined Radio (SDR) systems.
Smart Antennas:
1. Smart antennas are transmitting and receiving antennas.
2. They do not require increased power or additional frequency.
IPV6 Technology:
1. 4G uses IPv6 technology in order to support a large number of wireless-enabled devices.
2. It enables a number of applications with better multicast, security and route-optimization
capabilities.
VoIP:
1. It stands for Voice over IP.
2. It allows only packets to be transferred, eliminating the complexity of running two protocols
over the same circuit.
OFDM:
1. OFDM stands for Orthogonal Frequency Division Multiplexing.
2. It is currently used in WiMAX and WiFi.
SDR:
1. SDR stands for Software Defined Radio.
2. It is a form of open wireless architecture.
Advantages:
1. It provides better spectral efficiency.
2. It has high speed, high capacity and low cost per bit.
Disadvantage:
1. Battery usage is more.
2. Hard to implement.
b. Short Note on
1) HSPA+ 2)HSUPA 3) HSDPA
Ans:
1) HSPA+
HSPA stands for High Speed Packet Access, a standard mainly designed to support high-speed
data rates in the uplink and downlink. HSPA falls into the categories HSDPA, HSUPA and
HSPA+. The HSPA standards follow different UMTS releases: HSDPA was the UMTS R5 release
and supports about 14 Mbps peak data rates; HSUPA was the UMTS R6 release and supports
about 5.76 Mbps uplink data rates; HSPA+ follows the R7-R9 UMTS releases. HSPA supports a
spectral efficiency of about 2.9 bits/sec/Hz.
HSPA+ uses the same 5 MHz band of WCDMA spectrum, which is a great convenience for
operators. In the same 5 MHz, HSPA+ increases the data rate by using MIMO and higher-order
modulation techniques. Hence it achieves roughly a 42.2 Mbps data rate and a spectral efficiency
of about 8.4 bits/sec/Hz.
The biggest achievement with HSPA+ is that latency is reduced using a concept called CPC
(Continuous Packet Connectivity). HSPA+ supports DC-HSDPA, a dual-cell or dual-carrier
concept in which two adjacent 5 MHz carriers are aggregated for the same cell area to increase
performance.
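The arithmetic behind the peak figures quoted earlier can be sketched as follows (illustrative only; actual rates depend on UE category, coding and channel conditions):

```python
def peak_rate_mbps(per_carrier_mbps, carriers=1, mimo_streams=1):
    """Peak rate scales with the number of aggregated carriers and MIMO streams."""
    return per_carrier_mbps * carriers * mimo_streams

base = peak_rate_mbps(42)                              # Evolved HSPA, one carrier
peak = peak_rate_mbps(42, carriers=2, mimo_streams=2)  # dual-carrier + 2x2 MIMO
# 42 Mbps, doubled by dual-cell operation and doubled again by MIMO, gives 168 Mbps.
```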
2)HSUPA
High-Speed Uplink Packet Access (HSUPA) is a 3G mobile telephony protocol in the HSPA
family. This technology was the second major step in the UMTS evolution process. It was
specified and standardized in 3GPP Release 6 to improve the uplink data rate to 5.76 Mbit/s,
extending capacity and reducing latency. Together with additional improvements, this creates
opportunities for a number of new applications including VoIP, uploading pictures, and sending
large e-mail messages.
HSUPA has been superseded by newer technologies with still higher transfer rates. LTE provides
up to 300 Mbit/s for the downlink and 75 Mbit/s for the uplink; its evolution, LTE Advanced,
supports maximum downlink rates of over 1 Gbit/s.
Technology
Enhanced Uplink adds a new transport channel to WCDMA, called the Enhanced Dedicated
Channel (E-DCH). It also features several improvements similar to those of HSDPA, including
multi-code transmission, shorter transmission time interval enabling faster link adaptation, fast
scheduling, and fast Hybrid Automatic Repeat Request (HARQ) with incremental redundancy
making retransmissions more effective. Similarly to HSDPA, HSUPA uses a "packet scheduler",
but it operates on a "request-grant" principle where the user equipment (UE) requests permission to
send data and the scheduler decides when and how many UEs will be allowed to do so. A request
for transmission contains data about the state of the transmission buffer and the queue at the UE and
its available power margin. However, unlike HSDPA, uplink transmissions are not orthogonal to
each other.
In addition to this "scheduled" mode of transmission, the standard allows a self-initiated
transmission mode from the UEs, denoted "non-scheduled". The non-scheduled mode can, for
example, be used for VoIP services, for which even the reduced TTI and the Node B-based
scheduler would be unable to provide the very short delay time and constant bandwidth required.
Each MAC-d flow (i.e., QoS flow) is configured to use either the scheduled or the non-scheduled
mode.
The UE adjusts the data rate for scheduled and non-scheduled flows independently. The maximum
data rate of each non-scheduled flow is configured at call setup, and typically not changed
frequently. The power used by the scheduled flows is controlled dynamically by the Node B
through absolute grant (consisting of an actual value) and relative grant (consisting of a single
up/down bit) messages.
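The request-grant principle above can be illustrated with a toy scheduler. This is only a sketch: the `Request` class, the byte-based capacity, and the "largest power margin first" policy are all invented for illustration (real E-DCH grants limit transmit power ratios, not byte counts).

```python
# Toy sketch of HSUPA-style request-grant uplink scheduling (illustrative
# only; the data model and grant policy are simplifications, not the 3GPP
# algorithm). Each UE reports its buffer state and power margin; the
# scheduler grants capacity until it runs out.
from dataclasses import dataclass

@dataclass
class Request:
    ue_id: int
    buffer_bytes: int       # data waiting in the UE's transmission buffer
    power_margin_db: float  # available power headroom reported by the UE

def schedule(requests, capacity_bytes):
    """Grant capacity to requesting UEs, favouring larger power headroom."""
    grants = {}
    for req in sorted(requests, key=lambda r: -r.power_margin_db):
        if capacity_bytes <= 0:
            break
        granted = min(req.buffer_bytes, capacity_bytes)
        if granted > 0:
            grants[req.ue_id] = granted
            capacity_bytes -= granted
    return grants

reqs = [Request(1, 500, 3.0), Request(2, 800, 6.0), Request(3, 200, 1.0)]
print(schedule(reqs, 1000))  # UE 2 (highest margin) is served first
```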
At the physical layer, HSUPA introduces new channels E-AGCH (Absolute Grant Channel), E-
RGCH (Relative Grant Channel), F-DPCH (Fractional-DPCH), E-HICH (E-DCH Hybrid ARQ
Indicator Channel), E-DPCCH (E-DCH Dedicated Physical Control Channel), and E-DPDCH (E-
DCH Dedicated Physical Data Channel).
E-DPDCH is used to carry the E-DCH Transport Channel; and E-DPCCH is used to carry the
control information associated with the E-DCH.
3) HSDPA
As mentioned, HSDPA is designed mainly for high-speed data rates in the downlink, chiefly for
internet-based applications, hence the name High Speed Downlink Packet Access. As described in
the UMTS tutorial, the UMTS architecture is composed of three main parts: the UE (User
Equipment), the RAN (Radio Access Network), and the Core Network. In HSDPA, changes are
incorporated on the air interface side, so the UE and RAN have been modified to handle the higher
data rate requirements compared to the predecessor, UMTS R99. No changes have been made on the
core network side.
Perhaps the most challenging technical problem facing communications systems engineers is fading
in a mobile environment. The term fading refers to the time variation of received signal power
caused by changes in the transmission medium or path(s). In a fixed environment, fading is affected
by changes in atmospheric conditions, such as rainfall. But in a mobile environment, where one of
the two antennas is moving relative to the other, the relative location of various obstacles changes
over time, creating complex transmission effects.
For example, suppose a ground-reflected wave near the mobile unit is received. Because the
ground-reflected wave has a 180° phase shift after reflection, the ground wave and the line-of-sight
(LOS) wave may tend to cancel, resulting in high signal loss. Further, because the mobile antenna
is lower than most human-made structures in the area, multipath interference occurs. These
reflected waves may interfere constructively or destructively at the receiver.
Diffraction occurs at the edge of an impenetrable body that is large compared to the wavelength of
the radio wave. When a radio wave encounters such an edge, waves propagate in different
directions with the edge as the source. Thus, signals can be received even when there is no
unobstructed LOS from the transmitter. If the size of an obstacle is on the order of the wavelength
of the signal or less, scattering occurs. An incoming signal is scattered into several weaker outgoing
signals. At typical cellular microwave frequencies, there are numerous objects, such as lamp posts
and traffic signs, that can cause scattering. Thus, scattering effects are difficult to predict. These
three propagation effects influence system performance in various ways depending on local
conditions and as the mobile unit moves within a cell. If a mobile unit has a clear LOS to the
transmitter, then diffraction and scattering are generally minor effects, although reflection may have
a significant impact. If there is no clear LOS, such as in an urban area at street level, then diffraction
and scattering are the primary means of signal reception.
As just noted, one unwanted effect of multipath propagation is that multiple copies of a signal may
arrive at different phases. If these phases add destructively, the signal level relative to noise
declines, making signal detection at the receiver more difficult. A second phenomenon, of particular
importance for digital transmission, is intersymbol interference (ISI). Consider that we are sending
a narrow pulse at a given frequency across a link between a fixed antenna and a mobile unit. Figure
14.8 shows what the channel may deliver to the receiver if the impulse is sent at two different times.
The upper line shows two pulses at the time of transmission. The lower line shows the resulting
pulses at the receiver. In each case the first received pulse is the desired LOS signal. The magnitude
of that pulse may change because of changes in atmospheric attenuation. Further, as the mobile unit
moves farther away from the fixed antenna, the amount of LOS attenuation increases. But in
addition to this primary pulse, there may be multiple secondary pulses due to reflection, diffraction,
and scattering. Now suppose that this pulse encodes one or more bits of data. In that case, one or
more delayed copies of a pulse may arrive at the same time as the primary pulse for a subsequent
bit. These delayed pulses act as a form of noise to the subsequent primary pulse, making recovery
of the bit information more difficult. As the mobile antenna moves, the location of various obstacles
changes; hence the number, magnitude, and timing of the secondary pulses change. This makes it
difficult to design signal processing techniques that will filter out multipath effects so that the
intended signal is recovered with fidelity.
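The smearing of pulse energy into later symbol intervals can be sketched numerically. The path delays and gains below are invented illustration values, not measurements: the received sequence is the line-of-sight copy plus a delayed, attenuated echo.

```python
# Illustrative multipath sketch: the receiver sees the sum of delayed,
# scaled copies of the transmitted pulse train, so echo energy from one
# pulse spills into later symbol intervals (intersymbol interference).
# Path delays (in samples) and gains are made-up example values.

def multipath(tx, paths):
    """Sum delayed, scaled copies of `tx`. `paths` = [(delay, gain), ...]."""
    max_delay = max(d for d, _ in paths)
    rx = [0.0] * (len(tx) + max_delay)
    for delay, gain in paths:
        for i, s in enumerate(tx):
            rx[i + delay] += gain * s
    return rx

tx = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]       # two pulses, 3 samples apart
rx = multipath(tx, [(0, 1.0), (2, 0.4)])  # LOS path + one echo 2 samples late
print(rx)  # the 0.4 echoes land inside neighbouring symbol intervals
```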
Types of Fading
Fading effects in a mobile environment can be classified as either fast or slow. At a frequency of
900 MHz, which is typical for mobile cellular applications, a wavelength is 0.33 m. Changes of
amplitude can be as much as 20 or 30 dB over a short distance. This type of rapidly changing fading
phenomenon, known as fast fading, affects not only mobile phones in automobiles, but even a
mobile phone user walking down an urban street. As the mobile user covers distances well in excess
of a wavelength, the urban environment changes, as the user passes buildings of different heights,
vacant lots, intersections, and so forth. Over these longer distances, there is a change in the average
received power level about which the rapid fluctuations occur. This is referred to as slow fading.
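The wavelength figure quoted above is a one-line calculation:

```python
# Check of the wavelength quoted in the text: at 900 MHz,
# lambda = c / f, which comes out to about 0.33 m.
c = 3.0e8   # speed of light, m/s
f = 900e6   # carrier frequency, Hz
wavelength = c / f
print(round(wavelength, 2))  # → 0.33
```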
Fading effects can also be classified as flat or selective. Flat fading, or nonselective fading, is that
type of fading in which all frequency components of the received signal fluctuate in the same
proportions simultaneously. Selective fading affects unequally the different spectral components of
a radio signal. The term selective fading is usually significant only relative to the bandwidth of the
overall communications channel. If attenuation occurs over a portion of the bandwidth of the signal,
the fading is considered to be selective; nonselective fading implies that the signal bandwidth of
interest is narrower than, and completely covered by, the spectrum affected by the fading.
The efforts to compensate for the errors and distortions introduced by multipath fading fall into
three general categories: forward error correction, adaptive equalization, and diversity techniques.
In the typical mobile wireless environment, techniques from all three categories are combined to
combat the error rates encountered. Forward error correction is applicable in digital transmission
applications: those in which the transmitted signal carries digital data or digitized voice or video
data. Typically in mobile wireless applications, the ratio of total bits sent to data bits sent is between
2 and 3. This may seem an extravagant amount of overhead, in that the capacity of the system is cut
to one-half or one-third of its potential, but the mobile wireless environment is so difficult that such
levels of redundancy are necessary. Chapter 6 discusses forward error correction. Adaptive
equalization can be applied to transmissions that carry analog information (e.g., analog voice or
video) or digital information (e.g., digital data, digitized voice or video) and is used to combat
intersymbol interference. The process of equalization involves some method of gathering the
dispersed symbol energy back together into its original time interval. Equalization is a broad topic;
techniques include the use of so-called lumped analog circuits as well as sophisticated digital signal
processing algorithms. Diversity is based on the fact that individual channels experience
independent fading events. We can therefore compensate for error effects by providing multiple
logical channels in some sense between transmitter and receiver and sending part of the signal over
each channel. This technique does not eliminate errors but it does reduce the error rate, since we
have spread the transmission out to avoid being subjected to the highest error rate that might occur.
The other techniques (equalization, forward error correction) can then cope with the reduced error
rate. Some diversity techniques involve the physical transmission path and are referred to as space
diversity. For example, multiple nearby antennas may be used to receive the message, with the
signals combined in some fashion to reconstruct the most likely transmitted signal. Another
example is the use of collocated multiple directional antennas, each oriented to a different reception
angle with the incoming signals again combined to reconstitute the transmitted signal. More
commonly, the term diversity refers to frequency diversity or time diversity techniques. With
frequency diversity, the signal is spread out over a larger frequency bandwidth or carried on
multiple frequency carriers.
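The benefit of diversity can be made concrete with a back-of-the-envelope model. Assuming the idealisation that the branches fade independently and each loses a symbol with the same probability p (the figures below are illustrative), the symbol is lost only when every copy is lost:

```python
# Why diversity helps, under the idealised assumption of L independent
# branches each failing with probability p: all copies are lost with
# probability p**L, so the residual error rate falls rapidly with L.
def loss_probability(p, branches):
    return p ** branches

print(loss_probability(0.1, 1))           # single channel: 0.1
print(round(loss_probability(0.1, 3), 3)) # three branches: 0.001
```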
b) Compare Cell Phone Generation- 1G To 5G
Ans:
1G: Voice Only
Cell phones began with 1G in the 1980s. 1G is an analog technology; the phones generally had
poor battery life, voice quality was poor, there was little security, and calls would sometimes be
dropped. The maximum speed of 1G is 2.4 Kbps.
3G service, also known as third-generation service, is high-speed access to data and voice services,
made possible by the use of a 3G network. A 3G network is a high-speed mobile broadband
network, offering data speeds of at least 144 kilobits per second (Kbps).
For comparison, a dial-up Internet connection on a computer typically offers speeds of about 56
Kbps. If you've ever sat and waited for a Web page to download over a dial-up connection, you
know how slow that is.
3G networks can offer speeds of 3.1 megabits per second (Mbps) or more; that's on par with speeds
offered by cable modems. In day-to-day use, the actual speed of the 3G network will vary. Factors
such as signal strength, your location, and network traffic all come into play.
4G wireless is the term used to describe the fourth-generation of wireless cellular service. 4G is a
big step up from 3G and is up to 10 times faster than 3G service. Sprint was the first carrier to offer
4G speeds in the U.S. beginning in 2009. Now all the carriers offer 4G service in most areas of the
country, although some rural areas still have only the slower 3G coverage.
Why 4G Speed Matters
As smart phones and tablets developed the capability to stream video and music, the need for speed
became critically important. Historically, cellular speeds were much slower than those offered by
high-speed broadband connections to computers. 4G speed compares favorably with some
broadband options and is particularly useful in areas without broadband connections.
4G Technology
While all 4G service is called 4G or 4G LTE, the underlying technology is not the same with every
carrier. Some use WiMax technology for their 4G network, while Verizon Wireless uses a
technology called Long Term Evolution, or LTE.
Sprint says its 4G WiMax network offers download speeds that are ten times faster than a 3G
connection, with speeds that top out at 10 megabits per second.
Ans:
It is a method of transmitting radio signals by rapidly switching a carrier among many frequency
channels, using a pseudorandom sequence known to both transmitter and receiver. It is used as
a multiple access method in the frequency-hopping code division multiple access (FH-CDMA)
scheme.
Each available frequency band is divided into sub-frequencies. Signals rapidly change ("hop")
among these in a predetermined order. Interference at a specific frequency will only affect the
signal during that short interval. FHSS can, however, cause interference with adjacent direct-
sequence spread spectrum (DSSS) systems.
Fig. FHSS
In FHSS, the transmitter hops between available narrowband frequencies within a specified
broad channel in a pseudo-random sequence known to both sender and receiver. A short burst of
data is transmitted on the current narrowband channel, then transmitter and receiver tune to the
next frequency in the sequence for the next burst of data. In most systems, the transmitter will hop
to a new frequency more than twice per second. Because no channel is used for long, and the odds
of any other transmitter being on the same channel at the same time are low, FHSS is often used as
a method to allow multiple transmitter and receiver pairs to operate in the same space on the same
broad channel at the same time.
Modulation
The frequency of a radio frequency channel is best understood as the frequency of a carrier
wave. A carrier wave is a constant-frequency waveform, much like a sine wave. By itself it
carries little information that we can relate to data or speech.
To convey data or speech, another wave, known as the input signal, has to be imposed on the
carrier wave. This process of imposing an input signal on a carrier wave is known as modulation.
Put differently, modulation modifies the shape of the carrier wave to encode the information we
intend to carry. Modulation is similar to hiding a code inside the carrier wave.
Types of Modulation
Frequency modulation
Amplitude modulation
Phase modulation.
Amplitude modulation
A kind of modulation where the amplitude of the carrier signal is changed in proportion to the
message signal while the phase and frequency are kept constant.
Phase modulation
A kind of modulation where the phase of the carrier signal is altered according to the low-frequency
message signal is called phase modulation.
Frequency modulation
A kind of modulation where the frequency of the carrier signal is altered in proportion to the
message signal while the phase and amplitude are kept constant.
Modulation mechanisms can also be digital or analog. In an analog modulation scheme the input
wave varies continuously like a sine wave; digital modulation is a bit more involved. The voice
signal is sampled at some rate and then compressed into a bit stream (a stream of zeros and ones).
This, in turn, is shaped into a specific type of wave that is superimposed on the carrier.
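Amplitude modulation, the simplest of the three types above, can be sketched in a few lines. The sample rate, carrier frequency, and modulation depth are arbitrary illustration values.

```python
# Minimal amplitude-modulation sketch: the message scales the carrier's
# amplitude while the carrier's frequency and phase stay fixed.
# fc, fs, and depth below are arbitrary illustration values.
import math

def am_modulate(message, fc, fs, depth=0.5):
    """Return (1 + depth*m[n]) * cos(2*pi*fc*n/fs) for each message sample."""
    return [(1 + depth * m) * math.cos(2 * math.pi * fc * n / fs)
            for n, m in enumerate(message)]

msg = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]  # 5 Hz tone
s = am_modulate(msg, fc=20, fs=100)
print(len(s))  # one output sample per message sample
```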
Demodulation
Demodulation is defined as extracting the original information carrying signal from a modulated
carrier wave. A demodulator is an electronic circuit that is mainly used to recover the information
content from the modulated carrier wave. There are different types of modulation and so are
demodulators. The output signal via demodulator may describe the sound, images or binary data.
Even though there are different methods for modulation and demodulation processes, each has its
own advantages and disadvantages. For example, AM is used in shortwave and radio wave
broadcasting; FM is mostly used in high-frequency radio broadcasting, and pulse modulation is
known for digital signal modulation
A PCS works similarly to a cellular network in basic operations, but requires more service provider
infrastructure to cover a wider geographical area. A PCS generally includes the following:
Wireless communication (data, voice and video)
Mobile PBX
Paging and texting
Wireless radio
Personal communication networks
Satellite communication systems, etc.
PCS has three broad categories: narrowband, broadband and unlicensed. TDMA, CDMA and GSM,
and 2G, 3G and 4G are some of the common technologies that are used to deliver a PCS.
Cell splitting is the process of subdividing a congested cell into smaller cells such that each
smaller cell has its own base station with reduced antenna height and reduced transmitter power. It
increases the capacity of a cellular system since the number of times channels are reused increases.
Cell Sectorization: One way to increase the subscriber capacity of a cellular network is to replace
the omni-directional antenna at each base station with three (or six) sector antennas of 120 (or 60)
degrees opening.
The concept of Cell Splitting is quite self explanatory by its name itself. Cell splitting means to split
up cells into smaller cells. The process of cell splitting is used to expand the capacity (number of
channels) of a mobile communication system. As a network grows, a quite large number of mobile
users in an area come into picture. Consider the following scenario.
There are 100 people in a specific area. All of them own a mobile phone (MS) and can quite
comfortably communicate with each other. So, a provision for all of them to mutually
communicate must be made. As there are only 100 users, a single base station (BS) is built in the
middle of the area and all these users' MSs are connected to it. All these 100 users now come under
the coverage area of a single base station. This coverage area is called a cell. This is shown in Fig.
But now, as time passes, the number of mobile users in the same area increases from 100 to 700.
If the same BS has to connect to these 700 users' MSs, the BS will obviously be overloaded. A
single BS, which served 100 users, is now forced to serve 700 users, which is impractical. To
reduce the load on this BS, we can use cell splitting. That is, we divide the above single cell
into 7 separate adjacent cells, each having its own BS. This is shown in Fig.
Now, let us look at the big picture. Until now, we have discussed cell splitting in a
small area. Now we use the same concept to deal with large networks. In a large network, it is not
necessary to split up all the cells in all the clusters. Certain BSs can handle their traffic well only
after their cells (coverage areas) are split up; only those cells are candidates for cell splitting.
Fig 2-3 shows a network architecture with a few cells split up into smaller cells, without affecting
the other cells in the network.
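The capacity arithmetic behind the 100-to-700 example above is simple: if each cell's base station can serve a fixed number of users and each split cell reuses the full channel set, capacity scales with the number of cells. The numbers are the illustrative ones from the text.

```python
# Rough capacity arithmetic for cell splitting: splitting one cell into k
# smaller cells, each reusing the full channel set, multiplies the area's
# capacity by roughly k. Figures match the illustrative example above.
def capacity_after_split(users_per_cell, n_cells):
    return users_per_cell * n_cells

print(capacity_after_split(100, 1))  # single cell serves 100 users
print(capacity_after_split(100, 7))  # split into 7 cells: 700 users
```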
Direct-sequence spread spectrum (DSSS) is a related technique. It spreads a signal across a wide channel, but it does so all at once
instead of in discrete bursts separated by hops. It can achieve higher throughput, but DSSS is more
susceptible to interference and less effective as a spectrum-sharing method.
Ans:
Co-channel interference or CCI is crosstalk from two different radio transmitters using the
same channel. Co-channel interference can be caused by many factors from weather conditions to
administrative and design issues. Co-channel interference may be controlled by various radio
resource management schemes.
How can we reduce co-channel interference?
Different cellular standards handle handover/handoff in slightly different ways. Therefore, for the
sake of explanation, the example of how GSM handles handover is given.
There are a number of parameters that need to be known to determine whether a handover is
required. The signal strength of the base station with which communication is being made, along
with the signal strengths of the surrounding stations. Additionally the availability of channels also
needs to be known. The mobile is obviously best suited to monitor the strength of the base stations,
but only the cellular network knows the status of channel availability and the network makes the
decision about when the handover is to take place and to which channel of which cell.
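The decision logic just described, combining measured signal strengths with channel availability, can be sketched as a toy rule. The hysteresis margin and the data model are invented for illustration; real GSM handover algorithms weigh many more parameters.

```python
# Toy handover decision, loosely modelling the description above: move to a
# neighbouring cell only if its signal beats the serving cell's by a
# hysteresis margin AND that cell has a free channel. The margin and the
# (cell_id, rss_dbm, has_free_channel) model are invented for illustration.
def should_handover(serving_dbm, neighbors, hysteresis_db=3.0):
    """Return the best qualifying neighbour cell id, or None to stay."""
    best = None
    for cell_id, rss, free in neighbors:
        if free and rss > serving_dbm + hysteresis_db:
            if best is None or rss > best[1]:
                best = (cell_id, rss)
    return best[0] if best else None

# "B" is stronger but has no free channel, so "A" is chosen.
print(should_handover(-95.0, [("A", -90.0, True), ("B", -85.0, False)]))
```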
Types of handover / handoff
With the advent of CDMA systems where the same channels can be used by several mobiles, and
where it is possible for adjacent cells or cell sectors to use the same frequency channel, there are a
number of different types of handover that can be performed:
Hard handover
The definition of a hard handover or handoff is one where an existing connection must be broken
before the new one is established. One example of hard handover is when frequencies are changed.
As the mobile will normally only be able to transmit on one frequency at a time, the connection
must be broken before it can move to the new channel where the connection is re-established. This
is often termed an inter-frequency hard handover. While this is the most common form of hard
handoff, it is not the only one. It is also possible to have intra-frequency hard handovers where the
frequency channel remains the same.
Although there is generally a short break in transmission, this is normally short enough not to be
noticed by the user.
Soft handover
The new 3G technologies use CDMA where it is possible to have neighboring cells on the same
frequency and this opens the possibility of having a form of handover or handoff where it is not
necessary to break the connection. This is called soft handover or soft handoff, and it is defined as a
handover where a new connection is established before the old one is released. In UMTS most of
the handovers that are performed are intra-frequency soft handovers.
Frequency Division Multiple Access is a method employed to permit several users to transmit
simultaneously on one satellite transponder by assigning a specific frequency within the channel to
each user. Each conversation gets its own, unique, radio channel. The channels are relatively
narrow, usually 30 kHz or less, and are defined as either transmit or receive channels. A full-duplex
conversation requires a transmit & receive channel pair. FDM is often used for simultaneous access
to the medium by base station and mobile station in cellular networks, establishing a duplex channel.
In a scheme called frequency division duplexing (FDD), the two directions, mobile station to
base station and vice versa, are separated using different frequencies.
Time division multiplexing (TDM)
A more flexible multiplexing scheme for typical mobile communications is time division
multiplexing (TDM). Compared to FDMA, time division multiple access (TDMA) offers a
much more flexible scheme, which comprises all technologies that allocate certain time slots
for communication. Now synchronization between sender and receiver has to be achieved in
the time domain. Again this can be done by using a fixed pattern similar to FDMA
techniques, i.e., allocating a certain time slot for a channel, or by using a dynamic allocation
scheme.
Listening to different frequencies at the same time is quite difficult, but listening to many
channels separated in time at the same frequency is simple. Fixed schemes do not need
identification, but are not as flexible considering varying bandwidth requirements.
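A fixed TDMA pattern of the kind just described can be sketched in a few lines: each user owns one slot per frame, so the slot position alone identifies who may transmit. The user names and slot count are arbitrary illustration values.

```python
# Fixed-pattern TDMA sketch: slots are assigned round-robin, so a slot's
# position within the frame identifies its owner with no extra signalling.
# User names and frame length are arbitrary example values.
def slot_owner(slot_index, users):
    """Round-robin mapping of frame slots to users."""
    return users[slot_index % len(users)]

users = ["UE1", "UE2", "UE3"]
frame = [slot_owner(i, users) for i in range(6)]
print(frame)  # → ['UE1', 'UE2', 'UE3', 'UE1', 'UE2', 'UE3']
```

The downside noted in the text is visible here: the mapping is fixed, so a user with nothing to send still owns its slots, wasting capacity when bandwidth demands vary.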
c) Mobile Switching Center: A mobile switching center (MSC) is the centerpiece of a network
switching subsystem (NSS). The MSC is mostly associated with communications switching
functions, such as call set-up, release, and routing. However, it also performs a host of other duties,
including routing SMS messages, conference calls, fax, and service billing as well as interfacing
with other networks, such as the public switched telephone network (PSTN).
d) Base station: -A base station is a fixed point of communication for customer cellular phones on
a network. The base station is connected to an antenna (or multiple antennae) that receives and
transmits the signals in the cellular network to customer phones and cellular devices. That
equipment is connected to a mobile switching station that connects cellular calls to the public
switched telephone network (PSTN).
e) Mobile Station: The MS includes radio equipment and the man machine interface (MMI) that a
subscriber needs in order to access the services provided by the GSM PLMN. MSs can be installed
in vehicles or can be portable or handheld stations. The MS may include provisions for data
communication as well as voice. A mobile transmits and receives messages to and from the GSM
system over the air interface to establish and continue connections through the system.
Functions of MS
The primary functions of MS are to transmit and receive voice and data over the air interface of the
GSM system. MS performs the signal processing functions of digitizing, encoding, error protecting,
encrypting, and modulating the transmitted signals. It also performs the inverse functions on the
received signals from the BS.
These functions include the following:
Voice and data transmission;
Frequency and time synchronization;
Monitoring of power and signal quality of the surrounding cells for optimum handover;
Provision of location updates;
Equalization of multipath distortions.
University Endsem Question Paper
MOC MODEL ANSWER DECEMBER– 2018
Q.1 a) What is frequency reuse? Give its frequency reuse factor.
Ans:
Frequency Reuse
Frequency reuse is the process of using the same radio frequencies on radio transmitter sites within
a geographic area that are separated by sufficient distance to cause minimal interference with each
other. Frequency reuse allows for a dramatic increase in the number of customers that can be served
(capacity) within a geographic area on a limited amount of radio spectrum (limited number of radio
channels). The ability to reuse frequencies depends on various factors, including the ability of
channels to operate in the presence of interference and the signal energy attenuation between the
transmitters.
The IS-95 CDMA and CDMA radio channels use coded channels that are uniquely assigned to each
user. This allows many users to operate on the same frequency. This also allows frequencies to be
reused in every cell site and sectors within a cell site. However, the use of the same frequency in the
same cell site and sector increases the interference levels and decreases the capacity of the radio
channels.
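The question's "frequency reuse factor" can be illustrated with the textbook cluster arithmetic: with S duplex channels shared across a cluster of N cells, each cell gets k = S/N channels and each channel set is reused once per cluster, giving a reuse factor of 1/N. The figures below are illustrative, not from the text.

```python
# Textbook frequency-reuse arithmetic (illustrative figures): S total
# channels divided over a cluster of N cells gives k = S // N channels per
# cell, and a frequency reuse factor of 1/N.
def channels_per_cell(total_channels, cluster_size):
    return total_channels // cluster_size

S, N = 490, 7
print(channels_per_cell(S, N))  # → 70 channels per cell
print(f"reuse factor = 1/{N}")  # each channel set reused once per cluster
```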
This diagram shows the process of cell splitting that is used to expand the capacity (number of
channels) of a mobile communication system. In this example, the radio coverage areas of large cell
sites are split by adjusting the power level and/or using reduced antenna height to cover a reduced
area. Reducing the radio coverage area of a cell site by changing the RF boundaries of a cell site has
the same effect as placing cells farther apart, and allows new cell sites to be added.
Q.2 a) Explain PCS Architecture in details.
Ans:
A personal communications service (PCS) is a type of wireless mobile service with advanced
coverage and that delivers services at a more personal level. It generally refers to the modern
mobile communication that boosts the capabilities of conventional cellular networks and fixed-line
telephony networks as well.
PCS is also known as digital cellular.
A PCS works similarly to a cellular network in basic operations, but requires more service provider
infrastructure to cover a wider geographical area. A PCS generally includes the following:
PCS has three broad categories: narrowband, broadband and unlicensed. TDMA, CDMA and GSM,
and 2G, 3G and 4G are some of the common technologies that are used to deliver a PCS.
A MAC address is used when multiple devices are connected to the same physical link: to prevent
collisions, the system uniquely identifies the devices at the data link layer using the MAC
addresses assigned to all ports on a switch. The MAC sublayer uses MAC protocols to prevent
collisions. (Note that this is distinct from a message authentication code, an algorithm that accepts
a secret key and an arbitrary-length message and outputs an authentication tag, not a MAC address.)
where the Gaussian filter adopts the following form in the time domain:
h(t) = λ exp(−2π²B²t² / ln 2)
where λ is a normalization constant to maintain the power, B is the −3 dB bandwidth, and the
product BTc is the −3 dB bandwidth-symbol-time product. The higher this value, the cleaner the
eye diagram of the signal, but more power will be transmitted on the side lobes of the spectrum. A
typical value in communication applications is BTc = 0.3, which is a good compromise between
spectral efficiency and intersymbol interference.
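The pulse can be evaluated numerically. Note the closed form used here, h(t) = B·sqrt(2π/ln 2)·exp(−2π²B²t²/ln 2) with the square root factor as the normalization constant, is the common textbook form for a Gaussian filter with −3 dB bandwidth B and is our assumption, not taken verbatim from this text.

```python
# Numeric sketch of a Gaussian pulse with BTc = 0.3, assuming the common
# closed form h(t) = B*sqrt(2*pi/ln2) * exp(-2*pi^2*B^2*t^2 / ln2) for a
# Gaussian filter with -3 dB bandwidth B (assumed form, see lead-in).
import math

def gaussian_pulse(t, B):
    ln2 = math.log(2)
    lam = B * math.sqrt(2 * math.pi / ln2)  # normalization constant
    return lam * math.exp(-2 * math.pi**2 * B**2 * t**2 / ln2)

Tc = 1.0        # symbol time (normalised)
B = 0.3 / Tc    # BTc = 0.3, the typical value quoted above
print(round(gaussian_pulse(0.0, B), 3))  # peak value at t = 0
```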
SIM:
A subscriber identity module or subscriber identification module (SIM), widely known as a SIM
card, is an integrated circuit that is intended to securely store the international mobile subscriber
identity (IMSI) number and its related key, which are used to identify and authenticate subscribers
on mobile telephony devices (such as mobile phones and computers). It is also possible to store
contact information on many SIM cards. SIM cards are always used on GSM phones; for CDMA
phones, they are only needed for newer LTE-capable handsets. SIM cards can also be used in
satellite phones, smart watches, computers, or cameras.
The SIM circuit is part of the function of a universal integrated circuit card (UICC) physical smart
card, which is usually made of PVC with embedded contacts and semiconductors. SIM cards are
transferable between different mobile devices. The first UICC smart cards were the size of credit
and bank cards; sizes were reduced several times over the years, usually keeping electrical contacts
the same, so that a larger card could be cut down to a smaller size.
A SIM card contains its unique serial number (ICCID), international mobile subscriber identity
(IMSI) number, security authentication and ciphering information, temporary information related to
the local network, a list of the services the user has access to, and two passwords: a personal
identification number (PIN) for ordinary use, and a personal unblocking code (PUC) for PIN
unlocking.
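The ICCID mentioned above carries a Luhn check digit, so its integrity can be verified numerically. The sketch below implements the standard Luhn check (the ICCID-like number in the example is made up for illustration):

```python
def luhn_checksum_ok(number: str) -> bool:
    """Luhn check as used for numbers such as the SIM's ICCID."""
    digits = [int(d) for d in number]
    total = 0
    # Double every second digit from the right, subtracting 9 when it exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# Hypothetical ICCID-like payload; the check digit is the one that makes
# the whole number pass the Luhn test.
payload = "8991101200003204"
check = next(c for c in "0123456789" if luhn_checksum_ok(payload + c))
assert luhn_checksum_ok(payload + check)
```

Exactly one of the ten candidate digits passes the check, which is why a single mistyped digit is always detected.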
The SIM provides personal mobility, so that the user can have access to all subscribed services
irrespective of both the location of the terminal and the use of a specific terminal. By inserting the
SIM card into another GSM cellular phone, the user can receive calls at that phone, make calls from
that phone, and receive other subscribed services.
The permanent data associated with the mobile are those that do not change as it moves
from one area to another. On the other hand, temporary data changes from call to call. The HLR
interacts with MSCs mainly for the procedures of interrogation for routing calls to a MS and to
transfer charging information after call termination. Location registration is performed by the HLR:
when the subscriber changes the VLR area, the HLR is informed of the address of the new VLR and
updates it with all relevant subscriber data. Similarly, location cancellation is handled by the HLR:
after the subscriber roams to a different VLR area, the HLR instructs the old VLR to delete the
subscriber's data. Supplementary services are add-ons to the basic
service. These parameters need not all be stored in the HLR. However, it is safer to store all
subscription parameters in the HLR even when some are stored in a subscriber card. The data stored
in the HLR is changed only by MMI action when new subscribers are added, old subscribers are
deleted, or the specific services to which they subscribe are changed and not dynamically updated
by the system.
b) VLR (Visitor location register):- A MS roaming in an MSC area is controlled by the VLR
responsible for that area. When a MS appears in a LA, it starts a registration procedure. The MSC
for that area notices this registration and transfers to the VLR the identity of the LA where the MS
is situated. A VLR may be in charge of one or several MSC LAs. The VLR constitutes the database
that supports the MSC in the storage and retrieval of the data of subscribers present in its area.
When an MS enters the MSC area borders, it signals its arrival to the MSC that stores its identity in
the VLR. The information necessary to manage the MS is contained in the HLR and is transferred
to the VLR so that they can be easily retrieved if so required.
The location registration procedure allows the subscriber data to follow the movements of the MS.
For such reasons the data contained in the VLR and in the HLR are more or less the same.
Nevertheless, the data are present in the VLR only as long as the MS is registered in the area related
to that VLR. The terms permanent and temporary, in this case, are meaningful only during that time
interval when the mobile is in the area of the local MSC/VLR combination. The data contained in the
VLR can be compared with the subscriber-related data contained in a normal fixed exchange; the
location information can be compared with the line equipment reference attached to each fixed
subscriber connected to that exchange. The VLR is responsible for assigning a new TMSI number
to the subscriber. It also relays the ciphering key from HLR to BSS.
Cells in the PLMN are grouped into geographic areas, and each is assigned a LAI, as shown in
Figure 2.2(c). Each VLR controls a certain set of LAs. When a mobile subscriber roams from one
LA to another, their current location is automatically updated in their VLR. If the old and new LAs
are under the control of two different VLRs, the entry on the old VLR is deleted and an entry is
created in the new VLR by copying the basic data from the HLR. The subscriber's current VLR
address, stored at the HLR, is also updated. This provides the information necessary to complete
calls to roaming mobiles. The VLR supports a mobile paging and tracking subsystem in the local
area where the mobile is presently roaming. The detailed functions of VLR are as follows.
• Works with the HLR and AUC on authentication;
• Relays the cipher key from the HLR to the BSS for encryption/decryption;
• Controls allocation of new TMSI numbers; a subscriber's TMSI number can be periodically
changed to secure the subscriber's identity;
• Supports paging;
• Tracks the state of all MSs in its area.
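The location-update flow described above can be sketched as a toy model (the dictionaries and identifiers below are illustrative, not part of the GSM specification): moving to a new VLR copies the subscriber's basic data from the HLR, deletes the old VLR entry, and updates the current-VLR address stored in the HLR.

```python
# HLR: permanent subscriber data plus a pointer to the current VLR.
hlr = {"IMSI-1": {"services": ["voice", "sms"], "vlr": "VLR-A"}}
# VLRs: temporary copies for subscribers currently in each area.
vlrs = {"VLR-A": {"IMSI-1": {"services": ["voice", "sms"]}}, "VLR-B": {}}

def location_update(imsi: str, new_vlr: str) -> None:
    old_vlr = hlr[imsi]["vlr"]
    if old_vlr != new_vlr:
        vlrs[old_vlr].pop(imsi, None)                              # location cancellation
        vlrs[new_vlr][imsi] = {"services": hlr[imsi]["services"]}  # copy basic data from HLR
        hlr[imsi]["vlr"] = new_vlr                                 # update current-VLR address

location_update("IMSI-1", "VLR-B")
```

After the update, only the new VLR holds the subscriber's data, and the HLR can route incoming calls via the new VLR address.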
Mobile station - these are the user terminals; a number of users are controlled by one BTS.
1. The mobile stations (MS) communicate with the base station subsystem over the radio interface.
2. The BSS, also called the radio subsystem, provides and manages the radio transmission path between the
mobile stations and the Mobile Switching Centre (MSC). It also manages the radio interface between the
mobile stations and the other subsystems of GSM.
3. Each BSS comprises many Base Station Controllers (BSC) that connect the mobile station to the network
and switching subsystem (NSS) through the mobile switching center.
4. The NSS controls the switching functions of the GSM system. It allows the mobile switching center to
communicate with networks like PSTN, ISDN, CSPDN, PSPDN and other data networks.
5. The operation support system (OSS) allows the operation and maintenance of the GSM system. It allows
the system engineers to diagnose, troubleshoot and observe the parameters of the GSM system. The OSS
subsystem interacts with the other subsystems and is provided for the GSM operating company staff that
provides service facilities for the network.
Base station subsystem (BSS) - the base station subsystem comprises two parts:
1. Base Transceiver Station (BTS).
2. Base Station Controller (BSC).
The BSS consists of many BSCs that connect to a single MSC. Each BSC controls up to several
hundred BTSs.
Base Transceiver Station (BTS)
The BTS has radio transceivers that define a cell and is capable of handling the radio link protocols
with the MS.
Functions of the BTS are:
1. Handling radio link protocols.
2. Providing full-duplex communication to the MS.
3. Interleaving and de-interleaving.
Base Station Controller (BSC) - it manages the radio resources for one or more BTSs. It controls up
to several hundred BTSs; the BSCs are connected to a single MSC.
Functions of the BSC are:
• To control the BTSs.
• Radio resource management.
• Handoff management and control.
• Radio channel setup and frequency hopping.
Network subsystem (NSS)
1. It handles the switching of GSM calls between external networks and the internal BSCs.
2. It includes three different databases for mobility management:
A. HLR (Home Location Register)
B. VLR (Visitor Location Register)
C. AUC (Authentication Center)
Mobile switching center (MSC) -
It connects to fixed networks like ISDN, PSTN, etc.
Following are the functions of the MSC:
1. Call setup, supervision and release.
2. Collection of billing information.
3. Call handling / routing.
4. Management of signaling protocols.
5. Record of VLR and HLR.
HLR (Home Location Register) - the call roaming and call routing capabilities of GSM are handled
here. It stores all the administrative information of subscribers registered in the network, and
maintains the unique international mobile subscriber identity (IMSI).
VLR (Visitor Location Register) - it is a temporary database. It stores the IMSI number and
customer information for each roaming customer visiting a specific MSC.
Authentication center - it is a protected database that maintains authentication keys and algorithms.
It contains a register called the Equipment Identity Register.
Operation subsystem (OSS) - it manages all mobile equipment in the system.
1. Management of charging and billing procedures.
2. Maintenance of all hardware and network operations.
AuC:
The AuC database holds different algorithms that are used for authentication and encryptions of the
mobile subscribers that verify the mobile user’s identity and ensure the confidentiality of each call.
The AuC holds the authentication and encryption keys for all the subscribers in both the home and
visitor location register.
EIR:
The EIR is another database that keeps information about the identity of mobile equipment, such as
the International Mobile Equipment Identity (IMEI) that reveals details about the manufacturer,
country of production, and device type. This information is used to prevent calls from being
misused, to prevent unauthorized or defective MSs, to report stolen mobile phones or check if the
mobile phone is operating according to the specification of its type.
GSM treats the users and the equipment in different ways. Phone numbers, subscribers, and
equipment identifiers are some of the known ones. There are many other identifiers that have been
well-defined, which are required for the subscriber’s mobility management and for addressing the
remaining network elements. Vital addresses and identifiers that are used in GSM are addressed
below.
International Mobile Station Equipment Identity (IMEI)
The International Mobile Station Equipment Identity (IMEI) is effectively a serial number which
uniquely identifies a mobile station internationally. It is allocated by the equipment manufacturer
and registered by the network operator, who stores it in the Equipment Identity Register (EIR). By
means of the IMEI, one can recognize obsolete, stolen, or non-functional equipment.
Following are the parts of IMEI:
Type Approval Code (TAC) : 6 decimal places, centrally assigned.
Final Assembly Code (FAC) : 2 decimal places, assigned by the manufacturer.
Serial Number (SNR) : 6 decimal places, assigned by the manufacturer.
Spare (SP) : 1 decimal place.
Thus, IMEI = TAC + FAC + SNR + SP. It uniquely characterizes a mobile station and gives clues
about the manufacturer and the date of manufacturing.
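Splitting an IMEI into its fields is straightforward string slicing. The sketch below assumes the classic 15-digit layout (TAC 6 + FAC 2 + SNR 6 + spare 1 = 15 digits); the example IMEI is made up for illustration:

```python
def parse_imei(imei: str) -> dict:
    """Split a 15-digit IMEI into TAC, FAC, SNR and spare fields."""
    assert len(imei) == 15 and imei.isdigit()
    return {
        "TAC": imei[0:6],    # Type Approval Code
        "FAC": imei[6:8],    # Final Assembly Code
        "SNR": imei[8:14],   # Serial Number
        "SP": imei[14],      # spare digit
    }

fields = parse_imei("490154203237518")
```

The EIR can then match the TAC/FAC prefix against lists of approved or blacklisted equipment types.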
GPRS attempts to reuse the existing GSM network elements as much as possible, but to effectively
build a packet-based mobile cellular network, some new network elements, interfaces, and proto-
cols for handling packet traffic are required.
Therefore, GPRS requires modifications to numerous GSM network elements as summarized be-
low:
GPRS Mobile Stations
New Mobile Stations (MS) are required to use GPRS services because existing GSM phones do
not handle the enhanced air interface or packet data. A variety of MS can exist, including a high-
speed version of current phones to support high-speed data access, a new PDA device with an
embedded GSM phone, and PC cards for laptop computers. These mobile stations are backward
compatible for making voice calls using GSM.
When either voice or data traffic is originated at the subscriber mobile, it is transported over the air
interface to the BTS, and from the BTS to the BSC in the same way as a standard GSM call.
However, at the output of the BSC, the traffic is separated; voice is sent to the Mobile Switching
Center (MSC) per standard GSM, and data is sent to a new device called the SGSN via the PCU
over a Frame Relay interface.
GPRS Support Nodes
The following two new components, the Serving GPRS Support Node (SGSN) and the Gateway
GPRS Support Node (GGSN), collectively called GPRS Support Nodes (GSNs), are added:
Internal Backbone
The internal backbone is an IP based network used to carry packets between different GSNs.
Tunnelling is used between SGSNs and GGSNs, so the internal backbone does not need any
information about domains outside the GPRS network. Signalling from a GSN to a MSC, HLR or
EIR is done using SS7.
Routing Area
GPRS introduces the concept of a Routing Area. This concept is similar to Location Area in GSM,
except that it generally contains fewer cells. Because routing areas are smaller than location areas,
fewer radio resources are used while broadcasting a page message.
Signaling channels
The signaling channels on the air interface are used for call establishment, paging, call maintenance,
synchronization, etc.
There are three types of signaling channels:
1. Broadcast Channels
2. Common Control Channels
3. Dedicated Control Channel
Broadcast Channels (BCH)
Carry only downlink information and are responsible mainly for synchronization and frequency
correction. This is the only channel type enabling point-to-multipoint communications in which
short messages are simultaneously transmitted to several mobiles
BCH Characteristics
• Each cell has a designated BCH carrier
• All BCH timeslots transmit continuously on full power
• TS 0 contains logical control channels
• TS1-7 optionally carries traffic
• A BCCH block occurs once in each 51-frame multiframe
• Each block comprises 4 frames carrying 1 message
The BCHs include the following channels;
1. Broadcast Control Channel (BCCH): General information, cell specific (location area code
(LAC), network operator, access parameters, list of neighboring cells, etc.). The MS receives
signals via the BCCH from many BTSs within the same network and/or different networks.
2. Frequency Correction Channel (FCCH): Downlink only; correction of MS frequencies;
transmission of frequency standard to MS; it is also used for synchronization of an
acquisition by providing the boundaries between timeslots and the position of the first
timeslot of a TDMA frame.
3. Synchronization Channel (SCH): Downlink only; frame synchronization (TDMA frame
number) and identification of base station. The valid reception of one SCH burst will
provide the MS with all the information needed to synchronize with a BTS
Common Control Channels (CCCH)
A group of uplink and downlink channels between the MS and the BTS. These channels are used to
convey information from the network to MSs and provide access to the network. The CCCHs
include the following channels;
1. Paging Channel (PCH): Downlink only; the MS is informed by the BTS for incoming calls via
the PCH
2. Access Grant Channel (AGCH): Downlink only, BTS allocates a TCH or SDCCH to the MS,
thus allowing the MS access to the network.
3. Random Access Channel (RACH): Uplink only, allows the MS to request an SDCCH in
response to a page or due to a call; the MS chooses a random time to send on this channel. This
creates a possibility of collisions with transmissions from other MSs
Dedicated Control Channels (DCCH)
Responsible for roaming, handovers, encryption, etc. The DCCHs include the following channels;
1. Stand-alone Dedicated Control Channel (SDCCH); Communications channel between
MS and the BTS; signaling during call setup before a traffic channel (TCH) is allocated
2. Slow Associated Control Channel (SACCH); Transmits continuous measurement reports
in parallel to operation of a TCH or SDCCH
3. Fast Associated Control Channel (FACCH); Similar to the SDCCH, but used in parallel
to operation of the TCH; if the data rate of the SACCH is insufficient, “borrowing mode” is
used: Additional bandwidth is borrowed from the TCH; this happens for messages
associated with call establishment, authentication of the subscriber, handover decisions, etc.
GSM multiframe
The GSM frames are grouped together to form multiframes and in this way it is possible to establish
a time schedule for their operation and the network can be synchronised.
GSM Hyperframe
Above this 2048 superframes (i.e. 2 to the power 11) are grouped to form one hyperframe which
repeats every 3 hours 28 minutes 53.76 seconds. It is the largest time interval within the GSM frame
structure.
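The hyperframe duration quoted above can be checked numerically: one TDMA frame lasts exactly 120/26 ms, a superframe contains 51 × 26 frames, and a hyperframe contains 2048 superframes.

```python
from fractions import Fraction

frame_ms = Fraction(120, 26)          # one TDMA frame, exactly 120/26 ms
frames_per_superframe = 51 * 26       # 1326 frames per superframe
superframes_per_hyperframe = 2048     # 2^11 superframes per hyperframe

# Total hyperframe duration in seconds (exact rational arithmetic).
hyperframe_s = frame_ms * frames_per_superframe * superframes_per_hyperframe / 1000

hours, rem = divmod(float(hyperframe_s), 3600)
minutes, seconds = divmod(rem, 60)
# Result: 3 hours, 28 minutes, 53.76 seconds, matching the value above.
```

Using `Fraction` avoids rounding error: the frame period is not exactly representable in binary floating point, but 120/26 ms as a rational is exact.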
Within the GSM hyperframe there is a counter and every time slot has a unique sequential number
comprising the frame number and time slot number. This is used to maintain synchronisation of the
different scheduled operations with the GSM frame structure. These include functions such as:
Frequency hopping: Frequency hopping is a feature that is optional within the GSM
system. It can help reduce interference and fading issues, but for it to work, the transmitter
and receiver must be synchronised so they hop to the same frequencies at the same time.
Encryption: The encryption process is synchronised over the GSM hyperframe period
where a counter is used and the encryption process will repeat with each hyperframe.
However, it is unlikely that the cellphone conversation will be over 3 hours and accordingly
it is unlikely that security will be compromised as a result.
The slots and frames are handled in a very logical manner to enable the system to expect and accept
the data that needs to be sent. Organising it in this logical fashion enables it to be handled in the
most efficient manner.
UMTS/3G
The Universal Mobile Telecommunications System is the third generation (3G) successor to the
second generation GSM based cellular technologies, which also include GPRS and EDGE.
Although UMTS uses a totally different air interface, the core network elements have been
migrating towards the UMTS requirements with the introduction of GPRS and EDGE. In this way
the transition from GSM to the 3G UMTS architecture did not require such a large instantaneous
investment.
UMTS uses Wideband CDMA (WCDMA / W-CDMA) to carry the radio transmissions, and the
system is often referred to simply as WCDMA.
Since it was originally formed, 3GPP has also taken over responsibility for the GSM standards as
well as looking at future developments including LTE (Long Term Evolution) and the 4G
technology known as LTE Advanced.
There are several key areas of 3G UMTS / WCDMA. Within these there are several key
technologies that have been employed to enable UMTS / WCDMA to provide a leap in
performance over its 2G predecessors.
Radio interface: The UMTS radio interface provides the basic definition of the radio
signal. W-CDMA occupies 5 MHz channels and has defined formats for elements such as
synchronization, power control and the like.
CDMA technology: 3G UMTS relies on a scheme known as CDMA or code division
multiple access to enable multiple handsets or user equipments to have access to the base
station. Using a scheme known as direct sequence spread spectrum, different UEs have
different codes and can all talk to the base station even though they are all on the same
frequency.
UMTS network architecture: The architecture for a UMTS network was designed to enable
packet data to be carried over the network, whilst still enabling it to support circuit
switched voice. All the usual functions enabling access to the network, roaming and the
like are also supported.
UMTS modulation schemes: Within the CDMA signal format, a variety of forms of
modulation are used. These are typically forms of phase shift keying.
UMTS channels: As with any cellular system, different data channels are required for
passing payload data as well as control information and for enabling the required resources
to be allocated. A variety of different data channels are used to enable these facilities to be
accomplished
UMTS TDD: There are two methods of providing duplex for 3G UMTS. One is what is
termed frequency division duplex, FDD. This uses two channels spaced sufficiently apart
so that the receiver can receive whilst the transmitter is also operating. Another method is
to use time division duplex, TDD, where short time blocks are allocated to transmissions in
both directions. Using this method, only a single channel is required.
Handover: One key area of any cellular telecommunications system is the handover
(handoff) from one cell to the next. Using CDMA there are several forms of handover that
are implemented within the system.
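The direct-sequence idea described under "CDMA technology" above can be sketched with a toy example (the two codes below are illustrative, not the actual UMTS spreading codes): each UE multiplies its data bit (+1/−1) by its own code, the transmissions add on the shared frequency, and the base station despreads with the matching code to recover each bit.

```python
# Two orthogonal spreading codes of length 8.
code_a = [1, 1, 1, 1, -1, -1, -1, -1]
code_b = [1, -1, 1, -1, 1, -1, 1, -1]
assert sum(a * b for a, b in zip(code_a, code_b)) == 0  # orthogonal

def spread(bit: int, code: list) -> list:
    """Multiply one data bit (+1 or -1) by every chip of the code."""
    return [bit * c for c in code]

# Both UEs transmit at once on the same frequency: the signals add.
received = [x + y for x, y in zip(spread(+1, code_a), spread(-1, code_b))]

def despread(signal: list, code: list) -> int:
    """Correlate with a code; the sign of the correlation recovers the bit."""
    corr = sum(s * c for s, c in zip(signal, code))
    return 1 if corr > 0 else -1
```

Despreading `received` with `code_a` recovers +1 and with `code_b` recovers −1, even though both users occupied the same channel simultaneously.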
UTRAN interfaces
Serving GPRS Support Node (SGSN): As the name implies, this entity was first developed
when GPRS was introduced, and its use has been carried over into the UMTS network
architecture. The SGSN provides a number of functions within the UMTS network
architecture.
o Mobility management: When a UE attaches to the Packet Switched domain of the
UMTS Core Network, the SGSN generates MM information based on the mobile's
current location.
o Session management: The SGSN manages the data sessions providing the required
quality of service and also managing what are termed the PDP (Packet data Protocol)
contexts, i.e. the pipes over which the data is sent.
o Interaction with other areas of the network: The SGSN is able to manage its
elements within the network only by communicating with other areas of the network,
e.g. MSC and other circuit switched areas.
o Billing: The SGSN is also responsible for billing. It achieves this by monitoring the flow
of user data across the GPRS network. CDRs (Call Detail Records) are generated by
the SGSN before being transferred to the charging entities (Charging Gateway
Function, CGF).
The UMTS standards are structured in a way that the internal functionality of the different network
elements is not defined. Instead, the interfaces between the network elements are defined, and in
this way, so too is the element functionality.
There are several interfaces that are defined for the UTRAN elements:
Iub : The Iub connects the NodeB and the RNC within the UTRAN. Standardizing the
interface between the controller and the base station was revolutionary when it was
launched; the aim was to stimulate competition between suppliers, for example by
allowing some manufacturers to concentrate just on base stations rather than on the
controller and other network entities.
Iur : The Iur interface allows communication between different RNCs within the UTRAN.
The open Iur interface enables capabilities like soft handover to occur as well as helping to
stimulate competition between equipment manufacturers.
Iu : The Iu interface connects the UTRAN to the core network.
Having standardised interfaces within various areas of the network including the UTRAN allows
network operators to select different network entities from different suppliers.
UMTS HSPA, High Speed Packet Access, combines HSDPA and HSUPA for uplink and
downlink to provide high speed data access.
3G HSPA, High Speed Packet Access, is the combination of two technologies, one for the
downlink and the other for the uplink, that can be built onto the existing 3G UMTS or W-
CDMA technology to provide increased data transfer speeds.
The original 3G UMTS / W-CDMA standard provided a maximum download speed of 384
kbps.
With many users requiring much higher data transfer speeds to compete with fixed line
broadband services and also to support services that require higher data rates, the need for an
increase in the speeds obtainable became necessary.
This resulted in the development of the technologies for 3G HSPA.
3G HSPA benefits
The UMTS cellular system as defined under the 3GPP Release 99 standard was orientated
more towards circuit switched operation and was not well suited to packet operation.
Additionally greater speeds were required by users than could be provided with the original
UMTS networks. Accordingly the changes required for HSPA were incorporated into many
UMTS networks to enable them to operate more in the manner required for current
applications.
HSPA provides a number of significant benefits that enable the new service to provide a far
better performance for the user. While 3G UMTS HSPA offers higher data transfer rates, this
is not the only benefit, as the system offers many other improvements as well:
1. Use of higher order modulation: 16QAM is used in the downlink instead of QPSK to
enable data to be transmitted at a higher rate. This provides for maximum data rates of 14
Mbps in the downlink. QPSK is still used in the uplink where data rates of up to 5.8 Mbps are
achieved. The data rates quoted are raw data rates and do not include reductions in actual
payload data resulting from the protocol overheads.
2. Shorter Transmission Time Interval (TTI): The use of a shorter TTI reduces the round trip time
and enables improvements in adapting to fast channel variations and provides for reductions in latency.
3. Use of shared channel transmission: Sharing the resources enables greater levels of efficiency to be
achieved and integrates with IP and packet data concepts.
4. Use of link adaptation: By adapting the link it is possible to maximize the channel usage.
5. Fast Node B scheduling: The use of fast scheduling with adaptive coding and modulation (only
downlink) enables the system to respond to the varying radio channel and interference conditions and
to accommodate data traffic which tends to be "bursty" in nature.
6. Node B based Hybrid ARQ: This enables 3G HSPA to provide reduced retransmission
round trip times and it adds robustness to the system by allowing soft combining of
retransmissions.
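The modulation step-up in benefit 1 can be checked numerically: the number of raw bits carried per symbol is log2 of the constellation size, so moving from QPSK (4 points) to 16QAM (16 points) doubles the bits per symbol.

```python
import math

def bits_per_symbol(constellation_points: int) -> int:
    """Raw bits carried by one symbol of an M-point constellation."""
    return int(math.log2(constellation_points))

qpsk = bits_per_symbol(4)     # QPSK, used in the HSPA uplink
qam16 = bits_per_symbol(16)   # 16QAM, used in the HSDPA downlink
assert qam16 == 2 * qpsk
```

This doubling, combined with shorter TTIs and fast scheduling, is what lifts the downlink raw peak rate to 14 Mbps.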
For the network operator, the introduction of 3G HSPA technology brings a cost reduction
per bit carried as well as an increase in system capacity. With the increase in data traffic, and
operators looking to bring in increased revenue from data transmission, this is a particularly
attractive proposition. A further advantage of the introduction of 3G HSPA is that it can
often be rolled out by incorporating a software update into the system. This means its use
brings significant benefits to user and operator alike.
The two technologies were released at different times through 3GPP. They also have
different properties resulting from the different modes of operation that are required. In view
of these facts they were often treated as almost separate entities. Now they are generally
rolled out together. The two technologies are summarised below:
HSDPA - High Speed Downlink Packet Access: HSDPA provides packet data
support, reduced delays, and a peak raw data rate (i.e. over the air) of 14 Mbps. It also
provides around three times the capacity of the 3G UMTS technology defined in
Release 99 of the 3GPP UMTS standard.
HSUPA - High Speed Uplink Packet Access: HSUPA provides improved uplink
packet support, reduced delays and a peak raw data rate of 5.74 Mbps. This results
in a capacity increase of around twice that provided by the Release 99 services.
Q.8 a) What are the three main CDMA2000 standards? Explain all three.
Ans:
1. 1X:
CDMA2000 1X (IS-2000), also known as 1x and 1xRTT, is the core CDMA2000 wireless air
interface standard. The designation "1x", meaning 1 times radio transmission technology, indicates
the same radio frequency (RF) bandwidth as IS-95: a duplex pair of 1.25 MHz radio channels.
1xRTT almost doubles the capacity of IS-95 by adding 64 more traffic channels to the forward link,
orthogonal to (in quadrature with) the original set of 64. The 1X standard supports packet data
speeds of up to 153 kbit/s with real world data transmission averaging 80–100 kbit/s in most
commercial applications.[3] IMT-2000 also made changes to the data link layer for greater use of
data services, including medium and link access control protocols and QoS. The IS-95 data link
layer only provided "best efforts delivery" for data and circuit switched channel for voice (i.e., a
voice frame once every 20 ms).
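The orthogonal channels mentioned above come from Walsh codes, which can be built by the Sylvester/Hadamard construction; the sketch below shows the idea at a small size of 8 rather than the 64/128 used in 1xRTT:

```python
def hadamard(n: int):
    """Sylvester construction of an n x n +/-1 Hadamard matrix (n a power of two)."""
    h = [[1]]
    while len(h) < n:
        # Each doubling step: [[H, H], [H, -H]].
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

w = hadamard(8)

# Any two distinct rows are orthogonal: their dot product is zero, which is
# what lets channels spread with different rows share the same RF bandwidth.
assert all(
    sum(a * b for a, b in zip(w[i], w[j])) == 0
    for i in range(8) for j in range(8) if i != j
)
```

Doubling the matrix size from 64 to 128 yields a second set of 64 rows orthogonal to the first, which is how 1xRTT adds its extra forward-link traffic channels in the same 1.25 MHz bandwidth.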
2. 1xEV-DO
CDMA2000 1xEV-DO (Evolution-Data Optimized), often abbreviated as EV-DO or EV, is a
telecommunications standard for the wireless transmission of data through radio signals, typically
for broadband Internet access. It uses multiplexing techniques including code division multiple
access (CDMA) as well as time-division access to maximize both individual user's throughput and
the overall system throughput. It is standardized (IS-856) by 3rd Generation Partnership Project 2
(3GPP2) as part of the CDMA2000 family of standards and has been adopted by many mobile
phone service providers around the world – particularly those previously employing CDMA
networks.
3. 1X Advanced
1X Advanced (Rev. E)[4][5] is the evolution of CDMA2000 1X. It provides up to four times the
capacity and 70% more coverage compared to 1X.
CDMA2000-3x
CDMA2000-3x (or CDMA 3G-3xRTT) uses 5 MHz of bandwidth, and it is therefore classified
together with UMTS in the Wideband CDMA (W-CDMA) family of radio transmission
technologies. It delivers peak bit rates of up to 144 Kbps for mobile applications and as much as 2
Mbps for stationary applications. CDMA2000-3x will also introduce higher bit rates for data
transmission, more sophisticated QoS and policy mechanisms, and advanced multimedia
capabilities. It will rely on the ATM-based data link layer between the base stations and MSCs to
accommodate the higher speeds and advanced call model. Table 3.2 shows a comparison between
the CDMA technologies, including the UMTS W-CDMA technology, which is described in the
following section.
The radio access network (E-UTRAN) has just one component, the evolved base station, called
eNodeB or eNB. Each eNB is a base station that controls the mobiles in one or more cells. The base
station that is communicating with a mobile is known as its serving eNB.
LTE Mobile communicates with just one base station and one cell at a time and there are
following two main functions supported by eNB:
The eNB sends and receives radio transmissions to and from all its mobiles using the analogue
and digital signal processing functions of the LTE air interface.
The eNB controls the low-level operation of all its mobiles, by sending them signalling
messages such as handover commands.
Each eNB connects with the EPC by means of the S1 interface and it can also be connected to nearby
base stations by the X2 interface, which is mainly used for signaling and packet forwarding during
handover.
A home eNB (HeNB) is a base station that has been purchased by a user to provide femtocell
coverage within the home. A home eNB belongs to a closed subscriber group (CSG) and can only be
accessed by mobiles with a USIM that also belongs to the closed subscriber group.
The Evolved Packet Core (EPC) (The core network)
The architecture of the Evolved Packet Core (EPC) is illustrated below. A few more components
have been left out of the diagram to keep it simple: the Earthquake and Tsunami Warning System
(ETWS), the Equipment Identity Register (EIR) and the Policy Control and Charging Rules
Function (PCRF).
Below is a brief description of each of the components shown in the above architecture:
The Home Subscriber Server (HSS) component has been carried forward from UMTS and
GSM and is a central database that contains information about all the network operator's
subscribers.
The Packet Data Network (PDN) Gateway (P-GW) communicates with the outside world, i.e.
packet data networks (PDN), using the SGi interface. Each packet data network is identified by an
access point name (APN). The PDN gateway has the same role as the GPRS support node
(GGSN) and the serving GPRS support node (SGSN) with UMTS and GSM.
The serving gateway (S-GW) acts as a router, and forwards data between the base station
and the PDN gateway.
The mobility management entity (MME) controls the high-level operation of the mobile by
means of signalling messages and Home Subscriber Server (HSS).
The Policy Control and Charging Rules Function (PCRF) is a component which is not
shown in the above diagram but it is responsible for policy
control decision-making, as well as for controlling the flow-based charging functionalities in
the Policy Control Enforcement Function (PCEF), which resides in the P-GW.
The interface between the serving and PDN gateways is known as S5/S8. This has two slightly
different implementations, namely S5 if the two devices are in the same network, and S8 if they are
in different networks.