
CP8210

ASSIGNMENT 2

Assignment 2. Due date: Oct 18 (15%)


PART 1 - Data Mining
Data mining refers to analyzing massive amounts of data to find patterns, trends, and
relationships, and to build models used in business decision making, prediction, simulation,
and so on. Two approaches frequently used in data mining are clustering and classification.
a) Using two examples of classification and clustering methods, explain the differences
between the two approaches and when each of them should be used.
b) One common clustering method is k-Means. The general k-Means algorithm shown below is
from Chapter 28 of the database book by Elmasri (detailed in the reference).

First, use the above algorithm to cluster the data in the following table with a value of 3
for k. Choose a common similarity metric (a distance between records) and explain it. You can
assume that the records with RIDs 1, 3, and 5 are used as the initial cluster centroids (means).
Follow the algorithm and calculate the centroids so that the resulting clusters are optimal.
RID   Dimension 1   Dimension 2
1     8             4
2     5             4
3     2             4
4     2             6
5     2             8
6     8             6
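
As an illustration of the procedure described above, here is a minimal Java sketch of the usual k-Means loop (assign each record to its nearest centroid, recompute the means, repeat until no assignment changes). It is not the book's listing and may differ from it in detail. The class name KMeansSketch, the method names, the use of squared Euclidean distance, and the rule that a tie goes to the lowest-numbered cluster are all assumptions made for this sketch. Records 2 and 4 are equidistant from two initial centroids, so a different tie rule can give a different clustering.

```java
import java.util.Arrays;

public class KMeansSketch {

    // Squared Euclidean distance. The square root is omitted because it
    // does not change which centroid is nearest.
    public static double squaredDistance(double[] p, double[] q) {
        double sum = 0;
        for (int d = 0; d < p.length; d++) {
            double diff = p[d] - q[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the cluster index of each point. The centroids array is
    // updated in place and holds the final means on return.
    public static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: nearest centroid wins; ties go to the lowest index (assumption).
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (squaredDistance(points[p], centroids[c])
                            < squaredDistance(points[p], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no record moved to another cluster
            }
            // Update step: each centroid becomes the mean of its members.
            for (int c = 0; c < centroids.length; c++) {
                double[] sum = new double[points[0].length];
                int count = 0;
                for (int p = 0; p < points.length; p++) {
                    if (assignment[p] == c) {
                        for (int d = 0; d < sum.length; d++) {
                            sum[d] += points[p][d];
                        }
                        count++;
                    }
                }
                if (count > 0) { // an empty cluster keeps its old centroid
                    for (int d = 0; d < sum.length; d++) {
                        centroids[c][d] = sum[d] / count;
                    }
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Records 1..6 from the table, as (Dimension 1, Dimension 2).
        double[][] points = {{8, 4}, {5, 4}, {2, 4}, {2, 6}, {2, 8}, {8, 6}};
        // Initial centroids: copies of the records with RIDs 1, 3, and 5.
        double[][] centroids = {{8, 4}, {2, 4}, {2, 8}};
        int[] assignment = cluster(points, centroids, 100);
        System.out.println(Arrays.toString(assignment)); // [0, 0, 1, 1, 2, 0]
        System.out.println(Arrays.deepToString(centroids));
    }
}
```

Under these assumptions the loop stops after the second pass with RIDs {1, 2, 6}, {3, 4}, and {5} in the three clusters. A written answer should still show the distance calculations for each pass and state how ties were resolved.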

Reference Part 1: Fundamentals of Database Systems, 7th (or 6th) edition, by Elmasri and
Navathe, Pearson, Chapter 28, Data Mining Concepts.

PART 2 - Simulating Distributed Computing using MapReduce


Implement matrix multiplication by simulating MapReduce, using the Java code below. Modify the
code, show the output of the mapper and the reducer, and explain your model in detail. Assume
you have computing nodes working in parallel, and show the number of mappers and reducers in
your model. You can use the MapReduce models explained in Sections 2.3.9 and 2.3.10 of the
book Mining of Massive Datasets by Leskovec et al.

Below is the Java code to multiply two matrices A and B, where n is the number of columns
in A (and the number of rows in B). Each element of the product C is:
C[i][j] = A[i][0] * B[0][j] + A[i][1] * B[1][j] + A[i][2] * B[2][j] + ... + A[i][n-1] * B[n-1][j]

import java.util.Scanner;

public class MatrixMultiplication {
    public static void main(String[] args) {
        Scanner s = new Scanner(System.in);

        // Read the dimensions; the columns of A must match the rows of B.
        System.out.print("Enter number of rows in A: ");
        int rowsInA = s.nextInt();
        System.out.print("Enter number of columns in A / rows in B: ");
        int columnsInA = s.nextInt();
        System.out.print("Enter number of columns in B: ");
        int columnsInB = s.nextInt();
        int[][] a = new int[rowsInA][columnsInA];
        int[][] b = new int[columnsInA][columnsInB];

        // Read A and then B, row by row.
        System.out.println("Enter matrix A");
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[0].length; j++) {
                a[i][j] = s.nextInt();
            }
        }
        System.out.println("Enter matrix B");
        for (int i = 0; i < b.length; i++) {
            for (int j = 0; j < b[0].length; j++) {
                b[i][j] = s.nextInt();
            }
        }

        int[][] c = multiply(a, b);

        // Print the product, one row per line.
        System.out.println("Product of A and B is");
        for (int i = 0; i < c.length; i++) {
            for (int j = 0; j < c[0].length; j++) {
                System.out.print(c[i][j] + " ");
            }
            System.out.println();
        }
    }

    // Sequential triple loop: c[i][j] is the dot product of row i of a and column j of b.
    public static int[][] multiply(int[][] a, int[][] b) {
        int rowsInA = a.length;
        int columnsInA = a[0].length; // same as rows in B
        int columnsInB = b[0].length;
        int[][] c = new int[rowsInA][columnsInB];
        for (int i = 0; i < rowsInA; i++) {
            for (int j = 0; j < columnsInB; j++) {
                for (int k = 0; k < columnsInA; k++) {
                    c[i][j] = c[i][j] + a[i][k] * b[k][j];
                }
            }
        }
        return c;
    }
}
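
As a starting point for the modification requested above, here is a minimal single-process sketch of how the sequential multiply could be split into a map phase, a grouping (shuffle) step, and a reduce phase. It loosely follows the one-step formulation: every input element is sent to each output cell it contributes to, and one reduce call per output cell sums the matching products. The class name MapReduceMatrixSketch, the helper class Tagged, the methods emit, mapAndGroup, and reduce, and the integer key encoding are all hypothetical names chosen for this sketch. It runs on one thread and does not model how work is distributed over nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceMatrixSketch {

    // One intermediate value: source matrix ('A' or 'B'), shared index j, element value.
    static final class Tagged {
        final char matrix;
        final int j;
        final int value;

        Tagged(char matrix, int j, int value) {
            this.matrix = matrix;
            this.j = j;
            this.value = value;
        }
    }

    // Collects a (key, value) pair; grouping by key stands in for the shuffle.
    static void emit(Map<Integer, List<Tagged>> groups, int key, Tagged value) {
        groups.computeIfAbsent(key, unused -> new ArrayList<>()).add(value);
    }

    // Map phase. Output cell (i, k) is encoded as the key i * columnsInB + k.
    static Map<Integer, List<Tagged>> mapAndGroup(int[][] a, int[][] b) {
        int columnsInB = b[0].length;
        Map<Integer, List<Tagged>> groups = new TreeMap<>();
        // a[i][j] contributes to every cell in row i of the product.
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                for (int k = 0; k < columnsInB; k++)
                    emit(groups, i * columnsInB + k, new Tagged('A', j, a[i][j]));
        // b[j][k] contributes to every cell in column k of the product.
        for (int j = 0; j < b.length; j++)
            for (int k = 0; k < columnsInB; k++)
                for (int i = 0; i < a.length; i++)
                    emit(groups, i * columnsInB + k, new Tagged('B', j, b[j][k]));
        return groups;
    }

    // Reduce phase for one key: pair A and B values that share j, multiply, and sum.
    static int reduce(List<Tagged> values, int shared) {
        int[] fromA = new int[shared];
        int[] fromB = new int[shared];
        for (Tagged t : values) {
            if (t.matrix == 'A') {
                fromA[t.j] = t.value;
            } else {
                fromB[t.j] = t.value;
            }
        }
        int sum = 0;
        for (int j = 0; j < shared; j++) {
            sum += fromA[j] * fromB[j];
        }
        return sum;
    }

    public static int[][] multiply(int[][] a, int[][] b) {
        int columnsInB = b[0].length;
        int[][] c = new int[a.length][columnsInB];
        for (Map.Entry<Integer, List<Tagged>> entry : mapAndGroup(a, b).entrySet()) {
            int i = entry.getKey() / columnsInB; // decode the output row
            int k = entry.getKey() % columnsInB; // decode the output column
            c[i][k] = reduce(entry.getValue(), b.length);
        }
        return c;
    }

    public static void main(String[] args) {
        int[][] c = multiply(new int[][] {{1, 2}, {3, 4}}, new int[][] {{5, 6}, {7, 8}});
        System.out.println(Arrays.deepToString(c)); // [[19, 22], [43, 50]]
    }
}
```

In this sketch there is conceptually one map call per input element and one reduce call per output cell. Printing the contents of the grouped map before reduction would show the mapper output, and assigning keys to a fixed number of simulated nodes is one way to make the mapper and reducer counts explicit.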

Bonus Mark: There will be up to 5% bonus marks if you implement Hadoop as the underlying
platform for the implemented MapReduce model.
