
UNIT-1

1. Machine learning is
a) The autonomous acquisition of knowledge through the use of computer
programs
b) The autonomous acquisition of knowledge through the use of manual programs
c) The selective acquisition of knowledge through the use of computer programs
d) The selective acquisition of knowledge through the use of manual programs
2. Factors which affect the performance of a learner system do not include
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
3. Different learning methods do not include
a) Memorization
b) Analogy
c) Deduction
d) Introduction
4. In language understanding, the levels of knowledge do not include
a) Phonological
b) Syntactic
c) Empirical
d) Logical
5. A model of language consists of categories; these do not include
a) Language units
b) Role structure of units
c) System constraints
d) Structural units
6. What is a top-down parser?
a) Begins by hypothesizing a sentence (the symbol S) and successively
predicting lower level constituents until individual preterminal symbols are
written
b) Begins by hypothesizing a sentence (the symbol S) and successively predicting
upper level constituents until individual preterminal symbols are written
c) Begins by hypothesizing lower level constituents and successively predicting a
sentence (the symbol S)
d) Begins by hypothesizing upper level constituents and successively predicting a
sentence (the symbol S)
7. Among the following which is not a horn clause?
a) p
b) ¬p ∨ q
c) p → q
d) p → ¬q
8. The action ‘STACK(A, B)’ of a robot arm instructs it to
a) Place block B on Block A
b) Place blocks A, B on the table in that order
c) Place blocks B, A on the table in that order
d) Place block A on block B
9. A _________ is a decision support tool that uses a tree-like graph or model of
decisions and their possible consequences, including chance event outcomes,
resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks
10. Decision Tree is a display of an algorithm.
a) True
b) False
11. Decision Tree is
a) Flow-Chart
b) Structure in which internal node represents test on an attribute, each branch represents
outcome of test and each leaf node represents class label
c) Flow-Chart & Structure in which internal node represents test on an attribute, each
branch represents outcome of test and each leaf node represents class label
d) None of the mentioned
12. Decision Trees can be used for Classification Tasks.
a) True
b) False
13. Choose from the following that are Decision Tree nodes
a) Decision Nodes
b) End Nodes
c) Chance Nodes
d) All of the mentioned
14. Decision Nodes are represented by ____________
a) Disks
b) Squares
c) Circles
d) Triangles
15. Chance Nodes are represented by __________
a) Disks
b) Squares
c) Circles
d) Triangles
16. End Nodes are represented by __________
a) Disks
b) Squares
c) Circles
d) Triangles
17. Which of the following are advantages of Decision Trees? Choose all that apply.
a) Possible Scenarios can be added
b) Use a white box model: if a given result is provided by a model, the explanation for it is easily replicated
c) Worst, best and expected values can be determined for different scenarios
d) All of the mentioned
18. A perceptron is:
a) a single layer feed-forward neural network with pre-processing
b) an auto-associative neural network
c) a double layer auto-associative neural network
d) a neural network that contains feedback
19. An auto-associative network is:
a) a neural network that contains no loops
b) a neural network that contains feedback
c) a neural network that has only one loop
d) a single layer feed-forward neural network with pre-processing
20. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the
constant of proportionality being equal to 2. The inputs are 4, 10, 5 and 20
respectively. The output will be:
a) 238
b) 76
c) 119
d) 123
Explanation: The output is found by multiplying the weights by their respective
inputs, summing the results and multiplying by the constant of proportionality of the transfer function. Therefore:
Output = 2 * (1*4 + 2*10 + 3*5 + 4*20) = 238.
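The arithmetic in the explanation can be checked with a few lines of Python. This is a minimal sketch, assuming a linear transfer function f(s) = k * s; the function name linear_neuron is hypothetical.

```python
def linear_neuron(inputs, weights, k):
    """Weighted sum of the inputs passed through a linear transfer f(s) = k * s."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return k * weighted_sum


print(linear_neuron([4, 10, 5, 20], [1, 2, 3, 4], 2))  # 238
```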
21. What are the advantages of neural networks over conventional computers?
(i) They have the ability to learn by example
(ii) They are more fault tolerant
(iii) They are more suited for real time operation due to their high ‘computational’ rates
a) (i) and (ii) are true
b) (i) and (iii) are true
c) Only (i)
d) All of the mentioned
22. What is back propagation?
a) It is another name given to the curvy function in the perceptron
b) It is the transmission of error back through the network to adjust the inputs
c) It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn
d) None of the mentioned
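Option (c) describes the idea. The sketch below assumes a single sigmoid neuron trained with squared error; a real multilayer network repeats the same chain-rule step layer by layer. The names train_step and sigmoid are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(weights, bias, x, target, lr=0.5):
    """One gradient-descent step for a single sigmoid neuron with squared error.

    Forward pass:  y = sigmoid(w . x + b), loss = 0.5 * (y - target) ** 2
    Backward pass: the output error (y - target) is sent back through the
    sigmoid with the chain rule to get the gradient for each weight.
    Returns the updated weights, the updated bias and the loss before the update.
    """
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    y = sigmoid(z)
    delta = (y - target) * y * (1.0 - y)  # dLoss/dz, using sigmoid'(z) = y * (1 - y)
    new_weights = [w - lr * delta * xi for w, xi in zip(weights, x)]
    new_bias = bias - lr * delta
    return new_weights, new_bias, 0.5 * (y - target) ** 2


weights, bias = [0.1, -0.2], 0.0
for _ in range(200):
    weights, bias, loss = train_step(weights, bias, [1.0, 2.0], 1.0)
```

The weights, not the inputs, are what change: that is the difference between options (b) and (c).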
23. Which of the following is not the promise of artificial neural network?
a) It can explain result
b) It can survive the failure of some nodes
c) It has inherent parallelism
d) It can handle noise
24. Neural Networks are complex ______________ with many parameters.
a) Linear Functions
b) Nonlinear Functions
c) Discrete Functions
d) Exponential Functions
25. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain
value, it outputs a 1, otherwise it just outputs a 0.
a) True
b) False
c) Sometimes – it can also output intermediate values as well
d) Can’t say
26. The name for the function in question 25 is
a) Step function
b) Heaviside function
c) Logistic function
d) Perceptron function
27. The network that involves backward links from output to the input and hidden layers is
called as ____
a) Self organizing maps
b) Perceptrons
c) Recurrent neural network
d) Multi layered perceptron
28. Which of the following is an application of NN (Neural Network)?
a) Sales forecasting
b) Data validation
c) Risk management
d) All of the mentioned
29. The process by which you become aware of messages through your senses is called
a) Organization
b) Sensation
c) Interpretation-Evaluation
d) Perception
30. Susan is so beautiful; I bet she is smart too. This is an example of
a) The halo effect
b) The primacy effect
c) A self-fulfilling prophecy
d) The recency effect
31. _____ prevents you from seeing an individual as an individual rather than as a member of a
group.
a) Cultural mores
b) Stereotypes
c) Schemata
d) Attributions
32. Mindless processing is
a) careful, critical thinking
b) inaccurate and faulty processing
c) information processing that relies heavily on familiar schemata
d) processing that focuses on unusual or novel events
33. Selective retention occurs when
a) we process, store, and retrieve information that we have already selected,
organized, and interpreted
b) we make choices to experience particular stimuli
c) we make choices to avoid particular stimuli
d) we focus on specific stimuli while ignoring other stimuli
34. Which of the following strategies would NOT be effective at improving your
communication competence?
a) Recognize that people, objects, and situations remain stable over time
b) Recognize that each person’s frame of perception is unique
c) Be active in perceiving
d) Distinguish facts from inference
35. A perception check is
a) a cognitive bias that makes us listen only to information we already agree with
b) a method teachers use to reward good listeners in the classroom
c) any factor that gets in the way of good listening and decreases our ability to
interpret correctly
d) a response that allows you to state your interpretation and ask your partner
whether or not that interpretation is correct
36. The process of forming general concept definitions from examples of concepts
to be learned.
A.Deduction B.abduction C.induction D.conjunction
37. Computers are best at learning
A.facts. B.concepts. C.procedures. D.principles.
38. Data used to build a data mining model.
A.validation data B.training data C.test data D.hidden data
39. Supervised learning and unsupervised clustering both require at least one
A.hidden attribute. B.output attribute. C.input attribute. D.categorical attribute.
40. Supervised learning differs from unsupervised clustering in that supervised
learning requires
A.at least one input attribute. B.input attributes to be categorical.
C.at least one output attribute. D.output attributes to be categorical.

41. Which of the following is a common use of unsupervised clustering?
A.detect outliers B.determine a best set of input attributes for supervised
learning
C.evaluate the likely performance of a supervised learner model
D.determine if meaningful relationships can be found in a dataset
E.All of a,b,c, and d are common uses of unsupervised clustering.

UNIT-2
1. Classification problems are distinguished from estimation problems in that
A.classification problems require the output attribute to be numeric.
B.classification problems require the output attribute to be categorical.
C.classification problems do not allow an output attribute.
D.classification problems are designed to predict future outcome.
2. Which statement is true about prediction problems?
A.The output attribute must be categorical.
B.The output attribute must be numeric.
C.The resultant model is designed to determine future outcomes.
D.The resultant model is designed to classify current behavior.
3. Which statement about outliers is true?
A.Outliers should be identified and removed from a dataset.
B.Outliers should be part of the training dataset but should not be present in the
test data.
C.Outliers should be part of the test dataset but should not be present in the
training data.
D.The nature of the problem determines how outliers are used.
E.More than one of a,b,c or d is true
4. The average positive difference between computed and desired outcome
values.
A.root mean squared error B.mean squared error
C.mean absolute error D.mean positive error
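The error measures named in this question (and in question 10 below) can be sketched as follows, assuming predictions and actual values are equal-length lists of numbers; the function names are hypothetical.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average positive (absolute) difference between computed and desired values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mean_squared_error(predicted, actual):
    """Average squared difference between predicted and actual output."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def root_mean_squared_error(predicted, actual):
    return math.sqrt(mean_squared_error(predicted, actual))


predicted = [3.0, 5.0, 2.0]
actual = [2.0, 5.0, 4.0]
print(mean_absolute_error(predicted, actual))  # 1.0
```

Squaring makes MSE and RMSE penalize large errors more heavily than MAE does.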
5. Selecting data so as to assure that each class is properly represented in
both the training and test set.
A.cross validation B.stratification C.verification D.bootstrapping
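Stratification can be sketched as grouping instance indices by class and splitting each group with the same test fraction. This assumes the class labels are given as a list; stratified_split and its parameters are hypothetical names.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_indices, test_indices) with each class represented in both
    sets in roughly the same proportion as in the full dataset."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))  # at least one test instance per class
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["yes"] * 10 + ["no"] * 5
train, test = stratified_split(labels)
```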
6. The standard error is defined as the square root of this computation.
A.The sample variance divided by the total number of sample instances.
B.The population variance divided by the total number of sample instances.
C.The sample variance divided by the sample mean.
D.The population variance divided by the sample mean.
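Option A translates directly into code. A minimal sketch using Python's standard statistics module, whose variance function computes the sample variance (dividing by n - 1); standard_error is a hypothetical name.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sqrt(sample variance / number of instances)."""
    return math.sqrt(statistics.variance(sample) / len(sample))


print(standard_error([2, 4, 4, 4, 5, 5, 7, 9]))
```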
7. Data used to optimize the parameter settings of a supervised learner
model.
A.training B.test C.verification D.validation
8. Bootstrapping allows us to
A.choose the same training instance several times.
B.choose the same test set instance several times.
C.build models with alternative subsets of the training data several times.
D.test a model with alternative subsets of the test data several times.
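The key property of bootstrapping is sampling with replacement, which is what lets the same training instance be chosen several times. A sketch with hypothetical function names, assuming the data fit in a list:

```python
import random
import statistics


def bootstrap_sample(data, rng):
    """Draw len(data) instances with replacement: the same instance can be
    chosen several times, while others are left out of this sample."""
    return [rng.choice(data) for _ in data]


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Mean of each bootstrap resample; their spread estimates sampling variability."""
    rng = random.Random(seed)
    return [statistics.mean(bootstrap_sample(data, rng)) for _ in range(n_resamples)]


means = bootstrap_means([2, 4, 4, 4, 5, 5, 7, 9], n_resamples=500)
print(statistics.stdev(means))  # bootstrap estimate of the standard error of the mean
```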
9. The correlation coefficient for two real-valued attributes is –0.85. What
does this value tell you?
A.The attributes are not linearly related.
B.As the value of one attribute increases the value of the second attribute also
increases.
C.As the value of one attribute decreases the value of the second attribute
increases.
D.The attributes show a curvilinear relationship.
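A negative coefficient such as -0.85 means one attribute tends to decrease as the other increases. The Pearson correlation coefficient can be sketched as below; the data values are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 2]  # ys fall as xs rise
print(pearson_r(xs, ys))  # strongly negative
```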
10. The average squared difference between classifier predicted output and
actual output.
A.mean squared error B.root mean squared error
C.mean absolute error D.mean relative error
11. With Bayes classifier, missing data items are
A.treated as equal compares. B.treated as unequal compares.
C.replaced with a default value. D.ignored.
12. A statement about a population developed for the purpose of testing is called:
(a) Hypothesis (b) Hypothesis testing
(c) Level of significance (d) Test-statistic
The hypothesis is the supposition that we want to test.
13. Any hypothesis which is tested for the purpose of rejection under the
assumption that it is true is called:
(a) Null hypothesis (b) Alternative hypothesis
(c) Statistical hypothesis (d) Composite hypothesis
The null hypothesis serves as a counterweight in order to prove the alternative
hypothesis.
14. A statement about the value of a population parameter is called:
(a) Null hypothesis (b) Alternative hypothesis
(c) Simple hypothesis (d) Composite hypothesis
In the null hypothesis we do not have all the parameters so we try to
approximate it.
15. Any statement whose validity is tested on the basis of a sample is called:
(a) Null hypothesis (b) Alternative hypothesis
(c) Statistical hypothesis (d) Simple hypothesis
In the statistical hypothesis we receive most of the parameters, so we can
test a sample within those parameters.
16. A quantitative statement about a population is called:
(a) Research hypothesis (b) Composite hypothesis
(c) Simple hypothesis (d) Statistical hypothesis
A statistical hypothesis is an assumption about a population parameter.
17. A statement that is accepted if the sample data provide sufficient evidence that
the null hypothesis is false is called:
(a) Simple hypothesis (b) Composite hypothesis
(c) Statistical hypothesis (d) Alternative hypothesis
The alternative hypothesis is the one that we want to prove.
18. A hypothesis that specifies all the values of parameter is called:
(a) Simple hypothesis (b) Composite hypothesis (c) Statistical hypothesis
(d) None of the above
19. A hypothesis may be classified as:
(a) Simple (b) Composite (c) Null (d) All of the above
The simple and the composite are types of hypothesis based on the information used
in the statement.
20. The probability of rejecting the null hypothesis when it is true is called:
(a) Level of confidence (b) Level of significance (c) Power of the test
(d) Difficult to tell
The level of confidence is used to calculate the critical value.
21. The dividing point between the region where the null hypothesis is rejected and
the region where it is not rejected is said to be:
(a) Critical region (b) Critical value (c) Acceptance region
(d) Significant region
The critical value defines the regions of acceptance and rejection.
22. If the critical region is located equally in both sides of the sampling distribution
of the test-statistic, the test is called:
(a) One tailed (b) Two tailed (c) Right tailed
(d) Left tailed
We use a two-tailed test when our null hypothesis states an equality.
23. A rule or formula that provides a basis for testing a null hypothesis is called:
(a) Test-statistic (b) Population statistic (c) Both of these
(d) None of the above
24. Critical region is also called:
(a) Acceptance region (b) Rejection region (c) Confidence region (d) Statistical region
The rejection region goes from the critical value to infinity.
25. The probability of rejecting Ho when it is false is called:
(a) Power of the test (b) Size of the test
(c) Level of confidence (d) Confidence coefficient
The power of a test is also called statistical power, and it refers to the probability that the test correctly
rejects the null hypothesis when it is false.
26. Suppose you are given an EM algorithm that finds maximum likelihood estimates for a
model with latent variables. You are asked to modify the algorithm so that it finds MAP estimates
instead. Which step or steps do you need to modify:
A. Expectation B. Maximization C. No modification necessary D. Both
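Under the standard EM formulation (observed data X, latent variables Z, parameters θ, and, for the MAP case, an assumed prior p(θ)), the two variants can be written side by side. The E-step posterior over Z depends only on the current estimate θ^(t), not on the prior, so only the M-step objective gains a log-prior term.

```latex
% E-step (same for ML and MAP): expected complete-data log-likelihood
Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \sim p(Z \mid X, \theta^{(t)})}\left[\log p(X, Z \mid \theta)\right]

% M-step for maximum likelihood
\theta^{(t+1)} = \arg\max_{\theta} \; Q(\theta \mid \theta^{(t)})

% M-step for MAP: add the log-prior of the parameters
\theta^{(t+1)} = \arg\max_{\theta} \; \left[ Q(\theta \mid \theta^{(t)}) + \log p(\theta) \right]
```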
UNIT-3

A regression model in which more than one independent variable is used to predict
the dependent variable is called
A.a simple linear regression model B.a multiple regression models
C.an independent model D.none of the above
A term used to describe the case when the independent variables in a multiple
regression model are correlated is
A.regression B.correlation C.multicollinearity D.none of the above
A multiple regression model has the form: y = 2 + 3x1 + 4x2. As x1 increases by 1 unit
(holding x2 constant), y will
A.increase by 3 units B.decrease by 3 units
C.increase by 4 units D.decrease by 4 units
A multiple regression model has
A.only one independent variable B.more than one dependent variable
C.more than one independent variable D.none of the above

A measure of goodness of fit for the estimated regression equation is the
A.multiple coefficient of determination B.mean square due to error
C.mean square due to regression D.none of the above
The adjusted multiple coefficient of determination accounts for
A.the number of dependent variables in the model
B.the number of independent variables in the model
C.unusually large predictors D.none of the above
The multiple coefficient of determination is computed by
A.dividing SSR by SST B.dividing SST by SSR
C.dividing SST by SSE D.none
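The multiple coefficient of determination (SSR / SST) and its adjusted form can be sketched as follows. This assumes predictions come from a least-squares fit with an intercept, where SSR + SSE = SST; the function names are hypothetical.

```python
def r_squared(actual, predicted):
    """Multiple coefficient of determination: R^2 = SSR / SST.

    For a least-squares fit with an intercept, SSR + SSE = SST, so this equals 1 - SSE / SST.
    """
    mean_y = sum(actual) / len(actual)
    sst = sum((y - mean_y) ** 2 for y in actual)  # total sum of squares
    ssr = sum((p - mean_y) ** 2 for p in predicted)  # sum of squares due to regression
    return ssr / sst


def adjusted_r_squared(r2, n, p):
    """Adjust R^2 for p independent variables, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


print(adjusted_r_squared(0.9, n=20, p=3))  # about 0.881
```

Adding an independent variable never lowers R² for a least-squares fit, but it can lower the adjusted value, which is why the adjusted form accounts for the number of independent variables.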
A nearest neighbor approach is best used
A.with large-sized datasets. B.when irrelevant attributes have been removed from the
data.
C.when a generalized model of the data is desirable.
D.when an explanation of what has been found is of primary importance.
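A nearest neighbor classifier can be sketched in a few lines. Every attribute contributes equally to the Euclidean distance, which is why irrelevant attributes distort the result. This sketch assumes numeric tuples and mutually comparable labels (used to break distance ties); knn_predict is a hypothetical name.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    No model is built ahead of time: all work happens at prediction time.
    """
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5)))  # a
```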
Another name for an output attribute.
A.predictive variable B.independent variable
C.estimated variable D.dependent variable
Which statement is true about neural network and linear regression models?
A.Both models require input attributes to be numeric.
B.Both models require numeric attributes to range between 0 and 1.
C.The output of both models is a categorical attribute value.
D.Both techniques build models whose output is determined by a linear sum of weighted
input attribute values.
E.More than one of a,b,c or d is true.
Simple regression assumes a __________ relationship between the input attribute
and output attribute.
A.linear B.quadratic C.reciprocal D.inverse
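Simple linear regression fits y = b0 + b1 * x by least squares, using the closed-form estimates b1 = Sxy / Sxx and b0 = mean(y) - b1 * mean(x). A sketch with a hypothetical function name:

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


print(fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```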
Regression trees are often used to model _______ data.
A.linear B.nonlinear C.categorical D.symmetrical
The leaf nodes of a model tree are
A.averages of numeric output attribute values. B.nonlinear regression equations.
C.linear regression equations. D.sums of numeric output attribute values.

Logistic regression is a ________ regression technique that is used to model data
having a _____ outcome.
A.linear, numeric B.linear, binary C.nonlinear, numeric D.nonlinear, binary
This technique associates a conditional probability value with each data instance.
A.linear regression B.logistic regression
C.simple regression D.multiple linear regression
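Logistic regression passes a weighted sum of the inputs through the sigmoid, giving each instance a conditional probability P(y = 1 | x) for a binary outcome. This sketch trains it by batch gradient ascent on the log-likelihood with no regularization; the function names, learning rate and epoch count are assumptions.

```python
import math


def predict_probability(weights, bias, x):
    """P(y = 1 | x) = 1 / (1 + exp(-(w . x + b)))."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Batch gradient ascent on the log-likelihood for 0/1 labels."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            error = yi - predict_probability(weights, bias, xi)
            for j, v in enumerate(xi):
                grad_w[j] += error * v
            grad_b += error
        weights = [w + lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias += lr * grad_b / len(X)
    return weights, bias


X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic_regression(X, y)
```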
This supervised learning technique can process both numeric and categorical input
attributes.
A.linear regression B.Bayes classifier
C.logistic regression D.backpropagation learning
This clustering algorithm merges and splits nodes to help modify nonoptimal
partitions.
A.agglomerative clustering B.expectation maximization
C.conceptual clustering D.K-Means clustering
This clustering algorithm initially assumes that each data instance represents a single
cluster.
A.agglomerative clustering B.conceptual clustering
C.K-Means clustering D.expectation maximization
This unsupervised clustering algorithm terminates when mean values computed for
the current iteration of the algorithm are identical to the computed mean values for
the previous iteration.
A.agglomerative clustering B.conceptual clustering
C.K-Means clustering D.expectation maximization
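The stopping rule described in the question is the one used in this sketch of K-Means. It assumes points are tuples of numbers and initial centers are drawn at random from the data; k_means and max_iterations are hypothetical names.

```python
import math
import random


def k_means(points, k, seed=0, max_iterations=100):
    """Plain K-Means. Stops when the cluster means computed in this iteration
    are identical to those computed in the previous iteration."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: each center becomes the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = k_means(points, 2)
```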
When a decision tree is grown to full depth, it is more likely to fit the noise in the data.
(True/ false)

When the hypothesis space is richer, overfitting is more likely.
(True/ false)

UNIT-4
1. Machine learning techniques differ from statistical techniques in that machine
learning methods
A.typically assume an underlying distribution for the data.
B.are better able to deal with missing and noisy data.
C.are not able to explain their behavior.
D.have trouble with large-sized datasets
2. We can get multiple local optimum solutions if we solve a linear regression problem by
minimizing the sum of squared errors using gradient descent. (True/ false)
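The mean squared error of a linear regression model is a convex function of its coefficients, so with a small enough learning rate gradient descent reaches the same minimum from different starting points. This sketch assumes one input attribute; the function name, learning rate and step count are assumptions.

```python
def gradient_descent_linear_regression(xs, ys, b0=0.0, b1=0.0, lr=0.05, steps=5000):
    """Minimize the mean of squared errors for y = b0 + b1 * x by gradient descent."""
    n = len(xs)
    for _ in range(steps):
        residuals = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = 2 * sum(residuals) / n
        grad_b1 = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1


# Two very different starting points end at the same coefficients.
start_a = gradient_descent_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
start_b = gradient_descent_linear_regression([1, 2, 3, 4], [3, 5, 7, 9], b0=10.0, b1=-10.0)
```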
3. When the feature space is larger, overfitting is more likely. (True/ false)
4. We can use gradient descent to learn a Gaussian Mixture Model. (True/ false)
5. As the number of training examples goes to infinity, your model trained on that data
will have:
A. Lower variance B. Higher variance C. Same variance
6. As the number of training examples goes to infinity, your model trained on that data
will have:
A. Lower bias B. Higher bias C. Same bias
UNIT-5

2-Mark Questions

1) What is Machine learning?

Machine learning is a branch of computer science that deals with programming systems so
that they automatically learn and improve with experience. For example, robots are
programmed so that they can perform tasks based on the data they gather from sensors. In
this sense, machine learning automatically learns programs from data.

2) What is the difference between Data Mining and Machine learning?

Machine learning relates to the study, design and development of algorithms
that give computers the capability to learn without being explicitly
programmed. Data mining, on the other hand, can be defined as the process of
extracting knowledge or unknown interesting patterns from unstructured data. Machine
learning algorithms are used during this process.

3) What is ‘Overfitting’ in Machine learning?

In machine learning, ‘overfitting’ occurs when a statistical model describes random error or
noise instead of the underlying relationship. Overfitting is normally observed when a model is
excessively complex, because it has too many parameters relative to the number of training
examples. A model that has been overfit exhibits poor performance on new data.

4) Why does overfitting happen?

Overfitting is possible because the criterion used for training the model is not the
same as the criterion used to judge the efficacy of a model.

5) How can you avoid overfitting?

Overfitting can be avoided by using a lot of data; it tends to happen when you have a
small dataset and try to learn from it. If you only have a small dataset and are forced
to build a model from it, you can use a technique known as cross validation. In this
method the dataset is split into two sections, a training dataset and a testing dataset:
the data points in the training dataset are used to build the model, while the testing
dataset is used only to test it.

In this technique, a model is usually given a dataset of known data on which training
is run (the training dataset) and a dataset of unknown data against which the model is
tested. The idea of cross validation is to define a dataset to “test” the model during
the training phase.
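A common variant, k-fold cross validation, rotates the testing role across k folds so that every instance is used for testing exactly once. This sketch does not shuffle the indices (in practice they are usually shuffled first); k_fold_indices and the mean-predictor baseline are hypothetical illustrations.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k folds over instances 0..n-1."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validate_mean_baseline(ys, k):
    """Example use: average test MSE of a model that predicts the training mean."""
    scores = []
    for train, test in k_fold_indices(len(ys), k):
        mean = sum(ys[i] for i in train) / len(train)
        scores.append(sum((ys[i] - mean) ** 2 for i in test) / len(test))
    return sum(scores) / len(scores)
```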

6) What is inductive machine learning?

Inductive machine learning involves the process of learning by examples, where
a system tries to induce a general rule from a set of observed instances.

7) What are the five popular algorithms of Machine Learning?

a) Decision Trees

b) Neural Networks (back propagation)

c) Probabilistic networks

d) Nearest Neighbor

e) Support vector machines

8) What are the different Algorithm techniques in Machine Learning?

The different types of techniques in Machine Learning are

a) Supervised Learning

b) Unsupervised Learning

c) Semi-supervised Learning

d) Reinforcement Learning

e) Transduction

f) Learning to Learn

9) What are the three stages to build the hypotheses or model in machine learning?

a) Model building

b) Model testing

c) Applying the model

10) What is the standard approach to supervised learning?

The standard approach to supervised learning is to split the set of examples into the
training set and the test set.

11) What is ‘Training set’ and ‘Test set’?

In various areas of information science, like machine learning, a set of data used to
discover potentially predictive relationships is known as a ‘Training Set’. The training set
is the set of examples given to the learner, while the test set is used to test the accuracy of
the hypotheses generated by the learner; it is the set of examples held back from the
learner. The training set is distinct from the test set.

12) List down various approaches for machine learning?


The different approaches in Machine Learning are

a) Concept Vs Classification Learning

b) Symbolic Vs Statistical Learning

c) Inductive Vs Analytical Learning

13) What is not Machine Learning?

a) Artificial Intelligence

b) Rule based inference

14) Explain what is the function of ‘Unsupervised Learning’?

a) Find clusters of the data

b) Find low-dimensional representations of the data

c) Find interesting directions in data

d) Find interesting coordinates and correlations

e) Find novel observations/ database cleaning

15) Explain what is the function of ‘Supervised Learning’?

a) Classifications

b) Speech recognition

c) Regression

d) Predict time series

e) Annotate strings

16) What is algorithm independent machine learning?

Machine learning in which the mathematical foundations are independent of any particular
classifier or learning algorithm is referred to as algorithm independent machine learning.

17) What is the difference between artificial intelligence and machine learning?

Designing and developing algorithms that learn behaviours from empirical data is
known as Machine Learning. Artificial intelligence, in addition to machine learning,
also covers other aspects like knowledge representation, natural language processing,
planning, robotics etc.

18) What is classifier in machine learning?

A classifier in Machine Learning is a system that inputs a vector of discrete or
continuous feature values and outputs a single discrete value, the class.

19) What are the advantages of Naive Bayes?


A Naïve Bayes classifier will converge more quickly than discriminative models like logistic
regression, so you need less training data. Its main disadvantage is that it can’t learn
interactions between features.
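The independence assumption behind that trade-off is visible in a small categorical Naive Bayes sketch: each attribute contributes its own P(value | class) factor, so interactions between attributes are never modeled. This sketch also ignores missing values (marked None), in line with UNIT-2 question 11, and uses add-one smoothing; the function names and data are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count classes and per-class attribute values; None marks a missing value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    values_seen = defaultdict(set)  # attribute index -> distinct values observed
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value is None:
                continue  # missing items are simply ignored
            value_counts[(label, j)][value] += 1
            values_seen[j].add(value)
    return class_counts, value_counts, values_seen


def predict_naive_bayes(model, row):
    """Pick the class maximizing P(class) * product of P(value | class)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for j, value in enumerate(row):
            if value is None:
                continue  # a missing attribute contributes no likelihood term
            counts = value_counts[(label, j)]
            n_values = len(values_seen[j] | {value})
            # Laplace (add-one) smoothing so an unseen value does not zero the product
            score *= (counts[value] + 1) / (sum(counts.values()) + n_values)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool"), ("sunny", None)]
labels = ["no", "no", "yes", "yes", "no"]
model = train_naive_bayes(rows, labels)
```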

20) In what areas is Pattern Recognition used?

Pattern Recognition can be used in

a) Computer Vision

b) Speech Recognition

c) Data Mining

d) Statistics

e) Information Retrieval

f) Bio-Informatics

21) What is Genetic Programming?

Genetic programming is one of the two techniques used in machine learning. The
model is based on testing and selecting the best choice among a set of results.

22) What is Inductive Logic Programming in Machine Learning?

Inductive Logic Programming (ILP) is a subfield of machine learning which uses
logic programming to represent background knowledge and examples.

23) What is Model Selection in Machine Learning?

The process of selecting models among different mathematical models, which are
used to describe the same data set is known as Model Selection. Model selection is
applied to the fields of statistics, machine learning and data mining.

24) What are the two methods used for the calibration in Supervised Learning?

The two methods used for predicting good probabilities in Supervised Learning are

a) Platt Calibration

b) Isotonic Regression

These methods are designed for binary classification, and extending them to multiclass problems is not trivial.
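Isotonic regression calibration fits a non-decreasing step function from classifier scores to probabilities; the usual fitting method is the pool-adjacent-violators algorithm, sketched below for binary 0/1 labels (Platt calibration would instead fit a sigmoid). The function name is hypothetical. Because each step can follow the training labels closely, this method needs enough data to avoid overfitting.

```python
def isotonic_calibration(scores, labels):
    """Pool-adjacent-violators fit of a non-decreasing map from scores to
    probabilities. Returns (sorted_scores, calibrated_probabilities)."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block is [sum_of_labels, count]
    for _, label in pairs:
        blocks.append([float(label), 1])
        # Merge backwards while block means violate the non-decreasing order.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    calibrated = []
    for total, count in blocks:
        calibrated.extend([total / count] * count)
    return [s for s, _ in pairs], calibrated


print(isotonic_calibration([0.1, 0.2, 0.3, 0.4, 0.5], [0, 1, 0, 1, 1])[1])
# [0.0, 0.5, 0.5, 1.0, 1.0]
```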

25) Which method is frequently used to prevent overfitting?

When there is sufficient data, ‘Isotonic Regression’ is used to prevent an overfitting
issue.

26) What is the difference between heuristic for rule learning and heuristics for decision
trees?

The difference is that the heuristics for decision trees evaluate the average quality of
a number of disjoint sets, while rule learners only evaluate the quality of the set of
instances that is covered by the candidate rule.

27) What is Perceptron in Machine Learning?

In Machine Learning, Perceptron is an algorithm for supervised classification of the
input into one of two possible outputs; that is, it is a binary classifier.
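The perceptron learning rule can be sketched as follows: a step function outputs 1 when the weighted sum exceeds zero, and each misclassified example nudges the weights toward the correct output. This sketch folds the threshold into a bias term; the names, learning rate and epoch limit are assumptions.

```python
def step(z):
    """Heaviside step: output 1 if the weighted sum exceeds 0, else 0."""
    return 1 if z > 0 else 0


def train_perceptron(X, y, lr=1.0, epochs=20):
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            output = step(sum(w * v for w, v in zip(weights, xi)) + bias)
            update = lr * (target - output)
            if update:
                weights = [w + update * v for w, v in zip(weights, xi)]
                bias += update
                errors += 1
        if errors == 0:  # every training example classified correctly
            break
    return weights, bias


# Logical AND is linearly separable, so the rule converges.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
weights, bias = train_perceptron(X, y)
```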

28) Explain the two components of Bayesian logic program?

Bayesian logic program consists of two components. The first component is a logical
one; it consists of a set of Bayesian Clauses, which captures the qualitative structure
of the domain. The second component is a quantitative one, it encodes the
quantitative information about the domain.

29) What are Bayesian Networks (BN) ?

Bayesian Network is used to represent the graphical model for probability relationship
among a set of variables.

30) Why is an instance based learning algorithm sometimes referred to as a Lazy learning
algorithm?

Instance based learning algorithms are also referred to as Lazy learning algorithms because they
delay the induction or generalization process until classification is performed.

31) What are the two classification methods that SVM ( Support Vector Machine) can
handle?

a) Combining binary classifiers

b) Modifying binary to incorporate multiclass learning

32) What is ensemble learning?

To solve a particular computational problem, multiple models such as classifiers or
experts are strategically generated and combined. This process is known as
ensemble learning.

33) Why is ensemble learning used?

Ensemble learning is used to improve the classification, prediction, function
approximation etc. of a model.

34) When to use ensemble learning?

Ensemble learning is used when you build component classifiers that are more
accurate and independent from each other.

35) What are the two paradigms of ensemble methods?

The two paradigms of ensemble methods are

a) Sequential ensemble methods

b) Parallel ensemble methods

36) What is the general principle of an ensemble method and what are bagging and
boosting in ensemble methods?

The general principle of an ensemble method is to combine the predictions of several
models built with a given learning algorithm in order to improve robustness over a
single model. Bagging is an ensemble method for improving unstable estimation or
classification schemes. Boosting methods, by contrast, are applied sequentially to reduce the
bias of the combined model. Both boosting and bagging can reduce errors by
reducing the variance term.
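Bagging can be sketched as training the same learner on several bootstrap samples and combining the models by majority vote. This sketch assumes one numeric input and 0/1 labels, with a hypothetical one-threshold "stump" as the unstable base learner; all names are illustrative.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Hypothetical weak learner: one threshold rule minimizing training errors."""
    best = None
    for threshold in sorted(set(xs)):
        for low, high in ((0, 1), (1, 0)):
            errors = sum((low if x <= threshold else high) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, threshold, low, high)
    _, threshold, low, high = best
    return lambda x: low if x <= threshold else high


def bagging(xs, ys, n_models=25, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    # Combine the ensemble by majority vote.
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]


bagged = bagging(list(range(10)), [0] * 5 + [1] * 5)
```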

37) What is bias-variance decomposition of classification error in ensemble method?

The expected error of a learning algorithm can be decomposed into bias and
variance. A bias term measures how closely the average classifier produced by the
learning algorithm matches the target function. The variance term measures how
much the learning algorithm’s prediction fluctuates for different training sets.

38) What is an Incremental Learning algorithm in ensemble?

Incremental learning is the ability of an algorithm to learn from new data that
may become available after a classifier has already been generated from an existing
dataset.

39) What are PCA, KPCA and ICA used for?

PCA (Principal Component Analysis), KPCA (Kernel based Principal Component
Analysis) and ICA (Independent Component Analysis) are important feature
extraction techniques used for dimensionality reduction.

40) What is dimension reduction in Machine Learning?

In Machine Learning and statistics, dimension reduction is the process of reducing the
number of random variables under consideration, and it can be divided into feature
selection and feature extraction.

41) What are support vector machines?

Support vector machines are supervised learning algorithms used for classification
and regression analysis.

42) What are the components of relational evaluation techniques?

The important components of relational evaluation techniques are

a) Data Acquisition

b) Ground Truth Acquisition

c) Cross Validation Technique

d) Query Type

e) Scoring Metric

f) Significance Test

43) What are the different methods for Sequential Supervised Learning?

The different methods to solve Sequential Supervised Learning problems are

a) Sliding-window methods

b) Recurrent sliding windows

c) Hidden Markov models

d) Maximum entropy Markov models

e) Conditional random fields

f) Graph transformer networks

44) What are the areas in robotics and information processing where sequential
prediction problem arises?

The areas in robotics and information processing where sequential prediction
problems arise are

a) Imitation Learning

b) Structured prediction

c) Model based reinforcement learning

45) What is batch statistical learning?

Statistical learning techniques allow learning a function or predictor from a set of
observed data that can make predictions about unseen or future data. These
techniques provide guarantees on the performance of the learned predictor on the
future unseen data based on a statistical assumption on the data generating process.

46) What is PAC Learning?

PAC (Probably Approximately Correct) learning is a learning framework that has been
introduced to analyze learning algorithms and their statistical efficiency.

47) What are the different categories into which the sequence learning process can be
categorized?

a) Sequence prediction

b) Sequence generation

c) Sequence recognition

d) Sequential decision

48) What is sequence learning?

Sequence learning is a method of teaching and learning in a logical manner.

49) What are the two techniques of Machine Learning?

The two techniques of Machine Learning are

a) Genetic Programming

b) Inductive Learning

50) Give a popular application of machine learning that you see on a day-to-day basis.

The recommendation engine implemented by major e-commerce websites uses Machine
Learning.

51) Suppose we clustered a set of N data points using two different clustering algorithms:
k-means and Gaussian mixtures. In both cases we obtained 5 clusters and in both cases the
centers of the clusters are exactly the same. Can 3 points that are assigned to different clusters
in the k-means solution be assigned to the same cluster in the Gaussian mixture solution? If no,
explain. If so, sketch an example or explain in 1-2 sentences.

Solution:
Yes. K-means assigns each data point to a unique cluster based on its distance to the cluster
center, whereas Gaussian mixture clustering gives a soft (probabilistic) assignment to each data
point. Therefore, even if the cluster centers are identical in both methods, if the Gaussian mixture
components have large variances (components are spread widely around their center), points on
the edges between clusters may be given different assignments in the Gaussian mixture solution.
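The mechanism in the solution can be illustrated in one dimension. This sketch assumes two components with the same centers as k-means but different spreads (sigmas 5 and 1, equal weights); the numbers are illustrative and not taken from the question. A point at x = 6 is nearer the center at 10, so k-means assigns it there, while the wide Gaussian around 0 claims it in the mixture.

```python
import math


def kmeans_assign(x, centers):
    """Hard assignment: index of the nearest center."""
    return min(range(len(centers)), key=lambda c: abs(x - centers[c]))


def gmm_responsibilities(x, centers, sigmas, weights):
    """Soft assignment: posterior probability of each 1-D Gaussian component given x."""
    densities = [
        w * math.exp(-((x - mu) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
        for mu, s, w in zip(centers, sigmas, weights)
    ]
    total = sum(densities)
    return [d / total for d in densities]


centers = [0.0, 10.0]  # same centers for both methods
sigmas = [5.0, 1.0]  # illustrative assumption: the first component is much wider
weights = [0.5, 0.5]
print(kmeans_assign(6.0, centers))  # 1: the center at 10 is nearer
print(gmm_responsibilities(6.0, centers, sigmas, weights))  # most mass on component 0
```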
