1. Machine learning is
a) The autonomous acquisition of knowledge through the use of computer
programs
b) The autonomous acquisition of knowledge through the use of manual programs
c) The selective acquisition of knowledge through the use of computer programs
d) The selective acquisition of knowledge through the use of manual programs
2. The factors that affect the performance of a learner system do not include
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
3. Different learning methods do not include
a) Memorization
b) Analogy
c) Deduction
d) Introduction
4. In language understanding, the levels of knowledge do not include
a) Phonological
b) Syntactic
c) Empirical
d) Logical
5. A model of language consists of categories, which do not include
a) Language units
b) Role structure of units
c) System constraints
d) Structural units
6. What is a top-down parser?
a) Begins by hypothesizing a sentence (the symbol S) and successively
predicting lower level constituents until individual preterminal symbols are
written
b) Begins by hypothesizing a sentence (the symbol S) and successively predicting
upper level constituents until individual preterminal symbols are written
c) Begins by hypothesizing lower level constituents and successively predicting a
sentence (the symbol S)
d) Begins by hypothesizing upper level constituents and successively predicting a
sentence (the symbol S)
7. Which of the following is not a Horn clause?
a) p
b) ¬p ∨ q
c) p → q
d) p → ¬q
8. The action ‘STACK(A, B)’ of a robot arm specifies:
a) Place block B on block A
b) Place blocks A, B on the table in that order
c) Place blocks B, A on the table in that order
d) Place block A on block B
9. A _________ is a decision support tool that uses a tree-like graph or model of
decisions and their possible consequences, including chance event outcomes,
resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks
10. Decision Tree is a display of an algorithm.
a) True
b) False
11. Decision Tree is
a) Flow-Chart
b) Structure in which each internal node represents a test on an attribute, each branch represents
an outcome of the test and each leaf node represents a class label
c) Flow-Chart & Structure in which each internal node represents a test on an attribute, each
branch represents an outcome of the test and each leaf node represents a class label
d) None of the mentioned
12. Decision Trees can be used for Classification Tasks.
a) True
b) False
13. Choose from the following that are Decision Tree nodes
a) Decision Nodes
b) End Nodes
c) Chance Nodes
d) All of the mentioned
14. Decision Nodes are represented by ____________
a) Disks
b) Squares
c) Circles
d) Triangles
15. Chance Nodes are represented by __________
a) Disks
b) Squares
c) Circles
d) Triangles
16. End Nodes are represented by __________
a) Disks
b) Squares
c) Circles
d) Triangles
17. Which of the following are advantages of Decision Trees? Choose all that apply.
a) Possible scenarios can be added
b) Use a white box model, if a given result is provided by a model
c) Worst, best and expected values can be determined for different scenarios
d) All of the mentioned
18. A perceptron is:
a) a single layer feed-forward neural network with pre-processing
b) an auto-associative neural network
c) a double layer auto-associative neural network
d) a neural network that contains feedback
19. An auto-associative network is:
a) a neural network that contains no loops
b) a neural network that contains feedback
c) a neural network that has only one loop
d) a single layer feed-forward neural network with pre-processing
20. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the
constant of proportionality being equal to 2. The inputs are 4, 10, 5 and 20
respectively. The output will be:
a) 238
b) 76
c) 119
d) 123
Explanation: The output is found by multiplying each weight by its respective input, summing
the products, and multiplying the sum by the constant of proportionality of the linear transfer
function. Therefore:
Output = 2 * (1*4 + 2*10 + 3*5 + 4*20) = 2 * 119 = 238.
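The arithmetic above can be checked with a minimal Python sketch. The function name `linear_neuron_output` and the parameter `k` (the constant of proportionality) are illustrative names, not part of the question.

```python
def linear_neuron_output(weights, inputs, k):
    """Output of a neuron whose transfer function is linear: f(s) = k * s."""
    # Weighted sum of the inputs
    s = sum(w * x for w, x in zip(weights, inputs))
    # Apply the linear transfer function
    return k * s


print(linear_neuron_output([1, 2, 3, 4], [4, 10, 5, 20], 2))  # 238
```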
21. What are the advantages of neural networks over conventional computers?
(i) They have the ability to learn by example
(ii) They are more fault tolerant
(iii) They are more suited for real time operation due to their high ‘computational’ rates
a) (i) and (ii) are true
b) (i) and (iii) are true
c) Only (i)
d) All of the mentioned
22. What is back propagation?
a) It is another name given to the curvy function in the perceptron
b) It is the transmission of error back through the network to adjust the inputs
c) It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn
d) None of the mentioned
23. Which of the following is not a promise of artificial neural networks?
a) It can explain result
b) It can survive the failure of some nodes
c) It has inherent parallelism
d) It can handle noise
24. Neural Networks are complex ______________ with many parameters.
a) Linear Functions
b) Nonlinear Functions
c) Discrete Functions
d) Exponential Functions
25. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain
value, it outputs a 1, otherwise it just outputs a 0.
a) True
b) False
c) Sometimes – it can also output intermediate values as well
d) Can’t say
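A minimal sketch of the thresholding rule described in question 25, assuming a strict "exceeds" comparison. The function name and the AND-gate weights and threshold used below are hypothetical values chosen only for illustration.

```python
def perceptron_output(weights, inputs, threshold):
    """Return 1 if the weighted input sum exceeds the threshold, otherwise 0 (a step function)."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


# Hypothetical example: weights (1, 1) and threshold 1.5 make the perceptron act as logical AND
for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron_output([1, 1], [a, b], 1.5))
```

The output jumps from 0 to 1 with no intermediate values, which is why this kind of activation is called a step function.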
26. The name for the function in the previous question is
a) Step function
b) Heaviside function
c) Logistic function
d) Perceptron function
27. The network that involves backward links from the output to the input and hidden layers is
called a ____
a) Self organizing maps
b) Perceptrons
c) Recurrent neural network
d) Multi layered perceptron
28. Which of the following is an application of NN (Neural Network)?
a) Sales forecasting
b) Data validation
c) Risk management
d) All of the mentioned
29. The process by which you become aware of messages through your senses is called
a) Organization
b) Sensation
c) Interpretation-Evaluation
d) Perception
30. Susan is so beautiful; I bet she is smart too. This is an example of
a) The halo effect
b) The primary effect
c) A self-fulfilling prophecy
d) The recency effect
31. _____ prevents you from seeing an individual as an individual rather than as a member of a
group.
a) Cultural mores
b) Stereotypes
c) Schemata
d) Attributions
32. Mindless processing is
a) careful, critical thinking
b) inaccurate and faulty processing
c) information processing that relies heavily on familiar schemata
d) processing that focuses on unusual or novel events
33. Selective retention occurs when
a) we process, store, and retrieve information that we have already selected,
organized, and interpreted
b) we make choices to experience particular stimuli
c) we make choices to avoid particular stimuli
d) we focus on specific stimuli while ignoring other stimuli
34. Which of the following strategies would NOT be effective at improving your
communication competence?
a) Recognize that people, objects, and situations remain stable over time
b) Recognize that each person’s frame of perception is unique
c) Be active in perceiving
d) Distinguish facts from inference
35. A perception check is
a) a cognitive bias that makes us listen only to information we already agree with
b) a method teachers use to reward good listeners in the classroom
c) any factor that gets in the way of good listening and decreases our ability to
interpret correctly
d) a response that allows you to state your interpretation and ask your partner
whether or not that interpretation is correct
36. The process of forming general concept definitions from examples of concepts
to be learned.
A. deduction B. abduction C. induction D. conjunction
37. Computers are best at learning
A. facts. B. concepts. C. procedures. D. principles.
38. Data used to build a data mining model.
A. validation data B. training data C. test data D. hidden data
39. Supervised learning and unsupervised clustering both require at least one
A. hidden attribute. B. output attribute. C. input attribute. D. categorical attribute.
40. Supervised learning differs from unsupervised clustering in that supervised
learning requires
A. at least one input attribute. B. input attributes to be categorical.
C. at least one output attribute. D. output attributes to be categorical.
UNIT-2
1. Classification problems are distinguished from estimation problems in that
A. classification problems require the output attribute to be numeric.
B. classification problems require the output attribute to be categorical.
C. classification problems do not allow an output attribute.
D. classification problems are designed to predict future outcomes.
2. Which statement is true about prediction problems?
A. The output attribute must be categorical.
B. The output attribute must be numeric.
C. The resultant model is designed to determine future outcomes.
D. The resultant model is designed to classify current behavior.
3. Which statement about outliers is true?
A. Outliers should be identified and removed from a dataset.
B. Outliers should be part of the training dataset but should not be present in the
test data.
C. Outliers should be part of the test dataset but should not be present in the
training data.
D. The nature of the problem determines how outliers are used.
E. More than one of A, B, C or D is true.
4. The average positive difference between computed and desired outcome
values.
A. root mean squared error B. mean squared error
C. mean absolute error D. mean positive error
5. Selecting data so as to ensure that each class is properly represented in
both the training and test set.
A. cross validation B. stratification C. verification D. bootstrapping
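Stratification can be sketched as splitting each class separately, so the class proportions carry over to both sets. This is a minimal illustration under stated assumptions: the function name, the `test_fraction` parameter and the rule of putting at least one instance of each class into the test set are hypothetical choices, and a class with a single instance would end up only in the test set.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one test instance per class (hypothetical rule)
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["a"] * 8 + ["b"] * 2  # hypothetical, imbalanced labels
train, test = stratified_split(labels, 0.5)
print(train, test)
```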
6. The standard error is defined as the square root of this computation.
A. The sample variance divided by the total number of sample instances.
B. The population variance divided by the total number of sample instances.
C. The sample variance divided by the sample mean.
D. The population variance divided by the sample mean.
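Option A can be written directly as code. This minimal sketch uses Python's `statistics.variance`, which computes the sample variance with an n - 1 denominator; the function name and the example data are illustrative.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sqrt(sample variance / number of instances)."""
    n = len(sample)
    return math.sqrt(statistics.variance(sample) / n)  # sample variance uses n - 1


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(standard_error(data))
```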
7. Data used to optimize the parameter settings of a supervised learner
model.
A. training B. test C. verification D. validation
8. Bootstrapping allows us to
A. choose the same training instance several times.
B. choose the same test set instance several times.
C. build models with alternative subsets of the training data several times.
D. test a model with alternative subsets of the test data several times.
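Bootstrapping samples the training data with replacement, which is why the same training instance can be chosen several times. A minimal sketch, with a hypothetical function name and toy data:

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) instances with replacement; an instance may appear several times."""
    return [rng.choice(data) for _ in range(len(data))]


rng = random.Random(42)
training = list(range(10))  # hypothetical training instances
samples = [bootstrap_sample(training, rng) for _ in range(3)]
for s in samples:
    print(sorted(s))
```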
9. The correlation coefficient for two real-valued attributes is –0.85. What
does this value tell you?
A. The attributes are not linearly related.
B. As the value of one attribute increases the value of the second attribute also
increases.
C. As the value of one attribute decreases the value of the second attribute
increases.
D. The attributes show a curvilinear relationship.
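A negative coefficient means the two attributes move in opposite directions; a value near -1 indicates a strong negative linear relationship. The Pearson coefficient can be sketched as follows (function name and data are illustrative):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


xs = [1, 2, 3, 4, 5]
ys = [10, 9, 7, 6, 2]  # hypothetical data: y tends to fall as x rises
print(pearson_r(xs, ys))  # negative
```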
10. The average squared difference between classifier predicted output and
actual output.
A. mean squared error B. root mean squared error
C. mean absolute error D. mean relative error
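The error measures named in questions 4 and 10 differ only in how each difference between actual and predicted values is treated. A minimal sketch with illustrative function names and data:

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of (actual - predicted)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(mean_squared_error(actual, predicted))


actual = [3, 5, 2]
predicted = [2, 5, 4]
print(mean_absolute_error(actual, predicted))      # 1.0
print(mean_squared_error(actual, predicted))       # 5/3
print(root_mean_squared_error(actual, predicted))  # sqrt(5/3)
```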
11. With a Bayes classifier, missing data items are
A. treated as equal compares. B. treated as unequal compares.
C. replaced with a default value. D. ignored.
12. A statement about a population developed for the purpose of testing is called:
(a) Hypothesis (b) Hypothesis testing
(c) Level of significance (d) Test-statistic
The hypothesis is the supposition that we want to test.
13. Any hypothesis which is tested for the purpose of rejection under the
assumption that it is true is called:
(a) Null hypothesis (b) Alternative hypothesis
(c) Statistical hypothesis (d) Composite hypothesis
The null hypothesis serves as a counterweight in order to prove the alternative
hypothesis.
14. A statement about the value of a population parameter is called:
(a) Null hypothesis (b) Alternative hypothesis
(c) Simple hypothesis (d) Composite hypothesis
In the null hypothesis we do not have all the parameters, so we try to
approximate them.
15. Any statement whose validity is tested on the basis of a sample is called:
(a) Null hypothesis (b) Alternative hypothesis
(c) Statistical hypothesis (d) Simple hypothesis
In the statistical hypothesis we receive most of the parameters, so we can
test a sample within those parameters.
16. A quantitative statement about a population is called:
(a) Research hypothesis (b) Composite hypothesis
(c) Simple hypothesis (d) Statistical hypothesis
A statistical hypothesis is an assumption about a population parameter.
17. A statement that is accepted if the sample data provide sufficient evidence that
the null hypothesis is false is called:
(a) Simple hypothesis (b) Composite hypothesis
(c) Statistical hypothesis (d) Alternative hypothesis
The alternative hypothesis is the one that we want to prove.
18. A hypothesis that specifies all the values of the parameters is called:
(a) Simple hypothesis (b) Composite hypothesis (c) Statistical hypothesis
(d) None of the above
19. A hypothesis may be classified as:
(a) Simple (b) Composite (c) Null (d) All of the above
Simple and composite hypotheses are types of hypotheses distinguished by the information
used in the statement.
20. The probability of rejecting the null hypothesis when it is true is called:
(a) Level of confidence (b) Level of significance (c) Power of the test
(d) Difficult to tell
The level of confidence is used to calculate the critical value.
21. The dividing point between the region where the null hypothesis is rejected and
the region where it is not rejected is said to be:
(a) Critical region (b) Critical value (c) Acceptance region
(d) Significant region
The critical value defines the regions of acceptance and rejection.
22. If the critical region is located equally on both sides of the sampling distribution
of the test statistic, the test is called:
(a) One tailed (b) Two tailed (c) Right tailed
(d) Left tailed
We use a two-tailed test when the null hypothesis states an equality and the alternative
hypothesis allows a difference in either direction.
23. A rule or formula that provides a basis for testing a null hypothesis is called:
(a) Test-statistic (b) Population statistic (c) Both of these
(d) None of the above
24. Critical region is also called:
(a) Acceptance region (b) Rejection region (c) Confidence region (d) Statistical region
The rejection region extends from the critical value out to infinity in the tail (or tails)
of the distribution.
25. The probability of rejecting H0 when it is false is called:
(a) Power of the test (b) Size of the test
(c) Level of confidence (d) Confidence coefficient
The power of a test is also called statistical power; it is the probability that the test
correctly rejects the null hypothesis when it is false.
26. Suppose you are given an EM algorithm that finds maximum likelihood estimates for a
model with latent variables. You are asked to modify the algorithm so that it finds MAP estimates
instead. Which step or steps do you need to modify?
A. Expectation B. Maximization C. No modification necessary D. Both
UNIT-3
A regression model in which more than one independent variable is used to predict
the dependent variable is called
A. a simple linear regression model B. a multiple regression model
C. an independent model D. none of the above
A term used to describe the case when the independent variables in a multiple
regression model are correlated is
A. regression B. correlation C. multicollinearity D. none of the above
A multiple regression model has the form: y = 2 + 3x1 + 4x2. As x1 increases by 1 unit
(holding x2 constant), y will
A. increase by 3 units B. decrease by 3 units
C. increase by 4 units D. decrease by 4 units
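Holding x2 fixed, the change in y for a one-unit increase in x1 is exactly the coefficient of x1. A minimal check in Python; the function name and the value x2 = 10 are arbitrary choices for illustration.

```python
def predict(x1, x2):
    # Fitted model from the question: y = 2 + 3*x1 + 4*x2
    return 2 + 3 * x1 + 4 * x2


# Increase x1 by one unit while x2 is held at 10
change = predict(6, 10) - predict(5, 10)
print(change)  # 3
```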
A multiple regression model has
A. only one independent variable B. more than one dependent variable
C. more than one independent variable D. none of the above
UNIT-4
1. Machine learning techniques differ from statistical techniques in that machine
learning methods
A. typically assume an underlying distribution for the data.
B. are better able to deal with missing and noisy data.
C. are not able to explain their behavior.
D. have trouble with large-sized datasets.
2. We can get multiple local optimum solutions if we solve a linear regression problem by
minimizing the sum of squared errors using gradient descent. (True/False)
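For linear regression the sum of squared errors is a convex function of the weights, so when the design matrix has full column rank there is a single global minimum, and gradient descent with a small enough step size reaches it from any starting point. The NumPy sketch below illustrates this on synthetic data; the learning rate, step count, data and starting points are assumptions chosen for the demo.

```python
import numpy as np


def gradient_descent_linreg(X, y, w0, lr=0.1, steps=5000):
    """Minimize the mean squared error (1/n) * ||Xw - y||^2 by batch gradient descent."""
    w = np.array(w0, dtype=float)
    n = len(y)
    for _ in range(steps):
        grad = (2.0 / n) * X.T @ (X @ w - y)  # gradient of the mean squared error
        w -= lr * grad
    return w


rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
X = np.column_stack([np.ones_like(x), x])  # bias column plus one feature
y = 1.5 + 2.0 * x + rng.normal(scale=0.1, size=50)  # synthetic data

# Two very different starting points end at the same weights
w_a = gradient_descent_linreg(X, y, [10.0, -10.0])
w_b = gradient_descent_linreg(X, y, [-5.0, 5.0])
w_exact, *_ = np.linalg.lstsq(X, y, rcond=None)  # closed-form least-squares solution
print(w_a, w_b, w_exact)
```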
3. When the feature space is larger, overfitting is more likely. (True/False)
4. We can use gradient descent to learn a Gaussian Mixture Model. (True/False)
5. As the number of training examples goes to infinity, your model trained on that data
will have:
A. Lower variance B. Higher variance C. Same variance
6. As the number of training examples goes to infinity, your model trained on that data
will have:
A. Lower bias B. Higher bias C. Same bias
UNIT-5
2 Marks Questions
Machine learning relates to the study, design and development of algorithms
that give computers the capability to learn without being explicitly
programmed. Data mining, by contrast, can be defined as the process of extracting
knowledge or unknown interesting patterns from unstructured data. Machine learning
algorithms are used during this process.
In machine learning, ‘overfitting’ occurs when a statistical model describes random error or
noise instead of the underlying relationship. Overfitting is normally observed when a model is
excessively complex, because it has too many parameters relative to the number of training
examples. A model that has been overfit exhibits poor performance on new data.
Overfitting is possible because the criterion used to train the model is not the same as
the criterion used to judge the efficacy of the model.
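The gap between the training criterion and the real goal can be illustrated with polynomial fitting. In this hedged NumPy sketch (the sine-plus-noise data and the degrees 3 and 9 are assumptions), a degree-9 polynomial has as many parameters as there are training points, so its training error is essentially zero, while its error on unseen points is usually much larger than that of the simpler model.

```python
import numpy as np


def mse(y, y_hat):
    """Mean squared error between targets y and predictions y_hat."""
    return float(np.mean((y - y_hat) ** 2))


rng = np.random.default_rng(1)
# Hypothetical data: a sine curve plus noise, with only 10 training points
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=10)
x_test = np.linspace(0.05, 0.95, 50)
y_test = np.sin(2 * np.pi * x_test)

results = {}
for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # degree 9: 10 parameters for 10 points
    train_err = mse(y_train, np.polyval(coeffs, x_train))
    test_err = mse(y_test, np.polyval(coeffs, x_test))
    results[degree] = (train_err, test_err)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```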
Inductive machine learning involves learning by examples: from a set of observed instances,
the system tries to induce a general rule.
a) Decision Trees
c) Probabilistic networks
d) Nearest Neighbor
a) Supervised Learning
b) Unsupervised Learning
c) Semi-supervised Learning
d) Reinforcement Learning
e) Transduction
f) Learning to Learn
9) What are the three stages to build the hypotheses or model in machine learning?
a) Model building
b) Model testing
The standard approach to supervised learning is to split the set of examples into a
training set and a test set.
In various areas of information science, such as machine learning, a set of data used to
discover potentially predictive relationships is known as a ‘training set’. The training set
consists of the examples given to the learner, while the test set is used to test the accuracy
of the hypotheses generated by the learner; it is the set of examples held back from the
learner. The training set is distinct from the test set.
a) Artificial Intelligence
a) Classification
b) Speech recognition
c) Regression
e) Annotate strings
17) What is the difference between artificial learning and machine learning?
a) Computer Vision
b) Speech Recognition
c) Data Mining
d) Statistics
e) Information Retrieval
f) Bio-Informatics
Genetic programming is one of the two techniques used in machine learning. The
model is based on testing candidate results and selecting the best choice among them.
The process of selecting among different mathematical models that are used to describe
the same data set is known as model selection. Model selection is applied in the fields of
statistics, machine learning and data mining.
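One common way to select among candidate models is to compare their errors on held-out validation data. This NumPy sketch assumes polynomial models of degree 1 to 5 as the candidates and synthetic quadratic data; both are illustrative choices, not a prescribed procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, size=60)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.1, size=60)  # synthetic data

# Hold out the last 20 points as a validation set
x_tr, y_tr = x[:40], y[:40]
x_val, y_val = x[40:], y[40:]


def validation_error(degree):
    """Fit a polynomial on the training part and return its validation MSE."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    return float(np.mean((y_val - np.polyval(coeffs, x_val)) ** 2))


candidates = [1, 2, 3, 4, 5]
errors = {d: validation_error(d) for d in candidates}
best_degree = min(errors, key=errors.get)
print(errors, best_degree)
```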
24) What are the two methods used for the calibration in Supervised Learning?
The two methods used for predicting good probabilities in Supervised Learning are
a) Platt Calibration
b) Isotonic Regression
These methods are designed for binary classification; extending them to multi-class problems is not trivial.
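Isotonic regression calibrates a classifier by fitting a non-decreasing step function that maps scores to probabilities. A minimal sketch of the pool-adjacent-violators algorithm is shown below; it assumes the 0/1 labels of a validation set have already been sorted by increasing classifier score, and the label list is hypothetical. Platt calibration, which fits a sigmoid instead, is not shown.

```python
def isotonic_regression(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block is [mean, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge neighbouring blocks while they violate the non-decreasing order
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    fitted = []
    for mean, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical 0/1 labels of validation examples, sorted by increasing classifier score
labels_sorted_by_score = [0, 1, 0, 0, 1, 1, 0, 1]
calibrated = isotonic_regression(labels_sorted_by_score)
print(calibrated)  # calibrated probability for each score position
```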
26) What is the difference between heuristics for rule learning and heuristics for decision
trees?
The difference is that heuristics for decision trees evaluate the average quality of
a number of disjoint sets, while rule learners only evaluate the quality of the set of
instances that is covered by the candidate rule.
27) What is Perceptron in Machine Learning?
A Bayesian logic program consists of two components. The first component is a logical
one; it consists of a set of Bayesian clauses, which capture the qualitative structure
of the domain. The second component is a quantitative one; it encodes the
quantitative information about the domain.
A Bayesian network is a graphical model used to represent the probabilistic relationships
among a set of variables.
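In a Bayesian network, the joint distribution factorizes along the graph: each variable contributes its probability given its parents. The sketch below uses a hypothetical two-node network Rain -> WetGrass with made-up probability tables, and computes a marginal and a posterior by summing the joint.

```python
# Hypothetical two-node network: Rain -> WetGrass (all probabilities are made up)
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {
    True: {True: 0.9, False: 0.1},
    False: {True: 0.2, False: 0.8},
}


def joint(rain, wet):
    # Chain rule along the graph: P(Rain, Wet) = P(Rain) * P(Wet | Rain)
    return p_rain[rain] * p_wet_given_rain[rain][wet]


# Marginal P(Wet = True) and posterior P(Rain = True | Wet = True)
p_wet = sum(joint(r, True) for r in (True, False))
p_rain_given_wet = joint(True, True) / p_wet
print(p_wet, p_rain_given_wet)
```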
30) Why is an instance-based learning algorithm sometimes referred to as a lazy learning
algorithm?
Instance-based learning algorithms are also referred to as lazy learning algorithms because they
delay the induction or generalization process until classification is performed.
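k-nearest neighbours is a typical lazy learner: "training" only stores the instances, and all the work happens when a query is classified. A minimal sketch; the class name, k = 3 and the toy points are illustrative.

```python
import math
from collections import Counter


class KNearestNeighbors:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Lazy learning: just store the training instances, no generalization yet
        self.X = list(X)
        self.y = list(y)
        return self

    def predict_one(self, x):
        # All the work is deferred to query time: rank stored points by distance
        dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(self.X, self.y))
        votes = Counter(label for _, label in dists[: self.k])
        return votes.most_common(1)[0][0]


X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
model = KNearestNeighbors(k=3).fit(X, y)
print(model.predict_one((0.5, 0.5)), model.predict_one((5.5, 5.5)))
```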
31) What are the two classification methods that SVM (Support Vector Machine) can
handle?
Ensemble learning is used when you build component classifiers that are more
accurate and independent from each other.
36) What is the general principle of an ensemble method, and what are bagging and
boosting in ensemble methods?
The general principle of an ensemble method is to combine the predictions of several
models built with a given learning algorithm in order to improve robustness over a
single model. Bagging is an ensemble method for improving unstable estimation or
classification schemes, while boosting methods are applied sequentially to reduce the
bias of the combined model. Both bagging and boosting can reduce errors by
reducing the variance term.
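Bagging can be sketched as: draw bootstrap samples of the training data, fit one model per sample, and average the predictions. This NumPy sketch assumes cubic polynomial regressors as the base models and synthetic sine-plus-noise data; both choices are illustrative.

```python
import numpy as np


def bagged_polyfit_predict(x, y, x_new, degree=3, n_models=25, seed=0):
    """Bagging: fit one polynomial per bootstrap sample, then average the predictions."""
    rng = np.random.default_rng(seed)
    n = len(x)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap: indices drawn with replacement
        coeffs = np.polyfit(x[idx], y[idx], degree)
        preds.append(np.polyval(coeffs, x_new))
    return np.mean(preds, axis=0)  # averaging the models reduces the variance term


rng = np.random.default_rng(3)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=30)  # synthetic data
x_new = np.linspace(0, 1, 5)
y_bagged = bagged_polyfit_predict(x, y, x_new)
print(y_bagged)
```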
The expected error of a learning algorithm can be decomposed into bias and
variance. A bias term measures how closely the average classifier produced by the
learning algorithm matches the target function. The variance term measures how
much the learning algorithm’s prediction fluctuates for different training sets.
Incremental learning is the ability of an algorithm to learn from new data that
becomes available after a classifier has already been generated from an existing
dataset.
In machine learning and statistics, dimension reduction is the process of reducing the
number of random variables under consideration; it can be divided into feature
selection and feature extraction.
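Principal component analysis (PCA) is one common feature-extraction method; feature selection, by contrast, would simply keep a subset of the original columns. This NumPy sketch projects synthetic, nearly one-dimensional 3-feature data onto its first principal component; the data and function name are assumptions for illustration.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # new, lower-dimensional features


rng = np.random.default_rng(0)
t = rng.normal(size=100)
# Three correlated features that are almost a single underlying variable
X = np.column_stack([t, 2 * t, -t]) + rng.normal(scale=0.01, size=(100, 3))
Z = pca_reduce(X, 1)
print(Z.shape)
```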
Support vector machines are supervised learning algorithms used for classification
and regression analysis.
a) Data Acquisition
d) Query Type
e) Scoring Metric
f) Significance Test
43) What are the different methods for Sequential Supervised Learning?
44) What are the areas in robotics and information processing where sequential
prediction problem arises?
a) Imitation Learning
b) Structured prediction
PAC (Probably Approximately Correct) learning is a learning framework that has been
introduced to analyze learning algorithms and their statistical efficiency.
47) Into what categories can the sequence learning process be divided?
a) Sequence prediction
b) Sequence generation
c) Sequence recognition
d) Sequential decision
a) Genetic Programming
b) Inductive Learning
50) Give a popular application of machine learning that you see on a day-to-day basis.
51) Suppose we clustered a set of N data points using two different clustering algorithms:
k-means and Gaussian mixtures. In both cases we obtained 5 clusters, and in both cases the
centers of the clusters are exactly the same. Can 3 points that are assigned to different clusters
in the k-means solution be assigned to the same cluster in the Gaussian mixture solution?
If no, explain. If
Solution:
Yes, k-means assigns each data point to a unique cluster based on its distance to the cluster
center. Gaussian mixture clustering gives soft (probabilistic) assignment to each data point.
Therefore, even if cluster centers are identical in both methods, if Gaussian mixture components
have large variances (components are spread around their center), points on the edges
between clusters may be given different assignments in the Gaussian mixture solution.
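The effect can be reproduced in a simplified one-dimensional setting with 2 clusters instead of 5. In this sketch the centers are shared by both methods, while the Gaussian standard deviations and mixing weights are hypothetical values. Points at 4 and 6 fall into different k-means clusters, yet the wide Gaussian around 10 gives both points their highest responsibility.

```python
import math

centers = [0.0, 10.0]  # identical centers for both methods
sigmas = [1.0, 5.0]    # hypothetical GMM standard deviations
weights = [0.5, 0.5]   # hypothetical mixing weights


def kmeans_assign(x):
    # Hard assignment: index of the nearest center
    return min(range(len(centers)), key=lambda k: abs(x - centers[k]))


def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def gmm_assign(x):
    # Soft responsibilities, then the most probable component
    scores = [w * gaussian_pdf(x, mu, s) for w, mu, s in zip(weights, centers, sigmas)]
    total = sum(scores)
    resp = [sc / total for sc in scores]
    return max(range(len(resp)), key=lambda k: resp[k]), resp


for x in (4.0, 6.0):
    print(x, kmeans_assign(x), gmm_assign(x)[0])
```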