
CSCI 5512: Artificial Intelligence II (Spring 2013) Sample Final Exam

This is a closed book exam. You are allowed 2 sheets of notes.

1. (25 points) Consider a Hidden Markov Model (HMM) with hidden states X1:T and observed states or evidence variables E1:T. Consider the problem of finding the most likely sequence of states in the HMM, i.e., the sequence X1:T which maximizes P(X1:T | E1:T).

(a) (15 points) Describe an algorithm to find the most likely sequence X1:T given an observation sequence E1:T. You have to clearly describe the key variables, equation(s), and computation involved, and discuss the time complexity of the algorithm. You do not have to prove correctness of the algorithm.

(b) (10 points) Consider an alternative approach to the above problem based on filtering: we find P(Xt | E1:t) and pick the value X*_t which maximizes the distribution P(Xt | E1:t). We repeat the process for all t, and concatenate the values X*_t to come up with a sequence of hidden states X*_{1:T}. Will this alternative approach solve the problem of finding the most likely sequence of states? If yes, clearly explain why. If not, clearly explain why not, or give a counter-example which illustrates that it will not solve the problem.

2. (25 points) Consider a Markov Decision Process (MDP) with state transitions and utilities as shown in Figure 1. In particular, the state transitions occur as follows: for an attempted action to move in a certain direction, the move is successful with probability 0.8, with probability 0.1 the agent lands in the state to the left, and with probability 0.1 the agent lands in the state to the right (Figure 1(a)).

(a) (5 points) Define the utility of a state in an MDP. How is the optimal policy related to the utility? Explain using an appropriate equation.

(b) (10 points) Describe the value iteration algorithm for obtaining the optimal policy. Explain why value iteration converges to a unique solution. If one needs utility estimates that are ε-close to the optimal utilities, i.e., ||U^(t) − U|| ≤ ε, how many iterations t of value iteration are sufficient?
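The algorithm asked for in Problem 1(a) is Viterbi-style dynamic programming. A minimal sketch follows; the array encoding of the transition and emission models is an assumption made for illustration, not part of the exam statement. The key variable m[t, i] is the probability of the most likely path ending in state i at time t, and the overall cost is O(T·S²) for S hidden states.

```python
import numpy as np

def viterbi(trans, emit, evidence, prior):
    """Most likely hidden-state sequence argmax P(X_{1:T} | E_{1:T}).

    trans[i, j] = P(X_{t+1}=j | X_t=i), emit[i, e] = P(E_t=e | X_t=i),
    prior[i] = P(X_1=i); evidence is a list of observed symbol indices.
    """
    T, S = len(evidence), len(prior)
    m = np.zeros((T, S))           # m[t, i]: best path probability ending in i at t
    back = np.zeros((T, S), int)   # backpointers for recovering the argmax path
    m[0] = prior * emit[:, evidence[0]]
    for t in range(1, T):
        scores = m[t - 1][:, None] * trans   # scores[i, j]: arrive at j from i
        back[t] = scores.argmax(axis=0)
        m[t] = scores.max(axis=0) * emit[:, evidence[t]]
    # Follow backpointers from the best final state to recover the sequence.
    path = [int(m[T - 1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

Unlike the per-step filtering approach of Problem 1(b), the max in the recursion is over whole path prefixes, which is why the two can disagree.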
(c) (10 points) Using the utilities given in Figure 1(b), outline a method to compute the optimal policy for the MDP. Is the optimal policy the same as trying to move to the highest-utility neighbor? Explain why/why not.

3. (25 points) The binary XOR function has two boolean inputs [x1, x2] and gives an output of y = T if an odd number of inputs is T, and y = F otherwise. The ternary XOR function has three boolean inputs [x1, x2, x3] and gives an output of y = T if an odd number of inputs is T, and y = F otherwise. We consider building classifiers with F represented as −1 and T represented as 1; e.g., ([1, −1], 1) or ([1, 1], −1) are valid inputs ([x1, x2], y) for learning the binary XOR function.
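The value iteration procedure of Problem 2(b), with greedy policy extraction as in 2(c), can be sketched as follows. The tiny two-state MDP at the bottom is an illustrative assumption, not the Figure 1 grid; the stopping rule ||U_new − U|| ≤ ε(1 − γ)/γ guarantees the returned utilities are within ε of optimal by the standard contraction argument.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, eps=1e-6):
    """P[a][s, s2] = P(s2 | s, a); R[s] = reward of state s.

    Repeats the Bellman update U <- max_a [R + gamma * P_a U] until the
    largest per-state change is below eps * (1 - gamma) / gamma, then
    returns the utilities and a greedy (optimal) policy.
    """
    U = np.zeros(len(R))
    while True:
        Q = np.array([R + gamma * (P[a] @ U) for a in range(len(P))])  # Q[a, s]
        U_new = Q.max(axis=0)
        if np.max(np.abs(U_new - U)) <= eps * (1 - gamma) / gamma:
            return U_new, Q.argmax(axis=0)
        U = U_new

# Tiny two-state example (an assumption for illustration):
# action 0 stays put, action 1 swaps states; state 1 pays reward 1.
P = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
R = np.array([0.0, 1.0])
U, policy = value_iteration(P, R, gamma=0.9)
# Exact solution here: U = [9, 10]; greedy policy: move to state 1, then stay.
```

Because each Bellman update is a γ-contraction in the max norm, the error shrinks by a factor γ per iteration, which also answers the "how many iterations suffice" part of 2(b) up to logarithmic factors.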

Figure 1: A Markov Decision Process along with its utilities.

(a) (13 points) While the binary XOR problem is not linearly separable in the original space [x1, x2], one can construct a mapping φ2([x1, x2]) to a space where the binary XOR problem is linearly separable.¹ Show that the mapping φ2([x1, x2]) = [x1, x1 x2] makes the problem linearly separable, i.e., one can find suitable weights (w1, w2) such that y = sign(w1 x1 + w2 x1 x2).

(b) (12 points) The ternary XOR problem is not linearly separable in the original space [x1, x2, x3]. Can you construct a mapping φ3([x1, x2, x3]) to a space² where the ternary XOR problem is linearly separable? If yes, give an example of such a mapping φ3. If no, clearly explain why such a mapping is not possible.

4. (25 points) Consider a neural network with linear activation function g(·), i.e., g(Σ_i w_i x_i) = Σ_i w_i x_i.
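The feature mappings in Problem 3 can be checked exhaustively over the ±1 inputs. The particular weights and the product feature below are one separating choice (an assumption for illustration, not the only answer): under the ±1 encoding, the binary XOR label equals −x1·x2 and the ternary XOR label equals x1·x2·x3.

```python
from itertools import product

def xor_label(bits):
    """y = +1 (T) if an odd number of the +/-1 inputs is +1, else -1 (F)."""
    return 1 if sum(b == 1 for b in bits) % 2 == 1 else -1

def sign(z):
    return 1 if z > 0 else -1

# (a) Binary XOR under phi_2([x1, x2]) = [x1, x1*x2]: the weights
# (w1, w2) = (0, -1) separate it, since the label equals -x1*x2.
for x1, x2 in product([-1, 1], repeat=2):
    assert sign(0 * x1 + (-1) * (x1 * x2)) == xor_label([x1, x2])

# (b) Ternary XOR: the single product feature phi_3 = [x1*x2*x3] with
# weight +1 works, because the label equals x1*x2*x3 for +/-1 inputs.
for x1, x2, x3 in product([-1, 1], repeat=3):
    assert sign(x1 * x2 * x3) == xor_label([x1, x2, x3])
```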

(a) (12 points) Assume that the network has one hidden layer. For a given assignment to the weights W, write down the value of the units in the output layer as a function of W and the input layer I, without any explicit mention of the output of the hidden layer. Show that there is a network with no hidden units that computes the same function.

(b) (8 points) Repeat the analysis in (a) for a network with any fixed number of hidden layers.

(c) (5 points) Based on the relationship between the input and the output in such networks, what can you conclude about linear activation functions?
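The collapse asked for in Problem 4(a)-(b) can be verified numerically: with g(z) = z, a two-layer network computes W2(W1 I) = (W2 W1) I, so the single weight matrix W2 W1 reproduces it with no hidden units (the layer sizes below are arbitrary assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear layers: 3 inputs -> 4 hidden units -> 2 outputs.
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
x = rng.standard_normal(3)

two_layer = W2 @ (W1 @ x)        # network with one hidden layer
collapsed = (W2 @ W1) @ x        # equivalent network with no hidden units
assert np.allclose(two_layer, collapsed)
```

Iterating the same argument handles any fixed number of hidden layers, which is the point of part (c): stacking linear layers adds no representational power over a single linear layer.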

¹ We have discussed such mappings in the context of kernel methods in class. For the binary XOR example in (a), the mapped space has the same dimensionality as the original space [x1, x2].

² The dimensionality of the mapped space can be higher than, the same as, or lower than the original space [x1, x2, x3].
