
tips:
- time pressure: 1 pt/min - move quickly!!
- underline problem details!
- what are they asking for?
- show all work + explanations

Learning
approximation error: how good is the hypothesis class?
estimation error: how good is the learned predictor relative to the hypothesis class?
f_w(x) = w · φ(x); hypothesis class = set of possible predictors with a fixed φ(x) and varying w
score on ex. (x, y) = w · φ(x): how confident we are in predicting +1
margin = (w · φ(x)) y: how correct we are
residual = (w · φ(x)) − y: the amount by which the prediction overshoots the target y
binary classification loss fns: zero-one, hinge, logistic; regression loss fns: squared, absdev
zero-one loss: Loss_0-1(x, y, w) = 1[(w · φ(x)) y ≤ 0]
hinge loss: Loss_hinge(x, y, w) = max{1 − (w · φ(x)) y, 0}
logistic regression: Loss_logistic(x, y, w) = log(1 + e^(−(w · φ(x)) y)); tries to increase the margin even when it already exceeds 1
logistic function maps (−∞, ∞) to [0, 1]
objective: find w that minimizes the training loss; stochastic gradient descent: pick one example (x, y) and update w ← w − η ∇_w Loss(x, y, w) (see the sketch below)
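A minimal sketch of the pieces above (feature map, score, margin, hinge loss, one SGD step per example); the feature map phi, step size eta, and toy data are invented for illustration:

# Sketch: score, margin, hinge loss, and SGD on a made-up binary dataset.

def phi(x):
    return [x[0], x[1], 1.0]               # two raw features plus a bias term

def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

def hinge_loss(w, x, y):
    margin = dot(w, phi(x)) * y            # margin = (w . phi(x)) y, "how correct we are"
    return max(1.0 - margin, 0.0)

def sgd_step(w, x, y, eta=0.1):
    # subgradient of the hinge loss: -phi(x) y if margin < 1, else 0
    if dot(w, phi(x)) * y < 1.0:
        return [wi + eta * y * fi for wi, fi in zip(w, phi(x))]
    return w

train = [((2.0, 1.0), +1), ((0.0, -1.0), -1), ((1.5, -0.5), +1), ((-1.0, 0.5), -1)]
w = [0.0, 0.0, 0.0]
for epoch in range(20):                    # several passes, one SGD step per example
    for x, y in train:
        w = sgd_step(w, x, y)
print("weights:", w)
print("total hinge loss:", sum(hinge_loss(w, x, y) for x, y in train))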
generalization: we want test error ≈ training error
to reduce overfitting: keep the dimensionality / hypothesis class small (remove features); keep the norm ∥w∥ small (regularization, early stopping)
validation set - taken out of the training data to estimate the test error; used to tune hyperparameters, choose features, choose the hypothesis class
overfitting => high variance; underfitting => high bias; want just right (low bias + low variance)

Guaranteed global optimum:
- value iteration
- backtracking search
- dynamic programming
Only guaranteed local optimum:
- ICM
- Gibbs sampling
- K-means
K-means: not guaranteed to reach the optimum; try keeping the solution with the lowest loss after multiple random initializations, or use the initialization heuristic K-means++ (initialize centroids as training points)
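A small sketch of the restart idea just above: run K-means several times from random initializations (centroids seeded from training points) and keep the lowest-loss run; the 1-D data and K are invented for illustration:

import random

# Sketch: K-means with multiple random restarts, keeping the lowest-loss run.
# Centroids are initialized as training points (the seeding idea behind K-means++,
# but without its distance-weighted sampling). Data and K are made up.

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.2]     # invented 1-D data
K = 3

def kmeans(points, k, iters=50):
    centroids = random.sample(points, k)              # initialize centroids as training points
    for _ in range(iters):
        # assignment step: each point joins its closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            clusters[j].append(p)
        # update step: each centroid moves to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
    loss = sum(min((p - m) ** 2 for m in centroids) for p in points)
    return loss, centroids

# multiple random restarts: keep the solution with the lowest loss
best_loss, best_centroids = min(kmeans(points, K) for _ in range(10))
print(best_loss, sorted(best_centroids))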
MDP’s (offline - all data available from the start)
States: the set of states; s_start ∈ States: starting state; Actions(s): possible actions from state s
T(s, a, s′): probability of ending in s′ if you take action a in state s ((s, a) can randomly lead to s_1′ or s_2′ ... probabilistic)
Reward(s, a, s′): reward for the transition (s, a, s′) <= this is what we want to maximize
IsEnd(s): whether at the end of the game; 0 ≤ γ ≤ 1: discount factor (default: 1)
MDP’s involve: generating episodes; evaluating policies (how good? => value); choosing the optimal policy (take the max)
utility of an episode: u = r_1 + γ r_2 + γ^2 r_3 + ... (discounted sum of rewards)
expectation: E[X] = sum( x * P(x) )
policy π - a mapping from each state s ∈ States to an action a ∈ Actions(s)
value of a policy: Vπ(s) is the expected utility received by following policy π from state s (labels the state node)
Q-value: Qπ(s, a) is the expected utility of taking action a from state s and then following policy π (labels the chance node)
policy evaluation - computes the value of a fixed policy π:
Vπ(s) = 0 if IsEnd(s), else Vπ(s) = Σ_s′ T(s, π(s), s′) [Reward(s, π(s), s′) + γ Vπ(s′)]
optimal value: Vopt(s) = the maximum value attained by any policy
value iteration - guaranteed to find the globally optimal value (the optimal policy can be extracted by taking the argmax action in each state):
Vopt(s) = 0 if IsEnd(s), else Vopt(s) = max_{a ∈ Actions(s)} Σ_s′ T(s, a, s′) [Reward(s, a, s′) + γ Vopt(s′)]
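A minimal value iteration sketch over the MDP pieces above (States, Actions, T, Reward, γ); the tiny stay-or-quit MDP, its probabilities, and rewards are invented for illustration:

# Sketch: value iteration Vopt(s) = max_a sum_s' T(s,a,s') [Reward(s,a,s') + gamma * Vopt(s')].
# The two-state MDP below is made up purely for illustration.

gamma = 0.9
states = ["in", "end"]

def actions(s):
    return ["stay", "quit"] if s == "in" else []

def is_end(s):
    return s == "end"

# T[(s, a)] = list of (s_next, probability, reward) triples
T = {
    ("in", "stay"): [("in", 0.8, 4.0), ("end", 0.2, 4.0)],
    ("in", "quit"): [("end", 1.0, 10.0)],
}

V = {s: 0.0 for s in states}
for _ in range(100):                        # repeat the update until (nearly) converged
    V = {s: 0.0 if is_end(s) else
            max(sum(p * (r + gamma * V[sp]) for sp, p, r in T[(s, a)])
                for a in actions(s))
         for s in states}

# extract the optimal policy by taking the argmax action in each non-end state
policy = {s: max(actions(s),
                 key=lambda a: sum(p * (r + gamma * V[sp]) for sp, p, r in T[(s, a)]))
          for s in states if not is_end(s)}
print(V, policy)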
Search
Define a search problem:
• s_start: starting state • Actions(s): possible actions
• Cost(s, a): action cost • Succ(s, a): successor
• IsEnd(s): reached an end state?
Search algorithms (b = # actions/state; D = max depth; d = # actions in the soln; N total states, n of which are closer than the end state):
- DP: any costs, O(N) time, O(N) space *requires an acyclic graph
- UCS: non-negative costs, O(n log n) time, O(n log n) space
backtracking search - recursively tries all paths to find the minimum cost path; will always find the minimum cost path, but slow
depth-first search - backtracking, except stop at the first end state; finds a solution but disregards cost
breadth-first search - explore all nodes in order of increasing depth; faster than backtracking but more space overhead bc need to maintain a queue
dynamic programming - cache partial solutions; assumes the graph is acyclic!
uniform cost search - explores states in order of increasing past cost (see the sketch below)
A* - explores states in order of PastCost(s) + h(s); equivalent to UCS with modified edge costs Cost′(s, a) = Cost(s, a) + h(Succ(s, a)) − h(s), which favor getting to end states
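A minimal uniform cost search sketch over the search-problem interface above (Actions, Cost, Succ, IsEnd); the toy walk-along-a-line problem and its costs are invented for illustration:

import heapq

# Sketch: UCS explores states in order of increasing past cost.
# The toy "walk from 0 to 4, step +1 (cost 2) or +2 (cost 3)" problem is made up.

class WalkProblem:
    def start(self):      return 0
    def is_end(self, s):  return s == 4
    def actions(self, s): return ["step1", "step2"]
    def succ(self, s, a): return s + (1 if a == "step1" else 2)
    def cost(self, s, a): return 2 if a == "step1" else 3

def uniform_cost_search(problem):
    frontier = [(0, problem.start(), [])]     # (past cost, state, actions so far)
    explored = set()
    while frontier:
        past_cost, s, history = heapq.heappop(frontier)
        if s in explored:
            continue
        explored.add(s)
        if problem.is_end(s):
            return past_cost, history
        for a in problem.actions(s):
            heapq.heappush(frontier,
                           (past_cost + problem.cost(s, a), problem.succ(s, a), history + [a]))
    return None

print(uniform_cost_search(WalkProblem()))     # minimum-cost path from 0 to 4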

heuristic h(s) = any estimate of FutureCost(s)


consistent heuristic: h(s) ≤ Cost(s, a) + h(Succ(s, a)) for every (s, a), and h(s_end) = 0, so the modified A* edge costs stay non-negative

a search problem P_rel is a relaxed version of P iff Cost_rel(s, a) <= Cost(s, a)


coming up with heuristics: you should be able to identify the constraint you are removing to make the problem relaxed (less constrained) => removing it reduces edge costs.
ideas: knock down walls, walk/tram freely, overlap pieces. the relaxed problem should be easy to solve (closed form, easier search, or independent subproblems).
theorem - consistency of relaxed heuristics: Suppose h(s) = FutureCost_rel(s) for some relaxed
problem P_rel. Then h(s) is a consistent heuristic.
example: give Romeo and Juliet the option to not wait for each other at every city
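A small sketch of the "knock down walls" idea: on an invented grid with walls, h(s) = Manhattan distance to the goal is exactly FutureCost of the relaxed wall-free problem, hence consistent, and A* explores states in order of PastCost(s) + h(s):

import heapq

# Sketch of A* with a relaxed-problem heuristic on a made-up 5x5 grid.
# Relaxation: "knock down walls" => h(s) = Manhattan distance to the goal.

WALLS = {(1, 1), (1, 2), (1, 3), (3, 1), (3, 2), (3, 3)}   # invented obstacle layout
START, GOAL, SIZE = (0, 0), (4, 4), 5

def neighbors(s):
    x, y = s
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < SIZE and 0 <= ny < SIZE and (nx, ny) not in WALLS:
            yield (nx, ny)                                 # each move has cost 1

def h(s):
    return abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1])       # FutureCost of the wall-free problem

def astar(start, goal):
    frontier = [(h(start), 0, start)]                      # (PastCost + h, PastCost, state)
    explored = set()
    while frontier:
        _, past_cost, s = heapq.heappop(frontier)
        if s in explored:
            continue
        explored.add(s)
        if s == goal:
            return past_cost
        for nxt in neighbors(s):
            heapq.heappush(frontier, (past_cost + 1 + h(nxt), past_cost + 1, nxt))
    return None

print("shortest path length:", astar(START, GOAL))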

Structured Perceptron - try to decrease the cost of the true y (from the training data); try to increase the cost of the predicted y′ (from search) (see the sketch below)

example (2016 exam, problem #3): transition probabilities
example: policy evaluation
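A hedged sketch of that update, assuming the cost of an output is linear in made-up features, cost(x, y) = w · φ(x, y): subtracting φ of the true y lowers its cost, and adding φ of the predicted y′ raises its cost:

from collections import defaultdict

# Sketch of the structured perceptron update, assuming costs are linear in features:
# cost(x, y) = w . phi(x, y). phi, the toy outputs, and eta are made up for illustration.

def phi(x, y):
    """Made-up indicator features for an input/output pair."""
    feats = defaultdict(float)
    for xi, yi in zip(x, y):
        feats[("emit", xi, yi)] += 1.0
    for y1, y2 in zip(y, y[1:]):
        feats[("trans", y1, y2)] += 1.0
    return feats

def cost(w, x, y):
    return sum(w[f] * v for f, v in phi(x, y).items())

def perceptron_update(w, x, y_true, y_pred, eta=1.0):
    # decrease the cost of the true y, increase the cost of the predicted y'
    for f, v in phi(x, y_true).items():
        w[f] -= eta * v
    for f, v in phi(x, y_pred).items():
        w[f] += eta * v

w = defaultdict(float)
x = ["a", "b", "b"]
y_true, y_pred = ["1", "2", "2"], ["1", "1", "1"]   # y_pred would come from search
perceptron_update(w, x, y_true, y_pred)
print(cost(w, x, y_true), "<", cost(w, x, y_pred))  # the true output is now cheaper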
Reinforcement Learning (online - input is fed incrementally)
- unlike MDP’s, involves maximizing reward without explicit knowledge of the rewards / transitions
- which exploration policy to use? epsilon-greedy: with probability ε act randomly (explore), otherwise act greedily with respect to the current Q estimate (exploit)
- model-based Monte Carlo (off-policy): use experience to estimate the transitions and rewards, then pretend it’s an MDP and use value iteration (important to explore all (s, a) pairs)

- model-free Monte Carlo (on-policy): estimate Q-values directly by generating episodes (by following a policy) and using
observations to update towards a better estimate

SARSA (on-policy) - alternative to model-free Monte Carlo; the update target is a combination of new data and the current estimate/prediction (model-free Monte Carlo’s target u is taken purely from data)

Q-learning (off-policy) - algorithm to estimate Q_opt in order to find optimal policy; can be used with model-free approaches

tradeoffs: always exploiting (greedy) - low utility, does not find the optimal policy; always exploring - doesn’t use what it has learned; vanilla Q-learning doesn’t generalize to unseen states + actions*
function approximation - parameterizes Q_opt using weights and features; use Q-learning with function approximation to learn the weights (make Q a model instead of a lookup table, and describe your environment with features) - see the sketch below
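A minimal sketch combining the pieces above: Q-learning with epsilon-greedy exploration and linear function approximation Q(s, a; w) = w · φ(s, a). The short chain environment, its features, and all hyperparameters are invented for illustration:

import random

# Sketch: Q-learning with epsilon-greedy exploration and linear function
# approximation Q(s, a; w) = w . phi(s, a). The chain environment, features,
# and hyperparameters are all made up for illustration.

GOAL, GAMMA, ETA, EPSILON = 3, 0.9, 0.1, 0.2
ACTIONS = [-1, +1]                                      # step left or right on a short chain

def step(s, a):
    """Made-up deterministic environment: reach state GOAL to get reward 10."""
    s2 = max(0, min(GOAL, s + a))
    return s2, (10.0 if s2 == GOAL else 0.0), s2 == GOAL

def phi(s, a):
    return [1.0, s / GOAL, (s / GOAL) * a]              # simple invented features

def q(w, s, a):
    return sum(wi * fi for wi, fi in zip(w, phi(s, a)))

def epsilon_greedy(w, s):
    if random.random() < EPSILON:                       # explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(w, s, a))       # exploit

w = [0.0, 0.0, 0.0]
for episode in range(200):
    s, done = 0, False
    while not done:
        a = epsilon_greedy(w, s)
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(q(w, s2, a2) for a2 in ACTIONS)
        error = q(w, s, a) - target                     # prediction minus target
        w = [wi - ETA * error * fi for wi, fi in zip(w, phi(s, a))]   # gradient step on the weights
        s = s2

print("weights:", w)
print("Q(2, +1) =", q(w, 2, +1), " Q(2, -1) =", q(w, 2, -1))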

Games
s_start: starting state
Actions(s): possible actions from state s
Succ(s, a): resulting state if you choose action a in state s
IsEnd(s): whether s is an end state (game over)
Utility(s): agent’s utility for end state s
Player(s) ∈ Players: player who controls state s

CSP’s
scope of a factor fj - set of variables it depends on
arity of fj - number of variables in the scope (binary: arity 2)
backtracking search // choosing an unassigned variable - heuristics:
- most constrained variable (the one with the fewest consistent values left) - useful when some factors are constraints (we only save work if we can prune assignments with zero weight)
- least constrained value - useful when all the factors are constraints
forward checking (one-step lookahead):
- after assigning a variable Xi, eliminate inconsistent values from the domains of Xi’s neighbors
- if any domain becomes empty, don’t recurse
- when unassigning Xi, restore the neighbors’ domains
AC-3: pre-emptively prune with lookahead - re-enforce arc consistency so pruning propagates beyond Xi’s immediate neighbors
arc consistency - Xi is arc consistent with respect to Xj if for each xi ∈ Domain_i there exists xj ∈ Domain_j such that f({Xi: xi, Xj: xj}) ≠ 0 for all factors f whose scope contains Xi and Xj
EnforceArcConsistency(Xi, Xj): remove values from Domain_i to make Xi arc consistent with respect to Xj
**AC-3 does not always find the global solution.
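A minimal backtracking + forward checking sketch on an invented map-coloring CSP where every factor is a binary "neighbors get different colors" constraint; variable ordering uses the most constrained variable heuristic:

# Sketch: backtracking with the most constrained variable heuristic and forward
# checking on a made-up map-coloring CSP (every factor is a "neighbors differ" constraint).

NEIGHBORS = {"A": ["B", "C"], "B": ["A", "C", "D"],
             "C": ["A", "B", "D"], "D": ["B", "C"]}      # invented adjacency
COLORS = ["red", "green", "blue"]

def backtrack(assignment, domains):
    if len(assignment) == len(NEIGHBORS):
        return dict(assignment)
    # most constrained variable: fewest remaining consistent values
    var = min((v for v in NEIGHBORS if v not in assignment),
              key=lambda v: len(domains[v]))
    for value in list(domains[var]):
        assignment[var] = value
        # forward checking: after assigning var, eliminate inconsistent values
        # from the domains of var's unassigned neighbors
        removed = []
        for n in NEIGHBORS[var]:
            if n not in assignment and value in domains[n]:
                domains[n].remove(value)
                removed.append(n)
        if all(domains[n] for n in NEIGHBORS[var] if n not in assignment):
            result = backtrack(assignment, domains)       # only recurse if no domain is empty
            if result is not None:
                return result
        # when unassigning var, restore the neighbors' domains
        for n in removed:
            domains[n].append(value)
        del assignment[var]
    return None

domains = {v: list(COLORS) for v in NEIGHBORS}
print(backtrack({}, domains))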
Bayes’ networks
Bayes formula: P(A | B) = P(B | A) P(A) / P(B)
Conditional probability: P(A | B) = P(A, B) / P(B)
Probabilistic inference strategies for P(Q = q | E = e):
1) Bayes rule
2) Variable elimination (expensive: exponential, O(2^n)) - see the sketch below:
- Remove (marginalize) variables that are not ancestors of Q or E
- Convert to a factor graph by writing the probabilities between variables
- Eliminate intermediate variables by multiplying any dependent factors together (f_3 = f_2 * f_1)
  * don’t forget to sum over the unassigned variable ( Σ_a f(a | b) )
- Continue until you have a factor that depends only on the query ( f(Q) )
- Calculate f(Q) for every possible value q
- Normalize the result: P(Q = q | E = e) = f(Q = q) / Σ_i f(q_i) over all possible values q_i
3) forward-backward (HMM’s)
Sampling: estimate probabilities by sampling + counting from representative set
4) Gibbs sampling
5) Particle filtering
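A tiny sketch of those variable elimination steps on an invented binary chain A → B → C with evidence C = 1 and query A: build factors, eliminate the intermediate variable B by multiplying and summing, then normalize:

# Sketch: variable elimination for P(A | C = 1) on a made-up chain A -> B -> C
# with binary variables, following the steps above.

pA = {0: 0.6, 1: 0.4}                                                 # invented CPTs
pB_given_A = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.7}     # keyed by (a, b)
pC_given_B = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.25, (1, 1): 0.75}   # keyed by (b, c)

evidence_c = 1

# factor graph: f1(A) = p(A), f2(A, B) = p(B | A), f3(B) = p(C = 1 | B)
f1 = dict(pA)
f2 = dict(pB_given_A)
f3 = {b: pC_given_B[(b, evidence_c)] for b in (0, 1)}

# eliminate B: f4(A) = sum_b f2(A, b) * f3(b)
f4 = {a: sum(f2[(a, b)] * f3[b] for b in (0, 1)) for a in (0, 1)}

# the remaining factor depends only on the query: f(A) = f1(A) * f4(A)
f = {a: f1[a] * f4[a] for a in (0, 1)}

# normalize: P(A = a | C = 1) = f(a) / sum_a' f(a')
Z = sum(f.values())
posterior = {a: f[a] / Z for a in (0, 1)}
print(posterior)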
MLE estimation: p(x) = count(x) / (total count)
with Laplacian smoothing: hallucinate one extra observation of each value: p(x) = (count(x) + 1) / (total count + |Domain|)
expectation-maximization (not guaranteed to reach the global optimum): alternate estimating the hidden variables under the current parameters (E-step) with re-estimating the parameters by (smoothed) MLE (M-step)
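A small sketch of MLE vs. Laplace-smoothed MLE on invented count data:

from collections import Counter

# Sketch: MLE vs. Laplace-smoothed MLE of a categorical distribution from
# made-up observations. Smoothing hallucinates one extra observation per value.

domain = ["a", "b", "c"]
observations = ["a", "a", "b", "a", "b"]      # note: "c" is never observed
counts = Counter(observations)

mle = {x: counts[x] / len(observations) for x in domain}
laplace = {x: (counts[x] + 1) / (len(observations) + len(domain)) for x in domain}

print("MLE:     ", mle)       # "c" gets probability 0
print("smoothed:", laplace)   # "c" gets a small nonzero probability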
