
UNIT 5 PATTERN RECOGNITION

A pattern is the opposite of chaos; it is an entity, vaguely defined, that could be given a name.

A pattern is an abstract object, such as a set of measurements describing a physical object.


Pattern recognition is the scientific discipline whose goal is the classification of objects into a number of categories or classes. Depending on the application, these objects can be images or signal waveforms or any type of measurements that need to be classified. In machine learning, pattern recognition is the assignment of some sort of output value (or label) to a given input value (or instance), according to some specific algorithm. An example of pattern recognition is classification, which attempts to assign each input value to one of a given set of classes (for example, determine whether a given email is "spam" or "non-spam"). However, pattern recognition is a more general problem that encompasses other types of output as well.

Pattern recognition is a very active field of research intimately bound to machine learning and data mining. Also known as classification or statistical classification, pattern recognition aims at building a classifier that can determine the class of an input pattern. An input could be the ZIP code on an envelope, a satellite image, microarray gene expression data, the chemical signature from an oil-field probe, the financial record of a company, and many more. The classifier may take the form of a function, an algorithm, a set of rules, etc. Pattern recognition is about training such classifiers to do tasks that could be tedious, dangerous, infeasible, impractical, expensive or simply difficult for humans. Pattern recognition faces many challenges in the modern era of massive data collection (e.g. in retail, communication and the Internet) and high demand for precision and speed (e.g. in security monitoring and target tracking).
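To make the idea of a classifier as a trained function concrete, here is a minimal sketch (my own illustration, not from the text) of a nearest-mean classifier; the two numeric "spam" features and their values are assumptions chosen only for illustration.

import numpy as np

def train_nearest_mean(X, y):
    # Store one mean feature vector per class label.
    labels = sorted(set(y))
    return {c: X[np.array(y) == c].mean(axis=0) for c in labels}

def classify(class_means, x):
    # Assign x to the class whose mean vector is nearest in Euclidean distance.
    return min(class_means, key=lambda c: np.linalg.norm(x - class_means[c]))

# Toy "spam" data: features = (number of links, occurrences of the word "free").
X = np.array([[0, 0], [1, 0], [7, 5], [9, 4]], dtype=float)
y = ["non-spam", "non-spam", "spam", "spam"]

class_means = train_nearest_mean(X, y)
print(classify(class_means, np.array([8.0, 3.0])))   # -> spam
print(classify(class_means, np.array([0.0, 1.0])))   # -> non-spam

Here the "classifier" is literally a function (classify) produced by training (train_nearest_mean), matching the description above.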

STEPS OF PATTERN RECOGNITION

PARAMETER ESTIMATION
Estimation theory is a branch of statistics and signal processing that deals with estimating the values of parameters based on measured/empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the measured data. An estimator attempts to approximate the unknown parameters using the measurements. In estimation theory, it is assumed the measured data is random with probability distribution dependent on the parameters of interest. For example, in electrical communication theory, the measurements which contain information regarding the parameters of interest are often associated with a noisy signal. Without randomness, or noise, the problem would be deterministic and estimation would not be needed.
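A minimal numerical sketch of this setting (an assumed example, not taken from the text): a constant signal level is observed through additive Gaussian noise, and the sample mean serves as the estimator of the unknown parameter.

import numpy as np

rng = np.random.default_rng(0)
A_true = 3.0          # unknown parameter: the constant signal level
noise_std = 0.5       # standard deviation of the additive measurement noise
n = 1000              # number of measurements

x = A_true + noise_std * rng.standard_normal(n)   # x[k] = A + w[k]
A_hat = x.mean()                                  # estimator: the sample mean

print(f"estimated A = {A_hat:.3f}, true A = {A_true}")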

We could design an optimal classifier if we knew the prior probabilities P(ωi) and the class-conditional densities p(x|ωi). Unfortunately, in pattern recognition applications we rarely, if ever, have this kind of complete knowledge about the probabilistic structure of the problem. In a typical case we merely have some vague, general knowledge about the situation, together with a number of design samples or training data, particular representatives of the patterns we want to classify. The problem, then, is to find some way to use this information to design or train the classifier. One approach to this problem is to use the samples to estimate the unknown probabilities and probability densities, and to use the resulting estimates as if they were the true values. In typical supervised pattern classification problems, the estimation of the prior probabilities presents no serious difficulties. However, estimation of the class-conditional densities is quite another matter. The problem of parameter estimation is a classical one in statistics, and it can be approached in several ways. We shall consider two common and reasonable procedures, maximum likelihood estimation and Bayesian estimation. Although the results obtained with these two procedures are frequently nearly identical, the approaches are conceptually quite different.

There are several criteria for choosing between them. One is computational complexity, where maximum likelihood methods are often preferred, since they require merely differential calculus techniques or a gradient search for θ, rather than the possibly complex multidimensional integration needed in Bayesian estimation. Another is interpretability: in many cases the maximum likelihood solution will be easier to interpret and understand, since it returns the single best model from the set the designer provided (and presumably understands). In contrast, Bayesian methods give a weighted average of models (parameters), often leading to solutions more complicated and harder to understand than those provided by the designer; the Bayesian approach reflects the remaining uncertainty in the possible models.

Maximum likelihood

In statistics, maximum-likelihood estimation (MLE) is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters. The method of maximum likelihood corresponds to many well-known estimation methods in statistics. For example, one may be interested in the heights of adult female giraffes, but be unable, due to cost or time constraints, to measure the height of every single giraffe in a population. Assuming that the heights are normally (Gaussian) distributed with some unknown mean and variance, the mean and variance can be estimated with MLE while only knowing the heights of some sample of the overall population. MLE would accomplish this by taking the mean and variance as parameters and finding the particular parameter values that make the observed results the most probable (given the model). In this setting the parameters are fixed but unknown; the best parameters are obtained by maximizing the probability of obtaining the samples observed; the estimator has good convergence properties as the sample size increases; and it is simpler than many alternative techniques.
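As a hedged sketch of the giraffe example (the heights below are simulated, not real measurements), the Gaussian MLEs have closed forms: the sample mean and the biased (divide-by-n) sample variance.

import numpy as np

rng = np.random.default_rng(42)
heights = rng.normal(loc=4.6, scale=0.3, size=200)   # simulated heights (metres)

mu_hat = heights.mean()                       # MLE of the mean
var_hat = ((heights - mu_hat) ** 2).mean()    # MLE of the variance (1/n, not 1/(n-1))

print(f"mu_hat = {mu_hat:.3f}, var_hat = {var_hat:.4f}")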

The General Principle

Suppose that we separate a collection of samples according to class, so that we have c sets, D1, ..., Dc, with the samples in Dj having been drawn independently according to the probability law p(x|ωj). We say such samples are i.i.d. (independent, identically distributed) random variables. We assume that p(x|ωj) has a known parametric form, and is therefore determined uniquely by the value of a parameter vector θj. For example, we might have p(x|ωj) ~ N(μj, Σj), where θj consists of the components of μj and Σj. To show the dependence of p(x|ωj) on θj explicitly, we write p(x|ωj) as p(x|ωj, θj). Our problem is to use the information provided by the training samples to obtain good estimates for the unknown parameter vectors θ1, ..., θc associated with each category. To simplify treatment of this problem, we shall assume that samples in Di give no information about θj if i ≠ j; that is, we shall assume that the parameters for the different classes are functionally independent. This permits us to work with each class separately and to simplify our notation by deleting indications of class distinctions. With this assumption we have c separate problems of the following form: use a set D of training samples drawn independently from the probability density p(x|θ) to estimate the unknown parameter vector θ. Suppose that D contains n independently drawn samples, x1, x2, ..., xn.

The maximum likelihood estimate of θ is, by definition, the value of θ that maximizes p(D | θ); it is the value of θ that best agrees with the actually observed training samples. [3]
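Written out in the notation above, with D = {x1, x2, ..., xn} drawn i.i.d. from p(x|θ), the quantity being maximized is the likelihood (or, equivalently, its logarithm); this is a standard formulation stated here for reference. For the Gaussian example mentioned earlier, the maximization has the closed-form solution shown on the second line.

p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta),
\qquad
\hat{\theta} = \arg\max_{\theta} \ln p(D \mid \theta)
             = \arg\max_{\theta} \sum_{k=1}^{n} \ln p(x_k \mid \theta)

\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k,
\qquad
\hat{\sigma}^{2} = \frac{1}{n}\sum_{k=1}^{n} (x_k - \hat{\mu})^{2}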

The maximum-likelihood estimator has essentially no optimal properties for finite samples. However, for many problems it possesses a number of attractive asymptotic properties, including: consistency (the estimator converges in probability to the value being estimated); asymptotic normality (as the sample size increases, the distribution of the MLE tends to a Gaussian distribution with mean equal to the true parameter value and covariance matrix equal to the inverse of the Fisher information matrix); efficiency (it achieves the Cramér-Rao lower bound as the sample size tends to infinity, which means that no asymptotically unbiased estimator has lower asymptotic mean squared error than the MLE); and second-order efficiency after correction for bias.
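Stated symbolically (a standard result, given here for reference rather than taken from the text), with θ0 the true parameter and I(θ0) the Fisher information matrix of a single observation, asymptotic normality reads:

\sqrt{n}\,\bigl(\hat{\theta}_{\mathrm{MLE}} - \theta_0\bigr) \;\xrightarrow{\;d\;}\; \mathcal{N}\bigl(0,\; I(\theta_0)^{-1}\bigr)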

Applications Maximum likelihood estimation is used for a wide range of statistical models, including: linear models and generalized linear models; exploratory and confirmatory factor analysis; structural equation modeling; many situations in the context of hypothesis testing and confidence interval formation; discrete choice models.

These uses arise in a widespread set of fields, including communication systems, econometrics, data modeling in nuclear and particle physics, and magnetic resonance imaging.

INTRODUCTION TO CLUSTER TECHNIQUES

A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered the simplest dynamic Bayesian network. In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but an output, dependent on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states. Note that the adjective 'hidden' refers to the state sequence through which the model passes, not to the parameters of the model; even if the model parameters are known exactly, the model is still 'hidden'.

We continue to assume that at every time step t the system is in a state ω(t), but now we also assume that it emits some (visible) symbol v(t). While sophisticated Markov models allow for the emission of continuous functions (e.g., spectra), we will restrict ourselves to the case where a discrete symbol is emitted. As with the states, we define a particular sequence of such visible states as V^T = {v(1), v(2), ..., v(T)}, and thus we might have V^6 = {v5, v1, v1, v5, v2, v3}. Our model is then that in any state ωj(t) we have a probability of emitting a particular visible state vk(t). We denote this probability P(vk(t) | ωj(t)) = bjk. Because we have access only to the visible states, while the ωi are unobservable, such a full model is called a hidden Markov model.
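As a brief summary (my own notation, writing aij = P(ωj(t+1) | ωi(t)) for the state-transition probabilities mentioned earlier), the joint probability that the model passes through a particular hidden sequence ω^T and emits a particular visible sequence V^T factorizes as:

P\bigl(V^{T}, \omega^{T}\bigr)
  = \prod_{t=1}^{T} P\bigl(\omega(t) \mid \omega(t-1)\bigr)\, P\bigl(v(t) \mid \omega(t)\bigr)
  = \prod_{t=1}^{T} a_{\omega(t-1)\,\omega(t)}\; b_{\omega(t)\,v(t)}

where ω(0) denotes a fixed initial state (or is replaced by an initial-state distribution).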

A concrete example
Consider two friends, Alice and Bob, who live far apart from each other and who talk together daily over the telephone about what they did that day. Bob is only interested in three activities: walking in the park, shopping, and cleaning his apartment. The choice of what to do is determined exclusively by the weather on a given day. Alice has no definite information about the weather where Bob lives, but she knows general trends. Based on what Bob tells her he did each day, Alice tries to guess what the weather must have been like. Alice believes that the weather operates as a discrete Markov chain. There are two states, "Rainy" and "Sunny", but she cannot observe them directly; that is, they are hidden from her. On each day, there is a certain chance that Bob will perform one of the following activities, depending on the weather: "walk", "shop", or "clean". Since Bob tells Alice about his activities, those are the observations. The entire system is that of a hidden Markov model (HMM). In Python-style notation:

states = ('Rainy', 'Sunny')

observations = ('walk', 'shop', 'clean')

start_probability = {'Rainy': 0.6, 'Sunny': 0.4}

transition_probability = {
   'Rainy' : {'Rainy': 0.7, 'Sunny': 0.3},
   'Sunny' : {'Rainy': 0.4, 'Sunny': 0.6},
   }

emission_probability = {
   'Rainy' : {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
   'Sunny' : {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
   }

In this example, there is only a 30% chance that tomorrow will be sunny if today is rainy. The emission_probability represents how likely Bob is to perform a certain activity on each day: if it is rainy, there is a 50% chance that he is cleaning his apartment; if it is sunny, there is a 60% chance that he is outside for a walk.
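The text does not give inference code, but as a hedged sketch the following Viterbi decoder, using the parameters defined above, recovers the most likely weather sequence for the observation sequence ('walk', 'shop', 'clean'); the function name and layout are my own choices.

def viterbi(obs, states, start_p, trans_p, emit_p):
    # prob[t][s] = probability of the best state path ending in state s at time t
    prob = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}
    for t in range(1, len(obs)):
        prob.append({})
        new_path = {}
        for s in states:
            # Best previous state to transition from, given observation obs[t]
            best_prev, best_p = max(
                ((prev, prob[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]])
                 for prev in states),
                key=lambda pair: pair[1])
            prob[t][s] = best_p
            new_path[s] = path[best_prev] + [s]
        path = new_path
    last = max(states, key=lambda s: prob[-1][s])
    return prob[-1][last], path[last]

p_best, weather = viterbi(observations, states, start_probability,
                          transition_probability, emission_probability)
print(weather, p_best)   # -> ['Sunny', 'Rainy', 'Rainy'] with probability ~0.0134

Intuitively, "walk" on day one makes "Sunny" the more likely start, while "shop" and "clean" on the following days pull the decoded sequence toward "Rainy".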

Knowledge representation (KR) is an area of artificial intelligence research aimed at representing knowledge in symbols to facilitate inference from those knowledge elements, creating new elements of knowledge. A KR can be made independent of the underlying knowledge model or knowledge base system (KBS), such as a semantic network.

Knowledge representation research involves analysis of how to reason accurately and effectively, and how best to use a set of symbols to represent a set of facts within a knowledge domain. A symbol vocabulary and a system of logic are combined to enable inferences about elements in the KR and so create new KR sentences. Logic is used to supply formal semantics for how reasoning functions should be applied to the symbols in the KR system, and to define how operators can process and reshape the knowledge. Examples of operators and operations include negation, conjunction, adverbs, adjectives, quantifiers and modal operators. The logic constitutes the interpretation theory. These elements (symbols, operators, and interpretation theory) are what give sequences of symbols meaning within a KR.

A knowledge representation is most fundamentally a surrogate, a substitute for the thing itself, used to enable an entity to determine consequences by thinking rather than acting, i.e., by reasoning about the world rather than taking action in it. It is a set of ontological commitments, i.e., an answer to the question: in what terms should I think about the world? It is a fragmentary theory of intelligent reasoning, expressed in terms of three components: (i) the representation's fundamental conception of intelligent reasoning; (ii) the set of inferences the representation sanctions; and (iii) the set of inferences it recommends. It is a medium for pragmatically efficient computation, i.e., the computational environment in which thinking is accomplished; one contribution to this pragmatic efficiency is the guidance a representation provides for organizing information so as to facilitate making the recommended inferences. Finally, it is a medium of human expression, i.e., a language in which we say things about the world.

Some issues that arise in knowledge representation from an AI perspective are:

How do people represent knowledge? What is the nature of knowledge? Should a representation scheme deal with a particular domain or should it be general purpose? How expressive is a representation scheme or formal language? Should the scheme be declarative or procedural?

Characteristics
A good knowledge representation covers six basic characteristics:

1. Coverage: the KR covers a breadth and depth of information. Without wide coverage, the KR cannot determine anything or resolve ambiguities.
2. Understandability by humans: the KR is viewed as a natural language, so the logic should flow freely. It should support modularity and hierarchies of classes (polar bears are bears, which are animals; see the sketch after this list) and have simple primitives that combine in complex forms.
3. Consistency: if John closed the door, it can also be interpreted as the door was closed by John. By being consistent, the KR can eliminate redundant or conflicting knowledge.
4. Efficiency.
5. Ease of modification and updating.
6. Support for the intelligent activity that uses the knowledge base.
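As an illustrative sketch only (the representation scheme, fact names, and helper below are my own, not taken from the text), the class hierarchy mentioned in item 2 can be held as simple is-a facts, with a small routine that derives facts not stored explicitly:

# Tiny semantic-network-style representation: "is_a" facts form a class
# hierarchy, and infer_is_a follows it to derive new knowledge from stored facts.
is_a = {
    "polar bear": "bear",
    "bear": "animal",
}

def infer_is_a(thing, category):
    # Walk up the hierarchy; succeed if we reach the requested category.
    while thing in is_a:
        thing = is_a[thing]
        if thing == category:
            return True
    return False

print(infer_is_a("polar bear", "animal"))   # True  (derived, not stored directly)
print(infer_is_a("bear", "polar bear"))     # False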
