# Bayes Error Rate Wiki


The principal axes of these contours are given by the eigenvectors of Σ, and the eigenvalues determine the lengths of those axes. (Figure 4.22 shows the contour lines and decision boundary from Figure 4.21; Figure 4.23 shows an example of a parabolic decision surface.) Expansion of the quadratic form (x − μi)ᵀΣ⁻¹(x − μi) results in a sum involving a quadratic term xᵀΣ⁻¹x which here is independent of i. Rearranging these terms leads us to the answer to our question, which is called Bayes' formula: P(ωj | x) = p(x | ωj) P(ωj) / p(x), where the evidence is p(x) = Σj p(x | ωj) P(ωj).
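Bayes' formula above can be sketched in a few lines of code. This is a minimal illustration only; the likelihoods p(x | ωj) and priors P(ωj) below are invented numbers, not values from the text.

```python
# Minimal sketch of Bayes' formula: P(w_j | x) = p(x | w_j) P(w_j) / p(x),
# where the evidence p(x) is the sum of likelihood * prior over all classes.
# The likelihoods and priors are made-up numbers for illustration.

def posteriors(likelihoods, priors):
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

post = posteriors([0.6, 0.2], [0.5, 0.5])  # -> [0.75, 0.25]
```

Note that the posteriors always sum to one, since the evidence p(x) is just a normalizing factor.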

The Bayes decision rule for minimizing risk calls for selecting the action that minimizes the conditional risk. The decision boundary is a line orthogonal to the line joining the two means. This leads to the requirement that the quadratic form wᵀΣw never be negative (i.e., that Σ be positive semidefinite). This means that the decision boundary will tilt vertically.
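The minimum-risk rule can be sketched as follows: compute the conditional risk R(αi | x) = Σj λ(αi | ωj) P(ωj | x) for each action, then pick the action with the smallest risk. The loss matrix and posterior values here are hypothetical.

```python
# Sketch of the Bayes minimum-risk rule. For each action a_i, the conditional
# risk is R(a_i | x) = sum_j loss(a_i | w_j) * P(w_j | x); choose the argmin.
# The zero-one loss matrix and the posteriors are invented for illustration.

def min_risk_action(loss, post):
    risks = [sum(lij * pj for lij, pj in zip(row, post)) for row in loss]
    return min(range(len(risks)), key=risks.__getitem__), risks

loss = [[0.0, 1.0],   # action 0: no loss when the true class is 0
        [1.0, 0.0]]   # action 1: no loss when the true class is 1
action, risks = min_risk_action(loss, [0.7, 0.3])  # risks = [0.3, 0.7]
```

With zero-one loss, minimizing risk reduces to picking the class with the highest posterior, which is why the rule chooses action 0 here.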


If this is true for some class i, then the covariance matrix for that class will have identical diagonal elements. Regardless of whether the prior probabilities are equal or not, it is not actually necessary to compute distances. Thus, such a rule may not work well, depending on the values of the prior probabilities.

The effect of any decision rule is to divide the feature space into c decision regions, R1, …, Rc. As in the univariate case, this is equivalent to determining the region for which gi(x) is the maximum of all the discriminant functions. In Figure 4.17, the point P is actually closer, in Euclidean distance, to the mean for the orange class. The loss function states exactly how costly each action is, and is used to convert a probability determination into a decision.
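Classification by discriminant functions can be sketched directly: evaluate every gi(x) and assign x to the class with the largest value. The one-dimensional discriminants below (negative squared distance to a class mean) are assumed purely for illustration.

```python
# Sketch: assign x to the class whose discriminant g_i(x) is maximal.
# Here g_i(x) = -(x - mu_i)^2, a simple 1-D choice assumed for illustration.

means = [0.0, 2.0]
discriminants = [lambda x, m=m: -(x - m) ** 2 for m in means]

def classify(x):
    scores = [g(x) for g in discriminants]
    return max(range(len(scores)), key=scores.__getitem__)

label = classify(0.9)  # 0.9 is closer to mean 0.0, so class 0
```

Any monotonically increasing function applied to every gi leaves the decision regions unchanged, which is the sense in which the choice of discriminant functions is not unique.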

If action αi is taken and the true state of nature is ωj, then the decision is correct if i = j and in error if i ≠ j. Expansion of the quadratic form yields the simple discriminant functions. Figure 4.12: since the bivariate normal densities have diagonal covariance matrices, their contours are spherical in shape. But as can be seen from the ellipsoidal contours extending from each mean, the discriminant function evaluated at P is smaller for class 'apple' than it is for class 'orange'.

Instead of having spherically shaped clusters about our means, the shapes may be any type of hyperellipsoid, depending on how the features we measure relate to each other. The decision boundary is not orthogonal to the red line. When normal distributions with a diagonal covariance matrix that is just a constant multiplied by the identity matrix are plotted, their clusters of points about the mean are spherical in shape. If the true state of nature is ωj, then by definition we will incur the loss λ(αi | ωj).


If you observe some feature vector of color and weight that is just a little closer to the mean for oranges than to the mean for apples, should you classify the fruit as an orange? To classify a feature vector x, measure the Euclidean distance from x to each of the c mean vectors, and assign x to the category of the nearest mean. The fact that the decision boundary is not orthogonal to the line joining the two means is the only thing that separates this situation from case 1. Clearly, the choice of discriminant functions is not unique.
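The minimum-Euclidean-distance rule just described can be sketched as follows. The (color, weight) class means for "apple" and "orange" are invented values, chosen only to make the example concrete.

```python
import math

# Sketch of the nearest-mean rule: measure the Euclidean distance from x to
# each class mean and assign x to the nearest one. The (color, weight) means
# below are hypothetical, not taken from the text.

means = {"apple": (0.2, 150.0), "orange": (0.8, 160.0)}

def nearest_mean(x):
    return min(means, key=lambda c: math.dist(x, means[c]))

label = nearest_mean((0.7, 158.0))  # slightly closer to the orange mean
```

Note that this rule is only the Bayes-optimal classifier under strong assumptions (equal priors and equal spherical covariances); otherwise the boundary shifts and tilts as the surrounding text discusses.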

For example, suppose that you are again classifying fruits by measuring their color and weight. If the distribution happens to be Gaussian, then the transformed vectors will be statistically independent. Even in one dimension, for arbitrary variance the decision regions need not be simply connected (Figure 4.20). Figure 4.13 shows two bivariate normal distributions whose priors are exactly the same.

The discriminant functions cannot be simplified, and the only term that can be dropped from eq. 4.41 is the (d/2) ln 2π term; the resulting discriminant functions are inherently quadratic. Because P(ωj | x) is the probability that the true state of nature is ωj, the expected loss associated with taking action αi is the conditional risk R(αi | x) = Σj λ(αi | ωj) P(ωj | x). We might for instance use a lightness measurement x to improve our classifier.
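The general quadratic discriminant for a Gaussian class with its own covariance matrix can be sketched as below, with the (d/2) ln 2π constant dropped as the text notes. All means, covariances, and priors here are invented example values.

```python
import numpy as np

# Sketch of the general-case quadratic discriminant (one covariance per class):
# g_i(x) = -1/2 (x-mu_i)^T Sigma_i^{-1} (x-mu_i) - 1/2 ln|Sigma_i| + ln P(w_i)
# The constant (d/2) ln 2*pi is dropped since it is the same for every class.
# All parameter values below are hypothetical.

def g(x, mu, sigma, prior):
    d = x - mu
    return (-0.5 * d @ np.linalg.inv(sigma) @ d
            - 0.5 * np.log(np.linalg.det(sigma))
            + np.log(prior))

mu0, mu1 = np.zeros(2), np.array([2.0, 0.0])
s0 = np.eye(2)                              # spherical covariance for class 0
s1 = np.array([[2.0, 0.0], [0.0, 0.5]])     # elongated covariance for class 1
x = np.array([1.2, 0.0])
label = 0 if g(x, mu0, s0, 0.5) > g(x, mu1, s1, 0.5) else 1
```

Because the covariances differ between classes, the quadratic terms no longer cancel and the resulting decision surface is a quadric (ellipse, parabola, or hyperbola) rather than a line.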

Figure 4.9: The two features x and y do not co-vary, but feature x varies more than feature y. When this happens, the optimum decision rule can be stated very simply: the decision rule is based entirely on the distance from the feature vector x to the different mean vectors. Moreover, in some problems it enables us to predict the error we will get when we generalize to novel patterns.
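With a diagonal covariance matrix, the appropriate distance is the Mahalanobis distance, which divides each squared coordinate difference by that feature's variance. The variances below are assumed values, chosen to mirror the figure's setup (x varying more than y).

```python
import math

# Sketch: Mahalanobis distance for a diagonal covariance. A feature with a
# larger variance (x here) contributes less distance per unit offset, which is
# why the distance contours stretch along x. Variances are assumed values.

def mahalanobis_diag(x, mu, variances):
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, mu, variances)))

var = (4.0, 1.0)  # feature x varies more than feature y
along_x = mahalanobis_diag((2.0, 0.0), (0.0, 0.0), var)  # -> 1.0
along_y = mahalanobis_diag((0.0, 2.0), (0.0, 0.0), var)  # -> 2.0
```

The same Euclidean offset of 2 counts as twice the distance along y as along x, which is exactly the stretching of the contour lines described below.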

The contour lines are stretched out in the x direction to reflect the fact that the distance spreads out at a lower rate in the x direction than it does in the y direction.

Given the covariance matrix Σ of a Gaussian distribution, the eigenvectors of Σ are the principal directions of the distribution, and the eigenvalues are the variances along the corresponding principal directions. One of the most useful representations is in terms of a set of discriminant functions gi(x), i = 1, …, c. However, both densities show the same elliptical shape.
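The eigendecomposition just described is easy to check numerically. The covariance matrix below (two positively correlated features) is an invented example.

```python
import numpy as np

# Sketch: the eigenvectors of a covariance matrix Sigma give the principal
# directions of the Gaussian, and the eigenvalues give the variances along
# them. The correlated 2x2 covariance below is a made-up example.

sigma = np.array([[3.0, 1.0],
                  [1.0, 3.0]])
eigvals, eigvecs = np.linalg.eigh(sigma)  # eigenvalues in ascending order
# principal axes lie along (1,-1)/sqrt(2) and (1,1)/sqrt(2),
# with variances 2 and 4 respectively
```

The longest axis of the elliptical contours points along the eigenvector with the largest eigenvalue, here the (1, 1) direction.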

Finally, suppose that the variance for the color and weight features is the same in both classes. The decision boundaries for these discriminant functions are found by intersecting the functions gi(x) and gj(x), where i and j represent the two classes with the highest a posteriori probabilities. Does the tilting of the decision boundary away from the orthogonal direction make intuitive sense? Allowing actions other than classification as {α1, …, αa} allows the possibility of rejection, that is, of refusing to make a decision in close (costly) cases.

If we are forced to make a decision about the type of fish that will appear next using only the prior probabilities, we will decide ω1 if P(ω1) > P(ω2), and otherwise decide ω2. Similarly, as the variance of feature 1 is increased, the y term in the weight vector will decrease, causing the decision boundary to become more horizontal.
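The prior-only rule above amounts to always choosing the most probable class in advance. A minimal sketch, with invented prior values:

```python
# Sketch of the prior-only decision rule: with no measurement available,
# decide w_1 when P(w_1) > P(w_2), and w_2 otherwise. Priors are invented.

def decide_by_prior(priors):
    return max(range(len(priors)), key=priors.__getitem__)

choice = decide_by_prior([0.6, 0.4])  # -> 0, i.e. decide w_1
```

This rule always makes the same decision, so its probability of error is the smaller prior; any informative measurement x can only improve on it.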