Introduction
- The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions such as classifying the data into different categories.
- For example, consider an algorithm that recognizes handwritten digits:
- We could use handcrafted rules or heuristics based on the shapes of the strokes, but such rules are ambiguous and give poor results.
- Instead we use machine learning: a large set of digits, called the training set, is used to tune the parameters of a model. The category of each training digit is known in advance and is expressed as a target vector (its label).
- The result of the algorithm is expressed as a function y(x), which takes a digit x as input and generates an output vector y, encoded in the same format as the target vectors.
- The precise form of the function is determined during the training (learning) phase, after which it can be evaluated on a test set.
- The ability to correctly categorize new examples that differ from those used for training is known as generalization, which is a central goal in pattern recognition.
- We usually preprocess the inputs (also called feature extraction) to reduce their variability, for example by translating and scaling each digit image into a box of fixed size. Preprocessing can also reduce the dimensionality of the inputs to speed up computation.
- Types of ML methods
- Supervised - training data comprises examples of input vectors together with their corresponding target vectors
- Goal is to predict the target (output) for a new input
- Classification - target is a discrete variable
- Regression - target is a continuous variable
- Unsupervised - training data comprises examples of input vectors without any corresponding target vectors
- Goal is to discover groups of similar examples within the data (clustering), to determine the distribution of the data within the input space (density estimation), or to project the data from a high-dimensional space down to two or three dimensions for visualization
- Reinforcement learning - finding suitable actions to take in a given situation in order to maximize a reward. The optimal actions are not given as examples; they must be discovered by a process of trial and error.
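
The training phase, the learned function y(x), and the check for generalization on a test set can be sketched in a few lines. This is only an illustration under assumptions that are not in the notes: the digits are replaced by synthetic 2-D points drawn from two Gaussian blobs, the model is a simple nearest-centroid classifier, and the names `make_data`, `fit_centroids`, and `predict` are hypothetical.

```python
import numpy as np

def make_data(n_per_class, rng):
    """Synthetic stand-in for labelled data: two Gaussian blobs, labels 0 and 1."""
    x0 = rng.normal(loc=[-2.0, 0.0], scale=1.0, size=(n_per_class, 2))
    x1 = rng.normal(loc=[2.0, 0.0], scale=1.0, size=(n_per_class, 2))
    x = np.vstack([x0, x1])
    t = np.concatenate([np.zeros(n_per_class, dtype=int),
                        np.ones(n_per_class, dtype=int)])
    return x, t

def fit_centroids(x, t):
    """Training phase: the model parameters are the per-class mean vectors."""
    classes = np.unique(t)
    centroids = np.stack([x[t == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(x, classes, centroids):
    """y(x): assign each input to the class whose centroid is nearest."""
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

rng = np.random.default_rng(0)
x_train, t_train = make_data(100, rng)   # training set with known labels
x_test, t_test = make_data(100, rng)     # new examples not used for training

classes, centroids = fit_centroids(x_train, t_train)
test_accuracy = np.mean(predict(x_test, classes, centroids) == t_test)
print(f"test accuracy: {test_accuracy:.2f}")
```

The accuracy on `x_test` is a rough measure of generalization, since those points were never used to compute the centroids.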
Example - Polynomial Curve Fitting
- We consider a simple regression problem with:
- Real valued input - x
- Real valued output/target - t
- Data generated from sin(2\pi x), with random noise added to the targets
- Training set - X \equiv (x_1, \ldots,x_N)^T
- Targets - T \equiv (t_1, \ldots,t_N)^T
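
The data set described above could be generated roughly as follows. The notes do not state the number of points, how the inputs are spaced, or the noise distribution, so the values here (N = 10, inputs evenly spaced on [0, 1], Gaussian noise with standard deviation 0.3) are assumptions, and `make_sine_data` is a hypothetical helper name.

```python
import numpy as np

def make_sine_data(n_points, noise_std, rng):
    """Return inputs x and noisy targets t = sin(2*pi*x) + noise."""
    x = np.linspace(0.0, 1.0, n_points)                 # assumed: evenly spaced inputs
    noise = rng.normal(0.0, noise_std, size=n_points)   # assumed: Gaussian noise
    t = np.sin(2 * np.pi * x) + noise
    return x, t

rng = np.random.default_rng(0)
X, T = make_sine_data(n_points=10, noise_std=0.3, rng=rng)  # assumed N and noise level

for x_n, t_n in zip(X, T):
    print(f"x = {x_n:.3f}, t = {t_n:+.3f}")
```

Here `X` and `T` play the roles of the training inputs (x_1, \ldots, x_N)^T and targets (t_1, \ldots, t_N)^T.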