In the context of Remote Sensing (RS) and Digital Image Processing (DIP), supervised classification is the process where an analyst defines "training sites" (Areas of Interest, or Regions of Interest, ROIs) representing known land cover classes (e.g., Water, Forest, Urban). These training samples are then used to train an algorithm that classifies the remaining image pixels.
The algorithms used to classify these pixels are generally divided into two broad categories: Parametric and Nonparametric decision rules.
Parametric Decision Rules
These algorithms assume that the pixel values in the training data follow a specific statistical distribution—almost always the Gaussian (Normal) distribution (the "Bell Curve").
Key Concept: They model the data using statistical parameters: the Mean vector ($\mu$) and the Covariance matrix ($\Sigma$).
Analogy: Imagine trying to fit a smooth hill over your data points. If a new point lands high up on the hill, it belongs to that class.
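For reference, the Gaussian (normal) density assumed for each class $\omega_k$, with mean vector $\mu_k$ and covariance matrix $\Sigma_k$ over $N$ bands, is:

$$
p(\mathbf{x} \mid \omega_k) = \frac{1}{(2\pi)^{N/2}\,\lvert \Sigma_k \rvert^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\mu_k)^{\mathsf T}\Sigma_k^{-1}(\mathbf{x}-\mu_k)\right)
$$

The Maximum Likelihood classifier discussed below evaluates exactly this density for every class.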
Nonparametric Decision Rules
These algorithms make no assumptions about the statistical distribution of the data. They do not care if the data fits a bell curve.
Key Concept: They classify based on discrete geometric shapes (polygons, boxes) or the relative position of the data points themselves.
Analogy: Imagine drawing a literal box or fence around your data points. If a new point falls inside the fence, it belongs to that class.
A. Minimum-Distance-to-Means (MDM)
Classification: Generally considered a simple Parametric classifier (as it relies on the mean parameter), though it operates geometrically.
How it works:
The algorithm calculates the spectral mean vector (the center point or centroid) for each training class.
For every unclassified pixel in the image, it calculates the Euclidean distance to the mean of every class.
The pixel is assigned to the class with the shortest distance.
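In symbols, for a pixel vector $\mathbf{x}$ and class means $\mu_k$:

$$
d_k(\mathbf{x}) = \lVert \mathbf{x} - \mu_k \rVert = \sqrt{\sum_{b=1}^{N}\left(x_b - \mu_{k,b}\right)^2}, \qquad \hat{k} = \arg\min_k\, d_k(\mathbf{x})
$$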
Pros: Very fast computationally; mathematically simple.
Cons: It is insensitive to the variance (spread) of the data.
Example: If "Urban" data is very scattered (high variance) and "Water" is very tight (low variance), a pixel far from the Urban center might actually belong to Urban, but MDM might classify it as Water just because the Water mean is slightly closer geometrically.
B. Parallelepiped Classification
Classification: Nonparametric.
How it works:
For each class, the algorithm finds the minimum and maximum brightness values of the training data in each band.
It creates a rectangular box (a parallelepiped in multi-dimensional space) defined by these limits.
If a pixel's value falls within the box, it is assigned to that class.
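Formally, if $L_{k,b}$ and $U_{k,b}$ are the minimum and maximum training values of class $k$ in band $b$, a pixel $\mathbf{x}$ is assigned to class $k$ only if

$$
L_{k,b} \le x_b \le U_{k,b} \quad \text{for every band } b = 1, \dots, N.
$$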
Pros: Extremely fast; easy to understand conceptually.
Cons:
The Correlation Problem: Real remote sensing data (like vegetation in Red vs. NIR bands) is often correlated (a diagonal distribution). A rectangular box cannot fit a diagonal data cloud efficiently, leaving large "empty corners" in the box that capture noise or pixels from other classes.
Overlapping: Pixels often fall inside the overlapping region of two or more boxes, leaving the classifier unable to decide between them.
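Below is a minimal sketch of the box test, using the same hypothetical `training` and `pixels` structures as in the previous sketch; note how the non-matching and overlapping cases have to be handled explicitly:

```python
import numpy as np

def classify_parallelepiped(pixels, training):
    """pixels: (n_pixels, n_bands); training: {class_name: (n_samples, n_bands) array}."""
    # Per-class box: minimum and maximum training value in each band
    boxes = {c: (s.min(axis=0), s.max(axis=0)) for c, s in training.items()}
    labels = []
    for x in pixels:
        hits = [c for c, (lo, hi) in boxes.items() if np.all((x >= lo) & (x <= hi))]
        # No box -> unclassified; overlapping boxes -> arbitrary first match here
        labels.append(hits[0] if hits else "unclassified")
    return labels
```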
C. Gaussian Maximum Likelihood (GML/MLC)
Classification: Parametric (The standard industry workhorse).
How it works:
It assumes the data for each class is normally distributed.
It uses both the Mean vector AND the Covariance matrix to calculate the probability density function.
It calculates the statistical probability of a pixel belonging to each class.
It constructs ellipsoidal equiprobability contours (rather than circles or boxes).
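Assuming equal prior probabilities for all classes, maximizing this probability is equivalent to maximizing the log-likelihood (quadratic) discriminant

$$
g_k(\mathbf{x}) = -\tfrac{1}{2}\ln\lvert \Sigma_k \rvert - \tfrac{1}{2}(\mathbf{x}-\mu_k)^{\mathsf T}\Sigma_k^{-1}(\mathbf{x}-\mu_k), \qquad \hat{k} = \arg\max_k\, g_k(\mathbf{x}).
$$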
Pros: Highly accurate because it accounts for the variance (spread) and covariance (correlation/direction) of the data, so it handles correlated, "diagonal" data clouds well.
Cons: Computationally expensive (slow on massive images); requires a large number of training pixels per class to compute a stable covariance matrix (usually $10N$ to $100N$ pixels, where $N$ is the number of bands).
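A minimal sketch of the maximum-likelihood rule under equal priors, again using the hypothetical `training` and `pixels` structures from the sketches above:

```python
import numpy as np

def classify_max_likelihood(pixels, training):
    """pixels: (n_pixels, n_bands); training: {class_name: (n_samples, n_bands) array}."""
    classes, stats = list(training), {}
    for c in classes:
        s = training[c]
        mu = s.mean(axis=0)
        cov = np.cov(s, rowvar=False)  # needs enough training pixels to be stable
        stats[c] = (mu, np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    scores = np.empty((len(pixels), len(classes)))
    for j, c in enumerate(classes):
        mu, cov_inv, logdet = stats[c]
        d = pixels - mu
        # Mahalanobis (quadratic) term of the Gaussian log-likelihood
        maha = np.einsum('ij,jk,ik->i', d, cov_inv, d)
        scores[:, j] = -0.5 * (logdet + maha)
    # Assign each pixel to the class with the highest log-likelihood
    return [classes[i] for i in scores.argmax(axis=1)]
```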
| Feature | Parallelepiped | Minimum Distance | Maximum Likelihood |
|---|---|---|---|
| Type | Nonparametric | Parametric (Simple) | Parametric (Advanced) |
| Geometry | Rectangular Boxes | Circles/Spheres | Ellipsoids |
| Assumptions | None (Min/Max thresholds) | Mean Center Point | Gaussian Distribution |
| Speed | Very Fast | Fast | Slow / Intensive |
| Accuracy | Low to Moderate | Moderate | High |
| Best Used For | Quick looks; Uncorrelated data | Well-separated classes | Complex, correlated data |