
Supervised Classification

In the context of Remote Sensing (RS) and Digital Image Processing (DIP), supervised classification is the process in which an analyst defines "training sites" (also called Areas of Interest, AOIs, or Regions of Interest, ROIs) representing known land cover classes (e.g., Water, Forest, Urban). The software then uses the pixels inside these training sites to train a classification algorithm, which assigns every remaining pixel in the image to one of those classes.

The algorithms used to classify these pixels are generally divided into two broad categories: Parametric and Nonparametric decision rules.


Parametric Decision Rules

These algorithms assume that the pixel values of each class in the training data follow a specific statistical distribution, almost always the multivariate Gaussian (Normal) distribution (the "bell curve").

  • Key Concept: They model the data using statistical parameters: the Mean vector ($\mu$) and the Covariance matrix ($\Sigma$).

  • Analogy: Imagine fitting a smooth hill over each class's data points. A new point is assigned to the class whose hill stands highest at that location.
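As a minimal sketch in plain Python (no image library assumed), the two parameters can be estimated from one class's training pixels as follows. The function name `class_statistics` and the two-band pixel values are hypothetical, chosen only for illustration, and the covariance uses the usual $n-1$ sample denominator.

```python
def class_statistics(pixels):
    """Estimate the mean vector and sample covariance matrix of one class.

    pixels: list of training pixels, each a list of band values.
    """
    n = len(pixels)
    n_bands = len(pixels[0])
    # Mean vector (mu): average brightness of the class in each band.
    mu = [sum(p[b] for p in pixels) / n for b in range(n_bands)]
    # Covariance matrix (Sigma): variances on the diagonal, band-to-band
    # covariances off the diagonal, using the n - 1 sample denominator.
    sigma = [
        [sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in pixels) / (n - 1)
         for j in range(n_bands)]
        for i in range(n_bands)
    ]
    return mu, sigma


# Hypothetical two-band (Band 1, Band 2) training pixels for a "Forest" class.
forest = [[30, 120], [32, 126], [28, 118], [34, 130], [31, 126]]
mu, sigma = class_statistics(forest)
print(mu)     # [31.0, 124.0]
print(sigma)  # [[5.0, 10.5], [10.5, 24.0]]
```

The positive off-diagonal term (10.5) says the two bands rise and fall together for this class. That correlation is exactly what the covariance matrix captures and a mean vector alone does not.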

Nonparametric Decision Rules

These algorithms make no assumptions about the statistical distribution of the data. They do not care if the data fits a bell curve.

  • Key Concept: They classify based on discrete geometric shapes (polygons, boxes) or the relative position of the data points themselves.

  • Analogy: Imagine drawing a literal box or fence around your data points. If a new point falls inside the fence, it belongs to that class.


A. Minimum-Distance-to-Means (MDM)

  • Classification: Generally considered a simple Parametric classifier (as it relies on the mean parameter), though it operates geometrically.

  • How it works:

    1. The algorithm calculates the spectral mean vector (the center point or centroid) for each training class.

    2. For every unclassified pixel in the image, it calculates the Euclidean distance to the mean of every class.

    3. The pixel is assigned to the class with the shortest distance.


  • Pros: Very fast computationally; mathematically simple.

  • Cons: It is insensitive to the variance (spread) of the data.

    • Example: If "Urban" data is very scattered (high variance) and "Water" is very tight (low variance), a pixel far from the Urban center might actually belong to Urban, but MDM might classify it as Water just because the Water mean is slightly closer geometrically.
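The three steps above reduce to a nearest-centroid rule. The sketch below assumes the class means have already been computed; the function name and the two-band means for Water and Urban are hypothetical values picked to mirror the example above, not data from a real image.

```python
import math


def minimum_distance_classify(pixel, class_means):
    """Return the class whose mean vector is closest to `pixel`.

    pixel: sequence of band values; class_means: dict of class name -> mean vector.
    """
    # math.dist gives the Euclidean distance between two points of equal dimension.
    return min(class_means, key=lambda name: math.dist(pixel, class_means[name]))


# Hypothetical two-band class means.
means = {"Water": (20, 10), "Urban": (80, 70)}

pixel = (45, 40)
for name, mu in means.items():
    print(name, round(math.dist(pixel, mu), 1))  # Water 39.1, then Urban 46.1
print(minimum_distance_classify(pixel, means))   # Water
```

The pixel is about 39 units from the Water mean and about 46 from the Urban mean, so it is labeled Water no matter how widely the Urban training pixels are scattered. Only the means enter the rule, which is the variance insensitivity described above.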

B. Parallelepiped Classification

  • Classification: Nonparametric.

  • How it works:

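    1. The algorithm looks at the training data and finds, for each class, the minimum and maximum brightness values in each band. (Some software lets the analyst use the mean ± a chosen number of standard deviations instead.)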

    2. It creates a rectangular box (a parallelepiped in multi-dimensional space) defined by these limits.

    3. If a pixel's value in every band falls within a class's box, it is assigned to that class; a pixel that falls outside every box is left unclassified.

  • Pros: Extremely fast; easy to understand conceptually.

  • Cons:

    • The Correlation Problem: Real remote sensing data (like vegetation in Red vs. NIR bands) is often correlated, forming an elongated, diagonal data cloud. A rectangular box cannot fit such a cloud tightly, so its large "empty corners" capture noise and pixels that belong to other classes.

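    • Overlapping: Pixels often fall where two or more boxes overlap, and the min/max rule alone cannot decide between those classes. Software typically resolves this by leaving such pixels unclassified, assigning them to the first class tested, or falling back to a parametric rule such as minimum distance or maximum likelihood.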
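The box-building and box-testing steps, together with the overlap problem, can be sketched in plain Python. This assumes the min/max variant of the limits (not the standard-deviation variant); the function names, class names, and pixel values are hypothetical. Returning every matching class, rather than a single label, makes both overlaps and unclassified pixels visible.

```python
def build_boxes(training):
    """Build one min/max box per class.

    training: dict of class name -> list of training pixels (lists of band values).
    Returns dict of class name -> (lower limits, upper limits), one entry per band.
    """
    boxes = {}
    for name, pixels in training.items():
        bands = list(zip(*pixels))  # transpose: one tuple of values per band
        boxes[name] = ([min(b) for b in bands], [max(b) for b in bands])
    return boxes


def parallelepiped_classify(pixel, boxes):
    """Return every class whose box contains the pixel in all bands."""
    return [
        name
        for name, (low, high) in boxes.items()
        if all(lo <= v <= hi for v, lo, hi in zip(pixel, low, high))
    ]


# Hypothetical two-band training pixels.
training = {
    "Water": [[12, 8], [15, 10], [18, 14]],
    "Soil": [[16, 12], [25, 22], [30, 28]],
}
boxes = build_boxes(training)

for px in ([13, 9], [17, 13], [60, 60]):
    matches = parallelepiped_classify(px, boxes)
    if not matches:
        label = "Unclassified"
    elif len(matches) > 1:
        label = "Overlap: " + ", ".join(matches)
    else:
        label = matches[0]
    print(px, label)
# [13, 9] Water
# [17, 13] Overlap: Water, Soil
# [60, 60] Unclassified
```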

C. Gaussian Maximum Likelihood (GML/MLC)

  • Classification: Parametric (The standard industry workhorse).

  • How it works:

    1. It assumes the data for each class is normally distributed.

    2. It uses both the Mean vector AND the Covariance matrix to calculate the probability density function.

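    3. For each class, it calculates the likelihood (probability density) of the pixel's values, optionally weighted by that class's prior probability.

    4. Its decision regions follow ellipsoidal equiprobability contours (rather than circles or boxes).

    5. The pixel is assigned to the class with the highest likelihood.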

  • Pros: Typically the most accurate of the three when the Gaussian assumption roughly holds, because it accounts for both the variance (spread) and the covariance (correlation/direction) of the data. It handles "diagonal" data clouds well.

  • Cons: Computationally expensive (slow on massive images); requires a large number of training pixels per class to compute a stable covariance matrix (usually $10N$ to $100N$ pixels, where $N$ is the number of bands).


| Feature | Parallelepiped | Minimum Distance | Maximum Likelihood |
|---|---|---|---|
| Type | Nonparametric | Parametric (Simple) | Parametric (Advanced) |
| Geometry | Rectangular Boxes | Circles/Spheres | Ellipsoids |
| Assumptions | None (Min/Max thresholds) | Mean Center Point | Gaussian Distribution |
| Speed | Very Fast | Fast | Slow / Intensive |
| Accuracy | Low to Moderate | Moderate | High |
| Best Used For | Quick looks; Uncorrelated data | Well-separated classes | Complex, correlated data |


