WHEN TO USE WHAT STATISTICAL TEST IN RESEARCH

There are many statistical tests and techniques for analyzing research data, and choosing the right one is often the challenge. This piece provides a simplified guide.

1️⃣t-test:

- Use when: You want to compare the means of two groups to determine if there's a significant difference.
- Example: You want to compare the average score of students who received traditional teaching vs. those who received innovative teaching.
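
As a rough illustration of the comparison above, the sketch below computes Welch's t statistic (a t-test variant that does not assume equal variances) using only the Python standard library. The scores are made-up numbers, and `welch_t` is a hypothetical helper name. It returns only the test statistic; in practice a library routine such as `scipy.stats.ttest_ind` would also give the p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    # Standard error of the difference in means, using sample variances.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical exam scores for the two teaching methods.
traditional = [70, 72, 68, 75, 71]
innovative = [78, 80, 76, 82, 79]

t_stat = welch_t(innovative, traditional)
print(round(t_stat, 2))
```

A large absolute t value suggests the two group means differ by more than sampling noise would explain.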

2️⃣ANOVA (Analysis of Variance):

- Use when: You want to compare the means of three or more groups to determine if there are significant differences.
- Example: You want to compare the average score of students from different schools to determine if there are significant differences in their performance.
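
To make the idea concrete, here is a minimal sketch of the one-way ANOVA F statistic, which is the ratio of between-group variance to within-group variance. The school scores are invented for illustration and `one_way_f` is a hypothetical helper name. Converting F into a p-value requires the F distribution, which a library such as SciPy (`scipy.stats.f_oneway`) provides.

```python
from statistics import mean


def one_way_f(*groups):
    """F statistic for a one-way ANOVA across two or more groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical scores from three schools.
school_a = [61, 62, 63]
school_b = [62, 63, 64]
school_c = [65, 66, 67]

f_stat = one_way_f(school_a, school_b, school_c)
print(round(f_stat, 2))
```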

3️⃣Regression (Simple and Multiple):

- Use when: You want to examine the relationship between a dependent variable and one or more independent variables.
- Example: You want to examine how hours studied predict exam scores (simple regression), or how hours studied and student motivation together predict exam scores (multiple regression).
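
The simple regression case can be sketched with the ordinary least-squares formulas for a single predictor. The data are made up and `fit_line` is a hypothetical helper name; multiple regression needs matrix algebra and is usually done with a statistics library.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical data: hours studied and the resulting exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 55, 59, 60]

slope, intercept = fit_line(hours, scores)
print(slope, intercept)
```

Here the slope is the estimated change in exam score for each additional hour studied.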

4️⃣Chi-squared test:

- Use when: You want to determine if there's a significant association between two categorical variables.
- Example: You want to determine if there's a significant association between smoking and lung cancer.
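
A minimal sketch of the chi-squared statistic for a contingency table follows. It compares each observed count with the count expected if the two variables were independent. The counts are invented, `chi_squared` is a hypothetical helper name, and no continuity correction is applied.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts. Rows: smokers, non-smokers. Columns: cancer, no cancer.
observed_counts = [[30, 10],
                   [10, 30]]

chi2_stat = chi_squared(observed_counts)
print(chi2_stat)
```

For a 2x2 table there is one degree of freedom, so a statistic above about 3.84 is significant at the 5% level.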

5️⃣Wilcoxon rank-sum test (Mann-Whitney U test):

- Use when: You want to compare the distributions of two independent groups, typically because the data are ordinal or not normally distributed.
- Example: You want to compare the distribution of scores between students who received traditional teaching and those who received innovative teaching.
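
The U statistic behind this test can be sketched by counting, over all cross-group pairs, how often a value from one group exceeds a value from the other. The scores are made up and `mann_whitney_u` is a hypothetical helper name. The pairwise count is fine for small samples; `scipy.stats.mannwhitneyu` handles larger data and also returns a p-value.

```python
def mann_whitney_u(a, b):
    """U statistic for sample a: pairs where a beats b, with ties counted as 0.5."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )


# Hypothetical exam scores for the two teaching methods.
innovative = [78, 80, 76, 82, 79]
traditional = [70, 72, 68, 77, 71]

u_innovative = mann_whitney_u(innovative, traditional)
u_traditional = mann_whitney_u(traditional, innovative)
print(u_innovative, u_traditional)
```

The two U values always sum to the number of pairs (here 5 x 5 = 25). A value close to 0 or to that maximum indicates strong separation between the groups.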

6️⃣Kruskal-Wallis H test:

- Use when: You want to compare the distributions of three or more independent groups.
- Example: You want to compare the distribution of scores among students from different schools.

7️⃣Friedman test:

- Use when: You want to compare the distributions of three or more related groups.
- Example: You want to compare the distribution of scores of the same students at three or more time points.

8️⃣Pearson correlation coefficient:

- Use when: You want to examine the linear relationship between two continuous variables.
- Example: You want to examine the relationship between hours studied and exam scores.
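
For illustration, the Pearson coefficient can be computed directly from its definition: the covariance of the two variables divided by the product of their standard deviations. The data below are invented and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 55, 59, 60]

r = pearson_r(hours, scores)
print(round(r, 3))
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship.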

9️⃣Spearman rank correlation coefficient:

- Use when: You want to examine the monotonic relationship between two variables, for example when the data are ordinal or not normally distributed.
- Example: You want to examine the relationship between ranking of favorite foods and ranking of nutritional value.
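
Spearman's coefficient is the Pearson correlation applied to the ranks of the data rather than the raw values. The sketch below assumes that definition, gives tied values their average rank, and uses made-up data. The helper names `ranks`, `pearson_r`, and `spearman_rho` are hypothetical.

```python
from math import sqrt
from statistics import mean


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data with a monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

rho = spearman_rho(x, y)
print(rho)
```

Because only the ordering matters, a perfectly monotonic relationship gives a coefficient of 1 even when it is far from a straight line.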

🔟Kendall's tau correlation coefficient:

- Use when: You want to examine the relationship between two ordinal (ranked) variables, especially with small samples or many tied ranks.
- Example: You want to examine the relationship between socioeconomic status and education level.
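
A minimal sketch of the simplest variant, tau-a, counts how many pairs of observations are ordered the same way on both variables (concordant) versus opposite ways (discordant). The ordinal codes are invented and `kendall_tau_a` is a hypothetical helper name. Tau-a makes no adjustment for ties, whereas `scipy.stats.kendalltau` computes the tie-adjusted tau-b by default.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Pairs tied on either variable count as neither.
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical ordinal codes (1 = lowest, 4 = highest) for four people.
socioeconomic_status = [1, 2, 3, 4]
education_level = [1, 3, 2, 4]

tau = kendall_tau_a(socioeconomic_status, education_level)
print(round(tau, 3))
```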

1️⃣1️⃣ARIMA models:

- Use when: You want to forecast future values of a time series from its own past values and past forecast errors.
- Example: You want to predict stock prices based on past trends.

1️⃣2️⃣Exponential smoothing (ES):

- Use when: You want to forecast future values of a time series by weighting recent observations more heavily than older ones.
- Example: You want to predict sales based on past trends.
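
Simple exponential smoothing can be sketched as a one-line recursion: each smoothed value is a weighted average of the newest observation and the previous smoothed value. The sales figures and the smoothing weight `alpha` below are made up, and `simple_exp_smoothing` is a hypothetical helper name.

```python
def simple_exp_smoothing(series, alpha):
    """Smoothed series; alpha in (0, 1] is the weight on the newest observation."""
    smoothed = [series[0]]  # Start from the first observation.
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly sales.
sales = [100, 110, 120]

smoothed_sales = simple_exp_smoothing(sales, alpha=0.5)
next_forecast = smoothed_sales[-1]  # One-step-ahead forecast.
print(smoothed_sales, next_forecast)
```

A larger `alpha` reacts faster to recent changes, and a smaller one gives a smoother, slower-moving forecast. This basic form does not model trend or seasonality.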

1️⃣3️⃣Seasonal decomposition:

- Use when: You want to decompose time series data into trend, seasonality, and residuals.
- Example: You want to analyze website traffic data to identify seasonal patterns.

1️⃣4️⃣Kaplan-Meier estimator:

- Use when: You want to estimate the survival function of a population.
- Example: You want to analyze the survival rate of patients with a specific disease.
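
The Kaplan-Meier (product-limit) estimate can be sketched as follows: at each time where an event occurs, multiply the running survival probability by the fraction of at-risk subjects who did not have the event. The follow-up data are invented and `kaplan_meier` is a hypothetical helper name. An `observed` flag of `False` marks a censored subject, meaning follow-up ended before the event was seen.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time with an observed event."""
    curve, survival = [], 1.0
    event_times = sorted({d for d, e in zip(durations, observed) if e})
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times in months and whether the event was observed.
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]

km_curve = kaplan_meier(durations, observed)
for time, probability in km_curve:
    print(time, round(probability, 2))
```

Censored subjects still count in the at-risk group up to the time they leave the study, which is what separates this estimator from a simple proportion of survivors.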

1️⃣5️⃣Cox proportional hazards model:

- Use when: You want to examine the relationship between covariates and survival time.
- Example: You want to investigate the effect of treatment on survival time.

1️⃣6️⃣Log-rank test:

- Use when: You want to compare the survival curves of two or more groups.
- Example: You want to compare the survival rates of patients with different treatments.

1️⃣7️⃣K-means clustering:

- Use when: You want to group similar observations into clusters based on features.
- Example: You want to segment customers based on buying behavior.
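
To show the mechanics, here is a minimal one-dimensional sketch of the usual k-means iteration (Lloyd's algorithm): assign each observation to its nearest centroid, then move each centroid to the mean of its cluster. The spending figures and starting centroids are made up, and `k_means_1d` is a hypothetical helper name. Real use typically involves several features and a library implementation with randomized starts.

```python
from statistics import mean


def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: each value joins its nearest centroid.
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to its cluster mean (keep it if empty).
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical monthly spending per customer.
spending = [10, 12, 11, 50, 52, 51]

final_centroids, final_clusters = k_means_1d(spending, centroids=[10, 50])
print(final_centroids, final_clusters)
```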

1️⃣8️⃣Hierarchical clustering:

- Use when: You want to group similar observations into clusters based on features, with a hierarchical structure.
- Example: You want to analyze gene expression data to identify clusters of genes.

1️⃣9️⃣DBSCAN (density-based spatial clustering of applications with noise):

- Use when: You want to group similar observations into clusters based on features, with noise handling.
- Example: You want to analyze spatial data to identify clusters of high density.

2️⃣0️⃣Principal component analysis (PCA):

- Use when: You want to reduce the dimensionality of a dataset by identifying principal components.
- Example: You want to analyze stock prices to identify principal components of variation.

2️⃣1️⃣Discriminant analysis:

- Use when: You want to predict group membership based on multivariate data.
- Example: You want to predict customer churn based on usage patterns.

2️⃣2️⃣Canonical correlation analysis:

- Use when: You want to examine the relationship between two sets of multivariate data.
- Example: You want to investigate the relationship between personality traits and behavior.

2️⃣3️⃣Bayesian inference:

- Use when: You want to update probabilities based on new data.
- Example: You want to update the probability of a hypothesis based on new evidence.
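
The update step can be illustrated with Bayes' rule for a yes/no hypothesis. The numbers below (a 1% prior, a 90% chance of seeing the evidence if the hypothesis is true, and a 5% chance if it is false) are invented for illustration, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Probability of the hypothesis after seeing the evidence (Bayes' rule)."""
    # Total probability of observing the evidence at all.
    p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / p_evidence


# Hypothetical numbers: a rare condition and a positive screening result.
updated = posterior(prior=0.01, p_evidence_if_true=0.9, p_evidence_if_false=0.05)
print(round(updated, 3))

# A second, independent positive result uses the first posterior as the new prior.
updated_again = posterior(updated, 0.9, 0.05)
print(round(updated_again, 3))
```

Even strong evidence leaves the probability fairly low when the prior is small, and each new piece of evidence updates the previous result.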

2️⃣4️⃣Bayesian regression:

- Use when: You want to model the relationship between variables using Bayesian methods.
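- Example: You want to estimate the effect of hours studied on exam scores while incorporating prior knowledge about the likely size of that effect.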

2️⃣5️⃣Bayesian networks:

- Use when: You want to model complex relationships between variables using Bayesian methods.
- Example: You want to model the relationship between genes and diseases.

2️⃣6️⃣Decision trees:

- Use when: You want to classify observations based on a tree-like model.
- Example: You want to predict customer churn based on usage patterns.

2️⃣7️⃣Random forests:

- Use when: You want to classify observations based on an ensemble of decision trees.
- Example: You want to predict disease diagnosis based on symptoms.

2️⃣8️⃣Support vector machines (SVMs):

- Use when: You want to classify observations by finding the hyperplane that separates the classes with the widest margin.
- Example: You want to predict customer churn based on usage patterns.

2️⃣9️⃣Cluster analysis:

- Use when: You want to group similar observations into clusters based on features.
- Example: You want to segment customers based on buying behavior.

3️⃣0️⃣Factor analysis:

- Use when: You want to reduce the dimensionality of a dataset by identifying underlying factors.
- Example: You want to analyze survey data to identify underlying factors of satisfaction.

3️⃣1️⃣Survival analysis:

- Use when: You want to analyze time-to-event data.
- Example: You want to analyze the survival rate of patients with a specific disease.

3️⃣2️⃣Time-series analysis:

- Use when: You want to analyze data that is ordered in time.
- Example: You want to analyze stock prices to identify patterns and trends.

3️⃣3️⃣Non-parametric tests:

- Use when: You want to analyze data without assuming a specific distribution.
- Example: You want to compare the median scores of students who received traditional teaching vs. those who received innovative teaching.

3️⃣4️⃣Machine learning algorithms:

- Use when: You want to predict outcomes or classify observations based on large datasets.
- Example: You want to predict customer churn based on usage patterns.

The specific test or technique used depends on the research question, data type, and study design.



