Q&A 11 How do you display individual data points by category using a swarm plot?
11.1 Explanation
A swarm plot displays every individual data point and shifts points along the categorical axis so that they do not overlap. Unlike strip plots, which scatter points with random jitter and can still leave them overlapping, swarm plots place each point deterministically so that every observation stays visible.
They are especially helpful when:
- The dataset is small to medium-sized
- You want to show raw observations
- Identifying clusters, gaps, or outliers is important
Combining swarm plots with color (hue) and category grouping enhances clarity and storytelling.
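To make the overlap-avoiding placement described above concrete, here is a minimal pure-Python sketch of one way such a layout can be computed. It is a simplified illustration, not seaborn's or ggbeeswarm's actual implementation: the function name `swarm_offsets` and the `diameter` parameter are hypothetical, and each point is treated as a circle in data units.

```python
import math

def swarm_offsets(values, diameter=1.0):
    """Return (offset, value) pairs such that no two circles of the
    given diameter overlap. Hypothetical helper for illustration only."""
    placed = []  # (horizontal offset, value) of points already positioned
    for v in sorted(values):
        # Only points within one diameter vertically can collide with this one
        neighbors = [(ox, oy) for ox, oy in placed if abs(oy - v) < diameter]
        # Candidates: the center line, plus spots just touching each neighbor
        candidates = [0.0]
        for ox, oy in neighbors:
            dx = math.sqrt(diameter**2 - (oy - v)**2)
            candidates += [ox - dx, ox + dx]
        # Prefer the candidate closest to the center line
        candidates.sort(key=abs)
        for c in candidates:
            # Small tolerance so exactly-touching circles count as non-overlapping
            if all(math.hypot(c - ox, v - oy) >= diameter - 1e-9
                   for ox, oy in neighbors):
                placed.append((c, v))
                break
    return placed

# Three identical values spread sideways; the isolated value stays centered
for offset, value in swarm_offsets([5.0, 5.0, 5.0, 5.1, 6.0], diameter=0.2):
    print(f"value={value:.1f}  offset={offset:+.3f}")
```

Because every point needs its own space, the layout grows wider as more observations share similar values, which is why swarm plots suit small to medium-sized datasets.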
11.2 Python Code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load dataset
iris = pd.read_csv("data/iris.csv")
# Set style
sns.set_theme(style="whitegrid")
# Swarm plot
plt.figure(figsize=(8, 6))
sns.swarmplot(data=iris, x="species", y="sepal_length", hue="species", palette="Set2", dodge=False, size=6)
plt.title("Swarm Plot: Sepal Length by Species", fontsize=14)
plt.xlabel("Species")
plt.ylabel("Sepal Length")
plt.tight_layout()
plt.show()
11.3 R Code
library(readr)
library(ggplot2)
library(ggbeeswarm)
# Load dataset
iris <- read_csv("data/iris.csv")
# Swarm-style plot using ggbeeswarm::geom_quasirandom (quasirandom jitter);
# geom_beeswarm() from the same package gives a strict non-overlapping swarm layout
ggplot(iris, aes(x = species, y = sepal_length, color = species)) +
  geom_quasirandom(size = 2.5, width = 0.25) +
  scale_color_brewer(palette = "Set2") +
  theme_minimal() +
  labs(title = "Swarm Plot: Sepal Length by Species",
       x = "Species", y = "Sepal Length")
✅ Swarm plots reveal individual data points without overlap, making them ideal for exploring real observations, spotting outliers, and understanding group patterns in moderate-sized datasets.