Q&A 24 How do you enhance scatter plots by adding group color and trend lines?
24.1 Explanation
Scatter plots are a go-to tool for visualizing the relationship between two numerical variables. But they become far more insightful when enhanced with:
- Group-based coloring (e.g., by species or cut)
- Trend lines to show linear or nonlinear patterns
- Smoothers (like LOESS or regression fits)
- Transparency to handle overplotting in dense data
These enhancements help:
- Detect direction and strength of relationships
- Compare group-level trends side by side
- Spot outliers or overlapping clusters
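The direction and strength that a per-group trend line conveys can also be checked numerically. The following is a minimal sketch in plain Python, not part of the original recipe: the helper name `group_trends` is hypothetical, and the records are made-up illustrative values rather than iris measurements. For each group it computes the least-squares slope and intercept (what a straight trend line draws) and the Pearson correlation (how tightly the points follow that line).

```python
from collections import defaultdict
from math import sqrt


def group_trends(records):
    """Return {group: (slope, intercept, pearson_r)} for (x, y, group) records."""
    by_group = defaultdict(list)
    for x, y, group in records:
        by_group[group].append((x, y))

    results = {}
    for group, points in by_group.items():
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        # Sums of squares and cross-products around the group means
        sxx = sum((x - mean_x) ** 2 for x, _ in points)
        syy = sum((y - mean_y) ** 2 for _, y in points)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        slope = sxy / sxx                      # direction of the trend
        intercept = mean_y - slope * mean_x
        r = sxy / sqrt(sxx * syy)              # strength of the linear relationship
        results[group] = (slope, intercept, r)
    return results


# Made-up illustrative values, not real measurements.
records = [
    (1.0, 2.0, "a"), (2.0, 4.0, "a"), (3.0, 6.0, "a"),
    (1.0, 5.0, "b"), (2.0, 4.0, "b"), (3.0, 3.0, "b"),
]
trends = group_trends(records)
for group, (slope, intercept, r) in trends.items():
    print(f"{group}: slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
```

In this toy data, group `a` rises (slope 2, r = 1) while group `b` falls (slope -1, r = -1). Opposing group-level trends like these can be hidden when all points share one color and one fitted line.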
24.2 Python Code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load iris data
iris = pd.read_csv("data/iris.csv")
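# Scatter with group color and one regression line per species
# (scatter_kws adds transparency so overlapping points stay visible)
g = sns.lmplot(data=iris, x="sepal_length", y="petal_length", hue="species",
               palette="Set2", height=5, aspect=1.2, markers=["o", "s", "D"],
               scatter_kws={"alpha": 0.7})
# lmplot is figure-level and returns a FacetGrid: title its axes and use the
# grid's own tight_layout, which keeps room for the legend outside the axes
g.ax.set_title("Relationship Between Sepal Length and Petal Length by Species")
g.tight_layout()
plt.show()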
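The explanation also lists nonlinear trends, and the same `lmplot` call can fit a curve per group instead of a straight line. The sketch below is one possible variant, not the original recipe. It uses synthetic data with hypothetical column names (`x`, `y`, `group`) so that it runs without `data/iris.csv`. The `order=2` argument requests a quadratic fit for each hue level, `ci=None` drops the confidence band, and a lower `alpha` in `scatter_kws` softens overplotting.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (NOT the iris measurements): three groups with
# curved x-y relationships, so a straight line would fit poorly.
rng = np.random.default_rng(0)
frames = []
for name, curvature in [("a", 0.5), ("b", 1.0), ("c", 1.5)]:
    x = rng.uniform(0, 4, size=200)
    y = curvature * x**2 + rng.normal(0, 1.0, size=200)
    frames.append(pd.DataFrame({"x": x, "y": y, "group": name}))
demo = pd.concat(frames, ignore_index=True)

# One quadratic trend line per group, drawn over semi-transparent points
g = sns.lmplot(
    data=demo, x="x", y="y", hue="group",
    order=2, ci=None, palette="Set2", height=5, aspect=1.2,
    scatter_kws={"alpha": 0.4, "s": 20},
)
g.set_axis_labels("x (synthetic)", "y (synthetic)")
g.ax.set_title("Quadratic trend per group with semi-transparent points")
```

Call `plt.show()` (after `import matplotlib.pyplot as plt`) to display the figure. For a LOESS-style smoother, `lmplot` also accepts `lowess=True` in place of `order=2`, which requires the `statsmodels` package to be installed.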
24.3 R Code
library(ggplot2)
library(readr)
# Load iris data
iris <- read_csv("data/iris.csv")
# Scatter with group color and regression lines
ggplot(iris, aes(x = sepal_length, y = petal_length, color = species)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal() +
  labs(title = "Relationship Between Sepal Length and Petal Length by Species")
✅ Enhancing scatter plots with color and trend lines reveals both the overall relationship and how it varies across groups, which is a key part of visual EDA.