Q&A 26 How do you visualize patterns across multiple numeric features using a parallel coordinates plot?

26.1 Explanation

A parallel coordinates plot lets you explore high-dimensional patterns by plotting each feature on a vertical axis. Each observation is a line crossing all axes.

It helps you:

  • Spot group patterns across all variables
  • Detect outliers and overlaps
  • Understand feature trends in classification problems

Coloring by category (e.g., cut) helps distinguish clusters or groups.


26.2 Python Code

import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Load dataset
diamonds = pd.read_csv("data/diamonds_sample.csv")

# Sample for performance
subset = diamonds[["carat", "depth", "table", "price", "x", "y", "z", "cut"]].sample(300, random_state=42)

# Normalize numeric features
normalized = subset.copy()
for col in ["carat", "depth", "table", "price", "x", "y", "z"]:
    normalized[col] = (subset[col] - subset[col].min()) / (subset[col].max() - subset[col].min())

# Parallel coordinates plot
plt.figure(figsize=(12, 6))
parallel_coordinates(normalized, class_column="cut", color=["#66c2a5", "#fc8d62", "#8da0cb", "#e78ac3", "#a6d854"])
plt.title("Parallel Coordinates Plot: Diamond Features by Cut")
plt.ylabel("Normalized Value")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

26.3 R Code

library(readr)
library(GGally)
library(dplyr)

# Load and sample dataset
diamonds <- read_csv("data/diamonds_sample.csv")
subset <- diamonds %>% select(carat, depth, table, price, x, y, z, cut) %>% sample_n(500)

# Normalize numeric columns
subset_norm <- subset %>%
  mutate(across(c(carat, depth, table, price, x, y, z), ~ (. - min(.)) / (max(.) - min(.))))

# Parallel coordinates plot
ggparcoord(data = subset_norm,
           columns = 1:7,
           groupColumn = 8,
           scale = "uniminmax",
           alphaLines = 0.5) +
  scale_color_brewer(palette = "Set2") +
  theme_minimal() +
  labs(title = "Parallel Coordinates Plot: Diamond Features by Cut",
       x = "Features", y = "Normalized Value")


Parallel coordinates plots help you detect patterns across multiple features simultaneously, especially when colored by category.