Q&A 2 How do you inspect variable types in a dataset?
2.1 Explanation
Once you've loaded your dataset, the next step is to inspect the structure and confirm the variable types. This helps you:
- Understand what you're working with
- Catch mismatches (e.g., numbers stored as strings)
- Decide whether conversions are needed
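As a minimal sketch of catching such a mismatch, the snippet below uses a small hypothetical frame (`raw` and the helper `numeric_like_columns` are invented for illustration and are not part of the diamonds workflow) in which prices were read as text. It flags non-numeric columns whose values all parse as numbers, then converts them with `pd.to_numeric`.

```python
import pandas as pd

# Hypothetical data: "price" arrived as text, "cut" is genuinely text
raw = pd.DataFrame({
    "carat": [0.24, 0.58, 0.40],
    "price": ["559", "2201", "1238"],
    "cut": ["Ideal", "Very Good", "Ideal"],
})

def numeric_like_columns(frame):
    """Return non-numeric columns whose non-missing values all parse as numbers."""
    found = []
    for col in frame.columns:
        if pd.api.types.is_numeric_dtype(frame[col]):
            continue  # already numeric, nothing to flag
        non_missing = frame[col].dropna()
        parsed = pd.to_numeric(non_missing, errors="coerce")  # unparseable -> NaN
        if len(non_missing) > 0 and parsed.notna().all():
            found.append(col)
    return found

suspects = numeric_like_columns(raw)
print("Numeric-like text columns:", suspects)

# Convert the flagged columns and confirm the new types
for col in suspects:
    raw[col] = pd.to_numeric(raw[col])
print(raw.dtypes)
```

Here only `price` is flagged, because every value in `cut` fails to parse as a number.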
2.2 Python Code
import seaborn as sns
import pandas as pd
# Load and sample the diamonds dataset
df_full = sns.load_dataset("diamonds")
df = df_full.sample(n=500, random_state=42)
# Inspect the shape
print("Dataset Shape:", df.shape)
# Preview the first few rows
print("\nDataset Preview:")
print(df.head())
# Check data types
print("\nVariable Types:")
print(df.dtypes)

Dataset Shape: (500, 10)

Dataset Preview:
       carat        cut  color  clarity  depth  table  price     x     y     z
1388    0.24      Ideal      G     VVS1   62.1   56.0    559  3.97  4.00  2.47
50052   0.58  Very Good      F     VVS2   60.0   57.0   2201  5.44  5.42  3.26
41645   0.40      Ideal      E     VVS2   62.1   55.0   1238  4.76  4.74  2.95
42377   0.43    Premium      E     VVS2   60.8   57.0   1304  4.92  4.89  2.98
17244   1.55      Ideal      E      SI2   62.3   55.0   6901  7.44  7.37  4.61

Variable Types:
carat float64
cut category
color category
clarity category
depth float64
table float64
price int64
x float64
y float64
z float64
dtype: object
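Once the types are listed, it can help to group columns by kind before deciding how to treat them. The following is a minimal sketch on a small hand-built frame that mimics three of the diamonds columns (the name `mini` is an assumption for illustration; the values are taken from the preview above). It uses `select_dtypes` to separate numeric from categorical columns.

```python
import pandas as pd

# Hand-built stand-in for a few diamonds columns
mini = pd.DataFrame({
    "carat": [0.24, 0.58, 0.40],
    "cut": pd.Categorical(["Ideal", "Very Good", "Ideal"]),
    "price": [559, 2201, 1238],
})

# Split column names by dtype family
numeric_cols = mini.select_dtypes(include="number").columns.tolist()
category_cols = mini.select_dtypes(include="category").columns.tolist()

print("Numeric columns:", numeric_cols)
print("Categorical columns:", category_cols)

# For categorical columns, list the distinct levels
for col in category_cols:
    print(col, "levels:", mini[col].cat.categories.tolist())
```

The same two `select_dtypes` calls should work unchanged on the sampled `df`, since its columns are either numeric or categorical.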
2.3 R Code
library(ggplot2)
library(dplyr)
# Load and sample the diamonds dataset
set.seed(42)
df <- ggplot2::diamonds %>% sample_n(500)
# Check dimensions
cat("Dataset Dimensions:", dim(df)[1], "rows x", dim(df)[2], "columns\n\n")
# Preview the first few rows
cat("Dataset Preview:\n")
print(head(df))
# Check variable types
cat("\nVariable Types:\n")
str(df)

Dataset Dimensions: 500 rows x 10 columns

Dataset Preview:
# A tibble: 6 × 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.39 Ideal     I     VVS2     60.8    56   849  4.74  4.76  2.89
2  1.12 Very Good G     SI2      63.3    58  4478  6.7   6.63  4.22
3  0.51 Very Good G     VVS2     62.9    57  1750  5.06  5.12  3.2
4  0.52 Very Good D     VS1      62.5    57  1829  5.11  5.16  3.21
5  0.28 Very Good E     VVS2     61.4    55   612  4.22  4.25  2.6
6  1.01 Fair      F     SI1      67.2    60  4276  6.06  6     4.05

Variable Types:
tibble [500 × 10] (S3: tbl_df/tbl/data.frame)
$ carat : num [1:500] 0.39 1.12 0.51 0.52 0.28 1.01 0.4 0.9 0.33 0.71 ...
$ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 3 3 3 3 1 3 5 5 4 ...
$ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 6 4 4 1 2 3 1 1 4 4 ...
$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 6 2 6 5 6 3 5 3 7 4 ...
$ depth : num [1:500] 60.8 63.3 62.9 62.5 61.4 67.2 60.8 62.1 62 62.1 ...
$ table : num [1:500] 56 58 57 57 55 60 59 57 55 62 ...
$ price : int [1:500] 849 4478 1750 1829 612 4276 954 4523 838 2623 ...
$ x : num [1:500] 4.74 6.7 5.06 5.11 4.22 6.06 4.74 6.18 4.45 5.71 ...
$ y : num [1:500] 4.76 6.63 5.12 5.16 4.25 6 4.76 6.25 4.49 5.65 ...
$ z : num [1:500] 2.89 4.22 3.2 3.21 2.6 4.05 2.89 3.86 2.77 3.53 ...
Always check the variable types before analysis. Doing so helps prevent errors, ensures correct plotting and modeling behavior, and guides you in converting variables where needed (e.g., from text to category or numeric).
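As one possible illustration of a text-to-category conversion, the sketch below turns a plain text column into an ordered categorical. The series `cuts` is made up for the example, and the level order follows the `Fair < Good < ...` ordering used for `cut` in the ggplot2 version of the data; treat both as assumptions rather than part of the original workflow.

```python
import pandas as pd

# Hypothetical text column holding cut grades
cuts = pd.Series(["Ideal", "Very Good", "Fair", "Premium"], name="cut")

# Declare the levels from worst to best so comparisons are meaningful
cut_order = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
cut_dtype = pd.CategoricalDtype(categories=cut_order, ordered=True)

cuts_cat = cuts.astype(cut_dtype)

print(cuts_cat.dtype)
print("Lowest grade present:", cuts_cat.min())
print("Highest grade present:", cuts_cat.max())
print("Better than 'Good':", (cuts_cat > "Good").tolist())
```

With an ordered dtype, sorting, `min`/`max`, and comparisons follow the declared grade order rather than alphabetical order.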