Q&A 2 How do you inspect variable types in a dataset?

2.1 Explanation

Once you’ve loaded your dataset, the next step is to inspect its structure (how many rows and columns it has) and confirm each variable’s type. This helps you:

  • Understand what you’re working with
  • Catch mismatches (e.g., numbers stored as strings)
  • Decide whether conversions are needed
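The second point, catching numbers stored as strings, can be automated. Below is a minimal sketch, assuming a small hypothetical DataFrame in which "price" arrived as text. The helper name `numeric_looking_text_columns` is made up for illustration. It tries `pd.to_numeric(..., errors="coerce")` on each non-numeric, non-categorical, non-datetime column and flags the column only if every non-missing value parses.

```python
import pandas as pd

# Hypothetical toy data: "price" was read in as text (e.g. from a messy CSV)
df = pd.DataFrame({
    "carat": [0.24, 0.58, 0.40],
    "price": ["559", "2201", "1238"],  # numbers stored as strings
    "cut": ["Ideal", "Very Good", "Ideal"],
})


def numeric_looking_text_columns(frame: pd.DataFrame) -> list[str]:
    """Hypothetical helper: return text columns whose non-missing values all parse as numbers."""
    flagged = []
    for col in frame.columns:
        s = frame[col]
        # Skip columns that already have a numeric, categorical, or datetime dtype
        if (pd.api.types.is_numeric_dtype(s)
                or isinstance(s.dtype, pd.CategoricalDtype)
                or pd.api.types.is_datetime64_any_dtype(s)):
            continue
        values = s.dropna()
        # Unparseable entries become NaN instead of raising an error
        parsed = pd.to_numeric(values, errors="coerce")
        # Flag the column only if every non-missing value converted successfully
        if len(values) > 0 and parsed.notna().all():
            flagged.append(col)
    return flagged


print(df.dtypes)
print("Stored as text but numeric:", numeric_looking_text_columns(df))
```

Here "cut" is skipped because its labels do not parse, and "carat" is skipped because it is already numeric, so only "price" is reported.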

2.2 Python Code

import seaborn as sns
import pandas as pd

# Load and sample the diamonds dataset
df_full = sns.load_dataset("diamonds")
df = df_full.sample(n=500, random_state=42)

# Inspect the shape
print("📏 Dataset Shape:", df.shape)

# Preview the first few rows
print("\n🔍 Dataset Preview:")
print(df.head())

# Check data types
print("\n🔠 Variable Types:")
print(df.dtypes)

📏 Dataset Shape: (500, 10)

🔍 Dataset Preview:
       carat        cut color clarity  depth  table  price     x     y     z
1388    0.24      Ideal     G    VVS1   62.1   56.0    559  3.97  4.00  2.47
50052   0.58  Very Good     F    VVS2   60.0   57.0   2201  5.44  5.42  3.26
41645   0.40      Ideal     E    VVS2   62.1   55.0   1238  4.76  4.74  2.95
42377   0.43    Premium     E    VVS2   60.8   57.0   1304  4.92  4.89  2.98
17244   1.55      Ideal     E     SI2   62.3   55.0   6901  7.44  7.37  4.61

🔠 Variable Types:
carat       float64
cut        category
color      category
clarity    category
depth       float64
table       float64
price         int64
x           float64
y           float64
z           float64
dtype: object
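Once the types are known, it often helps to split the column names by kind so that later summaries, plots, or models can target each group. The sketch below rebuilds the five preview rows shown above by hand, so it runs without downloading the dataset. It then groups the columns with `DataFrame.select_dtypes`. Converting cut, color, and clarity to "category" mimics the dtypes reported above and is an assumption of the sketch.

```python
import pandas as pd

# The five preview rows printed above, rebuilt by hand (row labels kept)
df = pd.DataFrame(
    {
        "carat": [0.24, 0.58, 0.40, 0.43, 1.55],
        "cut": ["Ideal", "Very Good", "Ideal", "Premium", "Ideal"],
        "color": ["G", "F", "E", "E", "E"],
        "clarity": ["VVS1", "VVS2", "VVS2", "VVS2", "SI2"],
        "depth": [62.1, 60.0, 62.1, 60.8, 62.3],
        "table": [56.0, 57.0, 55.0, 57.0, 55.0],
        "price": [559, 2201, 1238, 1304, 6901],
        "x": [3.97, 5.44, 4.76, 4.92, 7.44],
        "y": [4.00, 5.42, 4.74, 4.89, 7.37],
        "z": [2.47, 3.26, 2.95, 2.98, 4.61],
    },
    index=[1388, 50052, 41645, 42377, 17244],
)

# Match the "category" dtypes reported for the seaborn version of the data
for col in ["cut", "color", "clarity"]:
    df[col] = df[col].astype("category")

# Group column names by kind; "number" matches both int and float columns
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(include="category").columns.tolist()

print("Numeric:", numeric_cols)
print("Categorical:", categorical_cols)
```

Because `include="number"` matches both float64 and int64, price lands in the numeric group alongside the measurement columns.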

2.3 R Code

library(ggplot2)
library(dplyr)

# Load and sample the diamonds dataset
set.seed(42)
df <- ggplot2::diamonds %>% sample_n(500)

# Check dimensions
cat("📏 Dataset Dimensions:", nrow(df), "rows x", ncol(df), "columns\n\n")
📏 Dataset Dimensions: 500 rows x 10 columns

# Preview the dataset
cat("🔍 Dataset Preview:\n")
🔍 Dataset Preview:
print(head(df))
# A tibble: 6 × 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.39 Ideal     I     VVS2     60.8    56   849  4.74  4.76  2.89
2  1.12 Very Good G     SI2      63.3    58  4478  6.7   6.63  4.22
3  0.51 Very Good G     VVS2     62.9    57  1750  5.06  5.12  3.2 
4  0.52 Very Good D     VS1      62.5    57  1829  5.11  5.16  3.21
5  0.28 Very Good E     VVS2     61.4    55   612  4.22  4.25  2.6 
6  1.01 Fair      F     SI1      67.2    60  4276  6.06  6     4.05
# Inspect variable types
cat("\n🔠 Variable Types:\n")

🔠 Variable Types:
str(df)
tibble [500 × 10] (S3: tbl_df/tbl/data.frame)
 $ carat  : num [1:500] 0.39 1.12 0.51 0.52 0.28 1.01 0.4 0.9 0.33 0.71 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 3 3 3 3 1 3 5 5 4 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 6 4 4 1 2 3 1 1 4 4 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 6 2 6 5 6 3 5 3 7 4 ...
 $ depth  : num [1:500] 60.8 63.3 62.9 62.5 61.4 67.2 60.8 62.1 62 62.1 ...
 $ table  : num [1:500] 56 58 57 57 55 60 59 57 55 62 ...
 $ price  : int [1:500] 849 4478 1750 1829 612 4276 954 4523 838 2623 ...
 $ x      : num [1:500] 4.74 6.7 5.06 5.11 4.22 6.06 4.74 6.18 4.45 5.71 ...
 $ y      : num [1:500] 4.76 6.63 5.12 5.16 4.25 6 4.76 6.25 4.49 5.65 ...
 $ z      : num [1:500] 2.89 4.22 3.2 3.21 2.6 4.05 2.89 3.86 2.77 3.53 ...
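R's `str()` reports cut, color, and clarity as ordered factors. In Python, by contrast, `df.dtypes` prints just "category" whether or not an order is defined. A minimal pandas sketch for checking the order with `Series.cat.ordered` and, if needed, setting it follows. It assumes you want to mirror ggplot2's level order for cut (Fair < Good < Very Good < Premium < Ideal), and it uses the cut values from the Python preview rows as toy data.

```python
import pandas as pd

# Cut grades from the Python preview rows, stored as a plain (unordered) category
cut = pd.Series(["Ideal", "Very Good", "Ideal", "Premium", "Ideal"], dtype="category")
print(cut.cat.ordered)  # a plain "category" dtype carries no order

# Mirror the level order of ggplot2's diamonds$cut
cut_levels = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
cut_ordered = cut.astype(pd.CategoricalDtype(categories=cut_levels, ordered=True))

print(cut_ordered.cat.ordered)
# Comparisons and min/max follow the declared order, not alphabetical order
print((cut_ordered > "Premium").sum())  # rows graded better than "Premium"
print(cut_ordered.min())                # lowest grade present
```

With an ordered dtype, "Ideal" compares greater than "Premium", which matches how the R factor behaves. Alphabetical order would instead put "Ideal" first.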

✅ Always check variable types before analysis. Doing so helps prevent errors, ensures that plots and models treat each variable correctly, and shows you which variables need converting (e.g., from text to a categorical or numeric type).
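The two conversions mentioned above can be sketched in pandas as follows. The data is hypothetical: both columns arrived as text, and one price entry is the placeholder "n/a". `pd.to_numeric(..., errors="coerce")` turns unparseable entries into NaN rather than raising an error. The sketch then lists those entries so they can be reviewed instead of silently lost.

```python
import pandas as pd

# Hypothetical raw data: every column arrived as text
raw = pd.DataFrame({
    "price": ["559", "2201", "n/a", "1304"],
    "cut": ["Ideal", "Very Good", "Ideal", "Premium"],
})

clean = raw.copy()
# Text -> numeric: unparseable entries become NaN instead of raising
clean["price"] = pd.to_numeric(raw["price"], errors="coerce")
# Text -> category: each distinct label is stored once
clean["cut"] = raw["cut"].astype("category")

# Raw values that were present but failed numeric conversion
bad = raw.loc[clean["price"].isna() & raw["price"].notna(), "price"]

print(clean.dtypes)
print("Could not parse:", bad.tolist())
```

Coercing and then inspecting the failures is one design choice. Passing `errors="raise"` (the default) would instead stop at the first bad value.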