Polynomial Regression

Learn how polynomial regression models non-linear relationships by transforming features

Intermediate · 35 min

Introduction

While linear regression works well for linear relationships, many real-world phenomena follow curved patterns. Polynomial regression extends linear regression by transforming features into polynomial terms, allowing us to model non-linear relationships while still using the familiar linear regression framework.

The key insight is that we can fit curves by treating polynomial terms (x², x³, etc.) as additional features in a linear model.

What You'll Learn

By the end of this module, you will:

  • Understand how polynomial features capture non-linear relationships
  • Learn to choose appropriate polynomial degrees
  • Recognize overfitting and underfitting in polynomial models
  • Apply feature normalization for polynomial regression
  • Interpret polynomial regression curves and coefficients

The Polynomial Model

From Linear to Polynomial

A linear model with one feature:

y = w₁x + w₀

A polynomial model of degree 2:

y = w₂x² + w₁x + w₀

A polynomial model of degree d:

y = w_d xᵈ + ... + w₂x² + w₁x + w₀
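For example, a degree-2 model with illustrative weights (w₂ = 0.5, w₁ = 2, w₀ = 1; the numbers here are made up) can be evaluated directly with NumPy:

```python
import numpy as np

# Illustrative degree-2 model: y = 0.5x² + 2x + 1 (weights are made up)
w = [0.5, 2.0, 1.0]  # highest-degree coefficient first, as np.polyval expects

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.polyval(w, x)
print(y)  # [ 1.   3.5  7.  11.5]
```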

Feature Transformation

Polynomial regression works by creating new features from the original feature:

Original feature: x = 1, 2, 3, 4

Degree 2 features:

  • x¹ = 1, 2, 3, 4
  • x² = 1, 4, 9, 16

Degree 3 features:

  • x¹ = 1, 2, 3, 4
  • x² = 1, 4, 9, 16
  • x³ = 1, 8, 27, 64

Once we have these polynomial features, we apply standard linear regression!
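A minimal sketch of this transformation using scikit-learn's PolynomialFeatures (the same columns could also be built by hand with NumPy):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([1, 2, 3, 4]).reshape(-1, 1)  # column vector of the original feature

# Degree-3 transform: columns are x, x², x³ (bias column omitted here)
poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly.fit_transform(x)
print(X_poly)
# [[ 1.  1.  1.]
#  [ 2.  4.  8.]
#  [ 3.  9. 27.]
#  [ 4. 16. 64.]]
```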

How It Works

Step 1: Create Polynomial Features

Transform each input value x into a vector of polynomial terms:

x → [x, x², x³, ..., xᵈ]

Step 2: Normalize the Features

Polynomial features can have very different scales (x vs x¹⁰), so normalization is crucial:

x_normalized = (x - mean) / std_dev
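A minimal sketch of this step in NumPy, using a toy degree-3 feature matrix (the numbers match the example above):

```python
import numpy as np

# Toy design matrix of polynomial features: columns x, x², x³
X_poly = np.array([[1.,  1.,  1.],
                   [2.,  4.,  8.],
                   [3.,  9., 27.],
                   [4., 16., 64.]])

mu = X_poly.mean(axis=0)    # per-column mean (computed on training data only)
sigma = X_poly.std(axis=0)  # per-column standard deviation

X_norm = (X_poly - mu) / sigma
print(X_norm.mean(axis=0).round(6))  # ~0 per column
print(X_norm.std(axis=0))            # 1 per column
```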

Step 3: Apply Linear Regression

Use gradient descent to find weights for each polynomial term, just like in linear regression.
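A minimal sketch of batch gradient descent on the normalized features (gradient_descent is a helper name of our own choosing, not from any library):

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, epochs=1000):
    """Plain batch gradient descent for least squares.
    X: (n_samples, n_features) matrix of normalized polynomial features.
    Returns a weight per feature plus an intercept."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        err = X @ w + b - y               # prediction error
        w -= lr * (2 / n) * (X.T @ err)   # gradient of MSE w.r.t. w
        b -= lr * (2 / n) * err.sum()     # gradient of MSE w.r.t. b
    return w, b
```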

Step 4: Make Predictions

For a new input x:

  1. Create polynomial features: x, x², x³, ..., xᵈ
  2. Normalize using training statistics
  3. Compute: y = w₁x + w₂x² + ... + w_d xᵈ + w₀
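Putting these steps together, a sketch of a hypothetical predict helper; w, b, mu, and sigma are assumed to come from training, as in the sketches above:

```python
import numpy as np

def predict(x_new, w, b, mu, sigma, degree):
    """Predict with a fitted polynomial model.
    mu/sigma must be the statistics computed on the *training* features."""
    x_new = np.atleast_1d(x_new).astype(float)
    # 1. Create polynomial features for the new input(s)
    X = np.column_stack([x_new ** p for p in range(1, degree + 1)])
    # 2. Normalize using training statistics
    X = (X - mu) / sigma
    # 3. Linear combination of the normalized features
    return X @ w + b
```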

Choosing the Polynomial Degree

The degree is the most important hyperparameter in polynomial regression.

Degree Too Low (Underfitting)

  • Model is too simple to capture the pattern
  • High training error
  • High test error
  • Example: Using degree 1 (linear) for a quadratic relationship

Degree Just Right

  • Model captures the true pattern
  • Low training error
  • Low test error
  • Generalizes well to new data

Degree Too High (Overfitting)

  • Model fits training data too closely, including noise
  • Very low training error
  • High test error
  • Wiggly, unrealistic curve
  • Example: Using degree 10 for a quadratic relationship

Guidelines for Degree Selection

  1. Start simple: Begin with degree 2 or 3
  2. Visualize: Plot the fitted curve - does it look reasonable?
  3. Cross-validate: Use validation data to check generalization (see the sketch after this list)
  4. Domain knowledge: Consider the physics or theory behind your data
  5. Regularization: Use Ridge or Lasso to control complexity
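To make guideline 3 concrete, here is a minimal cross-validation sketch over candidate degrees using scikit-learn; the noisy quadratic data is purely illustrative:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Toy data: a noisy quadratic (purely illustrative)
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - x.ravel() + rng.normal(0, 0.5, 60)

for degree in range(1, 7):
    model = make_pipeline(PolynomialFeatures(degree, include_bias=False),
                          StandardScaler(),
                          LinearRegression())
    scores = cross_val_score(model, x, y, cv=5, scoring="neg_mean_squared_error")
    print(f"degree {degree}: mean CV MSE = {-scores.mean():.3f}")
```

The degree with the lowest cross-validated MSE (here, 2) is the one that generalizes best.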

The Bias-Variance Tradeoff

Polynomial regression perfectly illustrates the bias-variance tradeoff:

Low Degree (High Bias, Low Variance):

  • Underfits the data
  • Consistent but inaccurate predictions
  • Similar performance on training and test data

High Degree (Low Bias, High Variance):

  • Overfits the data
  • Accurate on training data but poor on test data
  • Predictions vary greatly with different training sets

Optimal Degree (Balanced):

  • Captures true pattern without overfitting
  • Good performance on both training and test data
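A quick way to see the tradeoff numerically, assuming a noisy quadratic as the true pattern (a sketch with NumPy's polyfit; exact numbers will vary with the noise):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3, 3, 40))
y = 0.5 * x**2 - x + rng.normal(0, 0.5, x.size)  # noisy quadratic
x_tr, y_tr = x[::2], y[::2]    # even indices: training split
x_te, y_te = x[1::2], y[1::2]  # odd indices: test split

for deg in (1, 2, 10):  # underfit, about right, overfit
    coeffs = np.polyfit(x_tr, y_tr, deg)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(f"degree {deg:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The expected pattern: degree 1 has high error on both splits, degree 2 has low error on both, and degree 10 has the lowest training error but a worse test error.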

Feature Normalization

Normalization is especially important for polynomial regression because polynomial features have vastly different scales.

Why Normalize?

Without normalization:

  • x = 10 → x² = 100 → x³ = 1000 → x¹⁰ = 10,000,000,000
  • Gradient descent struggles with such different scales
  • Numerical instability and slow convergence

With normalization:

  • All features have similar scales (mean=0, std=1)
  • Faster convergence
  • More stable training
  • Better numerical precision

Z-Score Normalization

x_norm = (x - μ) / σ

where:
μ = mean of feature
σ = standard deviation of feature

Performance Metrics

The same metrics from linear regression apply:

  • MSE: Penalizes large errors heavily
  • RMSE: Interpretable in original units
  • R² Score: Proportion of variance explained (watch for overfitting!)
  • MAE: Robust to outliers

For polynomial regression, also monitor:

  • Training vs Validation Error: Large gap indicates overfitting
  • Curve Smoothness: Excessive wiggling suggests overfitting
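All four metrics are available in scikit-learn; a minimal sketch with placeholder predictions:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# y_true / y_pred are placeholders for your model's test-set outputs
y_true = np.array([3.0, 1.5, 4.0, 2.0])
y_pred = np.array([2.8, 1.7, 4.4, 1.9])

mse = mean_squared_error(y_true, y_pred)
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))  # back in the original units
print("MAE :", mean_absolute_error(y_true, y_pred))
print("R²  :", r2_score(y_true, y_pred))
```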

Real-World Applications

Polynomial regression is used in:

  • Physics: Modeling projectile motion, spring forces
  • Economics: Modeling diminishing returns, growth curves
  • Biology: Population growth, enzyme kinetics
  • Engineering: Stress-strain relationships, calibration curves
  • Climate Science: Temperature trends with seasonal patterns

Advantages

  • Models non-linear relationships
  • Still uses linear regression framework
  • Interpretable coefficients
  • Fast training and prediction
  • No need for complex algorithms

Limitations

  • Prone to overfitting with high degrees
  • Extrapolation can be unreliable (fitted curves can shoot off rapidly outside the training range)
  • Only models smooth, polynomial-like curves
  • Requires careful degree selection
  • Feature scaling is essential

Tips for Better Results

  1. Always normalize features when degree > 2
  2. Start with low degrees (2-3) and increase if needed
  3. Use cross-validation to select degree
  4. Visualize the fitted curve to check reasonableness
  5. Consider regularization (Ridge/Lasso) for high degrees (see the sketch after this list)
  6. Be cautious with extrapolation beyond training data range
  7. Try other approaches if polynomial doesn't fit well (splines, GAMs)
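Tips 1, 3, and 5 combine naturally into a single scikit-learn pipeline; a sketch with an illustrative degree and alpha (tune both by cross-validation):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

# High-degree features + normalization + L2 regularization in one pipeline.
# alpha controls the penalty strength; larger values give smoother curves.
model = make_pipeline(
    PolynomialFeatures(degree=10, include_bias=False),
    StandardScaler(),
    Ridge(alpha=1.0),
)
# Usage (x_train, y_train, x_new are placeholders for your own data):
# model.fit(x_train, y_train)
# model.predict(x_new)
```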

Comparison with Other Approaches

vs Linear Regression:

  • More flexible, can model curves
  • More prone to overfitting
  • Requires degree selection

vs Splines:

  • Simpler, more interpretable
  • Less flexible for complex shapes
  • Global fit (splines are local)

vs Neural Networks:

  • Much simpler and faster
  • More interpretable
  • Limited to polynomial shapes
  • Better for small datasets

Summary

Polynomial regression extends linear regression to model non-linear relationships by:

  • Creating polynomial features from original features
  • Applying linear regression to transformed features
  • Carefully selecting the polynomial degree to balance bias and variance

The key challenge is choosing the right degree - too low and you underfit, too high and you overfit. Visualization, cross-validation, and domain knowledge are your best tools for finding the sweet spot.

Next Steps

After mastering polynomial regression, explore:

  • Ridge Regression: Add L2 regularization to control overfitting
  • Lasso Regression: Add L1 regularization for feature selection
  • Elastic Net: Combine Ridge and Lasso
  • Spline Regression: More flexible piecewise polynomials
  • Generalized Additive Models (GAMs): Flexible non-parametric curves
