What is Regression?

Regression is a type of supervised machine learning that predicts a continuous numerical value based on input features. While classification answers "which category?", regression answers "how much?" or "how many?" It is one of the oldest and most fundamental techniques in statistics and machine learning, dating back to the early 1800s.

Imagine you want to predict the price of a house. You know its size, location, number of rooms, and age. Regression takes all of these input features and produces a single number: the predicted price. That output is a continuous value -- it could be $250,000, or $251,347, or any number in between. This is what makes regression different from classification, which would only tell you whether the house is "affordable" or "expensive."

Regression is everywhere in the real world. Weather forecasting uses regression to predict temperature. Financial models use regression to estimate stock prices. E-commerce platforms use regression to predict how much a customer will spend. Healthcare uses regression to estimate a patient's length of hospital stay. Whenever you need to predict a number rather than a category, regression is the tool you reach for.

At its core, regression is about finding the mathematical relationship between inputs and outputs. The model learns a function that maps features to predictions, and the quality of that function determines how accurate the predictions will be. The simplest form is a straight line, but regression can model curves, surfaces, and complex multi-dimensional relationships as well.

Linear vs Non-Linear

Linear regression is the simplest and most widely known form. It assumes that the relationship between the input features and the output is a straight line (or, in higher dimensions, a flat plane). The model learns a weight for each feature and a bias term, and the prediction is a weighted sum: y = w1*x1 + w2*x2 + ... + b. Despite its simplicity, linear regression is remarkably powerful for many real-world problems.
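The weighted sum above can be sketched directly in code. This is a minimal illustration with made-up weights (the per-square-foot and per-year values are assumptions for the example, not learned from real data):

```python
import numpy as np

# Hypothetical learned weights for two features: size (sq ft) and age (years)
w = np.array([150.0, -1200.0])  # dollars per sq ft, dollars per year of age
b = 50_000.0                    # bias term

def predict(x):
    """Linear regression prediction: y = w1*x1 + w2*x2 + b."""
    return float(np.dot(w, x) + b)

price = predict(np.array([1400.0, 10.0]))  # a 1400 sq ft, 10-year-old house
# 150*1400 - 1200*10 + 50000 = 248000
```

Training is the process of finding values of `w` and `b` that make such predictions accurate; the prediction itself is just a dot product plus a bias.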

For a single feature, linear regression finds the best straight line through the data points. For two features, it finds the best flat plane. For more features, it finds the best hyperplane in a high-dimensional space. The word "best" here means the line that minimizes the total prediction error across all training examples.

However, many real-world relationships are not linear. The relationship between temperature and ice cream sales might follow a curve. The relationship between study hours and exam scores might plateau at some point. For these cases, we need non-linear regression.

Polynomial regression extends linear regression by adding powers of the features. Instead of y = w*x + b, you might use y = w1*x + w2*x^2 + w3*x^3 + b. This allows the model to fit curves rather than straight lines. Higher-degree polynomials can fit more complex shapes but also risk overfitting.
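One way to see that polynomial regression is still "linear" under the hood: the model is linear in its weights, so you can expand the features into powers and solve an ordinary least-squares problem. A sketch with toy data generated from a known curve (the coefficients 2, 3, and 0.5 are chosen for illustration):

```python
import numpy as np

# Toy data following a known curve: y = 2 + 3x + 0.5x^2
x = np.linspace(-3, 3, 20)
y = 2 + 3 * x + 0.5 * x**2

# Polynomial regression = linear regression on powers of x.
# Design matrix columns: [1, x, x^2]
X = np.column_stack([np.ones_like(x), x, x**2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
# coeffs recovers approximately [2, 3, 0.5]
```

Raising the degree adds more columns to the design matrix, which is exactly how higher-degree polynomials gain the flexibility to fit complex shapes -- and to overfit.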

Other non-linear approaches include decision tree regression, which partitions the feature space into regions and predicts a constant value in each region, and neural network regression, which can learn arbitrary non-linear functions through layers of connected neurons. Support vector regression (SVR) and kernel methods can also model non-linear relationships by projecting data into higher-dimensional spaces.
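To make the "partition and predict a constant" idea concrete, here is a sketch of the smallest possible regression tree: a single split (a "stump") that chooses the threshold minimizing total squared error. Real tree implementations apply this recursively; the data here is invented for illustration:

```python
import numpy as np

def best_stump(x, y):
    """One-split regression tree: find the threshold minimizing total squared error."""
    best = (None, np.inf, y.mean(), y.mean())
    for t in np.unique(x)[1:]:               # candidate thresholds between points
        left, right = y[x < t], y[x >= t]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best[1]:
            best = (t, sse, left.mean(), right.mean())
    return best  # (threshold, error, left-region prediction, right-region prediction)

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 5.2, 4.8, 20.0, 19.5, 20.5])
t, sse, left_pred, right_pred = best_stump(x, y)
# splits at x = 10 and predicts ~5 on the left, ~20 on the right
```

Each region gets the mean of its training targets as its prediction, which is what makes tree output piecewise constant rather than a smooth curve.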

The choice between linear and non-linear regression depends on the data. If plotting your feature against the target produces something that looks roughly like a line, linear regression is a great starting point. If the relationship is clearly curved or involves complex interactions between features, non-linear methods will perform better. Always start simple and add complexity only when justified by the data.

How Regression Works

The core mechanism of regression is finding the line (or curve) that best fits the data. But what does "best fit" mean? It means minimizing a loss function -- a mathematical measure of how wrong the model's predictions are.

The most common loss function for regression is Mean Squared Error (MSE). For each training example, you calculate the difference between the predicted value and the actual value, square that difference, and then average across all examples. Squaring the errors ensures that large errors are penalized much more heavily than small ones, and it makes the math for optimization convenient.
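In code, MSE is a one-liner. The example values below are arbitrary, chosen to show how squaring amplifies one large error relative to several small ones:

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return sum((t - p)**2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Errors of 1, 1, and 3: the single error of 3 contributes 9,
# more than the other two combined -- (1 + 1 + 9) / 3
loss = mse([10, 10, 10], [11, 11, 13])
```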

The process of finding the weights that minimize MSE is called optimization. For linear regression, there is actually a closed-form solution called the Normal Equation that gives you the optimal weights directly. However, for large datasets and more complex models, the standard approach is gradient descent: start with random weights, calculate the loss, compute how the loss changes with respect to each weight (the gradient), and adjust the weights in the direction that reduces the loss. Repeat this process thousands or millions of times until the loss converges.
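Both routes can be sketched in a few lines. This minimal example fits y = wx + b by gradient descent and checks the answer against the Normal Equation; the data is generated from a known line (slope 4, intercept 2, assumed for illustration), and the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

# Toy data generated from y = 4x + 2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 4 * x + 2

w, b = 0.0, 0.0          # start from arbitrary weights
lr = 0.05                # learning rate
for _ in range(5000):
    error = (w * x + b) - y
    # Gradients of MSE with respect to w and b
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    w -= lr * grad_w     # step downhill
    b -= lr * grad_b
# w, b converge to approximately 4 and 2

# Closed-form check via the Normal Equation: solve (X^T X) theta = X^T y
X = np.column_stack([x, np.ones_like(x)])
w_exact, b_exact = np.linalg.solve(X.T @ X, X.T @ y)
```

On a dataset this small the Normal Equation is clearly simpler; gradient descent earns its keep when the dataset is too large to hold the matrix algebra in memory or the model has no closed-form solution.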

Think of gradient descent like a blindfolded hiker trying to reach the bottom of a valley. At each step, the hiker feels the slope of the ground beneath their feet and steps in the downhill direction. Over many steps, they gradually descend to the lowest point. The "steepness" of each step is controlled by the learning rate -- too large and the hiker overshoots the valley; too small and the descent takes forever.

Beyond MSE, other loss functions are used in specific situations. Mean Absolute Error (MAE) uses the absolute value of errors instead of squares, making it more robust to outliers. Huber loss combines the best of both, using MSE for small errors and MAE for large ones. The choice of loss function affects what the model optimizes for and how sensitive it is to extreme values.
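The difference between these losses is easiest to see per-residual. A sketch, with the standard Huber threshold delta = 1.0 (the residual values are arbitrary examples):

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: linear in the error, so outliers count less."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def huber(residual, delta=1.0):
    """Huber loss: quadratic for small residuals, linear beyond delta."""
    r = abs(residual)
    return 0.5 * r**2 if r <= delta else delta * (r - 0.5 * delta)

small = huber(0.5)  # quadratic regime: 0.5 * 0.25 = 0.125
large = huber(3.0)  # linear regime: 1.0 * (3.0 - 0.5) = 2.5
```

Under MSE a residual of 3 would cost 9; under Huber it costs only 2.5, which is why Huber-trained models are pulled around less by outliers.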

Once trained, the regression model can make predictions on new, unseen data by simply plugging the feature values into the learned equation. The quality of those predictions is evaluated using metrics like R-squared (which measures how much of the variance in the target the model explains), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE).
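All three evaluation metrics follow directly from the residuals. A minimal sketch (the prediction values are invented for the example):

```python
import math

def metrics(y_true, y_pred):
    """R-squared, RMSE, and MAE for a set of predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p)**2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y)**2 for t in y_true)
    r2 = 1 - ss_res / ss_tot        # fraction of variance explained
    rmse = math.sqrt(ss_res / n)    # error in the target's own units
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae

r2, rmse, mae = metrics([3.0, 5.0, 7.0, 9.0], [2.8, 5.1, 7.3, 8.8])
```

RMSE and MAE are in the same units as the target (dollars, degrees), which makes them easy to communicate; R-squared is unitless, with 1.0 meaning a perfect fit and 0.0 meaning no better than always predicting the mean.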

Use Cases

Regression is one of the most widely applied techniques in machine learning, touching virtually every industry. Here are some of the most important real-world applications.

Real estate pricing is perhaps the most classic regression application. Models predict house prices based on features like square footage, number of bedrooms, location, school district ratings, and recent comparable sales. Zillow's Zestimate and similar tools rely heavily on regression models to provide automated property valuations for millions of homes.

Financial forecasting uses regression to predict stock prices, revenue, costs, and economic indicators. While financial markets are notoriously difficult to predict (because prices already reflect available information), regression models are used to estimate expected returns, assess risk, and identify relationships between economic variables.

Demand forecasting helps businesses predict how much product they will sell. Retailers use regression to forecast sales based on historical data, seasonal patterns, promotional calendars, and economic conditions. Accurate demand forecasts reduce waste, prevent stockouts, and optimize supply chains -- saving companies millions of dollars.

Healthcare and medical research rely on regression to predict patient outcomes, estimate drug dosages, forecast hospital readmission rates, and analyze the relationship between risk factors and diseases. A regression model might predict a patient's blood sugar level based on their diet, exercise, medication, and medical history, helping doctors personalize treatment plans.

Environmental science uses regression to model climate data, predict pollution levels, estimate crop yields based on weather patterns, and project the impact of policy interventions. Climate models that predict temperature changes under different emissions scenarios are fundamentally regression problems at massive scale.

Engineering and manufacturing apply regression to predict equipment failure times, estimate manufacturing yields, model the relationship between process parameters and product quality, and optimize resource allocation. Predictive maintenance -- knowing when a machine will break down before it does -- is a regression problem that saves industries billions in unplanned downtime.

Key Takeaway

Regression is the workhorse of predictive modeling. Whenever you need to predict a continuous number -- a price, a temperature, a quantity, a duration -- regression is the technique to use. It is one of the first tools every data scientist learns and one of the last they stop using.

The beauty of regression lies in its flexibility. At its simplest, linear regression draws a straight line through data points and is interpretable enough that anyone can understand what the model is doing. At its most complex, neural network regression can model arbitrary non-linear relationships in high-dimensional spaces. Between these extremes lies a rich toolkit of methods that can handle almost any prediction problem.

The key to successful regression is choosing the right model complexity for your data. Start with linear regression as a baseline. If performance is insufficient and the data suggests non-linear patterns, try polynomial features or tree-based methods. Evaluate rigorously using cross-validation, and watch for signs of overfitting (training error much lower than validation error) or underfitting (both errors are high).
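The cross-validation idea mentioned above starts with how the data is partitioned. A minimal sketch of the index-splitting step (the fold count and dataset size are arbitrary examples):

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k roughly equal validation folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = kfold_indices(10, 3)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Each fold takes one turn as the validation set while the model trains on the rest; averaging the k validation errors gives a far more reliable performance estimate than a single train/validation split.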

Whether you are predicting house prices, forecasting weather, optimizing manufacturing processes, or estimating patient outcomes, regression provides the foundation. It is simple enough to explain to a business stakeholder and powerful enough to drive real-world decisions. Master regression, and you have one of the most versatile tools in all of machine learning.
