Linear vs Logistic Regression

April 07, 2024

Overview

Linear and Logistic Regression are mathematical techniques used, respectively, to fit a line of best fit to a cluster of data and to perform binary classification. They appear in many fields, including Machine Learning, which is the focus of this blog post.

Both techniques iteratively fit a line of best fit to training data using gradient descent.

These techniques for finding a line of best fit (linear regression) and for binary classification (logistic regression) are provided by libraries like TensorFlow. However, it's important to understand how these libraries work under the hood. A great way to gain a solid, concise understanding is through this Coursera course offered by Stanford: Supervised Machine Learning: Regression and Classification.

Table of Contents

Gradient Descent
Linear Regression
Logistic Regression
Parameters (i.e. Features)
Over and Under-Fitting
Under-Fitting
Over-Fitting
Conclusion

Gradient Descent

Gradient descent is a technique for finding the minimum cost or loss for a given set of parameters (i.e. features).

  • Linear Regression: the cost measures how far the line of best fit falls from each training example, typically as the mean of the squared errors.
  • Logistic Regression: the loss also compares predictions against each training example, but it accounts for the binary classification of the data by heavily penalizing confident predictions of the wrong class (see the formulas below).
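
In common notation (the symbols here are standard conventions, not taken from this post), with model prediction f_{w,b}(x^{(i)}) over m training examples, the two are often written as:

$$J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2 \quad \text{(linear regression, mean squared error)}$$

$$J(w,b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log f_{w,b}(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - f_{w,b}(x^{(i)})\right) \right] \quad \text{(logistic regression, log loss)}$$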

In either case, the fitted line can be curved: adding parameters (e.g. polynomial terms) allows more curvature.

Gradient descent is a process that is run computationally (in code, using a library like TensorFlow) and that iteratively steps toward a local minimum of the cost surface (which is hopefully also the global minimum). With two parameters, this surface can be visualized as a 3-dimensional graph. In Linear Regression, this surface is shaped like a bowl (it is convex), so there's only a single local minimum, and it is also the global minimum. In Logistic Regression, using squared error would produce a bumpy surface with many local minima, which is why the log loss above is used instead: it makes the surface convex again.
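
As a rough sketch of how a library might implement this under the hood (the data, learning rate, and iteration count below are invented for illustration), gradient descent for linear regression with a single feature looks like:

```python
import numpy as np

# Hypothetical training data: x = lot size (1000s of sq ft), y = price ($1000s).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([150.0, 200.0, 260.0, 310.0, 370.0])

w, b = 0.0, 0.0   # parameters, initialized to zero
alpha = 0.01      # learning rate (step size)

for _ in range(10_000):
    error = (w * x + b) - y        # current prediction minus target
    dw = (error * x).mean()        # gradient of the MSE cost w.r.t. w
    db = error.mean()              # gradient of the MSE cost w.r.t. b
    w -= alpha * dw                # step downhill along the gradient
    b -= alpha * db

print(f"line of best fit: y = {w:.2f}x + {b:.2f}")
```

Each iteration nudges w and b in the direction that reduces the cost, so the parameters gradually settle into the bowl's minimum.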

Parameters (i.e. Features)

Parameters (i.e. features) are the terms in the model's equation that describe how the predicted value behaves along the graph: each feature is an input variable, and each gets its own learned parameter (weight). For example, you may want to describe the growth in house prices in terms of lot size, age, number of bedrooms, and zip code. Each one can be a feature with its own parameter.

Learn more: Logistic regression: Many explanatory variables.
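
A minimal sketch of this idea (the features, weights, and prices are made-up numbers, and the weights would normally be learned rather than hand-set):

```python
import numpy as np

# Hypothetical feature matrix X: one row per house, one column per feature.
# Columns: lot size (1000s of sq ft), age (years), number of bedrooms.
X = np.array([
    [2.1, 15, 3],
    [1.4, 40, 2],
    [3.0,  5, 4],
])
w = np.array([80.0, -1.5, 25.0])  # one parameter (weight) per feature
b = 30.0                          # intercept (bias) term

prices = X @ w + b                # model prediction: f(x) = w . x + b
print(prices)                     # one predicted price per house
```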

Linear Regression

In Linear Regression, we want to find a line of best fit in a space (most commonly a 2D space, but the technique applies to spaces with more dimensions). The purpose is to estimate the value along one axis given the value along the other axis (or axes).
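
For instance (a sketch with invented data; NumPy's least-squares polyfit stands in here for the gradient-descent version shown above):

```python
import numpy as np

# Hypothetical 2D data: estimate y for a new x after fitting a straight line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

w, b = np.polyfit(x, y, deg=1)    # least-squares line of best fit
print(f"estimated y at x = 6: {w * 6 + b:.2f}")
```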

Logistic Regression

In Logistic Regression, we want to find a curved line of best fit (a decision boundary) that separates two clusters of data, and to classify a new point by which side of the boundary it falls on. This is called binary classification. However, Logistic Regression can be extended to more than two clusters of data, through multinomial logistic regression.
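
The curve comes from the sigmoid function, which squashes the linear model's output into a probability between 0 and 1. A minimal sketch (the parameter values below are made up, standing in for values a training run would learn):

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into (0, 1), read as P(class = 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned parameters for a single feature.
w, b = 2.0, -5.0

x_new = np.array([1.0, 2.5, 4.0])
p = sigmoid(w * x_new + b)        # predicted probability of class 1
labels = (p >= 0.5).astype(int)   # binary classification at the 0.5 threshold
print(p, labels)                  # [0.047 0.5 0.953] -> [0 1 1]
```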

Over and Under-Fitting

Over- and Under-Fitting occur when the number of parameters (i.e. features) used in the equation of the line of best fit is a poor match for the training data. This can occur for two reasons:

Under-Fitting

Under-fitting is when too few parameters are used in the equation for the line of best fit. This results in a line that is too flat or straight to capture the trend in the data.

Over-Fitting

Over-fitting is when too many parameters are used in the equation for the line of best fit. This results in a line that matches the training data very closely; however, it does not accurately predict new data.
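
Both failure modes can be seen by fitting polynomials of different degrees to the same data (a sketch; the noisy quadratic data below is generated just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy quadratic data, split into training and validation sets.
x_train = np.linspace(0, 1, 10)
y_train = x_train**2 + rng.normal(0, 0.05, x_train.size)
x_val = np.linspace(0.05, 0.95, 10)
y_val = x_val**2 + rng.normal(0, 0.05, x_val.size)

for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    # Degree 1 under-fits (both errors high); degree 9 typically over-fits
    # (training error near zero, validation error larger); degree 2 fits well.
    print(f"degree {degree}: train MSE {train_mse:.5f}, val MSE {val_mse:.5f}")
```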

Conclusion

Linear and Logistic Regression are among the most fundamental tools in Machine Learning, for regression and classification respectively. This blog post is intended to provide a concise overview.

To be updated with diagrams and equations (and a breakdown of the equations). It is currently intended as supplementary content (to academic material that already contains these equations). It will also include a step-by-step walk-through of finding the line of best fit.

It will also be updated with nuances, such as usages in classification and clustering.