Logistic Regression

ยท

3 min read

What is Logistic Regression

Logistic regression estimates the probability of an event occurring,Such as if a person would buy or not,like True or False

In logistic regression, the logistic function is used to map the odds to probabilities. The logistic function has a characteristic S-shaped curve and is a common method to model a binary outcome (success or failure, yes or no, etc.) based on one or more predictors.

Formulas in Logistic Regression

  • Odds: This is the ratio of the probability of success to the probability of failure. It is calculated as: Odds=๐‘1โˆ’๐‘Odds=1โˆ’ppโ€‹ where ๐‘p is the probability of success.

  • Log Odds (Logit Transformation): This transformation takes the natural logarithm of the odds: logit(๐‘)=lnโก(๐‘1โˆ’๐‘)logit(p)=ln(1โˆ’ppโ€‹) This transformation makes it easier to work with odds, especially when calculating the linear combination of predictors.

  • Logistic Function: This function maps log odds back to probabilities. It is given by: ๐œŽ(๐‘ฅ)=11+expโก(โˆ’๐‘ฅ)ฯƒ(x)=1+exp(โˆ’x)1โ€‹ where expโก(โˆ’๐‘ฅ)exp(โˆ’x) is the exponential function.

Logistic Regression

In logistic regression, you use a linear combination of predictor variables and apply the logistic function to get the predicted probability. Mathematically, this can be represented as:

๐‘=11+expโก(โˆ’(๐‘0+๐‘1๐‘ฅ1+โ€ฆ+๐‘๐‘›๐‘ฅ๐‘›))p\=1+exp(โˆ’(b0โ€‹+b1โ€‹x1โ€‹+โ€ฆ+bnโ€‹xnโ€‹))1โ€‹

where:

  • b0โ€‹ is the intercept (constant term),

  • b1โ€‹,โ€ฆ,bnโ€‹ are the coefficients for each predictor variable x1โ€‹,โ€ฆ,xnโ€‹,

  • p is the probability of the event occurring (success).

The goal of logistic regression is to estimate the coefficients b0โ€‹,b1โ€‹,โ€ฆ,bnโ€‹ such that the model can predict the probability of an outcome based on the input features.

Overall, logistic regression is a useful model for classification problems where the outcome variable is binary (like success/failure or yes/no), and it is commonly used in a variety of fields, including finance, healthcare, marketing, and social sciences.

Logistic Regression Explained in 7 Minutes.

y-axis is the probability of occurrence and x-axis is the continuous variable

DIFFERENCE BETWEEN LINEAR REGRESSION AND LOGISTIC REGRESSION

Linear Regression

  • Purpose: Linear regression is used to model relationships between a continuous dependent variable and one or more independent variables. Its goal is to identify a linear relationship or trend.

  • Application: It works well when the outcome is a continuous numerical value. For instance, it can be used to predict house prices based on factors like size and location.

  • Output: The result is a linear equation, usually in the form ๐‘ฆ=๐‘0+๐‘1๐‘ฅ1+โ€ฆ+๐‘๐‘›๐‘ฅ๐‘›_y_=_b_0+_b_1_x_1+โ€ฆ+_b_n_x_n, where ๐‘ฆ_y represents the predicted value, ๐‘0_b_0 is the intercept, and ๐‘1,โ€ฆ,๐‘๐‘›_b_1,โ€ฆ,๐‘_n are coefficients for the independent variables.

  • Assumptions: It relies on assumptions of linearity, independence, constant variance (homoscedasticity), and normally distributed residuals.

  • Visualization: Linear regression can be depicted as a straight line that best fits the data points.

Logistic Regression

  • Purpose: Logistic regression models the relationship between a categorical dependent variable (usually binary) and one or more independent variables. It's commonly used for classification tasks.

  • Application: Great for predicting binary outcomes, like whether a customer will purchase a product or not, or if an email is spam or not.

  • Output: The result is a probability, showing the chance of an event happening. It uses the logistic function to convert linear combinations of predictors into probabilities. The formula is: ๐‘=11+expโก(โˆ’(๐‘0+๐‘1๐‘ฅ1+โ€ฆ+๐‘๐‘›๐‘ฅ๐‘›))p\=1+exp(โˆ’(_b_0โ€‹+_b_1โ€‹_x_1โ€‹+โ€ฆ+bn_โ€‹_xn_โ€‹))1โ€‹ where ๐‘_p is the success probability, and ๐‘0,๐‘1,โ€ฆ,๐‘๐‘›_b_0โ€‹,_b_1โ€‹,โ€ฆ,_bn_โ€‹ are coefficients for the independent variables.

  • Assumptions: It assumes the outcome variable is binary and that the log odds of the dependent variable can be represented as a linear combination of the independent variables.

  • Visualization: Visualizing logistic regression shows an S-curve, illustrating the connection between the log odds and the linear combination of predictors.

ย